Victory In My Battle Against Feed Scraping Content Thief 4Comedy.com (Updated)

Monday, June 11th, 2007

Last week when I was bitching about 4Comedy.com’s stealing my humor blog’s content, I had no idea I was dealing with a “feed scraper” site. All I knew was that I was being ripped off — every time I posted in this humor blog, my entire post appeared on 4Comedy.com within minutes.  Needless to say, I was a very pissed off blogger.

So what did I do?  First I posted comments at 4Comedy.com demanding that its thievery stop.  At least I tried to.  But not surprisingly, my comments never appeared there.

Next I reported its copyright violations to Google AdSense, via a link at 4comedy.com’s site.  The following day I received an email telling me how to formally report a Google AdSense DMCA (Digital Millennium Copyright Act) Infringement Complaint. 

Unfortunately, the procedure involved tons of time-consuming work, requiring me to assemble all sorts of documentation of 4Comedy.com’s many infringements. Then (I’m told) this documentation is forwarded to the alleged infringer. And after that, heaven only knows what happens.

I responded to Google’s email with a request for some accelerated action. My justification was that in its post at http://4comedy.com/?p=463, 4comedy.com had reproduced my own post in which I called it a content thief.

I didn’t receive any response to my email, but apparently it got their attention.  How do I know?  Because today the Google AdSense text ads disappeared from the top of 4Comedy.com’s site.  Hallelujah!

But I’m getting ahead of myself. While I was waiting to hear from Google, I did some research on how to deal with RSS feed scraping content thieves.  And I found some great resources,  including:

1. AntiLeech, a plugin that “helps prevent content theft by sploggers,” and a detailed article explaining the benefits of AntiLeech, Splog Stopper: Fighting Back Against Content Thieves;

2. An interesting article with the enticing title How to stop rss scrapers from stealing your content plus revenge; 

3. A tutorial, Blocking bad bots and site rippers (aka offline browsers);

4. An article entitled How you can stop dirty feedscrapers in 3 easy steps; and

5. This article about Attacking scrapers and content thieves legally.

I found the material posted at all of those links very informative, and I’m planning to give that AntiLeech plugin a try.  But I was feeling a bit lazy and I was looking for some instant gratification.  And, happily, I found it: A commenter named Robert posted the following suggestion here:

Alternatively, to curl up even less unproductive work, add this line to .htaccess:
Deny from 74.52.58.162
Which would even allow you to block a whole range of IP addresses in case it proves necessary…

Armed with this simple-sounding solution, I decided to try it. My first step was to identify 4Comedy.com’s IP address. So I checked its trackback data, which identified its IP as 74.53.110.146. I then confirmed that address by pinging 4comedy.com, using my computer’s Run function: ping 4comedy.com. Next I checked my logs and verified that 4comedy.com’s IP was routinely showing up there.
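
For anyone who'd rather script that log check than eyeball it, here's a minimal sketch in Python. It assumes an Apache-style access log where each line starts with the visitor's IP address. The function names (resolve_ip, count_hits_by_ip) and the sample log lines are made up for illustration and are not taken from my actual logs.

```python
import socket
from collections import Counter


def resolve_ip(hostname):
    """Return the IPv4 address a hostname currently resolves to (needs network access)."""
    return socket.gethostbyname(hostname)


def count_hits_by_ip(log_lines):
    """Count requests per client IP, assuming the IP is the first whitespace-separated field."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            hits[fields[0]] += 1
    return hits


# Hypothetical sample lines in Common Log Format (illustrative only, not real log data).
sample_log = [
    '74.53.110.146 - - [11/Jun/2007:10:00:00 -0400] "GET /feed/ HTTP/1.1" 200 5120',
    '192.0.2.10 - - [11/Jun/2007:10:01:00 -0400] "GET / HTTP/1.1" 200 2048',
    '74.53.110.146 - - [11/Jun/2007:10:05:00 -0400] "GET /feed/ HTTP/1.1" 200 5120',
]

hits = count_hits_by_ip(sample_log)
print(hits["74.53.110.146"])  # 2
```

To run it against a real log, you would pass the open log file in place of sample_log. An IP that keeps turning up right after each new post is a likely candidate for the scraper.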

Now that I had the infringer’s IP, I added this code to my .htaccess file:

Deny from 74.53.110.146

Finally, I FTPed the revised .htaccess file and, like magic, 4Comedy.com’s content thievery came to a halt: MadKane.com has been freed from the slings and arrows of 4Comedy.com’s feed scraping infringements.
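
Robert's comment mentions that the same directive can block a whole range of IP addresses. I only needed the single address, but here's a rough Python sketch of how you might sanity-check an address or range before pasting it into .htaccess. The /24 range below is purely hypothetical (I have no idea what other addresses the scraper uses), and the helper names deny_line and is_covered are my own inventions. It leans on the fact that the older Apache "Deny from" syntax accepts CIDR notation such as 10.1.0.0/16.

```python
import ipaddress


def deny_line(target):
    """Build a 'Deny from' line for one IP or a CIDR range; raises ValueError on bad input."""
    # ip_network accepts a bare address (treated as a /32) or a range like "74.53.110.0/24".
    network = ipaddress.ip_network(target, strict=True)
    if network.num_addresses == 1:
        return f"Deny from {network.network_address}"
    return f"Deny from {network}"


def is_covered(ip, blocked):
    """Return True if ip falls inside any of the blocked addresses or ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(entry) for entry in blocked)


print(deny_line("74.53.110.146"))   # Deny from 74.53.110.146
print(deny_line("74.53.110.0/24"))  # Deny from 74.53.110.0/24 (hypothetical range)
print(is_covered("74.53.110.146", ["74.53.110.0/24"]))  # True
```

A typo like "74.53.110.999" raises a ValueError here instead of quietly ending up in your .htaccess file, which is the main reason to bother with a check like this.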

Of course, that low-life feed scraper is still taking material from other sites like Comedy Central. Hey Comedy Central!  Try this. You’ll like it.

UPDATE WITH ADDITIONAL RESOURCES: As I learn about additional good resources on this topic, I’ll be adding them here.  Feel free to make suggestions via a comment to this post.

a: What To Do When Someone Steals Your Content is a must read.

b: Dnsstuff.com is a source of many fine tools, including a “DNS Lookup” tool — helpful in ascertaining IP addresses.

c: Plagiarism Today is an excellent source of information about plagiarism, content theft, and copyright issues online.

UPDATE 2: 4Comedy.com seems to have disappeared.  I can only hope it stays that way. 

UPDATE 3: 4Comedy.com is now redirecting to a different address: domainnamesbusiness.info/4comedy/index.php. But the IP is the same, so my blog should still be protected from its feed scraping.