Victory In My Battle Against Feed Scraping Content Thief (Updated)

Monday, June 11th, 2007

Last week when I was bitching about’s stealing my humor blog’s content, I had no idea I was dealing with a “feed scraper” site. All I knew was that I was being ripped off — every time I posted in this humor blog, my entire post appeared on within minutes.  Needless to say, I was a very pissed off blogger.

So what did I do?  First I posted comments at demanding that its thievery stop.  At least I tried to.  But not surprisingly, my comments never appeared there.

Next I reported its copyright violations to Google AdSense, via a link at’s site.  The following day I received an email telling me how to formally report a Google AdSense DMCA (Digital Millennium Copyright Act) Infringement Complaint. 

Unfortunately, the procedure involved tons of time-consuming work, requiring me to assemble all sorts of documentation of’s  many infringements.   Then (I’m told) this documentation is forwarded to the alleged infringer.  And after that, heaven-only-knows-what happens.

I responded to Google’s email with a request for some accelerated action.  My justification was that in this post, had reproduced this post, in which I called it a content thief.

I didn’t receive any response to my email, but apparently it got their attention.  How do I know?  Because today the Google AdSense text ads disappeared from the top of’s site.  Hallelujah!

But I’m getting ahead of myself. While I was waiting to hear from Google, I did some research on how to deal with RSS feed scraping content thieves.  And I found some great resources,  including:

1. AntiLeech, a plugin that “helps prevent content theft by sploggers” and  a detailed article explaining the benefits of AntiLeech Splog Stopper: Fighting Back Against Content Thieves;

2. An interesting article with the enticing title “How to stop rss scrapers from stealing your content plus revenge”;

3. A tutorial, “Blocking bad bots and site rippers (aka offline browsers)”;

4. An article entitled “How you can stop dirty feedscrapers in 3 easy steps”; and

5. This article about “Attacking scrapers and content thieves legally.”

I found the material posted at all of those links very informative, and I’m planning to give that AntiLeech plugin a try.  But I was feeling a bit lazy and I was looking for some instant gratification.  And, happily, I found it: A commenter named Robert posted the following suggestion here:

Alternatively, to curl up even less unproductive work, add this line to .htaccess:
Deny from
Which would even allow you to block a whole range of IP addresses in case it proves necessary…

Armed with this simple-sounding solution, I decided to try it.  My first step was to identify’s IP.  So I checked its trackback data, which identified its IP as  I then confirmed the IP number by pinging, using my computer’s Run function:  ping  Next I checked my logs and verified that’s IP was routinely showing up there.
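For anyone who would rather script those checks than do them by hand, here is a minimal Python sketch of the same two steps: resolving a hostname to its IP (what ping reports) and counting how often each IP shows up in the server logs. It assumes the log lines start with the client IP, as in the common Apache access log layout. The function names are made up for illustration, and the sample addresses come from the ranges reserved for documentation, not from the scraper in this story.

```python
import socket
from collections import Counter


def resolve_ip(hostname):
    """Return the IPv4 address a hostname resolves to."""
    return socket.gethostbyname(hostname)


def count_hits_by_ip(log_lines):
    """Count requests per client IP, assuming the IP is the first field."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            hits[fields[0]] += 1
    return hits


# Hypothetical log excerpt; the addresses are documentation placeholders.
sample_log = [
    '192.0.2.10 - - [11/Jun/2007:09:00:01 -0400] "GET /feed/ HTTP/1.1" 200 5120',
    '203.0.113.5 - - [11/Jun/2007:09:02:17 -0400] "GET / HTTP/1.1" 200 8342',
    '192.0.2.10 - - [11/Jun/2007:09:05:44 -0400] "GET /feed/ HTTP/1.1" 200 5120',
]

hits = count_hits_by_ip(sample_log)
for ip, count in hits.most_common():
    print(ip, count)
```

Calling `resolve_ip` with the suspect site's hostname should give the same address that trackback data and ping report. An IP that keeps requesting the feed every few minutes is a likely candidate for the scraper.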

Now that I had the infringer’s IP, I added this code to my .htaccess file:

Deny from
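Robert's comment also mentions blocking a whole range of addresses. As a rough sketch of how that might look, the Python below builds `Deny from` lines for a single IP or a CIDR range and checks whether a given visitor would fall inside the blocked set. The helper names are hypothetical and the addresses are documentation placeholders, so substitute the values found in your own logs.

```python
import ipaddress


def deny_rule(target):
    """Build a 'Deny from' line for one address or a CIDR range."""
    network = ipaddress.ip_network(target, strict=False)
    if network.num_addresses == 1:
        # A single host: write the bare address, without the /32 suffix.
        return f"Deny from {network.network_address}"
    return f"Deny from {network}"


def is_blocked(client_ip, blocked):
    """Return True if client_ip falls inside any blocked address or range."""
    client = ipaddress.ip_address(client_ip)
    return any(
        client in ipaddress.ip_network(entry, strict=False) for entry in blocked
    )


# Placeholder entries: one single address and one /24 range.
blocked = ["192.0.2.10", "198.51.100.0/24"]

for entry in blocked:
    print(deny_rule(entry))

print(is_blocked("198.51.100.7", blocked))
print(is_blocked("203.0.113.5", blocked))
```

Checking a range this way before uploading the file is a cheap safeguard against locking out legitimate readers who happen to share a network with the scraper.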

Finally, I FTPed the revised .htaccess file and, like magic,’s content thievery came to a halt: has been freed from the slings and arrows of’s feed scraping infringements.

Of course, that low-life feed scraper is still taking material from other sites like Comedy Central. Hey Comedy Central!  Try this. You’ll like it.

UPDATE WITH ADDITIONAL RESOURCES: As I learn about additional good resources on this topic, I’ll be adding them here.  Feel free to make suggestions via a comment to this post.

a: What To Do When Someone Steals Your Content is a must read.

b: is a source of many fine tools, including a “DNS Lookup” tool — helpful in ascertaining IP addresses.

c: Plagiarism Today is an excellent source of information about plagiarism, content theft, and copyright issues online.

UPDATE 2: seems to have disappeared.  I can only hope it stays that way. 

UPDATE 3: is now resolving to a different domain — But the IP is the same, so my blog should still be protected from its feed scraping.
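One way to confirm that a renamed domain still points at the blocked address is to resolve both names and compare the results. The following is only a sketch with hypothetical function names. In practice you would pass the old and new hostnames, but the demo line uses IP literals so that it runs without any network access.

```python
import socket


def resolve_or_none(hostname):
    """Return the IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


def same_ip(old_host, new_host):
    """Return True only if both names resolve, and to the same address."""
    old_ip = resolve_or_none(old_host)
    new_ip = resolve_or_none(new_host)
    return old_ip is not None and old_ip == new_ip


# IP literals resolve to themselves, so this demo needs no DNS lookup.
print(same_ip("127.0.0.1", "127.0.0.1"))
```

If `same_ip` returns False for the two hostnames, the existing `Deny from` line no longer covers the site and the new address would need its own entry.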