Feed Scraping / Scrapers « MAD KANE'S HUMOR BLOG

Archive for the ‘Feed Scraping / Scrapers’ Category

Victory In My Battle Against Feed Scraping Content Thief 4Comedy.com (Updated)

Monday, June 11th, 2007

Last week when I was bitching about 4Comedy.com’s stealing my humor blog’s content, I had no idea I was dealing with a “feed scraper” site. All I knew was that I was being ripped off — every time I posted in this humor blog, my entire post appeared on 4Comedy.com within minutes. Needless to say, I was a very pissed off blogger.

So what did I do? First I posted comments at 4Comedy.com demanding that its thievery stop. At least I tried to. But not surprisingly, my comments never appeared there.

Next I reported its copyright violations to Google AdSense, via a link at 4comedy.com’s site. The following day I received an email telling me how to formally report a Google AdSense DMCA (Digital Millennium Copyright Act) Infringement Complaint.

Unfortunately, the procedure involved tons of time-consuming work, requiring me to assemble all sorts of documentation of 4Comedy.com’s many infringements. Then (I’m told) this documentation is forwarded to the alleged infringer. And after that, heaven-only-knows-what happens.

I responded to Google’s email with a request for some accelerated action. My justification was that at http://4comedy.com/?p=463, 4comedy.com had reproduced the very post in which I called it a content thief.

I didn’t receive any response to my email, but apparently it got their attention. How do I know? Because today the Google AdSense text ads disappeared from the top of 4Comedy.com’s site. Hallelujah!

But I’m getting ahead of myself. While I was waiting to hear from Google, I did some research on how to deal with RSS feed scraping content thieves. And I found some great resources, including:

1. AntiLeech, a plugin that “helps prevent content theft by sploggers” and a detailed article explaining the benefits of AntiLeech Splog Stopper: Fighting Back Against Content Thieves;

2. An interesting article with the enticing title How to stop rss scrapers from stealing your content plus revenge;

3. A tutorial, Blocking bad bots and site rippers (aka offline browsers);

4. An article entitled How you can stop dirty feedscrapers in 3 easy steps; and

5. This article about Attacking scrapers and content thieves legally.

I found the material posted at all of those links very informative, and I’m planning to give that AntiLeech plugin a try. But I was feeling a bit lazy and I was looking for some instant gratification. And, happily, I found it: A commenter named Robert posted the following suggestion here:

Alternatively, to curl up even less unproductive work, add this line to .htaccess:
Deny from 74.52.58.162
Which would even allow you to block a whole range of IP addresses in case it proves necessary…
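To make the range idea in that comment concrete, here is a rough sketch of what blocking a block of addresses could look like. It assumes an Apache server that honors the older Allow/Deny directives in .htaccess, and the range itself is purely illustrative: it simply widens the address quoted in the comment and is not a verified list of scraper addresses.

```apache
# Illustrative only: this /24 range is an assumption, not a confirmed scraper range.
# CIDR notation: covers 74.52.58.0 through 74.52.58.255
Deny from 74.52.58.0/24

# A partial address matches the same 256 addresses (commented out to avoid duplication)
# Deny from 74.52.58
```

It is worth keeping any range as narrow as possible, since everyone who shares it, including legitimate readers, is locked out too.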

Armed with this simple-sounding solution, I decided to try it. My first step was to identify 4Comedy.com’s IP. So I checked its trackback data, which identified its IP as 74.53.110.146. I then confirmed the IP number by pinging 4comedy.com, using my computer’s Run function: ping 4comedy.com. Next I checked my logs and verified that 4comedy.com’s IP was routinely showing up there.

Now that I had the infringer’s IP, I added this code to my .htaccess file:

Deny from 74.53.110.146
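For readers who want a bit more context, here is a sketch of how that line might sit in a fuller .htaccess file. The Order and Allow lines are an assumption about a typical Apache 2.2-style setup, not something taken from the original file, and the commented-out Require form is the usual replacement on Apache 2.4 and later.

```apache
# Apache 2.2-style access control (assumed setup; adjust to your server)
Order Allow,Deny
Allow from all
Deny from 74.53.110.146

# Rough Apache 2.4+ equivalent, shown commented out:
# <RequireAll>
#     Require all granted
#     Require not ip 74.53.110.146
# </RequireAll>
```

Either way, requests from the blocked address should get a 403 Forbidden response instead of the page or feed, provided the server is configured to allow these overrides in .htaccess.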

Finally, I FTPed the revised .htaccess file and, like magic, 4Comedy.com’s content thievery came to a halt: MadKane.com has been freed from the slings and arrows of 4Comedy.com’s feed scraping infringements.

Of course, that low-life feed scraper is still taking material from other sites like Comedy Central. Hey Comedy Central! Try this. You’ll like it.

UPDATE WITH ADDITIONAL RESOURCES: As I learn about additional good resources on this topic, I’ll be adding them here. Feel free to make suggestions via a comment to this post.

a: What To Do When Someone Steals Your Content is a must read.

b: Dnsstuff.com is a source of many fine tools, including a “DNS Lookup” tool — helpful in ascertaining IP addresses.

c: Plagiarism Today is an excellent source of information about plagiarism, content theft, and copyright issues online.

UPDATE 2: 4Comedy.com seems to have disappeared. I can only hope it stays that way.

UPDATE 3: 4Comedy.com is now resolving to a different domain — domainnamesbusiness.info/4comedy/index.php. But the IP is the same, so my blog should still be protected from its feed scraping.

Tags: 4Comedy.com, AntiLeech Plugin, Bad Bots, Content Thief, Copyright Violations, Digital Millennium Copyright Act, DMCA, Feed Scrapers, Feed Scraping / Scrapers, Google AdSense, htaccess File, Plagiarism, RSS Feed Scraping, SEO
Posted in Copyright Infringement, Feed Scraping / Scrapers | 23 Comments »
