Victory In My Battle Against Feed Scraping Content Thief 4Comedy.com (Updated)

Last week when I was bitching about 4Comedy.com’s stealing my humor blog’s content, I had no idea I was dealing with a “feed scraper” site. All I knew was that I was being ripped off — every time I posted in this humor blog, my entire post appeared on 4Comedy.com within minutes.  Needless to say, I was a very pissed off blogger.

So what did I do?  First I posted comments at 4Comedy.com demanding that its thievery stop.  At least I tried to.  But not surprisingly, my comments never appeared there.

Next I reported its copyright violations to Google AdSense, via a link at 4comedy.com’s site.  The following day I received an email telling me how to formally report a Google AdSense DMCA (Digital Millennium Copyright Act) Infringement Complaint. 

Unfortunately, the procedure involved tons of time-consuming work, requiring me to assemble all sorts of documentation of 4Comedy.com’s  many infringements.   Then (I’m told) this documentation is forwarded to the alleged infringer.  And after that, heaven-only-knows-what happens.

I responded to Google’s email with a request for some accelerated action.  My justification was that 4comedy.com had even reproduced, at http://4comedy.com/?p=463, the very post in which I called it a content thief.

I didn’t receive any response to my email, but apparently it got their attention.  How do I know?  Because today the Google AdSense text ads disappeared from the top of 4Comedy.com’s site.  Hallelujah!

But I’m getting ahead of myself. While I was waiting to hear from Google, I did some research on how to deal with RSS feed scraping content thieves.  And I found some great resources,  including:

1. AntiLeech, a plugin that “helps prevent content theft by sploggers,” and a detailed article explaining its benefits, “AntiLeech Splog Stopper: Fighting Back Against Content Thieves”;

2. An interesting article with the enticing title “How to stop rss scrapers from stealing your content plus revenge”;

3. A tutorial, “Blocking bad bots and site rippers (aka offline browsers)”;

4. An article entitled “How you can stop dirty feedscrapers in 3 easy steps”; and

5. An article about “Attacking scrapers and content thieves legally.”

I found the material posted at all of those links very informative, and I’m planning to give that AntiLeech plugin a try.  But I was feeling a bit lazy and I was looking for some instant gratification.  And, happily, I found it: A commenter named Robert posted the following suggestion here:

Alternatively, to curl up even less unproductive work, add this line to .htaccess:
Deny from 74.52.58.162
Which would even allow you to block a whole range of IP addresses in case it proves necessary…

Armed with this simple-sounding solution, I decided to try it.  My first step was to identify 4Comedy.com’s IP address.  So I checked its trackback data, which identified its IP as 74.53.110.146.  I then confirmed the address by pinging 4comedy.com, using my computer’s Run function:  ping 4comedy.com.  Next I checked my server logs and verified that this same IP was routinely showing up there.
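The lookup and log check described above could also be scripted. Here’s a minimal Python sketch, assuming an Apache-style access log in which the first field on each line is the visitor’s IP address. The function names and the log path in the usage note are made up for illustration, so adjust them to your own host.

```python
import socket


def resolve_ip(hostname):
    """Return the IPv4 address the hostname currently resolves to."""
    return socket.gethostbyname(hostname)


def count_hits(log_lines, ip):
    """Count log lines whose first whitespace-separated field equals ip."""
    hits = 0
    for line in log_lines:
        fields = line.split()
        # Skip blank lines; compare the whole field so that
        # "74.53.110.14" does not match "74.53.110.146".
        if fields and fields[0] == ip:
            hits += 1
    return hits


def report(hostname, log_path):
    """Resolve hostname, then print how often its IP appears in the log."""
    ip = resolve_ip(hostname)
    with open(log_path, encoding="utf-8", errors="replace") as log:
        hits = count_hits(log, ip)
    print(f"{hostname} resolves to {ip}: {hits} request(s) in {log_path}")
    return ip, hits
```

Calling report("4comedy.com", "access.log") (a hypothetical log path) would print the resolved address and the number of matching requests. A scraper doesn’t have to fetch your feed from the same address its website resolves to, which is why checking the logs still matters.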

Now that I had the infringer’s IP, I added this code to my .htaccess file:

Deny from 74.53.110.146

Finally, I FTPed the revised .htaccess file and, like magic, 4Comedy.com’s content thievery came to a halt: MadKane.com has been freed from the slings and arrows of 4Comedy.com’s feed scraping infringements.
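If more scrapers turn up, the Deny lines could be generated rather than typed by hand. What follows is only a rough Python sketch. It uses the standard ipaddress module to catch typos before they reach .htaccess, and it writes the same “Deny from” syntax shown above. The function name deny_lines is invented for illustration, and the /24 range in the usage note is just an example of Robert’s “whole range” idea, not a range I’ve verified belongs to 4Comedy.com.

```python
import ipaddress


def deny_lines(entries):
    """Turn IP addresses or CIDR ranges into 'Deny from' lines.

    Raises ValueError if an entry is not a valid address or network.
    """
    lines = []
    for entry in entries:
        entry = entry.strip()
        if "/" in entry:
            # strict=False normalizes "74.53.110.146/24" to "74.53.110.0/24"
            # instead of rejecting it for having host bits set.
            target = ipaddress.ip_network(entry, strict=False)
        else:
            target = ipaddress.ip_address(entry)
        lines.append(f"Deny from {target}")
    return lines
```

For example, deny_lines(["74.53.110.146", "74.53.110.0/24"]) returns the two lines “Deny from 74.53.110.146” and “Deny from 74.53.110.0/24”, ready to paste into .htaccess. One caution: Apache 2.4 and later handle access control with Require directives, and “Deny from” only works there if mod_access_compat is enabled, so check which version your host runs.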

Of course, that low-life feed scraper is still taking material from other sites like Comedy Central. Hey Comedy Central!  Try this. You’ll like it.

UPDATE WITH ADDITIONAL RESOURCES: As I learn about additional good resources on this topic, I’ll be adding them here.  Feel free to make suggestions via a comment to this post.

a: What Do You Do When Someone Steals Your Content is a must-read.

b: Dnsstuff.com is a source of many fine tools, including a “DNS Lookup” tool — helpful in ascertaining IP addresses.

c: Plagiarism Today is an excellent source of information about plagiarism, content theft, and copyright issues online.

UPDATE 2: 4Comedy.com seems to have disappeared.  I can only hope it stays that way. 

UPDATE 3: 4Comedy.com is now resolving to a different domain — domainnamesbusiness.info/4comedy/index.php. But the IP is the same, so my blog should still be protected from its feed scraping.


23 Responses to “Victory In My Battle Against Feed Scraping Content Thief 4Comedy.com (Updated)”

  1. Lorelle says:

    Good on you to put a stop to this. Too many let it slide thinking there is nothing you can do. I also recommend What Do You Do When Someone Steals Your Content to add to your list of resources. Good work.

  2. markbnj says:

    Great Job Mad!

    glad it worked so simply.

    And I’m almost amazed someone like you knows what an FTP upload is..

    *hey I know because I’ve been in technology for 30 years!*

  3. bvllets says:

    Thanks for the help! You rock :)

  4. madkane says:

    Lorelle, thanks for your kind words and your resource suggestion. I’ve updated my post with a link to that excellent article and I welcome more resource suggestions.

    Thanks markbnj. I can hardly believe that I know that stuff myself because I’m a recovering technophobe. But like they say, necessity is the mother of … uh … prevention.

    Thanks bvllets. I was happy to help and am glad that your content has now been blocked from the evil 4Comedy.com.

  5. steve says:

    Am I crazy? This does not seem like stealing at all.. It’s RSS..
    RSS = Really Simple SYNDICATION….
    Would your problem not be solved by not posting your full story to RSS like most sites do. Only provide a description. Problem solved and you get lots of links to your articles.

  6. Robert Wetzlmayr says:

    Mad, glad you found my comment useful.

    I’ve employed this simple “Deny from xxx” .htaccess trick with great results for all of the most annoying scrapers at my site, and it still is my favourite tool, as I don’t want to invest piles of labour just to get rid of those s*ckers.

    There’s a bunch of web-based tools dedicated to resolving a certain host’s IP address, too, just in case ping requests are not honoured by the offending party. I use DNSstuff’s “DNS Lookup” tool.

  7. madkane says:

    Steve says:

    “Am I crazy? This does not seem like stealing at all.. It’s RSS..
    RSS = Really Simple SYNDICATION…. Would your problem not be solved by not posting your full story to RSS like most sites do. Only provide a description. Problem solved and you get lots of links to your articles.”

    Steve, using RSS feeds (a necessity these days in order to have a successful blog) grants people permission to read your content through a reader. It does not grant republication permission, let alone permit other blogs to replicate the entire contents of your blog.

    Moreover, in my case most of my posts are very short humor pieces that don’t lend themselves to summaries or excerpts.

    Also, in my opinion, links from spammy sites like that have minimal value. And the duplicate content problems such sites cause can create search engine results problems, as well.

  8. madkane says:

    Robert said:
    “Mad, glad you found my comment useful. I’ve employed this simple “Deny from xxx” .htaccess trick with great results for all of the most annoying scrapers at my site, and it still is my favourite tool, as I don’t want to invest piles of labour just to get rid of those s*ckers.”

    Thanks so much, Robert. Your trick’s great and a lot easier than other suggestions I ran across. Fast and easy — I love it!

    And thanks for the DNS info resource, which I’ve added to the additional resource update of my post.

  9. Congratulations on your victory. Rest easy though that gathering the information and sending the next DMCA notice is much easier. Once you have the template in place, the process speeds up. It now takes me less than 5 minutes to do each one.

    What you described is pretty much par for the course with Adsense. They seem to be fairly on the ball, especially when there is clear cut spamming, but it does take some time.

    Hopefully they’ll stay down! Let me know if they come back as we can next file DMCA notices with their host.

  10. Paul says:

    Thank you for posting your solution to the problem. I am new at blogging and now know what to do if I have the situation you described. Glad you got it solved.

  11. Keith says:

    If you visit http://4comedy.com/ now [2007-06-12, 15:39 GMT] you get a list of files. It’s a typical WordPress install. All the directories I tried are open to listing. They really ought to fix that and deny directory listing. Until they do, happy fishing.

  12. quixote says:

    madkane, thank you, thank you, thank you. I’ve had (am having?) the same thing happen. I too got the asinine “here, spend your life assembling paperwork for us to throw out” bullshit.

    Unlike you, who found something intelligent to do about it, I just got depressed. I’ve been trying to avoid finding out whether the creeps stealing my content are still doing it or not.

    There’s one aspect of it all that doesn’t matter to anyone but me, but that doesn’t stop it from depressing me even more. I’m sure these jerks are using my content purely as coherent filler to prevent their linkfarm from being easily blocked. Real sentences work better than word salad. So, here’s all my carefully thought out and brilliant social commentary and it’s not even being stolen because it’s so good. It’s stolen purely for stuffing. Adding insult to injury!

    Well, now you’ve given me something effective to do, and I can’t tell you how much better I feel. As I said, thank you.

  13. Batocchio says:

    Cool! Congratulations, and thanks for the info!

  14. madkane says:

    Jonathan, thanks so much for the info and offer. I’ll keep it all in mind … while I’m keeping my fingers crossed that 4Comedy.com remains out of commission. In the meantime, I’ve added your site to the post’s resource list.

    Paul, you’re welcome, and thanks.

    Keith, thanks. Yes it’s very odd that they’ve left all their data hanging out there.

    Quixote, you’re very welcome. I’m so sorry to hear you’ve had similar problems and wish you luck solving them. I hope what I did works for you. Please let me know how you make out.

    Thanks Batocchio and you’re welcome.

  15. quixote says:

    My old blog used to be on blogspot, and that was the one being snaffled. I complained to the site’s ISP, Google, the works. The ISP at least answered messages, for what that was worth. Google was too busy even for that. I couldn’t have done an htaccess file. (When I saw that solution I had a big “DOH!” moment. Why didn’t I think of that? I should’ve known.)

    However, I was moving my blog to WordPress on my own site anyway when all this flared up because I didn’t care for the way all the Big Players’ terms of service were trending. (I.e. you own your work, but we can use it whenever we like for whatever we like.)

    So all I could think to do with the old stuff on blogspot was to replace everything with a message that the content was being stolen, and then stick my head in the sand. (It works amazingly well. I can see why the Shrub does it all the time.)

    As for my WordPress blog, after reading this, I raced right over and installed Antileech. I also felt optimistic enough to check on the scrape feeders … and, lo and behold, they’re off the web. Error 404, site not found. Woot!

  16. JayneM says:

    That’s great to know!

    I had a site that the entire thing was being stolen by a scraper. It was a ratings and reviews site with lots of images, so I was able to go into my control panel and block any off-site website from using images from my domain.

    It didn’t stop them stealing content, but it made their pages look like crap because each one had 50+ broken image links all over it. LOL

    Wow… came here because I’d searched for humor blogs (I have one too) and found some really useful information! Thanks!

  17. madkane says:

    Quixote, thanks for sharing your saga. So glad it had a happy ending!

    Jayne, it’s always nice to meet another humor blogger. So glad you found my info helpful. And thanks for sharing your image solution. :)

  18. VoIP Lowdown » Carnival of Emerging Technologies #4 says:

    […] Madeleine Begun Kane presents Victory In My Battle Against Feed Scraping Content Thief 4Comedy.com posted at Mad Kane’s Humor Blog. Madkane emerged victorious in her battle against one of the RSS feed scraping content thieves. The issue is technical. A must read for everyone. […]

  19. Kai's personal space says:

    The Blogger’s Mardi-gras…

    Welcome to the July 2, 2007 edition of bloggers mardi gras. We have a wonderful mixed bag this month and lots of great, and interesting reading for y’all to read. Over the coming months we’re going to be going on a small blog tour of…

  20. […] Madeleine Begun Kane of Mad Kane’s Humor Blog presents Victory in My Battle Against Feed Scraping Content Thief 4Comedy.com. I recommend reading this. Feed scraping blogs are becoming increasingly common, and I believe it’s safe to say that most bloggers don’t like them. […]

  21. gary viray says:

    I used this trick you mentioned here and it worked like a charm!

    Thank you very much.

    Now, the annoying s*ckers are gone and dead. Good for them.

  22. YellowSEO says:

    I also always add links back to the original blog post and mention the website’s name in the content. So even if the scraper ends up copying everything and getting indexed, it’s still building links to my website till I get the scraping site removed.

  23. barackoli says:

    cool….great info. thanks for sharing