Bots I have known

I administer a number of websites. Some of them occasionally get a high volume of traffic from people, but more and more of the traffic I see comes from web crawlers. Those crawlers can be damn annoying, because:

  • They don’t keep cookies. That’s right: even though they are browsing, they ignore cookies, so any attempt I make to identify a user with a cookie fails on these bots.
  • They whack my sites from multiple IP addresses, meaning that I can’t identify the bots by IP. Google is particularly bad for this one.

So far, I’ve seen (as identified by their bot homes):

http://www.turnitin.com/robot/crawlerinfo.html

http://www.almaden.ibm.com/cs/crawler

http://www.google.com/bot.htm

http://sp.ask.com/docs/about/tech_crawling.html

http://www.dnsgroup.com/

http://help.yahoo.com/help/us/ysearch/slurp

http://www.picsearch.com/bot.html

Update: The following bots have gotten in on the action:
LinkWalker (No URL provided)

http://search.msn.com/msnbot.htm
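
Since cookies and IP addresses don't help, the bot-home URL each crawler advertises is the one handle left. Here is a minimal sketch of flagging a request by that string. It assumes the URL (or the name, for LinkWalker) shows up in the User-Agent header, which is where these usually appear. The names BOT_MARKERS and is_known_bot are made up for illustration, and a User-Agent can be spoofed, so treat a match as a hint rather than proof.

```python
# Substrings taken from the bot homes listed above, plus LinkWalker,
# which provided no URL. Assumption: each crawler puts this string in
# its User-Agent header.
BOT_MARKERS = (
    "turnitin.com/robot/crawlerinfo.html",
    "almaden.ibm.com/cs/crawler",
    "google.com/bot.htm",
    "sp.ask.com/docs/about/tech_crawling.html",
    "dnsgroup.com",
    "help.yahoo.com/help/us/ysearch/slurp",
    "picsearch.com/bot.html",
    "search.msn.com/msnbot.htm",
    "linkwalker",
)


def is_known_bot(user_agent):
    """Return True if the User-Agent contains one of the known markers."""
    # Treat a missing header as an empty string and compare case-insensitively.
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)
```

Calling is_known_bot() with the User-Agent value from each request (or each access-log line) would let the cookie logic skip anything it flags, no matter which IP address the request came from.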
