Bots I have known
I administer a number of websites. Some of those sites occasionally get a high volume of web traffic from people, but I’m noticing that I’m getting more and more traffic from web crawlers. Those crawlers can be damn annoying, because:
- They don’t keep cookies. That’s right, even though they are browsing, they ignore cookies, which means that any attempt I make to identify a user with a cookie fails on these bots.
- They whack my sites from multiple IP addresses, meaning that I can’t identify the bots by IP. Google is particularly bad for this one.
So far, I’ve seen the following bots (identified here by their bot home pages):
http://www.turnitin.com/robot/crawlerinfo.html
http://www.almaden.ibm.com/cs/crawler
http://www.google.com/bot.htm
http://sp.ask.com/docs/about/tech_crawling.html
http://www.dnsgroup.com/
http://help.yahoo.com/help/us/ysearch/slurp
http://www.picsearch.com/bot.html
Update: The following bots have since gotten in on the action:
LinkWalker (No URL provided)
http://search.msn.com/msnbot.htm
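
Since cookies and IP addresses don’t help, one thing that can work is matching on the User-Agent header, because many crawlers put their bot home URL in it. Below is a minimal Python sketch of that idea, not something I’m claiming is the right way to do it. The function names (bot_home, is_known_bot) and the sample User-Agent string in the usage note are made up for illustration, and the marker list is just the bot homes above cut down to substrings. Keep in mind that anyone can fake a User-Agent, so this only catches bots that identify themselves honestly.

```python
import re

# Substrings derived from the bot home URLs listed in this post, plus the
# LinkWalker name (it gave no URL). Lowercased for case-insensitive matching.
KNOWN_BOT_MARKERS = [
    "turnitin.com/robot",
    "almaden.ibm.com/cs/crawler",
    "google.com/bot.htm",
    "ask.com/docs/about/tech_crawling",
    "dnsgroup.com",
    "help.yahoo.com/help/us/ysearch/slurp",
    "picsearch.com/bot",
    "search.msn.com/msnbot",
    "linkwalker",
]

# Matches the first http(s) URL, stopping at whitespace, ';' or ')'.
URL_PATTERN = re.compile(r"https?://[^\s;)]+")


def bot_home(user_agent):
    """Return the first URL embedded in a User-Agent string, or None."""
    match = URL_PATTERN.search(user_agent or "")
    return match.group(0) if match else None


def is_known_bot(user_agent):
    """Return True if the User-Agent contains one of the known markers."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in KNOWN_BOT_MARKERS)
```

For example, calling is_known_bot on a header such as "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" returns True, and bot_home pulls out the URL so that unfamiliar crawlers can be logged and added to the list later.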
