From another perspective, here's my two pennies' worth:
A spider, also known as a robot (bot) or a crawler, is actually just a program that follows, or "crawls", links throughout the Internet, grabbing content from sites and adding it to search engine indexes. It was designed to be a genuinely useful tool for any website, particularly ones whose content is constantly changing or being updated (like forums); it's a free way of "spreading the word". But like everything these days it has been exploited, so there is a downside... though only a small one if you know how to handle them. The list below breaks down all internet traffic: only 7% of it is generally regarded as bad (a bit like the humans that use it, I suppose).
49% real people traffic = mostly v important lol
51% non-human – bot/spider traffic = mostly v important
The breakdown of that 51% is:
20% site monitoring = good
5% scrapers (web harvesting / web data extraction) = good and bad
19% spy tools = good and bad
5% attempted hacking = bad
2% spammers = bad
Spiders follow links from one page to another and from one site to another. That is the primary reason why links to your site (inbound links) are so important: links to your website from other websites give the search engine spider more "food" to chew on, and the more links to your site it finds, the more often it will stop by and visit. Spiders are also used to check that a page is still up, that its content is still on the same topic, and so on. Google in particular relies on its spiders to build its vast index of listings, so an older site with a larger database and more inbound links distributed across the web will usually have more spiders/bots crawling it at any one time.
For UK-based sites, probably the most unnecessary spider to have is the Baidu spider, the bot of Baidu, the Chinese search engine (China's equivalent of Google), though it still has its uses, as all the spiders/bots are constantly gathering and sharing information. The most important are Googlebot, MSNbot, and Yahoo Slurp.
As Leighton says, it's up to the webmaster whether you take advantage of the bots/spiders. You can choose to turn most of them off (the official ones have to comply): the most "reputable" spiders will obey the directives in your robots.txt file. This is a plain text file placed in your website's root directory that tells spiders what they may and may not index. You can also instruct spiders not to follow one specific link by adding a rel="nofollow" attribute to that link (or a robots meta tag to the whole page); this reduces the number of outgoing links the bots count and helps you maintain your PageRank.
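For example, a bare-bones robots.txt might look something like this (the blocked directories are just made-up examples; Baiduspider is Baidu's actual user-agent name):

    # Block Baidu's spider completely
    User-agent: Baiduspider
    Disallow: /

    # All other spiders: keep out of these (hypothetical) directories, index the rest
    User-agent: *
    Disallow: /admin/
    Disallow: /private/

No Disallow line at all means a bot may crawl everything; "Disallow: /" shuts it out of the whole site.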
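And for the per-link case, the standard way is the rel="nofollow" attribute (the URL here is just a placeholder):

    <a href="http://example.com/some-page" rel="nofollow">Example link</a>

Or, to tell spiders not to follow any of the links on a page, put this in the page's <head>:

    <meta name="robots" content="nofollow">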
So you make your decision and take your chances.
Don't mean to stick my nose in, just trying to elaborate on Leighton's explanation...