Guidelines on Securing Public Web Servers
A cookie is a small piece of information that may be written to the user's hard drive when he
or she visits a Web site. The intent of cookies is to allow servers to recognize a specific
browser (user). In essence, they add state to the stateless HTTP protocol. Unfortunately,
cookies are usually sent in the clear and are stored in the clear on the user's host, and so are
vulnerable to compromise. For example, there are known vulnerabilities in certain versions of
Internet Explorer that allow a malicious Web site to remotely collect all of a visitor's cookies
without the visitor's knowledge. Therefore, cookies should never contain data that can be used
directly by an attacker (e.g., username, password).
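One way to follow this advice is to place only a random, opaque identifier in the cookie and keep the
associated user data on the server. The following is a minimal sketch under stated assumptions, not a
definitive implementation: the in-memory SESSIONS store, the issue_session_cookie function, and the
cookie name session_id are hypothetical names chosen for illustration. The Secure and HttpOnly
attributes are an additional precaution not discussed above.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store mapping opaque session IDs to user data.
# A real deployment would use a database or cache with expiry.
SESSIONS = {}


def issue_session_cookie(username):
    """Return a Set-Cookie header value that carries only a random token."""
    session_id = secrets.token_urlsafe(32)         # unguessable, holds no user data
    SESSIONS[session_id] = {"username": username}  # sensitive data stays on the server

    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["secure"] = True    # ask the browser to send it only over HTTPS
    cookie["session_id"]["httponly"] = True  # keep it out of reach of page scripts
    cookie["session_id"]["path"] = "/"
    return cookie["session_id"].OutputString()


header = issue_session_cookie("alice")
```

If such a cookie is stolen, the attacker obtains a token the server can invalidate rather than a
username or password that can be reused directly.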
5.2.4 Controlling the Impact of Web Bots on Web Servers
Web bots (also known as agents or spiders) are software applications used to collect, analyze,
and index Web content. Web bots are used by numerous organizations for many purposes. Some
examples are as follows:
- Scooter, Slurp, and Googlebot slowly and carefully analyze, index, and record Web
  sites for Web search engines such as AltaVista and Google.
- ArchitextSpider gathers Internet statistics.
- Hyperlink validators are used by Webmasters to automatically validate the
  hyperlinks on their Web site.
- EmailSiphon and Cherry Picker are bots specifically designed to crawl Web sites for
  electronic mail (e-mail) addresses to add to unsolicited advertising e-mail (spam)
  lists. These are common examples of bots that may have a negative impact on a
  Web site or its users.
Unfortunately, bots can present a challenge to Webmasters and their servers:
- Web servers often contain directories that do not need to be indexed.
- Organizations might not want part of their site appearing in search engines.
- Web servers often contain temporary pages that should not be indexed.
- Organizations operating the Web server are paying for bandwidth and want to exclude
  robots and spiders that do not benefit their goals.
- Bots are not always well written or well intentioned and can hit a Web site with
  extremely rapid requests, causing reduced service or an outright denial of service
  (DoS) for legitimate users.
- Bots may uncover information that the Webmaster would prefer to keep secret
  or at least unadvertised (e.g., e-mail addresses).
Fortunately, there is a way for Web administrators or the Webmaster to influence the behavior
of most bots on their Web site. A series of agreements called the Robots Exclusion Protocol
(REP) has been created. Although REP is not an official Internet standard, it is supported by
most well-written and well-intentioned bots, including those used by most major search
engines.
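As an illustration of how a compliant bot interprets these agreements, the sketch below parses a small
robots.txt file with Python's standard urllib.robotparser module. The file contents and the allowed
helper are assumptions made for this example, not a recommended policy. The rules are advisory only:
a poorly behaved bot such as an e-mail harvester may simply ignore them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot entirely and keep
# every other bot away from temporary and private directories.
ROBOTS_TXT = """\
User-agent: EmailSiphon
Disallow: /

User-agent: *
Disallow: /tmp/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if a compliant bot named `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


print(allowed("Googlebot", "/index.html"))      # permitted by the wildcard entry
print(allowed("Googlebot", "/tmp/draft.html"))  # blocked by Disallow: /tmp/
print(allowed("EmailSiphon", "/index.html"))    # blocked by Disallow: /
```

Because compliance is voluntary, such a file should be paired with server-side controls for any
content that must actually remain inaccessible.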