The Robots Exclusion Protocol, or robots.txt protocol, is a convention for keeping cooperating web spiders and other web robots from accessing all or part of a website that is otherwise publicly viewable. Robots are often used by search engines to categorize and archive websites, or by webmasters to proofread source code.

A robots.txt file on a website functions as a request that specified robots ignore specified files or directories in their search. The reason might be, for example, a preference for privacy from search engine results, a belief that the content of the selected directories might be misleading or irrelevant to the categorization of the site as a whole, or a desire that an application operate only on certain data.

The protocol, however, is purely advisory. It relies on the cooperation of the web robot, so marking an area of a site out of bounds with robots.txt does not guarantee privacy. Some website administrators have tried to use the robots file to make private parts of a website invisible to the rest of the world, but the file is necessarily publicly available, and its content is easily checked by anyone with a web browser.

Here are a few examples.

This example allows all robots to visit all files, because the wildcard "*" specifies all robots and the Disallow is blank:

    User-agent: *
    Disallow:

This example keeps all robots out, because the Disallow points to the root directory:

    User-agent: *
    Disallow: /

This example tells all crawlers not to enter four directories of a website:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: /tmp/
    Disallow: /private/

This example tells a specific crawler not to enter one specific directory:

    User-agent: GoogleBot
    Disallow: /private/

This example tells all crawlers not to enter one specific file:

    User-agent: *
    Disallow: /directory/file.html

Note that all other files in the specified directory will be processed.
This example demonstrates how comments can be used. Everything after the number sign (#) is ignored by the robot:

    # Comments appear after the "#" symbol at the start of a line, or after a directive
    User-agent: * # match all bots
    Disallow: / # keep them out
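As a minimal sketch of how a cooperating robot might interpret these files, the following Python uses the standard library's urllib.robotparser module to evaluate some of the example rule sets above. The helper name is_allowed, the bot names AnyBot and OtherBot, and the sample paths are assumptions made for illustration. Real crawlers may apply their own matching rules, so treat this as one possible reading rather than a definitive one.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path.

    Hypothetical helper for illustration: it parses the text directly
    instead of downloading robots.txt from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Rule sets copied from the examples in this article.
FOUR_DIRECTORIES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""

ONE_CRAWLER = """\
User-agent: GoogleBot
Disallow: /private/
"""

WITH_COMMENTS = """\
# Comments appear after the "#" symbol at the start of a line, or after a directive
User-agent: * # match all bots
Disallow: / # keep them out
"""

if __name__ == "__main__":
    # Bot names and paths below are made up for the demonstration.
    checks = [
        (FOUR_DIRECTORIES, "AnyBot", "/private/notes.html"),
        (FOUR_DIRECTORIES, "AnyBot", "/index.html"),
        (ONE_CRAWLER, "GoogleBot", "/private/notes.html"),
        (ONE_CRAWLER, "OtherBot", "/private/notes.html"),
        (WITH_COMMENTS, "AnyBot", "/index.html"),
    ]
    for rules, agent, path in checks:
        verdict = "allowed" if is_allowed(rules, agent, path) else "disallowed"
        print(f"{agent} -> {path}: {verdict}")
```

A check like this only runs inside a robot that chooses to perform it. A robot that ignores robots.txt never consults the rules at all, which is why the file cannot be relied on to keep content private.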
Article Source: http://www.seoarticleexchange.com
Contributed by the Aquo Marketing Team.