Easily one of the most overlooked tools in any SEO arsenal, the robots.txt file is not so much a road map, as a sitemap is, but more the footnotes of your website.
“What is robots.txt?” you ask?
The robots.txt file is a text file that instructs search engine robots on how to crawl pages on a website. It is part of the robots exclusion protocol (REP), a group of rules that regulate how robots crawl, access, and index content. The REP also includes meta robots tags and directives that tell search engines how to treat links, such as “follow” or “nofollow”.
All robots.txt rules result in one of the following three outcomes:
- Full allow: All content may be crawled.
- Full disallow: No content may be crawled.
- Conditional allow: The directives in the robots.txt file determine which content may be crawled.
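As a sketch, each of the three outcomes maps to a standalone robots.txt file along these lines (the `/admin/` and `/tmp/` paths and the sitemap URL are illustrative, not required names):

```
# Full allow: every robot may crawl everything
User-agent: *
Disallow:

# Full disallow: no robot may crawl anything
User-agent: *
Disallow: /

# Conditional allow: block only certain paths
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Sitemap: https://example.com/sitemap.xml
```

Note that these are three separate example files shown together; a real robots.txt contains only one such set of rules per user agent.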
General must-knows about robots.txt:
- In order to be found, the robots.txt file must be placed in a website’s top-level directory.
- Robots.txt is case-sensitive: the file must be named “robots.txt” (not Robots.txt, robots.TXT, etc.).
- It’s a best practice to indicate the location of any sitemap associated with the domain at the bottom of the robots.txt file.
- Improper use of the robots.txt file can hurt your rankings: a misconfigured robots.txt file can block search engines from crawling your pages.
- The robots.txt file controls how search engine spiders see and interact with your web pages; however, some robots may choose to ignore it.
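To see how a well-behaved crawler interprets these rules, here is a minimal sketch using Python’s standard-library `urllib.robotparser` (the `/private/` path and example.com URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# A conditional-allow robots.txt: block only the /private/ directory.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler checks each URL against the rules before fetching.
print(parser.can_fetch("*", "https://example.com/index.html"))  # allowed
print(parser.can_fetch("*", "https://example.com/private/page"))  # blocked
```

This is also a handy way to sanity-check your own robots.txt before deploying it, since a single misplaced `Disallow: /` can shut crawlers out of the whole site.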