What are Robots?
Robots directives tell crawlers what they should and should not crawl and index on your site. There are three ways to deliver them, and you should generally use only one of these methods for any given page:
- Robots.txt file
- Robots meta tag
- X-Robots-Tag HTTP header
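Each of the three takes a different form. As a quick orientation (the paths and values below are purely illustrative):

```
# robots.txt — a plain-text file at the site root
User-agent: *
Disallow: /private/

<!-- robots meta tag — placed in a page's <head> -->
<meta name="robots" content="noindex, follow">

# X-Robots-Tag — an HTTP header sent in the server's response
X-Robots-Tag: noindex
```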
The process crawlers take when they visit your website is as follows:
- Before the crawler accesses any pages on your site, it looks for a robots.txt file at the site root
- If it finds a robots.txt file, it crawls (or skips) pages according to those directives
- If a page is not blocked by robots.txt (or there is no robots.txt file), the crawler fetches the page and looks for a robots meta tag or X-Robots-Tag header to tell it whether to index the page and follow its links to other pages
- If it finds neither a meta tag nor an X-Robots-Tag header, it indexes the page and follows its links. This is the default behavior.
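The decision order above can be sketched in Python. This is a hypothetical illustration of the logic, not a real crawler API; the rule representation and function name are invented for the example:

```python
# Illustrative sketch of the crawl/index decision order described above.
# robots_txt_rules: dict of robots.txt directives, or None if no file exists.
# page_meta: contents of a robots meta tag / X-Robots-Tag header, or None.

def crawl_decision(robots_txt_rules, page_meta, path):
    # 1. robots.txt is consulted before any page is fetched.
    if robots_txt_rules is not None and path in robots_txt_rules.get("disallow", []):
        return "do not crawl"
    # 2. The page is fetched; a robots meta tag or X-Robots-Tag header
    #    then governs indexing and link-following.
    # 3. If neither is present, "index, follow" is the default.
    directives = page_meta or "index, follow"
    actions = []
    actions.append("do not index" if "noindex" in directives else "index")
    actions.append("do not follow" if "nofollow" in directives else "follow")
    return ", ".join(actions)

print(crawl_decision({"disallow": ["/private/"]}, None, "/private/"))
# → do not crawl
print(crawl_decision(None, "noindex, follow", "/page"))
# → do not index, follow
print(crawl_decision(None, None, "/page"))
# → index, follow
```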
Whereas robots.txt directives give bots suggestions about how to crawl a website’s pages, robots meta directives provide firmer, page-level instructions on how to index and serve a page’s content.
In most cases, a meta robots tag with the parameters “noindex, follow” should be used to restrict indexing instead of a robots.txt disallow: a disallowed page is never crawled, so the crawler can never see a noindex directive on it.
The robots.txt file is used to guide a search engine as to which directories and files it should crawl. It does not stop content from being indexed and listed in search results: a disallowed URL can still appear in results if other pages link to it.
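For example, a minimal robots.txt that steers all crawlers away from one directory (the path is illustrative) looks like this:

```
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
```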
The noindex robots meta tag tells search engines not to include content in search results and, if the content has already been indexed, to drop it entirely. It does not stop search engines from crawling the content.
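For non-HTML files such as PDFs, where no meta tag can be placed, the same noindex instruction can be sent via the X-Robots-Tag HTTP header. A minimal sketch, assuming an Apache server with mod_headers enabled (the file pattern is illustrative):

```
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```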