Robots Implementation Guide for Developers


What are Robots?

Robots directives tell crawlers what they should and should not crawl and index on your site. There are three different ways to deliver them.

You should use only one of these three methods for any given page:

  • robots.txt file
  • Robots meta tag
  • X-Robots-Tag HTTP header
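For reference, the three methods look like this (the domain and paths below are placeholders):

```text
# 1. robots.txt — a plain-text file at the site root (https://example.com/robots.txt)
User-agent: *
Disallow: /private/

# 2. Robots meta tag — placed in a page's <head>
<meta name="robots" content="noindex, follow">

# 3. X-Robots-Tag — an HTTP response header
X-Robots-Tag: noindex, follow
```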

The process crawlers take when they visit your website is as follows:

  • A crawler arrives and, before it accesses any pages on your site, it looks for a robots.txt file.
  • If it finds a robots.txt file, it crawls (or skips) URLs according to those directives.
  • For each page it does crawl, it then looks for a robots meta tag or X-Robots-Tag header to tell it whether or not to index the page and follow the links on it to other pages.
  • If it finds neither a meta tag nor an X-Robots-Tag header, it indexes the page and follows its links. This is the default behaviour.
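The first step above, fetching and obeying robots.txt, can be sketched with Python's standard-library `urllib.robotparser` module. The rules and URLs here are hypothetical:

```python
from urllib import robotparser

# Parse a (hypothetical) robots.txt that blocks the /private/ directory.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A well-behaved crawler checks each URL against the rules before fetching it.
print(rp.can_fetch("*", "https://example.com/private/report"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))       # True
```

In a real crawler you would call `rp.set_url(...)` and `rp.read()` to fetch the live robots.txt instead of passing the lines by hand.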

Whereas robots.txt directives give bots suggestions for how to crawl a website’s pages, robots meta directives provide firmer instructions on how to crawl and index a page’s content.

In most cases, a meta robots tag with the parameters “noindex, follow” should be used to restrict indexing, rather than a robots.txt file Disallow.
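In markup, that recommendation looks like the following (placed inside the page’s `<head>`):

```html
<!-- "noindex, follow" asks search engines not to list this page
     in results, but still to follow the links on it. -->
<meta name="robots" content="noindex, follow">
```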

The robots.txt file is used to guide a search engine as to which directories and files it should crawl. It does not stop content from being indexed and listed in search results: a disallowed URL can still be indexed if other pages link to it.

The noindex robots meta tag tells search engines not to include content in search results and, if the content has already been indexed, to drop it from the index entirely. It does not stop search engines from crawling content; in fact, a crawler must be able to crawl the page in order to see the noindex directive, so a noindexed page should not also be disallowed in robots.txt.
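The X-Robots-Tag header is particularly useful for non-HTML files such as PDFs or images, which have no `<head>` to carry a meta tag. As a sketch, an nginx server block (the location pattern here is an assumption for illustration) could send it like this:

```nginx
# Ask search engines not to index PDF files, while still
# following any links extracted from them.
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex, follow";
}
```

Apache and other servers offer equivalent ways to attach the same header.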


