Robots.txt is a text file that website owners create to tell search engine robots how to crawl pages on their websites. The robots.txt file is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content to users.
In practice, robots.txt files specify whether certain user agents (crawling software) can or cannot crawl parts of a website. These instructions are given by “disallowing” or “allowing” the behavior of particular user agents.
Syntax:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
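For instance, a minimal robots.txt might look like the following sketch, where the directory name is a made-up placeholder: it blocks Google’s crawler from one directory while an empty Disallow line leaves the rest of the site open to every other crawler.

User-agent: Googlebot
Disallow: /private-directory/

User-agent: *
Disallow: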
The robots.txt file is publicly available: to see the robots.txt file of any website, just add “/robots.txt” to the end of the root domain to see that website’s directives. Because anyone can see which pages you do or don’t want crawled, never use the robots.txt file to hide private user information. Each subdomain on a root domain uses its own separate robots.txt file. The filename is case sensitive, which means the file must be named “robots.txt” and not “Robots.txt”, “robots.TXT”, or any other variation. A robots.txt file must be placed in a website’s top-level (root) directory, because that is the only location where crawlers look for it.
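As a quick illustration of how public the file is, the sketch below fetches and prints a site’s robots.txt using only Python’s standard library; the domain is a placeholder and you would substitute any site you want to inspect.

import urllib.request

# Append "/robots.txt" to the root domain to retrieve the file.
url = "https://www.example.com/robots.txt"

with urllib.request.urlopen(url) as response:
    # The file is plain text, so decode it and print it as-is.
    print(response.read().decode("utf-8"))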
Robots.txt files control crawler access to certain areas of your site. While this can be very dangerous if you accidentally disallow Googlebot from crawling your entire site, there are some situations in which a robots.txt file can be very useful.
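That dangerous case is easy to reproduce by accident. The two directives below, shown as a deliberately bad example, tell every crawler to stay away from the entire site:

User-agent: *
Disallow: /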
Some common use cases include the following (an illustrative robots.txt covering several of them appears after this list):
- Preventing duplicate content from appearing in search results.
- Keeping entire sections of a website private (for example, a staging area).
- Keeping internal search results pages from showing up in public search results.
- Specifying the location of sitemap(s).
- Preventing search engines from indexing certain files on your site, such as images or PDFs.
- Specifying a crawl delay to keep servers from being overloaded when crawlers load many pieces of content at once.
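The directives below sketch how several of these use cases might look in a single robots.txt file; every path, delay value, and sitemap URL here is an illustrative placeholder, and Crawl-delay in particular is only honored by some crawlers.

# Keep all crawlers out of internal search results and a private section
User-agent: *
Disallow: /search/
Disallow: /private/

# Keep a hypothetical PDF directory from being crawled
Disallow: /downloads/pdfs/

# Ask crawlers that honor it to wait 10 seconds between requests
Crawl-delay: 10

# Point crawlers to the sitemap
Sitemap: https://www.example.com/sitemap.xml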
Search engines have two main jobs:
- Crawling the web to discover content.
- Indexing that content so it can be served to searchers looking for information.
After arriving at a website, the search crawler looks for a robots.txt file. If it finds one, it reads that file first before continuing through the rest of the site. Because the robots.txt file contains instructions about how the search engine should crawl the website, the information found there directs the crawler’s further actions on that particular site. If the site does not have a robots.txt file, the crawler proceeds to crawl the rest of the site as usual.
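The sketch below mimics this behavior with urllib.robotparser from Python’s standard library: it reads the site’s robots.txt first and only then decides whether a given page may be crawled. The domain, page path, and user-agent string are assumptions chosen for illustration.

from urllib import robotparser

# Hypothetical site, page, and crawler name; substitute real values when experimenting.
site = "https://www.example.com"
page = site + "/blog/some-article"
user_agent = "MyCrawler"

# Step 1: fetch and parse the site's robots.txt before anything else.
parser = robotparser.RobotFileParser()
parser.set_url(site + "/robots.txt")
parser.read()

# Step 2: crawl the page only if the directives allow this user agent.
if parser.can_fetch(user_agent, page):
    print("Allowed: the crawler may fetch", page)
else:
    print("Disallowed: robots.txt blocks", page, "for", user_agent)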