When it comes to the SEO positioning of a website, on-page SEO factors such as the link profile, image optimization or keyword research are most often considered. However, another part of search engine optimization is often neglected, such as sitemaps or even robots.txt files, and it shouldn’t be. We’ll explain why.
The robots.txt file tells search engine robots and algorithms which pages they can access and which should not be indexed.
The robots.txt file is an important part of SEO because it opens or closes the doors to the various search engines’ robots. It is used to improve spider navigation, directing search algorithms to the pages that should be indexed and blocking their access to those that should not be.
The robots.txt file, as the name suggests, is a plain text file that can be created with any Windows or Linux text editor. It follows the standard format of the Robots Exclusion Protocol, a series of commands that tell robots and algorithms whether or not they may access a page on a website. Once created, this file must be saved in the website’s root folder.
Since this file is stored in the root folder, it is very easy to access a website’s robots.txt file: simply enter the website’s address and add “/robots.txt” at the end. That way, you can see how these files are structured and take ideas from top-ranking pages to create your own robots.txt file.
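Because the file always sits at the root of the host, its address can be derived from any page URL. A minimal Python sketch using the standard library’s `urljoin` (the domain is a hypothetical example):

```python
from urllib.parse import urljoin

def robots_txt_url(page_url: str) -> str:
    # robots.txt always lives at the root of the host,
    # so replace whatever path the page has with /robots.txt
    return urljoin(page_url, "/robots.txt")

print(robots_txt_url("https://example.com/blog/some-post"))
# -> https://example.com/robots.txt
```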
The main commands of robots.txt
These are the main commands that can be added to the robots.txt file:
-User-agent: This command is used to give specific instructions to each search robot. You can check the name of each robot in the web robots database. For example, Google’s robot is called “Googlebot”. To address Google, the command is: “User-agent: Googlebot”. If you want to enter a general command for all bots, the command is: “User-agent: *”.
-Allow: The Allow command determines the pages to be indexed. Although all pages on a website are crawled automatically (unless otherwise specified), the Allow command indicates that a page inside a blocked folder should still be accessible. For example:
“Disallow: /library/”
“Allow: /library/horror-books/”
-Disallow: This command specifies the pages of the website that should not be included in search results. For example, to restrict bot access to a website’s library page, the correct command would be: “Disallow: /library/”.
-Sitemap: The Sitemap command tells robots where the sitemaps of a website are located. This command has fallen out of use since Google Search Console introduced its direct sitemap submission tool.
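The commands above can be combined in a single file and checked programmatically. A minimal sketch using Python’s standard `urllib.robotparser` (the rules and URLs are hypothetical; note that Python’s parser applies rules in file order, so the more specific Allow line is listed first):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /library/ for all bots,
# but keep /library/horror-books/ accessible
rules = """\
User-agent: *
Allow: /library/horror-books/
Disallow: /library/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/library/secret.html"))           # False
print(rp.can_fetch("*", "https://example.com/library/horror-books/it.html"))  # True
print(rp.can_fetch("*", "https://example.com/index.html"))                    # True
```

This is the same logic a crawler applies before requesting a page: find the group matching its user agent, then test the URL path against the Allow/Disallow rules.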
What is the robots.txt file for?
The robots.txt file can have two main functions:
-Block search engines: The most common use of robots.txt is to keep certain search engines from accessing parts of the website. This can be useful to save server resources, so that crawler requests do not overwhelm the server.
-Hide images and other elements from search engines: Sites that create their own images, such as infographics, or publish their own photos and want to keep them out of Google Images can use robots.txt to block Google from accessing them. The same applies to any other elements the site owner does not want crawled.
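Blocking images works by addressing Google’s image crawler by name in its own user-agent group. A sketch with Python’s `urllib.robotparser` (the paths are hypothetical; Python’s simple parser matches user-agent names literally, whereas Google’s real crawlers have their own fallback rules):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Google's image crawler out of /images/
rules = """\
User-agent: Googlebot-Image
Disallow: /images/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The image bot is blocked; crawlers not named in the file are unaffected
print(rp.can_fetch("Googlebot-Image", "https://example.com/images/photo.jpg"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/images/photo.jpg"))        # True
```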