
Setting up robots.txt for a site – how to properly configure the robots file


  • File size – no more than 32 KB.
  • The name must be exactly “robots.txt”, all lowercase. Other variants, such as capital letters, are not accepted by bots.
  • The content is Latin characters only. All entries must use the Latin alphabet, including the website address: if it is Cyrillic, it must be converted to Punycode. For example, the address “okna.rf” looks like this after conversion: “xn--80atjc.xn--p1ai”, and that is the form to use in the directives.
  • The exception to the previous rule is webmaster comments. They can be written in any language, since the specialist leaves them for themselves and their colleagues, not for the search bots. Comments start with the “#” symbol, and robots ignore anything after the “#”. Therefore, make sure that important instructions do not end up there by accident.
  • Number of robots.txt files – one file for the entire website.
  • Location – the root directory. Subdomains follow the same rule: each subdomain gets its own robots.txt placed in its root.
  • The file is available at https://example.com/robots.txt (replace https://example.com with the address of your website).
  • The link to robots.txt should return the server response code 200 OK.

Read detailed recommendations for robots.txt from Yandex here and from Google here.

    Next, let’s look at how you can give recommendations to bots.

    How to write robots.txt correctly

The file consists of a list of commands (directives), the pages they apply to, and the recipients – the names of the bots the commands are addressed to.

The Clean-param directive is recognized only by Yandex search robots, but otherwise the commands for Google and Yandex bots in 2021 are the same.

Basic directives

User-agent – specifies which bot should react to the commands. After the colon, either a specific bot is named, or all of them are addressed at once with the * symbol.

Example. User-agent: * – all existing robots; User-agent: Googlebot – only the Google bot.

Disallow – prohibits crawling. After the slash, indicate what the ban applies to.

Examples:

    An empty field in “Disallow” means that the entire website can be crawled:
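User-agent: *
Disallow: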

    And this entry prohibits all robots from scanning the entire website:
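User-agent: *
Disallow: /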

    If this is a new site, make sure this directive does not remain in robots.txt after the developers upload the site to the working domain.

This entry allows the Google search bot to crawl the site but prohibits it for all other bots:
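User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /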

There is no need to list permissions separately. Anything you haven’t closed is considered accessible.

In these entries the trailing slash matters; its presence or absence changes the meaning:

    Disallow: /about/ – The entry closes the “About Us” section, available at https://example.com/about/

    Disallow: /about – closes all links beginning with “/about”, including the https://example.com/about/ section, the https://example.com/about/company/ page, and others.

Each ban goes on its own line; you cannot list several rules in one entry. Here is an incorrect entry:
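The section paths here are illustrative:

Disallow: /blog/ /shop/ /wp-admin/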

Write them out separately, each rule on its own line:
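Disallow: /blog/
Disallow: /shop/
Disallow: /wp-admin/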

Allow grants crawling permission; this directive is convenient for specifying exceptions. For example, the entry below prohibits all bots from crawling the entire photo album but makes an exception for one photo:
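The folder and file names here are illustrative:

User-agent: *
Disallow: /album/
Allow: /album/photo-1.jpg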

And here is a separate command for Yandex – Clean-param. The directive is used to eliminate duplicates that can appear because of GET parameters or UTM tags. Clean-param is recognized only by Yandex bots. Instead of it, you can use Disallow, which Google’s bots also understand.

Let’s say the site has the page index.php?page=1, and it can be opened with the following parameters:

    https://example.com/index.php?page=1&sid=2564126ebdec301c607e5df

    https://example.com/index.php?page=1&sid=974017dcd170d6c4a5d76ae

None of these resulting addresses needs to be in the index; it is enough for the common main page to be there. In this case, set Clean-param in robots.txt and specify that links with a “sid” suffix on “/index.php” pages do not need to be indexed:
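Clean-param: sid /index.php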

    If there are multiple parameters, list them separated by an ampersand:

Clean-param: sid&utm&ref /index.php

A rule cannot be longer than 500 characters. Such long lines are rare, but they can happen when many parameters are listed. If an instruction turns out to be long and complex, it can be split into several. You will find examples in the Yandex help.

Sitemap – a link to the sitemap. If there is no sitemap, the directive is not required. The map itself is optional, but if the site is large, it is better to create one and link to it in robots.txt so that search bots can understand the structure more easily.

    Sitemap: https://example.com/sitemap.xml

Let’s also note two important special characters that are used in robots.txt:

* – matches any sequence of characters after it;

$ – marks where the URL must end for the rule to apply.

Example. An entry like this:
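Disallow: /catalog/category1$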

It prevents the robot from crawling site.com/catalog/category1, but does not prevent it from crawling site.com/catalog/category1/product1.

It’s better not to assemble the directives by hand; there are free online services for this. A robots.txt generator tool collects the necessary commands for free: open or close the site to bots, specify the path to the sitemap, restrict crawling of selected sections, and set a crawl delay.

There are other free generators that help you create robots.txt quickly and avoid mistakes. Popular engines have plugins that make assembling the file even easier. We’ll talk about them below.

    How to check if robots.txt is correct

After you create the file and add it to the root directory, check whether bots can see it and whether the entries contain errors. Search engines have their own tools:

  • Find errors in robots.txt – a tool from Yandex. Specify the site and paste the contents of the file into the field.
  • Check availability for search robots – a Google tool. Enter the URL of your robots.txt file.
  • Check that robots.txt is present in the root directory and the site is available for indexing – Website Analysis by PR-CY. The service has 70+ more tests that check SEO, technical parameters, links and more.

The Important Events section shows the dates on which the file was modified.

Correct robots.txt for different CMSs: examples of finished files

The robots.txt file is located in the root folder of the site. To create or edit it, connect to the site via FTP. Some content management systems (for example, Bitrix) let you edit the file right from the admin panel.

Let’s see what options there are for editing robots.txt in popular CMSs.

      WordPress

WP has many free plugins that generate robots.txt. This option is included in general SEO plugins such as Yoast SEO and All in One SEO, but there are also standalone plugins dedicated to creating and editing the file.

      Example robots.txt for a WordPress content project

This is a file variant for blogs and other projects that have no personal account or shopping cart functionality.
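The exact rules depend on the theme and installed plugins, so take this as a typical baseline rather than a ready-made file:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-json/
Disallow: /xmlrpc.php
Disallow: /*?s=
Disallow: /search/
Disallow: *?replytocom=
Allow: /wp-admin/admin-ajax.php
Allow: /wp-content/uploads/
Allow: *.css
Allow: *.js

Sitemap: https://example.com/sitemap.xml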

      Example robots.txt for an online shop on WordPress

A similar file, but with the specifics of an online store built on WooCommerce for WordPress. We close the same sections as in the previous example, plus the shopping cart itself, as well as the separate add-to-cart links and the checkout page where the user places the order.
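A sketch that assumes the default WooCommerce slugs and query parameters (/cart/, /checkout/, /my-account/, add-to-cart) have not been changed:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /xmlrpc.php
Disallow: /*?s=
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /*add-to-cart=*
Disallow: /*?orderby=
Allow: /wp-admin/admin-ajax.php
Allow: /wp-content/uploads/
Allow: *.css
Allow: *.js

Sitemap: https://example.com/sitemap.xml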

1C-Bitrix

      In the “Search Engine Optimization” module of this CMS, starting from version 14.0.0, you can configure robot management through the website administration window. The required section is located in the Marketing > Search Engine Optimization > Robots.txt Settings menu.

      Example robots.txt for a site on Bitrix

A similar set of rules, with additions for a site that has a user’s personal account.
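A typical starting point for Bitrix projects looks roughly like this; adjust the paths to your installation before using it:

User-agent: *
Disallow: /bitrix/
Disallow: /cgi-bin/
Disallow: /auth/
Disallow: /personal/
Disallow: /search/
Disallow: /*?print=
Disallow: /*register=
Disallow: /*forgot_password=
Disallow: /*change_password=
Disallow: /*login=
Disallow: /*logout=
Disallow: /*auth=
Disallow: /*backurl=
Allow: /bitrix/components/
Allow: /bitrix/cache/
Allow: /bitrix/js/
Allow: /bitrix/templates/

Sitemap: https://example.com/sitemap.xml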

      OpenCart

This engine has an official module, Edit robots.txt for OpenCart, for working with the file directly from the admin panel.

      Example robots.txt for a shop on OpenCart

The CMS OpenCart is commonly used as the basis for an online store, so the robots.txt example is tailored to the needs of e-commerce.

User-agent: *
Disallow: /*route=account/
Disallow: /*route=affiliate/
Disallow: /*route=checkout/
Disallow: /*route=product/search
Disallow: /index.php?route=product/product*&manufacturer_id=
Disallow: /admin
Disallow: /catalog
Disallow: /system
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?limit=
Disallow: /*&limit=
Disallow: /*?filter=
Disallow: /*&filter=
Disallow: /*?filter_name=
Disallow: /*&filter_name=
Disallow: /*?filter_sub_category=
Disallow: /*&filter_sub_category=
Disallow: /*?filter_description=
Disallow: /*&filter_description=
Disallow: /*?tracking=
Disallow: /*&tracking=
Disallow: *page=*
Disallow: *search=*
Disallow: /cart/
Disallow: /forgot-password/
Disallow: /login/
Disallow: /compare-products/
Disallow: /add-return/
Disallow: /vouchers/

Sitemap: https://example.com/sitemap.xml

      Joomla

This CMS has no separate extensions for generating robots.txt. During installation, the system automatically creates a file with all the necessary restrictions.

      Example robots.txt for a Joomla site

The file closes off plugins, templates and other system directories.

User-agent: *
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /component/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /*?start=*
Disallow: /xmlrpc/
Allow: *.css
Allow: *.js

Sitemap: https://example.com/sitemap.xml

Search robots treat the instructions in robots.txt as recommendations that they may or may not follow. However, if the file is consistent and there are no inbound links to the closed sections, bots have no reason to ignore the rules. Use our guides and examples to make sure that only the pages users actually need appear in search results.

Before promotion starts, the PromoPult SEO module checks the technical condition of the site, including the robots.txt file, and creates a checklist of internal optimization tasks. These can be handled by the platform’s specialists, the site owner or their employees. PromoPult has a simple user interface, and the most complex technical processes – from collecting semantics to the technical audit – are handled by AI tools, saving time and budget. You can test SEO in PromoPult free for 2 weeks, and you will find successful cases in your niche here.

      Advertising. LLC “Klik.ru”, INN: 7743771327, ERID: 2VtzqwXqNpZ



