Troubleshooting the "Indexed, though blocked by robots.txt" Warning


Troubleshooting the "Indexed, though blocked by robots.txt" warning - in the new version of Google Search Console, this indexing warning often appears, especially on blogs that use the Blogger platform.

If we check the affected URLs, the pages indexed despite being blocked by robots.txt are all Search pages, namely the label search pages and the older-posts navigation pages.

These pages end up indexed even though they are blocked, because many Blogger blogs use a robots.txt like the following:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourdomain.com/sitemap.xml

The robots.txt above disallows crawling of all Search pages, that is, any URL that begins with /search.

However, because these search pages are linked from within the blog through breadcrumbs, menus, label widgets, and the next/previous post navigation, Google still discovers their URLs and can index them without crawling them, which is what triggers the warning.
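
To make the effect of these rules concrete, here is a minimal sketch using Python's standard urllib.robotparser; the domain and example URLs are placeholders for your own blog:

from urllib.robotparser import RobotFileParser

# The default Blogger rules shown above, with a placeholder domain.
old_rules = """
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(old_rules.splitlines())

# Label/search pages are disallowed for crawling...
print(parser.can_fetch("Googlebot", "https://www.yourdomain.com/search/label/News"))  # False
# ...while normal post pages remain crawlable.
print(parser.can_fetch("Googlebot", "https://www.yourdomain.com/2020/01/example-post.html"))  # True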

To resolve this problem, the recommended approach is to allow bots to crawl these pages, and then keep them out of Google's search results with a noindex meta tag (added later in this post).

If you are using a robots.txt like the one above, replace it with the following code.

User-agent: *
Disallow:

Sitemap: https://www.yourdomain.com/sitemap.xml
Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: https://www.yourdomain.com/feeds/posts/default
Sitemap: https://www.yourdomain.com/sitemap-pages.xml

Replace yourdomain.com in the code above with your own blog domain.

If your blog has more than 500 posts, add a new line with the following code:

Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=501&max-results=500

And so on: if you have more than 1,000 posts, add another line as follows:

Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=1001&max-results=500
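
If you prefer not to count these lines by hand, here is a small sketch that prints one sitemap line per 500 posts following the pattern above; the domain and post count are placeholders you must replace:

domain = "https://www.yourdomain.com"  # replace with your blog domain
post_count = 1200                      # replace with your actual number of posts

# Each feed page covers at most 500 posts, so step the start index by 500.
for start in range(1, post_count + 1, 500):
    print("Sitemap: %s/atom.xml?redirect=false&start-index=%d&max-results=500" % (domain, start))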

Then save the following noindex meta tag code in the <head> section of your blog template, so that archive, search, and label pages are not indexed and do not appear on Google's search results page.

<b:if cond='data:view.isArchive'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
<b:if cond='data:blog.searchQuery'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
<b:if cond='data:blog.searchLabel'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
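
After saving the template, you can check that the tag is actually served on a search or label page with a small sketch like the following; the URL is a placeholder, and the script simply fetches the page and prints any robots meta tag it finds:

from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaFinder(HTMLParser):
    # Collects the content attribute of every <meta name="robots"> tag.
    def __init__(self):
        super().__init__()
        self.robots = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots.append(attrs.get("content") or "")

url = "https://www.yourdomain.com/search/label/News"  # replace with a label page on your blog
page = urlopen(url).read().decode("utf-8", errors="replace")

finder = RobotsMetaFinder()
finder.feed(page)
print(finder.robots)  # expect something like ['noindex,noarchive']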

Also make sure you are not using Blogger's archive widget.

Then please follow my two previous posts so that these changes do not cause structured data errors.


After all of the above is done, submit your new robots.txt in Google's robots.txt testing tool so that Google can quickly recognize the updated file.
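
As an extra check, you can also verify the live file programmatically. This sketch uses a placeholder domain and needs Python 3.8+ for site_maps(); it confirms that search pages are now crawlable and that the sitemap entries are picked up:

from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://www.yourdomain.com/robots.txt")
parser.read()  # fetch the robots.txt that is actually deployed

# With the new rules, label/search pages should now be crawlable.
print(parser.can_fetch("Googlebot", "https://www.yourdomain.com/search/label/News"))  # expect True
# The Sitemap: lines added above should be listed here.
print(parser.site_maps())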

Then open Search Console, run validation for the "Indexed, though blocked by robots.txt" warning, and keep monitoring Search Console.

I will update this post later with further results.
