Troubleshooting the "Indexed, though blocked by robots.txt"
Warning - In the new version of Google Search Console, an "Indexed, though
blocked by robots.txt" warning often appears, especially for blogs that use
the Blogger platform.
If we check the URLs flagged by this warning, they are all
search pages: label search pages and the "older posts" navigation pages.
These pages are indexed even though they are blocked by
robots.txt. That is because many Blogger blogs use a robots.txt like the following:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: https://www.yourdomain.com/sitemap.xml
The robots.txt above shows that bots are not allowed to crawl any Search page.
However, because these search pages are linked from the blog
itself (breadcrumbs, menus, label widgets, or next/prev
navigation), bots still discover the URLs and the pages can still be indexed.
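To see the blocking described above for yourself, here is a minimal sketch using Python's standard urllib.robotparser module. The example paths are hypothetical, and the helper name is_crawlable is my own. Python applies the first matching rule while Google applies the most specific one, but for this file both approaches should agree.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, with the placeholder domain from this post.
OLD_ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: https://www.yourdomain.com/sitemap.xml
"""


def is_crawlable(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Hypothetical example paths on a Blogger blog.
for path in ("/search/label/SEO", "/search?max-results=7", "/2020/01/my-post.html"):
    print(path, is_crawlable(OLD_ROBOTS_TXT, "Googlebot", path))
```

The two /search paths should be reported as not crawlable and the post URL as crawlable, which matches what Search Console reports.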
To overcome this problem, it is recommended to let bots
crawl these pages, so that they can read the noindex meta tag added in a later
step and keep the pages out of search results.
If you are using the robots.txt above, please replace it
with the following code.
User-agent: *
Disallow:
Sitemap: https://www.yourdomain.com/sitemap.xml
Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: https://www.yourdomain.com/feeds/posts/default
Sitemap: https://www.yourdomain.com/sitemap-pages.xml
Please replace www.yourdomain.com in the code with your own blog domain.
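As a quick local check of the replacement file, the following sketch parses it with Python's urllib.robotparser and confirms that a search page is now crawlable and that all four Sitemap lines are picked up. It assumes Python 3.8 or newer (for site_maps()), and the label path is a hypothetical example.

```python
from urllib.robotparser import RobotFileParser

# The replacement robots.txt from this post, with the placeholder domain.
NEW_ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: https://www.yourdomain.com/sitemap.xml
Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: https://www.yourdomain.com/feeds/posts/default
Sitemap: https://www.yourdomain.com/sitemap-pages.xml
"""

parser = RobotFileParser()
parser.parse(NEW_ROBOTS_TXT.splitlines())

# An empty "Disallow:" means nothing is blocked for this user agent.
search_allowed = parser.can_fetch("Googlebot", "/search/label/SEO")

# site_maps() returns None when the file has no Sitemap lines.
sitemaps = parser.site_maps() or []

print("search pages crawlable:", search_allowed)
for url in sitemaps:
    print("sitemap:", url)
```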
If your blog has more than 500 posts, add a new line like
the following:
Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=501&max-results=500
And so on: if the blog has more than 1000 posts, add another line as follows:
Sitemap: https://www.yourdomain.com/atom.xml?redirect=false&start-index=1001&max-results=500
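If you would rather not write these lines by hand, the pattern can be sketched in a few lines of Python. The function name atom_sitemap_lines and its parameters are my own; the only assumption is the pattern shown above, one line per block of 500 posts with start-index values 1, 501, 1001, and so on.

```python
from typing import List


def atom_sitemap_lines(domain: str, total_posts: int, page_size: int = 500) -> List[str]:
    """Build one Sitemap line per block of page_size posts."""
    lines = []
    # start-index is 1-based, so the blocks start at 1, 501, 1001, ...
    for start in range(1, max(total_posts, 1) + 1, page_size):
        lines.append(
            f"Sitemap: https://{domain}/atom.xml"
            f"?redirect=false&start-index={start}&max-results={page_size}"
        )
    return lines


# Example: a blog with 1200 posts needs three lines.
for line in atom_sitemap_lines("www.yourdomain.com", 1200):
    print(line)
```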
Then please add the following noindex meta tag code to the <head> section of the blog template, so that archive, search, and label pages are not indexed and do not appear on Google's search results page.
<b:if cond='data:view.isArchive'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
<b:if cond='data:blog.searchQuery'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
<b:if cond='data:blog.searchLabel'>
<meta content='noindex,noarchive' name='robots'/>
</b:if>
Also make sure you do not use the Blogger archive widget.
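To confirm that the noindex tag really ends up in the page, you can inspect the HTML source of a label or search page. Below is a minimal sketch using Python's standard html.parser module. The class and function names are my own, and sample_page is a hypothetical stand-in for the real page source that you would copy from your blog.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # "noindex,noarchive" becomes ["noindex", "noarchive"]
            self.directives.extend(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page source, standing in for a real label page.
sample_page = (
    "<html><head><meta content='noindex,noarchive' name='robots'/>"
    "</head><body></body></html>"
)
print(has_noindex(sample_page))
```

Bots can only read this tag on pages they are allowed to crawl, which is why the Disallow: /search rule has to go first.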
Then please follow my 2 previous posts so as not to cause
structured data errors.
After all of the above is done, please submit your new
robots.txt in the robots.txt testing tool so that Google picks up the new file
quickly.
Then open Search Console, start validation for the "Indexed,
though blocked by robots.txt" warning, and continue to monitor Search Console.
I will update this post later with further results.