Seeing “Indexed, though blocked by robots.txt” Warnings in Google Search Console?
A new client in Great Crosby messaged us about a problem they were seeing on their site. They had a fairly large site with over 2,000 pages, none of which had been receiving traffic. When they checked Google Search Console, they saw the warning “Indexed, though blocked by robots.txt”, along with some supporting information:
The majority of these pages were filter-parameter URLs, and they were mostly:
- Blocked within robots.txt (see the sketch below)
- Tagged with a “noindex” tag on the page
- Canonicalized to the master URL
- NOT included within the sitemap
- Blocked using the GSC URL Parameters tool
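For context, a robots.txt block on filter pages usually looks something like the rule below. This is a minimal sketch that assumes the filter pages use a “?filter=” query parameter; your site’s actual parameter names will differ.

```
# Hypothetical rule blocking filter-parameter URLs for all crawlers.
# Assumes filter pages use a "?filter=" query string.
User-agent: *
Disallow: /*?filter=
```

There is a catch in this setup worth knowing: if robots.txt blocks Googlebot from crawling a URL, Google can never see the “noindex” or canonical tag on that page, so those signals go unread. That combination is exactly how this warning arises: Google discovers the URLs through links, can’t crawl them, and indexes them anyway.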
Scary Warning
It’s a scary issue when you see this in Google Search Console for the first time. You assume you have been hit by a Google penalty and that all your traffic will disappear. But, luckily, it’s usually not that serious, and in most cases it can be fixed pretty easily.
If you see this message in Google Search Console, it means Google has indexed URLs that it is not allowed to crawl: robots.txt blocks the crawler, so Google has discovered the pages through links but cannot read their content. This can be caused by a few different things:
1. Your robots.txt file is blocking Google from crawling pages it has already discovered
2. A “noindex” tag on your pages (which Google can’t see while the pages are blocked)
3. Your pages are canonicalized to the wrong URL
4. Your pages are not included in your sitemap
5. JavaScript conflicts that prevent Google from rendering the pages
6. Conflicting or broken canonical tags
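For reference, the two on-page signals in that list look like this in a page’s `<head>`. This is a generic sketch; the example.com URL is a placeholder:

```html
<!-- Tells crawlers not to index this page (only works if they can crawl it) -->
<meta name="robots" content="noindex">

<!-- Points search engines at the preferred (“master”) version of the page -->
<link rel="canonical" href="https://example.com/category/widgets/">
```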
Try to Find Out When This Happened
If you have just launched the website, it might be as simple as clicking a button to turn off the noindex setting (in WordPress, for example, this is the “Discourage search engines from indexing this site” checkbox under Settings → Reading).
If the warning has only just appeared, we would suggest looking over any recent changes to the site, no matter how small.
Things like:
- Adding plugins
- Updating plugins
- Changing themes
- Changing server (even if it’s just an upgrade)
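After any of these changes, it’s worth verifying that the live robots.txt still allows crawling of your key pages. Below is a small Python sketch using the standard library’s urllib.robotparser; example.com and the test URLs are placeholders, so swap in your own domain and a few pages that matter to you.

```python
# Quick sanity check after a plugin/theme/server change:
# does the live robots.txt still allow Googlebot to crawl key pages?
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()  # fetch and parse the live robots.txt

for url in (
    "https://example.com/",
    "https://example.com/category/widgets/",
    "https://example.com/category/widgets/?filter=red",  # a filter URL
):
    status = "allowed" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status}: {url}")
```

If a page you expect traffic on comes back BLOCKED, the recent change is the first place to look.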
If you are having issues with Google indexing, or you are seeing the “Indexed, though blocked by robots.txt” warning on your own site, feel free to reach out to us and we would be happy to help!