Crawl Errors and How They Are Stopping Your Website from Ranking Better
What does a crawl error mean
A crawl error occurs when a search engine tries to reach a page on your site and fails. Because of these errors, the bots that index your pages cannot read your content.
In the legacy version of Google Search Console, these errors appear in a report called Crawl Errors.
Two main sections make up the Crawl Errors report:
- Site errors: errors that prevent Googlebot from accessing your entire site.
- URL errors: errors that prevent Googlebot from accessing a specific URL.
In the latest version of Google Search Console, errors are displayed per URL in the Index Coverage report.
The Index Coverage section also shows how indexing has changed over time, including:
- Issues Google has run into and whether you have resolved them
- Valid pages in Google's index
- Pages not indexed by Google
- Valid pages that Google indexed but that have some errors
Now let's elaborate on the types of errors in the crawl error report.
Site errors block your entire site from being accessed by the search engine bot. The most common reasons are:
- DNS Errors
A DNS error means the search engine cannot communicate with your server. This happens when your website is down, for instance, and cannot be reached. Most of the time, this issue is temporary: if Google cannot crawl your site right away, it will come back later. If you see these crawl errors in Google Search Console, Google has probably tried a couple of times and still has not been able to crawl your site.
- Server errors
If Search Console shows server errors, the bot could not access your website. The request may have timed out: the website took so long to load that the search engine gave up and reported an error. The page may also fail to load due to flaws in your code, or the server may be overwhelmed by the number of requests it receives.
- Robots failure
To find out if there are any parts of your website you don't want to be indexed, Googlebot fetches your robots.txt file before crawling your website. The crawl will be postponed if the bot can't reach the robots.txt file, so make sure it is always accessible.
That covers the errors that affect your whole site. We will now look at how specific pages might result in crawl errors.
In a nutshell, URL errors are crawl errors that occur when bots attempt to crawl a particular web page. Whenever we talk about URL errors, we usually begin by discussing 404 Not Found errors.
These errors should be checked frequently (using Google Search Console or Bing Webmaster Tools) and fixed. If the page or subject has been removed from your website and is never expected to return, you can serve a 410 (Gone) response instead.
If similar content exists on another page, use a 301 redirect to that page. Also make sure that your sitemap is up to date and that your internal links are working.
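The choice between 404, 410, and 301 described above can be sketched as a small decision helper. This is only an illustration: the function name `status_for_removed_page` and its parameters are hypothetical and not part of any tool mentioned in this article.

```python
from typing import Optional, Tuple


def status_for_removed_page(
    gone_for_good: bool, replacement_url: Optional[str] = None
) -> Tuple[int, Optional[str]]:
    """Suggest (HTTP status, redirect target) for a URL whose page was removed."""
    if replacement_url:
        # Similar content lives on another page: redirect permanently.
        return 301, replacement_url
    if gone_for_good:
        # Removed and never expected to return.
        return 410, None
    # Missing, and no decision has been made yet.
    return 404, None
```

For example, `status_for_removed_page(True)` suggests a 410, while passing a replacement URL always suggests a 301 to that URL.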
The most common cause of these URL errors, by the way, is internal links. In other words, you are responsible for many of these issues yourself. If you remove a page from your site at some point, also adjust or remove the internal links that point to it. These links are no longer relevant or useful.
If such a link stays in place, a bot will find and follow it, only to hit a dead end (404 Not Found). This should not happen on your site. Keep your internal links up to date!
An error labeled ‘submitted URL’ is another common URL error. Google displays this error when it detects inconsistent signals. On the one hand, you submitted the URL for indexing, which tells Google that you want the page indexed.
On the other hand, something tells Google not to index it: your robots.txt file may be blocking the page, or the page may be marked as noindex through a meta tag or HTTP header. Your URL will not be indexed until you fix the inconsistent message.
Another common URL error occurs when a DNS lookup fails or the server fails to provide the requested URL. Check the URL again later to see whether the error still exists. If you are using Google Search Console as your main monitoring tool, make sure you mark the errors as fixed in the console.
Very specific URL errors
Occasionally, URL errors appear on certain websites only. To show them separately, I’ve listed them below:
- URL errors specific to mobile devices
Mobile crawl errors are page-specific errors that occur when a page is crawled for mobile devices. They usually do not surface on responsive websites, although you may want to disable Flash content there for the time being. If you maintain a separate mobile subdomain like m.example.com, you may encounter more errors. Your desktop site might be redirecting to your mobile site through an incorrect redirect. You might even have blocked parts of the mobile site in its robots.txt file.
- Viruses and malware errors
If you encounter malware errors in your webmaster tools, Google or Bing has discovered malicious software on that URL. That could mean software has been found that is used, for example, “for gathering data or to interfere with their operations” (Wikipedia). Investigate the page and remove the malware.
- Google News errors
Some crawl errors are specific to Google News, and your website can receive them if it is included there. Google documents these errors quite well. They range from a missing title to Google finding that the page does not seem to contain a news article. Make sure to examine your site for such errors.
How do you fix a crawl error
1. Using robots meta tag to prevent the page from being indexed
In this case, the search bot will not index your page's content and moves directly to the next page.
You can detect this issue if your page contains a robots meta tag with a noindex directive, such as <meta name="robots" content="noindex" />.
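As a rough sketch, a page can be checked for a noindex directive with Python's standard library. The directive usually takes the form of a robots meta tag with `content="noindex"`, or an `X-Robots-Tag` HTTP header. The class and function names below are hypothetical, and the sketch assumes the page HTML and response headers have already been fetched, with the header key spelled exactly `X-Robots-Tag`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(html, headers=None):
    """True if the robots meta tag or the X-Robots-Tag header says noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header_value = (headers or {}).get("X-Robots-Tag", "")
    header_directives = {part.strip().lower() for part in header_value.split(",")}
    return "noindex" in parser.directives or "noindex" in header_directives
```

Running `is_noindex` over the pages you actually want to rank is a quick way to spot a directive that was left in place by accident.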
2. Links with Nofollow
In this case, the crawler will index the content of your page but will not follow its links.
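To see which links on a page carry the nofollow hint, the HTML can be scanned for anchors with `rel="nofollow"`. This is a minimal sketch that assumes link-level nofollow; the names `NofollowLinkParser` and `split_links` are made up for this example.

```python
from html.parser import HTMLParser


class NofollowLinkParser(HTMLParser):
    """Sorts <a href> links into followed and nofollow lists."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed_links, nofollow_links) found in the HTML."""
    parser = NofollowLinkParser()
    parser.feed(html)
    return parser.followed, parser.nofollow
```

If important internal pages show up in the second list, crawlers may never reach them through that page.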
3. Blocking the pages from indexing through robots.txt
Robots start by looking at your robots.txt file. Here are some of the most frustrating things you can find there:
A rule that disallows the whole site for all user agents (Disallow: /). None of the website's pages will be indexed, since all of them are blocked.
A rule that blocks only some pages or sections of the site, for example a Products subfolder.
As a result, no product descriptions will be indexed in Google for pages in the Products subfolder.
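Both situations can be reproduced with Python's built-in `urllib.robotparser`. The two rule sets below are assumed examples of what such a robots.txt might contain, and `example.com` and the helper `allowed` are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: the first blocks everything, the second one section.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_PRODUCTS = "User-agent: *\nDisallow: /products/\n"


def allowed(robots_txt, url, agent="Googlebot"):
    """Check whether the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

With `BLOCK_ALL`, every URL is refused. With `BLOCK_PRODUCTS`, only URLs under `/products/` are refused and the rest of the site stays crawlable.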
Broken links hurt users as well as crawlers. Part of the crawl budget is spent every time a search engine indexes a page (or tries to). If the bot wastes its time on broken links, it may never reach your relevant, high-quality pages.
4. Problems with the URL
The most common cause of URL errors is a typo in a URL you add to your page. Check all the links to be sure they are typed and spelled correctly.
5. Out-of-date URLs
Double-check this issue if you've recently migrated to a new website, removed data in bulk, or changed the URL structure. Ensure that none of your website's pages reference deleted or old URLs.
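One way to look for references to old URLs is to scan each page's HTML against a list of paths you know were removed. The sketch below assumes you already have the page HTML and such a list; `stale_links` and `LinkCollector` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def stale_links(page_url, html, removed_paths):
    """Return internal links on the page that point at removed paths."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    stale = []
    for href in collector.hrefs:
        # Resolve relative links against the page they appear on.
        target = urlparse(urljoin(page_url, href))
        if target.netloc == site and target.path in removed_paths:
            stale.append(href)
    return stale
```

Links to other domains are ignored on purpose, since only internal links are under your control.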
6. Restricted pages
If many of your website's pages return, for instance, a 403 error code, these pages are probably accessible to registered users only. Mark the links to them as nofollow so that crawl budget is not wasted on them.
7. Problems with the server
There may be server problems if several 5xx errors (for example, 502) occur. Provide the list of pages with errors to the person responsible for developing and maintaining the website, who will handle the bugs or site configuration issues that lead to server errors.
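To prepare such a list, crawl results can be grouped by the kind of problem their status code indicates. This is a simple sketch that assumes the crawl data is a dictionary mapping URLs to status codes; the category labels are my own, not an official taxonomy.

```python
from collections import defaultdict


def classify_status(status):
    """Map an HTTP status code to a rough problem category."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if status == 403:
        return "restricted"
    if status in (404, 410):
        return "missing"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "other"


def group_by_problem(crawl_results):
    """Group URLs from {url: status} by problem category."""
    groups = defaultdict(list)
    for url, status in crawl_results.items():
        groups[classify_status(status)].append(url)
    return dict(groups)
```

The "server error" group is the list to hand over to your developer, while the "missing" and "restricted" groups are usually fixed by editing links.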
8. Limited capacity of servers
Overloaded servers may be unable to handle requests from users and bots. When this occurs, a “Connection timed out” message is displayed. Only a website maintenance specialist can solve the problem, since they will need to estimate whether additional server capacity is necessary.
9. Misconfigured web server
This is a tricky issue. The site may look fine to you as a human visitor while crawlers receive an error message, so none of the pages get crawled. Certain server configurations can cause this: for example, some web application firewalls block Googlebot and other search bots by default. In short, this problem must be solved by a specialist who can look at all its related aspects.
Crawlers base their first impression of your site on the sitemap and robots.txt. By providing a sitemap, you tell search engines how you would like them to index your website. Here are a few things that can go wrong once the search engine reads your sitemap(s).
10. Errors in format
A format error can be due to an invalid URL, for instance, or to a missing tag. The sitemap file may also be blocked by robots.txt, in which case the bots cannot access the sitemap's content at all.
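A basic format check can be sketched with Python's XML parser. It assumes a standard XML sitemap that uses the sitemaps.org namespace and only looks for the two problems mentioned above (invalid URLs and missing tags); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_format_errors(xml_text):
    """Return a list of human-readable format problems in a sitemap."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors = []
    if root.tag != SITEMAP_NS + "urlset":
        errors.append(f"unexpected root element: {root.tag}")
    for index, url in enumerate(root.findall(SITEMAP_NS + "url"), start=1):
        loc = url.find(SITEMAP_NS + "loc")
        if loc is None or not (loc.text or "").strip():
            errors.append(f"entry {index}: missing <loc> tag")
            continue
        parts = urlparse(loc.text.strip())
        # A sitemap URL must be absolute: scheme and host are both required.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            errors.append(f"entry {index}: invalid URL {loc.text.strip()!r}")
    return errors
```

An empty list means none of these particular problems were found, not that the sitemap is valid in every respect.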
11. Sitemap contains incorrect pages
Now let's go over the content. You can judge the relevance of the URLs in a sitemap even if you aren't a web developer. Review your sitemap very carefully and ensure that each URL in it is relevant, current, and correct (no typos or misspellings). If bots cannot crawl the entire website due to a limited crawl budget, the sitemap can guide them towards the most valuable pages.
Don’t put misleading instructions in the sitemap: make sure that robots.txt or meta directives are not preventing the bots from indexing the URLs in your sitemap.
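This kind of contradiction can be caught by cross-checking the sitemap against robots.txt. The sketch below assumes both files are already available as text and uses the standard library only; `blocked_sitemap_urls` is a made-up name.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def blocked_sitemap_urls(sitemap_xml, robots_txt, agent="Googlebot"):
    """Return sitemap URLs that robots.txt forbids the agent to fetch."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]
    return [url for url in urls if not robots.can_fetch(agent, url)]
```

Every URL this returns is being submitted and blocked at the same time, so either remove it from the sitemap or lift the robots.txt rule. Meta directives are not covered here and need a separate check.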
This category of problems is the most challenging to resolve. As a result, we suggest you complete the previous steps before you proceed with the next step.
Crawlers may become disoriented or blocked by these problems in the site architecture.
12. Problems with internal linking
In a correctly structured website, the pages form an unbroken chain of links, so crawlers can easily reach each one. Watch out for the following problems:
- No other page on the website links to the page you want to rank, so search bots cannot find and index it.
- Too many transitions lead from the main page to the page you want to rank. The bot may not find it if it takes more than four clicks to get there.
- A single page contains more than 3000 links (too many for a crawler to follow).
- Links are hidden behind elements of the site that crawlers cannot access: forms to fill out, frames, plugins (Java and Flash first of all).
There is rarely a quick fix for an internal linking problem. You need to look at the site structure in depth, together with the website developers.
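The first three problems can be checked on a link graph with a breadth-first search from the main page. This sketch assumes you have exported internal links as a dictionary from each page to the pages it links to. The thresholds mirror the numbers in the list above, and all names are hypothetical.

```python
from collections import deque


def click_depths(links, home):
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def linking_problems(links, home, max_depth=4, max_links=3000):
    """Report unreachable pages, pages that are too deep, and link-heavy pages."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return {
        "unreachable": sorted(all_pages - set(depths)),
        "too_deep": sorted(p for p, d in depths.items() if d > max_depth),
        "too_many_links": sorted(p for p, t in links.items() if len(t) > max_links),
    }
```

Pages listed as unreachable have no chain of links leading to them from the main page, which includes pages that nothing links to at all.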
13. Incorrect redirects
A redirect directs visitors to a more appropriate page (or rather, the one the website owner feels is appropriate). Here are some things you may overlook regarding redirects:
- Using temporary redirects (302 and 307) instead of permanent ones signals the crawler to keep returning to the original page, wasting the crawl budget. If the original page doesn't need to be indexed anymore, use a 301 (permanent) redirect for it.
- Two pages may be redirected to each other in a redirect loop. Thus, the crawl budget is wasted as the bot gets caught in a loop. Look for possible mutual redirection and remove it if it exists.
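Both redirect problems can be spotted by walking a redirect map. The sketch below assumes the redirects are known up front as a dictionary from each URL to a `(status, target)` pair, for example exported from your server configuration. `audit_redirects` is a hypothetical helper.

```python
def audit_redirects(redirects, start, max_hops=10):
    """Follow redirects from start; report the chain, loops, and temporary hops."""
    chain = [start]
    temporary = []
    current = start
    while current in redirects and len(chain) <= max_hops:
        status, target = redirects[current]
        if status in (302, 307):
            temporary.append(current)
        if target in chain:
            # We are about to revisit a URL: this is a redirect loop.
            return {"chain": chain + [target], "loop": True, "temporary": temporary}
        chain.append(target)
        current = target
    return {"chain": chain, "loop": False, "temporary": temporary}
```

A result with `loop` set to true points at the mutual redirection to remove, and the `temporary` list shows which hops could become 301 redirects if the move is permanent.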
14. Slow loading time
Crawlers go through your pages faster if your pages load quickly. Every millisecond counts. Load speed is also correlated with a website's position on the SERP.
Website performance can be slow due to server-side factors, for example when the available bandwidth is no longer adequate. Check your hosting plan description to find out how much bandwidth you have available.
Another very common issue is inefficient front-end code. You are at risk if the website contains a large number of scripts or plug-ins. Check regularly that your photos, videos, and related content load quickly and do not slow the page down.
15. Poor website architecture leading to duplicate pages
“11 Most Common On-site SEO Issues” by SEMrush reveals that duplicate content is a problem on 50% of sites. This is one of the main reasons you run out of crawl budget. Google gives a website only a certain amount of crawl time, so it should not be spent indexing the same content over and over. Additionally, crawlers don't know which copy to trust more, so the wrong one may be given priority unless you use canonicals to point to the preferred version.
Once you have identified the duplicate pages, there are several ways to fix the problem and prevent them from being crawled:
- Eliminate duplicate pages
- Add the necessary rules to robots.txt
- Add the necessary directives to meta tags
- Put a 301 redirect in place
- Make use of rel=canonical
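Before applying any of these fixes, you first need to find the duplicates. One rough way is to group pages whose text is identical after normalizing case and whitespace. This sketch only catches exact copies (near-duplicates need something smarter), and the function name and input format are assumptions.

```python
import hashlib
from collections import defaultdict


def duplicate_groups(pages):
    """Group URLs from {url: page_text} whose normalized text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        # Lowercase and collapse whitespace so trivial differences are ignored.
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```

Each group it returns is a set of URLs serving the same content, which makes them candidates for a 301 redirect or a rel=canonical pointing at the version you prefer.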
17. Content created in Flash
The use of Flash can be problematic for SEO and for user experience (most mobile devices do not support Flash files). Text content and links inside Flash elements are unlikely to be indexed by crawlers.
Therefore, we recommend that you don’t use it on your website.
18. Frames in HTML
If your site contains frames, there is both good and bad news. The good news is that this is probably a sign of how mature your site is. The bad news is that HTML frames are extremely outdated and poorly indexed, so you should replace them as soon as possible.
About the Author
My name's Semil Shah, and I pride myself on being the last digital marketer that you'll ever need. Having worked internationally across agile and disruptive teams from San Francisco to London, I can help you take what you are doing in digital to a whole new level.