Fluctuations in Pages Crawled
What's Covered?
In this guide you’ll learn more about what can cause fluctuations in the number of pages crawled in your Site Crawl and how to investigate them in Moz Pro.
Overview of How We Crawl and What Causes Fluctuations In Pages Crawled
Our Site Crawl bot, Rogerbot, finds pages by crawling all of the HTML links on the homepage of your site. It then crawls each of those pages and the HTML links on them, and so on. Rogerbot continues in this way until it has crawled all of the pages it can find for the site, subdomain, or subfolder you entered when you created your Campaign.
Usually, if a page is linked to from the homepage, it should end up getting crawled. If it doesn't, that may be a sign that the page isn't as accessible to search engines as it could be.
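In rough terms, that discovery process is a breadth-first traversal of your internal HTML links. The sketch below is a simplified, generic crawler meant only to illustrate the idea, not Rogerbot's actual implementation; the starting URL and page cap are placeholders.

```python
# A minimal sketch of a generic link-following crawl -- not Rogerbot itself.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start_url, max_pages=50):
    """Breadth-first crawl of HTML links, staying on the starting host."""
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "ignore")
        except Exception:
            continue  # pages that error out here are where a crawl stops finding links
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen

# Placeholder starting URL -- replace with the site you entered in your Campaign.
print(crawl("https://www.example.com/"))
```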
Here are some things that can affect our ability to crawl your site:
- Broken or lost internal links
- If your site is built primarily with JavaScript, especially if your links are rendered in JavaScript, we won't be able to parse those links
- Meta tags or a robots.txt file telling Rogerbot not to crawl certain areas of the site
- Lots of 5xx or 4xx errors in your crawl results
Below we’ll talk about how to investigate a few of these issues using Moz Pro.
How to Monitor Crawl Fluctuations
If you're seeing the number of pages crawled fluctuate, it can take some investigation to find the cause.
To get started, it can help to identify which pages are being included or excluded in your crawl report by exporting your weekly Site Crawl data to CSV. To do so, head to Site Crawl > All Crawled Pages > Export CSV (located on the right-hand side).
When examining your reports, take note of any pages you're expecting to see that aren't included. Also note any pages with unusual or extra-long URLs, or any pages you weren't expecting to appear in your crawl report.
After investigating, hold onto these reports so you can use them to compare future crawls and investigate issues if necessary.
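One way to compare crawls is to diff two weekly exports. The sketch below assumes the export contains a column named "URL" and uses placeholder filenames; check the header row of your own CSVs and adjust accordingly.

```python
# Compare two weekly "All Crawled Pages" CSV exports to see which URLs
# were dropped from, or added to, the crawl. Filenames and the "URL"
# column name are assumptions -- adjust to match your own exports.
import csv

def crawled_urls(path, url_column="URL"):
    with open(path, newline="", encoding="utf-8-sig") as f:
        return {row[url_column].strip()
                for row in csv.DictReader(f) if row.get(url_column)}

last_week = crawled_urls("crawl_week_1.csv")
this_week = crawled_urls("crawl_week_2.csv")

print("Dropped from the crawl:")
for url in sorted(last_week - this_week):
    print("  ", url)

print("New in the crawl:")
for url in sorted(this_week - last_week):
    print("  ", url)
```

URLs that show up in one export but not the other are good starting points for the checks described below.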
Below you’ll find common causes of fluctuations in pages crawled and how to investigate them.
Broken or Lost Internal Links
If you’re seeing a drop in the number of pages crawled for your site, or you’re not seeing as many pages crawled as you’re expecting, it is a good idea to check in on your broken and/or lost internal links.
Within the Site Crawl section of your Campaign, you can find links that are redirecting to 4xx errors in the Critical Crawler Issues tab.
If an internal link redirects to a 4xx error, our crawler won't be able to move past that 4xx to find more links and pages.
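If you want to spot-check a handful of suspect links yourself, a short script like the one below can confirm whether a link's redirect chain ends in a 4xx. The URLs are placeholders, and it uses the third-party requests package (pip install requests).

```python
# Re-check a few internal link targets to see whether any redirect to a 4xx.
import requests

links_to_check = [
    "https://www.example.com/old-pricing",        # placeholder URLs
    "https://www.example.com/blog/retired-post",
]

for url in links_to_check:
    resp = requests.get(url, allow_redirects=True, timeout=10)
    if resp.history and resp.status_code >= 400:
        # The link redirected, and the final destination returned an error,
        # so a crawler following this link can't discover anything beyond it.
        print(f"{url} -> {resp.url} returned {resp.status_code}")
    elif resp.status_code >= 400:
        print(f"{url} returned {resp.status_code} directly")
```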
Meta Tags Banning Rogerbot
Within the Site Crawl section of your Campaign, you can find pages that are marked as nofollow in the Crawler Warnings tab.
If a page on your site is marked as nofollow, this tells our crawler not to follow or crawl any links on, or beyond, that page. For example, if you have a page with 10 new pages linked on it but that page is marked as nofollow in its meta robots tag or X-Robots-Tag header, those 10 new pages will not be crawled and therefore won't be added to your Site Crawl data.
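To check a specific page yourself, you can look at both its meta robots tag and its X-Robots-Tag response header. The sketch below uses the third-party requests package and a placeholder URL.

```python
# Check whether a page tells crawlers not to follow its links, via either
# the meta robots tag in the HTML or the X-Robots-Tag response header.
from html.parser import HTMLParser
import requests

class MetaRobotsCheck(HTMLParser):
    """Collects the content of any <meta name="robots"> tags."""
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            self.directives.append((a.get("content") or "").lower())

url = "https://www.example.com/new-section/"  # placeholder URL
resp = requests.get(url, timeout=10)

header = resp.headers.get("X-Robots-Tag", "").lower()
checker = MetaRobotsCheck()
checker.feed(resp.text)

if "nofollow" in header or any("nofollow" in d for d in checker.directives):
    print(f"{url} is marked nofollow -- links on this page won't be followed")
else:
    print(f"{url} has no nofollow directive in its meta robots tag or headers")
```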
Robots.txt File Banning Rogerbot
If there are pages you’re expecting to be in the crawl which aren’t, it’s recommended that you check your robots.txt file to make sure that our crawler isn’t being blocked from accessing those pages.
If a subfolder on your site blocks crawlers via a wildcard directive or a user-agent-specific directive for Rogerbot, our crawler will not be able to access or crawl the pages within that subfolder, or any pages linked beyond it.
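You can test your robots.txt rules against specific URLs with Python's built-in robotparser, which accounts for both wildcard (User-agent: *) and Rogerbot-specific directives. The domain and paths below are placeholders.

```python
# Check whether robots.txt blocks a given user agent from specific URLs,
# using only the standard library.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://www.example.com/robots.txt")  # placeholder domain
parser.read()

for url in [
    "https://www.example.com/blog/",
    "https://www.example.com/private/reports/",
]:
    allowed = parser.can_fetch("rogerbot", url)
    print(f"{'allowed' if allowed else 'BLOCKED'}: {url}")
```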
4xx or 5xx Errors Limiting the Crawl
Within the Site Crawl section of your Campaign, you can find pages that returned a 5xx or 4xx error to our crawler in the Critical Crawler Issues tab.
5xx and 4xx errors returned in your Site Crawl can be a sign that something is amiss with your site or server. Additionally, if our crawler encounters one of these errors, it’s not able to crawl any further. This means that if pages are only linked to from a page that returns an error to our crawler, our crawler will not find any of the links or pages beyond that error.
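If you've exported All Crawled Pages to CSV, a quick tally of 4xx and 5xx responses can show how widespread the problem is. The filename and the "Status Code" and "URL" column names below are assumptions; check your own export's header row and adjust.

```python
# Tally 4xx and 5xx responses from an "All Crawled Pages" CSV export.
import csv
from collections import Counter

errors = Counter()
with open("all_crawled_pages.csv", newline="", encoding="utf-8-sig") as f:
    for row in csv.DictReader(f):
        code = (row.get("Status Code") or "").strip()
        if code.startswith(("4", "5")):
            errors[code] += 1
            print(f"{code}  {row.get('URL', '')}")

print("Error totals:", dict(errors))
```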