May Mozscape Index Update: 164 Billion URLs

Rand Fishkin

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

It's that time once again! Mozscape's latest index update is live as of today (and new data will be in OSE, the mozBar, and PRO by tomorrow). This new index is our largest yet, at 164 billion URLs, but that comes with a few caveats. The major one is that this index contains a smaller-than-normal number of domains, so you may see link counts rise while unique linking root domains shrink. I asked the team why this happened, and our data scientist, Matt, provided a great explanation:

We schedule URLs to be crawled based on their PA+external mozRank to crawl the highest quality pages. Since most high PA pages are on a few large sites this naturally biases to crawling fewer domains. To enforce some domain diversity the V2 crawlers introduced a set of domain mozRank limits that limit the crawl depth on each domain. However, this doesn't guarantee a diverse index when the crawl schedule is full (as we had for Index 52).

In this case, many lower quality domains with low PA/DA are cut from the schedule and out of the index. This is the same problem we ran into when we first switched to the V2 crawlers last year and the domain diversity dropped way down. We've since fixed the problem by introducing another hard constraint that always schedules a few pages from each domain, regardless of PA. This was implemented a few weeks ago and the domain numbers for Index 53 are going back up to 153 million.
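The scheduling behavior Matt describes can be sketched roughly as follows. This is a minimal illustration, not Moz's actual crawler code: the `Candidate` type, the `build_schedule` function, and all of its parameters are hypothetical names, and `score` simply stands in for the PA + external mozRank value. The sketch also assumes the schedule capacity is large enough to hold the reserved pages.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    url: str
    domain: str
    score: float  # hypothetical stand-in for PA + external mozRank


def build_schedule(candidates, capacity, max_per_domain, min_per_domain):
    """Pick URLs to crawl, highest score first, with per-domain limits.

    max_per_domain models the crawl-depth limit on each domain.
    min_per_domain models the hard constraint that always schedules
    a few pages from every domain, regardless of score.
    """
    by_domain = defaultdict(list)
    for c in candidates:
        by_domain[c.domain].append(c)

    reserved, leftovers = [], []
    for pages in by_domain.values():
        pages.sort(key=lambda c: c.score, reverse=True)
        # Hard constraint: the best few pages of each domain always get in.
        reserved.extend(pages[:min_per_domain])
        # Everything else up to the depth limit competes on score alone.
        leftovers.extend(pages[min_per_domain:max_per_domain])

    leftovers.sort(key=lambda c: c.score, reverse=True)
    remaining = max(0, capacity - len(reserved))
    return reserved + leftovers[:remaining]


# Made-up data: one large, high-scoring site and two small, low-scoring ones.
pages = [Candidate(f"big.com/{i}", "big.com", s)
         for i, s in enumerate([90, 80, 70, 60, 50])]
pages += [Candidate("small.org/", "small.org", 5),
          Candidate("tiny.net/", "tiny.net", 3)]

full_schedule = build_schedule(pages, capacity=4, max_per_domain=4, min_per_domain=0)
fixed_schedule = build_schedule(pages, capacity=4, max_per_domain=4, min_per_domain=1)

print({c.domain for c in full_schedule})   # domains without the hard constraint
print({c.domain for c in fixed_schedule})  # domains with the hard constraint
```

With a full schedule and no per-domain minimum, the toy example fills every slot from the one large site. Reserving a single page per domain brings the two small domains back in at the cost of a few lower-scoring pages from the large one.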

Thankfully, the domains affected should be at the far edges of the web: those that aren't well linked-to or important. Still, we recognize that domain diversity matters, and we're focused on balancing index size against domain coverage moving forward.

Several other points may be of interest as well:

  • The last index took nearly 13 weeks to process; this one took only 7. That means relatively fresher data, though not as fresh as we'd like. The oldest information will be from February and the newest from mid-April.
  • Of all the URLs on which data was requested in the last month, this update has data for 88.56% of them (only very slightly lower than the last index's 88.80%).
  • This index still has very high correlations with rankings. Below are a few samples of Spearman correlations with higher rankings in Google.com (US):
    • Page Authority (PA) - 0.38
    • Domain Authority (DA) - 0.26
    • URL MozRank (mR) - 0.20
    • URL MozTrust (mT) - 0.22
    • Linking Root Domains to the URL - 0.29
    • Total # of Links to the URL - 0.22
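For readers who want to see how a Spearman figure like the ones above is calculated, here is a minimal pure-Python sketch. It is not the code Moz uses: the function names are hypothetical and the sample numbers are made up for illustration. Spearman correlation is the Pearson correlation of the ranks of two lists, with tied values sharing an average rank.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks for values; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up example: positions 1-5 in one search result and a metric per URL.
positions = [1, 2, 3, 4, 5]
metric = [62, 48, 55, 40, 31]

# A "higher" ranking is a smaller position number, so negate the positions
# to make a positive result mean "higher metric goes with higher ranking".
rho = spearman([-p for p in positions], metric)
print(round(rho, 2))
```

A value of 1.0 would mean the metric orders the results exactly as the search engine does, 0 would mean no relationship, and negative values would mean the metric tends to order them in reverse.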

This bit is important: Next index, we're going back down to between 70 and 90 billion URLs and focusing on getting back to much fresher updates (we're even aiming to get to updates every 2 weeks, though this is a challenging goal, not a guarantee). The 150 billion+ page indices are an awesome milestone, but as you've likely noticed, the extra data translates neither into hugely better correlations nor into massively more data on the URLs most of our customers care about (as an example, in index 50, we had ~53 billion pages and 82.09% of URLs requested had data). That said, once our architecture is more stable, we will be aiming for both huge index sizes and dramatically better freshness. Look for tons of work and improvements over the summer on both fronts.

Below are the stats for Index 52: 

  • 164,569,893,828 (164 billion) URLs
  • 1,222,033,252 (1.22 billion) Subdomains
  • 117,444,355 (117 million) Root Domains
  • 1,784,256,496,532 (1.78 trillion) Links
  • Followed vs. Nofollowed
    • 2.57% of all links found were nofollowed
    • 64.91% of nofollowed links are internal
    • 35.09% are external
  • Rel Canonical - 11.33% of all pages now employ a rel=canonical tag
  • The average page has 85.12 links on it
    • 74.38 internal links on average
    • 10.74 external links on average
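A few quick sanity checks on the stats above can be run as a short script. This is only a sketch: the variable names are arbitrary, and the nofollowed link count is an estimate derived from the rounded 2.57% figure, not a number Moz published.

```python
# Figures copied from the Index 52 stats above.
urls = 164_569_893_828
root_domains = 117_444_355
links = 1_784_256_496_532

# The internal and external averages should add up to the per-page total.
avg_links_per_page = round(74.38 + 10.74, 2)

# The internal/external split of nofollowed links should cover 100%.
nofollow_split_total = round(64.91 + 35.09, 2)

# Estimated nofollowed links: 2.57% of all links (integer math, rounded down).
nofollowed_estimate = links * 257 // 10_000

# Average URLs per root domain, which shows how concentrated this index is.
urls_per_root_domain = urls / root_domains

print(avg_links_per_page, nofollow_split_total)
print(f"{nofollowed_estimate:,}", round(urls_per_root_domain))
```

The per-page averages and the nofollow split are internally consistent, the estimate comes to roughly 45.9 billion nofollowed links, and the index averages roughly 1,400 URLs per root domain.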

Feedback is greatly appreciated. This index should help with Penguin link data identification substantially more than our prior one, and the next one should be even more useful for that. Do remember that since this index stopped crawling and began processing in mid-April, link additions and removals that have happened since won't be reflected. Our next index will, hopefully, be out with 5 or fewer weeks of processing, to improve that freshness. We're excited to see how this affects correlations and data quality.
