
Announcing SEOmoz's Index of the Web and the Launch of our Linkscape Tool

Rand Fishkin

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

After 12 long months of brainstorming, testing, developing, and analyzing, the wait is finally over. Today, I'm ecstatic to announce some very big developments here at SEOmoz. They include:
  • An Index of the World Wide Web – 30 billion pages (and growing!), refreshed monthly, built to help SEOs and businesses acquire greater intelligence about the Internet's vast landscape

  • Linkscape – a tool enabling online access to the link data provided by our web index, including ordered, searchable lists of links for sites & pages, and metrics to help judge their value.

  • A Fresh Design – that gives SEOmoz a more usable, enjoyable, and consistent browsing experience

  • New Features for PRO Membership – including more membership options, credits to run advanced Linkscape reports (for all PRO members), and more.

Since there's an incredible amount of material, I'll do my best to explain things clearly and concisely, covering each of the big changes. If you're feeling more visual, you can also check out our Linkscape comic, which introduces the web index and tool in a more humorous fashion:

Check out the Linkscape Comic

SEOmoz's Index of the Web

For too long, data that is essential to the practice of search engine optimization has been inaccessible to all but a handful of search engineers. The connections between pages (links) and the relationship between links, URLs, and the web as a whole (link metrics) play a critical role in how search engines analyze the web and judge individual sites and pages. Professional SEOs and site owners of all kinds deserve to know more about how their properties are being referenced in such a system. We believe there are thousands of valuable applications for this data and have already put some effort into retrieving a few fascinating statistics:

  • Across the web, 58% of all links point to internal pages on the same domain, while 42% point to pages off the linking site.

  • 1.83% of all links on the web are nofollowed; of these, 61% point to external sites, while 39% link to pages on their own site. While those percentages may seem small, they add up to a massive number of links (~2 billion) that are leveraging nofollow for link juice "sculpting."

  • While 0.08% of pages on the web use the 301 redirect, 0.12% (50% more) employ 302 redirects. Another 0.005% use the meta refresh.

  • About 1.5% of all pages use the meta noindex tag (which is a lot of content the engines don't get to see) and 0.87% of all pages use the meta nofollow tag.

  • From our entire index of pages, the median page received about 77 links (both internal and external), while the average page gets 32. If your pages have more than 32 links, congratulations! You're above average :-)
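
As a quick sanity check on the figures above, the small Python sketch below combines the quoted percentages. It uses only the numbers stated in the list (no additional index data), and the constant and function names are illustrative. It shows that pages using 302s outnumber pages using 301s by a factor of 1.5, and how the nofollow split looks when expressed as a share of all links.

```python
from decimal import Decimal

# Percentages quoted in the list above, kept as exact decimals
# so the ratios below are not distorted by binary floating point.
NOFOLLOW_SHARE_OF_ALL_LINKS = Decimal("1.83")  # % of all links that are nofollowed
NOFOLLOW_EXTERNAL_FRACTION = Decimal("0.61")   # share of nofollowed links pointing off-site
NOFOLLOW_INTERNAL_FRACTION = Decimal("0.39")   # share of nofollowed links pointing on-site

PAGES_WITH_301 = Decimal("0.08")  # % of pages using a 301 redirect
PAGES_WITH_302 = Decimal("0.12")  # % of pages using a 302 redirect


def nofollow_breakdown():
    """Express external and internal nofollowed links as a % of ALL links."""
    external = NOFOLLOW_SHARE_OF_ALL_LINKS * NOFOLLOW_EXTERNAL_FRACTION
    internal = NOFOLLOW_SHARE_OF_ALL_LINKS * NOFOLLOW_INTERNAL_FRACTION
    return external, internal


def redirect_ratio():
    """How many 302-using pages there are for every 301-using page."""
    return PAGES_WITH_302 / PAGES_WITH_301


if __name__ == "__main__":
    ext, internal = nofollow_breakdown()
    print(f"external nofollow: {ext}% of all links")       # 1.1163%
    print(f"internal nofollow: {internal}% of all links")  # 0.7137%
    print(f"302s per 301: {redirect_ratio()}")             # 1.5
```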

Over time, we hope to answer hundreds of questions that the major engines, due to their penchant for secrecy, have kept under wraps. We'll also be offering custom data reports for companies that would like to retrieve more specific information from our index.


Along with all the exciting possibilities for leveraging this resource comes an understanding of its strengths and weaknesses. SEOmoz's index is by no means perfect or complete, but I have been shocked, time and again, at how exceptionally valuable the data has proven to be. Some things to be aware of, however, include:

  • Domain Diversity over Domain Breadth
    Our crawl biases towards having pages and data from as many domains as possible, rather than intensely and exhaustively indexing every URL on a single domain. Over time, we hope to do both, but in order to provide site owners with valuable data early on, this was our initial focus.

  • Concentration on the “Center” of the Web
    As others who've invested energy into crawling the web in academia have noted, the Internet's pages fit a bow tie-like pattern of a well-connected center (where many links exist between sites and pages) and two external sides where links largely flow one way (either in, towards the center, or out from it). Both as a result of this pattern and because we feel that the most valuable data comes from the most important and well-connected (and connecting) pages, our crawl biases towards this “center” of the well-linked web.

  • Index Freshness
    Our process for crawling the web and making our data available requires significant processing resources (for comparison, back in 2002, when Google's stated index comprised fewer than 5 billion URLs, they appeared to compute data only once each month, resulting in what SEOs termed the “Google Dance”). Thus, SEOmoz's index generally contains crawl information that is between 10 and 50 days old. Moving forward, we'll continue to refine freshness and, hopefully, have enough commercial success with the product to invest in better and faster crawling and processing.

  • Index Size
    Over the past few weeks, we've run thousands of tests and come to the general conclusion that SEOmoz's index contains between 1/5 and 1/3 as many URLs as the major search engines. When comparing link numbers or data counts against the engines, lower totals from our index should be expected. Fortunately, it appears that, nearly universally, the SEOmoz index contains the more important, well-linked-to pages and sites, so the portions missing in a comparison are unlikely to be popular, valuable resources.

  • Subtle Differences with the Major Search Engines
    In comparing our crawls against the engines, we have noticed a small number of sites and pages that “cloak” or display content in different ways to different crawlers. While this represents an infinitesimally small percentage of URLs, it's worth noting that Googlebot (and to a lesser extent, Yahoo!'s Slurp and Live's MSNbot) see a slightly different web than other crawlers.
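
The bow-tie pattern described under "Concentration on the Center of the Web" can be made concrete with a toy graph. The sketch below is an illustration only, not SEOmoz's crawler logic: it treats the "center" as the largest strongly connected component (pages that can all reach one another), the IN side as pages that can reach the center but not be reached from it, and the OUT side as the reverse. It uses a simple quadratic approach that is fine for a handful of pages; a real web-scale graph would need a linear-time SCC algorithm such as Tarjan's.

```python
from collections import deque


def reachable(graph, start):
    """All pages reachable from `start` by following links (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Flip every link so we can ask 'who can reach this page?'."""
    rev = {}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, []).append(src)
    return rev


def bow_tie(graph):
    """Split a link graph into (center, in_side, out_side, other)."""
    nodes = set(graph) | {d for ts in graph.values() for d in ts}
    rev = reverse(graph)
    core = set()
    # A page's strongly connected component = pages it can reach AND be reached from.
    for v in sorted(nodes):
        scc = reachable(graph, v) & reachable(rev, v)
        if len(scc) > len(core):
            core = scc
    if not core:
        return set(), set(), set(), set()
    anchor = next(iter(core))
    in_side = reachable(rev, anchor) - core   # links flow in towards the center
    out_side = reachable(graph, anchor) - core  # links flow out from the center
    other = nodes - core - in_side - out_side   # tendrils and disconnected pages
    return core, in_side, out_side, other


if __name__ == "__main__":
    toy = {"in1": ["a"], "a": ["b"], "b": ["c"], "c": ["a", "out1"], "out1": [], "island": []}
    print(bow_tie(toy))
```

In the toy graph, pages `a`, `b` and `c` link to each other in a cycle and form the center, `in1` only links in, `out1` is only linked to, and `island` is disconnected from all of them.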

Over the next few weeks, we'll be releasing more information about our crawl and asking for your feedback, too. Until then, we've got some additional, in-depth resources in the Linkscape education center.


Linkscape: Online Access to the Web's Link Graph

Linkscape is the tool I've been lusting for since first getting into the SEO world. It's a truly extensible, usable, fully-featured link research system accompanied by some impressive link-based metrics.

Linkscape Tool Homepage

The primary metric Linkscape exposes is mozRank (abbreviated mR), which we've been using internally with great success for the last few months. Like other link popularity metrics (Google's PageRank, Yahoo!'s old WebRank, Live's StaticRank, etc.), mozRank relies on the intuition that links are votes and that links from more important sources should carry more weight. As of today, mozRank isn't perfect, but it does include substantive differences from the algorithms discussed above (and those mentioned in academic papers) that help mozRank reward natural linking and discount many of the more flagrant manipulative link behaviors we found.
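
The "links are votes" intuition can be illustrated with a textbook PageRank-style power iteration. This is a generic sketch, not mozRank: the post says mozRank differs substantively from these algorithms, and its actual formula is not described here. The damping factor of 0.85 and the iteration count are conventional assumptions. Each page splits its current score evenly among the pages it links to, so a link from a high-scoring page passes along more weight than a link from a low-scoring one.

```python
def link_rank(graph, damping=0.85, iterations=50):
    """Generic PageRank-style scores for a {page: [linked pages]} graph."""
    nodes = sorted(set(graph) | {d for ts in graph.values() for d in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start every page with an equal share
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of its links.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                # Each outgoing link is a "vote" worth an equal slice of v's score.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outlinks spreads its score evenly over all pages.
                for t in nodes:
                    new[t] += damping * rank[v] / n
        rank = new
    return rank


if __name__ == "__main__":
    scores = link_rank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In this toy graph, page `c` collects links from three other pages and ends up with the highest score, while `d`, which nobody links to, keeps only the baseline and ends up lowest.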


Linkscape also features lots of other valuable metrics, including mozTrust (abbreviated mT and inspired by the TrustRank paper), a link popularity metric similar to mozRank that differs only in its built-in bias towards trusted links and the pages that earn them; and Domain mozRank (DmR) & Domain mozTrust (DmT), which calculate mozRank and mozTrust at the domain level (rather than just for individual pages) to show which domains carry the most link popularity and trust. There's also a host of individual link attributes, such as image links, links with nofollow, links in noscript tags, links from the same IP address or C-block of IPs, and many more. A full list of link attributes is available here.
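
A trust bias like the one described for mozTrust is commonly modelled, as in the TrustRank paper, by letting the "random jump" land only on a hand-picked set of trusted seed pages, so score can only reach a page through chains of links that start at a seed. The sketch below shows that idea, plus one simple way to get a domain-level graph by collapsing pages onto their hostnames. Both are assumptions for illustration: the seed set, the damping factor, and grouping by hostname (rather than by registrable domain) are not details of SEOmoz's actual mozTrust, DmR, or DmT calculations.

```python
from urllib.parse import urlparse


def biased_rank(graph, seeds, damping=0.85, iterations=50):
    """TrustRank-style scores: random jumps land only on trusted seed pages."""
    nodes = sorted(set(graph) | {d for ts in graph.values() for d in ts})
    jump = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(jump)  # all initial trust sits on the seeds
    for _ in range(iterations):
        new = {v: (1.0 - damping) * jump[v] for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                for t in targets:
                    new[t] += damping * rank[v] / len(targets)
            else:
                # Pages without outlinks hand their trust back to the seeds.
                for t in nodes:
                    new[t] += damping * rank[v] * jump[t]
        rank = new
    return rank


def domain_graph(page_graph):
    """Collapse a page-level link graph into a hostname-level one."""
    out = {}
    for src, targets in page_graph.items():
        src_host = urlparse(src).hostname
        out.setdefault(src_host, set())
        for dst in targets:
            dst_host = urlparse(dst).hostname
            if dst_host != src_host:  # internal links vanish at the domain level
                out[src_host].add(dst_host)
    return {host: sorted(hosts) for host, hosts in out.items()}


if __name__ == "__main__":
    pages = {  # hypothetical example URLs
        "https://trusted.example/": ["https://good.example/a"],
        "https://good.example/a": ["https://good.example/b", "https://trusted.example/"],
        "https://good.example/b": ["https://good.example/a"],
        "https://spam.example/1": ["https://spam.example/2"],
        "https://spam.example/2": ["https://spam.example/1", "https://good.example/a"],
    }
    print(biased_rank(pages, {"https://trusted.example/"}))
    print(biased_rank(domain_graph(pages), {"trusted.example"}))
```

Because no trusted or good page links to the hypothetical `spam.example` pages, they receive zero trust even though they link to each other. A plain popularity score like the one above would still give them a baseline share.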


What does this mean? It means that I can perform a search like this one:

Linkscape Advanced Link Intelligence Report for BlueHatSEO.com

This shows me only those links with "seo" in the anchor text, then filters them to show only the links that are embedded in images or come from the same IP C-block or ... well, lots of stuff.
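
The kind of filter described above can be sketched over a list of link records. The `Link` fields and the `filter_links` and `same_c_block` helpers are hypothetical names chosen for this example, not Linkscape's data model. "Same C-block" is interpreted here in the usual SEO sense: two IPv4 addresses that share their first three octets, i.e. the same /24 network.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical link record; field names are illustrative only."""
    source_url: str
    anchor_text: str
    is_image: bool
    source_ip: str
    target_ip: str
    nofollow: bool = False


def same_c_block(ip_a, ip_b):
    """True when two IPv4 addresses share the same /24 ("C-block")."""
    net = ipaddress.ip_network(f"{ip_a}/24", strict=False)
    return ipaddress.ip_address(ip_b) in net


def filter_links(links, keyword):
    """Keep links whose anchor text contains `keyword` (case-insensitive)
    AND that are image links or come from the target's C-block."""
    kw = keyword.lower()
    return [
        link for link in links
        if kw in link.anchor_text.lower()
        and (link.is_image or same_c_block(link.source_ip, link.target_ip))
    ]
```

Further attributes from the list above, such as nofollow or noscript links, would slot in as extra conditions in the same comprehension.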


This type of advanced link information has, in my opinion, always been critical to the SEO process, both for self-examination and for competitive analysis. It's almost a crime that we've had to perform link-related SEO tasks without it, so as much as I'm excited to offer this tool to other SEOs, I'm equally thrilled to finally have it for our own clients and projects. It even shows the distribution of anchor text, like this list of anchor text links pointing to SEOmoz's SEO Expert Quiz:

Linkscape Anchor Text Distribution
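
An anchor text distribution like the one above boils down to counting how often each piece of anchor text appears among a page's inbound links. A minimal sketch, assuming anchors are compared case-insensitively with runs of whitespace collapsed (Linkscape's own normalization rules aren't specified here):

```python
from collections import Counter


def anchor_distribution(anchors):
    """Return (anchor, count, share) tuples, most common first."""
    # Normalize case and whitespace so "SEO  Quiz" and "seo quiz" count together.
    normalized = [" ".join(a.lower().split()) for a in anchors]
    counts = Counter(normalized)
    total = sum(counts.values())
    return [(text, n, n / total) for text, n in counts.most_common()]


if __name__ == "__main__":
    # Hypothetical anchors, not real Linkscape data.
    sample = ["SEO Expert Quiz", "seo  expert quiz", "SEOmoz quiz", "click here"]
    for text, n, share in anchor_distribution(sample):
        print(f"{text!r}: {n} links ({share:.0%})")
```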

I could go on about Linkscape for ages, and I probably will in future blog posts, pointing out all the shady links we've uncovered, which types of badges are most likely to be adopted from viral campaigns, and why the search engines might be ranking particular sites and pages where they do, but for now, I suggest you explore the tool on your own. The only final note I'll add is that Linkscape is still in beta, and this means it's somewhat rough right now - the index size, the values of mozRank and mozTrust, the depth of the crawl, and many more items will all be receiving upgrades over the weeks and months to come.


The SEOmoz Redesign

As you've probably noticed, the SEOmoz website has a new look and feel. We might be ironing out kinks for a few days, but I'm very happy with the new layout. We've moved to a wider page width because our site stats indicate that an extremely low percentage of users browse at resolutions below 1024x768. For our YOUmoz contributors, this means images can now be up to 630px wide and still fit into blog posts without breaking the formatting.


The re-design also includes a new toolbar for PRO members that provides quick access to all the PRO features when you're logged in:

PRO Toolbar

Pages like our tools, guides (formerly articles), blog, YOUmoz and home page have all received their own overhauls, and we'd love to get your feedback.


Changes to PRO Membership

As I noted a few weeks ago, the price for PRO membership is rising. Starting today, PRO membership will cost $79/month or $799/year. This increase is primarily to help us support the Linkscape project, which (as you can imagine) has been, and will continue to be, an extremely expensive endeavor. PRO is still "risk-free" to try, and if you're unhappy with the service, you can cancel anytime in the first 30 days at no charge. For our 3,000+ legacy members, PRO membership will remain at the price it was when you signed up for as long as you remain a member.


We're also offering two new levels of membership - PRO Plus and PRO Elite, which feature greater access to Linkscape, the Q+A service and SEO Analytics (and planned access to new tools and features in the future). You can learn more about all the different levels on the Go PRO page.


As I mentioned in my previous post, folks who've signed up at the old rates are locked in - no need to worry, your subscription pricing won't rise. However, the current pricing is "introductory" and we are planning to raise the rate for PRO to $99/month, $999/year in December, when we launch... (see below)


The Future

There are so many exciting things we're planning to do, but perhaps none is more valuable to SEOs than this:


Screenshot of Linkscape Toolbar


We're still in the planning stages, but expect to have a beta version of a toolbar that plugs into Linkscape's API (and leverages many other SEO data sources and tools) available before 2009. There's much more to come, including a sister project to Linkscape (probably launching in Q1 of 2009) and lots more data in Linkscape itself, as well as refining the metrics and growing the index. We expect our first major update around Halloween (Oct. 31), and according to my sources, it should make the currently awesome data 10X awesomer (and yes, awesomer is a word - and a good one at that).


Special Thanks

A debt of gratitude is owed, first and foremost, to the incredible team at SEOmoz. There are 16 fantastic men and women putting up with me (up from only 7 a year ago!) and they have all invested not only a tremendous amount of effort, but a dedication and passion that shines through in the new site and tool. Deserving of specific thanks are Nick Gerner and Ben Hendrickson, ex-Microsofties and founders of their own startups who were excited enough by this project to set aside their pursuits and join our team. Together, they architected the remarkable web indexing and Linkscape projects and have produced something that is, in my opinion, revolutionary – truly disruptive technology for an arena sorely in demand.


Jeff Pollard, our CTO, also deserves a special shout out. Over the past year he has really evolved into a great leader and invaluable asset to this company. He has put in long hours, not just towards Linkscape, but towards rebuilding the site, managing the dev team, answering site support questions, fixing tools, and providing solid input on various projects.


Also, huge thanks to all of our beta testers. We had tons of people volunteer their time to either physically come in and test the tool at our office or sign up as remote beta testers. You all provided us with fantastic feedback to make Linkscape better, stronger, faster. We couldn't have released our groundbreaking new tool in its current state without your help!


I'd also like to extend a big thank you to the families, friends, husbands, wives, girlfriends and boyfriends of the team here at SEOmoz. I know you haven't seen much of them over the past few months, and I hope that you're as proud of their accomplishments as I am. I can't promise things will slow down immediately, but we'll try to be a little less demanding of their time in the months to come.


Lastly, a debt of gratitude to Mystery Guest (whom I married just 3 weeks ago). Her constant support (she even edited this post at 1am) and unequivocal forgiveness of my every late night, nearly complete absence from the wedding planning process and substantial unavailability have been inspiring. I'm a very lucky guy. 


BTW - For those hoping to give specific feedback about Linkscape, we've now got an official feedback thread on YOUmoz. You can also always email us - [email protected]. Scott will also be posting three videos on the blog later today that help to explain more about Linkscape (and to help make up for last week's lack of a Whiteboard Friday).
