August 24, 2015

User Behaviour Data as a Ranking Signal


Question: How does a search engine interpret user experience?
Answer: It collects and processes user behaviour data.

Types of user behaviour data used by search engines include click-through rate (CTR), navigational paths, time, duration, frequency, and type of access.

Click-through rate

Click-through rate analysis is one of the most prominent search quality feedback signals in both commercial and academic information retrieval papers. Both Google and Microsoft have made considerable efforts towards development of mechanisms which help them understand when a page receives higher or lower CTR than expected.

Position bias

CTR values are heavily influenced by position because users are more likely to click on top results. This is called “position bias,” and it’s what makes it difficult to accept that CTR can be a useful ranking signal. The good news is that search engines have numerous ways of dealing with the bias problem. In 2008, Microsoft found that the "cascade model" worked best in bias analysis. Despite slight degradation in confidence for lower-ranking results, it performed really well without any need for training data and it operated parameter-free. The significance of their model is in the fact that it offered a cheap and effective way to handle position bias, making CTR more practical to work with.
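To make the cascade idea concrete, here is a minimal sketch. It assumes a user scans results from the top, clicks the first one that looks good enough, and then stops. The function name and the attractiveness values are my own illustration, not taken from Microsoft's paper.

```python
def cascade_click_probabilities(attractiveness):
    """Return the probability of a click at each rank under a cascade model.

    attractiveness[i] is the assumed probability that a user who examines
    the result at rank i clicks on it (hypothetical input values).
    """
    probabilities = []
    reach = 1.0  # probability that the user gets as far as this rank
    for a in attractiveness:
        probabilities.append(reach * a)
        reach *= 1.0 - a  # the user moves on only if they did not click
    return probabilities


# Three results that are equally attractive once examined.
probs = cascade_click_probabilities([0.5, 0.5, 0.5])
print(probs)  # [0.5, 0.25, 0.125]
```

The three results are identical in quality, yet the expected clicks halve with each position. That gap is the position bias a search engine has to subtract before CTR says anything about the result itself.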

Result attractiveness

Good CTR is a relative term. A 30% CTR for a top result in Google wouldn't be a surprise, unless it’s a branded term; then it would be a terrible CTR. Likewise, the same value for a competitive term would be extraordinarily high if nested between “high-gravity” search features (e.g. an answer box, knowledge panel, or local pack).

I've spent five years closely observing CTR data in the context of its dependence on position, snippet quality and special search features. During this time I've come to appreciate the value of knowing when deviation from the norm occurs. In addition to ranking position, consider other elements which may impact the user’s choice to click on a result:

Snippet quality
Perceived relevance
Presence of special search result features
Brand recognition
Personalisation

Practical application

Search result attractiveness is not an abstract academic problem. When done right, CTR studies can provide a lot of value to a modern marketer. Here's a case study where I take advantage of CTR average deviations in my phrase research and page targeting process.
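A rough sketch of what "deviation from the norm" can look like in practice: average the observed CTR per ranking position, then compare each query against the average for its position. The row layout and the sample numbers are made up for illustration and do not come from any specific tool.

```python
from statistics import mean


def expected_ctr_by_position(rows):
    """Average observed CTR for each ranking position."""
    by_position = {}
    for row in rows:
        by_position.setdefault(row["position"], []).append(row["ctr"])
    return {position: mean(values) for position, values in by_position.items()}


def ctr_deviations(rows):
    """Observed CTR minus the average CTR for the same position, per query."""
    baseline = expected_ctr_by_position(rows)
    return {row["query"]: row["ctr"] - baseline[row["position"]] for row in rows}


# Hypothetical sample data: two queries at position 1, two at position 2.
rows = [
    {"query": "a", "position": 1, "ctr": 0.30},
    {"query": "b", "position": 1, "ctr": 0.50},
    {"query": "c", "position": 2, "ctr": 0.10},
    {"query": "d", "position": 2, "ctr": 0.20},
]
deviations = ctr_deviations(rows)
```

Queries with a negative deviation attract fewer clicks than their position would suggest, which makes them candidates for snippet work. A real baseline would also need to be segmented by the other factors listed above, such as branded terms and special search features.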

Google's title bolding study

Google is also aware of additional factors that contribute to result attractiveness bias, and they've been busy working on non-position click bias solutions.

They show strong interest in finding ways to improve the effectiveness of CTR-based ranking signals. In addition to solving position bias, Google's engineers have gone one step further by investigating SERP snippet title bolding as a result attractiveness bias factor. I find it interesting that Google recently removed bolding in titles for live search results, likely to eliminate the bias altogether. Their paper highlights the value in further research focused on the bias impact of specific SERP snippet features.

URL access, duration, frequency, and trajectory

Logged click data is not the only useful user behaviour signal. Session duration is a high-value metric if measured correctly. A user could, for example, navigate to a page and leave it idle while they go out for lunch, which would inflate the raw figure. This is where active user monitoring systems become useful.

There are many assisting user-behaviour signals which, while not indexable, aid measurement of engagement time on pages. This includes various types of interaction via keyboard, mouse, touchpad, tablet, pen, touch screen, and other interfaces.
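One simple way to separate active time from idle time is to look at the gaps between interaction events and cap any gap that runs longer than an idle timeout. This is a sketch under my own assumptions. The 30-second timeout is arbitrary, and I'm not suggesting this is how any search engine measures it.

```python
def active_seconds(event_times, idle_timeout=30.0):
    """Estimate active time on a page from interaction timestamps.

    event_times: timestamps in seconds of any interaction
        (key press, mouse move, scroll, touch).
    idle_timeout: hypothetical cap; any gap longer than this only counts
        up to the cap, and the rest is treated as idle time.
    """
    times = sorted(event_times)
    total = 0.0
    for previous, current in zip(times, times[1:]):
        total += min(current - previous, idle_timeout)
    return total


# A user interacts for 20 seconds, leaves for an hour, then comes back briefly.
events = [0, 10, 20, 3620, 3630]
print(active_seconds(events))  # 60.0, versus 3630 seconds of raw session time
```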

Google's John Mueller recently explained that user engagement is not a direct ranking signal, and I believe this. Kind of. John said that this type of data (time on page, filling out forms, clicking, etc) doesn't do anything automatically.

At this point in time, we're likely looking at a sandbox model rather than a live listening and reaction system when it comes to the direct influence of user behaviour on a specific page. That said, Google does acknowledge limitations of quality-rater and sandbox-based result evaluation. They’ve recently proposed an active learning system, which would evaluate results on the fly with a more representative sample of their user base.

"Another direction for future work is to incorporate active learning in order to gather a more representative sample of user preferences."

Google's result attractiveness paper was published in 2010. In early 2011, Google released the Panda algorithm. Later that year, Panda went into flux, indicating an implementation of one form of an active learning system. We can expect more of Google's systems to run on their own in the future.

The monitoring engine

Google has designed and patented a system in charge of collecting and processing of user behaviour data. They call it "the monitoring engine", but I don't like that name—it's too long. Maybe they should call it, oh, I don't know... Chrome?

The actual patent describing Google's monitoring engine is a truly dreadful read, so if you're in a rush, you can read my highlights instead.

MetricsService

Let's step away from patents for a minute and observe what's already out there. Chrome's MetricsService is a system in charge of the acquisition and transmission of user log data. Transmitted histograms contain very detailed records of user activities, including opened/closed tabs, fetched URLs, maximized windows, et cetera.

Enter this in Chrome: chrome://histograms/
(Click here for technical details)

Here are a few external links with detailed information about Chrome's MetricsService, reasons and types of data collection, and a full list of histograms.

Use in rankings

Google can process duration data in an eigenvector-like fashion using nodes (URLs), edges (links), and labels (user behaviour data). Page engagement signals, such as session duration value, are used to calculate weights of nodes. Here are the two modes of a simplified graph comprised of three nodes (A, B, C) with time labels attached to each:

In an undirected graph model (undirected edges), the weight of the node A is directly driven by the label value (120 second active session). In a directed graph (directed edges), node A links to node B and C. By doing so, it receives a time-label credit from the nodes it links to.

In plain English, if you link to pages that people spend a lot of time on, Google will add a portion of that “time credit” towards the linking page. This is why linking out to useful, engaging content is a good idea. A “client behavior score” reflects the relative frequency and type of interactions by the user.
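Here is a toy version of the two modes. Node A uses the 120-second label from the example above. The labels for B and C and the 25% credit share are numbers I made up to show the mechanics, and the patent does not specify them.

```python
def node_weights(time_labels, links, credit_share=0.25):
    """Weight each node by its own time label plus link-based time credit.

    time_labels: {node: active session seconds}
    links: {node: [nodes it links to]}
    credit_share: hypothetical fraction of a linked page's time label
        that is credited back to the linking page.
    """
    weights = {}
    for node, own_time in time_labels.items():
        credit = sum(time_labels[target] for target in links.get(node, []))
        weights[node] = own_time + credit_share * credit
    return weights


time_labels = {"A": 120, "B": 60, "C": 20}

# No direction on the edges: the weight is driven by the label alone.
label_only = node_weights(time_labels, links={})

# Directed edges: A links to B and C and picks up a share of their time.
with_link_credit = node_weights(time_labels, links={"A": ["B", "C"]})

print(label_only["A"], with_link_credit["A"])  # 120.0 140.0
```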

What's interesting is that the implicit quality signals of deeper pages also flow up to higher-level pages.

Reasonable surfer model

“Reasonable surfer” is the random surfer's successor. The PageRank damping factor reflects the original assumption that after each followed link, our imaginary surfer is less likely to click on another random link, resulting in an eventual abandonment of the surfing path. Most search engines today work with a more refined model encompassing a wider variety of influencing factors.

For example, the likelihood of a link being clicked on within a page may depend on:

Position of the link on the page (top, bottom, above/below fold)
Location of the link on the page (menu, sidebar, footer, content area, list)
Size of anchor text
Font size, style, and colour
Topical cluster match
URL characteristics (external/internal, hyphenation, TLD, length, redirect, host)
Image link, size, and aspect ratio
Number of links on page
Words around the link, in title, or headings
Commerciality of anchor text

In addition to perceived importance from on-page signals, a search engine may judge link popularity by observing common user choices. A link that users click on more often within a page can carry more weight than one with fewer clicks. Google in particular mentions user click behaviour monitoring in the context of balancing out traditional, more manipulative signals (e.g. links).

In the following illustration, we can see two outbound links on the same document (A) pointing to two other documents: (B) and (C). On the left is what would happen in the traditional “random surfer” model, while on the right we have a link which sits in a more prominent location and tends to be the preferred choice of many of the page's visitors.

This method can be used on a single document or in a wider scope, and is also applicable to both single users (personalisation) and groups (classes) of users determined by language, browsing history, or interests.
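The difference between the two models is easy to see in code. Below is a small power-iteration PageRank in which every outbound link carries a weight. Equal weights reproduce the random surfer, while unequal weights stand in for a reasonable surfer. The 3:1 weighting is a made-up example, and how a search engine would actually derive such weights from the features above is not something I can show.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """PageRank by power iteration with weighted outbound links.

    out_links: {page: {target: weight}}. A page's rank is split between
    its targets in proportion to the link weights.
    """
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    count = len(pages)
    rank = {page: 1.0 / count for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / count for page in pages}
        for page in pages:
            targets = out_links.get(page, {})
            total = sum(targets.values())
            if total == 0:
                # Page with no outbound links: spread its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / count
                continue
            for target, weight in targets.items():
                new_rank[target] += damping * rank[page] * weight / total
        rank = new_rank
    return rank


# Random surfer: both links on A are equally likely to be followed.
random_surfer = pagerank({"A": {"B": 1, "C": 1}})

# Reasonable surfer: the link to B is assumed three times as likely to be clicked.
reasonable_surfer = pagerank({"A": {"B": 3, "C": 1}})
```

In the first run B and C end up with the same score. In the second, B pulls ahead of C even though the link graph is identical.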

Pogo-sticking

One of the most telling signals for a search engine is when users perform a query and quickly bounce back to search results after visiting a page that didn't satisfy their needs. This behaviour was described and discussed a long time ago, and numerous experiments show it in action. That said, many question the validity of SEO experiments, largely due to their rather non-scientific execution and general data noise. So, it's nice to know that the effect has been on Google's radar.
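If you want to look for this pattern in your own logs, a crude measure is the share of search visits that return to the results page within a few seconds. The 10-second threshold and the tuple layout below are my assumptions, not a known Google definition.

```python
def pogo_stick_rate(visits, threshold_seconds=10.0):
    """Share of search visits that bounced back to the results quickly.

    visits: list of (dwell_seconds, returned_to_results) tuples.
    threshold_seconds: hypothetical cutoff for a "quick" return.
    """
    if not visits:
        return 0.0
    quick_returns = sum(
        1 for dwell, returned in visits if returned and dwell < threshold_seconds
    )
    return quick_returns / len(visits)


visits = [(4.0, True), (95.0, False), (7.5, True), (40.0, True)]
rate = pogo_stick_rate(visits)
print(rate)  # 0.5
```

The fourth visit also went back to the results, but only after 40 seconds, so it is not counted as pogo-sticking here.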

Address bar

URL data can include whether a user types a URL into an address field of a web browser, or whether a user accesses a URL by clicking on a hyperlink to another web page or a hyperlink in an email message. So, for example, if users type in the exact URL and hit enter to reach a page, that represents a stronger signal than when visiting the same page after a browser autofill/suggest or clicking on a link.

Typing in full URL (full significance)
Typing in partial URL with auto-fill completion (medium significance)
Following a hyperlink (low significance)
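The three tiers above could be modelled as weights per access type. The numeric values here are placeholders I picked to preserve the ordering (full, medium, low). The patent does not publish actual weights.

```python
from enum import Enum


class AccessType(Enum):
    TYPED_FULL_URL = "typed_full_url"
    TYPED_WITH_AUTOFILL = "typed_with_autofill"
    FOLLOWED_LINK = "followed_link"


# Hypothetical weights; only their relative order reflects the list above.
ACCESS_WEIGHTS = {
    AccessType.TYPED_FULL_URL: 1.0,
    AccessType.TYPED_WITH_AUTOFILL: 0.5,
    AccessType.FOLLOWED_LINK: 0.1,
}


def access_score(visits):
    """Sum the access-type weights over a list of visits to one URL."""
    return sum(ACCESS_WEIGHTS[visit] for visit in visits)


typed = access_score([AccessType.TYPED_FULL_URL, AccessType.TYPED_FULL_URL])
linked = access_score([AccessType.FOLLOWED_LINK, AccessType.FOLLOWED_LINK])
```

With the same number of visits, the page that people type in by hand scores higher than the one they only reach through links.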

Login pages

Google monitors users and maps their journey as they browse the web. They know when users log into something (e.g. social network) and they know when they end the session by logging out. If a common journey path always starts with a login page, Google will add more significance to the login page in their rankings.

"A login page can start a user on a trajectory, or sequence, of associated pages and may be more significant to the user than the associated pages and, therefore, merit a higher ranking score."

I find this very interesting. In fact, as I write this, we're setting up a login experiment to see if repeated client access and page engagement impact the search visibility of the page in any way. Readers of this article can access the login test page with username: moz and password: moz123.

The idea behind my experiment is to have all the signals mentioned in this article ticked off:

URL familiarity, direct entry for maximum credit
Triggering frequent and repeated access by our clients
Expected session length of 30-120 seconds
Session length credit up-flow to home page
Interactive elements add to engagement (export, chart interaction, filters)

Combining implicit and traditional ranking signals

Google treats various user-generated data with different degrees of importance. Combining implicit signals such as day of the week, active session duration, visit frequency, or type of article with traditional ranking methods improves reliability of search results.

Impact on SEO

The fact that behaviour signals are on Google's radar stresses the rising importance of user experience optimisation. Our job is to incentivise users to click, engage, convert, and keep coming back. This complex task requires a multidisciplinary mix, including technical, strategic, and creative skills. We're being evaluated by both users and search engines, and everything users do on our pages counts. The evaluation starts at the SERP level and follows users during the whole journey throughout your site.

"Good user experience"

Search visibility will never depend on subjective user experience, but on search engines' interpretation of it. Our most recent research into how people read online shows that users don't react well when facing large quantities of text (this article included) and will often skim content and leave if they can't find answers quickly enough. This type of behaviour may send the wrong signals about your page.

My solution was to present all users with a skeletal content form with supplementary content available on-demand through use of hypotext. As a result, our test page (~5000 words) increased the average time per user from 6 to 12 minutes and reduced the bounce rate from 90% to 60%. The very article where we published our findings shows double or triple the clicks, hovers, and scroll depth of the rest of our content. To me, this was convincing enough.

Google's algorithms disagreed, however, devaluing the content not visible on the page by default. Queries contained within unexpanded parts of the page aren't bolded in SERP snippets and currently don't rank as well as pages which copied that same content but made it visible. This is ultimately something Google has to work on, but in the meantime we have to be mindful of this perception gap and make calculated decisions in cases where good user experience doesn't match Google's best practices.

Relevant papers

Ranking documents based on user behavior and/or feature data, Google, 2010
Active Exploration for Learning Rankings from Clickthrough Data, Cornell, 2007
Beyond Position Bias, Google, 2010
An Experimental Comparison of Click Position-Bias Models, Microsoft, 2008
Improving Searcher Models Using Mouse Cursor Activity, Microsoft, 2012
Inferring Search Behaviors Using Partially Observable Markov (POM) Model, Microsoft, 2010
A Dynamic Bayesian Network Click Model for Web Search Ranking, Yahoo!, 2009
Modifying search result ranking based on implicit user feedback and a model of presentation bias, Google, 2015

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.
