A Better Approach for Filtering Webspam in Google Analytics

This YouMoz entry was submitted by one of our community members. The author’s views are entirely their own (excluding an unlikely case of hypnosis) and may not reflect the views of Moz.

By: Werner Bastianen

January 5, 2016

"Don't throw the baby out with the bathwater" is a popular saying that's been around since the 16th century, but is no less relevant today, especially when considered against the backdrop of webspam in Google Analytics. In fact, frustrations caused by the spam issue have led to the loss of genuine data.

Spam in analytics could be the single most irritating thing in online marketing. Numerous blog posts have been written on the topic.

One particular solution consistently surfaces as the fastest way to get rid of spam: Set up one or two filters in your analytics, and you're free of spam forever. This strategy is based on including only valid hostnames to filter out ghost spammers, the most aggressive type of spammers.

Even though this solution seems valid, it is also the riskiest one: you are likely to lose valuable data and insights in the process.

Why is this seemingly valid option risky?

The two-filter option is risky because it relies on inclusion instead of exclusion, and because it treats an unset hostname as spam.

Inclusion versus exclusion

Inclusion: allows data only from known genuine sources
Exclusion: filters out data only from known spam sources

What's a hostname?

The hostname tells you which domain your website was visited from.

This can be any (sub)domain you claimed, like www.mydomain.com, mydomain.com, blog.mydomain.com or mydomain.co.uk. However, the hostname could also be the domain of translation, cache, or shopping services like translate.googleusercontent.com or paypal.com.
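To make the hostname idea concrete, here is a minimal sketch of what an inclusion check on hostnames amounts to. It assumes the include filter is written as a regular expression over the hostname field. The pattern and the function name `passes_inclusion_filter` are illustrative only, built from the example domains above, and are not taken from any real account.

```python
import re

# Hypothetical pattern covering the example hostnames from the text.
# A real property would list its own domains and trusted services here.
VALID_HOSTNAME_PATTERN = re.compile(
    r"(?:(?:www|blog)\.)?mydomain\.(?:com|co\.uk)"
    r"|translate\.googleusercontent\.com"
    r"|paypal\.com"
)


def passes_inclusion_filter(hostname: str) -> bool:
    """Return True only if the whole hostname matches the valid pattern."""
    return VALID_HOSTNAME_PATTERN.fullmatch(hostname) is not None


for name in ["www.mydomain.com", "shop.mydomain.com", "(not set)"]:
    print(name, passes_inclusion_filter(name))
```

Note that a hostname nobody thought to add, such as the made-up `shop.mydomain.com`, is rejected just like spam would be.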

The inclusion strategy works perfectly in a vacuum. In real life, however, we have seen too many cases where it could have gone terribly wrong:

Over a span of months or years, you work with multiple people and agencies. They don't always know what was previously set up.
The internet and your business will evolve and more genuine sources will appear. Who will make sure they are always included from day one?
Plus, a minor technical error in your code may cause your hostname to be "not set." This would make your genuine data appear as spam. It wouldn't pass the inclusion filter, and you'd never even know it.
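The scenarios above can be sketched with a toy comparison of the two strategies. Everything here is an assumption made for illustration: the `Hit` record, the sample hostnames and sources, and the `is_genuine` flag, which in real analytics data is exactly what you do not know.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    hostname: str
    source: str
    is_genuine: bool  # ground truth, available only in this toy example


# Hypothetical configuration for each strategy.
VALID_HOSTNAMES = {"www.mydomain.com", "mydomain.com"}
KNOWN_SPAM_SOURCES = {"spam-referrer.example"}

hits = [
    Hit("www.mydomain.com", "google", True),
    Hit("(not set)", "newsletter", True),              # hostname lost by a code error
    Hit("shop.mydomain.com", "google", True),          # new subdomain nobody added
    Hit("(not set)", "spam-referrer.example", False),  # spam already on the list
    Hit("(not set)", "new-spammer.example", False),    # spam not yet on the list
]


def inclusion_filter(all_hits):
    """Keep only hits whose hostname is on the valid list."""
    return [h for h in all_hits if h.hostname in VALID_HOSTNAMES]


def exclusion_filter(all_hits):
    """Drop only hits whose source is on the known-spam list."""
    return [h for h in all_hits if h.source not in KNOWN_SPAM_SOURCES]


def summarize(kept, all_hits):
    """Return (genuine hits lost, spam hits that got through)."""
    lost = sum(h.is_genuine for h in all_hits) - sum(h.is_genuine for h in kept)
    leaked = sum(not h.is_genuine for h in kept)
    return lost, leaked


inclusion_result = summarize(inclusion_filter(hits), hits)
exclusion_result = summarize(exclusion_filter(hits), hits)
print("inclusion (lost genuine, leaked spam):", inclusion_result)
print("exclusion (lost genuine, leaked spam):", exclusion_result)
```

On this sample, inclusion silently drops two genuine hits and lets no spam through, while exclusion keeps every genuine hit and lets one unlisted spammer through. That is the trade-off the rest of this post argues about.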

Real-life data needs a real-life solution

With the inclusion strategy, any of the real-life scenarios above causes you to lose genuine data.

In fact, one of our clients would have deleted all of the brand's conversion data if they'd used the two-filter solution, solely because of a third-party plug-in that was implemented by another agency.

The plug-in created a new session without the hostname data instead of the real session.

What's the best alternative?

Only filter spam when you're 100 percent sure it's spam. Working with exclusion has its downsides, of course:

You have to make sure your exclusion filters are always up-to-date with the latest spammers.
You will allow some spam to enter—for instance, visits with an unset hostname that actually are spammers.

Based on the data in our clients' accounts, these spammers account for 0.4 percent, on average, of all traffic.

This means your analytics would, on average, retain 99.6 percent accuracy without the risk of losing genuine traffic.

Back to you

So, what's your take on dealing with spam in analytics?

If you're like us, you'd rather filter real spam while lessening the likelihood that real data is excluded.