Combining SEO Data to Make Smarter Marketing Decisions — Whiteboard Friday
The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.
Click on the whiteboard image above to open a high-resolution version!
Hello, Happy Friday. Today's Whiteboard Friday is all about combining all of your SEO data. So we're talking about all of the different data sources that you have access to: your logs, your crawl data, Search Console, analytics, third-party data like Moz, AdWords, Rank, all of that good stuff.
Why is this a valuable thing to do? Why do people talk about this as if this is something super important for your website?
We're going to go through that, we're going to go through the why, and we're going to talk through the how.
A quick bit of background on myself: I'm Dominic. I run Piped Out. We're a service that helps SEO and marketing teams handle their data problems, and a lot of what we do involves joining these things together. So that's my sort of background for talking to you today, and hopefully, what I can do is inspire you to do this for your own website.
We're going to discuss six hypothetical problems, showing how each is easier when we combine our data sources and how each gives us something.
Why join your data together?
And the two big things that we get, the two single biggest reasons to join your data together, number one is priority. It's going to change the order that you do things. Whatever you might be doing, you have a limited amount of time. That's often the biggest constraint in resourcing some of these things. And joining your data sources together is going to change the priority, so change the order that you take these decisions.
Secondly, combined metrics. There are just some things that cannot be done unless you join two things together.
Priority
So let's do priority first because this is the big one.
Canonicals
Firstly, we're going to take things off in the canonicals. Suppose that we have some canonical problems across our website. Let's say we run a big e-commerce site and we have some canonical problems on our category pages, on our product pages.
How do we decide the priority in which they should be solved? We can raise them as individual tickets. We certainly shouldn't raise them all in one go.
How do we segment them? The easiest answer for many technical problems is traffic, which is to join it with your traffic data. By joining in with, for example, either the organic sessions or the conversion data that you're going to get from analytics, you can immediately weight these. You can't necessarily weigh how likely Google is to fix your canonical, but because you don't know that, weighting by the traffic that those pages and templates receive is going to be a pretty good way to get some prioritization data into the tickets that you're making.
So that's the first one. We join our technical things, and we join it with some source of traffic in order to get some prioritization.
Picking your content
The second one is probably one that people are most common with, which is when you're sitting there, and you're writing content, you need to pick which content piece you're going to go do next, and often the way we do that is we join with third-party metrics. We go and join, and we get things like Search Volume, and we get things like Difficulty that you'd find in Moz. That sort of stuff you're going to pull into here, and it's going to help you prioritize.
But you can also pull in other parts when you're sitting there and going through. You can quite easily pull in rank where, for example, you could see what the featured snippets look like and what the search features that appear on those particular keywords look like, and that might cause you to change your priority for what you're going for.
You could pull in AdWords, and that's going to, again, give you another different sense of, essentially, if your business is spending a huge amount of money on a keyword, you might be more likely to write some content for it.
Try Moz's brand new API endpoints
Combine SEO data to gain better insights into your website's performance and be one of the first to gain an advantage with Moz's new API endpoints.
Technical problems
Hopping on to a third example, suppose you've got another technical problem. This time, we've got a dropdown on our website, a language dropdown, and we're fairly certain from crawling the website that it's creating an infinite loop screen with an infinite number of pages.
But how important is this? Obviously, an infinite number of pages sounds very bad, and we'd certainly slap it a big old A in a tech audit.
But if we join with another piece of data, for example, logs, we can see what Google is actually doing. And at that point, we might discover that yes, this is an immediate problem. Google has already found and crawled and is currently crawling a lot of these pages.
Or we might just discover that no, actually, it's really not visited any of them at all, and this is a problem, but it's maybe not a problem we have to solve right now. We've got a bit of a window. It doesn't need to go into this sprint or the next immediate one.
As you can see, all of these things change in order as we add in these other data sources.
Removing pages
Let's speed through a couple of other ones here. So, suppose we're removing pages from our website. We're trying to decide which pages should be removed.
Again, we join back with analytics, but we don't just pull in organic sessions, for example, because if you're trying to decide which pages to remove, that's sort of the loaded bit when it comes to stripping down and thinning out a website. Remove could mean no index. It could mean 404. It could mean redirect to another page.
Whether or not and how you should behave to those things partly matters to organic traffic, but also all traffic matters. I've absolutely seen people go, "Yes, this page should be 404'ed because it gets no organic sessions." It doesn't get any organic sessions, but the email team keeps sending traffic to it, so they absolutely shouldn't 404 it. It should be no index or the equivalent.
By joining in multiple different metrics that you're going to get from one source, it's going to, again, give you some insight into the prioritization of it and how quickly you should go about doing it.
Ranking fluctuations
Take on one in the middle. The big thing that you get with Google Search Console (Search Console) is you get all of your keywords. The big thing that you get with rank is you get a far richer SERP model than you would get just from Search Console, which gives us a very reduced SERP model.
However, you have to pay for rank data, and you don't have to pay for Search Console data. This typically means most people track a subset of their ranks, and they track they have all of their keyword data or as much as Search Console will give you over in Search Console.
Where this falls down is when you're trying to look at things, perhaps you're looking into your Search Console, and you're going, "Okay, great, I know that we lost traffic, but I don't quite know why, and I'm struggling to unpick how rank is being put through in Search Console."
If we've joined in with our rank tracking data here, we're immediately going to get all that information about the SERP feature, for example, that are popping up. And because all of those are just naturally rolled up into rank and Search Console, it's super illuminating to have joined your rank tracking data with your Search Console data.
And you're not going to have tracked all of them, but you're going to have tracked a representative sample. And you can say, "Okay, perhaps all of this entire section of the product has gone down, but we can see at least on 100 of those examples." That is because Google has gone and changed the product snippet layout as it did relatively recently in the US, and that has caused this other one to happen.
Paid vs. organic
On to our final example here, joining together your AdWords with Search Console can also be super valuable, and joining your AdWords and your Search Console and your rank, these are all things that play in the keyword world, but they give you different bits of data.
So again, your rank gives us that rich SERP data, and we can get far better things and things we can use to model click-through rate better. We could get rank, but we could also get pixel height.
And then we've got AdWords, which tells us how much our business is spending. At a really basic level, we can just go, "Okay, great, we don't rank very well for this keyword, but we are spending a ton of money on it. So let's go refocus our effort. Let's have a little sit-down with the team and try to work out how we can bump those keywords up." Again, it's changing our prioritization.
But we can even do more beyond that. Once we have all that data, we can start to change this in other ways. We can say, "Okay, great, we've got AdWords data and we've got our Search Console data. We can find the difference where we're performing very well organically." But maybe we're still spending a ton of AdWords money and go, "Okay, can we lower the number of ads that we're putting there and instead spend that ad money just somewhere else on other keywords where we don't organically rank as well because we'll probably pick up more of that than we will relative to other keywords?"
So that's priority.
Combined metrics
The final one is combined metrics. That's the other big thing. Sometimes, you literally just can't get the answer without joining two things together.
UA and GA4
The one that many people will have bumped into here are UA and GA4. So obviously, Universal Analytics sunset last year. And if you want to have year-on-year traffic, if you want to have the year-on-year numbers for forecasting, you basically have to join the two of them together unless you were really, really on top of it and set up GA4 early, but in my experience that has not been the case. That's a case where you cannot get that number unless you don't go and join those two together.
Indexation
Another example here is indexation. There is no exact way; Google doesn't give you a way to go get all of the indexation for your website. It gives you small ways that you can go and get chunks of this data.
You can use the API that they give you, but you get 2,000 pages a day. You can crawl your website, but it doesn't technically tell if Google's indexed. It just tells it if you are saying to Google, I should index this.
You could pay to rank track every single keyword for every single page to check if your pages are indexed, but that's going to be quite expensive.
So we often kind of go, "Okay, how can we put together this sort of metric that's very important, indexation, but how can we put it together?" We can get the best version from joining all of these data sources together.
By pulling in our logs, we can say, "Okay, if a page has been crawled one time, it's probably not indexed," I would say from our experience. But when you're getting up to sort of three times, it's a pretty decent shout at that point that this page is indexed unless, of course, we've got a no-index tag on it. So, by joining the two of these together, we've made our indexation metric a little bit more accurate.
For those ones that are really important, we can run the Search Console Indexation API, and we can override these ones, and we can end up with this combined number of indexation that is far more accurate than any of these could do individually unless we pay to check every single page every single time we want to know this, which we could also do. We could just throw money at the problem.
So those are the big whys. That's the big thing that you get from joining these. The biggest one you get is priority.
It will change the order you do everything. It will also mean that more stuff gets done in the business because businesses, at the end of the day, care about numbers, and being able to put numbers and values on things just makes people take you more seriously.
The second one is combined metrics. Sometimes, you just cannot do something with one number unless you spend a ton of money, and even then, it's just very hard.
How do you do this?
Let us then step over a little bit and look at the how. How do you do this? So this is going to be very top level, so full apologies for this in advance. But the specifics of this depend quite a lot on which of these data sources you're doing.
Generically, the process that we're running is we're extracting data and then we're joining data. That's the two big things. We're taking data out of somewhere, so we've got multiple bits of it together, and then we're going to smush those two things together.
Joining is something like this. So we might say, "Okay, in our Search Console, we have a value of GBR, and in our analytics, we have a value of United Kingdom." These are the same thing. So that's where we're going to join these two pieces of data. Or desktop equals desktop; sometimes, it'll just be the same value.
That's the process of joining. Again, very top-level. So yes, we're going to extract and we're going to join.
How do things rank up in terms of options here?
UI
So UI for most tools, sometimes the UI is okay, but for a lot of these, particularly sort of SEO-focused things, things like Search Console, getting stuff out of analytics, things like that, the UI is not great.
It will give you some pretty hard limitations on how much you can get out of it. On average, I'd say it ranks about a D. Some things are going to be easier. Obviously, getting your Screaming Frog data out is not a D, but again, a lot of these are really, really quite painful. In terms of blending, you literally can't blend here, so it's going to get an F. In terms of setup, it's an A because those tools just exist and are ready, and you can just go and use them.
The only real blocker here is learning how to use it.
Looker Studio
Looker Studio is the next big place that I see a lot of people go. This gets up to a B because often these places which limit how much data you can get, things like your Search Console, your analytics, your AdWords; sometimes, the Looker Studio Connector will give you all of the data when you click that triple dot and export in the top right-hand corner of your table. You will just get all of it for the date period, bearing in mind sampling cardinality and a couple of things like that.
Either way, you get vastly more than you get from the UI. So we're bumping ourselves up to a B. It doesn't work for everything. It's not going to work for your logs. But for a lot of these sources, this is at least going to get you something. It also has blend functionality into it. The blend functionality is not great, and it's a little bit complicated sometimes.
It's also quite slow. But if you have no other ways to blend, this is a way to blend. The setup, honestly, is not too painful in most of these cases. There's a little bit of knowledge needed to get set up with Looker Studio, and there's a little bit of things that can catch you out, but by and large, you'll probably be okay.
Excel
What this brings us, naturally, is to the queen of how you should be doing all these things, which is you're just downloading them, putting them through Excel and VLOOKUPs.
That is, from what I've said, certainly, for a good 70% of my career while I was agency side, this is how people did everything. There are absolutely other ways of doing it, but this is the gold standard.
When all of these fail and you can't figure out this and you're not sure how to blend it, download it, put it in Excel, and join it with the VLOOKUP.
The one final thing that we want to just run through here is the ideal way of doing this.
A lot of these data sources are quite big, and trying to join them, again, can be quite a fiddly process. There are a lot of things to watch out for, for example, stuff like this where it's like, okay, yes, Search Console will give you a country exported in a three-letter country code. By default, if you have just set up your report with country, as many people do on analytics, they will have the name of the country.
Obviously, these are not going to perfectly line up, and your join won't work.
Big Query
The ideal place to be doing a lot of this stuff and to deal with the size and the scope of this data in many of these problems is a data warehouse. So this is something like BigQuery, Redshift, Snowflake, all of that sort of stuff.
This is fantastic for extracting and putting data into. Once you've got data into it, getting out of it is very, very easy.
It is also brilliant for blending and joining on things. There is a bit of an initial hurdle to get over, you have to learn SQL.
On the plus side, things like ChatGPT have made that far more accessible than it ever used to be. And in terms of functionality, it's fantastic.
However, and if we just temporarily lean out of the camera shot, and then come back, it gets a D or C for the setup because this thing is not easy to set up. And we're also kind of rolling in there the difficulties around learning some of the SQL.
You get huge benefits from getting down to this tier, and eventually, you're going to want to end up here because it's probably where your internal business data is as well.
But it is absolutely harder, make no mistake. The reason I give it a D/C is many of the Google sources, your Search Console, your analytics, and your AdWords will have native exporters to BigQuery that are very easy to set up, which really reduces a lot of this setup time.
Wrapping up
So yes, that's our top-level run-through.
Hopefully, you have been convinced by the why, and you've gone, "There are some things that I'd like to get out of this process."
You've looked at some stuff that you've done, and you've gone, "Oh, yes, I prioritized stuff badly in the past. I can do a hell of a better job about it now."
There also might be some biometrics that you want or need to have. When it comes to the how, I just said, generically, extract your data.
Looker Studio is often a good place to go and get that raw extract from. You're blending; you're never going to go wrong with Excel and your VLOOKUPs. But you can do some of it in Looker Studio.
And when you get really serious about this, you want to run down towards a data warehouse. It is harder to set up, but the benefits are far, far more substantial, and you can get some easier setup for your Google sources, your analytics, AdWords, and Search Console because they will natively export to BigQuery.
So, hopefully, that's been helpful and has at least inspired you to some uses.
If you have any questions, throw them over on Twitter.
And yes, we'll see you all next time.
Transcription by Speechpad