Correlation Data for SEO and Social Media Analysis - Part 1
One of the most helpful aids in doing SEO is knowing what factors actually affect your rankings. It seems obvious on its face, but not everyone prioritizes their SEO work with the knowledge of how changes to a site and link profile actually affect the SERPs. It's important to at least have some heuristics to use in pursuit of higher rankings, and while it's not always easy, it is possible to correlate optimization techniques with positive (or negative) movement in SERPs.
SEOmoz tries to establish these correlations in our bi-annual Search Engine Ranking Factors project by running tests and consulting with professional SEOs; for instance, in 2009, we discovered that keyword-focused anchor text from external links was highly correlated with positive rankings (we're currently working on a new iteration of the report for 2011, so keep your eyes peeled!). As you probably know, and as Rand spends a little time explaining in this week's Whiteboard Friday, correlation is not causation. That being said, correlations are still important and useful information! In today's post, Rand begins a two-part series on how to use correlation data in your SEO and social media research.
Video Transcription
Howdy, SEOmoz fans. Welcome to another edition of Whiteboard Friday. Today we've got a great topic, really exciting topic. We're going to be talking about correlation data and how you can use it in SEO and social media analysis.
A lot of you already use correlation data, and many of you have probably seen on our blog and in our early release of data from our Ranking Factors Survey this year that we're going to be presenting a bunch of correlation data. We've done this a number of times over the past few years. It's always very, very interesting to look at. Sometimes it's potentially controversial because some of the data is really interesting and surprising.
Correlation data is something that I know can seem intimidating to many folks in the SEO field and the social media field who, like me, don't have a substantive background in statistics. I mean, let's face it. I think I got a D in my statistics class in college, and I'm pretty sure I skipped the last three classes and didn't even go to the final because I was sure I was going to fail. Then somehow I skated by. But that's beside the point. I dropped out of college anyway, so I didn't need that D.
Correlation, it sounds like a big fancy word, but it's actually really, really simple. It's essentially the degree to which one metric, a predictor, has a correlation, a connection to another. Let me give you a couple of really simple examples just so you can understand this, and then we'll talk about some ways to use it in SEO and social.
Let's imagine, for a second, that you are a contractor. You're doing some content writing and you bill the companies that you write for on an hourly basis. You do ten hours of labor, at $10 an hour, let's say, and you get $100. The hours billed and the dollars received have a very, very good correlation. In fact, they'd have a correlation of 1.0, hopefully. Hopefully, you don't bill some hours and then people don't pay, or they pay you for more hours than you billed. Maybe those things will happen, but usually it's a 1 to 1 correlation. The correlation is 1.0. Remember, all correlation numbers, at least statistically speaking, from a math perspective, fall between -1 and 1, and positive correlations are between 0 and 1. We do have negative correlations as well, which run from 0 down to -1. We're not going to worry about those for a sec.
In the dollars received, hours billed, that's a perfect correlation. You see I billed for 1 hour, I got $10. I bill for 2 hours, I got $20. I bill for 3 hours, so perfect linear, nice 1.0 correlation. You can imagine there are lots of systems, simple systems that function like this. For example, the number of steps that you take and the distance that you travel. Those have sort of a perfect, nice correlation.
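For anyone who wants to check the hours-and-dollars example numerically, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the underpaid-invoice figures are assumptions made up for illustration; they are not from the video.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

hours_billed = [1, 2, 3, 4, 5]

# Every hour is paid at exactly $10, so the relationship is perfectly linear.
dollars_received = [10 * h for h in hours_billed]
r_perfect = pearson_r(hours_billed, dollars_received)

# Hypothetical variation: the client underpays the last invoice.
dollars_with_underpayment = [10, 20, 30, 40, 20]
r_imperfect = pearson_r(hours_billed, dollars_with_underpayment)

print(round(r_perfect, 3), round(r_imperfect, 3))
```

The first pair gives a correlation of 1.0. The second pair is still positively correlated, but the single underpaid invoice pulls the number well below 1.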
Then there's stuff that has a correlation but the correlation might not be as perfectly predictive, and we want to have numbers around what those correlations are. Here's a pretty simple example. This is the number of days that I wear yellow shoes. You can see I'm not wearing them today. But number of days that I wear yellow shoes and the number of days where I give a professional presentation. Oftentimes, these are quite connected. But it turns out there are also days where I wear the yellow shoes and I don't give a presentation, or where I give a presentation but I don't wear the yellow shoes. Those things do happen. It's not a requirement that every time I get up on stage I have to wear yellow shoes, but it happens a lot.
So we can map those. We can say, oh, well, there were five days where Rand wore yellow shoes and all five of those days he gave a presentation. Then a couple of days later, oh, you know what? Rand wore the yellow shoes just around town. He was breaking in a new pair. So there are a couple of more days where he wore them, but only one more day where he gave a presentation. So we get a little chart like this.
What correlation scores can do is they can help give a number like 0.7 to the connection between these two numbers. You can sort of say, huh, well, there's a good correlation between them, but it's not certain that every time Rand wears yellow shoes, he's giving a presentation, or every time he gives a presentation, he wears yellow shoes.
That's exactly what these numbers are designed to describe. Now, in really simple scenarios like this, a correlation score of 0.7, that's relatively high. But we'd actually need quite a few data points to get a low value for something called "standard error." So, standard error tells us how much uncertainty there is in the correlation number we've measured.
If we have a standard error of let's say .25, that might be a pretty high standard error because we only have a few data points. That means that there's potentially a lot of fluctuation. This could be a much lower correlation than we think it is or a much higher one, depending. But if we got thousands of data points, if we had every data point around when I wore yellow shoes, every data point around when I've given a professional presentation, this standard error might drop dramatically to let's say .05. Now, we can be more certain that, oh, yeah, there's clearly a connection there, and with a little bit of fluctuation, we know pretty much what the correlation number is. So we can predict how often when Rand gives a presentation he's going to be wearing yellow shoes based on an average of previous data. That's what this is designed to tell us. That's exactly what correlation can be used for.
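To make the link between the number of data points and standard error concrete, here is a rough sketch. It assumes one common textbook approximation for the standard error of a Pearson correlation, sqrt((1 - r^2) / (n - 2)); the video does not say which formula is meant, and the sample sizes below are hypothetical.

```python
from math import sqrt

def correlation_standard_error(r, n):
    """Approximate standard error of a Pearson correlation r measured on n data points.

    Uses the approximation sqrt((1 - r^2) / (n - 2)), which is an assumption here.
    """
    if n <= 2:
        raise ValueError("need more than two data points")
    return sqrt((1 - r ** 2) / (n - 2))

r = 0.7  # the yellow-shoes correlation score used as an example above
for n in (10, 50, 200, 1000):
    print(n, round(correlation_standard_error(r, n), 3))
```

Under this approximation, a correlation of 0.7 measured on 10 data points has a standard error of roughly .25, and the same correlation measured on 200 data points has a standard error of roughly .05, which is why collecting more data matters.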
Let's talk about some ways to use correlation data in your SEO and your social media campaigns. First off, in a lot of cases, you don't actually need a huge data set. Let's talk first about ways that you're probably already using correlation data, which is with individual data points. These are things where you gather, you look at search results, or you look at how you perform in social media. You look at how other people are doing. You form correlations in your mind. Like, boy, you know what, every time I see someone write a top ten list about something, that seems to get a lot of links and a lot of retweets and a lot of attention. It seems like top X lists are a really good way to produce content. People really like these top ten lists, or top X lists. Maybe that's a good way to go. That type of data point connection in your own mind is correlation. It's something where you're noticing that these things seem to predict success, and so I am going to potentially imitate them and see if they predict success for me.
That's actually a fine thing to do. You could do something, like, hmm, it seems like when I have a tweet with a link that gets higher click-through rate, it also gets more retweets. So if I can figure out the formula to get one of these, chances are I'll do well with both of them. I'm going to work on my click-through rate. I'm going to work on things that predict higher click-through rate. I'm going to get those short punchy titles. I'm going to get a good URL shortener. I'm going to keep the . . . whatever it is that the format of the tweet that you send that gets one of these is, you can generally predict you'll get the other one, maybe in some cases.
This doesn't necessarily apply to everyone. A lot of the time it's just your personal experience, and that's a fine thing to use. Take Facebook shares. You might notice something in your Facebook account when you share content that has a picture of a human face. So, it's got a little, oh, look, there's a nice picture of Rand. I appear quite "stick figurey" today. Yes, I draw like a second grader. It's weird that I do Whiteboard Friday. Facebook shares that have a human face as the thumbnail get more clicks. You think to yourself, huh, all right, maybe I need to start using more human faces in the thumbnail of what I put on Facebook, you know, the image that you choose when you share content on there. That might be a fine thing to discover. You could use that from an intuition basis, or you could actually measure it. You could go back through your account and look at all the click-through rates that you've earned, if you're using a URL tracker or shortener like bit.ly. Then you could see, is this really the case? Put the numbers into Excel and run the data, see on average how you're performing. It's a pretty simple way to do things.
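If you'd rather script the "see on average how you're performing" step than do it in Excel, a minimal sketch might look like the following. All of the numbers are made up, and the record layout (a face flag, clicks, impressions) is a hypothetical export format, not something any particular tracker is known to produce.

```python
from statistics import mean

# Hypothetical records: (thumbnail shows a human face?, clicks, impressions)
shares = [
    (True, 48, 1000),
    (True, 62, 1200),
    (False, 21, 900),
    (True, 35, 800),
    (False, 30, 1100),
    (False, 18, 700),
]

def average_ctr(records, has_face):
    """Mean click-through rate over the shares whose thumbnail matches has_face."""
    rates = [
        clicks / impressions
        for face, clicks, impressions in records
        if face == has_face
    ]
    return mean(rates)

ctr_face = average_ctr(shares, True)
ctr_no_face = average_ctr(shares, False)
print(f"face: {ctr_face:.3f}, no face: {ctr_no_face:.3f}")
```

With only a handful of shares, a gap like this could easily be noise, so treat it as a hint to keep measuring rather than as proof.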
You might also make an observation like this: links with keywords in the anchor text provide more of a rankings boost in Google. When you get links, external links, and they contain the keyword you're targeting somewhere in the anchor text, then you get more of a rankings boost. So, you think to yourself, huh, anchor text. That must be a powerful signal. I'm going to start trying to do that. When I get anchor text on other sites, maybe I'm going to put it in my bio, so when people link to me, they'll use that particular keyword, pointing to the pages that I want.
This observational correlation is something that SEOs and social media marketers and digital marketers of all stripes have used for ages. They've used forever, this observational type of correlation. But there's cool stuff that you can do on a research basis that we call sort of aggregated or average correlations that produces lots of really interesting stuff too. I'll give you some examples of those.
So, over at HubSpot, their social media scientist, Dan Zarrella, produces something called the "Science of Retweets," talking about how retweets are spread over the Web and what correlates well with things getting more retweets versus fewer retweets. He also does one that's great on the science of timing, talking about when is the best time to tweet or produce a blog post.
This correlation type of data is used all over the place, in tons and tons of different fields, definitely in digital marketing. We do some cool stuff here at SEOmoz where we collect hundreds or thousands of data points to be able to show aggregate or average correlation with two different metrics.
So for example, in our recent survey, we collected 10,000 different search results. The reason we collect so many, remember, is that we want that low, low standard error that comes from having a lot of data. So, we collect 10,000 and then we see, oh, how do tweets correlate with higher or lower rankings in Google? How do Facebook shares correlate with higher or lower rankings in Google?
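As an illustration of what an aggregated or average correlation could look like in practice, here is a small sketch. It assumes Spearman rank correlation computed per search result page and then averaged; the video does not say which coefficient is used, and the tweet counts below are invented.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: each list holds tweet counts for the results at positions 1..5.
serps = [
    [120, 80, 95, 10, 5],
    [40, 60, 15, 20, 2],
    [7, 300, 50, 45, 12],
]
positions = [1, 2, 3, 4, 5]

# Negate the position so a positive score means "more tweets, better ranking".
per_serp = [spearman([-p for p in positions], tweets) for tweets in serps]
mean_correlation = mean(per_serp)
print([round(c, 2) for c in per_serp], round(mean_correlation, 2))
```

Any single result page can show a strong or a weak connection, so the averaged number across many pages is the one worth looking at, and the more pages go into it, the lower the standard error.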
You can see, actually, that some of the interesting things we've noticed from collecting this type of data is that, hmm, keywords in the alt attribute of an image, for example, show a higher average correlation with rankings than using the keyword in the H1. So a lot of SEOs tell you, oh, you know, that H1 tag, that's a really important tag. You've got to get the keywords in the H1 tag, got to have H1s on every page.
Looks to us like the correlation with H1s, keywords in the H1 is no better than having the keyword just near, at the very top of the page, which H1s usually predict anyway. So, maybe it's not the H1 that's helping. You don't know. It's correlation data. It's not causation. We don't know for sure that this is what's causing it, but we know that there's a connection numerically between these metrics.
That alt attribute, huh, it looks positive. We thought, oh, maybe we should recommend that. So, for the last few years, we've been recommending that you put a good image on the page and make sure your keyword is in there.
You can see we did this with Twitter data. We did a cool study with Twitter data where we looked at a large number of tweets. We said, "What predicts higher click-through rate?" It turned out that shorter tweets produced higher click-through rate. Probably no surprise, right. So instead of using all 140 characters, you only use 60 characters, 80 characters. Looks like more people click on the links in those shorter tweets. That's kind of interesting, kind of cool. Maybe it suggests that when we're writing titles and headlines of things we want people to click, we should make those tweets very short. We looked at putting the link in the tweet at the front of the tweet versus the end of the tweet versus the middle. The middle looks slightly better than the other two.
You can learn all sorts of interesting stuff. This is what's awesome about correlation data. It doesn't necessarily mean it predicts things, but what it does mean is that things that have these features have a higher propensity to do well. So, in some cases, at least for me, I care a lot less about whether there's causation there. I do care, but I care much less about the causation than the raw correlation.
The reason I'm so interested in the correlation is because it says things that have this feature do better or worse. So, whether that's the cause of them or not, I like to imitate the things that do better and not imitate the things that do worse. I don't know whether it's directly causation or whether it's a second order effect or a tertiary effect or just some fragment of an effect. It doesn't matter to me. I want to look like the people who are successful. I want to do what successful people do, and that's what correlation data is so good for.
So, in part two, next week, we're going to talk about some really cool stuff that we found with correlation data and give you some ideas of where we're going in the next phases. Take care everyone.
Video transcription by SpeechPad.com