URL Tagging: Clean Up Your Act
This YouMoz entry was submitted by one of our community members. The author’s views are entirely their own (excluding an unlikely case of hypnosis) and may not reflect the views of Moz.
It was quite by chance I stumbled upon this issue early last year, while working on a project (and to be honest, it was the team at Latitude that picked up on it).
Basically, one of the sites I was working on with them had some serious bloat in search results, and after digging into the situation, the solution was pretty simple. Although I cannot mention the site here because of NDA’s, I can give you an example of a site that suffers the same problem, and hopefully will give you an idea into how to manage the problem for your own/client sites.
Stumbling Across
Orange has run a pretty solid offline/online campaign to promote its RockCorps campaign. Basically, the offline media was encouraging search engine usage to get to the site.
However, the campaign was executed before the site gained natural search positions and even as of now it's at position 2, being led by “I am Bored” (an interesting URL -- must have had a lot of free traffic as a result!). This obviously led to some real criticism of the campaign.
Photo Credit: http://trashmarketing.wordpress.com/2008/09/05/i-am-orange/
But that’s not the issue I would like to discuss – out of curiosity I looked into back links to the site, and was scrawling across the thousands of footer wide links the site has (using the anchor text link “I am”, of course) when I stumbled across an interesting URL:
shop.orange.co.uk/shop/index?linkfrom=time&link=header_orange_shop
The bold part of the URL is simply an analytics tag that is applied to URLs (tracking IDs) in order to track movement or clicks or whatever the end objective may be. The tag is usually applied to links on the site that is being tracked (therefore not to the actual page). Not a problem really, except these generate duplicate pages in the SERPs.
See for example:
171, 000 results for site:orange.co.uk , which is great, but a query for site:orange.co.uk ?linkfrom shows 16,200 results.
That’s an almost 10% bloat of indexed pages, and duplicate content that really should be excluded from search results. See Google Webmaster Central's explanation.
How can URL parameters, like session IDs or tracking IDs, cause duplicate content? When user and/or tracking information is stored through URL parameters, duplicate content can arise because the same page is accessible through numerous URLs.
Note:
Analytics packages aren’t the only ones that use tags on URLs -- affiliate networks do as well as session IDs, and I would suggest the same treatment for those parameters. There are also many other ways that your site may issue duplicate pages without your realising it. An old post by Adam Lasnik is a pretty good summary of some traditional problems.
A recent post by Google Webmaster Central kind of insists that you leave session ID’s, etc. on there and that there are no problems with dynamic URLs; however, I don’t believe that they are looking at all the angles here. The fact is yes, they may rank the canon URL, but that doesn’t seem to stop them from including the duplicate pages in the index.
Why is that a problem?
What wasn’t clear in what Google said about dynamic URLs was whether, when discarding duplicates, the strength of that link was channeled back into what it saw as the primary URL. Well, if you buy into PageRank sculpting and link juice distribution (and I do) then this simply means that the duplicate pages are diluting the strength of the site.
And I am not the only one who doesn’t buy into it. In fact, the article goes ahead to point out another Google Webmaster Central post, and I will highlight the suggested “fix” for getting rid of these duplicate URLs:
- One fix is to eliminate whole categories of dynamically generated links using your robots.txt file.
- Another option is to block those problematic links with a "nofollow" link attribute.
I may be ending this on a very tame note, but technical posts aren’t my forte. In fact, if you spot inaccuracies please point them out. Also please feel free to add further resources in the comments below.
If you would like to know more about page sculpting, see Rand’s recent post on it: PageRank Sculpting: Parsing the Value and Potential Benefits of Sculpting PR with Nofollow
If Rand doesn’t satisfy you, then head over to one of my other fav writers, Michael Gray (which, although he isn’t technical in nature, gives a pretty good analogy): NoFollow and PageRank Sculpting: Is It Worth the Effort?
If you would like to follow my ramblings, haunt me on Twitter: www.twitter.com/rishil
Comments
Please keep your comments TAGFEE by following the community etiquette
Comments are closed. Got a burning question? Head to our Q&A section to start a new conversation.