Down and Dirty: Write Your Own URL Rewrite
The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.
We all know by this time about the benefits of converting your parameterized URLs to human- and crawler-friendly URLs, but the stock tools of the trade (ISAPI_Rewrite, mod_rewrite, etc.) don't necessarily scale all that well when you have a large number of categories, product pages, etc. I'm going to walk you through what it takes to code this yourself, and I think you'll find it's less scary and complex than you thought, and gives you a number of benefits in terms of ongoing maintenance, flexibility, etc.
Overview
- The core problem: your site uses parameter-happy URLs, but for SEO and user-friendliness you're dreaming of semi-readable URLs instead.
- You've got a lot of content, mostly coming from the database.
- The number of products, categories, subcategories, etc. means that the prospect of trying to create (and maintain!) rules for ISAPI_Rewrite or mod_rewrite makes you run screaming into the woods.
- Existing inbound links: redirecting the old URLs to the new URLs via 301 redirects
- Intra-site links: converting the parameterized URLs to the readable versions everywhere YOU link to them in your own site
- Serving content: when you get a request for the new URL, handling it with the parameterized page invisible to the user (and the search crawler)
- Duplicate content issues
For clarity (and because it's Case Study month at SEOmoz :-) we'll use my honeymoon registry and travel site, www.thebigday.com, for our examples. We group the resorts on our site by destination and sub-destination - e.g., Hawaii -> Maui -> Fairmont Kea Lani. I'll do the examples in classic ASP, but it should be very easy for you to see how to convert my logic to PHP, ASP.NET, etc.
We serve up a hotel page using an ASP page called /Package.asp. It takes 3 numeric parameters, 1 for destination (Hawaii in this example), 1 for the sub-destination (the island of Maui), and 1 for the resort itself (Fairmont Kea Lani). What we really want to show is something more like this:
/TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm
Note that I've worked another important (to me, anyway) key phrase into the URL (travel specials) for SEO purposes. "ISAPI_Rewrite?" you say? Fine, if you have just a handful of categories and product names...and they never (or rarely) change. In this example, our rewrite "rules" are essentially translating between names and ID numbers by looking up either one in the database.
How it's going to do its magic:
Inbound Parameterized Links
You want to 301 redirect
/Package.asp?dest=2&subdest=51&resort=123
to
/TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm
You need a function that creates the readable URL from the parameters:
'Builds the readable URL for a resort from its three numeric IDs
Function MakeFancySchmancyUrl (nDestID, nSubdestID, nResortID)
    Dim sFancyURL
    sFancyURL = "/TravelSpecials/" & GetDestName (nDestID) & "-" & GetSubdestName (nSubdestID) & "/" & GetResortName (nResortID) & ".htm"
    MakeFancySchmancyUrl = sFancyURL
End Function
GetDestName(), GetSubdestName(), and GetResortName() are functions you need to write that retrieve the English name of the component given the ID, BUT you need to do a little "cleanup" on the names (all of them) to make sure you get decent URLs coming out the other end.
Here's an example of a resort name that would behave very badly as part of a URL without cleanup:
St. Regis Princeville, Kaua'i
Essentially, you'll want a function that simply removes any non-alphabetic character from the name and returns the (probably shortened) result, and each of the Getxxxx() functions must call it on the names they return. Some people also embed the IDs in the URL alongside the names. While that does simplify the lookup process, I'll admit, I do think it reduces the readability of the URL for the user, and doubles the number of "words" in the URL that the search engine might be looking at. To me, it's the equivalent of putting duct tape on your website.
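To make the cleanup idea concrete, here's a rough sketch in Python (used here only because it's compact; the site itself runs classic ASP, and the logic ports directly). The names clean_name and make_fancy_url, the in-memory dictionaries standing in for the database, and the exact resort name stored for ID 123 are all assumptions for illustration.

```python
import re

# Hypothetical stand-ins for the database tables, keyed by ID.
DESTINATIONS = {2: "Hawaii"}
SUBDESTINATIONS = {51: "Maui"}
RESORTS = {123: "The Fairmont Kea Lani, Maui"}  # assumed display name


def clean_name(name):
    """Remove every character that is not an ASCII letter."""
    return re.sub(r"[^A-Za-z]", "", name)


def make_fancy_url(dest_id, subdest_id, resort_id):
    """Build the readable URL from the three numeric IDs."""
    return "/TravelSpecials/{}-{}/{}.htm".format(
        clean_name(DESTINATIONS[dest_id]),
        clean_name(SUBDESTINATIONS[subdest_id]),
        clean_name(RESORTS[resort_id]),
    )
```

With this cleanup, the troublesome name above comes out as StRegisPrincevilleKauai, with the periods, spaces, comma, and apostrophe all gone.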
Intra-Site Links
You'll need to go through your site, find all the places that reference your parameterized URL (e.g., /Package.asp) and replace those with a call to MakeFancySchmancyUrl().
Safety net: Keep in mind that if for some reason you miss converting any of your in-site links, the mechanism for 301'ing inbound links will take care of those for you.
Now, your parameterized ASP page is going to be called in two ways:
- By users or search engines following an old parameterized link (in which case you need to 301 them to the readable URL)
- By your 404 handler (next topic, don't worry!), in which case you DO NOT want to redirect...you want to follow through the logic on that page to actually produce the HTML content
When someone clicks a link to /TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm, whether it's a link on your site, from a SERP, or from another site, we've got a little magic to perform, as there isn't really a page with this name (or with those folder names, either!). You'll need to create a custom 404 error handler (if you haven't already), and in there, look for these requests and hand them over to the /Package.asp page to show the content.
Here's our example:
On Error Resume Next
Dim sPageLeaf, iDomainEnd
Const sThisPageDomain = "thebigday.com"
'IIS passes the originally requested URL in the query string, e.g.
'404;http://www.thebigday.com/TravelSpecials/...
sPageLeaf = LCase (Trim (Request.QueryString))
'Strip everything up to and including the domain, leaving just the path
iDomainEnd = InStr (sPageLeaf, sThisPageDomain)
If (iDomainEnd > 0) Then
    sPageLeaf = Mid (sPageLeaf, iDomainEnd + Len (sThisPageDomain))
End If
'See if it's one of our static URLs that needs to be converted
'("/travelspecials/" is 16 characters long):
If (Left (sPageLeaf, 16) = "/travelspecials/") Then
    Server.Transfer "/Package.asp"
End If
...
If it's not one of our magical virtual pages, then the logic continues on to actually display a 404 page. Note that Server.Transfer will delegate the responsibility of spitting out the page content to /Package.asp BUT the user will still see the full readable URL in the browser, and the browser will get a nice happy HTTP 200 OK response.
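For readers on other platforms, the decision the 404 handler makes can be sketched in a few lines of Python. This is only an illustration of the check, not the site's code: the function names and the VIRTUAL_PREFIX constant are made up here, and the sketch assumes the handler receives the IIS-style "404;" query string described in this article.

```python
from urllib.parse import urlsplit

VIRTUAL_PREFIX = "/travelspecials/"  # lowercased prefix of the virtual pages


def requested_path(query_string):
    """Return the lowercased path from a '404;<url>' query string, or None."""
    marker, sep, url = query_string.partition(";")
    if marker != "404" or not sep:
        return None
    return urlsplit(url).path.lower()


def is_virtual_page(query_string):
    """True when the failed request should be handed to the content page."""
    path = requested_path(query_string)
    return path is not None and path.startswith(VIRTUAL_PREFIX)
```

Anything for which is_virtual_page() returns False falls through to the real 404 page.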
In /Package.asp, you'll need to:
- Parse out the destination, sub-destination, and resort name
- Look up each in the database and get the parameter equivalent
- Fetch whatever data from the database you need for the destination, sub-destination, and resort to display the content on the page
Handling the 404 Handler Bit
In IIS anyway, the 404 handler receives the originally requested URL in its query string, prefixed by "404;". The full query string for our example would be:
404;http://www.thebigday.com/TravelSpecials/Hawaii-Maui/TheFairmontKeaLaniMaui.htm
So, just look for:
404;http
as follows:
sFullQueryString = LCase (Request.QueryString)
If (Len (sFullQueryString) > 8) Then
    'Called via the 404 handler: look up the IDs from the names in the URL
    If (Left (sFullQueryString, 8) = "404;http") Then
        Call ExtractResortParms (sFullQueryString)
    End If
End If
Our function ExtractResortParms() above will parse the query string, pull out the destination name, subdestination name, and resort name, and attempt to look those up in the database.
If anyone would like to actually see what my version of ExtractResortParms() looks like, email me...it's not very exciting, just fun & games with Mid(), Left(), and InStr() etc.
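Since ExtractResortParms() isn't shown, here is one guess at what its parsing half might look like, written in Python rather than ASP. The function name extract_resort_names and its return shape are assumptions; it only splits the URL into the three cleaned names and leaves the database lookups to the caller. Splitting on the hyphen is safe because the cleanup step removes hyphens from the names themselves.

```python
from urllib.parse import urlsplit


def extract_resort_names(query_string):
    """Split '404;<url>' into (destination, subdestination, resort) names.

    Returns None unless the path looks like
    /TravelSpecials/<Dest>-<Subdest>/<Resort>.htm
    """
    url = query_string.partition(";")[2]
    parts = urlsplit(url).path.lower().strip("/").split("/")
    if len(parts) != 3 or parts[0] != "travelspecials":
        return None
    dest, sep, subdest = parts[1].partition("-")
    resort, _dot, ext = parts[2].rpartition(".")
    if not sep or not dest or not subdest or not resort or ext != "htm":
        return None
    return dest, subdest, resort
```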
Now, remember that the resort name, etc. in the URL generally isn't going to match what's in the database, as spaces and punctuation will have been stripped out: Fairmont Kea Lani became FairmontKeaLani. So you're not going to be able to do an indexed lookup of the name. Instead, you'll have to retrieve the whole set of possible names into a record set and walk the record set, running your name cleanup function on each name, THEN see if it matches what you extracted from the URL. If those record sets are going to be very big (say, over 100 records), you'll want to do a little optimization for performance. For us, the lists of destinations and sub-destinations are both short enough that we don't worry about this, but for the resort name, we parse the destination and sub-destination first, then retrieve just the list of resorts that match those, which results in a much smaller list. An alternative that performs well is to add a field to the database table for the "cleaned" name, call your cleanup function in the content management page where you add/edit the content element, and put an index on the new cleaned-name column.
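The two lookup strategies can be compared side by side in a small Python sketch. Everything here is hypothetical: the rows, the IDs other than 123 and 51, and the function names. A dictionary built up front plays the role of the indexed "cleaned name" column.

```python
import re


def clean_name(name):
    """Strip non-letters and lowercase, matching how URLs are compared."""
    return re.sub(r"[^A-Za-z]", "", name).lower()


# Hypothetical rows: (resort_id, subdest_id, display name)
RESORTS = [
    (123, 51, "The Fairmont Kea Lani, Maui"),
    (124, 51, "Grand Wailea"),
    (200, 52, "St. Regis Princeville, Kaua'i"),
]


def find_resort_by_scan(subdest_id, cleaned):
    """Walk only the resorts in one sub-destination, cleaning each name."""
    for resort_id, row_subdest, name in RESORTS:
        if row_subdest == subdest_id and clean_name(name) == cleaned:
            return resort_id
    return None


# Alternative: store the cleaned name once and look it up directly.
CLEANED_INDEX = {(sub, clean_name(name)): rid for rid, sub, name in RESORTS}


def find_resort_by_index(subdest_id, cleaned):
    """Direct lookup, the in-memory analogue of an indexed column."""
    return CLEANED_INDEX.get((subdest_id, cleaned))
```

Both functions return the same ID for the same input; the second simply moves the cleanup work from request time to edit time.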
301 Redirection Bit
If you didn't see 404;http at the beginning of the query string, then you've probably been linked to using the parameterized URLs and need to 301 to the readable version. "But," you ask, "since you have the parameters now, why not just look up the friggin' content and show it now?" Because, grasshopper, you want any link juice from your old URLs to be carried over to the new readable URL. So, pull the parameters out of the query string.
For example, the link will be something like
/Package.asp?dest=2&subdest=51&resort=123
And the redirect, using that fabulous function you wrote earlier to make your readable URLs:
Response.Status = "301 Moved Permanently"
Response.AddHeader "Location", MakeFancySchmancyUrl (nDestID, nSubdestID, nResortID)
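The "pull the parameters out of the query string" step is easy to get wrong when a parameter is missing or isn't a number, so here is a hedged Python sketch of the whole redirect decision. The function name redirect_for and the idea of passing the URL builder in as make_url are assumptions made for this example.

```python
from urllib.parse import parse_qs


def redirect_for(query_string, make_url):
    """Return (status, headers) for an old parameterized request, or None.

    make_url is whatever function builds the readable URL from the three IDs.
    """
    params = parse_qs(query_string)
    try:
        ids = [int(params[key][0]) for key in ("dest", "subdest", "resort")]
    except (KeyError, ValueError):
        return None  # missing or non-numeric parameter: nothing to redirect
    return "301 Moved Permanently", [("Location", make_url(*ids))]
```

Returning None leaves it to the caller to decide what a malformed request deserves, which in most cases is probably the regular 404 page.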
Gotchas
If you're renaming products occasionally, you could find yourself leaking link juice here and there...for example, let's say Princeville Resort is renamed to The St. Regis Resort, Princeville, and someone linked to our page a while ago as:
/TravelSpecials/Hawaii-Kauai/PrincevilleResort.htm
Of course, that's gonna get the user a shiny real-life 404 (and no link juice) as there will no longer be any resort found whose name "cleans" to "PrincevilleResort". Two options (your choice will depend on how frequently things get renamed):
- If they're few and far between, you can add a few manual 301's in your 404 handler.
- You can create a table of resort name history, and each time your content management code changes the resort name, add a record to this table. Then, if your resort page handler doesn't find a match for the name, it looks up the requested cleaned name in this table.
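The name-history option might look something like the Python sketch below. The two dictionaries stand in for the current-names lookup and the history table, and the resort ID is invented. Flagging a history match so the caller can 301 to the current URL is a suggestion added here, not something the scheme above requires.

```python
# Hypothetical data: current cleaned names and a history of retired ones.
CURRENT = {"thestregisresortprinceville": 200}
HISTORY = {"princevilleresort": 200}  # old cleaned name -> resort ID


def resolve_resort(cleaned):
    """Return (resort_id, needs_redirect), or None if nothing matches."""
    if cleaned in CURRENT:
        return CURRENT[cleaned], False
    if cleaned in HISTORY:
        # Matched a retired name: the caller may 301 to the current URL.
        return HISTORY[cleaned], True
    return None
```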
If you've spent any time learning how your customers shop, you're well aware that the categorizations of your products that are most logical and convenient to you aren't likely to be the way your customers think about your products. You've probably already got a number of different ways to group your products, which means that a given product page might appear under a number of different URLs using the above scheme. In our case, we not only group resorts by destination, but also by type of experience and by brand. If this is the case, you're going to need to tell the search engines which version of the rewritten URL is the "main" one, and that the others are really the same page. Time to use the new rel="canonical" trick. In our case, we have our categories (e.g., "all-inclusive", "spas", "luxury", etc.) coded as pseudo-destinations, so what we do is look up the primary destination ID that the resort belongs to and fabricate the URL for that:
<link rel="canonical" href="http://www.thebigday.com<%=MakeFancySchmancyUrl (nDestID, nSubdestID, nResortID)%>">
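As a sketch of the "look up the primary destination" step, here is the same tag built in Python. The PRIMARY_LOCATION mapping, the function name canonical_link, and the make_url argument are assumptions for illustration; the point is only that the canonical URL is always built from the resort's primary destination, no matter which category URL the visitor arrived on.

```python
from html import escape

SITE_ROOT = "http://www.thebigday.com"

# Hypothetical mapping: resort ID -> (primary destination ID, sub-destination ID)
PRIMARY_LOCATION = {123: (2, 51)}


def canonical_link(resort_id, make_url):
    """Build the canonical tag from the resort's primary destination."""
    dest_id, subdest_id = PRIMARY_LOCATION[resort_id]
    href = SITE_ROOT + make_url(dest_id, subdest_id, resort_id)
    return '<link rel="canonical" href="{}">'.format(escape(href, quote=True))
```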
Conclusion
The above might LOOK like a lot of work, but seriously shouldn't take you more than a day, especially if you ask questions of people like me when you get stuck or confused :-)