Exploring Google's AJAX Crawling
This YouMoz entry was submitted by one of our community members. The author’s views are entirely their own (excluding an unlikely case of hypnosis) and may not reflect the views of Moz.
A while ago I set up an experiment to test Google's support for crawling pages that use AJAX and, because Google took a bit too long to index the page, I forgot all about it. Then today I found the PHP code I used for the test, so I checked Google and found the page had been indexed. I thought I'd share the test for anyone who is interested.
If you don't know already, AJAX crawling is all about trying to show search engines content that is loaded asynchronously after the initial page request. An example would be a page with a widget that shows the current availability for a hotel, added after the main page has loaded to give users a faster initial page load. Because this content is added dynamically with JavaScript (in particular AJAX), the search engine crawler never sees it. The proposal outlined by Google describes a way to show the page to crawlers with the dynamic AJAX content included.
For the test I created a simple page with a couple of words that didn't have any results in Google. Here is a version of the code (I changed the words from my actual test):
To quickly walk through the code: we have a page title containing the word "tttadffgdhffghmgfdsg", which is also the H1 tag for the page. It has a basic meta description and then the new fragment meta tag (<meta name="fragment" content="!">), whose content is set to an exclamation mark as described in the documentation. This signals to Google that the page uses AJAX, so the crawler can make an additional request to see the page with the AJAX content loaded.
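To make that additional request concrete, here is a rough Python sketch of how a crawler following the scheme might work out which URL to fetch. The function name `crawler_url` and the `has_fragment_meta` flag are made up for illustration, and the percent-encoding is deliberately broad; the exact set of characters the scheme escapes is not reproduced here.

```python
from urllib.parse import quote, urlsplit, urlunsplit

PARAM = "_escaped_fragment_"


def crawler_url(url, has_fragment_meta=False):
    """Return the URL a crawler following the scheme would request, or None.

    `has_fragment_meta` is a hypothetical flag meaning "the page contained
    <meta name="fragment" content="!">".
    """
    parts = urlsplit(url)
    if parts.fragment.startswith("!"):
        # "#!state" URLs: the state after "#!" moves into the query string.
        # Assumption: encode everything except "=" to keep the sketch simple.
        value = quote(parts.fragment[1:], safe="=")
    elif has_fragment_meta and not parts.fragment:
        # Pages that opt in via the meta tag get the parameter with no value.
        value = ""
    else:
        # The scheme does not apply to this URL.
        return None
    extra = f"{PARAM}={value}"
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(crawler_url("http://example.com/mypage.php", has_fragment_meta=True))
    print(crawler_url("http://example.com/page#!key=value"))
```

For a page like the test page, which has the meta tag but no hash fragment in its URL, the parameter is appended with an empty value.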
The line that is highlighted in pale yellow is a conditional statement that will cause the word "kerpwefhjuergsdfgk" (plus a bit more text) to appear in the body of the page if the page is requested with the parameter _escaped_fragment_ (e.g. mypage.php?_escaped_fragment_=true). This is the parameter that Google will pass to the page when it re-crawls to get the AJAX content.
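For readers who want to see the shape of that conditional, here is a minimal Python sketch of the same idea. It is not the PHP I used: the function name `render_page` and the meta description text are placeholders invented for illustration, and only the two test words and the fragment meta tag come from the description above.

```python
from urllib.parse import parse_qs


def render_page(query_string=""):
    """Build the test page; add the extra body text only for crawler requests."""
    # keep_blank_values=True so that "_escaped_fragment_=" (empty value)
    # still counts as the parameter being present.
    params = parse_qs(query_string, keep_blank_values=True)
    crawler_view = "_escaped_fragment_" in params

    # This text is only emitted when the parameter is present, so an
    # ordinary visitor never receives it.
    extra = "<p>kerpwefhjuergsdfgk plus a bit more text</p>" if crawler_view else ""

    return (
        "<html><head>\n"
        "<title>tttadffgdhffghmgfdsg</title>\n"
        # Placeholder description, not the one from the real test page.
        '<meta name="description" content="A basic description.">\n'
        '<meta name="fragment" content="!">\n'
        "</head><body>\n"
        "<h1>tttadffgdhffghmgfdsg</h1>\n"
        f"{extra}\n"
        "</body></html>"
    )


if __name__ == "__main__":
    print("kerpwefhjuergsdfgk" in render_page(""))
    print("kerpwefhjuergsdfgk" in render_page("_escaped_fragment_="))
```

Note that the check is on the presence of the parameter rather than its value, so a request with `_escaped_fragment_=true` and one with an empty value are treated the same way.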
My page contains absolutely no AJAX code at all, which means that no user will ever see the content within the PHP if block (unless they have a penchant for adding random parameters to their URL request). What I wanted to find out was whether I could get my page to rank for the word kerpwefhjuergsdfgk, which appeared in the body only when the page was requested with the _escaped_fragment_ parameter.
I put the page up on a domain I own and submitted the URL to Google to crawl (no internal or external links to the page). Then I forgot about it. Today I tried doing a search for each of the words (tttadffgdhffghmgfdsg and kerpwefhjuergsdfgk) and found the page was ranking for both.
Remember, I have changed these words for this blog post so I don't mess up the original test (this post will start ranking), and so I have changed the SERPs to replace the actual words I used.
So what does this all mean? Well, it means that I can easily create a page with content that is only visible to crawlers (i.e. cloaking) using a technique proposed by Google. I'm sure you can see how this could be abused. It would also be harder to detect than some forms of cloaking (such as serving different content based on user agent), because the crawler would have to download the page and execute the JavaScript to compare the result with what users were really seeing. If the crawler were going to do this, it might as well execute JavaScript on all pages.
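As a rough illustration of the comparison step only, the Python sketch below lists words that appear in the crawler's version of a page but not in the user's version. The helper names are invented for this example. It assumes you already have the HTML as a user ends up seeing it, which for a real AJAX page means after the JavaScript has run, and that is exactly the expensive part.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document (tags are dropped)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_words(html):
    """Return the set of lowercase alphanumeric words found in the text nodes."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def crawler_only_words(user_html, crawler_html):
    """Words present in the crawler's copy but absent from the user's copy."""
    return text_words(crawler_html) - text_words(user_html)


if __name__ == "__main__":
    user = "<h1>tttadffgdhffghmgfdsg</h1>"
    crawler = "<h1>tttadffgdhffghmgfdsg</h1><p>kerpwefhjuergsdfgk extra</p>"
    print(sorted(crawler_only_words(user, crawler)))
```

A non-empty result is only a hint, not proof of cloaking: legitimate AJAX content would also show up here if the user-side HTML were captured before the scripts had finished.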
The problem here is that the method proposed by Google (the fragment meta tag and _escaped_fragment_ parameter) doesn't really relate back to how AJAX is implemented in the real world. What you really need is a standard that both crawlers and browsers can understand, so that bots and humans see the same thing. Until then, I can see potential for a combination of confusion, bespoke code written just for bots, and people trying to use it to fool search engines (not me, mind; my white hat is securely fastened).
OK - so this post was a little lacking in the humour department and so I thought to finish it off I'd share with you a genuine CAPTCHA which I saw whilst editing this:
Way to encourage me to contribute to YOUmoz ;-)
Let me know what you think about all this in the comments below. I'd be interested to know if anyone knows of somebody actually using Google's AJAX crawling support and what success they've had.