An Exclusive Interview with Farrah Hamid of Hakia.com
This YouMoz entry was submitted by one of our community members. The author’s views are entirely their own (excluding an unlikely case of hypnosis) and may not reflect the views of Moz.
Those of you who know me will be aware that I closely follow a particular area of search technology, semantic search. Back in December I wrote an introduction to semantic search which, as well as getting me my first ever sphinn, seemed to open up a lot of questions in the comments. The feedback made me really want to find out more about the technology behind true semantic algorithms and the implications for SEOs, so I decided the best way would be to go to the source of the technology and ask the experts.
Farrah Hamid, Communications Coordinator at hakia, was kind enough to give me an exclusive interview with their team to find out more about what goes on behind the scenes. I told them to expect some pretty tricky questions, to which they were only too happy to respond…
1. Hakia was founded in 2004 by Dr. Riza Berkan, a nuclear scientist with a specialisation in artificial intelligence and fuzzy logic, and Dr. Pentti Kouri, an economist and venture capitalist. Can you tell us the story about how hakia was formed, and about the motivations and minds that developed it?
Our motivation dates back to as early as the days of the Star Trek TV show. Like most AI scientists, we couldn’t help but think that tomorrow’s search engines must talk back to you in the way Mr. Spock communicates with the Enterprise’s computer. That “computer” satisfies many of the requirements which are expected from a search engine, like holding vast volumes of information and being able to reply to a query intelligently, perhaps with a soft voice! What is missing today is the fundamental component of “understanding” that is the first step toward human-like interaction. In 2004, we came to the realization that nobody was carrying this promise forward and that search engines were evolving like fancy slot-machines. Having hands-on experience in semantics and artificial intelligence, we formed our team, including the father of Ontological Semantics, distinguished professor of linguistics Victor Raskin. You know the saying, “There is no shortcut to a place worth going.” So, we knew from the beginning this would be a long journey.
2. Let’s dive right in at the deep end :) In an internet that seems saturated with search technologies, what does hakia have to offer users?
Our objective is to offer a new perspective to the Web searcher by bringing quality results via our semantic search technology. Search engines like Google or MSN bring popular results via statistical methods. Popular results are not always quality results, and searchers unknowingly suffer in many ways, including wasted search time. There is no “other” perspective offered to the Web searcher today. A quality result satisfies three criteria simultaneously: It (1) comes from credible sources which are not commercially-biased, (2) is the most recent information available, and (3) is absolutely relevant to the query.
To give you an example, search for “What are the benefits of Aspirin?” at hakia.com. You will immediately notice the following:
- Credibility - To make its determinations, hakia used the Medical Library Association’s list of quality health web sites. These are clearly marked and displayed at the top of the search results page.
- Freshness - The top hakia search results list news, if available, and the most up-to-date information.
- Relevancy – hakia is committed to providing quality search results via semantic technology.
3. For a long time Ask.com – aka Ask Jeeves – shouted about their ability to handle real, human questions (as opposed to search queries) and do a pretty good job at working out what a user was after. However, there was still too much basic keyword matching rather than any genuine semantic understanding of the user’s intentions, and eventually Ask.com seemed to accept its fate as another alternative search engine playing catch-up with Google. How is hakia’s approach truly different enough to avoid the same fate?
As far as we can tell without knowing the inside details, Ask.com is and always has been an indexing search engine, relying on keyword search and a version of popularity ranking. The same holds for all other current search engines. The underlying principles of today’s search engines do not include “understanding,” but instead they rely on “approximation” by statistics (e.g., referral statistics or popularity). Therefore, when referral statistics are not available, they fail. This happens when (1) the query is a long-tail query, and (2) the pages are dynamic so that there is no time to collect statistics. Cases 1 and 2 constitute a huge portion of the available information on the Web (increasingly so). Therefore, you are half (if not more) blind to the available information on the Web using the current search engines. “Search fatigue” syndrome accounts for more than 50% of all searches, according to these studies.
We have built proprietary technology - the QDEX and our ontology - to be able to “understand” Web content and the query to yield more relevant results. If you look at the hakia search results, you will notice that the “quality” result text is displayed in coherent, uninterrupted sentences in contrast to bolded keywords and ellipses. Our approach is truly different since Ontological Semantics and other components of hakia have nothing to do with statistics or popularity methods. Our methods are centred on concept matching and content understanding. Note that semantic capabilities of hakia are still progressing as we render more and more analysis into page analysis.
4. Obviously the founding team had a good idea about semantics on a mathematical, linguistic, and logical level, but have you learned anything interesting or unexpected about the process when you added the human element?
Our hiring has been very successful and there has been nothing unusual or unexpected. People's commitment to the cause was extraordinary and surprising. I guess when people are presented something to work on, which has an enormous potential, it becomes an incentive itself. Previous innovations are breeding new innovations. That is what we are seeing at hakia.
5. Hakia’s algorithm uses a suite of very different elements to the big boys, including Ontological Semantics, Fuzzy Logic and Computational Linguistics. Can you give us a brief overview of the more interesting technologies that drive hakia?
Sure – as I mentioned previously, hakia has developed a new, proprietary middleware, the QDEX system, which allows semantic analysis. All incumbents use the inverted index, which has inherent limitations. An indexing backbone makes it almost impossible to deploy full-scale semantic analysis. Here are more details on hakia’s technologies:
QDEX - During off-line crawling, hakia reads Web pages line-by-line and anticipates all the possible questions that can be asked of each sentence. Anticipated queries are then used as gateways to point to the paragraphs from which they originated. In other words, if you ask, “what is the best medicine for headache,” hakia may have already anticipated this question while analyzing relevant sentences off-line and created gateways to the originating text. The advantage of QDEXing is its combination of accuracy, speed, and meaning-based analysis, which cannot be attained simultaneously with an inverted index as the backbone technology. You can actually peek into the core hakia technology at http://labs.hakia.com/
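Editor’s illustration: QDEX itself is proprietary and its internals are not described beyond the answer above, so the following is only a toy sketch of the general idea of “anticipated questions as gateways to paragraphs.” Every name in it (`anticipate_questions`, `build_index`, `lookup`) is hypothetical, and the question generator is a deliberately crude stand-in that only handles sentences of the form “X is Y”; it is not hakia’s method.

```python
import re
from collections import defaultdict


def split_sentences(paragraph):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def normalize(text):
    """Lowercase and drop punctuation so queries and stored questions compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def anticipate_questions(sentence):
    """Hypothetical stand-in for question anticipation.

    Only handles 'X is Y' sentences and emits two 'what is ...' questions.
    A real system would need genuine linguistic analysis here.
    """
    match = re.match(r"(?i)^(.+?)\s+is\s+(.+?)[.!?]?$", sentence)
    if not match:
        return []
    subject, predicate = match.group(1), match.group(2)
    return [f"what is {subject}", f"what is {predicate}"]


def build_index(paragraphs):
    """Off-line step: map each anticipated question to its originating paragraphs."""
    index = defaultdict(set)
    for pid, paragraph in enumerate(paragraphs):
        for sentence in split_sentences(paragraph):
            for question in anticipate_questions(sentence):
                index[normalize(question)].add(pid)
    return index


def lookup(index, paragraphs, query):
    """Query-time step: follow the 'gateway' from a question back to its paragraphs."""
    return [paragraphs[pid] for pid in sorted(index.get(normalize(query), ()))]


paragraphs = [
    "Aspirin is a common medicine for headache. Many people keep it at home.",
    "Search engines crawl pages.",
]
index = build_index(paragraphs)
hits = lookup(index, paragraphs, "What is a common medicine for headache?")
```

Here `hits` contains the first paragraph, because the question was generated from its first sentence at indexing time. Unlike an inverted index, the key is a whole anticipated question rather than an individual keyword, which is the contrast the answer above draws.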
6. As SEOs we’re constantly interested in the processes behind search engines, from crawling and indexation to the final serving up of the results. What are the stages through which a site passes before the final results page?
The first stages are similar to others, like crawling, parsing, and extracting the content. Then, the QDEXing process starts (as I explained above). In this process, the page’s URL, title, subtitles, and content distribution are analyzed. We also use a map of credibility to determine the legitimacy of the ideas presented. If your Web page is written and organized poorly, hakia will not like it. The last step is the preparation of text snippets as coherent and uninterrupted pieces of information. Relevant sentences are highlighted by on-the-fly analysis.
7. Due to the focus of link-based algorithms such as Google’s PageRank and Yahoo’s TrustRank, much of our work these days as SEOs is based around acquiring inbound links to websites. How much emphasis does hakia place on link popularity as a ranking factor? Do you think the link-based algorithms have much more longevity in them?
We do not put ANY emphasis on link popularity. Hakia’s proprietary SemanticRank algorithm is designed for this express purpose: higher relevancy. It ranks QDEX data based on grammar and sentence syntax, credibility, and age of document, among other factors. The number of link referrals is virtually irrelevant. Otherwise, we would not be able to offer a “new” perspective, and our results would be just another display of popularity.
To answer your second question: Popularity ranking was invented during the early ages of the Internet, when Internet search was like the wild west. Now, the entire “value” map of the Internet is known, and link-based popularity is no longer a novelty. Furthermore, popularity ranking fails when handling content that is statistically flat or infertile (known as long-tail). Also consider the growing problem of dynamic pages (there is no time to collect statistics). The future of link-based popularity is uncertain, and its range of application will most likely shrink.
8. The big players are well known for snapping up innovative and emerging technologies. In fact, as of May 2008, Google has acquired around 51 companies, Yahoo 57, and Microsoft a staggering 122, give or take a start-up. Whilst this can have fantastic opportunities and unlock massive R&D budgets, it can also risk the original aims. In the worst case scenario it can mean a technology is bought, patented and shelved, either intentionally so as to keep the playing field level, or just because the main company has other priorities. What’s hakia’s stance on the possibility of Google, Microsoft or Yahoo developing an interest in your semantic search technology?
It has been publicly acknowledged by many in the industry that semantic search is where Web search is headed. The large players also recognize this. As for hakia, our goal is to run a profitable business for now.
9. Depending on who you speak to, SEO is a profession based on essentially playing a system in a way it wasn’t intended to be played, or some would argue it’s about understanding the rules of the game more than others and sharing that advice. Either way we rely on an intimate knowledge of search technology and user psychology. Do you think SEOs should be worried about semantic search, or will it just change the nature of our jobs?
SEOs should certainly not be worried about semantic search, as it can provide a great amount of opportunity. It will change the nature of your job as far as optimizing Web pages for hakia.com is concerned. Ceteris paribus, Web sites that have better content, written in proper English, will rank higher than others. Link referrals will no longer matter. We will soon announce a set of criteria for SEO activities.
10. Where is the project at the moment, and what can we look forward to seeing soon?
This is a very exciting time for us here at hakia. We are currently in the late stages of development, and our plans are to complete the development phase mid-2008. Just a few weeks ago, we announced the first licensee of our OntoSem search technology, Riverglass, a web-analytics provider. This week we have announced that we are introducing the quality dimension to Web search – starting with health content. This is just the beginning for hakia. We are working on our paid advertising platform, “hakia Precision Advertising,” and search engine marketing tools that will benefit from our core semantic technology. Stay tuned for announcements. Actually, the best way to stay connected to our announcements and product releases is by joining the hakia Club, where members get access to product reviews and Webmaster tools, and enjoy privileges such as site submissions, site search box downloads, and more.
11. Finally, is there anything you’d like to communicate to an audience who work with search engines as a career, and who follow their evolution with interest?
Do you know this old saying “There wouldn’t be fumes if there was no fire”? Many start-ups have already put their irons in the fire of “semantic search.” Undisputedly, it is the future of Web search. Whether hakia will get it right is yet to be seen, but to learn more about our approach on how to deliver it, we invite the community to visit the hakia Labs and Club. I would also like to add that people who work in the field should be extra open-minded about approaching innovations. This field is in its infancy; many things will change, some of which can be abrupt. Web search is a very deep subject. Those who trivialize it in their minds may find themselves outside the loop.
Thanks so much for the opportunity to hear more about your work and what it could mean for web users and SEOs. Best wishes for the future of hakia!
Thank you! SEO is an important part of the search ecosystem, and we look forward to further mutual support in this community. Thank you for the opportunity as well.