If..Else Log

The overhype of natural language search

David Sullivan has a good piece on the hype behind natural language searches. Like biometric authentication, natural language search is a technology which sounds appealing on the surface but which, after more careful analysis, doesn't look like that good an idea in practice.

Google

The problems are varied but all ultimately stem from NLS' raison d'être, namely the belief that NLS is a more effective means of finding information and that the first company to bring one to fruition will have a killer application on its hands.

Flaw?

The problem is that many people haven't considered whether that's even true. In many situations, keyword searches are already an optimal means of searching for information.

Succinct searches

The bulk of the information encapsulated in a search is already contained within its keywords. Borrowing David's example, the keyword search for "Pirates of the Caribbean" already captures most of what you want to search for. There is little more information that you can derive from such a phrase. The benefit that NL analysis can provide is therefore limited, and expanding the phrase to a natural language equivalent does little to aid the search process.

However, perhaps the bigger issue is that search is supposed to be a quick and efficient means of finding a resource matching your request. Keyword searches are quick and simple (to use and understand); natural language searches result in verbosity and a greater initial time investment. Why do a search for "I want to find out more information about Pirates of the Caribbean" when a keyword search for "Pirates of the Caribbean" is just as effective? But you may cry "people don't have to use NL, they can do a keyword search as well". In that case, why not use a regular search engine like Google, which is likely to be more effective with such input?

User model

Google succeeded because the changes it made required no changes on the part of the user. The results simply got better, even though the method of searching (and number of words) remained the same.

This is an important point to note. For many people, keyword searching is an intuitive way to search. As mentioned, it is easy to understand and easy to use. The results are, in a way, predictable and fit in line with a user's mental model of how searching should work. The simplicity of the user model has three big repercussions. Firstly, it's easy to pick up and use; it's easy to understand or teach the concepts of how to search with Google [1]. It's a familiar interface which people know how to use almost intuitively. They don't have to make an initial investment in learning how to use the system. Secondly, the user model is easy to extend; want to find PotC games? Just add "games" to the search query. Thirdly, keyword searches are predictable and (relatively) unambiguous. I don't have to think about why the search worked this way; it might not have found what I was looking for but the search and results made sense.

Context and refinement

One of the often-mentioned problems of keyword search is that of context. I want information on the PotC ride, not the film. However, NL searches are a poor approach to this problem. Clustering (the grouping of results from different domains) is one approach, and it has nothing to do with NLS; tacking on another keyword is a second; suggested searches are a third. These are all, however, enhancements that have been made (or can be made) to existing engines. Extension, not replacement, is the key.
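To make the "tack on another keyword" point concrete, here is a minimal sketch of keyword refinement over a toy inverted index. It assumes simple AND semantics (every keyword must appear) and whitespace tokenisation; the document ids, sample texts and function names are all made up for illustration and say nothing about how any real engine works.

```python
from collections import defaultdict

# Hypothetical toy corpus: three pages that all mention the same title.
DOCUMENTS = {
    "film": "pirates of the caribbean film review",
    "ride": "pirates of the caribbean ride at the theme park",
    "game": "pirates of the caribbean games for pc",
}


def build_index(documents):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so intersecting does not modify the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(DOCUMENTS)
broad = search(index, "pirates caribbean")        # matches all three pages
narrow = search(index, "pirates caribbean ride")  # matches only the ride page
```

Each extra keyword can only shrink the result set, which is part of why the behaviour feels predictable: the user can see exactly what the added word did.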

Being better? First be just as good

The concept of extension rather than replacement is useful to remember, as it asks you to consider why something is successful and what you have to do to displace an established product. For a product to displace the leader in a given domain, it has to do two things. It has to have a killer feature, whether that's something it does which no one else does or something that it does differently from (but more effectively than) everyone else. More importantly, however, it has to be able to do everything else that the competitor does, and do it competently. That is, it has to be able to act as a reasonable alternative to the leader. Competitors to Microsoft have had a difficult time succeeding because they've all missed some feature somewhere.

The hard challenge is not being better than Google for a given task; it's about being good enough so as to minimise the inertia cost.

Behind Google's success

One of the things that people keep overlooking when talking about the next Google killer is what Google's strengths actually are. Even ignoring assets such as their developers, infrastructure, financial strength, customer base and supporting products, and looking solely at their search platform, they would still be a significant force.

Large index

I'm not sure how many pages are on the web nowadays but the phrase "If it isn't on Google, it's not on the web" certainly rings true in some respects. The size of Google's index is really quite incredible and if it doesn't sound so impressive now, it's only because we're conditioned to expect it. What's impressive about it is not just the size but the infrastructure and architecture that they've set up to accommodate and make use of it.

Firstly, the technical challenge in indexing, maintaining and searching such a large data set isn't trivial, and the system that Google have developed to tackle it is impressive [2]. It's no surprise to hear some people referring to Google's architecture as their real ace card, the unsung hero that enables them to keep ahead.

The other side to a large index is the weighting of results. Interpreting a search string is only part of the issue; Google's PageRank algorithm was a breath of fresh air when it came out, as it provided a reliable framework for returning appropriate results. It was enough of a killer feature to catapult them to market leader.
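For readers who haven't seen the idea behind link-based weighting, here is a rough sketch of the textbook form of PageRank, computed by simple repeated iteration. It is only an illustration of the published idea, not Google's production ranking; the damping factor of 0.85, the iteration count, the handling of pages with no outgoing links and the four-page toy graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
best = max(scores, key=scores.get)  # "c", the most linked-to page
```

The point for this discussion is that the weighting depends on the link structure of the web rather than on how the query is phrased, so the user's side of the interaction stays exactly the same.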

Investors really need to look further if they're looking for a Google killer. Whilst history has told us that no company is safe and that Google's ascendancy won't last forever, it's hard to see it being displaced by any of the companies listing NLS as their big selling point.


  1. To a degree, as illustrated by Scriv's post. However, if anything, that post reinforces the point. If people don't use Google in the expected manner, would NLS be any better?
  2. Thousands of cheap servers replicating data across several machines which deal with issues such as failures and load in a scalable (price as well as performance) manner.

-30-