If..Else Log

The overhype of natural language search

David Sullivan has a good piece on hype behind natural language searches. Like biometric authentication, natural language searches are an example of a technology which sounds appealing on the surface but, after a more careful analysis, wouldn't actually be that good an idea in practice.


The problems are varied but all ultimately stem from NLS' raison d'être, namely the claim that natural language search is a more effective means of finding information and that the first company to bring one to fruition will have a killer application on its hands.

Flaw?

The problem is that many people haven't considered whether that's even true. Keyword searches are an optimal means of searching for information in many situations.

Succinct searches

The bulk of the information encapsulated in a search is already contained within the keywords. Borrowing David's example, the keyword search for "Pirates of the Caribbean" already captures most of what you want to search for. There is little more information that you can derive from such a phrase. The benefit that NL analysis can provide is therefore limited, and expanding the phrase to a natural language equivalent does little to aid the search process.

However, perhaps the bigger issue is that search is supposed to be a quick and efficient means of finding a resource matching your request. Keyword searches are quick and simple (to use and understand); natural language searches result in verbosity and a greater initial time investment. Why do a search for "I want to find out more information about Pirates of the Caribbean" when a keyword search for "Pirates of the Caribbean" is just as effective? But you may cry "people don't have to use natural language, they can do a keyword search as well". In that case, why not use a regular search engine like Google, which is likely to be more effective with such input?

User model

Google succeeded because the changes it made required no changes on the part of the user. The results simply got better, even though the method of searching (and number of words) remained the same.

This is an important point to note. For many people, keyword searching is an intuitive way to search. As mentioned, keyword searches are easy to understand and easy to use. The results are, in a way, predictable and fit in line with a user's mental model of how searching should work. The simplicity of the user model has three big repercussions. Firstly, it's easy to pick up and use; it's easy to understand or teach the concepts of how to search with Google[1]. It's a familiar interface which people know how to use almost intuitively. They don't have to make an initial investment in learning how to use the system. Secondly, the user model is easy to extend; want to find PotC games? Just add games to the search query. Thirdly, keyword searches are predictable and (relatively) unambiguous. I don't have to think about why the search worked this way; it might not have found what I was looking for but the search and results made sense.

Context and refinement

One of the often-mentioned problems of keyword search is that of context. I want information on the PotC ride, not the film. However, NL searches are a poor approach to this problem. Clustering (the grouping of results from different domains) is one approach, and it has nothing to do with NLS; tacking on another keyword is a second; suggested searches are a third. These are all, however, enhancements that have been made (or can be made) to existing engines. Extension, not replacement, is the key.

Being better? First be just as good

The concept of extension rather than replacement is useful to remember as it asks you to consider why something is successful and what you have to do to displace an established product. For a product to displace a given leader in a given domain, it has to do two things. It has to have a killer feature, whether that's something it does which no one else does or something that it does differently (but more effectively) than someone else. More importantly, however, it has to be able to do everything else that the competitor does competently. That is, it has to be able to act as a reasonable alternative to the leader. Competitors to Microsoft have had a difficult time succeeding because they've all missed some feature somewhere.

The hard challenge is not being better than Google for a given task; it's about being good enough so as to minimise the inertia cost.

Behind Google's success

One of the things that people keep overlooking when talking about the next Google killer is what Google's strengths actually are. Even ignoring assets such as their developers, infrastructure, financial strength, customer base and supporting products, and looking solely at their search platform, they would still be a significant force.

Large index

I'm not sure how many pages are on the web nowadays but the phrase "If it isn't on Google, it's not on the web" certainly rings true in some respects. The size of Google's index is really quite incredible and if it doesn't sound so impressive now, it's only because we're conditioned to expect it. What's impressive about it is not just the size but the infrastructure and architecture that they've set up to accommodate and make use of it.

Firstly, the technical challenge of indexing, maintaining and searching such a large data set isn't trivial and the system that Google have developed to tackle it is impressive[2]. It's no surprise to hear some people referring to Google's architecture as their real ace card, the unsung hero that enables them to keep ahead.

The other side to a large index is the weighting of results. Interpreting a search string is only part of the issue; Google's PageRank algorithm was a breath of fresh air when it came out as it provided a reliable framework for returning appropriate results. It was enough of a killer app to catapult them to market leader.
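To give a flavour of what link-based weighting means, here is a minimal sketch of the simplified, textbook form of PageRank computed by repeated iteration. It is not a description of Google's production system; the function name `pagerank`, the `damping` value of 0.85, the iteration count and the tiny example graph are all assumptions made purely for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank sketch.

    links maps each page to the list of pages it links to.
    Returns a dict mapping every page to a score; scores sum to 1.
    """
    # Collect every page, including ones that are only ever linked to.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "c", and "c" links to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
example_rank = pagerank(example_links)
```

The point of the sketch is that the ordering comes from the link structure rather than from the wording of the query: a page that many others point to ends up weighted above pages that nothing links to, however the search itself is phrased.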

Investors really need to look further if they're looking for a Google killer. Whilst history has told us that no company is safe and that Google's ascendancy won't last forever, it's hard to see it being displaced by any of the companies listing NLS as their big selling point.


  1. To a degree, as illustrated by Scriv's post. However, if anything, that post reinforces the point. If people don't use Google in the expected manner, would NLS be any better?
  2. Thousands of cheap servers replicating data across several machines which deal with issues such as failures and load in a scalable (price as well as performance) manner.

Diving with style

Harry Pearson writes on the art of diving.

Yet the truth is that it is not Zokora's dive that has created the furore; it is its ineptitude […] when it comes to cheating, we expect a degree of professionalism from our footballers.

It's interesting to contrast the reactions of footballers to physical contact[1] in professional games with those in games played for fun in the park and playground. Whenever I've played, players would often shrug off all but the hardest of challenges, as opposed to what is, rather disingenuously, termed gamesmanship in the professional game. There are a number of (rational) reasons behind the differences but it is a shame that the spectacle of football is tarnished by such ugly acts.


  1. Or, in this case, the lack of physical contact.

A pink redesign

I've been meaning to find some time to work on this site and so when Pink for October was announced, it sounded like an ideal opportunity. An excuse to redesign the site (not that I've ever really needed one) as well as spread the word about a good cause. Sign me up! However, as always seems to be the case, time just trickles away and so when Matthew reminded me that October was only a weekend away, I found myself in a bit of a mess, having not even started any of the development[1].


To make matters worse, on that very weekend, I'd been co-opted into helping out with some house-moving[2]. And so it was that I barely managed to squeeze in the launch of my redesign in time for the start of October[3]. Welcome to the 8th iteration of If..Else, codenamed 'Dianthus'.

Spreading the word

It would be remiss of me not to talk about what precipitated this redesign. This year, over a million people will be diagnosed with breast cancer. Despite initiatives such as screening programmes, breast cancer is still the most common form of cancer affecting women. For the last two decades, National Breast Cancer Awareness Month has helped in building wider public awareness and so it's my pleasure to be able to subvert this site for a good cause, even if it's only in a small way. In addition, I'll be donating all my AdSense revenue for this month, plus 100%, to Cancer Research UK.

Words on this redesign

Due to the aforementioned time constraints, this site redesign is far from what I would consider finished. Even ignoring the many design features that I had to drop to get this finished in time for the October launch, I haven't been able to give this a rigorous test to make sure that the design holds up across not just the various browser configurations[4] but also across the various pages and site options. In particular, I have little more than a vague hope that the existing content holds up without too much damage and I'm disappointed that work on the archive and search pages had to be dropped due to a sheer lack of time. Apologies for any borkages.

The big point to note about this redesign is how far I've had to stretch the definition in order to class this as a pink redesign. My initial sketches and prototypes used pink in a far stronger manner. In fact, one of the designs that I was all but ready to go with was full-on-pink. The problem was balance.

I don't consider myself to be a great designer and that showed in trying to develop a design that fit in with the Pink theme whilst not detracting from the content. My main problem was that, whilst each of my initial designs looked good as a one-off feature, the design always seemed to overwhelm the content which it was intended to support. With more time, I would probably have been able to bring it under control but in a bid to get things done, I took the easy way out and applied dashes of pink to a more stable design. It's no accident that this iteration bears a lot of similarity to its predecessor; realign is once again the word of the month.

I'm not sure how successful this design is yet. I guess it's still a bit fresh for me to make an objective decision. However, despite the rush job, I'm reasonably pleased with how it turned out. Whether I'll keep this look post-October, revert to the old one or even redesign again is still an open question but for now I'm happy I managed to go live with it. And with that done, I'm off to bed.


  1. Though fortunately, I had sketched out what I'd planned to do beforehand and worked on a couple of the major assets such as the header.
  2. Which was not helped by the fact that I also ended up spending a good 8 hours on the road. What is it with rain and traffic?
  3. Actually, I missed it by about half an hour but I'm ruling it in via use of a technicality as this site is hosted in the US :)
  4. I did give it a quick test across IE6, FF1.5, Safari, and Opera 9.
