From factoids to facts (26 Aug 2004)
Dr Brill, meanwhile, has moved to a more difficult task. One of his most recent papers, written jointly with Radu Soricut of the University of Southern California, is entitled 'Beyond the Factoid'. It describes his efforts to build a system capable of providing 50-word answers to questions such as 'What are the rules for qualifying for the Academy Awards?' This is harder than finding a single-word answer, but Dr Brill thinks it should be possible using something called a 'noisy channel' model.
Such models are already employed in spell-checking and speech-recognition systems. They work by modelling the transformation between what a user means (in spell-checking, the word he intended to type) and what he does (the garbled word actually typed). Just as a telephone line distorts the voice of the person at the other end of the line, this process can be thought of as being a noisy channel that transforms the user's intention into something rather different.
By analysing many pairs of correct and mis-spelled words using statistical techniques, it is possible to predict how such transformations work in general cases. A system can then be designed to work the process backwards. Given a mis-spelled word, it can guess what that word is most likely to be a mis-spelling of.
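To make the idea concrete, here is a minimal sketch of a noisy-channel spelling corrector. It is illustrative only and is not drawn from any system described here. The word counts are invented, and a fixed per-edit error rate stands in for the transformation statistics that a real system would learn from many pairs of correct and mis-spelled words. The names `WORD_COUNTS`, `channel_score` and `correct` are hypothetical. For each candidate word the sketch multiplies how common the word is by how plausibly the channel could have garbled it into what was typed, and picks the highest-scoring candidate.

```python
from collections import Counter

# Assumption: toy word counts standing in for statistics from a large corpus.
WORD_COUNTS = Counter({"the": 500, "then": 80, "they": 120, "than": 60, "ten": 30})
TOTAL = sum(WORD_COUNTS.values())


def edit_distance(a, b):
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def channel_score(typed, intended, error_rate=0.05):
    """Unnormalised stand-in for P(typed | intended).

    Assumption: each edit is equally unlikely. A trained model would
    instead estimate this from observed pairs of correct and garbled words.
    """
    return error_rate ** edit_distance(typed, intended)


def correct(typed):
    """Work the channel backwards: pick the word maximising P(word) * P(typed | word)."""
    return max(
        WORD_COUNTS,
        key=lambda w: (WORD_COUNTS[w] / TOTAL) * channel_score(typed, w),
    )


if __name__ == "__main__":
    print(correct("thw"))
    print(correct("thenn"))
```

With these toy numbers, a common word one edit away beats a rarer word two edits away, which is the trade-off the noisy-channel approach is meant to capture.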
Dr Brill's question-answering system does something similar. Many question-and-answer pairs exist on the web, in the form of 'frequently asked questions' (FAQ) pages. Dr Brill trained his system using a million such pairs, to create a model that, given a question, can work out various structures that the answer could take. These structures are then used to generate search queries, and the matching documents found on the web are scanned for things that look like answers.
Article URL: http://www.economist.com/science/displayStory.cfm?story_id=3127462