2005-04-27
Tiger Shot a Birdie
There are several ways of dealing with ambiguous language. One is to gather about you all the contextual help you can find. If you know the context in which it was formulated, then the title of this section is no longer ambiguous for you – if it ever was. It is a statement about golf. I could have written, “Tiger Woods shot a birdie on the 17th hole,” but then I would have made the riddle too easy for you. Yet for a dumb machine, it would make little difference whether I wrote Tiger, or Tiger Woods, or Tiger Woods the Nike guy, or anything else. “What do I know?” the computer would say.
Imagine that a machine is used to keep score in a golf tournament. The correct procedure is to feed it the results of every hole for every player. The input routine might be something like this:
Input number of hole: ??
Input contestant’s number: ??
Input score: ??
An operator has been giving the machine the holes, players and scores as stipulated – in numbers – but on the 17th hole she forgets herself and writes “Tiger shot a birdie on the Road”. What does the machine think now? Well, for starters, most machines are not going to let you treat them like that. Dumb machines especially are going to demand that if you want to tell them anything, it has to be said their way, not yours. Such a machine would tell her, “I don’t know what you are talking about. Just numbers in the correct order, thank you!”
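To make the dumb machine concrete, here is a minimal sketch of such a strict input routine. The function name `parse_strict_entry` and the 18-hole limit are my own assumptions for illustration; nothing here is taken from a real scoring system.

```python
import re
from typing import Tuple


def parse_strict_entry(hole: str, contestant: str, score: str) -> Tuple[int, int, int]:
    """Accept one scorecard entry, but only if every field is a plain number."""
    fields = {"hole": hole, "contestant": contestant, "score": score}
    values = []
    for name, raw in fields.items():
        text = raw.strip()
        # Anything other than ASCII digits is rejected outright.
        if not re.fullmatch(r"[0-9]+", text):
            raise ValueError(
                f"I don't know what you are talking about: "
                f"{name} must be a number, got {raw!r}"
            )
        values.append(int(text))
    hole_no, contestant_no, score_no = values
    # Assumption: a standard 18-hole course.
    if not 1 <= hole_no <= 18:
        raise ValueError(f"hole must be between 1 and 18, got {hole_no}")
    return hole_no, contestant_no, score_no
```

Feeding it `"17"`, `"1"`, `"3"` gives back the three numbers, while typing “Tiger shot a birdie on the Road” into any of the fields raises a `ValueError`. The routine makes no attempt to guess what the operator meant.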
But if the machine were semi-smart, it might accept free text or natural language input, and have a go at figuring out what the operator meant. To do this it would need a rough idea of sentence structure, the rules of grammar and a dictionary. It would also need a taxonomy of the terms most often used within a particular domain, in this case golf.
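A very rough sketch of the taxonomy half of that idea follows. It skips grammar and sentence structure entirely and only looks words up in small hand-made tables, so it is a keyword matcher rather than a real parser. The tables are assumptions for illustration: contestant number 1 for Tiger is hypothetical, and “the Road” is taken to be the 17th hole, played as a par 4.

```python
import re

# Hypothetical golf taxonomy: each table maps a domain term to a number.
SCORE_TERMS = {"eagle": -2, "birdie": -1, "par": 0, "bogey": 1}  # strokes relative to par
PLAYERS = {"tiger": 1}      # nickname -> contestant number (assumed)
HOLE_NAMES = {"road": 17}   # hole nickname -> hole number
PAR = {17: 4}               # hole number -> par for that hole (assumed)


def interpret(sentence):
    """Turn a free-text remark into (hole, contestant, score), or None if unsure."""
    words = re.findall(r"[a-z]+", sentence.lower())
    contestant = next((PLAYERS[w] for w in words if w in PLAYERS), None)
    hole = next((HOLE_NAMES[w] for w in words if w in HOLE_NAMES), None)
    delta = next((SCORE_TERMS[w] for w in words if w in SCORE_TERMS), None)
    if contestant is None or hole is None or delta is None:
        return None  # at least one of the three fields could not be identified
    par = PAR.get(hole)
    if par is None:
        return None  # par for this hole is not in the table
    return hole, contestant, par + delta


print(interpret("Tiger shot a birdie on the Road"))  # (17, 1, 3)
```

The point of the design is that every word the machine acts on comes from the domain tables. A sentence such as “Nice weather today” matches nothing, so the function returns `None` instead of guessing.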
Some people are surprised to hear that machine translation or transcription works best for discourse which, as outsiders, we normally consider difficult. We might find the language of doctors obscure and hard to understand, and wonder why a machine would have an easier time with a lot of arcane terms than with light everyday conversation. The answer is of course that machines prefer obscure and arcane words because they are less likely to have ambiguous meanings. They are the code words of a domain – their use is constrained within a limited discourse. When the machine knows that we are now talking medicine, it knows that within medicine these words have crisp definitions.
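The same effect can be shown with the golf words from earlier. The sketch below is only an illustration: the `SENSES` table and the domain labels `"golf"` and `"everyday"` are made up for the example. Without a domain a word has several candidate meanings; once the domain is known, only one is left.

```python
# Hypothetical sense table: word -> {domain: meaning}.
SENSES = {
    "birdie": {"golf": "one stroke under par", "everyday": "a small bird"},
    "eagle": {"golf": "two strokes under par", "everyday": "a large bird of prey"},
    "tiger": {"golf": "the player Tiger Woods", "everyday": "a large striped cat"},
}


def candidate_senses(word, domain=None):
    """List the meanings a word could have, optionally restricted to one domain."""
    senses = SENSES.get(word.lower(), {})
    if domain is None:
        return sorted(senses.values())  # no context: every sense is a candidate
    return [senses[domain]] if domain in senses else []


print(len(candidate_senses("birdie")))     # 2
print(candidate_senses("birdie", "golf"))  # ['one stroke under par']
```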