February 16, 2011, was the final night of the Man vs. Machine Grand Challenge, in which the IBM supercomputer Watson played the trivia game Jeopardy against two of its all-time champions, Ken Jennings and Brad Rutter. Watson bested its competition, though the final segment was something of a neck-and-neck battle against Jennings, who held the lead much of the time.

I was at the IBM viewing party in Washington, DC, for this event, and Subject Matter Experts (SMEs) were on hand to discuss various topics before, during, and after the show. One of the points raised was Watson’s accuracy, specifically during the Day Two Final Jeopardy round, in which this answer was posed for the category “U.S. Cities”: “Its largest airport was named for a World War II hero; its second largest, for a World War II battle.”

Although Jennings and Rutter correctly responded “What is Chicago?,” Watson answered incorrectly (“What is Toronto????”). Some reasons for this confusion have already been explained on the Smarter Planet blog, but the SMEs at last night’s viewing party were also able to provide more insight.

In making its candidate selection, Watson looks at a number of dimensions associated with the category, such as location, passage support, popularity, source reliability, and classification. Like any good detective, it gathers evidence to piece together each of these dimensions, then weights the scores to produce a confidence value that it measures against a threshold.
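To make that concrete, here is a minimal sketch of that kind of weighted evidence scoring. The dimension names mirror the ones above, but the weights, scores, and the simple linear combination are invented for illustration; they are not Watson’s actual model.

```python
# Illustrative only: a toy linear model of evidence weighting.
# Dimension names mirror those in the text; the weights are invented.

EVIDENCE_WEIGHTS = {
    "location": 0.25,
    "passage_support": 0.30,
    "popularity": 0.15,
    "source_reliability": 0.20,
    "classification": 0.10,
}

def confidence(evidence_scores):
    """Combine per-dimension evidence scores (each in [0, 1])
    into a single confidence value in [0, 1]."""
    return sum(
        weight * evidence_scores.get(dim, 0.0)
        for dim, weight in EVIDENCE_WEIGHTS.items()
    )

# A candidate with weak location and passage evidence scores low overall.
print(confidence({"location": 0.2, "passage_support": 0.3,
                  "popularity": 0.6, "source_reliability": 0.5,
                  "classification": 0.1}))  # 0.34
```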

The potential flaw in the Day Two Final Jeopardy was a basic lack of evidence across several of these dimensions: “U.S. city” was not explicitly stated in the clue, and Watson was not convinced by the World War II evidence it could find. Had the evidence lined up, Chicago (Watson’s second choice) might have percolated to the top of the answer queue.

More important, the “????” appended to the answer shows that Watson had little confidence in it. Had this been a normal Jeopardy question, the SME told us, Watson would have remained silent. But because Final Jeopardy requires a mandatory response, Watson went with its best, albeit wrong, answer. In layman’s terms, this is what happens when you force a machine to answer at gunpoint.
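A second sketch, equally hypothetical, shows the response logic the SME described: stay silent below the confidence threshold in a normal round, but return the best guess when a response is mandatory. The threshold and confidence numbers here are placeholders, not Watson’s real values.

```python
def respond(candidates, threshold=0.5, forced=False):
    """candidates: list of (answer, confidence) pairs.
    Return the top answer if its confidence clears the threshold;
    otherwise stay silent -- unless a response is forced, as in
    Final Jeopardy, in which case the best guess goes out anyway."""
    answer, conf = max(candidates, key=lambda pair: pair[1])
    if conf >= threshold or forced:
        return answer
    return None  # stay silent rather than risk a wrong answer

# Roughly the Day Two situation: low confidence, mandatory response.
print(respond([("Toronto", 0.14), ("Chicago", 0.11)], forced=True))
# -> "Toronto", flagged with "????" to signal low confidence
```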

In an earlier post, I questioned whether Watson’s lack of instinct would hamper its ability to respond correctly.

As these three rounds illustrate, the answer is both “yes” and “no.” It is “yes” because, as with any type of speed-dependent guesswork, gut reaction is a major force in producing an answer. But it is also “no” because Watson appears tuned to calculate the risks it is taking, and will either stay silent or wager conservatively.

This translates readily to healthcare and other fields that require expert opinion. Again, we need to emphasize that Watson is not a replacement for a medical specialist or lawyer; rather, it serves as a reference tool, theoretically aiding teams by instantaneously searching hundreds of thousands of research journals and surfacing possible answers to any number of queries.

And, just like its human counterparts, Watson does not always get the answer right. Unlike some human experts, however, Watson is able to admit when it might be wrong.

Judging from these rounds of Jeopardy, as well as the future scenarios that these types of supercomputers will bring to the table, what really makes Watson unique is not its lack of instinct but its lack of ego. Human beings still have a lot to learn in that area, as it is our need to be certain that is often the major sticking point in our own logical assessments.
