Friday, February 18, 2011

Constantinople isn't just a Train in Rome: Watson's error rate on Jeopardy

A lot of ink has been spilled about Watson's error in guessing "Toronto" for a U.S. city, but amusing as that goof was, it wasn't one of its worst errors at all. According to IBM scientist David Ferrucci, Watson had only 14% confidence in that answer, hence the multiple question marks. Had it not been a Final Jeopardy question, which it was forced to answer, it never would have buzzed in.

The far more egregious errors were those that Watson had full confidence in, but was dead wrong about. In fact, given IBM's stated goal of using similar technology to aid doctors in diagnosing patients, those are the most dangerous outcomes possible. It's one thing for Watson to throw its hands up and admit to not having a clue (as it essentially did with the Toronto question), but what if Watson informed a doctor that it was 97% certain that a patient had cancer, when in fact the patient had no malignant cells at all? That would be far more troubling, and is analogous to Watson's 97% confidence that Latin finis is also a word for where trains originate:

Final Frontiers ($400): From the Latin for "end", this is where trains can also originate
Correct answer: terminus/terminal
Watson's answers: finis (97%), Constantinople (13%), Pig Latin (10%)
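The buzz decision described above can be sketched as a simple threshold rule: answer only when the top candidate's confidence clears a cutoff, unless the question (like Final Jeopardy) forces an answer. This is a speculative illustration, not IBM's actual system; the 0.5 threshold and the "Chicago" runner-up confidence are invented for the example, while the other confidences come from the broadcast.

```python
def choose_answer(candidates, forced=False, threshold=0.5):
    """candidates: list of (answer, confidence) pairs.
    Returns (answer, confidence), or (None, confidence) if Watson
    would stay silent rather than buzz in."""
    best_answer, best_conf = max(candidates, key=lambda c: c[1])
    if forced or best_conf >= threshold:
        return best_answer, best_conf
    return None, best_conf  # confidence too low to buzz in

# The "Final Frontiers" clue: top guess wrong, but highly confident.
frontiers = [("finis", 0.97), ("Constantinople", 0.13), ("Pig Latin", 0.10)]
print(choose_answer(frontiers))             # buzzes with ('finis', 0.97)

# The Toronto clue: only 14% confidence; answers only because it must.
# (The second candidate and its confidence are hypothetical.)
toronto = [("Toronto", 0.14), ("Chicago", 0.11)]
print(choose_answer(toronto))               # stays silent: (None, 0.14)
print(choose_answer(toronto, forced=True))  # forced: ('Toronto', 0.14)
```

On this toy model, the finis error is the dangerous kind: a wrong answer that sails past any reasonable threshold, whereas the Toronto guess only surfaced because silence wasn't an option.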


Thursday, February 17, 2011

Watson on Jeopardy: Game 2 errors

The second Jeopardy game between IBM's Watson computer and two human competitors (Ken Jennings and Brad Rutter) was broadcast Wednesday night, Feb 16.

As I did in a previous post, I'm providing a list of the questions that stumped Watson in one way or another. Note that I include questions where Watson failed to buzz in because of lack of confidence, but not those where it was simply late on the buzzer but otherwise had the correct answer.

Yesterday I also discussed two questions from the broadcast which provided no information on Watson's guesses, and I speculated on how well Watson might have done on those if given a chance.

In future posts, I plan to discuss some of the more interesting errors from these tables.

Watson Speculation

Two questions from last night's Jeopardy game gave no indication of how Watson would have answered. In one case, Ken Jennings gave the correct answer:

Senator Obama attended the 2006 groundbreaking for this man's memorial, 1/2 mile from Lincoln's
A: "Who is Martin Luther King"

Since this was a Daily Double question, we didn't get to see how Watson would have replied. Would Watson have gotten it right if given a chance?

In the other case, a glitch in the feed seems to have occurred. While Watson's guesses were shown on screen for every other question of the night (regardless of whether or not Watson buzzed in), no information from Watson was given for this one (which Brad Rutter correctly answered):

If you're one of these capable fellows, you're unfortunately "master of none"
A: "What is a Jack of all Trades"

So the question becomes, did Brad simply beat Watson to the buzzer (as he had done five times in this game and four times in the previous one, so we know he had the capability) or was Watson just unable to come up with the answer?

My completely speculative answer is that Watson would have had a hard time with the MLK question, but should have easily gotten the Jack of all Trades one. "Master of none" is a fairly unambiguous phrase that is consistently linked to "Jack of all trades" (just do a Google search to see this). In contrast, Watson is often very bad at guessing what person is being discussed elliptically, as in "this man's memorial" in the first question. The big keywords in that one, "Obama," "memorial," and "Lincoln," are much less closely tied to MLK. "2006 groundbreaking" would help Watson figure it out, but there might be many more distractors, and its confidence in the right answer would likely be lower, if it even reached the level of buzzing in. But that's just my guess. We'll probably never know.

Wednesday, February 16, 2011

Humans vs. computers

In response to a reader's question at The Washington Post, Ken Jennings gave a nice analogy explaining why computers won't take over the jobs of game show contestants any time soon:
I don't think this means it's lights out for trivia or quiz shows.  The analogy I use is track and field: humans keep running races even though cars and trains have been faster than us for more than a century.  It's all about the human psychology of the contest, not just the outcome.

Analyzing Watson on Jeopardy (part one)

The first Jeopardy game between IBM's Watson computer and two human competitors (Ken Jennings and Brad Rutter) was broadcast over two episodes on Monday and Tuesday Feb 14 and 15.

As a linguist, I'm particularly interested in investigating those questions that Watson failed to understand properly, and to what degree. Right now I'm just going to list them (in a series of posts). In a future post, I'll discuss some of the more interesting errors.

Part one of several posts follows.

Tuesday, February 8, 2011

Deafness and Wizarding

A former student of mine once made a rather clever (to my mind) analogy between the wizarding world in the Harry Potter novels and Deaf society in America. After the break are the main areas of comparison he made:

Monday, February 7, 2011

Language is like a building: Phonetics vs. Phonology

There is a great analogy over at dialectblog.com explaining the difference between phonetics and phonology:

To illustrate the difference between these two terms, imagine that human language is a building. (Bear with me.) Somebody studying phonology would look at the fundamental structure of the building, its engineering, its shape, its square footage, etc. Somebody studying phonetics, on the other hand, would study the actual materials used to make the building: the particular type of steel, whether it’s brick or wood, the glass in the windows. Phonetics studies the raw materials of language; phonology studies how these raw materials are used.