Wednesday, February 16, 2011

Analyzing Watson on Jeopardy (part one)

The first Jeopardy game between IBM's Watson computer and two human competitors (Ken Jennings and Brad Rutter) was broadcast over two episodes, on Monday and Tuesday, February 14 and 15.

As a linguist, I'm particularly interested in the questions that Watson failed to understand properly, and in how badly it failed. For now I'm just going to list them, in a series of posts. In a future post, I'll discuss some of the more interesting errors.

Part one of several posts follows.

Watson's confidence level is given as a percentage after each potential answer.

Questions that Watson blew completely
(complete confidence in the wrong answer, and the right answer was nowhere in its top three choices):

Final Frontiers ($400)
Clue: From the Latin for "end", this is where trains can also originate
Correct answer: terminal
Watson's answers: !*finis 97%, ?*Constantinople 13%, ?*Pig Latin 10%

Alternate Meanings ($800)
Clue: Stylish elegance, or students who all graduated in the same year
Correct answer: class
Watson's answers: !*chic 82%, ?*panache 11%, ?*Vera Wang 7%

Questions that Watson screwed up
(complete confidence in the wrong answer, but the right answer was one of its top three choices):

The Art of the Steal ($1600)
Clue: In May 2010 5 paintings worth $125 million by Braque, Matisse & 3 others left Paris' Museum of this art period
Correct answer: modern art
Watson's answers: !*Picasso 97%, ?modern art 11%

Final Frontiers ($800)
Clue: It's a 4-letter term for a summit; the first 3 letters mean a type of simian
Correct answer: apex
Watson's answers: !*peak 65%, ?*acme 15%, ?apex 12%
Note: Watson was beaten to the buzzer by Brad.

Name the Decade ($1000)
Clue: The first modern crossword puzzle is published & Oreo cookies are introduced
Correct answer: The 1910's
Watson's answers: !*1920's 57%, ?1910's 30%, ?*1912 4%

Olympic Oddities ($1000)
Clue: It was the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904
Correct answer: missing leg
Watson's answers: !*leg 61%
Note: Watson's answer is incomplete rather than fully wrong.

Questions where Watson had no clue
(no confidence in any choice, and no sense of the right answer):

Name the Decade ($600)
Clue: Klaus Barbie is sentenced to life in prison & DNA is first used to convict a criminal
Correct answer: The 1980's
Watson's answers: ?*2002 11%, ?*1987 7%, ?*Lyon 3%

Olympic Oddities ($800)
Clue: In the 2004 opening ceremonies a sole member of this team opened the parade of nations; the rest of his team closed it
Correct answer: Greece
Watson's answers: ?*Olympic Games 20%, ?*Athens 15%, ?*2004 Summer Olympics 13%

Questions where Watson didn't know it had a clue
(right answer not the top choice, no confidence in any choice):

Literary Character APB ($400)
Clue: His victims include Charity Burbage, Mad Eye Moody & Severus Snape; he'd be easier to catch if you'd just name him!
Correct answer: Voldemort
Watson's answers: ?*Harry Potter 37%, ?Voldemort 20%

The Art of the Steal ($800)
Clue: A Goya stolen (but recovered) in 2006 belonged to a museum in the city (Ohio, not Spain)
Correct answer: Toledo
Watson's answers: ?*Madrid 40%, ?Toledo 26%

Alternate Meanings ($1000)
Clue: A thief, or the bent part of an arm
Correct answer: Crook
Watson's answers: ?*knee 40%, ?*waist 10%, ?crook 5%

U.S. Cities (Final Jeopardy)
Clue: Its largest airport is named for a World War II hero; its second largest, for a World War II battle
Correct answer: Chicago
Watson's answers: ?*Toronto 14%, ?Chicago 11%
Note: Had this not been a Final Jeopardy question, Watson would not have answered it.

Questions where Watson almost had a clue
(right answer was the top choice, but not enough confidence):

Name the Decade ($400)
Clue: The Empire State Building opens & the "War of the Worlds" radio broadcast causes a panic
Correct answer: The 1930's
Watson's answers: ?1930's 50%

Name the Decade ($800)
Clue: The first flight takes place at Kitty Hawk & baseball's first World Series is played
Correct answer: The 1900's
Watson's answers: ?1900's 17%

"Church" and "State" ($800)
Clue: To bring back someone to his original function or position
Correct answer: reinstate
Watson's answers: ?reinstate 2 32%

The Art of the Steal ($2000)
Clue: A Titian portrait of this Spanish king was stolen at gunpoint from an Argentine museum in 1987
Correct answer: Philip II of Spain
Watson's answers: ?Philip 31%

The Art of the Steal ($1200) (Daily Double)
Clue: The ancient "Lion of Nimrud" went missing from this city's National Museum in 2003 (along with a lot of other stuff)
Correct answer: Baghdad
Watson's answers: ?Baghdad 32%
Note: Had this not been a Daily Double question, Watson would not have answered it.

(edited for completeness)
