Copyright 1999 The Durham Herald Co.
The Herald-Sun (Durham, N.C.)
April 26, 1999, Monday
Proverb tackles N.Y. Times crosswords
People prove more adept in some puzzlers than Duke computer program
JEAN P. FISHER The Herald-Sun
To look at it, the
crossword puzzle grid might be any casual dabbler's half-finished attempt.
Some words are filled in correctly, in confident black. Other rows contain best
guesses and nonsense words: No. 20 across --
But unlike a
dashed-off puzzle one might find in a discarded newspaper, no human hand helped
to fill in the letters and words steadily plugging into this particular grid,
displayed on a computer monitor at Duke University recently.
At least, not in the traditional sense.
The computer program at work solving crosswords in an empty classroom is the
product of several humans -- Duke computer scientists, to be precise -- who
took exception to New York Times puzzlemaster Will Shortz's assertion that
artificial intelligence could never match human guile in solving crossword
puzzles.
"The rallying cry when we took up this project was 'we will show him,'" said
Michael Littman, a Duke computer scientist who led the team of students
who designed the program.
And they did.
"Proverb," the program designed by Littman, with Duke
doctoral students Greg Keim and Noam Shazeer and other students and faculty,
achieved an average of 95.3 percent words correct when given a sample of 370
crosswords. The puzzles, chosen randomly from The New York Times and several
other sources, were each solved in under 15
Shortz, when made aware of the program's success, suggested an immediate test
against the nation's best human solvers, who competed in the 22nd Annual
American Crossword Puzzle Tournament held March 12-14 in Stamford, Conn., and
directed by Shortz.
received the tournament puzzles weeks before the competition in order to run
them through the program and complete a short analysis about why the computer
performed as it did.
The competition included 254 human competitors, and for the first time in its
history no one completed all seven puzzles
Had Proverb actually competed, its score, calculated by a formula that combines
accuracy and speed, would have put it in 147th place.
"We were really pleased, but I think we also were a little bit naive about how
good people are at this," said Littman.
"People solve some of these
puzzles very quickly. Four minutes -- bang, perfect solution. It was just very
humbling, both as a human being and as a designer of software."
Shortz expressed admiration for
Proverb's results, posting them on a Web site with the comment:
"The results were so interesting (in fact, so
amazing) that I printed them out on large sheets of paper and posted them,
along with Michael's analysis, after each round at the event."
Littman, in turn, was intrigued that the human solvers took such acute notice
of their cybernetic competitor.
"At the actual tournament, [Shortz] put up these big
Proverb's answers, and participants in the tournament would gather around after each
round to see them and talk about it," said Littman, who communicated with Shortz by e-mail over the course of the
On one tournament puzzle, Proverb had the 20th highest score. But on another,
especially tricky puzzle, the program got only three words correct and came in
at 251st place.
All 78 clues in the brain-bender that threw Proverb were "spoonerized," with
switched initial consonants. For example, instead of the clue "Nome is here"
for the answer "Alaska," the puzzle used "Home is near." Another clue, looking
for the answer "Alas," gave the clue as "Moe is we" instead of "Woe is me."
When the spoonerisms were corrected by hand,
Proverb was able to solve the puzzle perfectly.
Computers loaded with the most encyclopedic databases of facts, including
volumes of solutions to past
crossword puzzles, would be apt to miss clues like those, Shortz suggested.
"Is the computer going to be able to solve the clues involving puns and
wordplay? I don't think so," Shortz wrote in an introduction to a volume of The New York Times daily
He gave examples of
clues he felt computers would miss, such as
"event that produces big bucks," referring to
"pumpkin-colored," translating as
"it might have quarters downtown," meaning
Keim agreed that such trickery did present a challenge.
"Crosswords involve wordplay and puns and humor -- sometimes it felt a
little impossible," he said.
A computer doesn't have to understand a clue to come up with a right answer,
though, Littman said. All it has to do is recognize statistical relationships
between words and decide on the most probable answer.
"ace," -- common crossword fare -- is a
prime example, Littman said.
"In a lot of the clues that we've got for 'ace,' the word 'sleeve' turns up," he said.
"So there's this wonderful association between 'ace' and 'sleeve' in our
database. If you see the word 'sleeve' in the clue and the answer is three
Proverb is going to
tend to suggest 'ace' as an answer, without really understanding what the clue
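The "ace" and "sleeve" association Littman describes can be sketched in a few lines of Python. This is only an illustration of the general idea of ranking answers by how often they have appeared alongside a clue's words; it is not Proverb's actual code. The clue list, the function names build_index and suggest, and the simple counting scheme are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Toy clue/answer pairs, invented for illustration (not Proverb's data).
CLUE_DB = [
    ("card up one's sleeve", "ace"),
    ("it may be up a sleeve", "ace"),
    ("top pilot", "ace"),
    ("sleeve part", "arm"),
    ("tennis winner", "ace"),
    ("limb", "arm"),
]

def build_index(pairs):
    """Count how often each clue word has appeared with each answer."""
    index = defaultdict(Counter)
    for clue, answer in pairs:
        for word in set(clue.lower().split()):
            index[word][answer] += 1
    return index

def suggest(clue, length, index):
    """Rank answers of the required length by co-occurrence with the clue's words.

    Returns (answer, share) pairs, where share is that answer's fraction of
    all matching counts. No understanding of the clue is involved.
    """
    scores = Counter()
    for word in set(clue.lower().split()):
        for answer, count in index.get(word, {}).items():
            if len(answer) == length:
                scores[answer] += count
    total = sum(scores.values())
    return [(answer, count / total) for answer, count in scores.most_common()]

index = build_index(CLUE_DB)
print(suggest("something hidden in a sleeve", 3, index))
# prints [('ace', 0.75), ('arm', 0.25)]
```

The sketch has never seen the clue "something hidden in a sleeve," yet it ranks "ace" first simply because "sleeve" has turned up with "ace" more often than with any other three-letter answer in its toy database.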
While the program's product -- a completed puzzle -- looks the same as one
completed by a person, the computer arrives at its solution in a very different
"The human looks at a clue,
writes in a potential answer, then looks at the information in the crossing
grid to see what other words might fit," said Littman.
"So it's back and forth between the grid and the clues."
When Proverb attacks a puzzle, it ignores the grid and concentrates on all the
word and letter clues by themselves. Then, tapping into a set of 30 databases
stocked with hundreds of thousands of crossword clues and answers, it embarks
on a detailed search to come up with a "candidate list" of possible answers to
each clue. Only when it has done this for every "across" and "down" clue does
the program begin the trial-and-error process of fitting the candidate words
into the puzzle grid until it finds the best possible match.
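The two-stage approach described above, candidate lists first and grid fitting second, can be illustrated with a small Python sketch. The article says only that the fitting is a trial-and-error process, so the plain backtracking search below is a stand-in chosen for clarity, not a description of Proverb's real algorithm. The slot layout, the candidate words, their weights, and the function names fits and best_fill are all hypothetical.

```python
# Hypothetical slots: name -> list of (row, col) cells the answer occupies.
SLOTS = {
    "1-across": [(0, 0), (0, 1), (0, 2)],
    "1-down":   [(0, 0), (1, 0), (2, 0)],
    "2-down":   [(0, 2), (1, 2), (2, 2)],
}

# Hypothetical candidate lists: answer -> confidence weight for that clue.
CANDIDATES = {
    "1-across": {"ace": 0.6, "arm": 0.4},
    "1-down":   {"ant": 0.5, "elk": 0.5},
    "2-down":   {"eel": 0.7, "mat": 0.3},
}

def fits(grid, cells, word):
    """True if word agrees with every letter already placed in its cells."""
    return all(grid.get(cell, letter) == letter
               for cell, letter in zip(cells, word))

def best_fill(slots, candidates):
    """Try every consistent combination of candidates; keep the heaviest one.

    Returns (weight, {slot: word}), or (0.0, None) if nothing fits.
    """
    names = list(slots)
    best = (0.0, None)

    def search(i, grid, chosen, weight):
        nonlocal best
        if i == len(names):
            if weight > best[0]:
                best = (weight, dict(chosen))
            return
        name = names[i]
        for word, w in candidates[name].items():
            if len(word) == len(slots[name]) and fits(grid, slots[name], word):
                new_grid = dict(grid)
                new_grid.update(zip(slots[name], word))
                chosen[name] = word
                search(i + 1, new_grid, chosen, weight * w)
                del chosen[name]

    search(0, {}, {}, 1.0)
    return best

weight, answers = best_fill(SLOTS, CANDIDATES)
print(answers)
# prints {'1-across': 'ace', '1-down': 'ant', '2-down': 'eel'}
```

Here "elk" is rejected because it clashes with the first letter of either across candidate, and the "ace" / "ant" / "eel" combination wins because its combined weight (0.6 x 0.5 x 0.7) beats the only other consistent fill. An exhaustive search like this would be far too slow for a full-size puzzle; it is meant only to show how crossing letters prune the candidate lists.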
The 30 databases, each of which focuses on a specific type of clue, such as
movies and television or dictionary definitions, run on as
many as 14 computers. The Duke programmers used about 400,000 crossword clues
-- the equivalent of memorizing 14 years' worth of daily puzzles -- in the
design of the system.
The Duke team has described its results in a research paper to be presented in
July at the annual conference of the American
Association for Artificial Intelligence in Orlando, Fla.
Littman sees little potential for a commercial product in
"Partly the reason that crossword puzzles are popular is that people like to
solve them," he said.
"And this system would basically solve them for you, and what would be the fun