Proverb: The Herald-Sun


Copyright 1999 The Durham Herald Co.  
The Herald-Sun (Durham, N.C.)

April 26, 1999, Monday

Proverb tackles N.Y. Times crosswords

People prove more adept in some puzzlers than Duke computer program


JEAN P. FISHER The Herald-Sun


To look at it, the crossword puzzle grid might be any casual dabbler's half-finished attempt.

Some words are filled in correctly, in confident black. Other rows contain best guesses and nonsense words: No. 20 across -- "seapegost."

But unlike a dashed-off puzzle one might find in a discarded newspaper, no human hand helped to fill in the letters and words steadily plugging into this particular grid, displayed on a computer monitor at Duke University recently.

At least, not in the traditional sense.

The computer program at work solving crosswords in an empty classroom is the product of several humans -- Duke computer scientists, to be precise -- who took exception to New York Times puzzlemaster Will Shortz's assertion that artificial intelligence could never match human guile in solving crossword puzzles.

"The rallying cry when we took up this project was 'we will show him,' " said Michael Littman, a Duke computer scientist who led the team of students who designed the program.

And they did.

"Proverb," the program designed by Littman, with Duke doctoral students Greg Keim and Noam Shazeer and other students and faculty, achieved an average of 95.3 percent words correct when given a sample of 370 crosswords. The puzzles, chosen randomly from The New York Times and several other sources, were each solved in under 15 minutes.

Shortz, when made aware of the program's success, suggested an immediate test against the nation's best human solvers, who competed in the 22nd Annual American Crossword Puzzle Tournament held March 12-14 in Stamford, Conn., and directed by Shortz.

Littman received the tournament puzzles weeks before the competition in order to run them through the program and complete a short analysis about why the computer performed as it did.

The competition included 254 human competitors, and for the first time in its history no one completed all seven puzzles perfectly. Had Proverb actually competed, its score, calculated by a formula that combines accuracy and speed, would have put it in 147th place.

"We were really pleased, but I think we also were a little bit naive about how good people are at this," said Littman. "People solve some of these puzzles very quickly. Four minutes -- bang, perfect solution. It was just very humbling, both as a human being and as a designer of software."

Shortz expressed admiration for Proverb's results, posting them on a Web site with the comment: "The results were so interesting (in fact, so amazing) that I printed them out on large sheets of paper and posted them, along with Michael's analysis, after each round at the event."

Littman, in turn, was intrigued that the human solvers took such acute notice of their cybernetic competitor.

"At the actual tournament, [Shortz] put up these big posterboards with Proverb's answers, and participants in the tournament would gather around after each round to see them and talk about it," said Littman, who communicated with Shortz by e-mail over the course of the project.

On one tournament puzzle, Proverb had the 20th highest score. But on another, especially tricky puzzle, the program got only three words correct and came in at 251st place.

All 78 clues in the brain-bender that threw Proverb were "spoonerized" with switched initial consonants. For example, instead of the clue "Nome is here" for the answer "Alaska," the puzzle used "Home is near." Another clue, looking for the answer "Alas," gave the clue as "Moe is we" instead of "Woe is me."

When the spoonerisms were corrected by hand, Proverb was able to solve the puzzle perfectly.
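
As a rough illustration of what undoing such a clue involves, here is a minimal Python sketch. It is not part of Proverb (the article says the clues were corrected by hand), and the function names split_onset and unspoonerize are hypothetical. It assumes the swap always involves the leading consonants of the first and last words of the clue.

```python
VOWELS = set("aeiou")

def split_onset(word):
    """Split a word into its leading consonant cluster and the remainder."""
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            return word[:i], word[i:]
    return word, ""  # no vowel found: treat the whole word as the onset

def unspoonerize(clue):
    """Swap the leading consonants of the first and last words (assumed rule)."""
    words = clue.lower().split()
    if len(words) < 2:
        return clue
    first_onset, first_rest = split_onset(words[0])
    last_onset, last_rest = split_onset(words[-1])
    words[0] = last_onset + first_rest
    words[-1] = first_onset + last_rest
    return " ".join(words).capitalize()

print(unspoonerize("Moe is we"))     # Woe is me
print(unspoonerize("Home is near"))  # Nome is hear
```

The second result sounds right but is spelled "hear" rather than "here." Spoonerisms trade sounds, not letters, which hints at why clues like these are hard to handle mechanically.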

Computers loaded with the most encyclopedic databases of facts, including volumes of solutions to past crossword puzzles, would be apt to miss clues like those, Shortz suggested.

"Is the computer going to be able to solve the clues involving puns and wordplay? I don't think so," Shortz wrote in an introduction to a volume of The New York Times daily crossword puzzles.

He gave examples of clues he felt computers would miss, such as "event that produces big bucks," referring to "rodeo"; "pumpkin-colored," translating as "orange"; or "it might have quarters downtown," meaning "meter."

Keim agreed that such trickery did present a challenge.

"Crosswords involve wordplay and puns and humor -- sometimes it felt a little impossible," he said.

A computer doesn't have to understand a clue to come up with a right answer, though, Littman said. All it has to do is recognize statistical relationships between words and decide on the most probable answer.

The word "ace" -- common crossword fare -- is a prime example, Littman said.

"In a lot of the clues that we've got for 'ace,' the word 'sleeve' turns up," he said. "So there's this wonderful association between 'ace' and 'sleeve' in our database. If you see the word 'sleeve' in the clue and the answer is three letters, Proverb is going to tend to suggest 'ace' as an answer, without really understanding what the clue is asking."

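The kind of association Littman describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Proverb's actual code: the clue-and-answer pairs are invented for the example, the names build_index and suggest are hypothetical, and simple co-occurrence counts stand in for whatever statistics the Duke team really used.

```python
from collections import Counter, defaultdict

def build_index(clue_answer_pairs):
    """Count how often each clue word appears alongside each answer."""
    index = defaultdict(Counter)
    for clue, answer in clue_answer_pairs:
        for word in set(clue.lower().split()):
            index[word][answer.lower()] += 1
    return index

def suggest(index, clue, length):
    """Rank answers of the given length by their association with the clue's words."""
    scores = Counter()
    for word in set(clue.lower().split()):
        for answer, count in index.get(word, {}).items():
            if len(answer) == length:
                scores[answer] += count
    return [answer for answer, _ in scores.most_common()]

# Invented sample data, standing in for a database of past clues.
past_clues = [
    ("Card up one's sleeve", "ace"),
    ("Sleeve surprise", "ace"),
    ("Sleeve filler", "arm"),
    ("Tennis winner", "ace"),
]
index = build_index(past_clues)
print(suggest(index, "It may be up a sleeve", 3))  # ['ace', 'arm']
```

Nothing in the sketch knows what a sleeve is. It only counts which answers have appeared alongside which clue words.
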
While the program's product -- a completed puzzle -- looks the same as one completed by a person, the computer arrives at its solution in a very different way.

"The human looks at a clue, writes in a potential answer, then looks at the information in the crossing grid to see what other words might fit," said Littman. "So it's back and forth between the grid and the clues."

When Proverb attacks a puzzle, it ignores the grid and concentrates on all the word and letter clues by themselves. Then, tapping into a set of 30 databases stocked with hundreds of thousands of crossword clues and answers, it embarks on a detailed search to come up with a "candidate list" of possible answers to each clue. Only when it has done this for every "across" and "down" clue does the program begin the trial-and-error process of fitting the candidate words into the puzzle grid until it finds the best possible match.
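
The two-stage approach can be pictured with a small Python sketch. The article says only that the fitting step is a "trial-and-error process," so the plain backtracking search below is an assumption, as are the function name fill_grid, the integer scores and the toy two-word grid.

```python
def fill_grid(slots, candidates):
    """Pick one candidate per slot so that crossing letters agree.

    slots:      {slot name: list of grid cells the answer occupies}
    candidates: {slot name: list of (word, score) pairs}
    Returns (best total score, {slot name: word}), or (None, None) if nothing fits.
    """
    names = list(slots)
    best = {"score": None, "assignment": None}

    def search(i, grid, chosen, score):
        if i == len(names):  # every slot is filled: keep the best total
            if best["score"] is None or score > best["score"]:
                best["score"] = score
                best["assignment"] = dict(chosen)
            return
        name = names[i]
        cells = slots[name]
        for word, word_score in candidates[name]:
            if len(word) != len(cells):
                continue
            # Reject words that clash with letters already placed.
            if any(grid.get(c, ch) != ch for c, ch in zip(cells, word)):
                continue
            added = [c for c in cells if c not in grid]
            for c, ch in zip(cells, word):
                grid[c] = ch
            chosen[name] = word
            search(i + 1, grid, chosen, score + word_score)
            del chosen[name]  # undo the trial and try the next word
            for c in added:
                del grid[c]

    search(0, {}, {}, 0)
    return best["score"], best["assignment"]

# Toy grid: 1-Across and 1-Down share their first cell.
slots = {
    "1A": [(0, 0), (0, 1), (0, 2)],
    "1D": [(0, 0), (1, 0), (2, 0)],
}
candidates = {
    "1A": [("ace", 6), ("arm", 4)],
    "1D": [("ton", 7), ("ant", 3)],
}
print(fill_grid(slots, candidates))  # (9, {'1A': 'ace', '1D': 'ant'})
```

In the toy example, "ton" is the top-scoring candidate for 1-Down, but it cannot share a first letter with either across word, so the search settles on "ace" and "ant."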

The 30 databases, each of which focuses on a specific type of clue, such as movies and television or dictionary definitions, run on as many as 14 computers. The Duke programmers used about 400,000 crossword clues -- the equivalent of memorizing 14 years' worth of daily puzzles -- in the design of the system.

The Duke team has described its results in a research paper to be presented in July at the annual conference of the American Association for Artificial Intelligence in Orlando, Fla.

Littman sees little potential for a commercial product in Proverb.

"Partly the reason that crossword puzzles are popular is that people like to solve them," he said. "And this system would basically solve them for you, and what would be the fun of that?"


Greg Keim (keim@cs.duke.edu)
Last modified: Tue Jun 1 20:52:17 EDT 1999