[Archive: 1 May 1999]


Crossword wizard

Jonathan Beard

IS NOTHING SACRED? First computers learnt how to defeat people at chess and backgammon, and now, thanks to Duke University in Durham, North Carolina, they can even take us on at solving crossword puzzles--and sometimes win.

In March, Michael Littman of Duke University's computer science department, took his crossword-solving software, Proverb, to the American Crossword Puzzle Tournament in Stamford, Connecticut. While Proverb didn't compete in the contest, Littman reports that after analysing its performance in solving the same 15-minute puzzles presented to the contestants, it would have been placed 147th out of the 254 human solvers. He will present his results to a conference of the American Association of Artificial Intelligence in Orlando, Florida, in July.

Proverb, which runs on several PCs simultaneously, includes a complex array of databases and query algorithms, written in four different programming languages. It tackles puzzles in a way completely alien to human solvers. Faced with a typical quick newspaper puzzle, with clues such as "Sicilian spewer" (answer: ETNA), Proverb starts by sending all the clues and word lengths to each of its 30 modules, most of which are databases.

"Each software module then returns with a list--from dozens to over a thousand--of possible answers, ranking them according to the probability that each is correct," Littman told New Scientist. One database comprises 400 000 clues and associated answers from newspaper crosswords over the past 14 years.

Another module is what Littman calls a "degenerate" database which only looks for answers to clues based on direction: if words such as "point", "compass" or "direction" appear in clues for 3-letter words, the database will return answers like NNW and NNE--favourites of some puzzle compilers. One of the largest modules Proverb taps into is the ever-expanding Internet Movie Database (www.imdb.com), which is so big that Littman's team wrote a dedicated search engine to prevent searches taking too long and slowing down the overall crossword puzzle solving process.

Proverb was written in four computer languages because each has its own benefits, depending upon the task at hand. Simple searches to match crossword clues were written in PERL, Littman says, as it's good at scanning arbitrary text files and extracting information from them. The C language was used for number-crunching probabilities to decide on the best answer. Stringing the ensemble together were programs written in C++ and Java that coordinated the various modules.

Proverb takes the top-ranked trial answers--lists from each database module--and puts another program, "Merger", to work. "Merger balances the results from the lists, assigning a single probability to a master list of possible answers," Littman says. Finally, a program called Solver takes Merger's output and plugs the highest-scoring words into the crossword grid, then repeats the process, looking for the best fit until it runs out of time.

The idea for Proverb was hatched in 1997 when Will Shortz, puzzle editor at The New York Times, stated--shortly after IBM's Deep Blue computer defeated Garry Kasparov at chess--that computers would never understand English well enough to match humans. Of course, the computer does not really understand English. Littman says: "All it can do is find statistical relationships between words. But the [crossword] tournament showed that this is good enough to solve crosswords at a competitive level."

  • Find out more about Proverb, the Probabilistic Cruciverbalist, at www.cs.duke.edu/~keim/proverb/

    From New Scientist, 1 May 1999


    © Copyright New Scientist, RBI Limited 2000