Proverb consists of a set of programs that act as ``experts.'' Each expert knows how to solve a different type of clue and returns as many suggestions of the correct length as it can, marking each candidate answer with a confidence score. For example, the Movie Expert suggests answers for clues like <1954 mutant ants film : THEM> and <Farrow of ``Peyton Place'' : MIA>. In the second case, it actually gives back MIA (0.90), TOM (0.01), KIP (0.01), BEN (0.01), PEG (0.01), ... to make sure it doesn't miss the correct answer. It also knows how to answer fill-in-the-blank clues on character names and movie titles like <``Born in _____'' Cheech Marin film : EASTLA>. Non-movie clues are ignored by the Movie Expert.
The Puzzle Expert has a database of 5133 previously published crossword puzzles (that's about 14 years' worth of puzzles at a rate of one per day). It recognizes clues it has seen before both by exact match and a type of fuzzy match used in the field of information retrieval. It also uses ``data mining'' techniques to find patterns in the clues. It detects that ``X in France'' and ``Nice X'' often have the same answer (because of the French city of Nice). This allows it to solve <Nice story : ETAGE> using the clue <Story in France : ETAGE> in its database.
One expert has a database of compound words and short phrases, and knows how to answer clues like <Kind of duck or letter : DEAD>, <Suffix with switch or sock : EROO>, and <Cogito sum middle : ERGO>. Another expert deals with clues with multiword answers and is a big help on clues like <Don't say another word : BUTTONYOURLIP>, <T.V. Series? : SONYRCAMAGNOVOX>, and <Tall footwear for rappers? : HIPHOPBOOTS>. Other experts focus on the Encyclopedia, Literature, Geography, Music, and Synonyms. Each student in the crossword class contributed at least one expert. Running the complete set of 30 experts on a puzzle takes about 4 minutes distributed over a set of 14 computers.
Once all the experts have made their recommendations, complete with confidence scores, another specialized program takes over. Its job is to fit the suggested words into the puzzle grid, maximizing an overall measure of confidence. It uses as many high-confidence words as it can to accomplish this. This grid-filling process takes approximately 10 minutes after the experts have compiled their recommendation lists.
To test the system, we had Proverb attack a sample of 370 recent puzzles. It averaged 95.3% words correct and 98.1% letters correct per puzzle with perfect solutions on 46% of the puzzles. Puzzles were taken from seven different sources: New York Times (89.5% words correct), LA Times (98.0% words correct), USA Today (96.8% words correct), TV Guide (96.2% words correct), and several crossword syndicates. The New York Times puzzles were quite interesting, because the puzzles designed to be easier for humans (Monday through Wednesday: 95.5% words correct) were easier for Proverb than the puzzles designed to be harder for humans (Thursday through Sunday: 85.0%). Its performance is well above that of casual human crossword solvers.
We also tested Proverb using puzzles from the 1998 American Crossword Puzzle Tournament. Based on its accuracy and solving time, Proverb would have come in 213th place out of 252 tournament participants. On sufficiently fast computers, Proverb's standing would improve to 190th place. We are planning a number of changes to the system to further improve its accuracy and coverage and expect its performance to climb in the coming months.
Proverb is a ``Probabilistic Cruciverbalist'' in that it is a crossword puzzle solver (cruciverbalist) that bases its decisions on probability theory (confidence scores). Thus, Proverbs 022:021 is relevant: ``That I might make thee know the certainty of the words of truth...'' In other words, ``I am trying to tell you the probability of the correct answers.''