The Intelligent Word Prediction Project

Intelligent Word Prediction uses knowledge of syntax and word frequencies to predict the next word in a sentence as the sentence is being entered, and updates this prediction as the word is typed. The intended application of this project is to accelerate and facilitate the entry of words into an augmentative communication device by offering a shortcut to typing entire words. A prototype version has been written in LISP, using a probabilistic augmented transition network (ATN) parser and a lexicon containing word frequencies and subcategorizations.

Table of Contents

Purpose
Method
Progress
Acknowledgements

Contributors

Pat Demasco, Kathy McCoy, Mark Conrad, Chris Pennington, Peter Vanderheyden

Lisa Michaud -- michaud@asel.udel.edu

Last modified: Fri Jan 30 15:39:34 EST 1998

The goal of this project is to develop a word prediction system to assist individuals with disabilities who use augmentative and alternative communication (AAC). By modelling the linguistic constraints that apply to a sentence, the system predicts the next word to be produced in the sentence, and narrows the prediction search as each subsequent letter of the word is entered. Applying this technique to AAC would facilitate and accelerate augmented communication by offering a shortcut to typing every word in its entirety.

A long-term goal is to unify this project with our Compansion and Language Representation Database projects, in order to incorporate a more sophisticated body of linguistic information into augmentative communication systems. Consistent with this goal, an important implementation goal is to develop code that is object-oriented and modular.

The first word of the sentence is predicted on the basis of the frequency of words in the sentence-initial position. Each word, as it is entered, is incorporated into an augmented transition network (ATN), which maintains a branch for every acceptable syntactic interpretation of the sentence segment so far.
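The sentence-initial step can be illustrated with a minimal Python sketch. It is not the project's LISP code: the toy corpus and the function names initial_word_counts and predict_first_word are assumptions made for illustration, and the real system draws its frequencies from its own lexicon.

```python
from collections import Counter


def initial_word_counts(sentences):
    """Count how often each word opens a sentence (toy stand-in for real frequency data)."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        if words:
            counts[words[0]] += 1
    return counts


def predict_first_word(counts, n=5):
    """Return up to n sentence-initial words, most frequent first."""
    # Sort by descending count, then alphabetically so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical corpus, invented for this example only.
corpus = [
    "I want water",
    "I am tired",
    "The dog is here",
    "Please help me",
    "the bus is late",
]
first_word_predictions = predict_first_word(initial_word_counts(corpus))
print(first_word_predictions)
```

Once the first word has been chosen, prediction no longer relies on position alone but on the syntactic interpretations that remain open.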

The grammar contains probabilities for each ATN structure, and the lexicon contains the frequencies with which words occur in each of their possible subcategorizations. Combining these statistics yields a list of possible next words, the five most probable of which are offered to the user. The user may reject the predicted word list and begin entering the word by hand. As each letter is entered, words that are not consistent with the word segment entered so far are filtered out, and the prediction list is updated.
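The ranking and filtering described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: the text does not say how the grammar and lexicon statistics are combined, so the sketch simply sums, over a word's categories, the grammar probability of the category times the word's frequency in that category. All numbers, the lexicon entries, and the names score_words and predict are hypothetical.

```python
# Hypothetical probabilities of the next syntactic category, standing in for
# what the ATN grammar would supply for the sentence segment entered so far.
category_probs = {"noun": 0.6, "adjective": 0.3, "verb": 0.1}

# Hypothetical lexicon: word -> {category: relative frequency in that category}.
lexicon = {
    "wrong": {"noun": 0.05, "adjective": 0.30},
    "water": {"noun": 0.40, "verb": 0.10},
    "warm": {"adjective": 0.50, "verb": 0.20},
    "dog": {"noun": 0.35},
    "window": {"noun": 0.10},
    "walk": {"verb": 0.70},
}


def score_words(category_probs, lexicon):
    """Score each word by summing P(category) * frequency over its categories.

    The summation is an assumed combination rule, chosen for simplicity.
    """
    return {
        word: sum(category_probs.get(cat, 0.0) * freq for cat, freq in cats.items())
        for word, cats in lexicon.items()
    }


def predict(scores, prefix="", n=5):
    """Return the n best-scoring words that start with the letters typed so far."""
    candidates = [(w, s) for w, s in scores.items() if w.startswith(prefix)]
    # Highest score first; alphabetical order breaks ties deterministically.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in candidates[:n]]


scores = score_words(category_probs, lexicon)
print(predict(scores))        # initial list of five candidates
print(predict(scores, "w"))   # after the user rejects the list and types "w"
print(predict(scores, "wa"))  # after the user types "wa"
```

Each additional letter only narrows the candidate set, so the scores need to be computed once per word position and can be reused while the word is being typed.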

It is assumed that the words in the sentence are entered in correct grammatical order. The accuracy of the system's predictions will depend on the accuracy of the syntactic rule probabilities and word frequencies.

The current system is written in LISP and is in the process of being converted to CLOS (Common Lisp Object System). The grammar is quite extensive, but not yet complete. The lexicon contains over 12,000 word subcategorizations (e.g., 'wrong' as both a noun and an adjective).

Future work includes developing a more extensive dictionary and grammar.

This work has been supported by a Rehabilitation Engineering Center grant from the National Institute on Disability and Rehabilitation Research. Additional support has been provided by the Nemours Foundation.