WORD COMPANSION: ALLOWING DYNAMIC WORD ABBREVIATIONS

Patrick Demasco, Matthew Lillard, Kathleen McCoy
Applied Science and Engineering Laboratories
University of Delaware and AI DuPont Institute

(c) 1989 RESNA Press. Reprinted with permission.

Abstract

In this paper we describe a system, which we call word compansion,
designed to enhance the communication rate of a disabled individual
who uses a spelling technique to enter his/her desired message. The
system saves the user keystrokes by accepting individual words in
abbreviated form. This work is novel in that the abbreviations are not
decided in advance but are created dynamically by the user at input
time.

Introduction

Until now, abbreviation expansion has been the method used by systems
that accept abbreviated word input from the user. Abbreviation
expansion lets the user enter a sequence of letters (one that does not
correspond to a word) as an abbreviation for a word or phrase [4].
Letter sequences are typically delimited with spaces. The system looks
up each sequence in a table that pairs abbreviations with the full word
or phrase they represent (the expansion). If an expansion is found, it
is substituted for the abbreviation in the text stream.
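
To make the table lookup concrete, here is a minimal Python sketch of
conventional abbreviation expansion. The table entries and the function
name expand_abbreviations are hypothetical illustrations, not taken
from any particular system, and tokens are assumed to be separated by
single spaces.

```python
# Hypothetical user-programmed abbreviation table (entries are illustrative).
ABBREVIATIONS: dict[str, str] = {
    "ty": "thank you",
    "asap": "as soon as possible",
}


def expand_abbreviations(text: str, table: dict[str, str]) -> str:
    """Replace each space-delimited token that appears in `table` with its expansion.

    Tokens not found in the table are passed through unchanged.
    """
    return " ".join(table.get(token, token) for token in text.split(" "))


print(expand_abbreviations("ty for the help", ABBREVIATIONS))  # prints: thank you for the help
```

Each token costs a single table access, which is why this approach is
fast; the price is that every entry must be decided and memorized in
advance.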

Word compansion is similar to abbreviation expansion in function but
has important differences. The biggest difference is seen from the
point of view of the user of the system. In abbreviation expansion the
compressed sequences must be decided in advance and memorized by the
user. While this constraint allows the system to work very fast, it
places an unreasonable burden on the user. In word compansion this
burden is lifted. Using word compansion the user is permitted to input
novel compressions for words. The system, using its knowledge of the
language, its knowledge of how likely particular letter sequences are
to occur in English words, and its knowledge of how people tend to
compress words, returns a set of one or more possible interpretations
of the user's input.

Background

In doing word compansion, the major task is to come up with the
correct expansion of the user's dynamically compressed word in a
reasonable amount of time. Our work has borrowed from work in word
recognition for heuristics to choose the correct expansion, and from
abbreviation systems for insight into the kinds of abbreviations we
might expect.

Word Recognition
A great deal of psychological literature has been devoted to the
cognitive processes that occur during reading. One major area of study
is word recognition. Most models of word recognition are based on a
number of cooperative processes that transform visual information into
a semantically meaningful collection of words. From this area of study
we have drawn several relevant concepts.

Orthographic - Venesky and Massaro [5] present a two-component model
that describes the role of orthographic information in word
recognition. The first component, statistical redundancy, is based on
the frequency with which letter sequences occur. The second,
rule-governed regularity, is based on the phonological constraints
that exist in spoken language.
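
The statistical-redundancy idea can be illustrated by counting how
often adjacent letter pairs (bigrams) occur in a word list. This Python
sketch only illustrates the concept; it is not Venesky and Massaro's
model, and the choice of bigrams and the names bigram_counts and
redundancy_score are our own assumptions.

```python
from collections import Counter
from collections.abc import Iterable


def bigram_counts(words: Iterable[str]) -> Counter:
    """Count adjacent letter pairs (bigrams) across a word list."""
    counts: Counter = Counter()
    for word in words:
        w = word.lower()
        counts.update(w[i:i + 2] for i in range(len(w) - 1))
    return counts


def redundancy_score(letters: str, counts: Counter) -> float:
    """Average bigram count of `letters`; higher means more familiar letter pairs."""
    pairs = [letters[i:i + 2] for i in range(len(letters) - 1)]
    if not pairs:
        return 0.0
    # Counter returns 0 for unseen bigrams without inserting them.
    return sum(counts[p] for p in pairs) / len(pairs)


counts = bigram_counts(["the", "then", "than", "this"])
print(redundancy_score("the", counts), redundancy_score("tqh", counts))  # prints: 3.0 0.0
```

Strings built from common pairs score higher than strings with unusual
pairs, which is one way a system could favor candidate letter strings
that look like English.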

Lexical Access - Burani et al. [1] discuss lexical access from the
point of view of morphological structure. One of the major factors in
lexical access is word frequency: the higher a word's frequency, the
more quickly it is accessed. This frequency has two major components:
a measure of frequency over the individual's total experience and a
local frequency representative of recent experience.
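
A frequency measure with a long-term and a recent component might be
represented as follows. This is a hypothetical Python sketch: the
window size, the weight given to recent use, the weighted-sum
combination, and the class name WordFrequency are assumptions chosen
for illustration, not values from Burani et al. or from our system.

```python
from collections import Counter, deque


class WordFrequency:
    """Hypothetical two-component word frequency: long-term counts plus recent use."""

    def __init__(self, long_term: dict[str, int], window: int = 50,
                 recent_weight: float = 5.0) -> None:
        self.long_term = Counter(long_term)   # frequency over total experience
        self.recent = deque(maxlen=window)    # the last `window` words produced
        self.recent_weight = recent_weight    # assumed weight of recent use

    def observe(self, word: str) -> None:
        """Record that the user has just produced `word`."""
        self.long_term[word] += 1
        self.recent.append(word)

    def score(self, word: str) -> float:
        """Combine both components (a weighted sum, by assumption)."""
        local = self.recent.count(word)
        return self.long_term[word] + self.recent_weight * local


wf = WordFrequency({"problem": 3, "probably": 5})
wf.observe("problem")
print(wf.score("problem"), wf.score("probably"))  # prints: 9.0 5.0
```

After one recent use, "problem" outranks "probably" until it drops out
of the recent window, mirroring the idea that recent experience speeds
access.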

Context Cues - The effects of context on word recognition are widely
disputed [3], although there is some agreement on what the actual
mechanism should be. Previous words set up an expectancy that helps
limit the domain of possible words.

Abbreviation Systems
Many abbreviation techniques have been developed for use by
secretaries in the real-time transcription of spoken language.
Speedwriting is one such system in current use [2]. It combines some
simple shorthand principles with abbreviation techniques. One example
rule is vowel deletion: all vowels are deleted except those that begin
the word.
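
The vowel-deletion rule is simple enough to state as code. In this
Python sketch we treat only the first letter as word-initial and count
a, e, i, o, and u as vowels; both choices are our assumptions about how
the rule is applied, not a statement of the Speedwriting system itself.

```python
VOWELS = frozenset("aeiou")


def delete_vowels(word: str) -> str:
    """Delete every vowel except one that is the first letter of the word."""
    if not word:
        return word
    return word[0] + "".join(c for c in word[1:] if c.lower() not in VOWELS)


for w in ["problem", "abbreviation", "extra"]:
    print(w, "->", delete_vowels(w))
```

Under this rule "extra" keeps its initial e and becomes "extr"; users
of a dynamic system may drop that e as well, which is the exception
noted in the first assumption of our model.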

Model

A model has been developed for word compansion that incorporates
several of the concepts previously discussed. The model enables the
user to use two different kinds of abbreviation technique. Upon
receiving a letter sequence from the user, the system first checks
whether that sequence is a word in the dictionary or an entry in a
user-programmed abbreviation expansion table. If it is neither, the
system checks whether the input string is a prefix of a word in the
system's dictionary (currently nearly 6000 words). This step enables
the user to use an abbreviation such as "prob" for the word "problem".
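
The first two steps, an exact lookup followed by a prefix match, might
look like this in Python. The sketch assumes the dictionary is a sorted
list of lowercase words, so that all words sharing a prefix are
contiguous; the function name first_pass and the order in which the
dictionary and the abbreviation table are consulted are our
assumptions.

```python
from bisect import bisect_left


def first_pass(seq: str, dictionary: list[str], table: dict[str, str]) -> list[str]:
    """Steps 1 and 2: exact word or table entry, then words having `seq` as a prefix.

    `dictionary` must be sorted. An empty result means `seq` should be treated
    as a dynamically created abbreviation.
    """
    i = bisect_left(dictionary, seq)
    if i < len(dictionary) and dictionary[i] == seq:
        return [seq]                  # step 1: the sequence is itself a word
    if seq in table:
        return [table[seq]]           # step 1: user-programmed abbreviation
    matches = []
    # Step 2: every word with prefix `seq` sorts at or after position i.
    while i < len(dictionary) and dictionary[i].startswith(seq):
        matches.append(dictionary[i])
        i += 1
    return matches


words = sorted(["probable", "probably", "problem", "process", "the"])
print(first_pass("prob", words, {}))  # prints: ['probable', 'probably', 'problem']
```

Because words sharing a prefix are adjacent in sorted order, retrieval
costs one binary search plus the size of the result, which is what
makes prefix retrieval attractive for the later steps of the model.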

If both of the above steps fail to identify the word, the input
sequence is assumed to be a dynamically created abbreviation. Our
strategy in this case is based on three assumptions: (1) the initial
letter of the target word is rarely deleted (the exception being an e
preceding an x, as in "extra"); (2) the most commonly deleted elements
are vowels and double consonants; and (3) each letter in the
abbreviation occurs in the target word in the same order in which it
occurs in the abbreviation.
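
Assumptions (1) and (3) can act as hard filters on a candidate word,
while assumption (2) describes which deletions are likely rather than
which are allowed. A minimal Python sketch of the two filters follows;
the function name fits_assumptions is hypothetical, and treating (1) as
a strict rule apart from the e-before-x exception is our
simplification of "rarely deleted".

```python
def fits_assumptions(abbrev: str, word: str) -> bool:
    """Check assumptions (1) and (3) for a candidate expansion `word`.

    (1) The first letters agree, except that an initial "e" before "x" may be dropped.
    (3) Every letter of `abbrev` occurs in `word` in the same order.
    """
    if not abbrev or not word:
        return False
    if abbrev[0] != word[0] and not (abbrev[0] == "x" and word.startswith("ex")):
        return False
    remaining = iter(word)
    # `ch in remaining` consumes the iterator up to the first match, so the
    # letters must be found left to right: an in-order (subsequence) test.
    return all(ch in remaining for ch in abbrev)


print(fits_assumptions("xtra", "extra"), fits_assumptions("rblm", "problem"))  # prints: True False
```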

Our strategy works in two directions at the same time: from the input
abbreviation to potential words in the dictionary, and from potential
words in the dictionary to the abbreviation. This enables us to
constrain our search from two directions: we do not consider
expansions of the input sequence which cannot correspond to English
words, nor do we consider English words which do not have crucial
resemblances to the input sequence.

Of course, for this approach to succeed, a small list of candidate
dictionary words must be identified quickly. Since words with a
particular prefix can be retrieved quickly, we have chosen prefix
retrieval as our method. Given the user's input sequence, we must
identify possible prefixes of the target word. Following the
assumptions above, we go to the first position in the input sequence
from which letters could have been deleted (between the first two
given letters) and create a list of potential prefixes for the target
word by inserting there each string of 0, 1, 2, or 3 vowels that
occurs in the words of the language. We then retrieve from the
dictionary all words that begin with any of the identified prefixes.
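
The prefix-generation step can be sketched in Python as follows. Here
"vowel strings which occur in the words of the language" is
approximated by the maximal runs of one to three vowels found in the
dictionary itself, plus the empty string; that approximation, the
helper names, and the toy word list are assumptions for illustration.

```python
import re
from bisect import bisect_left


def vowel_strings(words: list[str], max_len: int = 3) -> set[str]:
    """Vowel runs of length 1..max_len that occur in `words`, plus the empty string."""
    found = {""}
    for w in words:
        found.update(run for run in re.findall(r"[aeiou]+", w) if len(run) <= max_len)
    return found


def candidate_prefixes(abbrev: str, vowels: set[str]) -> set[str]:
    """Insert each vowel string between the first two letters of `abbrev`."""
    if len(abbrev) < 2:
        return {abbrev}
    return {abbrev[0] + v + abbrev[1] for v in vowels}


def with_prefix(sorted_words: list[str], prefix: str) -> list[str]:
    """All words in `sorted_words` beginning with `prefix` (one binary search)."""
    i = bisect_left(sorted_words, prefix)
    out = []
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        out.append(sorted_words[i])
        i += 1
    return out


def retrieve(abbrev: str, sorted_words: list[str]) -> list[str]:
    """Union of the words retrieved for every candidate prefix."""
    vowels = vowel_strings(sorted_words)  # a real system would precompute this
    found: set[str] = set()
    for p in candidate_prefixes(abbrev, vowels):
        found.update(with_prefix(sorted_words, p))
    return sorted(found)


words = sorted(["parable", "pearl", "print", "problem", "proud", "table"])
print(retrieve("prblm", words))  # prints: ['parable', 'pearl', 'print', 'problem', 'proud']
```

"table" is never examined because no generated prefix matches it; of
the retrieved words, only "problem" contains all of the input letters
in order.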

Each member of this list is compared with the original input string,
and every member that contains all letters of the input sequence, in
the order in which they occur in that sequence, is returned. If
exactly one word results, this word replaces the input sequence in the
user's input string. If more than one possibility results, the system
opens a special window and scans through the possibilities in order of
their frequency of occurrence in the language. The user chooses the
desired word by hitting any key when it is presented.
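
The matching and ordering step might be sketched as below. The
frequency figures are hypothetical (a real system would use counts
from a corpus), and the name rank_matches is ours. The function returns
the ordered list, leaving the caller to substitute the word directly
when the list has exactly one entry and to open the scanning window
otherwise.

```python
def in_order(abbrev: str, word: str) -> bool:
    """True if every letter of `abbrev` occurs in `word` in the same order."""
    remaining = iter(word)
    return all(ch in remaining for ch in abbrev)


def rank_matches(abbrev: str, candidates: list[str],
                 frequency: dict[str, int]) -> list[str]:
    """Keep candidates passing the in-order test, most frequent first."""
    matches = [w for w in candidates if in_order(abbrev, w)]
    # Unknown words get frequency 0; sorted() stays stable with reverse=True.
    return sorted(matches, key=lambda w: frequency.get(w, 0), reverse=True)


choices = rank_matches("prd", ["proud", "period", "prod"],
                       {"proud": 30, "period": 80, "prod": 5})
if len(choices) == 1:
    print("replace input with", choices[0])
else:
    print("scan:", choices)  # prints: scan: ['period', 'proud', 'prod']
```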

Discussion

The generality of the method allows the user to be as terse as desired
in a dynamically created abbreviation. On the other hand,
because the system places so few constraints on the user, the system
may often be unable to narrow down the choice of words to exactly one.
This can be alleviated by customizing the system to the abbreviation
strategies of individual users and/or by enhancing the system with
information about the syntactic and semantic context of the compressed
sequence. Strategy customization may be particularly useful in the
second stage of processing (i.e., in matching the input sequence
against potential dictionary words). This match currently allows
insertion of any number of arbitrary letters. It may be useful to
place limits on the number of letters deleted and their type (e.g.,
vowels). The addition of syntactic and semantic contextual constraints
is the subject of continuing research in this area.
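
One way to impose such limits is to require that every omitted letter
be a vowel or a repeat of the letter before it (echoing assumption
(2)) and to cap how many letters may be omitted. The Python sketch
below is a hypothetical illustration of that idea, not part of the
current system; the cap of three and the name constrained_match are
assumptions. It searches all alignments with memoization rather than
matching greedily, because a greedy left-to-right match can reject a
word that another alignment would accept.

```python
from functools import lru_cache

VOWELS = frozenset("aeiou")


def constrained_match(abbrev: str, word: str, max_deleted: int = 3) -> bool:
    """True if deleting at most `max_deleted` letters from `word` yields `abbrev`,
    where each deleted letter is a vowel or repeats the letter before it,
    and the first letter of `word` is kept."""
    if not abbrev or not word or abbrev[0] != word[0]:
        return False
    # Any alignment deletes exactly len(word) - len(abbrev) letters.
    if not 0 <= len(word) - len(abbrev) <= max_deleted:
        return False

    def deletable(j: int) -> bool:
        return word[j] in VOWELS or word[j] == word[j - 1]

    @lru_cache(maxsize=None)
    def search(i: int, j: int) -> bool:
        # i: next letter of abbrev to place; j: next letter of word to consume.
        if j == len(word):
            return i == len(abbrev)
        if i < len(abbrev) and abbrev[i] == word[j] and search(i + 1, j + 1):
            return True
        return deletable(j) and search(i, j + 1)

    return search(1, 1)


print(constrained_match("prblm", "problem"), constrained_match("prm", "problem"))  # prints: True False
```

With these limits, "prm" no longer matches "problem" (the omitted b and
l are neither vowels nor repeated letters), whereas the unconstrained
in-order match used by the current system accepts it.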

Acknowledgments

This work is supported by Grant #H133E80015 from the National
Institute on Disability and Rehabilitation Research. Additional
support was provided by the Nemours Foundation.

References

[1] Burani, C., Salmaso, D., & Caramazza, A. Morphological structure
and lexical access. Visible Language, Vol. 18, No. 4, 1984.

[2] Pullis, J. M., Principles of Speedwriting Shorthand. Glencoe
Publishing Company, Mission Hills, CA, 1987.

[3] Stanovich, K. E., Nathan, R. G., West, R. F., & Vala-Rossi, M.
Children's Word Recognition in Context: Spreading Activation,
Expectancy, and Modularity. Child Development, Vol. 56,
pp. 1418-1428, 1985.

[4] Vanderheiden, G. C. A High Efficiency Flexible Keyboard Input
Acceleration Technique: Speedkey. In Proc. of 2nd Int. Conf. on
Rehab. Eng., Ottawa, 1984, pp. 353-354.

[5] Venesky, R. L., & Massaro, D. W. The role of orthographic
regularity on word recognition. In L. Resnick & P. Weaver (Eds.),
Theory and practice of early reading. Hillsdale, N.J.: Erlbaum, 1979.

Contact

Patrick Demasco
Applied Science and Engineering Laboratories
AI DuPont Institute
PO Box 269
Wilmington, DE 19599