WORD COMPANSION: ALLOWING DYNAMIC WORD ABBREVIATIONS

Patrick Demasco, Matthew Lillard, Kathleen McCoy
Applied Science and Engineering Laboratories
University of Delaware and AI DuPont Institute

(c) 1989 RESNA Press. Reprinted with permission.

Abstract

In this paper we describe a system, which we call word compansion, designed to enhance the communication rate of a disabled individual who uses a spelling technique to enter his or her desired message. The system saves the user keystrokes by allowing individual words to be entered in abbreviated form. This work is novel in that the abbreviations are not decided in advance, but are dynamically created by the user at input time.

Introduction

Up until now, abbreviation expansion has been the method used by systems that allow abbreviated word input from the user. Abbreviation expansion allows the user to enter a sequence of letters (one that does not correspond to a word) as an abbreviation for a word or phrase [4]. Letter sequences are typically delimited with spaces. The system looks the sequence up in a table that pairs each abbreviation with its true word or phrase representation (its expansion). If an expansion is returned, it is substituted for the abbreviation in the text stream.

Word compansion is similar to abbreviation expansion in function but has important differences. The biggest difference is seen from the point of view of the user of the system. In abbreviation expansion the compressed sequences must be decided in advance and memorized by the user. While this constraint allows the system to work very fast, it places an unreasonable burden on the user. In word compansion this burden is lifted: the user is permitted to input novel compressions for words.
The system, using its knowledge of the language, of the likelihood that certain sequences of letters are part of an English word, and of how a human is most likely to form a compression of a word, returns a set of one or more possible interpretations of the user's input.

Background

In word compansion, the major task is to come up with the correct expansion of the user's dynamically compressed word in a reasonable amount of time. Our work has borrowed from work in word recognition for heuristics to choose the correct expansion, and from abbreviation systems for insight into the kinds of abbreviations we might expect.

Word Recognition

A great deal of psychological literature has been devoted to the cognitive processes that occur during reading. One major area of study is word recognition. Most models of word recognition are based on a number of cooperative processes that transform visual information into a semantically meaningful collection of words. From this area of study we have drawn several relevant concepts.

Orthographic - Venesky and Massaro [5] present a two-component model that describes the role of orthographic information in word recognition. Statistical redundancy processes are based on the frequency of occurrence of letter sequences. Rule-governed regularity is based on phonological constraints that exist in spoken language.

Lexical Access - Burani et al. [1] discuss lexical access from the point of view of morphological structure. One of the major components of lexical access is word frequency; a higher word frequency results in quicker access. This frequency has two major components: a measure of frequency over the individual's total experience, and a local frequency representative of recent experience.

Context cues - The effects of context on word recognition are widely disputed [3], although there is some agreement on what the actual mechanism should be.
Previous words set up an expectancy that helps limit the domain of possible words.

Abbreviation Systems

Many abbreviation techniques have been developed for use by secretaries for real-time transcription of spoken language. Speedwriting is an example of such a system in current use [2]. It combines some simple shorthand principles with abbreviation techniques. An example rule is vowel deletion: all vowels may be deleted except those that begin the word.

Model

A model has been developed for word compansion that incorporates several of the concepts previously discussed. The model enables the user to use two different kinds of abbreviation techniques.

Upon receiving a letter sequence from the user, the system first checks whether that sequence is a word in the dictionary or an entry in a user-programmed abbreviation expansion table. If it is neither, the system checks whether the input string is a prefix of a word in the system's dictionary (currently nearly 6000 words). This step enables the user to use an abbreviation such as "prob" for the word "problem".

If both of the above steps fail to identify the word, the input sequence is assumed to be a dynamically created abbreviation. Our strategy in this case is based on three assumptions: (1) the initial letter in the sequence is rarely deleted (with the exception of an e preceding an x, as in "extra"), (2) the most commonly deleted elements will be vowels and double consonants, and (3) each letter in the abbreviation string will occur in the target word in the order in which it occurs in the abbreviation.

Our strategy works in two directions at the same time: from the input abbreviation to potential words in the dictionary, and from potential words in the dictionary to the abbreviation.
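The three assumptions can be read as a test applied in the dictionary-to-abbreviation direction. The sketch below is only an illustration under stated assumptions, not the system's actual code: the function names are hypothetical, the vowels are taken to be a, e, i, o, u, and `compress` models only the vowel-deletion and double-consonant rules.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def keeps_initial_letter(abbrev: str, word: str) -> bool:
    """Assumption (1): the first letter is kept, except an 'e' before 'x'."""
    if abbrev[0] == word[0]:
        return True
    return word.startswith("ex") and abbrev[0] == "x"


def is_subsequence(abbrev: str, word: str) -> bool:
    """Assumption (3): abbreviation letters occur in the word, in order."""
    letters = iter(word)
    # 'ch in letters' advances the iterator, so order is enforced.
    return all(ch in letters for ch in abbrev)


def consistent(abbrev: str, word: str) -> bool:
    """True if the word could be the target of the abbreviation."""
    if not abbrev or not word:
        return False
    return keeps_initial_letter(abbrev, word) and is_subsequence(abbrev, word)


def compress(word: str) -> str:
    """Assumption (2): drop non-initial vowels and collapse double consonants."""
    out = [word[0]]
    for prev, ch in zip(word, word[1:]):
        if ch in VOWELS or ch == prev:
            continue
        out.append(ch)
    return "".join(out)


print(compress("problem"))            # prblm
print(compress("letter"))             # ltr
print(consistent("xtra", "extra"))    # True
print(consistent("pbr", "problem"))   # False: 'r' does not follow 'b'
```

A user is free to delete fewer or more letters than `compress` does; `consistent` accepts any such abbreviation as long as assumptions (1) and (3) hold.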
This two-way strategy enables us to constrain our search from both directions: we do not consider expansions of the input sequence that cannot correspond to English words, nor do we consider English words that lack crucial resemblances to the input sequence.

Of course, for this enterprise to be successful, a small list of potential dictionary word candidates must be identified quickly. Since retrieving words with a particular prefix can be done quickly, we have chosen this as our retrieval method. Given the user's input sequence, we must identify possible prefixes for the target word. Based on the above assumptions, we go to the first position in the input sequence from which letters could have been deleted (between the first two given letters) and create a list of potential prefixes for the target word by inserting all combinations of 0, 1, 2, or 3 vowel strings that occur in the words of the language. We then retrieve from the dictionary all words that have any one of the identified prefixes. Each member of this list is compared with the original input string, and every member that contains all letters of the original input sequence, in the order in which they occur in that sequence, is returned.

If exactly one word results, this word replaces the input sequence in the user's input string. If more than one possibility results, the system opens a special window and scans through the list of possibilities, ordered by their frequency of occurrence in the language. The user chooses the desired word by hitting any key.

Discussion

The generality of the method allows the user to be most terse in the dynamically created abbreviation. On the other hand, because the system places so few constraints on the user, it may often be unable to narrow down the choice of words to exactly one.
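To make this trade-off concrete, the steps of the model can be sketched end to end. This is a minimal illustration under several assumptions, not the implemented system: all names and the tiny frequency table are hypothetical, "vowel strings" is read as vowel-only substrings of length 0 to 3 found in the dictionary, multiple prefix matches are ranked by frequency, a linear scan stands in for fast prefix retrieval, and the "e before x" exception is omitted.

```python
from typing import Dict, List, Set

VOWELS = set("aeiou")


def is_subsequence(abbrev: str, word: str) -> bool:
    """All abbreviation letters occur in the word, in the same order."""
    letters = iter(word)
    return all(ch in letters for ch in abbrev)


def vowel_strings(words, max_len: int = 3) -> Set[str]:
    """Vowel-only substrings of length 0..max_len seen in the dictionary."""
    found = {""}  # the empty string covers "nothing was deleted"
    for w in words:
        for i in range(len(w)):
            for n in range(1, max_len + 1):
                chunk = w[i:i + n]
                if len(chunk) == n and all(c in VOWELS for c in chunk):
                    found.add(chunk)
    return found


def expand(abbrev: str, freq: Dict[str, int], table: Dict[str, str]) -> List[str]:
    """Return candidate expansions, most frequent first."""
    def rank(ws):
        return sorted(ws, key=lambda w: -freq[w])

    # Stage 1: a dictionary word or a user-programmed abbreviation.
    if abbrev in freq:
        return [abbrev]
    if abbrev in table:
        return [table[abbrev]]
    # Stage 2: the input is a prefix of one or more dictionary words.
    by_prefix = [w for w in freq if w.startswith(abbrev)]
    if by_prefix:
        return rank(by_prefix)
    # Stage 3: dynamic abbreviation. Insert vowel strings between the
    # first two letters, retrieve by prefix, then filter by letter order.
    if len(abbrev) < 2:
        return []
    prefixes = tuple(abbrev[0] + v + abbrev[1] for v in vowel_strings(freq))
    return rank(w for w in freq
                if w.startswith(prefixes) and is_subsequence(abbrev, w))


# Hypothetical frequency counts, for illustration only.
FREQ = {"problem": 120, "probable": 40, "parable": 5,
        "people": 300, "purple": 20, "the": 1000}
TABLE = {"asap": "as soon as possible"}

print(expand("prob", FREQ, TABLE))   # ['problem', 'probable']
print(expand("prblm", FREQ, TABLE))  # ['problem']
print(expand("prbl", FREQ, TABLE))   # ['problem', 'probable', 'parable']
print(expand("ppl", FREQ, TABLE))    # ['people']
```

In this toy dictionary, dropping a single letter from "prblm" to "prbl" already raises the number of candidates from one to three, while "ppl" stays unambiguous because "purple" does not begin with any generated prefix.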
This ambiguity can be alleviated by customizing the system to the abbreviation strategies of individual users and/or by enhancing the system with information about the syntactic and semantic context of the compressed sequence. Strategy customization may be particularly useful in the second stage of processing (i.e., in matching the input sequence against potential dictionary words). This match currently allows insertion of any number of arbitrary letters. It may be useful to place limits on the number of letters deleted and on their type (e.g., vowels). The addition of syntactic and semantic contextual constraints is the subject of continuing research in this area.

Acknowledgments

This work is supported by Grant #H133E80015 from the National Institute on Disability and Rehabilitation Research. Additional support was provided by the Nemours Foundation.

References

[1] Burani, C., Salmaso, D., & Caramazza, A. Morphological structure and lexical access. Visible Language, XVIII 4, 1984.

[2] Pullis, J. M. Principles of Speedwriting Shorthand. Glencoe Publishing Company, Mission Hills, CA, 1987.

[3] Stanovitch, K. E., Nathan, R. G., West, R. F., & Vala-Rossi, M. Children's Word Recognition in Context: Spreading Activation, Expectancy, and Modularity. Child Development, Vol. 56, pp. 1418-1428, 1985.

[4] Vanderheiden, G. C. A High Efficiency Flexible Keyboard Input Acceleration Technique: Speedkey. In Proc. of 2nd Int. Conf. on Rehab. Eng., Ottawa, 1984, pp. 353-354.

[5] Venesky, R. L., & Massaro, D. W. The role of orthographic regularity on word recognition. In L. Resnick & P. Weaver (Eds.), Theory and practice of early reading. Hillsdale, N.J.: Erlbaum, 1979.

Contact

Patrick Demasco
Applied Science and Engineering Laboratories
AI DuPont Institute
PO Box 269
Wilmington, DE 19899