VOCABULARY SELECTION FOR INTELLIGENT PARSING IN AN AAC SYSTEM FOR
APHASICS

Bruce R. Baker, Semantic Compaction Systems
Kathleen McCoy, University of Delaware / A.I. DuPont Institute
Sheela Stuart, University of Nebraska
Eric H. Nyberg 3rd, Carnegie Mellon University / Semantic Compactions

(c) 1991 RESNA Press. Reprinted with permission.

Abstract

A communication aid which incorporates a parser with limited
intelligence is under development, and is intended to help individuals
with aphasia to communicate. There are three important issues that
must be addressed in the design of such a system: language
representation (how to represent words, phrases, and topics in a way
accessible to a person with aphasia), vocabulary selection (which
words, etc., to include in the system), and syntactic and pragmatic
defaults (what kind of sentence structures should be included in the
parser's generative capacities). This paper explores a methodology for
designing an intelligent communication device that pays particular
attention to these issues. Much of what is discussed should also be
useful for designing conventional communication systems.

Background

The term aphasia refers to an acquired disturbance of communication
resulting from damage to areas of the brain that are responsible for
language function. Aphasia varies in terms of severity and predominant
symptoms, but for most people, aphasia involves problems in talking,
listening, reading, writing, and gesturing. Other motor and sensory
problems, such as dysarthria and apraxia, frequently coexist with
aphasia (Katz, 1990, p. 167). Defining aphasia in terms of general
versus specific language impairment leads to controversy about
classifying persons with aphasia into various types.

Those who require classification of specific impairment (Rosenbek,
LaPointe, & Wertz, 1989; Goodglass & Kaplan, 1972; Kertesz, 1979)
divide aphasic patients into groups according to salient
symptoms. There are numerous classification systems. Some of the most
popular systems reflect universally observed symptomatic
differences. Some people with aphasia talk a lot, while others speak
very little, leading to the binary classification of fluent versus
nonfluent aphasia (Goodglass & Kaplan, 1983b). Some aphasic patients
have predominant problems in understanding and others have predominant
problems in word finding. Thus classification may be made on these
bases: expressive-receptive aphasia (Weisenburg & McBride, 1935)
versus taxonomic categorization of aphasia (Kertesz, 1979).

Supporters of the generalist approach (Darley, 1982; Schuell, Jenkins
& Jimenez-Pabon, 1964) resist categorization, and maintain that
patients suffering from aphasia have in common symptoms which can be
described as impairment of the capacity for interpretation and
formulation of language symbols. Such symptoms include multimodal loss
or reduction in efficiency of the ability to decode and encode
conventional meaningful linguistic elements (such as morphemes and
larger syntactic units), reduced availability of vocabulary, reduced
efficiency in application of syntactic rules, reduced auditory
retention span, and impaired efficiency in input and output channel
selection (Darley, 1982).

Augmentative/alternative Communication (AAC) systems which would be
useful for people with aphasia must incorporate features which address
both the diversity and the commonality of this population. For
example, it may seem as though a system which provides speech output
for pre-stored sentences might be quite successful for an individual
with aphasia who is unable to formulate and articulate his or her own
sentences. However, a sentence vocabulary can be difficult to process
for many individuals with aphasia, because they may have difficulty
placing a sentence lexicon into working memory. Pictorial rather than
orthographic indices have been shown to improve the ability of some
aphasic individuals to access vocabulary (Steele, 1987), yet a system
making use of single-meaning pictures requires a picture for each
vocabulary item, making it impractical for all but the most limited
vocabularies. The generative power of a word-based system has been
deemed beyond the reach of many individuals with aphasia owing to
their problems with syntax and lexical access, although some systems
with elaborate hierarchical indices have been proposed (Steele, 1987).

Identifying the major needs of a broad range of individuals with
aphasia would seem to supply a direction for AAC application of
technology.  Kraat (1990) states that early research and clinical
reports suggest that AAC techniques might have three important roles
in aphasia treatment:

"First, as a compensatory or alternate means of communication in lieu
of spoken language, secondly, as a facilitation technique for the
re-acquisition of spoken language skills; and thirdly, as an
associative 'link' to enable spoken language skills to take place."
(p. 322)

The task of this paper is to report recent efforts and progress toward
providing a means of accomplishing the first role. A new type of
electronic communication device geared toward speech output for this
population is currently under development. The hope is that the device
will help people with aphasia to overcome both the lexical access
problem (by providing an appropriate interface for word selection) and
the syntactic production problem (by providing an intelligent parser
that can generate well-formed sentences from an underspecified input).

Statement of the Problem

The first problem that arises in the design of a communication aid for
people with aphasia is the representation of vocabulary. It is a
challenge to represent a large vocabulary in a transparent manner for
individuals experiencing substantial lexical access problems. The
iconic technique under exploration may provide individuals with a
cognitively syntonic representation of several hundred words.

Even with a transparent language representation, the size of the
vocabulary that such an individual can access is necessarily
limited. The second problem, thus, is in selecting an appropriate
vocabulary. The vocabulary must be large enough for gratifying
interaction, but small enough so that its access does not overwhelm an
individual with aphasia.

The goal is to provide an individual who has less than complete syntax
with the ability to create well-formed sentences by entering just a
string of content words. Thus, the third problem that must be
addressed is configuring the syntactic prosthesis. The intelligent
parser must be able to give an interpretation that is complete yet
appropriate to the user's style and needs. It is certainly possible to
decide in advance that certain more likely interpretations shall be
made for given types of sentences, but the system must have a
mechanism for incorporating the default interpretations that best
fit the particular individual.

A large number of individuals with aphasia experience their lesions in
the 7th or 8th decade of life. As adults reach their 60s, 70s and 80s,
there are changes in many aspects of their lives. One of the areas
that reflects this change is the way people of this age take part in
conversation. Older persons are listened to in ways that are different
from younger persons. Information expected from older persons is, at
least in part, determined by the age-grade role. The type of request
for information from them is often performed as though the aged person
was a repository of cultural lore. It is hypothesized that this role,
along with inherent biological changes, may cause older persons to
recall and recode into a story-like mode, which often reflects the
extensive elaboration of memory information and serves to make it
highly digestible for the listener (Mergler & Goldstein, 1983).

The foregoing paragraph serves to illustrate the notion that elderly
individuals, who comprise the majority of individuals with aphasia,
have quite different communication needs from those of the general
population. This certainly has an impact on the size and content of an
effective vocabulary for such individuals; the story-telling mode of
communication requires a rich vocabulary which may intersect with, but
is not limited to, the everyday vocabulary most often associated with
electronic communication aids. In addition, the types of sentences
favored by these individuals can impact on the type of syntactic
processing that should be available in a successful syntactic
prosthesis. For example, the use of anaphora, ellipsis, and
conjunction declines with the age of the storyteller and with the
complexity of the narratives (Kemper, Rash, Kynette and Norman, 1990).

Approach

Language Representation

A common problem experienced by people with aphasia is the failure to
access the lexical items which correctly express the semantic meaning
the individual wishes to communicate. An iconic interface may be
helpful to such individuals because it may be easier to identify
pictures or icons which represent the meaning they wish to express. In
order to make this reasonable in terms of the physical layout of an
interface, it is not enough to use single-meaning key actuations,
since this would require an interface with 300 keys to represent 300
words. Instead, the project utilizes an iconic representation approach
requiring two key actuations for each selection, which can therefore
represent 300 words using far fewer keys. Consider, for example, the
selection of the noun sleeve. The first actuation would be of a key
representing the semantic category of the desired noun (e.g.,
CLOTHING). The second key in the sequence would be selected from a set
of more complicated icons, implicitly representing items from several
semantic categories. The multiple meanings associated with the second
icon would be disambiguated based on the first selection. For example,
if CLOTHING has been selected then an icon picturing a POLICEMAN might
result in the selection of the noun hat, while the selection of an ARM
icon illustrating an arm holding an ice cream cone might result in the
selection of the noun sleeve. Note that the same icons can trigger
different nouns if a different category has been chosen, e.g.,
selecting the ARM icon when the FOOD icon has been selected might give
us ice cream.
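
The two-actuation scheme can be sketched as a lookup keyed on the pair
(category, icon). The following is a minimal illustration only, not the
project's implementation: the table holds just the three examples given
above, and the names ICON_TABLE, select_word, and max_sequences are
hypothetical.

```python
# Hypothetical lookup table: (category key, icon key) -> word.
# The three entries are the examples from the text; a real table would be larger.
ICON_TABLE = {
    ("CLOTHING", "POLICEMAN"): "hat",
    ("CLOTHING", "ARM"): "sleeve",
    ("FOOD", "ARM"): "ice cream",
}


def select_word(category, icon):
    """Return the word for a two-key sequence, or None if none is assigned."""
    return ICON_TABLE.get((category, icon))


def max_sequences(category_keys, icon_keys):
    """Upper bound on the number of distinct two-key sequences."""
    return category_keys * icon_keys


print(select_word("CLOTHING", "ARM"))  # the ARM icon read as clothing
print(select_word("FOOD", "ARM"))      # the same icon read as food
print(max_sequences(10, 30))           # illustrative key counts, not from the paper
```

With, say, 10 category keys and 30 icon keys (assumed numbers chosen for
illustration), 40 physical keys would allow up to 300 two-key sequences,
which is the sense in which far fewer keys than words are needed.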

The utility of such a representational scheme is being explored
through interactions with able-minded individuals in their 7th and 8th
decades, as a prelude to testing on individuals with aphasia.  We
hypothesize that this representation technique will support lexical
access of several hundred lexical items for this population using a
relatively small number of keys.

Vocabulary Selection

Our goal is not merely to provide an unstructured "word list" for this
set of clients; rather, we feel that the following steps are necessary
in the creation of a complete model of the client vocabulary:


o Corpus Acquisition.  Using appropriate data collection techniques, a
large representative corpus is gathered.

o Language Analysis.  The corpus is analyzed in order to answer the
following questions about the language model:

	1. What words and classes of words are used?
	2. What syntactic structures are commonly used?
	3. What semantic concepts underlie the communication?
	4. What pragmatic goals are evidenced in the communication?

o Language Model.  The results of language analysis are compiled into
the lexicon, syntactic rules, and semantic concepts needed to design
and implement the communication aid.
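
As one illustration of the first analysis question (what words are
used), the sketch below counts word forms in a corpus and reports what
share of all word tokens a vocabulary of a given size would cover. It is
a sketch under stated assumptions: the three sample utterances are
invented for the example, and ranking words by frequency is only one
possible selection criterion, not a method the project is known to use.

```python
import re
from collections import Counter


def word_counts(corpus):
    """Count lower-cased word forms across a list of utterances."""
    counts = Counter()
    for utterance in corpus:
        counts.update(re.findall(r"[a-z']+", utterance.lower()))
    return counts


def coverage(counts, vocabulary_size):
    """Fraction of all tokens covered by the most frequent words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(vocabulary_size))
    return covered / total


# Invented sample utterances, for illustration only.
corpus = [
    "I put the cup on the table",
    "She put the hat on",
    "The cup is on the table",
]
counts = word_counts(corpus)
print(counts.most_common(3))
print(coverage(counts, 2))
```

A coverage figure of this kind speaks to the trade-off between a
vocabulary large enough for gratifying interaction and one small enough
to access, but it says nothing about syntax, semantics, or pragmatics,
which the remaining questions address.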

Syntactic Defaults

The job of the intelligent parsing component in a communication aid is
to determine which of the available syntactic patterns best fits a
particular input given by the individual with aphasia. The knowledge
required to perform this task successfully results from the
construction of a language model for a particular client group
(Fristoe and Lloyd, 1980).  Construction of the language model will
reveal patterned relationships between what the individual wishes to
express and the syntactic patterns commonly used by non-aphasic
individuals in similar life circumstances.

Once the client has selected a sequence of icons that express the
intended communication, the parser must "fill in the gaps" left behind
owing to a lack of syntactic knowledge on the part of the client. Some
individuals with language loss are prone to lapses in correct word
order and the omission of function words, such as determiners and
prepositions. For example, the user might key in the icon sequences
for the words TABLE CUP PUT when the intended communication is Put the
cup on the table. In this case, the intelligent parser must determine
that CUP is the object of PUT, and that TABLE is the locative of PUT,
and reorder the words appropriately. The parser must also add any
missing determiners (like the) and prepositions (like on).
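
The TABLE CUP PUT example can be made concrete with a toy sketch. It is
not the parser under development: it handles one verb pattern only, and
the lexicon features (can_be_location, preposition) and the function
name generate are assumptions introduced for illustration.

```python
# Hypothetical lexicon. The features are assumptions made for this sketch.
NOUNS = {
    "CUP": {"word": "cup", "can_be_location": False},
    "TABLE": {"word": "table", "can_be_location": True},
}
VERBS = {
    "PUT": {"word": "put", "preposition": "on"},
}


def generate(icons):
    """Build 'Verb the OBJECT prep the LOCATION.' from icons in any order.

    Returns None when the input does not fit this single pattern.
    """
    verbs = [i for i in icons if i in VERBS]
    nouns = [i for i in icons if i in NOUNS]
    if len(verbs) != 1 or len(nouns) != 2:
        return None
    # Assign roles from noun features rather than from input order.
    locations = [n for n in nouns if NOUNS[n]["can_be_location"]]
    objects = [n for n in nouns if not NOUNS[n]["can_be_location"]]
    if len(locations) != 1 or len(objects) != 1:
        return None  # roles are ambiguous; a fuller parser would need more knowledge
    verb = VERBS[verbs[0]]
    sentence = "{} the {} {} the {}.".format(
        verb["word"],
        NOUNS[objects[0]]["word"],
        verb["preposition"],
        NOUNS[locations[0]]["word"],
    )
    return sentence[0].upper() + sentence[1:]


print(generate(["TABLE", "CUP", "PUT"]))
```

Because roles are assigned from lexical features, the same sentence
results whatever order the icons are entered in. Choosing the and on as
defaults is exactly the kind of decision that the language model for a
client group is meant to inform.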

Implications

Corpus Acquisition.  Cerebro-vascular accidents (CVAs) causing aphasia
often strike individuals in their 7th and 8th decades (60s and
70s). Stuart and Beukelman (in press) examine the topics and lexica
used by 5 non-aphasic individuals in this age group. While these
individuals are not aphasic, it is reasonable to expect that their
vocabulary and syntax needs are similar to those of aphasics in this
age group (Holland, 1975).  Beukelman and Stuart's data collection
methodology has resulted in the recording of a large amount of
previously unavailable data concerning the vocabulary and topics
prevalent in an age group commonly affected by aphasia. These data and
the language corpora resulting from Stuart's subsequent work (Stuart,
forthcoming) form the basis of the vocabulary being developed.

Language Analysis. The analysis of the acquired corpus involves
morphological and syntactic analysis of each sentence to determine not
only the actual word forms present, but also the underlying lexical
form and inherent meaning of each word. For example, the verb throw
can appear in various surface forms.  If a system fails to perform
morphological analysis, it will be unable to determine that throws and
threw are both forms of the same verb. In addition, if we fail to
perform syntactic and semantic analysis, we will conflate the
occurrence of a single form of throw in sentences like John threw up
and John threw the ball. The key point is that the same surface form
can be used to indicate different meanings, depending on the
surrounding words (i.e., syntactic structure). It should be noted that
syntax is sufficiently rich to render ineffectual the use of simple
two-word co-occurrences; for example, in John threw his hands up, an
entire noun phrase is interposed between the verb and its particle. In
this case, the verb and its particle can only be related through a more
complete syntactic analysis. Without this type of detailed analysis,
broad classes of words (such as phrasal verbs, non-neighboring
collocations, etc.) cannot be distinguished on the basis of key-word
analysis only. It is also difficult to appreciate the pragmatic
communication goals of the client group unless this type of analysis
is performed, since the overall desire expressed by a particular
communication act depends quite heavily on its syntax and semantics.
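
The throw examples can be illustrated with a small sketch. It is a
deliberate simplification: a hand-written lemma table stands in for
morphological analysis, and a scan over a few following words stands in
for the syntactic analysis that the text argues is really required. All
names and table entries are hypothetical.

```python
# Hypothetical tables, for illustration only.
LEMMAS = {"throws": "throw", "threw": "throw", "thrown": "throw"}
PARTICLES = {"throw": {"up"}}


def lemma(token):
    """Map a surface form to its base form (identity if unknown)."""
    token = token.lower()
    return LEMMAS.get(token, token)


def adjacent_particle(tokens):
    """Two-word co-occurrence: the particle must directly follow the verb."""
    for verb, following in zip(tokens, tokens[1:]):
        if following.lower() in PARTICLES.get(lemma(verb), set()):
            return (lemma(verb), following.lower())
    return None


def windowed_particle(tokens, max_gap=3):
    """Allow up to max_gap words between the verb and its particle."""
    for i, token in enumerate(tokens):
        particles = PARTICLES.get(lemma(token), set())
        for later in tokens[i + 1 : i + 2 + max_gap]:
            if later.lower() in particles:
                return (lemma(token), later.lower())
    return None


for sentence in ["John threw up", "John threw his hands up", "John threw the ball"]:
    tokens = sentence.split()
    print(sentence, adjacent_particle(tokens), windowed_particle(tokens))
```

The adjacent-pair check misses the particle in John threw his hands up,
while the window scan finds it. The window is still only a heuristic: it
cannot tell a particle from a preposition that happens to follow the
verb, nor separate the different meanings of throw up, which is why a
fuller syntactic and semantic analysis is needed.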

Language Model. Once the corpus has been analyzed, a language model is
constructed that includes not only the selected vocabulary, but also a
set of syntactic constructions, pragmatic goals, and semantic concepts
that must be present to support communication by the client
group. This is necessary to support the subsequent design and
development of a communication aid for the particular client group,
since not only the vocabulary itself but also the syntax, semantics,
and pragmatics must be encoded in the device (McCoy, et al.,
1990).

Discussion

The use of intelligent parsing in augmentative communication has been
a distant dream for many years. The actual development of such a
system is now at hand.  Older adults with aphasia have been selected
as its first target population, because the needs of this community
are underserved, and the potential benefits of intelligent parsing are
great. A substantial corpus, reflecting the actual speech and language
use of this population has been gathered and is now the object of
attention by computational linguists and speech pathologists in 3
major centers of research.

Some form of intelligent parsing may hold great promise in the design
of AAC systems for aphasic individuals. The strengths of intelligent
parsing (filling in missing words and re-ordering scrambled input)
complement the difficulties of individuals with reduced language
function. In addition, an intelligent parsing system that utilizes an
iconic interface can make that capability available in a pictorial
form that might be easier for aphasics to access, thus addressing the
important problem of lexical access faced by individuals with aphasia.

To make intelligent parsing successful for a broad range of clients,
we must envision not a single system with a single vocabulary, but
several systems with vocabularies tailored for particular client
groups and indeed particular clients. Intelligent parsing techniques
can only be as effective as the care taken to acquire and support the
vocabulary and language model
required by the particular client or client group. Note we are not
suggesting that our initial corpora vocabulary will in itself be
sufficient for all clients. Indeed, the content of individual
"stories" must be client specific, and will draw on both the common
vocabulary and a vocabulary specific to the particular client. The
client-specific vocabulary must also be acquired and made available in
the communication aid.  However, analysis of the collected corpus
should provide us with a set of contextual guidelines which will make
it much easier to query close family members for vocabulary content
specific to a given individual in specific communication situations.

Acknowledgments

This work is supported in part by Grant #H133E80015 from the National
Institute on Disability and Rehabilitation Research. Support was also
provided by the Nemours Foundation.

References

Darley, F. (1982).  Aphasia. Philadelphia: W. B. Saunders.

Fristoe, M. & L. Lloyd (1980). Planning an initial expressive sign
lexicon for persons with severe communication impairment. Journal of
Speech and Hearing Disorders, 45:170-180.

Goodenough-Trepagnier, C. (1989). VIC performance-effect of grammatical
category.  Proceedings RESNA 12th Annual Conference, New
Orleans, Louisiana, pp. 143-144.

Goodglass, H. & Kaplan, E. (1972). The assessment of aphasia and
related disorders.  Philadelphia: Lea & Febiger.

Goodglass, H. & Kaplan, E. (1983b). The assessment of aphasia and
related disorders (2nd ed.). Philadelphia: Lea & Febiger.

Holland, A. (1975) Language therapy for children: Some thoughts on
context and content, Journal of Speech and Hearing Research, 40:
514-523.

Katz, R. C. (1990). Microcomputer applications in research on treatment
of aphasia. In E. Cherow (Ed.), Proceedings of the Research Symposium on
Communication Science and Disorders and Aging
(pp. 167-176). Rockville, Maryland: American Speech-Language-Hearing
Association.

Kemper, S., Rash, S., Kynette, D., and Norman, S. (1990). Telling
stories: the structure of adults' narratives. European Journal of
Cognitive Psychology, Vol. 2, No. 3, pp. 205-228.

Kertesz, A. (1979).  Aphasia and associated disorders: Taxonomy,
localization, and recovery. New York: Grune & Stratton.

Kraat, A. W. (1990).  Augmentative and alternative communication: does
it have a future in aphasia rehabilitation?  Aphasiology, 4 (4),
321-338.

McCoy, K., E. Nyberg and B. Baker (1990).  Intelligent AAC Systems:
What Can Be Done Now, Proceedings RESNA 13th Annual Conference,
Washington, DC, June 19.

Mergler, N. and M.  Goldstein (1983). Why are there old people?  Human
Development, 26:72-90.

Rosenbek, J., LaPointe, L., & Wertz, R. (1989). Aphasia: a clinical
approach. Austin, Texas: Pro-Ed.

Schuell, H., Jenkins, J., & Jimenez-Pabon, E.  (1964). Aphasia in
adults: Diagnosis, prognosis, and treatment. New York: Hoeber Medical
Division, Harper & Row Publishers.

Steele, R., M. Weinrich, M. Kleczewska, G. Carlson, and R. Wertz
(1987). Evaluating performance of severely aphasic patients on a
computer-aided visual communication system. In R. H. Brookshire, ed.,
Clinical Aphasiology, Minneapolis: BRK Publishers.

Stuart, S. (forthcoming).  PhD dissertation (in progress), University
of Nebraska, Lincoln.

Weisenburg, T. & McBride, K. (1935).  Aphasia: A clinical and
psychological study.  New York: Commonwealth Fund.

Bruce R. Baker
Semantic Compaction Systems
801 McNeilly Road
Pittsburgh, PA 15226