An Augmentative Communication Interface Based On Conversational Schemata

Peter B. Vanderheyden (vanderhe@asel.udel.edu)
Applied Science and Engineering Laboratories
Department of Computer and Information Sciences
University of Delaware / A. I. duPont Institute
Wilmington, DE 19899 USA

Many people with severe speech and motor impairments make use of augmentative and alternative communication (AAC) systems. These systems can employ a variety of techniques to organize stored words, phrases, and sentences, and to make them available to the user. It is argued in this paper that an AAC system should make better use of the regularities in an individual's conversational experiences and the expectations that an individual normally brings into a conversational context. An interface and methodology are proposed for organizing and retrieving sentences appropriate to a particular conversation context, possibly developed from earlier conversations. These conversations are represented according to the schema structures discussed by Schank (1982) as a model for memory and cognitive organization. The interface allows the user to proceed with minimal effort through conversations that follow the schema closely, and facilitates the derivation of new schemata when a conversation diverges from an earlier one. This interface is intended to operate in parallel with and to complement a user's existing electronic communication system. Investigations to consider the effectiveness of the interface and methodology are planned for the future.

keywords: augmentative communication, natural language processing, schemata, scripts
1 Introduction

The goal of designing a system for augmentative and alternative communication (AAC) is to facilitate the communication of people who have difficulty with speech, writing, and sign language. Speech synthesis and digitally-encoded recorded speech have made it possible to provide a person with a voice. For many people with severe speech and motor impairments, keyboards and many other computer input devices are difficult to use, and sentence production can be very slow. The design of the interface to the AAC system, therefore, can go a long way towards enabling the user to communicate with this new voice more effectively and with less effort.

Early AAC devices were simply boards or books containing symbols such as letters, words, and pictures. The person using the communication board pointed at a symbol on the board, and the person with whom they were conversing was responsible for identifying the symbols pointed at, and interpreting their meaning.
The burden of interpretation was laid on this second person, who was also given the power to control the conversation and to manage the topic and turns.

Computerized AAC systems provide the augmented communicator with a voice, but are also able to organize words and sentences so that they are more easily accessible. Some systems apply natural language techniques in order to predict the user's next word or to fill in missing words, making use of lexical, syntactic, and semantic information within the current sentence (Demasco & McCoy, 1992). In this way, a well-formed sentence might be produced with less time and effort. An AAC system should also take the greater conversational context into consideration, and this paper develops and discusses one approach for doing so in order to further facilitate augmented communication.

2 Schemata

2.1 Schemata for stories

When listening to a friend describe the day's events and hearing that he picked up his clothes from the laundromat, one would generally assume that the clothes had been cleaned and that the friend had paid before leaving the laundromat with them. How is it that we can infer these details if they were never explicitly stated? Schank and Abelson (1977) offered one explanation: as a result of picking up clothes at the laundromat many times, we have developed a mental script representing the typical sequence of events involved in visiting the laundromat, and in that typical sequence the clothes were cleaned and we paid for them before leaving with them. Scripts are built up during the course of an individual's life, as a result of the individual's experiences, perceptions, and interpretations of those experiences.

Processing information by incorporating it into scripts has a number of advantages. The sheer magnitude of the information is reduced, because the typical sequence of events is stored only once rather than separately for every occurrence that follows it.
Only when an exception occurs, when, for example, we notice that the spot on a favorite shirt has not been removed, do we store the details of a new event. By representing the typical sequence of events for a given situation, scripts also provide a means of inferring actions that have not yet taken place or have not been explicitly stated. A script for a familiar situation can also be abstracted and used to provide initial expectations in a related but novel situation: if we take our clothes to a laundromat that we have never visited before, we can generalize our experiences in the familiar laundromat to apply to this new one.

Schank (1982; informally in 1990) extended and modified the idea of scripts into a hierarchy of schema structures called MOPs (memory organization packets). Continuing with the laundromat example and taking the abstraction a level or two higher, in any novel situation where we are the customer, we would expect to pay for services rendered; in Schank's system, a metaMOP would represent this general customer-in-a-store situation. MetaMOPs represent high-order goals. This metaMOP would contain separate MOPs for more specific instances of this goal, such as picking up clothes from the laundromat or picking up the car from the mechanic. MOPs can themselves be hierarchical, so a general laundromat MOP can be associated with any number of MOPs for specific laundromats.

Each MOP contains scenes, or groups of actions that occur within the MOP. The MOP for picking up clothes at the laundromat might include an entrance scene, a scene for getting the clothes, a scene for paying, and so on. Each scene has associated with it any number of scripts, where a script contains the actual actions that have taken place. One script for the entrance scene, for example, may include opening the door, walking in, and greeting the shopkeeper.
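This MOP/scene/script hierarchy can be sketched in a few lines of Python. This is an illustrative sketch only; the class names, field names, and the specific laundromat instance are assumptions of this example, not part of Schank's implementations.

```python
from dataclasses import dataclass, field

@dataclass
class Script:
    """Concrete actions that have actually taken place in a scene."""
    actions: list

@dataclass
class Scene:
    """A group of actions within a MOP, e.g. 'entrance' or 'pay'."""
    name: str
    scripts: list = field(default_factory=list)

@dataclass
class MOP:
    """A memory organization packet; may inherit scenes from a parent MOP."""
    name: str
    parent: "MOP" = None
    own_scenes: list = field(default_factory=list)

    def scenes(self):
        # A MOP with no scenes of its own inherits them from its parent,
        # rather than storing a redundant, identical copy.
        return self.own_scenes or (self.parent.scenes() if self.parent else [])

# The general laundromat MOP defines the typical scene sequence ...
laundromat = MOP("picking up clothes at the laundromat", own_scenes=[
    Scene("entrance", [Script(["open the door", "walk in", "greet the shopkeeper"])]),
    Scene("get the clothes"),
    Scene("pay"),
])
# ... and a MOP for one specific (hypothetical) laundromat inherits it.
corner = MOP("the laundromat on the corner", parent=laundromat)
assert [s.name for s in corner.scenes()] == ["entrance", "get the clothes", "pay"]
```

Storing the scene sequence once in the general MOP, and letting specific MOPs reach it through inheritance, mirrors the storage-saving property of scripts described above.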
To demonstrate the use of schemata in understanding stories and answering questions, Schank (1982) described a number of computer experiments, including CYRUS. CYRUS contained databases of information about two former Secretaries of State, integrated new information into these databases, and provided answers to questions such as "Have you been to Europe recently?" and "Why did you go there?" Miikkulainen (1993) developed DISCERN, a computer program that built schemata from input text, represented the schemata subsymbolically (in terms of features and probabilities, rather than words), and answered questions based on the input texts.

2.2 Schemata for conversation

Conversations can be described in terms of the intention and form of each utterance, and the overall structure in which the utterances of the participants occur. The question-answering systems of Schank and Miikkulainen demonstrate that the literal, or locutionary, exchange of information that is one component of conversation can be described in terms of schemata.

Kellermann et al. (Kellermann, Broetzmann, Lim, and Kitao, 1989) described conversations between undergraduate students meeting for the first time as a MOP, and identified 24 scenes. These conversations appeared to have three phases: initiation, maintenance, and termination. Scenes in the initiation phase, for example, included exchanging greetings, introducing themselves, and discussing their current surroundings. Scenes tended to be weakly ordered within each phase, but strongly ordered between phases, so that a person rarely entered a scene in an earlier phase from a scene in a later phase. A number of scenes involved what the investigators called subroutines, or common sequences of generalized acts: get facts, discuss facts, evaluate facts, and so on.
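The ordering constraint that Kellermann et al. observed, weak ordering within phases but strong ordering between them, can be expressed as a simple transition check. This is a hedged sketch: the scene names below are only a small sample of the 24 scenes they identified, and the function is invented for illustration.

```python
# Phases occur in this order; scenes within a phase are only weakly ordered,
# but speakers rarely move back to a scene in an earlier phase.
PHASES = ["initiation", "maintenance", "termination"]

SCENE_PHASE = {
    "exchange greetings":    "initiation",
    "introduce selves":      "initiation",
    "discuss surroundings":  "initiation",
    "get facts":             "maintenance",
    "discuss facts":         "maintenance",
    "evaluate facts":        "maintenance",
    "say farewells":         "termination",
}

def expected_transition(current_scene, next_scene):
    """A move between scenes is expected unless it returns to an earlier phase."""
    cur = PHASES.index(SCENE_PHASE[current_scene])
    nxt = PHASES.index(SCENE_PHASE[next_scene])
    return nxt >= cur

assert expected_transition("exchange greetings", "get facts")      # forward: common
assert expected_transition("get facts", "discuss facts")           # same phase: fine
assert not expected_transition("say farewells", "introduce selves")  # backward: rare
```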
JUDIS (Turner and Cullingford, 1989) is a natural language interface for Julia (Cullingford and Kolodner, 1986), an interactive system that played the part of a caterer's assistant and helped the user plan a meal. JUDIS operated on goals, with each goal represented by a MOP containing the characters (caterer and customer), scenes (either mandatory or optional), and the sequence of events. Higher-level MOPs handled higher-level goals, such as the goal of getting information, while lower-level MOPs handled lower-level goals, such as answering yes-no questions. JUDIS recognized that the person with whom it was interacting had goals of their own, and tried to model that person's goals on the basis of their utterances. It also recognized that several MOPs could contain the same scene, and that several scenes could contain the same utterance. Only one MOP executed at a time, but other MOPs that were consistent with the current state of the conversation were activated as well.

3 Augmentative Communication Systems

An augmentative communication system must address the abilities and needs of the individual, and the context in which the system will be used.

3.1 Word-based, sentence-based, and letter-based systems

In a word-based interface, the user selects individual words and word-endings. Such systems offer a great deal of flexibility: the user can produce any sentence for which the vocabulary is available, and is in complete control of the sentence's content, length, and form. However, word-based sentence production relies heavily on manual dexterity or access rate, and on the individual's linguistic and cognitive abilities. An individual who can select only one item per minute will either produce very short sentences or leave long gaps in a conversation while selecting the words. Such a system may not be suited to an individual who has difficulty generating well-formed or appropriate sentences (Elder and Goossens', 1994).
Sentence-based systems allow an individual to utter an entire sentence by selecting a single key sequence, resulting in much faster sentence production. The sentence that is produced can be prepared to be long or short, and linguistically well-formed, thus overcoming some of the difficulties of word-based systems. However, strict sentence-based systems have shortcomings of their own. The user is limited to the often small number of sentences prestored in the system. These sentences are syntactically correct, but cannot be modified to be appropriate in a given semantic or pragmatic context. As well, the user can incur additional cognitive load if the interface design makes the task of locating and retrieving sentences non-trivial (Baker, 1982).

A third class of systems is letter-based, requiring the user to enter words letter by letter. Letter-based systems can have many of the strengths and weaknesses of word-based systems. Letter-based input is flexible, potentially removing even the constraints imposed by system vocabulary limits. However, the demands of entering each letter can be even greater than the demands of entering whole words. This is one reason why some letter-based systems attempt to predict the word as it is being entered, reducing some demands on the user but possibly introducing others (Koester & Levine, 1994).

Of course, systems need not be exclusively letter-based, word-based, or sentence-based. On the Liberator from the Prentke Romich Corporation, for example, a user can map an icon key sequence to a word, a phrase, or an entire sentence. Templates can be set up containing a phrase or sentence with gaps to be filled by the user at the time of utterance.

3.2 Conversational considerations

CHAT (Alm, Newell, and Arnott, 1987) was a prototype communication system that recognized general conversational structure.
A model conversation would begin with greetings, then move on to smalltalk and the body of the conversation, and finally to wrap-up remarks and farewells. Often, the exact words we use for a greeting or a farewell are not as important as the fact that we say something. A person using CHAT could select the mood and the name of the person with whom they were about to speak, and have CHAT automatically generate an appropriate utterance for that stage of the conversation. An utterance was chosen randomly from a list of alternatives for each stage. Similarly, while pausing in our speech to think, or while listening to the other participant in a conversation, it is customary to occasionally fill these gaps with some word or phrase. CHAT could select and utter such fillers on demand.

To assist the augmented communicator during the less predictable main body of the conversation, a database management system and interface called TOPIC (Alm et al., 1989) was developed. The user's utterances were recorded by the system, and identified by their speech acts, subject keywords, and frequencies of use. When the user selected a topic from the database, the system suggested possible utterances using an algorithm that considered the current semantics of the conversation, the subject keywords associated with entries in the database, and the frequency with which entries were accessed. The possibility of allowing the user to follow scripts was also considered.

These systems offered the user an interface into a database of possible utterances, drawn either from fixed lists specific to several different positions in the conversation (CHAT), or from sentences reused from previous conversations and organized by semantic links (TOPIC). Once the conversation had entered the main body phase, however, there was no representation of the temporal organization of utterances. As well, topics were linked in a relatively arbitrary net, rather than organized hierarchically.
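CHAT's handling of the predictable stages can be illustrated as random selection from stage- and mood-specific phrase lists. This is a minimal sketch of the idea only; the phrases, dictionary layout, and function name are invented for illustration and are not taken from CHAT.

```python
import random

# Illustrative phrase lists, keyed by conversational stage and mood.
UTTERANCES = {
    ("greeting", "cheerful"): ["Hi {name}, great to see you!",
                               "Hello {name}, how are things?"],
    ("farewell", "cheerful"): ["Bye {name}, take care!",
                               "See you soon, {name}."],
    ("filler", "neutral"):    ["Hmm, let me think.",
                               "Just a moment."],
}

def speak(stage, mood, name=""):
    """Pick one alternative at random for the current stage and mood."""
    phrase = random.choice(UTTERANCES[(stage, mood)])
    return phrase.format(name=name)

# Selecting a mood and a partner's name yields a stage-appropriate utterance.
greeting = speak("greeting", "cheerful", name="Alex")
```

Because what matters at these stages is that something appropriate is said, rather than the exact wording, random choice among alternatives also keeps repeated greetings from sounding mechanical.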
The body of a conversation is not always as difficult to predict as these systems may lead one to believe. There are many contexts in which conversations can proceed more or less according to expectations. For example, when ordering lunch at a favorite restaurant or when dropping off or picking up a laundry order, conversational exchanges are often quite standardized.

4 A Schema-based AAC Interface

The interface proposed in this section represents conversations according to schemata, in order to take advantage of their predictable structure. The conversation is broken down into smaller and smaller substructures that are constructed to correspond well with a person's own intuitive representation. When a conversation proceeds in the expected manner, the interface follows along with it and displays appropriate and complete sentences from which the user may select an utterance. Sentence and phrase templates can be set up to provide some flexibility with very little effort demanded of the user. The interface provides the speed of access inherent in many sentence-based systems, but with greater flexibility. Combining these advantages with the user's regular AAC system, the resulting system seems well adapted to facilitating conversation.

4.1 The schema framework

The initial schemata in the interface are developed a priori. In the evaluation studies described in a later section, these schemata will be developed in consultation with users. A separate MOP is planned out for each conversational context that is likely to recur frequently, and for which there are reasonably well-developed expectations. These MOPs are then grouped by similar goals, and a higher-level MOP is developed by generalizing among the members of each group. For example, MOPs for going to McDonald's, Burger King, and the lunchroom cafeteria might all be grouped under the "eating at a self-serve restaurant" MOP.
Going to other restaurants might fall under the "eating at a waiter-served restaurant" MOP, and together these two MOPs would be contained in the more general "eating at a restaurant" MOP. A metaMOP is defined here as any category of activities that spans several MOPs. For example, the "going out to eat" metaMOP might contain MOPs for choosing where to go eat, travelling to a restaurant, eating at a restaurant, and then returning home. A receptionist's metaMOP for "dealing with a new employee" might contain MOPs for the introductory conversation, a description of the company, introductions to several other employees, and showing the new employee to their office.

Each MOP contains an ordered sequence of scenes, each scene contains a list of (currently no more than one) scripts, and each script contains a partially ordered set of sentences. A sentence may contain slots as well as words, and each slot is associated with a group of fillers. When the user selects a sentence with slots, the appropriate list of fillers is displayed, and any one of these can be used to fill the slot. The user can also choose to enter a slot filler, or the entire sentence for that matter, using their regular existing AAC system.

Sentence templates containing slots and fillers provide a convenient means of producing a wide range of sentences of similar form. For example, the sentences "I'd like a shake and an order of fries" and "I'd like a Big Mac and a root beer" could be produced by selecting a template "I'd like a ____" and the fillers 'shake' and 'order of fries' or 'Big Mac' and 'root beer'. (The conjunction 'and' is inserted automatically when multiple fillers are selected for a single slot.) Only three selections are required to produce either sentence, rather than nine selections to enter the words separately (or six, if 'order of fries' counts as one word). The templates also capture the intuitive similarities in the form and function of the sentences.
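The slot-and-filler mechanism, including the automatic insertion of 'and', can be sketched as follows. This is an illustrative simplification: the function name is invented here, and any article a filler needs (such as the "an" in "an order of fries") is assumed to be part of the filler text itself.

```python
def fill_template(template, fillers, slot="____"):
    """Fill the template's slot with one or more fillers,
    inserting the conjunction 'and' between multiple fillers."""
    text = " and ".join(fillers)
    return template.replace(slot, text)

# Three selections (one template plus two fillers) produce a nine-word sentence.
assert fill_template("I'd like a ____", ["shake", "an order of fries"]) == \
    "I'd like a shake and an order of fries"
assert fill_template("I'd like a ____", ["Big Mac", "a root beer"]) == \
    "I'd like a Big Mac and a root beer"
```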
The hierarchical structure of this representation quite naturally leads to inheritance of properties by lower-level schemata. If "eating at McDonald's" involves the same sequence of scenes as the more general "eating at a fast food restaurant", for example, it would access these scenes by inheritance from the more general MOP rather than containing a redundant and identical copy of them. The McDonald's MOP could still differ from the more general MOP by having its own set of fillers for slots (such as the list of food items that can be ordered) and its own set of scripts and sentences.

Inheritance also provides a mechanism for supplying schemata for contexts that are novel but similar in some respects to existing ones. When the user enters a new fast food restaurant for the first time, a MOP for a restaurant serving similar food could be selected if one exists; if not, the MOP for the general fast food restaurant could be selected. Even in a restaurant that the user has never been in before, the interface is thus able to provide an appropriately organized set of sentences and sentence templates.

4.2 Sequential organization

The sentences in a MOP are presented to the user according to the order of the scenes in the MOP. As a scene begins, the first sentence in the scene's script is highlighted. When the scene is completed, the sentences it contains scroll out of sight, and the sentences for the next scene are displayed. In this way, the interface keeps pace with the conversation, and minimizes the need to search for the next desired sentence. A conversation may advance to the next (or any other) scene at any time by scrolling to and selecting a sentence in that scene.

The simplest method for participating in a conversation requires the user to access only two keys.
The user confirms with one key that the highlighted sentence should be used, and the interface utters this sentence and highlights the next one in the current scene. This cycle repeats until the user advances to the next scene with the second key, at which point the first sentence of that scene is highlighted. (At the very least, a third key is needed if sentences in a scene are to be uttered out of order.)

For the purpose of this two-key operation, scenes are assumed to be strongly ordered within a conversation MOP. If scene B follows scene A in the MOP, selecting a sentence in scene B (or any later scene) indicates that scene A has been completed. In many cases this appears to be a reasonable generalization, though perhaps not always. One generally greets a person at the beginning of a conversation, for example, but it may also happen that the greeting is made after some initial exchange. To return to a previous scene, the user can either select that scene directly, or cycle through the remaining scenes by repeatedly selecting the "next scene" key until the first scene comes into focus once more.

5 Future Work

5.1 Preliminary investigations

The goal of this interface is to facilitate an augmented communicator's participation in conversations. A preliminary investigation will ask several people who have AAC systems to comment on the effectiveness of such an interface after using it for a few days. Each AAC user will participate in developing the schema hierarchy for their own interface. In an iterative fashion, the author will guide each user in developing a small number of simple preliminary schemata. The users will then employ these schemata in conversations with their regular communication devices. The author and the user together will review interactions recorded by the communication device, and enhance the existing schemata or develop new ones. This cycle will be repeated until the schemata are developed to the user's satisfaction.
A schema development program is planned for the future, to allow users to continue refining and adding schemata to the interface on their own. This interface is intended to be applicable to all AAC users and all conversational contexts. For this reason, it is hoped that a diverse group of people will be able to participate in this preliminary investigation. Of particular interest will be people who use their AAC systems in the context of their employment.

5.2 AAC users with developmental delays

The interface and methodology developed in this paper have many features in common with strategies for training developmentally delayed adolescents and adults to use their augmentative communication systems (Elder and Goossens', 1994). Elder and Goossens' discussed the conversational contexts of domestic living, vocational training opportunities, leisure/recreation, and community living. They developed an activity-based communication training curriculum, in which students are taught context-appropriate communication in the process of performing the relevant activity with the instructor or with another student. A script is generated for each activity, and is represented by an overlay to be placed over the individual's AAC system. The authors emphasized the importance of a concentrated message set, meaning that all of the words, phrases, or sentences required to complete an activity should appear on a single overlay. Activities that have a logical sequence to them are advantageous, because one event in the activity can act as a cue to recall the next event, and it is impossible to successfully complete the activity in any but the correct order. Supplemental symbols could be added off to the side of an overlay for, in one example, specific food types in a "making dinner" script.

These similarities suggest that the schema-based interface developed in this paper may be an effective aid in communication training.
The groups of activities described by Elder and Goossens' are all good candidates for a conversational schema hierarchy. The script-based overlays are analogous to the scripts contained within a conversation MOP, and the supplemental symbols are similar to the filler sets for sentence slots.

5.3 Possible extensions to the interface

AAC systems that attempt to predict words on the basis of the initial letter(s) selected by the user in a domain-nonspecific context may have a very large vocabulary to consider for each word in a sentence. A similar problem of scale can face systems that attempt to complete partial or telegraphic sentences. A schema-based interface makes use of the current MOP and the current position within the MOP to define a specific conversational domain. This domain could serve to constrain or prioritize the vocabulary and semantics that the system would need to consider, and so reduce the time needed to process the sentence.

The network of MOPs and their substructures must currently be constructed by the investigator, in consultation with the user. Determining which contexts should be represented is obviously a highly subjective issue that reflects the individuality of one's experiences. It would be preferable to develop a means by which users could construct their own hierarchy of schemata. Better still would be a dynamic system that could store sentences as they were produced during a conversation, and whose schemata could be created and updated interactively by the user.

6 Summary

An interface for augmentative communication systems is proposed that makes the expected content of a conversation available to the user. This can facilitate interaction in predictable situations by reducing the need to produce common utterances from scratch. A methodology is described for organizing conversations in a variety of contexts according to hierarchical schema structures.
At the highest level, complex goals are represented by metaMOPs, and more specific goals by MOPs. Each MOP contains a list of scenes in the order in which they are expected to occur in the conversation. Each scene contains sentences from which the AAC user can choose. Sentences can be complete, or in the form of templates containing slots to be filled in as needed. This interface makes it possible to participate in a conversation using only two keys if the conversation fits a MOP closely, but it is in general intended to be used together with an individual's regular AAC system.

7 References

Alm, N., Newell, A. F., and Arnott, J. L. (1987) A Communication Aid Which Models Conversational Patterns. In Proceedings of the RESNA 10th Annual Conference (pp. 127-129). San Jose, CA.

Alm, N., Newell, A. F., and Arnott, J. L. (1989) Database Design For Storing and Accessing Personal Conversational Material. In Proceedings of the RESNA 12th Annual Conference (pp. 147-148). New Orleans, LA.

Baker, B. (1982) Minspeak. Byte, 186-202.

Cullingford, R. E., and Kolodner, J. L. (1986) Interactive advice giving. In Proceedings of the 1986 IEEE International Conference on Systems, Man and Cybernetics (pp. 709-714). Atlanta, GA.

Demasco, P. W., and McCoy, K. F. (1992) Generating text from compressed input: An intelligent interface for people with severe motor impairments. Communications of the ACM, 35(5), 68-78.

Elder, P. S., and Goossens', C. (1994) Engineering Training Environments for Interactive Augmentative Communication: Strategies for adolescents and adults who are moderately/severely developmentally delayed. Southeast Augmentative Communication Conference Publications Clinician Series: Birmingham, AL.

Kellermann, K., Broetzmann, S., Lim, T.-S., and Kitao, K. (1989) The Conversation Mop: Scenes in the stream of discourse. Discourse Processes, 12, 27-61.

Koester, H. H., and Levine, S. P. (1994)
Quantitative indicators of cognitive load during use of a word prediction system. In Proceedings of the RESNA '94 Annual Conference (pp. 118-120). Nashville, TN.

Miikkulainen, R. (1993) Subsymbolic Natural Language Processing: An integrated model of scripts, lexicon, and memory. MIT Press: Cambridge, MA.

Schank, R. C., and Abelson, R. P. (1977) Scripts, plans, goals and understanding: An inquiry into human knowledge structures. Erlbaum: Hillsdale, NJ.

Schank, R. C. (1982) Dynamic Memory: A theory of reminding and learning in computers and people. Cambridge University Press: NY.

Schank, R. C. (1990) Tell Me A Story: A new look at real and artificial memory. Charles Scribner's Sons: NY.

Turner, E. H., and Cullingford, R. E. (1989) Using Conversation MOPs in Natural Language Interfaces. Discourse Processes, 12, 63-90.

8 Acknowledgments

This work has been supported by a Rehabilitation Engineering Research Center Grant from the National Institute on Disability and Rehabilitation Research (#H133E30010). Additional support has been provided by the Nemours Foundation.