Organization of Pre-Stored Text in Alternative and Augmentative Communication Systems: An Interactive Schema-Based Approach

by Peter Bryan Vanderheyden

ASEL Technical Report #AAC9501

Applied Science & Engineering Laboratories
University of Delaware/A.I. duPont Institute
1600 Rockland Rd., P.O. Box 269
Wilmington, DE 19803
Phone: (302)651-6830
FAX: (302)651-6895

ACKNOWLEDGMENTS

Work on this project was performed at the University of Delaware Center for Applied Science and Engineering, located at The Alfred I. duPont Institute, in Wilmington, DE, with support from the United States Department of Education, National Institute on Disability and Rehabilitation Research, Grant # H133E30010-94. Additional support was provided by the Nemours Foundation.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
Chapter
1 INTRODUCTION
2 AUGMENTATIVE AND ALTERNATIVE COMMUNICATION
    2.1 AAC Users
    2.2 Communicative Competence
    2.3 Components of an AAC System
    2.4 Strategies for Accelerating Message Production
    2.5 Example AAC Techniques
        2.5.1 Compansion
        2.5.2 The Liberator(TM) and Minspeak(TM)
3 APPLYING CONVERSATIONAL STRUCTURE TO AAC
    3.1 Stages of Conversation
        3.1.1 CHAT
        3.1.2 TOPIC
    3.2 Schemata
        3.2.1 Schema Theory
        3.2.2 Schemata for Conversation
    3.3 Conversational Schemata in AAC
        3.3.1 MOP
        3.3.2 Scene
        3.3.3 Slot
        3.3.4 Sentence Templates
4 SCHEMATALK: DESIGN AND IMPLEMENTATION
    4.1 Initial Setup
        4.1.1 Menu Bar
            4.1.1.1 "List Topics" Button
            4.1.1.2 "List Scenes" (or "List Slots") Button
            4.1.1.3 "Slot Mode" (or "Scene Mode") Button
            4.1.1.4 "Volume" Button
            4.1.1.5 "Done" Button
        4.1.2 Schema Display Area
        4.1.3 Speech Output Area
    4.2 Scene Mode
        4.2.1 Navigation
        4.2.2 Sentence Selection
    4.3 Slot Mode
        4.3.1 Navigation
        4.3.2 Filler Selection
    4.4 Sentence Templates
    4.5 Schema File
        4.5.1 Grammar
        4.5.2 The MOP Tree
        4.5.3 Organization of Scenes and Slots
        4.5.4 Sentence Templates and Features
5 AN EVALUATION OF SCHEMATALK: ESPN INTERVIEWS
    5.1 Participants
    5.2 Equipment
    5.3 Procedure
    5.4 Transcription of Conversational Turns
    5.5 Measures
        5.5.1 Turn Count
        5.5.2 Word Count
        5.5.3 Turn Duration
        5.5.4 Speech Rate
    5.6 Observations
6 DISCUSSION AND CONCLUSIONS
    6.1 Discussion
    6.2 Some Possible Improvements
        6.2.1 Extensions to SchemaTalk
        6.2.2 Evaluation Study
    6.3 Conclusions
BIBLIOGRAPHY
APPENDIX A: SAMPLE SCHEMA FILE
APPENDIX B: ESPN JOB DESCRIPTION
APPENDIX C: "POSSIBLE INTERVIEW QUESTIONS" FOR INTERVIEWERS
    C.1 Career Path
    C.2 Background and Character
    C.3 Work Style
    C.4 Availability
    C.5 Disability-Related
APPENDIX D: APPLICANT SCHEMA FILES
    D.1 Schema Files of A1
        D.1.1 Schema File of A1 for Interview #2
        D.1.2 Schema File of A1 for Interview #3
        D.1.3 Schema File of A1 for Interview #4
    D.2 Schema Files of A2
        D.2.1 Schema File of A2 for Interview #2
        D.2.2 Schema File of A2 for Interview #3
APPENDIX E: TRANSCRIPTS OF APPLICANTS' FINAL INTERVIEWS
APPENDIX F: RESULTS OF INTERVIEWS
    F.1 Results of Interviews with A1
    F.2 Results of Interviews with A2
APPENDIX G: APPROVAL LETTER FROM HUMAN SUBJECTS REVIEW BOARD

LIST OF FIGURES

2.1 Components of an AAC system
4.1 SchemaTalk window at start-up
4.2 "List Topics" (left), and "List Scenes" (right) dialog boxes
4.3 The "enter" scene in a possible restaurant schema
4.4 The "drink or appetizer" slot in a possible restaurant schema
4.5 A sentence template in a possible restaurant schema
4.6 EBNF grammar of schema structures
4.7 Regular expression definitions of schema terminals
4.8 Implicit definition of a schema member object
4.9 Explicit definition of a schema member object
4.10 Definition of a slot, containing both listed and imported fillers
4.11 A sentence, sentence forms, and fillers, all with features
5.1 Turn count
5.2 Word count (words/turn)
5.3 Turn duration (minutes/turn)
5.4 Speech rate (words/minute)

LIST OF TABLES

E.1 Legend for Transcripts
E.2 Transcript of Interview #4 with A1
E.3 Transcript of Interview #3 with A2
F.1 Legend for Results
F.2 Results of Interview #1 with A1
F.3 Results of Interview #2 with A1
F.4 Results of Interview #3 with A1
F.5 Results of Interview #4 with A1
F.6 Summary of Results of Interviews with A1
F.7 Results of Interview #1 with A2
F.8 Results of Interview #2 with A2
F.9 Results of Interview #3 with A2
F.10 Summary of Results of Interviews with A2

ABSTRACT

The field of augmentative and alternative communication (AAC) is concerned with assisting individuals with severe physical and language impairments to communicate more effectively. Existing AAC systems make use of a variety of approaches to accelerate sentence generation, including different selection methods, encoding strategies, and natural language processing. Augmented communicators continue to produce words at a very slow rate, and have difficulty participating actively in conversation. However, only recently have AAC systems begun to make use of the predictable patterns that occur in conversation. To date, such systems have focussed on either highly constrained and relatively content-free utterances, or on loosely structured, monologue type text.

This thesis develops an alternative but compatible approach to facilitating conversational participation in AAC which attempts to target a broader range of conversations, representing both their content and structure. Motivated by schema theory, this approach applies schema structures to the domain of conversation. A set of structures is proposed with which text from past conversations can be made available for reuse. To demonstrate this approach, a prototype is developed and evaluated.
The prototype behaves as an interface that augments a user's current AAC system by providing access to conversational schemata created and updated by the user. In the evaluation study, two individuals used the interface while taking part in a series of mock job interviews. Results of the study were encouraging.

Chapter 1 INTRODUCTION

An individual who uses an augmentative communication system gains an alternative voice, one that can augment and complement a natural voice that is difficult to produce or to understand. In order to "speak" with this alternative voice, augmentative communication systems require the individual to physically select symbols representing the words to be spoken, either by hand or using some other motor channel. Dependence on motor abilities that are also impaired, however, means that utterances can take much time and effort to produce.

To reduce this time to speak, systems could make sentences or larger segments of text available as single units. Such "reusable" text could then be spoken, as is, with very little effort. Alternatively, when reusable text is not available, the individual could select fewer items and speak in short, or incomplete, sentences. Although these two strategies might both reduce the time to produce a sentence, speaking with incomplete sentences, or with noticeably "canned" sentences that are not quite context-appropriate, can be interpreted by unfamiliar listeners as signs of cognitive impairment. The challenge to the designers of augmentative and alternative communication (AAC) systems, therefore, becomes one of increasing the rate of spoken output without compromising the image of the augmented communicator, the individual using the system.

Contextual information is very important during conversation for determining both the meaning and the appropriateness of an utterance. Within the proper context, then, reusable text that is made available and selected by the user will not sound canned.
Such precise contextual information is not available, in an automatic fashion, to current AAC systems. It is, however, available to the individual who is speaking through the system. The individual is aware of the situation in which the conversation is taking place, and of the intended self-image. The challenge for the system becomes one of making context-appropriate reusable text available to the individual in a reasonable amount of time and without excessive cognitive load. This interaction between system and user should involve as little effort as possible during a conversation, so that the individual can concentrate on the topic and on the other participants.

I suggest that there are three requirements for an AAC system to facilitate conversational interaction:

(1) the augmented communicator produces text at some time prior to the conversation, and stores it in the system;
(2) the AAC system supports an organization for stored text that is consistent with observed features and patterns in conversation;
(3) during a conversation, and with very little effort, the augmented communicator is able to retrieve desired and appropriate pre-stored text.

Requirement (1), the pre-storage of text, is already a common feature in many systems. However, few systems offer real support for (2) and (3), structuring and retrieving this text for conversation. Notable exceptions are the systems CHAT and TOPIC that were developed at the University of Dundee (and their realization as a commercial product, Talk:About, manufactured by Don Johnston Inc.). CHAT (Alm et al., 1992) supported quick production of simple utterances, including greetings, small talk, and farewells, applicable in many conversational contexts. TOPIC (Alm et al., 1989) provided a database of reusable text, taken from previous conversations and linked by topic, and was concerned mainly with the monologue-type segments that occur in the body of a conversation.
This thesis discusses an alternative, possibly complementary, approach to organizing and retrieving pre-stored conversational text in an AAC system. This approach is motivated by Schank's (1982) description of schemata, representing the dependence of how we behave and think on how we behaved and thought in similar situations in the past. Situations that are judged similar are grouped together to form the basis for expectations about future instances of similar situations. These can include expectations about what people or things will be involved in a situation, what events will occur, and in what order they will occur. An individual's cognitive system can store experiences more efficiently in this "schematized" form, and can organize new experiences around these schemata.

In this thesis, I explore the notion of storing reusable text for schematized situations in a manner similar to that described by Schank's schemata. The intuition is that an AAC system which represents conversation similarly to our own cognitive system should be able to offer the user access to conversational text in a way that is both intuitive and efficient. In pursuit of this goal, a prototype interface, SchemaTalk, has been developed that adds schematic organization to a text-based AAC system, and enables the user to access that information. The effectiveness of this configuration was investigated in a study in which two participants, involved in mock job interviews, communicated using the interface and sentences they had organized into schemata.

Chapter 2 AUGMENTATIVE AND ALTERNATIVE COMMUNICATION

In North America, there are over two million people unable to speak adequately to meet their communication needs (American Speech-Language-Hearing Association, 1991; cited in Beukelman & Mirenda, 1992, p. 4).
The field of augmentative and alternative communication is concerned with developing methods and devices, tuned to the abilities of each individual, to facilitate active and effective participation in conversation and other forms of communication (e.g., writing). In this thesis, I will focus on augmenting spoken conversation, and on electronic AAC systems with speech synthesis capabilities.

2.1 AAC Users

The American Speech-Language-Hearing Association (ASHA) gives the following definition for the population of individuals who might use AAC systems:

    Individuals with severe communication disorders are those who may benefit from [AAC] -- those for whom gestural, speech, and/or written communication is temporarily or permanently inadequate to meet all of their communication needs. (American Speech-Language-Hearing Association, 1991, p. 10; quoted in Beukelman & Mirenda, 1992, p. 4)

Emphasis is placed on the individual's natural modes of communication not being adequate to meet all of their needs. In some situations, and with some conversational partners, individuals may prefer to communicate with their natural voice or gestures, and may find it more effective to do so.

Communication may be severely impaired as a consequence of a congenital neurologic dysfunction, such as cerebral palsy, mental retardation, autism, developmental verbal apraxia, and specific language disorders (Mirenda & Mathy-Laikko, 1989, p. 3). Severe communication impairment may also be acquired as a result of amyotrophic lateral sclerosis (ALS), multiple sclerosis (MS), brain injury, stroke, or spinal cord injury (SCI) (Beukelman & Yorkston, 1989, pp. 42-47).

These same neurological conditions may impair non-language motor abilities and perception, as well. Motor deficits, such as hypertonic muscle tone, often accompany cerebral palsy, as do visual and hearing deficits (Mirenda & Mathy-Laikko, 1989, p. 4). Acquired brain or spinal injuries may result in limited mobility or sensory losses.
The ability to control a communication device that relies on motor input will, in many cases, be affected.

2.2 Communicative Competence

The goal of AAC is to assist the user in becoming a competent communicator. Light defines communicative competence as the ability "to initiate and maintain daily interactions within the natural environment" (1989, p. 138) adequately to meet daily needs. This presupposes knowledge, judgment, and skill in four areas (Light, 1989, p. 139):

(1) linguistic competence, using the rules of the language code (phonology, morphology, syntax, and semantics);
(2) operational competence, using the AAC system itself (e.g., controlling the volume, retrieving and producing a word);
(3) social competence, interacting with others (e.g., initiating a conversation, reacting to what another person says);
(4) strategic competence, adapting to a situation and compensating for any difficulties that may arise (e.g., rephrasing an utterance if the listener did not understand it, rather than simply repeating it).

As well, communicative competence is relative, not absolute. An individual may be competent interacting with one partner but not with another, in one situation but not another, or at one stage of the conversation but not another.

People interact for a variety of reasons: to communicate their wants or needs, to convey or receive other information, to increase social closeness, and to fulfil the requirements of social etiquette (Light, 1988, p. 76). Interactions with different goals may differ in many ways. Social etiquette and expression of wants and needs, for example, may be characterized by highly predictable interactions in which communication rate is very important. Communication rate may also be important when the goal of the interaction is to convey or receive information (Light, 1988, p. 76). AAC systems need to recognize these varying demands in order to support communication more effectively.
2.3 Components of an AAC System

An AAC system (Figure 2.1) can be described in terms of its language model, and its input and output interfaces (Demasco & Mineo, 1995). The input interface provides the user with a method for selecting symbols (letters, words, or icons) represented in the system. How symbols are represented, organized and processed is specified in the language model. The user's message is presented by the output interface.

Currently, AAC systems can accept input from a wide variety of devices. Keys on a keyboard can be selected with the user's hands, or with a stick fastened to a head-mount or held in the user's mouth. For users with more limited motion, switch devices can be activated by movement of the hand, foot, or eyebrow. A beam of light, emitted from a head-mounted source and detected by receivers on the AAC system, can be used to make selections on a switch or a keyboard. There is even work in progress to detect and follow the user's eye gaze (Sandler, 1994).

Figure 2.1: Components of an AAC system
[Figure: diagram of three components. Language model: representation, organization, processing. Physical input interface: input devices, selection methods. Physical output interface: output devices.]

Selection methods can be generally classified as either direct or scanning. With direct selection, the user indicates the desired item from a set of items (Beukelman & Mirenda, 1992, p. 58). Spelling words on a computer keyboard is an example of direct selection. Each key represents a letter in the alphabet and is selected by the user via a key stroke.

Using a scanning method, items in the set are displayed in some predetermined order by the system, or by a conversational partner or facilitator, and the user indicates when the desired item has been presented (Beukelman & Mirenda, 1992, p. 62).
In row-column scanning, for example, symbols are organized into rows and columns and the system highlights rows until the user indicates the row containing the desired symbol. The system then highlights columns until the user indicates the column containing the desired symbol. The system selects the symbol located at the point where that row and column intersect.

Scanning input can be slower than direct selection, because the user must wait while the system traverses undesired items. However, direct selection of even a relatively small number of items requires a fair amount of motor dexterity. With scanning, an individual with severely limited motor abilities is potentially able to select any symbol represented in the system using only a single key or switch.

The words and messages of the language model that are available for selection on an AAC system can be represented in a variety of symbol sets. The most appropriate set for a specific user will depend on that user's age, cognitive and language abilities, and perceptual abilities (see Beukelman & Mirenda, 1992, pp. 21-27, for a discussion on "representational symbols"). Letters, pictures, abstracted icon sets, or combinations of all three are used by different systems.

The symbols available in a system must be organized and displayed to the user in some fashion. For instance, in a letter-based system performing row-column scanning, decisions must be made about how many rows and columns to use, and in which order the letters should occur in the columns. A system may contain more symbols than it can display at one time. Symbols must then be organized to provide the user with a consistent method for accessing them. This organization includes how symbols are physically distributed on the system display, as well as how hidden symbols are reached.

In addition to providing this vocabulary of symbols, an AAC system may provide processing capabilities to facilitate its use.
For instance, a vocabulary symbol may have a phrase or sentence associated with it which is retrieved by the system when the symbol is selected. Several strategies of processing to increase communication rate are discussed in Section 2.4.

Both visual and spoken output are available on many AAC systems. Visual output methods include displaying symbols on a computer screen, or printing symbols out on paper. Spoken output is achieved with either digitized or synthesized speech. Digitized speech is pre-recorded and fixed, while synthesized speech is generated at the time of message production and is not constrained in terms of message content. However, the voice quality of digitized speech is far more natural than that of synthesized speech in current AAC systems.

2.4 Strategies for Accelerating Message Production

One of the great challenges to AAC has been to increase the rate of message production by augmented communicators. With AAC systems, production rates of between 2 and 20 words per minute are not uncommon, an order of magnitude slower than for unaugmented speakers using their natural voices (Kraat, 1987, p. 63). This slow production rate may affect how augmented communicators participate in conversations, and how they are viewed by their conversational partners.

Two acceleration strategies used in existing AAC systems are message encoding and message prediction. Compansion, a third acceleration strategy, in which the user inputs a sentence in "telegraphic" style and the system completes the sentence, is discussed in Section 2.5.1.

Message encoding refers to retrieving a word, or a longer unit of text, by using an associated code that requires fewer selections. Codes can consist of letters, numbers, icons, or some combination of these. For example, "Please open the door" may be associated with the code "OD", the initial letters of the salient words "open" and "door" (salient letter codes, Light & Lindsay, 1992, p. 35).
An alternate encoding might be "DD", because the sentence is giving "directions" regarding the "door" (letter category codes, Light & Lindsay, 1992, p. 35).

Flexible abbreviation expansion (Stum & Demasco, 1992) is an alternative letter coding system for abbreviating words. In a standard coding or abbreviation scheme, there is a fixed set of codes which is stored by the user ahead of time. The processing done by the system is a simple table look-up. In contrast, flexible abbreviation expansion has no fixed table of codes. Instead, the user and system follow rules to expand the letter codes.

Message prediction is a dynamic strategy, whereby the system makes use of earlier portions of the message to interpret and predict the next icon, letter, word, or larger segment of text (Beukelman & Mirenda, 1992, pp. 42-44). On a spelling-based system, for example, after the user selects the letters "s", "t", and "u", the system may predict the word in progress to be "student", "stupid", or "studying". If the intended word is in this list, the user can select it immediately rather than continuing to spell it. A system could also make use of syntax rules (VanDyke et al., 1992) to better predict the intended word.

2.5 Example AAC Techniques

To illustrate possible techniques for AAC, two systems are described here in detail. The first, Compansion, is different from other systems in that it uses natural language processing techniques to complete a telegraphic sentence once the user has entered it. The second system, the Liberator TM (Prentke Romich Company), is an example of a commercially available system, and contains a variety of tools with which the user can edit and navigate through stored text. One participant in the evaluation study, discussed in Chapter 5, normally communicates using the Liberator.
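
Before turning to the two systems, the encoding and prediction strategies of Section 2.4 can be illustrated with a minimal sketch. It is not taken from any existing AAC system: the code table, the vocabulary, the usage counts, and the function names expand_code and predict_words are all hypothetical. It assumes a fixed code table consulted by simple look-up, and a predictor that ranks words sharing the typed prefix by an invented usage count.

```python
from typing import Dict, List

# Hypothetical fixed code table (standard abbreviation scheme: table look-up).
# "OD" is the salient letter code from the text; the table itself is assumed.
CODES: Dict[str, str] = {
    "OD": "Please open the door",
}

# Hypothetical vocabulary with invented usage counts.
VOCABULARY: Dict[str, int] = {
    "student": 12,
    "studying": 7,
    "stupid": 3,
    "stop": 20,
    "table": 5,
}


def expand_code(code: str) -> str:
    """Return the stored message for a code, or the code itself if unknown."""
    return CODES.get(code.upper(), code)


def predict_words(prefix: str, limit: int = 3) -> List[str]:
    """Return up to `limit` words starting with `prefix`, most used first."""
    matches = [word for word in VOCABULARY if word.startswith(prefix.lower())]
    # Sort by descending usage count, then alphabetically for stable ties.
    matches.sort(key=lambda word: (-VOCABULARY[word], word))
    return matches[:limit]


if __name__ == "__main__":
    print(expand_code("OD"))
    print(predict_words("stu"))
```

Ranking by usage count is only one possible choice; a system using syntax rules, as mentioned above, would filter or reorder the candidates by the part of speech expected at that point in the sentence.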
2.5.1 Compansion

The strategies discussed in Section 2.4 attempt to accelerate message production by reducing the number of selections required to retrieve a word or sequence of words. To produce a novel sentence, the user must specify each word or phrase, and the resulting sentence may or may not be well-formed. A well-formed sentence may be retrieved as a single unit, but only in the fixed form in which it was encoded.

Compansion (McCoy et al., 1994a; Demasco & McCoy, 1992) is a strategy for producing a well-formed, novel sentence by selecting only uninflected content words. The system expands this input to produce a complete sentence. Compansion, in its current implementation, consists of three stages: a word-order parser, a semantic parser, and a translator/generator.

The sequence of input words enters the word-order parser. Here, each word is matched with an entry in the system lexicon, and its part of speech and semantic attributes are identified. The word-order parser then uses a grammar either to reject the sequence or to identify the main verb, attach modifiers to the words they modify, and identify embedded sentences. As the name implies, the word-order parser assumes words appear in the input in the same order they appear in the completed sentence. In cases where more than one part of speech is possible, multiple interpretations of the input are processed.

The lexicon contains nouns represented in a hierarchical meaning structure. "Hammer", for example, is classified specifically as a tool, more generally as a physical object, and so on. Verbs are also represented hierarchically, and are associated with case frames. A verb case frame contains cases for noun phrases that can occur with the verb in a sentence, and preferences for the kinds of noun phrases that can fill each case.
The word-order parser passes the information derived from the input to the semantic parser, which uses the lexicon entry for the main verb in the input to build a case frame for the entire sentence. Noun phrases identified by the word-order parser are then used to fill cases in this case frame, moderated by the verb's preferences for case fillers.

When all required cases for a verb have been filled, the case frame is passed to the final component of the compansion system, the translator/generator. Here, the case frame is translated into words to produce the sentence, with function words and morphological endings added as required.

As an example, consider the input: "John break window hammer yesterday." Both "break" and "hammer" can be verbs, but only "break" allows for a complete parse of the sentence. "John" is identified as a human object, "window" as a fragile inanimate object, and "hammer" as a tool. The preferences for the case fillers of "break" suggest that the agent, John, used an instrument, the hammer, to break the theme, the window. "Yesterday" modifies the verb and specifies the past tense. Definite articles, "with" before the instrument, and the past tense marker are filled in automatically to produce the completed sentence: "John broke the window with the hammer yesterday." Any alternative interpretations of the input would be produced if the user did not accept the one given.

A modification to the current system that is underway involves replacing the current lexicon with a database that can access information from various sources (Zickus, 1995). A study into interactions between AAC users and conversational partners, and the implications for Compansion, is also in progress (McCoy et al., 1994b; Vanderheyden et al., 1994).
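
The three stages described above can be illustrated with a toy sketch. This is not the Compansion implementation: the lexicon, the case frames, the flat semantic labels (in place of the hierarchical meaning structure), and the function names are all assumptions made for illustration, and the "grammar" is reduced to trying each possible main verb and rejecting any reading that leaves a word unplaced.

```python
# Toy lexicon (hypothetical): parts of speech, one semantic class per noun,
# and an irregular or regular past form per verb.
LEXICON = {
    "john": {"pos": {"noun"}, "sem": "human", "proper": True},
    "window": {"pos": {"noun"}, "sem": "fragile_object"},
    "hammer": {"pos": {"noun", "verb"}, "sem": "tool", "past": "hammered"},
    "break": {"pos": {"verb"}, "past": "broke"},
    "yesterday": {"pos": {"adverb"}, "tense": "past"},
}

# Toy case frames (hypothetical): case name -> preferred semantic class.
CASE_FRAMES = {
    "break": {"agent": "human", "theme": "fragile_object", "instrument": "tool"},
    "hammer": {"agent": "human", "theme": "fragile_object"},
}


def noun_phrase(word):
    """Proper nouns are capitalized; other nouns get a definite article."""
    return word.capitalize() if LEXICON[word].get("proper") else "the " + word


def generate(verb, filled, tense, adverbs):
    """Translator/generator stage: add function words and tense marking."""
    verb_form = LEXICON[verb]["past"] if tense == "past" else verb + "s"
    parts = [noun_phrase(filled["agent"]), verb_form, noun_phrase(filled["theme"])]
    if "instrument" in filled:
        parts += ["with", noun_phrase(filled["instrument"])]
    sentence = " ".join(parts + adverbs)
    return sentence[0].upper() + sentence[1:] + "."


def compansion_sketch(words):
    """Return the first complete reading of the input, or None if none exists."""
    words = [w.lower() for w in words]
    if any(w not in LEXICON for w in words):
        return None
    # Try every word that can be a verb as the main verb (multiple readings).
    for verb in [w for w in words if "verb" in LEXICON[w]["pos"]]:
        frame = CASE_FRAMES[verb]
        filled, adverbs, tense, complete = {}, [], "present", True
        for word in (w for w in words if w != verb):
            entry = LEXICON[word]
            if "adverb" in entry["pos"]:
                tense = entry.get("tense", tense)
                adverbs.append(word)
                continue
            # Semantic stage: place the noun in the first open case it fits.
            case = next((c for c, sem in frame.items()
                         if c not in filled and "noun" in entry["pos"]
                         and entry.get("sem") == sem), None)
            if case is None:
                complete = False  # this reading leaves a word unplaced
                break
            filled[case] = word
        if complete and "agent" in filled and "theme" in filled:
            return generate(verb, filled, tense, adverbs)
    return None


if __name__ == "__main__":
    print(compansion_sketch("John break window hammer yesterday".split()))
```

With "hammer" taken as the main verb, the word "break" cannot fill any case, so that reading is rejected, which mirrors the observation above that only "break" allows a complete parse.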
2.5.2 The Liberator(TM) and Minspeak(TM)

The Liberator consists of a "keyboard", a small liquid crystal display (LCD), and a speaker for outputting synthesized speech (as well as a printer, serial communication port, and other features not discussed here). The keyboard is typically configured to use the maximum of 128 keys, but may also function as a set of 8 or 32 keys. A light-emitting diode (LED) is situated next to each key, and a key guard (a plastic layer with a round hole for each key) lies on top, to help the user's finger find and stay on the intended key.

In the 128-key configuration, approximately 64 of the keys are associated with pictographic icons, as well as characters, digits, and a number of keys regularly found on a typewriter or computer keyboard. A short sequence of keys can be associated with a word or phrase, so that fewer keys are needed than would be required to spell it. Baker and Barry (1990) noted that, in English orthography, 26 letters are combined into words that are often much more than three characters in length. If these words were retrieved by 3-key sequences, between 60% (Baker & Barry, 1990) and 85% (Prentke Romich Company, 1991) fewer key presses would be required to produce text than by spelling.

Minspeak (Baker, 1982) is a strategy for encoding text, similar to letter category encoding, using icons instead of letters. A sequence of two to three multi-meaning icons is associated with each word or phrase. For example, the sentence "When do we eat?" can be encoded as a sequence of the icons "question mark", "clock", and "apple" (Prentke Romich Company, 1991, Appendix - Tutorial Demo Vocabulary, p. 1), representing the concepts question, time, and food.

Rule-based programs are available to provide the user with a systematic strategy for encoding words and phrases. In the Words Strategy(TM) (Prentke Romich Company, 1992) program, for example, "turtles" is encoded as the sequence of icons "zebra", "treasure chest", and "noun+s".
The icons represent the semantic category and specific word meaning, and the syntactic category of the word, respectively.

Minspeak is a strategy for lexical retrieval only, a "transducer" for associating key sequences directly with stored text, and performs no linguistic processing. The user may, intentionally or by mistake, produce an ungrammatical or nonsense sentence. A future version is planned that will process sentence syntax and infer semantic roles, and will be able to produce a complete sentence from partial input (Baker & Nyberg, 1989).

The Liberator, using Minspeak, provides access to pre-stored words, phrases, and sentences, or any other segments of text, by selecting a sequence of keys. Organizing a set of words into a "Theme" saves the user one key selection, as long as only words within the current Theme are being retrieved. The user may also choose to spell text letter by letter, and edit text on the LCD display.

A key sequence can also retrieve a sentence template, or a sentence that contains a blank (a "Minsert") to be filled in at retrieval time. One or more words are selected to fill the Minsert, and complete the phrase or sentence. This provides a certain amount of flexibility for using pre-stored text. However, there are no linguistic processing mechanisms to ensure agreement between the template and the Minsert contents. For example, separate sentence templates might be required to handle the case when either a singular or plural noun phrase could fill a Minsert.

Stored text can be organized and reorganized by the user in a structure called a "Notebook". A key can be programmed to allow the user to have the Liberator speak the contents of the Notebook a sentence at a time, pausing for the user to press a key before speaking the next sentence. Thus, a user could store a speech or a conversation and proceed through it, controlling the rate of sentence production.
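
The retrieval-only behaviour described in this section, and the agreement problem with Minserts, can be made concrete with a short sketch. It does not reproduce the Liberator's software: the table layout, the "___" blank marker, the example template, and the function names are hypothetical. Only the two icon sequences are taken from the examples above.

```python
from typing import Dict, Optional, Sequence, Tuple

# Icon sequences from the examples in the text, stored as a plain look-up
# table (the "transducer" view: key sequence in, stored text out).
ICON_CODES: Dict[Tuple[str, ...], str] = {
    ("question mark", "clock", "apple"): "When do we eat?",
    ("zebra", "treasure chest", "noun+s"): "turtles",
}

# Hypothetical marker for the blank in a sentence template.
MINSERT = "___"


def retrieve(icons: Sequence[str]) -> Optional[str]:
    """Return the text stored under an icon sequence, or None if unknown."""
    return ICON_CODES.get(tuple(icons))


def fill_minsert(template: str, filler: str) -> str:
    """Fill the first blank in a template; no linguistic checks are made."""
    if MINSERT not in template:
        raise ValueError("template has no blank to fill")
    return template.replace(MINSERT, filler, 1)


if __name__ == "__main__":
    print(retrieve(["question mark", "clock", "apple"]))
    # Hypothetical template: a singular filler agrees with "is" ...
    print(fill_minsert("There is ___ on the table.", "a cup"))
    # ... but a plural filler does not, and nothing detects the mismatch.
    print(fill_minsert("There is ___ on the table.", "two cups"))
```

The last call yields an ungrammatical sentence, which is why a separate plural template would be needed in a system that performs no agreement processing.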
Chapter 3 APPLYING CONVERSATIONAL STRUCTURE TO AAC

In the previous chapter, a number of general approaches currently being applied to AAC were described briefly, and two systems were discussed in more detail. Existing AAC systems are accessible using a wide variety of input devices, and contain large vocabularies of words and phrases which users can combine together and speak. A number of systems attempt to facilitate and accelerate augmented communication further by doing one or more of the following:

(1) organizing the vocabulary to reduce the time and effort required to find and access it;
(2) providing the opportunity for users to store and re-use their own utterances, in addition to the regular stock vocabulary;
(3) applying natural language processing techniques to the user's input.

However, a fourth consideration, one that many speaking individuals take for granted, has not been sufficiently addressed in AAC:

(4) organizing text to support participation within the context of conversation.

A conversation-based organization of text does not prohibit a system from pursuing points (1), (2), and (3) as well.

Conversation is not just a random exchange of utterances by two or more people. Rather, it is often a structured and cooperative construct in which participants follow well-established, though rarely stated, rules and expectations. Yet participants in unaugmented conversation may pay little attention to controlling how the conversation develops. By contrast, augmented communicators must continuously be thinking about how to produce their utterances using their AAC system. Alm urges that this should not be, that "a conversation aid should ideally be able to simulate the non-conscious control mechanisms which are available to unimpaired speakers." (Alm, 1988, p. 29)

3.1 Stages of Conversation

A conversation can be described in terms of the following sequence of stages (Alm, 1988, p.
107), understanding that the same conversation need not contain them all: (1) greetings; (2) smalltalk; (3) main business; (4) wrap-up remarks; (5) farewells. Greetings and smalltalk can be used to open the conversation, wrap-up remarks and fare wells to close it. These four stages serve a social and pragmatic purpose, exchanging little or no real information. The main topic is discussed in the third stage of the conversation. 3.1.1 CHAT During those stages in which the content of an utterance was not important, the prototype CHAT system (Alm, 1988; Alm et al., 1987; Alm et al., 1992) allowed the user to construct and generate an utterance, to choose from several prepared utterances, or simply to request the system to choose and produce an utterance on its own. To avoid repeat ing the same utterance, the system made a constrained random choice of pre-stored utterances for the current stage of the conversation. At any time, the user could advance the system to the next, or any other, stage. The user could select a mood (polite, informal, angry, and so on) for the conversa tion and the name of the other participant. Pre-stored utterances of the selected mood were then made available, and the name of the other participant was inserted where appropriate. For example, an informal greeting to "Bill" might have taken the form: "Hi, Bill. How's it been going with you?" (Alm et al., 1987, p. 129) CHAT also provided the user with easy access to utterances for filling pauses dur ing speech (e.g. "so", "well"), and for giving feedback when another person was speaking to indicate that the user was listening attentively (e.g. "uh-huh"). Example conversations using CHAT (transcribed in Alm, 1988, and in Alm et al., 1987) were brief, suggestive of what might occur between acquaintances passing on the street and exchanging a few words. 
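
The selection behaviour described for CHAT can be illustrated with a short sketch. This is not CHAT's actual implementation: the class name, the utterance table, and the reading of "constrained random choice" as "never the utterance just spoken for that stage" are all assumptions made for illustration. Only the greeting to "Bill" is taken from the cited example; the other utterances are invented placeholders.

```python
import random

# Hypothetical utterance store keyed by (stage, mood). "{name}" marks the
# position where the other participant's name is inserted.
UTTERANCES = {
    ("greetings", "informal"): [
        "Hi, {name}. How's it been going with you?",  # from Alm et al. (1987)
        "Hey, {name}!",                                # invented placeholder
        "Hello there, {name}.",                        # invented placeholder
    ],
    ("farewells", "informal"): [
        "See you, {name}.",                            # invented placeholder
        "Bye for now, {name}.",                        # invented placeholder
    ],
}


class ChatSketch:
    """Illustrative stand-in for stage- and mood-based utterance choice."""

    def __init__(self, mood, name, rng=None):
        self.mood = mood
        self.name = name
        self.rng = rng or random.Random()
        self.last = {}  # stage -> template most recently spoken

    def speak(self, stage):
        """Pick an utterance for the stage, avoiding an immediate repeat."""
        options = UTTERANCES[(stage, self.mood)]
        # Assumed constraint: exclude the template used last time, unless
        # that would leave nothing to choose from.
        candidates = [u for u in options if u != self.last.get(stage)] or options
        choice = self.rng.choice(candidates)
        self.last[stage] = choice
        return choice.format(name=self.name)
```

Calling `ChatSketch("informal", "Bill").speak("greetings")` twice in a row yields two different greetings, each containing the name "Bill".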
3.1.2 TOPIC

To assist the augmented communicator during the less predictable main body of the conversation, a database management system and interface called TOPIC (Alm et al., 1989; called CHAT4 in Alm, 1988, p. 159) was developed. Segments of text could be stored by the user as records in the database. The purpose of TOPIC was to predict, whenever possible, what text the user would choose next, and to help the user find relevant segments of text stored in the system (Alm et al., 1989, p. 148).

Database records could contain text several sentences in length, and were identified by their speech acts, subject keywords, and frequency of use. The system suggested possible next utterances by considering subject keywords and speech acts of the previous item selected in the conversation. It then searched for records with matching subject keywords in the database, giving preference to those records that were accessed frequently.

Conversations produced by TOPIC tended to be longer and similar to monologues. Examples included jokes, stories, and lectures (Alm, 1988, pp. 171-176).

3.2 Schemata

While CHAT and TOPIC can provide valuable aids to conversation, an AAC system may well be able to provide more support to the augmented communicator. CHAT provided interactional utterances, but with little content. TOPIC provided text with content, but gave little support for interaction.

In this section, an approach is developed that supports both meaningful and interactive conversation. As well, this approach provides a more formal definition of the structural components in a conversation and how they are organized.

3.2.1 Schema Theory

Schank and Abelson (1977) suggested that a person developed mental scripts as a result of repeated experiences. Each script represented the typical sequence of events that occurred in a particular situation. New experiences were interpreted on the basis of existing scripts.
Events which did not already match an action in a script but which were considered important were either added to the script for subsequent use, or developed into a new script.

Schank (1982) extended and modified the idea of scripts into a hierarchy of schema structures. Memory organization packets (MOPs) replaced scripts in representing typical situations. Meta-MOPs represented higher-level goals, and contained MOPs corresponding to situations that, together, could satisfy those goals. For example, the meta-MOP "trip" (Schank, 1982, p. 99) might involve planning, preparation, travel, arrival, and so on, each of which was represented as a MOP.

Each MOP was associated with a sequence of scenes, and each scene represented the sequence of actions leading to a specific goal. For example, the MOP for a trip by airplane could include scenes for getting tickets, driving to the airport, boarding the plane, and so on. Each scene was associated with any number of alternative scripts, where a script contained one instance of the actions that fulfilled the scene's goal.

MOPs and scenes could each be organized in hierarchies, with MOPs and scenes at higher levels representing more general situations. For example, "travel by train" and "travel by plane" MOPs could both contain a "buy ticket" scene that contained actions specific to buying train and plane tickets, respectively. The "buy ticket" scene in a more general "travel" MOP would then capture aspects common to both.

To demonstrate the use of schemata in understanding stories and answering questions, Schank developed the computer program CYRUS (Schank, 1982, p. 207). CYRUS contained databases of information about two former Secretaries of State, and could integrate new information into these databases. On the basis of the information it gathered, CYRUS could correctly answer such questions as "Have you been to Europe recently?" and "Why did you go there?"

Schematization (Matlin, 1989, p. 118) is a term for the abstraction of experiences and the development of schemata (the plural of schema) in memory. Experimental evidence suggests that people may remember and interpret events according to schemata (see Matlin, 1989, pp. 230-236, for a review). How an experience is abstracted into a schema is a highly individualized process, relying on a person's own history of experiences.

In addition to organizing past experiences in memory, schemata can drive expectations during new experiences (Schank, 1982, p. 37). For example, when visiting a newly opened fancy restaurant for the first time, one may expect to be seated by a maitre d' and to pay a great deal for dinner because of "fancy restaurant" experiences in the past.

3.2.2 Schemata for Conversation

Whereas Schank (1982) had concentrated on schemata involving typical sequences of events for story understanding, Kellermann et al. (1989) observed conversations between undergraduate students meeting for the first time, and described their conversations in terms of scenes in a MOP. Scenes were grouped into three phases: initiation, maintenance, and termination. In the initiation phase, for example, participants might exchange greetings, introduce themselves, and then discuss their current surroundings.

A number of interesting patterns were found in this study. Scenes were weakly ordered within each phase, but strongly ordered between phases, so that a person rarely entered a scene in an earlier phase from a scene in a later phase. Some scenes shared what the investigators called "subroutines" (used in an informal, not computational, sense), or common sequences of generalized acts. In one example, several scenes contained the sequence "get facts", "discuss facts", "evaluate facts", and so on.

Thus, conversations have been represented in terms of schemata. Scenes in these conversations demonstrated some degree of temporal ordering, and could be described as being in a hierarchy with phases and subroutines.
An example of a natural language interface that represented conversation in terms of schemata was JUDIS (Turner & Cullingford, 1989, p. 75). JUDIS was the interface for an interactive system that played the part of a caterer's assistant and helped the user plan a meal. JUDIS represented its goals and the goals of the user as MOPs. Each MOP contained a list of characters (the caterer and customer), scenes (either mandatory or optional), and the sequence of events. Higher-level MOPs handled higher-level goals, such as the goal of getting information, while lower-level MOPs handled lower-level goals, such as answering yes-no questions.

JUDIS differed significantly from the interface developed later in this thesis because JUDIS tried to "understand" and model the user's goals, and, consequently, was limited to operating in a narrow domain. The latter approach, by making use of MOPs that have been constructed by the user, can be applied to any domain with which the user is familiar.

3.3 Conversational Schemata in AAC

In this thesis, the notion of schemata and MOPs is explored as a method for organizing the user's pre-stored text in an AAC system. Several researchers have suggested that scripts might be applied to AAC (Newell, 1990, p. 50; Elder & Goossens', 1994), though nobody, to the author's knowledge, has taken this idea and developed it into a system. Yet the notion shows promise in giving the AAC user access to large amounts of pre-stored text in a way that is transparent, since it presumably mirrors the way that information is stored in memory.

The system of schemata developed here is motivated by, and loosely based on, Schank's hierarchy of MOPs and scenes. The user's text is stored in a large hierarchy. At the highest level in this hierarchy is the MOP.

3.3.1 MOP

The MOP identifies the conversational topic, or conversational context. In this system, a user defines MOPs to represent topics or contexts that are relevant to him.
Each MOP contains a list of scenes (Section 3.3.2) and a list of slots (Section 3.3.3).

Based on Schank's definition of schemata as representing actions, as well as on observations by Kellermann et al. (1989) of actual conversations, scenes are expected to be traversed in the order in which they appear in the MOP. Once the conversation leaves a scene, the expectation is that the scene will not be re-entered. However, some conversational situations may follow the schema more closely than others. Matlin (1989, p. 225) distinguished between "strong scripts" that are ordered and "weak scripts" that are not. During weakly ordered conversational situations, the order in which scenes are traversed may be different from the order in which they appear in the MOP.

MOPs can be organized in a hierarchy, so that lower-level MOPs inherit structures from more general higher-level MOPs. Thus, a scene that occurs in a number of different restaurant MOPs needs to be defined only once, in the common parent MOP.

The actual text of the conversation is contained in the scenes and slots, and these are accessed by first selecting a MOP.

3.3.2 Scene

Formally, a scene contains actions that, together, fulfil a specific goal in a MOP. Looked at in another way, scenes are subtopics of the MOP, and an individual can define them in whatever manner is logical to her. In the conversational setting, associated with each scene is a set of sentences (or, more generally, utterances). These are the sentences that the user may wish to select when discussing the actions, or subtopic, captured in the scene.

Sentences grouped together in a scene are treated, to some extent, as a unit. Child MOPs inherit complete scenes from their parent. Sentences within a scene are expected to be traversed in sequence. These expectations are motivated on the same basis as those for the ordering of scenes in a MOP.
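
The relationship between MOPs, scenes, and inheritance can be sketched as a small data structure. This is an illustration only and not the C++ representation used by SchemaTalk; the class and method names are hypothetical. The inheritance rule assumed here (a MOP with no scenes of its own takes all of its parent's scenes, in order) is the default stated in Section 4.5.2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scene:
    """A subtopic of a MOP, holding sentences in their expected order."""
    name: str
    sentences: List[str] = field(default_factory=list)


@dataclass
class Mop:
    """A conversational topic; may have a more general parent MOP."""
    name: str
    parent: Optional["Mop"] = None
    scenes: List[Scene] = field(default_factory=list)

    def effective_scenes(self) -> List[Scene]:
        """Scenes in traversal order.

        Assumed rule: a MOP that defines no scenes of its own inherits
        the complete scene list of its parent.
        """
        if self.scenes or self.parent is None:
            return self.scenes
        return self.parent.effective_scenes()


# A general restaurant MOP defines its scenes once; a more specific
# MOP that defines none sees the same scenes.
restaurant = Mop("restaurant", scenes=[
    Scene("enter", ["Hello."]),
    Scene("exit", ["Thank you.", "Good bye."]),
])
self_service = Mop("self service", parent=restaurant)
```

Here `self_service.effective_scenes()` returns the "enter" and "exit" scenes defined in the parent, so the sentences need to be stored only once.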
3.3.3 Slot

A particular situation may have associated with it words or phrases that do not form complete utterances. For example, a list of food items may be associated with a particular restaurant. A new schema structure, a slot, is introduced here to group together words or phrases (slot fillers) that are related to each other in the context of the MOP.

3.3.4 Sentence Templates

There may be sentences in a scene that are identical except for one word or phrase, such as the food items in the "order" scene of a restaurant MOP. Rather than listing in a separate sentence each food item that an individual might order, a sentence template could be used. A sentence template contains some words and a reference to a previously defined slot.

So, for example, the "order" scene of a restaurant MOP might contain a template: "I'll have a <food>." When this template was selected, the system would give the user access to the food slot, to select the specific food items. This technique would require an additional selection over listing each item in a separate sentence, but reduces the number of sentences that need to be included in the scene. Sentence templates are especially advantageous if the slot contains many items. To this point, sentence templates and slots here are similar to sentence templates and Minserts in the Liberator.

Another advantage to using a sentence template in this system could be realized if the template occurs in a scene that is inherited from a parent, or higher ancestor, MOP. The same template could be associated with slot fillers contained in each MOP. Thus, the sentence template would appear to be uniform across all related MOPs, while retrieving fillers appropriate to the current MOP.

The use of sentence templates may require the system to perform some natural language processing.
For example, if a singular slot filler in a sentence template is filled by a plural filler, morphological endings on other words in the sentences may need to be modified to match the filler number.

Chapter 4
SCHEMATALK: DESIGN AND IMPLEMENTATION

SchemaTalk was designed to complement an individual's existing augmentative communication device, adding a schematic organization of pre-stored and reusable text to the device's own capabilities. In order to investigate issues related to design and implementation, a prototype system was developed and tested at the Applied Science and Engineering Laboratories (ASEL).

The most common way to use SchemaTalk is to enter sentences organized as schemata into a specially designed text file. This may be done using the user's regular AAC system, or using a standard computer keyboard and mouse. Once the schemata are stored, SchemaTalk gives the user access to this pre-stored text in an organized manner, making it easy to step through the schema and select this text as the conversation moves along. The SchemaTalk user may:
(1) select and speak pre-stored text, as is;
(2) select pre-stored text, edit it to more accurately reflect current needs, and speak this edited text;
(3) use the standard AAC system to compose and edit novel text in SchemaTalk, and then speak this text.

For the remainder of the chapter, the term "keyboard" will be used to refer either to a keyboard or to an AAC system emulating a keyboard. Similarly, "mouse" will refer to either a computer mouse or a mouse emulator.

SchemaTalk currently runs on a Sun system, in a UNIX environment. The program code consists of several files of Tcl/Tk scripts, and extensions to the Tcl/Tk library, written in C++. These C++ functions are used to build the internal representation of the user-specified schema hierarchy.
The Tcl/Tk script files define the SchemaTalk window and user interface, calling Tcl/Tk functions to display and control various graphical widgets, including dialog boxes, buttons, and scrolling boxes.

4.1 Initial Setup

At start-up, the schema file (see Section 4.5) is loaded and the SchemaTalk window (Figure 4.1) is displayed. This window consists of, from top to bottom:
(1) a title bar, containing the title "SchemaTalk", and, on the left, a small triangle in a square that, if pressed using a mouse, will iconify the window (provided by the OpenLook window manager);
(2) a menu bar, consisting of a row of menu buttons for listing topics (MOPs), listing scenes, setting the mode, setting speech volume, and exiting;
(3) the schema display area, consisting of a scrolled list box that will contain the schema text, and buttons to its right that are used for navigation through the schema; and,
(4) the speech output area, consisting of the output text box with the heading "Output:", and the buttons "Speak" and "Clear" to its right.

Figure 4.1: SchemaTalk window at start-up

Buttons can be pressed using a mouse, or by entering a key from a keyboard. The key that will "press" the button is specified in the button label. For example, the "Speak" and "Clear" buttons can be "pressed" by entering the "Return" and "Escape" keys, respectively. For the remainder of the chapter, the term "pressing" a button will refer to either pressing a button with a mouse or mouse emulator, or entering the appropriate key.

All dialog boxes contain two buttons: "OK", and "Cancel" (e.g., Figure 4.2). Although they do not say so explicitly, these buttons are pressed either with a mouse or by pressing the keys "Return" and "Escape", respectively. If the dialog box contains a list, the list can be scrolled using a mouse, or the "Up" and "Down" arrow keys on a keyboard.
The current selection (highlighted with a black background and white text) is returned and the dialog box disappears when the "OK" button is pressed, or nothing is returned when the "Cancel" button is pressed. Also, all dialog boxes are modal. This means that, while a dialog box is present, the SchemaTalk window will not accept any input.

4.1.1 Menu Bar

The five menu bar buttons can be pressed using a mouse, or by pressing the "Control" key followed by a key specific to the button. If the "Control" key is pressed and released, all of the menu buttons are highlighted (black background, white text), and pressing the button-specific key presses the button. If the "Control" key was pressed by mistake, pressing it again will release the "Control" highlight.

4.1.1.1 "List Topics" Button

Pressing the "List Topics" button (or "Control" and "T") causes the "List Topics" dialog box to be displayed (Figure 4.2), listing all MOPs contained in the schema file. Approximately the first 20 characters of each entry in a dialog box are visible. Indentation indicates the hierarchical structure, with the names of child MOPs indented two spaces to the right of their parents.

A MOP is selected by scrolling to the topic name (using "up" and "down" arrow keys, or controlling the scroll bar with a mouse) and pressing the "OK" button (or "Return" key). The schema list box is then updated to contain sentences from this new MOP, which becomes the current MOP.

4.1.1.2 "List Scenes" (or "List Slots") Button

The SchemaTalk window is initially in scene mode (Section 4.2), and the schema list box contains sentences from the current MOP. Pressing the "List Scenes" button (or "Control" and "S") causes the "List Scenes" dialog box to be displayed (Figure 4.2), listing all scenes in the current MOP. The scene selected from this dialog box becomes the current scene, and the schema list box is scrolled so that sentences from this scene are visible.
In slot mode (Section 4.3), the same process takes place, involving slots rather than scenes. A new current slot is selected from the dialog box, and the schema list box is updated to contain all, and only, the fillers from this new slot.

4.1.1.3 "Slot Mode" (or "Scene Mode") Button

Pressing this button causes the mode to toggle from scene mode to slot mode, or slot mode to scene mode (described in Section 4.2 and Section 4.3). The button label indicates the mode that will be active after the button has been pressed: when SchemaTalk is in scene mode, this button is labelled "Slot mode", and when SchemaTalk is in slot mode, this button is labelled "Scene mode".

4.1.1.4 "Volume" Button

The user can adjust the sound level of the speech output by pressing the "Volume" button (or the keys "Control" and "V" together), and scrolling to the desired sound level on the dialog box that appears. Possible sound levels range from zero to the maximum limit, by 10% increments.

Figure 4.2: "List Topics" (left), and "List Scenes" (right) dialog boxes

4.1.1.5 "Done" Button

When the user is ready to quit from SchemaTalk, pressing the "Done" button (or "Control" and "D") causes a dialog box to be displayed, which asks the user to confirm or cancel this request to quit. Pressing the "OK" button exits from the interface and the SchemaTalk window disappears.

4.1.2 Schema Display Area

The schema display area is initially empty, and the buttons for navigating through the schema are disabled, as indicated by their grayed lettering. When a MOP has been selected, sentences (in scene mode) or fillers (in slot mode) are displayed in the list box. Schema buttons are enabled only if they are appropriate. For example, the "Next" scene button is enabled (the text is black, not gray) only if there is a scene after the current one in the current MOP.

The schema list box displays up to approximately 60 characters per line.
(The list box width is based on the average width of a character. Using a proportional font, it is possible that a particular sentence will have more than or less than 60 characters actually visible.)

4.1.3 Speech Output Area

Any text that the user enters will appear in the speech output area's text box. This text box will grow if more text is entered than can fit into its current size, and shrink back to its original size as text is deleted.

To speak the text, the user can press the "Speak" button (or the "Return" key). Text is automatically cleared from the speech output area after it has been spoken, but will be spoken again each time the "Speak" button is pressed before any new text is entered. Text in the speech output area can also be cleared by pressing the "Clear" button (or the "Escape" key). Pressing the "Delete" or "Backspace" key will delete the last character, while pressing "Control" and "Delete" or "Control" and "Backspace" together will delete the last word, space, or punctuation symbol.

4.2 Scene Mode

In scene mode, the schema list box contains sentences from the current MOP. If SchemaTalk cannot find the schema file, or if there are no MOPs in the file, this list box is empty.

To illustrate the operation of SchemaTalk, a possible MOP for visiting the Blue & Gold Club restaurant (Figure 4.3) shall be used as an example. The first sentence has been selected and appears in the speech output text box. After a sentence has been selected, the next sentence is highlighted, ready to be selected, to further facilitate involvement in conversations that closely follow the schema.

Sentences in the current scene are marked with a ">" in the left margin. A total of 10 sentences are visible at any time. Sentences can be 400 characters long, but only approximately the first 60 characters of each are displayed.
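
The select-and-advance behaviour of scene mode can be sketched as follows. This is a simplified illustration rather than SchemaTalk's Tcl/Tk code: the class name and methods are hypothetical, a single space is assumed as the separator when text is added to the output, and a fixed 60-character cut-off stands in for the approximate, font-dependent display width.

```python
class SceneCursor:
    """Tracks the highlighted sentence of one scene and the output text."""

    DISPLAY_WIDTH = 60  # approximate number of characters shown per line

    def __init__(self, sentences):
        self.sentences = list(sentences)
        self.index = 0      # position of the highlighted sentence
        self.output = ""    # contents of the speech output text box

    def display_line(self, i):
        """List box line: '>' margin marker plus the truncated sentence."""
        return "> " + self.sentences[i][: self.DISPLAY_WIDTH]

    def select(self):
        """Add the highlighted sentence to the output, then advance."""
        sentence = self.sentences[self.index]
        # Assumption: sentences are joined to existing text with one space.
        self.output = (self.output + " " + sentence).strip()
        # The highlight moves on unless this was the scene's last sentence.
        if self.index < len(self.sentences) - 1:
            self.index += 1
        return sentence
```

With the sentences "Hello." and "A table for two, please, by the window.", two calls to `select()` leave both sentences in the output box and the highlight resting on the last sentence of the scene.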
4.2.1 Navigation

The user can move to the next or previous sentence within the current scene by pressing the "Next" and "Prev" sentence buttons, or the "Down" and "Up" arrow keys. These buttons are disabled when the current sentence is the last or first sentence in the scene, respectively.

Figure 4.3: The "enter" scene in a possible restaurant schema

Pressing the "Next" or "Prev" scene buttons (or the "Control" and "Down" or "Up" arrow keys) will cause the next or previous scene, respectively, to be made current, and the first sentence of that scene to be highlighted. Alternatively, the user can press the "List Scenes" button and directly select the scene to be made current. The list box scrolls to make the current scene visible.

4.2.2 Sentence Selection

The currently highlighted sentence is selected by pressing the "Select" sentence button (or the "Tab" key). The selected sentence is appended to the text in the speech output text box. If there is insufficient space in the text box, the box grows, and shrinks when the text is deleted. When a sentence is selected, if it is not the last sentence in the scene, the next sentence becomes current, and the list box is updated to display and highlight the new current sentence.

Text in the speech output area can include both text entered directly from the keyboard, and sentences selected from the schema.

4.3 Slot Mode

In slot mode, the schema list box contains fillers for the current slot. For example, in a restaurant schema, a slot might contain food items for the user to order at a particular restaurant (Figure 4.4).

4.3.1 Navigation

Similar to navigation among sentences and scenes, the user can move to the next or previous filler within the current slot by pressing the "Next" and "Prev" filler buttons, or the "Down" and "Up" arrow keys. These buttons are disabled when the current filler is the last or first filler in the slot, respectively.
Pressing the "Next" or "Prev" slot buttons (or the "Control" and "Down" or "Up" arrow keys) will cause the next or previous slot, respectively, to be made current, and the first filler of that slot to be highlighted. Alternatively, the user can press the "List Slots" button and directly select the slot to be made current.

4.3.2 Filler Selection

The currently highlighted filler is selected by pressing the "Select" filler button (or the "Tab" key). The selected filler is appended to the text in the speech output text box.

4.4 Sentence Templates

A slot can be linked to a sentence using a sentence template. Continuing with the Blue & Gold Club restaurant MOP in scene mode (Figure 4.3), consider the fifth sentence: "Is the <entree> very spicy?" The angle brackets around the word "entree" indicate a slot associated with that position in the sentence, in this case the subject position.

When this sentence is selected, the SchemaTalk window changes to slot mode, and the "entree" slot becomes the current slot (Figure 4.5). The cursor is initially located in the sentence-slot box, which takes the place of the slot name in the template (in this example, "<entree>") and is empty to begin with. The user can navigate through the fillers in the slot in the normal fashion, using the "Next" and "Prev" buttons, or the arrow keys. Pressing the "Select" button (or "Tab" key) selects the highlighted current filler and appends it to the text at the location of the cursor. Pressing the "Right" arrow key moves the cursor out of the sentence-slot box and to the end of the sentence, while pressing the "Left" arrow key moves the cursor to the end of the text within the sentence-slot box.

Figure 4.4: The "drink or appetizer" slot in a possible restaurant schema

The sentence template can be configured to automatically change its form to agree with the text in the sentence-slot box. In this preliminary implementation of SchemaTalk, there is no capability for parsing sentences.
However, the user can include more than one sentence form in the schema, and set syntactic and semantic features to match fillers with the correct sentence form. Features are described in Section 4.5.4.

In the example in Figure 4.5, the original sentence form was "Is the <entree> very hot?" When the first filler, a singular "filet of sole", was selected, the speech output text was "Is the filet of sole very hot?" When a second singular filler, "chicken parmesan", was selected, the sentence-slot text formed a plural noun phrase. The plural sentence form was substituted for the singular form, and the text became "Are the filet of sole and chicken parmesan very spicy?"

It should be noted that the "and" between the two fillers was added by SchemaTalk. If yet a third filler were selected, perhaps "rice pilaf", SchemaTalk would replace the "and" with a comma, and place "and" before the last filler, to obtain "Are the filet of sole, the chicken parmesan, and the rice pilaf very spicy?"

Figure 4.5: A sentence template in a possible restaurant schema. The "entree" slot was made current, after the sentence "Is the <entree> very spicy?" was selected. Then "filet of sole" and "chicken parmesan" were selected from the entree slot.

4.5 Schema File

The user defines the schemata for SchemaTalk in a text file named "st.schemata". To illustrate the various schema structures, a schema for visiting restaurants, including the Blue & Gold Club, is developed below. The complete restaurant schema is included in Appendix A.

4.5.1 Grammar

The format of the schema file must conform to the grammar in Figure 4.6, written in Extended BNF (Sethi, 1989, p. 18). Parentheses, "()", indicate that their contents should be treated as a unit. Square brackets, "[]", indicate that only one item of their contents should be chosen. The contents of braces, "{}", are optional, repeated zero or more times. The terminal CR refers to the carriage return.
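
Before turning to the grammar itself, the sentence-form substitution of Section 4.4 can be illustrated with a short sketch. It is not SchemaTalk's implementation: the function names are hypothetical, the number features are reduced to "s" (singular) and "p" (plural), and, as a simplifying assumption, no article is inserted before later fillers.

```python
def join_fillers(fillers):
    """Join filler texts as 'A', 'A and B', or 'A, B, and C'."""
    if len(fillers) <= 2:
        return " and ".join(fillers)
    return ", ".join(fillers[:-1]) + ", and " + fillers[-1]


def fill_template(forms, slot, fillers):
    """Instantiate a sentence template with number agreement.

    forms   -- dict mapping a number feature ('s' or 'p') to a sentence
               form containing the slot reference, e.g. '<entree>'
    slot    -- slot name as it appears between the angle brackets
    fillers -- list of (text, number_feature) pairs, in selection order
    """
    if not fillers:
        raise ValueError("at least one filler must be selected")
    # Assumed rule: one singular filler is singular; a plural filler or
    # any conjunction of fillers is plural.
    is_singular = len(fillers) == 1 and fillers[0][1] == "s"
    number = "s" if is_singular else "p"
    phrase = join_fillers([text for text, _ in fillers])
    return forms[number].replace("<" + slot + ">", phrase)


forms = {
    "s": "Is the <entree> very spicy?",
    "p": "Are the <entree> very spicy?",
}
```

With these forms, selecting only "filet of sole" gives the singular sentence, while adding "chicken parmesan" switches to the plural form and joins the two fillers with "and".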
Figure 4.6: EBNF grammar of schema structures

SchemaFile ::= { Comment | SchemaDefn | WhiteSpace }
Comment ::= `' Text CR
SchemaDefn ::= { MopDefn | SceneDefn | SlotDefn }
MopDefn ::= `' WhiteSpace MopPathAndName CR { SceneDefn | SlotDefn } `' CR
SceneDefn ::= `' WhiteSpace SceneName CR { TextSentenceDefn | ImportedSentenceDefn | SentenceTemplateDefn } `' CR
SlotDefn ::= `' WhiteSpace SlotName CR { TextFillerDefn | ImportedFillerDefn } `' CR
TextSentenceDefn ::= `' WhiteSpace Text CR
ImportedSentenceDefn ::= `' WhiteSpace `{' ScenePathAndName `}' CR
SentenceTemplateDefn ::= `' WhiteSpace Text `<' SlotName `>' Text CR { SformDefn }
SformDefn ::= `' WhiteSpace Text `<' SlotName `>' Text CR
TextFillerDefn ::= `' WhiteSpace Text CR
ImportedFillerDefn ::= `' WhiteSpace `{' SlotPathAndName `}' CR
MopPathAndName ::= MopName | MopPath MopName
ScenePathAndName ::= SceneName | MopPath SceneName
SlotPathAndName ::= SlotName | MopPath SlotName
MopPath ::= `:' | MopName `:' | MopPath MopName `:'
MopName ::= Text

A sentence, sentence form, filler, or identifier name (MOP_NAME, SCENE_NAME, and SLOT_NAME) is limited to 400 characters in length. The terminals Letter, Digit, Punctuation, and WhiteSpace are defined using regular expressions in Figure 4.7. Square brackets, braces, and the vertical line are definition symbols and not legal characters. Space and Tab refer to the space key and tab key, respectively.

Figure 4.7: Regular expression definitions of schema terminals

SceneName ::= Text
SlotName ::= Text
PosFeature ::= ` pos=s' | ` pos=o' | ` pos=x'
NumFeature ::= ` num=s' | ` num=p' | ` num=x'
Text ::= { Letter | Digit | Punctuation | WhiteSpace }
Letter ::= [ a-zA-Z ]
Digit ::= [ 0-9 ]
Punctuation ::= [ !@#$%^&*-=+;'"`~,.? ]
WhiteSpace ::= { Space | Tab }

4.5.2 The MOP Tree

The schemata read from the schema file are organized in a tree of MOPs. At the top of this tree is the global or root MOP.
Scenes and slots are defined within a MOP and are said to be member objects of that MOP. In this case, the MOP which contains these scenes and slots is referred to as their parent MOP. Scenes and slots not defined within a particular MOP are member objects of the root MOP. Similarly, MOPs defined within a MOP become member objects, subMOPs, of the parent. Thus, a MOP contains three (possibly empty) lists: subMOPs, scenes, and slots.

Member objects can be defined in two ways: implicitly, and explicitly. An object is implicitly defined as a member if it is defined within its parent MOP. A waiter-service restaurant MOP, for example, can be defined implicitly as a subMOP of a restaurant MOP (Figure 4.8).

To define an object to be a member explicitly, the object is defined outside of the body of the parent MOP, and the name of the object is preceded by the path to the parent MOP (Figure 4.9). The path lists the names of all ancestor MOPs of the member, beginning with the most general MOP (but not including the root) and ending with the parent MOP, separated by colons. A second form for the path is a solitary colon, which indicates the parent of the current MOP.

Figure 4.8: Implicit definition of a schema member object

restaurant ... waiter service ... Blue & Gold Club ... ... ...

Member scenes and slots can be defined for a MOP in the same way as subMOPs are defined, explicitly or implicitly. However, a MOP that contains no scenes will, by default, inherit all of the scenes of its parent, and a MOP that contains no slots will inherit all of the slots of its parent. On the other hand, no inheritance of scenes (or slots) occurs if any scene (or slot) is defined in the subMOP. Thus, a subMOP resembles its parent MOP, unless defined otherwise.

4.5.3 Organization of Scenes and Slots

Each scene contains a (possibly empty) list of sentences. These sentences may be listed in the scene definition, or they can be imported from a scene of an ancestor MOP.
A scene can contain both listed and imported sentences. Similarly, slots can contain both listed and imported fillers. In the slot example below (Figure 4.10), a drink slot is defined for the Blue & Gold Club restaurant MOP. Braces in the filler text indicate that fillers will be imported, and not listed. The first "filler" in Figure 4.10 contains a path and slot name, and can be read as "import the fillers of the restaurant MOP drink slot into the current slot." All of the fillers of the drink slot in the restaurant MOP are then copied into the drink slot of the Blue & Gold MOP, beginning at the location of this first "filler". The second "filler" contains a colon and slot name, which can be read as "import the fillers of the parent MOP drink slot into the current slot." All of the fillers from the drink slot of the waiter service MOP are then copied into the drink slot of the Blue & Gold MOP. The third filler contains text not enclosed in braces, and defines a simple filler.

Figure 4.9: Explicit definition of a schema member object

restaurant ... restaurant : waiter service ... restaurant : waiter service : Blue & Gold Club ...

4.5.4 Sentence Templates and Features

A sentence template contains text and a slot associated with a specific location within that text. For example, "Is the <entree> very spicy?" contains the text "Is the " and " very spicy", with the "entree" slot associated with the subject position of the verb "is". Fillers and sentence forms can be defined with a number feature (Figure 4.11) to enable automatic generation of sentences with number agreement.

Figure 4.10: Definition of a slot, containing both listed and imported fillers

restaurant : waiter service : Blue & Gold Club ... drink {restaurant : drink} {: drink} Canada Dry ginger ale ...

Figure 4.11: A sentence, sentence forms, and fillers, all with features

order entree ... Is the very spicy? Is the very spicy? Are the very spicy? ...
entree filet of sole steamed mussels ...

Two aspects must be specified: (1) what the instantiated sentence template should look like for the allowable values of the number feature; (2) what value for the number feature is associated with the filler (or fillers) in the sentence slot. For example, Figure 4.11 shows the definition of a sentence template and its associated slot filler definition. The top half of the figure defines the template. The first line shows how the template should be displayed in the SchemaTalk schema display area. Next, the two lines beginning with "

This file contains a sample hierarchy of restaurant schemata.
natural numbers one two three four five six seven eight nine ten twenty fifty
numbers zero {natural numbers}
restaurant drink water tea coffee Coke food {drink} enter Hello. order I'd like . I'd like . Thanks. pay cashier Here's dollars. Here's dollar. Here are dollars. exit Thank you. Good bye.
restaurant : self service
restaurant : self service : McDonald's food Big Mac shake {: food}
restaurant : self service : Treats food pepperoni pizza {: food}
restaurant : waiter service drink or appetizer {: drink} entree dessert enter Vanderheyden. A table for two, please, by the window. order drinks and appetizers , please. , please. That's fine. order entree Is the very spicy? Is the very spicy? Are the very spicy? I would like , please. I would like , please. eat waiter questions Everything is very good, thanks. Could we have the dessert menu, please. order dessert , please. , please. eat pay waiter The check, please.
exit
restaurant : waiter service : China Royal drink or appetizer egg roll hot and sour soup {: drink or appetizer} entree moo goo gai pan chicken sweet and sour pork {: entree} dessert {: dessert}
restaurant: waiter service: Blue & Gold Club drink or appetizer the soup of the day a glass of white wine {: drink or appetizer} entree filet of sole chicken parmesan {: entree} dessert Snickers bar pie {: dessert}

Appendix B ESPN JOB DESCRIPTION

* * * Job Opening -- Sportswriter for ESPN SportsCenter * * *

Description
The SportsCenter program provides television viewers with a summary of current sports events and results. SportsCenter is recorded four times daily, and covers professional football, hockey, baseball, basketball, and tennis. We are looking for one qualified individual to report current sports events in Pennsylvania and the surrounding areas. Support in researching events will be provided by the ESPN research team, but the individual will be responsible for arranging the relevant information into a concise and exciting format.

Qualifications
A minimum of two years experience reporting sports events. Television experience is preferred, but radio or newspaper work is also acceptable. The individual must be self-motivated, knowledgeable and interested in local sports, and must demonstrate an ability to report these events in an engaging style.

Salary
$28,000 per year; negotiable, based on experience

Available
September 1, 1995

Contact
Denise Peischl, General Manager ESPN

Appendix C "POSSIBLE INTERVIEW QUESTIONS" FOR INTERVIEWERS

C.1 Career Path
(1) Why are you leaving your present job?
(2) What do you expect to be doing in five years?
(3) What are your long-term career goals?
(4) Why did you change jobs so often?
(5) Why do you want this job?

C.2 Background and Character
(1) Tell me about yourself.
(2) What is your main strength?
(3) What is your main weakness?
(4) What specific strengths would you bring to this position?
(5) What do you do when you have trouble solving a problem?

C.3 Work Style
(1) Do you prefer working as a member of a team, or would you rather work alone?
(2) Do you like to work with people?
(3) How do you react to criticism from superiors, if you believe it is unwarranted?
(4) Do you work well under pressure?
(5) Are you able to work alone without direct supervision?
(6) What three areas of your job do you like the least?

C.4 Availability
(1) Are you able to work overtime if necessary?
(2) Do you have a good work attendance record?
(3) May we contact your present employer?
(4) Do you have any questions?

C.5 Disability-Related
(1) Do you think you'll fit in with the rest of the staff?
(2) Will your disability interfere with your attendance?
(3) Will you require adaptive equipment to perform your work?
(4) (a) Do you think you'll require assistance from your coworkers to perform job functions?
    (b) My staff is not trained to work with disabled people.

Appendix D APPLICANT SCHEMA FILES

D.1 Schema Files of A1

D.1.1 Schema File of A1 for Interview #2

my name is __(name)__ I did like hearing people ideas abouts sports when I did worked at 610 w i p. I did take calls from people calling in television when I worked for k y w t v I did my writing myself they had people to help with writing but I wanted to say my own words myself. any but not golf no I would like more calls about tennis pete sampras The Davis Cup was in Florida that year in December I love to go to any thing I am like a kid in toys are us when some I am at a sport michael chang & andre agassi for United States I know vic b I was happy they won the n l like eighty and eighty-six people getting mad at m w with out him wouldn't no play off he didn't NEED one more game six of n l.
in ninety three when ashe die he said what he thought my love of sports and knowing who to call for in put game six of 93 n l to long for air time that is I why I want this job you have the time to do sports the right way after game six of 93 world series I had one on one c s writing therefore for e s p n may be helping chess along to a sport yes I had been bad man for Washington D.C. I did it I will have to live in Connecticut my name is jeff endler yes the add in the u,s,a today I has lived in that part of United States 35 of thirty nine years of my life other four years my family was in Connecticut I try to read five or six papers a day and last year I got myself a dish and I love to go sports myself I am like a kid in toys are us when I am at big sport from 90 to 93 I did worked at k y w television giving sport I always wrote my words myself they had person they would write I didn't used them I felt it part of my job to keep my lines to sports people but it wasn't in my job outline ninety the n l only three hits all 3 was homerun 3-2 I would like to do some chess week my parents looked fast knowledge and lines to sports people telephone and going to sports begin talking talk to long Canada went I did television before I did sit with the person when they did that it was my ideas they did the work if I move will you pay for a dish and my telephone call to keep my lines to sport people

D.1.2 Schema File of A1 for Interview #3

sports interview

introducing myself
My name is __(name)__. I am 39.

about old jobs
I had two jobs. My first one was at KYW-TV giving sports on the news. I always wrote my words myself. They had people to help with writing, but I wanted to say my own words myself. I thought people could tell if I didn't. After three years of just having a few minutes a day, I wanted to try talk radio. From 93 until now, I am at 610-WIP radio. I did take calls from people calling in. I did like hearing people ideas about sports.

what I am good at and why
I am good at knowing where to go to get an answer to a question. Many sports people know and like me. They are willing to help me on and off the air. I am like a kid in Toys Are Us when I am at a big sport event.

what sports I like and why
I really like all sports except golf. I like hockey for it is fast moving. I like football for its team work. I love basketball for its movements with or without the ball. I like baseball for its thinking. I think tennis is pretty. It and basketball are my most favorite.

why I want this job
When I did worked on television before, I had just a few minutes on the air. I know I will not be on the air much but I will have to write longer stories.

about moving and the dish
If I move, will you pay for a dish and my telephone calls to keep my lines to sport people?

D.1.3 Schema File of A1 for Interview #4

sports interview

introducing myself
My name is __(name)__. I am 39. I am happy you are taking your time seeing me about this job.

about old jobs
I had two jobs. My first one was at K Y W - T V giving sports on the news. I always wrote my words myself. They had people to help with writing, but I wanted to say my own words myself. I thought people could tell if I didn't. After three years of just having a few minutes a day, I wanted to try talk radio. From 93 until now, I am at 610 - W I P radio. I did take calls from people calling in. I did like hearing people ideas about sports. After two years of radio I had got tired of some people crazys ideas. I am ready for something new.

what I am good at and why
I am good at knowing where to go to get an answer to a question. Many sports people know and like me. They are willing to help me on and off the air. I am like a kid in Toys Are Us when I am at a big sport event.

what sports that I like and why
I really like all sports except golf. I like hockey for it is fast moving. I like football for its team work.
I love basketball for its movements with or without the ball. I like baseball for its thinking. I think tennis is pretty. It and basketball are my most favorite.

why I want this job
When I did worked on television before, I had just a few minutes on the air. I know I will not be on the air much but I will have to write longer stories. I like the idea you do sport s joined all day long.

about moving and the dish
If I move, will you pay for a dish and my telephone calls to keep my lines to sport people?

D.2 Schema Files of A2

D.2.1 Schema File of A2 for Interview #2

INTERVIEW

GREETINGS / CLOSINGS
Hi! Nice to meet you. Thank you for this interview. Have a good day. I am looking forward to talking with you again.

EDUCATION
I graduated in the top 5 percent of my class at Shippensburg. I helped supervise three other news writers.

EXPERIENCE
My main strength's include concise and interesting writing, especially editorials, and good organizational skills. My main weakness is a lack of experience working with a large staff. Probably my best work was a week-long series on how endorsements have changed college football.

EXPECTATIONS
My eventual goal is to become a national sports commentator. I am looking for opportunities to gain national exposure and experience. To leave my current position I would need to receive a 10 to 15 percent increase in my salary.

QUESTIONS
What are the possibilities for advancement within the organization? In what community would I be located? What would my initial responsibilities include?

D.2.2 Schema File of A2 for Interview #3

INTERVIEW

GREETINGS / CLOSINGS
Yes. No. That would be fine. That sounds promising. No, not really. Hi! Nice to meet you. Thank you for this interview. Have a good day. I am looking forward to talking with you again.

EDUCATION
I graduated in the top 5 percent of my class at Shippensburg. I helped supervise three other news writers. Working on the Gazette was a requirement of my major.

EXPERIENCE
My main strength's include concise and interesting writing, especially editorials, and good organizational skills. My main weakness is a lack of experience working with a large staff. Probably my best work was a week-long series on how endorsements have changed college football. I enjoyed doing interviews and got along well with the players. I primarily covered the Harrisburg Senators.

EXPECTATIONS
My eventual goal is to become a national sports commentator. I am looking for opportunities to gain national exposure and experience. Travel is not a problem as I am still single and have no other significant obligations. To leave my current position I would need to receive a 10 to 15 percent increase in my salary.

QUESTIONS
What are the possibilities for advancement within the organization? In what community would I be located? Will any travel be involved? What would my initial responsibilities include?

Appendix E TRANSCRIPTS OF APPLICANTS' FINAL INTERVIEWS

[Table Diagrams]
Table E.1: Legend for Transcripts
Table E.2: Transcript of Interview #4 with A1
Table E.3: Transcript of Interview #3 with A2

Appendix F RESULTS OF INTERVIEWS

[Table Diagrams]
Table F.1: Legend for Results
Table F.2: Results of Interview #1 with A1
Table F.3: Results of Interview #2 with A1
Table F.4: Results of Interview #3 with A1
Table F.5: Results of Interview #4 with A1
Table F.6: Summary of Results of Interviews with A1

F.2 Results of Interviews with A2

[Table Diagrams]
Table F.7: Results of Interview #1 with A2
Table F.8: Results of Interview #2 with A2
Table F.9: Results of Interview #3 with A2
Table F.10: Summary of Results of Interviews with A2

Appendix G APPROVAL LETTER FROM HUMAN SUBJECTS REVIEW BOARD