WORD PREDICTION FOR DISABLED USERS: APPLYING NATURAL LANGUAGE PROCESSING TO ENHANCE COMMUNICATION

By Julie A. Van Dyke

A thesis submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Honors Bachelor of Arts in Cognitive Studies.

June 1991

(c) Julie A. Van Dyke
All Rights Reserved

ABSTRACT

Disorders such as Cerebral Palsy and Lou Gehrig's disease produce severe physical disabilities that make normal communication impossible. This project addresses this problem by developing a syntactic prediction system. Other communication aids have previously been developed using abbreviation and prediction to enhance communication, but these have had limited success. Abbreviation systems allow the user to type pre-determined, shortened word-forms which the computer is responsible for expanding. Prediction systems attempt to predict the user's next keystrokes based on statistical data. I have combined natural language processing techniques and popular syntax theories to devise a prediction system that, unlike these previous systems, models the syntax rules that specify how words can be combined. This allows the syntactic predictor to make rule-based, linguistic determinations about what words can follow those already processed. It can be used with flexible abbreviation systems to eliminate possible expansions for personalized abbreviations. The syntactic predictor could also be used with other devices to reduce the effort required of the user by predicting what word forms he or she is likely to type next. In modelling linguistic knowledge, this system provides a more natural solution to the communication problem than many systems currently in use.

Approved: Kathleen F. McCoy, Ph.D., Professor in charge of thesis on behalf of the Advisory Committee
Approved: William J. Frawley, Ph.D., Committee member from the Linguistics Department
Approved: Roberta M. Golinkoff, Ph.D., Committee member from the University Honors Program
Approved: Robert F. Brown, Ph.D., Director, University Honors Program

ACKNOWLEDGEMENTS

Two years ago I began this work as a nifty way to spend a summer. Within a few weeks, I realized just how complex the English language was and that my Allen & Greenough's New Latin Grammar didn't explain grammar as completely as I thought it did. Suddenly I had embarked on a project that would forever change the way I looked at language. Here, I would like to acknowledge those individuals and organizations without whose guidance and support this magnum opus could not have been possible.

First and foremost, I am grateful to my family: especially Mom, Dad, and Grandma. Ever since third grade, when I camped out in the living room for a week while building a bookworm house out of dozens of hand-made miniature books, you have graciously encouraged me through all my projects that inevitably turn out to be bigger than life. This particular project, however, was different from all those because it allowed me to grow while it grew; all the while learning, seeing, and accomplishing a great deal.

For this I owe a world of thanks to Kathy McCoy, who, although never willing to admit it, is a great student of Plato the Greek. Countless times I walked into her office thinking that I had not much to talk about and left 2 hours later with a whole new direction for the project that she somehow teased out of my proliferous brain.

I also owe unbounded thanks and respect to Bill Frawley, who buoyantly announced in class one day, "You'll have to bear with me here, I get excited about this stuff."
That single quote and this man's artistry in the classroom showed me how exciting linguistics could be and what going to school was all about.

I must also express my sincere thanks to Pat Demasco and the A.I. DuPont Institute's natural language lab. They provided me the motivation for this project, the freedom to shape it as I pleased, and made a lowly undergraduate feel at home. In particular, I am grateful to Linda Suri for keeping me on my toes with impossible syntax questions and for her help with the final revisions of this work.

In addition to guidance from the Computer Science Department, I also had the pleasure of being adopted by the University of Delaware Linguistics Department. I received invaluable help, challenges, and inspiration from many people there. Specifically, I would like to thank Tom Ernst, Peter Cole, Jim Lantolf, Roberta Golinkoff, and Gaby Hermon for their various roles in making this work a success. This is one undergraduate for whom your department has had a significant impact, and I hope one of many in the years to come.

Lastly I need to recognize the faculty and staff of the University Honors Program for their incorrigible support, and for giving me a home away from home.

This work was done in conjunction with the A.I. DuPont Institute's Applied Science and Engineering Laboratories and was partially funded by the University Honors Program.

TABLE OF CONTENTS

FIGURES
TABLES
ABSTRACT

Chapter
1 AUGMENTATIVE COMMUNICATION
  1.1 Background
  1.2 The User
  1.3 Available AAC Devices
  1.4 Linguistic Improvements
2 THE PREDICTOR
  2.1 Concepts and Definitions
  2.2 ATN Formalism
  2.3 The Prediction Problem
  2.4 Implementation
3 THE GRAMMAR
  3.1 Motivation
  3.2 X' syntax
  3.3 Implementation of X' Theory
    3.3.1 Sentence level
    3.3.2 Determiner Phrases
    3.3.3 Degree Phrases
    3.3.4 Adjective Phrases
    3.3.5 Noun Phrases
      3.3.5.1 Prepositional Phrases
      3.3.5.2 Relative Clauses
    3.3.6 Verb Phrases and Complementation
  3.4 Example Parse Trees
4 THE LEXICON
  4.1 Sources and Considerations
  4.2 Implementation
  4.3 Lexical Entries
    4.3.1 CATEGORY
    4.3.2 Toggled features
      4.3.2.1 PRE-DET
      4.3.2.2 CENTRAL-DET
      4.3.2.3 ART
      4.3.2.4 REL
      4.3.2.5 WH
      4.3.2.6 NP
      4.3.2.7 POSS
      4.3.2.8 DEM
      4.3.2.9 MENSURAL
      4.3.2.10 QUANT
      4.3.2.11 CARDINAL
      4.3.2.12 MASS
      4.3.2.13 COUNT
      4.3.2.14 PROPER
      4.3.2.15 DEG
      4.3.2.16 NEG
      4.3.2.17 UNTENSED
      4.3.2.18 PASTPART
      4.3.2.19 PRESPART
      4.3.2.20 PRED
      4.3.2.21 LA
      4.3.2.22 LN
      4.3.2.23 I
      4.3.2.24 IPR
      4.3.2.25 IP
      4.3.2.26 INPR
      4.3.2.27 IT
      4.3.2.28 TN
      4.3.2.29 TNPR
      4.3.2.30 TNP
      4.3.2.31 TF
      4.3.2.32 TW
      4.3.2.33 TT
      4.3.2.34 TNT
      4.3.2.35 TG
      4.3.2.36 TNG
      4.3.2.37 TNI
      4.3.2.38 CNT
      4.3.2.39 CNN
      4.3.2.40 CNA
      4.3.2.41 CNG
      4.3.2.42 CNI
      4.3.2.43 DNN
      4.3.2.44 DNPR
      4.3.2.45 DNF
      4.3.2.46 DPRF
      4.3.2.47 DNW
      4.3.2.48 DPRW
      4.3.2.49 DNT
      4.3.2.50 DPRT
    4.3.3 Value features
      4.3.3.1 NUMBER & PNCODE
        4.3.3.1.1 Nouns
        4.3.3.1.2 Verbs
      4.3.3.2 TAKES
      4.3.3.3 ZONE
    4.3.4 ROOT
5 DISCUSSION
  5.1 Linguistic Theory
  5.2 Other Applications
  5.3 Future work
6 CONCLUSION
CITED BIBLIOGRAPHY
REFERENCE BIBLIOGRAPHY
APPENDIX

FIGURES

Figure 2.1 Search Space for a Context Free Grammar
Figure 3.1 X' Theory grammar
Figure 3.2 LUNAR grammar

TABLES

Table 4.1 CATEGORY Codes and Sources
Table 4.2 Category Features
Table 4.3 Value Features and Appropriate Categories

Chapter 1 AUGMENTATIVE COMMUNICATION

In this chapter I will outline the current state of augmentative communication which is the field providing the motivation and most immediate application of this project. I will characterize the potential users for this system and then discuss the goals that I believe this project achieves in relation to other work in this field.

1.1 Background

Rehabilitation engineering endeavors to integrate technology into vocational, educational, and independent living settings in order to increase the independence of persons with physical or sensory disabilities. There are numerous sub-areas in this field: robotics research for artificial limbs, production of sensory aids like hearing aids, developing physical therapy techniques, and devising marketing strategies to promote this technology. The project described here derives its motivation from the sub-field of Rehabilitation Engineering called Augmentative and Alternative Communication (AAC). This work is an example of how natural language processing techniques and modern syntax theories can be used to improve the communication devices currently available to disabled users.

Communication devices are technological interventions between the disabled user and the world he or she hopes to communicate in.
The history of their development includes intervention strategies that were not able to take the needs and desires of the user into consideration because of technological limitations. Often they placed a cognitive or physical burden on the user because they were not intuitive or were complicated to operate. I will outline some of these systems in the following sections, but it is encouraging to keep in mind that the field has reached a new consciousness. The primary consideration has changed from finding any way possible for these people to communicate, to finding the communication aid that is best suited to each individual. The emphasis is now on developing intervention devices which feature easy training, flexible use, and allow a reasonable communication rate.

The project I will present in the following chapters is a product of these new concerns. It will contribute to the development of an intervention device that preserves the user's ability to use language freely while maintaining speed and ease.

1.2 The User

The typical user for the system developed here is cognitively intact and therefore has the mental capability and desire to use language the same way a non-disabled individual would. The user's disability affects his or her motor capability and muscular control in a way that produces limited dexterity. These users are typically non-speaking, and have difficulty typing, writing, or even controlling a joy-stick to select letters. In the worst case the user is limited to using a single-switch interface which makes communication very slow. Two types of disorders that typically produce this condition are developmental, such as Cerebral Palsy, or degenerative, like Lou Gehrig's disease.

Cerebral Palsy (CP) is diagnosed at infancy in children whose normal muscular control is deficient (Griffith, 1985). The child usually exhibits unusual body postures, purposeless body movements, and poor coordination and balance.
Although some children with CP may suffer from mental retardation, many have a high intelligence despite their muscular disabilities. I am targeting these individuals as possible users for the technology I am describing here.

The clinical name for Lou Gehrig's disease is Amyotrophic Lateral Sclerosis (ALS). This disorder afflicts adults late in their lives, meaning that there is no previous language impairment or cognitive disability (Griffith, 1985) to hinder communication. Patients suffer from muscle twitching and weakness, beginning in the hands and spreading to the arms and legs, or from the stiffening of muscle groups, usually in the extremities. They will often lose control of the muscles that perform swallowing and communicative functions.

Some stroke victims could also benefit from this technology; however, they will often have more severe linguistic impairments which make using this system inappropriate. Whatever the ailment, if the user can be characterized as linguistically, or cognitively, intact but with deficient motor skills, the device I have developed has the potential to facilitate his or her communication.

1.3 Available AAC Devices

In this section I will outline some of the AAC systems that are currently available for users with the characteristics I have identified. There are still many non-electronic communication aids available for disabled users; however, I am concerned here only with electronic ones because they are compatible with this project. Typically these rely on the user's motor capability (albeit limited) to compose messages via an electronic system. Because his or her motor capability is limited, the system must require minimal effort from the user.

Many severely disabled individuals find a single switch device useful for communication. The switch is used to access letters one at a time as the user composes a message.
One of the first of these devices to be developed was the Tufts Interactive Communicator (TIC), which consisted of a small mechanical box and switch (Foulds, 1976). Thirty-two possible characters are offered to the user on an electronic grid appearing on the face of the box, and each row is scanned until the user selects the one containing the desired item with the switch. Then the machine scans each column in that row until the user selects the desired letter or word. As the user selects each character, it is typed onto a paper strip printer which the user can use for communication purposes. This sort of interface is common because it can be effectively used by many clients. It is only one of several selection methods available, all of which attempt to provide the user with a manageable way to communicate in spite of his or her physical limitations.

Clearly an important issue in developing these devices is the speed with which the user can compose a message. It is easy to imagine that communication with a single switch device like the TIC is a slow and laborious process, and in fact average communication rates are around 2-10 words per minute (Foulds, 1980). Compare this to non-disabled typing speeds of 60-70 words per minute or to speaking speeds which are easily twice that, and the extent of the communication deficiency for these users is clear. It is a deficiency resulting only from the technology available to them, because, as I have described previously, these users' cognitive and linguistic abilities are intact. They simply lack the muscular control necessary to use their skills in a normal fashion.

Because of this problem, AAC research has focussed on developing strategies to increase the communication rate possible with these devices. For scanning systems like the TIC, this has principally meant devising variations in the order that letter-characters are offered to the user.
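To see why the order of the letters matters in a row-column scan, the following minimal sketch counts how many scan steps a message costs under two layouts. It is an illustration only: the 5 x 6 grid, the letter-frequency ordering, and every function name are assumptions of this sketch, not the actual TIC layout or its data.

```python
import string

ROWS, COLS = 5, 6  # hypothetical grid size, not the real TIC display
# Approximate English letter-frequency ordering, used only for illustration.
APPROX_FREQ_ORDER = "etaoinshrdlcumwfgypbvkjxqz"

def row_major_layout(letters):
    """Place letters left to right, top to bottom; returns letter -> (row, col)."""
    return {ch: divmod(i, COLS) for i, ch in enumerate(letters)}

def corner_first_layout(letters):
    """Place the earliest letters in the cells nearest the upper left-hand corner."""
    cells = sorted(((r, c) for r in range(ROWS) for c in range(COLS)),
                   key=lambda rc: (rc[0] + rc[1], rc[0]))
    return dict(zip(letters, cells))

def scan_steps(layout, letter):
    """Rows are scanned until the right row is hit, then columns in that row."""
    row, col = layout[letter]
    return (row + 1) + (col + 1)

def message_cost(layout, message):
    """Total scan steps for every letter of the message found in the layout."""
    return sum(scan_steps(layout, ch) for ch in message.lower() if ch in layout)

alphabetical = row_major_layout(string.ascii_lowercase)
by_frequency = corner_first_layout(APPROX_FREQ_ORDER)
print(message_cost(alphabetical, "the rain in spain"))
print(message_cost(by_frequency, "the rain in spain"))
```

Under these assumptions the second layout needs fewer scan steps for the sample message, because its common letters sit in cells that the scan reaches early.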
Knowledge about the frequency of letter usage is used to rearrange the letters on the TIC display so that the more frequent ones are scanned before the less frequent ones. For instance, in a row-column scan, the most frequent letters of the alphabet such as "S" or "T" could be placed in the upper left-hand corner of the display. This way the scan will cross these letters first and thereby avoid scanning through unlikely choices like "V" or "Q" most of the time. This technique was able to produce a 30% improvement over alphabetically-ordered letter displays (Foulds, 1976).

Another improvement on this scanning technique is a type of letter prediction in which the system uses more sophisticated frequency data to control the scanning. These take the form of n-gram statistics, which tell which sequences of n letters are likely to occur together, such as "str" or "ing". With this system, if the user has already indicated an "s", the system refers to its statistics to identify the six letters that are most likely to follow the "s", for instance "t", "p", "r", "e", "i", or "h". These are immediately highlighted in sequence so that the user has the opportunity to select them before the normal row-column scanning process is resumed. By anticipating letter selection in this way, the communication rate has been improved by up to 50% (Foulds, 1976). [1]

The scanning technique I have been describing can also work at the word level, and the ordering improvements in this case are based on which words are the most frequent. In contrast to TIC, which uses hard-wired letter grids, Meta4 is a software-based communication device that uses static word pages containing the most common words (Miller, 1990). Instead of having to spell out each word, the user navigates through the pages using the single switch.
The system's first page might contain letter intervals such as "AA-AL" and "AL-AZ", and the scanning passes through these intervals until the user chooses the one containing the word he or she wants to use. Then the display changes to show a page of vocabulary words that the user can choose from using the same scanning technique. The words included on these pages are a vocabulary set, called a "book", that can be tailored for each individual user. It is possible for users to have several books to choose from, and in this way a large amount of vocabulary can be made available without requiring the user to spell out every word letter by letter. If the word the user wants is not in any of the system's vocabulary books, the user can select a spelling page from the initial display. This works just like the TIC system, using the scanning technique to allow the user to spell out the new vocabulary word.

Dynamic communication devices like Meta4, which has a changing display, can also be improved using prediction techniques. Word prediction systems try to determine the next word based on what has already been entered into the system. One example of prediction used this way is the PAL system developed at the University of Dundee (Swiffin et al., 1987). This system uses frequency statistics to determine the word that is most likely to follow what has already been entered. The statistics are gathered from large samples of text, wherein the frequency of each word is tabulated and included in the dictionary entry of that word. When the user types a letter, the system displays the five most frequent words beginning with that letter in a special scanning window. The user can choose one of these words or type another letter. With each keystroke the frequency statistics are checked and possible completions for the word are offered to the user in the scanning window.
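The completion step just described can be sketched as a prefix lookup ranked by frequency. This is a minimal sketch, not PAL's implementation: the word counts are invented for illustration, and the names `TOY_FREQUENCIES` and `completions` are hypothetical.

```python
# Invented word counts standing in for a frequency dictionary (illustrative only).
TOY_FREQUENCIES = {
    "the": 500, "to": 450, "that": 300, "this": 250, "they": 200,
    "there": 150, "then": 120, "time": 80, "think": 60,
}

def completions(prefix, frequencies, limit=5):
    """Return up to `limit` words starting with `prefix`, most frequent first."""
    matches = [word for word in frequencies if word.startswith(prefix)]
    # Sort by descending frequency; break ties alphabetically for stable output.
    matches.sort(key=lambda word: (-frequencies[word], word))
    return matches[:limit]

# Each extra keystroke narrows the list offered in the scanning window.
for typed in ("t", "th", "thi"):
    print(typed, completions(typed, TOY_FREQUENCIES))
```

With each additional letter the candidate list shrinks, which is where the keystroke savings come from: the user stops typing as soon as the intended word appears.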
In this way the system attempts to predict what the word is before the user has typed it out entirely. The number of keystrokes required of the user is reduced because the system completes the word as soon as the user indicates that the right one has been found. Those who developed PAL claim that they have been able to obtain a keystroke reduction of 50% based on a dictionary of 1000 words, wherein each word has its own frequency data.

An effort was made to improve PAL even further by adding syntactic information to its statistical data. A probability matrix for category pairs was produced to constrain the word prediction according to the syntactic class of the previous word. For example, after an adjective is entered, the probability of a specific noun as its successor is computed by multiplying the pre-computed probability of an adjective-noun pair and the noun's occurrence frequency. This was done for each noun in the dictionary and the most probable ones were offered to the user. With this use of syntax, PAL's developers were able to reduce the number of keystrokes necessary to generate sentences by an additional 0.5-2% (Swiffin et al., 1987). [2]

While the PAL system has been successful at reducing the number of keystrokes required of the user, it is important to note that this reduction has been found using a fairly limited dictionary of 1000 words. Because of the way PAL uses syntax, a much larger dictionary is likely to degrade performance severely: it will take longer to calculate the probability of each word using the category-pair statistics, and consequently it will take longer to determine the five most probable words. This raises a problem that is common to all the systems I have been discussing: they depend on statistics, rather than the rule-based linguistic information that humans actually use when they communicate. Because of this, the system is only as effective as the statistics are accurate and complete.
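The category-pair scoring attributed to PAL above can be sketched as follows. This is a rough illustration under stated assumptions: the probabilities, word frequencies, and names (`PAIR_PROBABILITY`, `TOY_LEXICON`, `rank_next_words`) are all invented here, and the sketch scores every category rather than only nouns.

```python
# Invented probability that a word of the second category follows the first.
PAIR_PROBABILITY = {
    ("ADJ", "N"): 0.6,
    ("ADJ", "ADJ"): 0.2,
    ("ADJ", "V"): 0.05,
}

# Invented lexicon: word -> (syntactic category, relative occurrence frequency).
TOY_LEXICON = {
    "dog": ("N", 0.004),
    "house": ("N", 0.006),
    "big": ("ADJ", 0.005),
    "run": ("V", 0.007),
}

def rank_next_words(previous_category, lexicon, pair_probability, limit=5):
    """Score each word as P(category pair) * word frequency and rank the results."""
    scored = []
    for word, (category, frequency) in lexicon.items():
        pair = pair_probability.get((previous_category, category), 0.0)
        scored.append((pair * frequency, word))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [word for score, word in scored[:limit] if score > 0]

# After an adjective, nouns outrank the more frequent verb "run".
print(rank_next_words("ADJ", TOY_LEXICON, PAIR_PROBABILITY))
```

Note that every word in the dictionary is scored on every prediction, which is consistent with the concern raised above that larger dictionaries would slow this approach down.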
As with the PAL frequency counts, statistics are typically collected over large texts, often derived from newspapers and published reading materials. This means they are liable to be skewed by the subject matter of the text. For example, the Brown Corpus of American English, which is a text of approximately one million words and one that is often used for deriving statistics for these systems, represents words like "eggs" and "bunny" and "Easter" as being common words in everyday language use. This is a result of the time of year that the corpus was compiled, not of actual facts about English usage. This problem can be solved to some extent by using statistics derived from the user's own language use; however, the same problem can occur with these texts because a user does not always talk about the same topics, and so the word statistics could change depending on the topic of conversation. In school, frequently used words might be "homework" or "teacher", but when a child is playing these will be among the least likely words that he or she will use. A problem with statistically-based systems also arises when novel words are used. The system has no statistics for these words, and so despite the statistical information in his or her AAC device, the user will still have to spell them out completely.

Non-stochastic strategies for improving communication rates have centered around abbreviation systems. Instead of spelling out words letter-by-letter, the user is able to use an abbreviation. He or she can indicate fewer letters with the scanning device and the system will assume responsibility for expanding the abbreviations. A major problem with the abbreviation systems now available is that the user has to memorize specific abbreviations for words in order for the system to be helpful. This arises because the computational system can only handle a one-to-one correspondence between a word and its abbreviation.
Thus, the system may require the word "work" be abbreviated "wrk" in order to differentiate it from the word "wake", which might be abbreviated "wk". The user will not consider this need when he or she is trying to communicate, and so may not think to use "wrk" instead of the more easily constructed "wk". Because of these predetermined abbreviations, the user must undergo specialized training to learn the system's abbreviations before he or she can even start using the system.

In addition, these abbreviation systems and scanning devices assume that the user knows how to spell the word he or she is trying to use. A communication device called Minspeak (Baker, 1985) was an attempt to alleviate this problem as well as those associated with memorizing pre-determined abbreviations. This system used a keyboard of multi-meaning icons together with keys for morphological and rudimentary syntactic information to create sentences. For example, the client might use the key sequence [boy-image] + [noun-key] + [smiley face-image] + [verb-key] + [book-image] + [building-image] + [noun-key] + [declarative-sentence key] to compose the sentence "Boy like school." Once the sentence is composed, the client will press a "speak" key and the computer will speak the phrase the user created. This use of images allows the abbreviations to be semantically meaningful to the user and presumably easier for him or her to remember.

Minspeak has proven useful for many members of the disabled community, but it has also been problematic for some because it still requires the user to understand and/or memorize the associations between the images and the English words. When the system was conceived, it was intended that each user should make up his or her own icons corresponding to the words he or she used most. Clinically this proved to be an enormous and impossible task for the clinicians whose responsibility it was to set up the vocabulary for each user.
As a result, the system is used with the scheme of its creator and this may not be intuitively clear to some users. Minspeak, therefore, suffers from the same disadvantage as other abbreviation systems in that the user has to undergo an extensive training period before he or she can use the device. Even after this period the user may not fully grasp the semantic justifications underlying particular icons, so that he or she is never able to fully exploit the system's power.

A less revolutionary technique for improving abbreviation systems has been attempted with flexible abbreviation systems such as the "Word Compansion" project described in Demasco et al. (1989) and Stum et al. (1991). These systems attempt to automate the methods humans use for creating abbreviations so that the computer can associate more than one abbreviation with a particular word. This means the computer will be able to handle "wk" as an abbreviation for any of "work", "wake", "walk", "wok", etc. The user can be freer with his or her abbreviations, and the system's success does not rely on how well the user remembers the abbreviation the computer knows for the desired word. The very benefit of accepting a single abbreviation for many words also creates a problem: the computer must pick out the proper word from a list of many candidates. Since this is a computational burden rather than a burden on the user, it is a more desirable solution.

In order to expand the abbreviations, the system assumes the letters in the abbreviation are in the same order that they occur in the word. This makes expansion similar to a matching task: it assumes variables between the known letters and tries to match this form to the more than 5000 words in the dictionary the system currently uses. The problem with this is that 5000 words is a small dictionary for the requirements of everyday communication, so it is desirable to have a larger dictionary.
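The in-order matching assumption described above can be sketched as a subsequence test. This is a minimal sketch, not the Word Compansion implementation: the word list and the names `is_expansion` and `expansions` are hypothetical, and the only constraint modelled is that the abbreviation's letters appear in the word in the same order.

```python
def is_expansion(abbreviation, word):
    """True if the abbreviation's letters occur in `word` in the same order."""
    remaining = iter(word)
    # `letter in remaining` advances the iterator, so order is enforced.
    return all(letter in remaining for letter in abbreviation)

def expansions(abbreviation, dictionary):
    """All dictionary words that the abbreviation could stand for."""
    return [word for word in dictionary if is_expansion(abbreviation, word)]

# Tiny illustrative word list (an assumption of this sketch).
TOY_DICTIONARY = ["work", "wake", "walk", "wok", "week", "cow", "know", "win"]

print(expansions("wk", TOY_DICTIONARY))   # several candidates share "wk"
print(expansions("wrk", TOY_DICTIONARY))  # a longer abbreviation narrows the list
```

Even in this eight-word list "wk" has five possible expansions, which suggests how quickly the candidate list could grow with a dictionary of thousands of words. A real system might add further constraints, such as anchoring the first letter, but none is assumed here.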
With a larger dictionary, the number of matches for a given abbreviation may be very high, and this means the user could still have to expend a considerable amount of time and effort to find the desired expansion among many possibilities. The task, as always, is to improve the behavior of the flexible abbreviation device, and in this case that will be done by reducing the number of candidates for the expansion of any user-created abbreviation.

I have discussed some ways communication devices have been improved using statistics and prediction techniques; however, all of these strategies have their limitations. One obvious solution which has not been fully developed in the field of Augmentative Communication is to exploit linguistic knowledge. A priori this seems to be the best solution because it uses exactly the knowledge the user draws from when he or she constructs sentences. Only PAL, with its category pairs, attempted to use syntax. This was very rudimentary, however, as it only considered statistical word pairs and therefore missed many of the generalities in the language. For example, consider the sentence "The man saw the dog eating his food." If the system can only look at word-pairs, it will not know that the word-pair "the man" is likely to be followed by a verb whereas the word-pair "his food" is not, because of its overall position in the sentence. A more refined model of syntax which considers the entire sentence preceding the current word could increase accuracy and yield more efficient communication using the augmentative communication devices I have been describing (Yang et al., 1990).

1.4 Linguistic Improvements

In this project I develop a prediction technique that exploits the linguistic rules of syntax. This makes it possible to capture the generalities of language rather than artifacts of the data the statistics were taken from.
Using rule-based linguistic information rather than word distributions better models the way humans join words to construct sentences that have particular syntactic structures underlying them. Sentences are created by using a grammar of word categories and production rules about how the categories can be joined. If these rules are incorporated into a communication device, the users will be limited only by the information they use anyway as part of their language faculty. They are not limited by the computer's statistics and therefore are allowed the utmost flexibility in using language.

One of the few existing systems which exploit linguistic knowledge in the way I am proposing was developed at the A.I. DuPont Institute's Applied Science and Engineering Laboratories. This is a sentence "compansion" system (McCoy et al., 1990) that takes an abbreviated, or compressed, sentence like "John walk dog" and expands it into the sentence "John walks the dog." This allows the user to produce grammatically well-formed sentences without the extra keystrokes necessary to indicate plurality and spell out non-content words like "the". This system uses a semantic parser that determines the role of each input word (i.e., verb, noun, etc.) and assigns theta roles to each noun in the sentence. The theta roles are determined according to a "frame" which is analogous to the theta-grid that Government and Binding Syntax posits as part of the lexical specification of verbs (Sells, 1985). For example, the frame for the verb "study" indicates that the AGENT must be human and that it can take a THEME (i.e., an abstract or physical object) and a LOCATION (i.e., a physical place). The frame further specifies that the AGENT is required and both the THEME and LOCATION are optional. Using this semantic information about the input words, the system can fill out the sentence with the appropriate inflections and non-content words.
For example, the compressed sentence "John study red house" will be interpreted as either "John studies at the red house" or "John studies the red house" depending on how the semantic roles are filled. In this way the system uses semantic knowledge about the individual words the user enters to produce well-formed sentences that still require minimal effort from the user.

This system uses semantics to solve the computational problem inherent in allowing compansion (i.e., COMPressed input exPANSION). This expansion problem at the sentence level is similar to the problem at the word level which I described for the flexible abbreviation systems: the user abbreviates the word and the computer must expand it. The syntactic prediction system I have developed can be used to aid word expansion in the same way that semantics was used for sentence expansion. I propose that the number of expansions of an abbreviation can be greatly reduced by considering the syntactic categories of the expansions in relation to the syntactic structure of the words the system has already processed. For example, if the user has entered the partial sentence "The boys" and the next word abbreviation is "ht", instead of offering the user a long list like "hit, hits, hot, hat, hate, hates, height, hunt, hunts, hurts, hurt, hut," the system will offer only the plural verbs in this list, because it will know that nouns and adjectives are not appropriate once the head noun of a noun phrase has been identified. [3] In this case, the user will only need to choose from the words "hit, hate, hunt, hurt" rather than a list of twelve choices. Notice that this also increases the user's communication rate because he or she has fewer words to scan through before finding the desired word. Only the words syntactically appropriate to the context will be offered as possible expansions for the user's arbitrary abbreviations.
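The "ht" example above can be sketched as a filter over candidate expansions. This is a toy illustration only: the category and agreement tags are assumptions made for this sketch (each word is given a single reading), and `expected_next` is a hand-written stand-in, not the syntactic predictor developed in this thesis.

```python
# Hypothetical lexicon entries: word -> (category, agreement).
# Each word gets a single reading here, which is a simplification.
LEXICON = {
    "hit": ("V", "plural"), "hits": ("V", "singular"), "hot": ("ADJ", None),
    "hat": ("N", "singular"), "hate": ("V", "plural"), "hates": ("V", "singular"),
    "height": ("N", "singular"), "hunt": ("V", "plural"), "hunts": ("V", "singular"),
    "hurts": ("V", "singular"), "hurt": ("V", "plural"), "hut": ("N", "singular"),
}

def expected_next(context):
    """Toy stand-in for a syntactic predictor: after a head noun, expect a
    verb that agrees with it in number. Handles nothing else."""
    _word, category, number = context[-1]
    if category == "N":
        return ("V", number)
    raise ValueError("this sketch only handles a completed subject noun phrase")

def filter_expansions(candidates, context):
    """Keep only the candidates whose tags match what the context allows."""
    wanted = expected_next(context)
    return [word for word in candidates if LEXICON[word] == wanted]

context = [("the", "DET", None), ("boys", "N", "plural")]
candidates = ["hit", "hits", "hot", "hat", "hate", "hates",
              "height", "hunt", "hunts", "hurts", "hurt", "hut"]
print(filter_expansions(candidates, context))
```

Under these assumed tags the twelve candidates reduce to the four verb forms named in the text, so the user scans a third of the original list.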
With this strategy implemented in tandem with the Word Compansion system, the goal of achieving speedy communication while making minimal demands on the user will be within reach. In addition to its usefulness with flexible abbreviation expansion, the system I have developed is a prediction system that could be used to improve other communication devices by determining the syntactic form of the word that is likely to follow what the user has already entered. For example, in a dynamic system like Meta4, if the user has already entered a noun, and the user chooses the interval ST-SZ, the system could go directly to a page containing only verbs that begin with those letters. This would further increase communication rate because the user will have fewer words to look through before finding the desired one.

Thus, by modeling syntactic knowledge in the computer, I can produce a system that can improve existing communication devices. The improvement provided is a more natural one for the user because it comes from the information humans use anyway when they communicate. It is not an ad hoc solution to the communication problems these people face; it is a solution motivated by the nature of the problem: an inability to use language in a "natural" unconstrained way. If we can make the machine use language the way a human does, then rather than being hindered by the technology the user's disabilities force him or her to use, both machine and human can cooperate to enhance the disabled person's communication.

Chapter 2

THE PREDICTOR

This chapter contains the implementation details of the syntactic predictor I have built, including a sample of its operation. I discuss the underlying concepts borrowed from Natural Language Processing as well as the computational formalism used.

2.1 Concepts and Definitions

Syntax is the component of human language processing that describes phrase and sentence structure.
It can be described with a finite set of rules specifying how word categories can be combined in well-formed sentences. Natural Language Processing (NLP) characterizes these rules as rewrite rules with the form X --> Y. These rules are meant to transform the expression on the left side of the arrow, X in this case, to the form on the right side, represented by Y (Allen, 1987). To illustrate these rules, which are also called "phrase structure rules", consider the sample context-free grammar below:

(1)
Rule number    Left-Hand Side          Right-Hand Side
1              S                --->   NP VP
2              NP               --->   N
3              NP               --->   DET N
4              NP               --->   NP PP
5              PP               --->   PREP NP
6              VP               --->   V
7              VP               --->   V NP
8              VP               --->   V NP PP
9              DET              --->   a\an\the\some...
10             N                --->   John\man\dog...
11             V                --->   walk\hit\open...
12             PREP             --->   with\of\in...

Parsing is the process of applying rules like these to a sentence to break it down into its component parts. The result is a "parse tree" that shows the syntactic categories and functional relationships between the constituents in the sentence. Applying the rules above to the sentence

(2) The man walked the dog.

gives the following parse tree, or "parse", shown in computational notation:

(3) (S (NP (DET the) (N man)) (VP (V walked) (NP (DET the) (N dog))))

A noun phrase is labeled "NP", verb phrases are "VP", and each word is given an appropriate category label such as "DET", "N", or "V". This structure represents the more commonly known tree structure below:

(4) [Parse tree here]

To generate this parse, the computer needs to search all the possible combinations of the rules in grammar (1). This becomes complicated because grammars normally have different ways of expanding constituents, as in the case of the NP's and VP's in (1). Any of these combinations might be possible, so the computer must try them all until it finds the right one. The final parse ends up being a subset of the overall search space.
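One way to carry out this search is a top-down, depth-first search with backtracking. The following is a minimal Python sketch under stated assumptions, not the parser built for this thesis: the dictionary encoding of grammar (1), the small lexicon (which lists the inflected form "walked" directly), and the names `parse`, `match_sequence`, `first_parse`, and `show` are all hypothetical. Because rule 4 (NP ---> NP PP) is left-recursive, the sketch also adds a word budget (`limit`) so that the recursion stops; that device is an assumption of this sketch.

```python
# Grammar (1), rules 1-8, encoded as category -> list of right-hand sides.
GRAMMAR = {
    "S":  [["NP", "VP"]],                            # rule 1
    "NP": [["N"], ["DET", "N"], ["NP", "PP"]],       # rules 2-4
    "PP": [["PREP", "NP"]],                          # rule 5
    "VP": [["V"], ["V", "NP"], ["V", "NP", "PP"]],   # rules 6-8
}

# Rules 9-12 as a word -> category table (an assumed, abridged lexicon).
LEXICON = {"the": "DET", "a": "DET", "man": "N", "dog": "N", "john": "N",
           "walked": "V", "hit": "V", "with": "PREP", "in": "PREP"}


def parse(symbol, words, pos, limit):
    """Yield (tree, next_pos) for each way `symbol` can start at `pos`.

    `limit` is the largest number of words the constituent may span.
    Every category in this grammar covers at least one word, so the
    budget shrinks on each recursive call and rule 4 cannot loop forever.
    """
    if limit < 1:
        return
    if symbol not in GRAMMAR:                  # a word category (primitive)
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:                # try the rules left to right
        for children, nxt in match_sequence(rhs, words, pos, limit, ()):
            yield (symbol,) + children, nxt


def match_sequence(rhs, words, pos, limit, done):
    """Match the symbols of one right-hand side in order, with backtracking."""
    if not rhs:
        yield done, pos
        return
    first, rest = rhs[0], rhs[1:]
    # Reserve one word for each symbol still to come after `first`.
    for tree, nxt in parse(first, words, pos, limit - len(rest)):
        yield from match_sequence(rest, words, nxt,
                                  limit - (nxt - pos), done + (tree,))


def first_parse(sentence):
    """Return the first parse that covers the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0, len(words)):
        if end == len(words):
            return tree
    return None


def show(tree):
    """Print a tree in the bracketed notation of (3)."""
    label, *kids = tree
    if isinstance(kids[0], str):
        return f"({label} {kids[0]})"
    return f"({label} " + " ".join(show(k) for k in kids) + ")"


print(show(first_parse("The man walked the dog")))
```

Run on sentence (2), the sketch first completes the VP with rule 6, finds that "the dog" is left over, backs up, and succeeds with rule 7, printing the bracketing in (3).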
The search space itself can be very large; the search space for the small grammar in (1) looks like Figure 2.1 below. Vertical dots are used to indicate where parts of the search space have been left out.

Figure 2.1 Search Space for a Context Free Grammar
[Parse tree here]

This is only part of the complete search space. In reality it is infinitely deep because of recursive elements like the NP. Each time an NP occurs it can be broken into three different groups of constituents, here represented by nodes 2, 3, and 4, which correlate with the grammar rules having the same numbers. Since rule 4 (as well as 5, 7, and 8) has NP's as part of its structure, the search tree can never be completely expanded. I have drawn this tree in a manner that illustrates the different rules that can expand a constituent. When the daughters of a rule number are tied together with an arc, it means that both these elements must be present in the input for the rule to be successful. The grammar rules 9-12 provide the primitives, which are the lexical items in the sentence, for the tree. In parsing, if the actual item in the sentence is one of the lexical items in rules 9-12, then that rule can be considered complete.

In order to find the parse tree of (4), the computer traverses the search space in Figure 2.1. The method it uses might be a "top-down, depth-first" method, where it starts at the top S node and tries to make its way down to the primitives, where it can check them against the actual input. [4] Processing will start at the top and go down the tree, starting on the left side, as far as it can. When it reaches a primitive or a point where no rules apply to the input, the processing backs up and goes down another branch of the tree. For example, consider the search space in Figure 2.1 while parsing to get the structure in (4). The computer first uses rule 1 to expand S into NP1. Then it tries rule 2 and finds that it needs an N.
Since the first word of the sentence (2) is "The", this path fails and the processing backs up to the NP1. Next it tries rule 3 and finds that it must complete a DET, and this succeeds with the word "the". Because the DET is connected via an arc to an N path, the processor must complete both paths before rule 3 will be successful. It therefore backs up to look for the N in the other part of rule 3. This succeeds with "man", and so rule 3 is completed and the processing returns to NP1. The next branch of the tree is that generated by rule 4. In this case, the input is the word "walked" and the computer will try this rule, fail, and processing will continue to the VP. Here again there are 3 possible rules for expanding the rest of the sentence. Taking the left-most branch gives a single verb generated by rule 6. This would work with the input "walked" and so it is taken. But now the rest of the sentence is "the dog", and the processing will continue trying rules 7 and 8 to account for that noun phrase. When it finds that neither rule can take up the remaining input, because both 7 and 8 expect a verb next, the computer will back up and choose not to take 6 (undoing what it has already done). It will take rule 7 instead, and since this is composed of a V and an NP, this rule will succeed. Since there is no more input, the processing will stop, the parse in (4) having been found. In this way the computer tries each path in the search space, beginning from the left-most one, until a successful traversal is found.

2.2 ATN Formalism

From an NLP standpoint, a parser is a machine that, when given a grammar written according to its specifications, will carry out the search process I described in the previous section. The Augmented Transition Network (ATN) (Woods, 1969) is one such machine that has been very successful in NLP implementations and which is widely available (Bates, 1978).
This machine actually has more computational power than what is needed for the processing described above, but here I will discuss the ATN only as it can be used for parsing natural language grammars. The parser itself is implemented to perform a top-down, left-to-right parse using the method I described in the previous section. Note that a grammar of English is intended to account for all and only the grammatical English sentences. Consequently, there is a problem with the grammar given in (1) because it is unrestricted and would allow sentences like (6):

(6) a. *Boy walk dog.
    b. *John hit.
    c. *The girl cried the man.

These parses could be eliminated by adding tests of particular features of each word to the word categories specified in the rules of (1). For example, sentence (6c) could be ruled out by a test checking to see if the main verb has the feature "intransitive". If the verb is intransitive, it cannot have a noun phrase following it as a transitive verb requiring a direct object would. Processing would be carried out in this case such that before the ATN executes a grammar rule, it executes any tests that are specified within the grammar to check if that particular rule is applicable to the input. Aside from eliminating ungrammatical sentences, tests can be used to make the processing more efficient. For example, assume, as was the case in the example in the previous section, that the next word of input is "walked" and the rule that the parser is considering is one for expanding an NP. The test might say "if the input word can begin an NP, then execute this rule, otherwise proceed to the next rule." In this way the grammar itself can restrict the amount of searching the parser needs to do and rule out some bad sentences without complicating the phrase structure rules. In addition to declaring whether or not particular sentences are grammatical, it would be more useful to build up a parse tree like that seen in (4).
The ATN does this by attaching actions to the grammar rules. When a rule is executed, these actions are performed to assign the input its grammatical category and build the tree structure. The actions might also test the structures that have already been built and on that basis interrupt a particular rule's execution. For example, if the main verb does not agree with its subject or does not have the inflection a preceding auxiliary calls for, the path the parser is following will fail and the processing will be forced to back up and try another parse. The structures that are built are stored in a set of "registers", which are place-holders for information which can be used later in tests, actions, or constructing additional structure. For example, if a sentence is determined to be passive, actions could be constructed to assign the structure held in the "subject" register to the object register, thus making room for the new subject.

The ATN represents actual grammar rules, such as those in (1), in the form of networks which show a transition from one state to the next. This transition is analogous to each step towards completing the rule; a phrase structure rule like "NP --> DET N" has a transition between NP and DET and one between DET and N. The transitions are depicted as arcs in a network as follows:

(7) [ATN diagram here]

The double-circle around the NP node identifies it as the start state of the network. The labels of the intermediate states show what constituents of the rule have been completed (i.e., NP/DET means a determiner has been processed already in the NP network). The final state is the one having the arc labeled "POP", which is an indication that the rule is complete. The formalism provides several different ways of describing the transitions between parts of a phrase structure rule. The most useful is the CAT arc, which checks to see if the category specified by the phrase structure rule matches that of the input.
The CAT arc might have the following form, given in LISP notation: (8) (CAT DET t (setr DET *) (to NP/DET)) In this arc, "CAT" is a label telling the parser what sort of processing is necessary, in this case to check the category of the input word. The symbol "DET", for determiner, specifies the category that the phrase structure rule is looking for. The "t" is in the position where a test like those I described earlier would go. Since the act of checking the category will tell whether or not the transition can be made, no test is necessary and a dummy test allows processing to continue. The "(setr DET *)" is the action that assigns the word of input, represented by *, the name of its syntactic category. The "(to NP/DET)" tells the parser where to go next, in this case to the state after the DET transition has been made. Other transitions are programmed in the same way with appropriate tests and actions. The main difference is in the first label signifying what kind of processing the parser needs to do in order for the transition to be completed. One of the most important kinds of transitions, or arcs, is the "PUSH" arc. This accounts for the recurrence of constituents like the NP in many rules. It signals the parser that it needs to temporarily leave the present rule and process the rules for expanding the NP. These are represented by separate networks, and because they can be used over and over again, the size of the grammar is small in relation to the size of the sentence structure it can account for. When the NP is completed, the transition has been completed and the parser returns to the original network to continue working on a particular phrase structure rule. 
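The interplay of CAT, PUSH, and POP arcs with tests and registers can be sketched in a few lines. This is a simplified Python illustration, not the Lisp ATN machine itself: the tuple layout `(kind, label, test, register, target)`, the lexicon, the `transitive` feature, and the two test functions are assumptions made for this sketch, and every action is reduced to the equivalent of `(setr REGISTER *)`.

```python
# Assumed lexicon: each word has a category and, for verbs, a feature.
LEXICON = {
    "the":    {"cat": "DET"},
    "girl":   {"cat": "N"},
    "man":    {"cat": "N"},
    "dog":    {"cat": "N"},
    "cried":  {"cat": "V", "transitive": False},
    "walked": {"cat": "V", "transitive": True},
}


def always(regs):
    """Dummy test, playing the role of the `t` in arc (8)."""
    return True


def verb_is_transitive(regs):
    return LEXICON[regs["V"]]["transitive"]


def verb_is_intransitive(regs):
    return not LEXICON[regs["V"]]["transitive"]


# Each network maps a state to its arcs: (kind, label, test, register, target).
NETWORKS = {
    "NP": {
        "NP":     [("CAT", "DET", always, "DET", "NP/DET"),
                   ("CAT", "N", always, "N", "NP/N")],
        "NP/DET": [("CAT", "N", always, "N", "NP/N")],
        "NP/N":   [("POP", None, always, None, None)],
    },
    "S": {
        "S":     [("PUSH", "NP", always, "SUBJ", "S/NP")],
        "S/NP":  [("CAT", "V", always, "V", "S/V")],
        "S/V":   [("PUSH", "NP", verb_is_transitive, "OBJ", "S/OBJ"),
                  ("POP", None, verb_is_intransitive, None, None)],
        "S/OBJ": [("POP", None, always, None, None)],
    },
}


def traverse(net, state, words, pos, regs):
    """Depth-first walk; yield (structure, next_pos) at every POP reached."""
    for kind, label, test, register, target in NETWORKS[net][state]:
        if not test(regs):          # the test runs before the arc is taken
            continue
        if kind == "POP":
            yield (net, regs), pos
        elif kind == "CAT":
            entry = LEXICON.get(words[pos]) if pos < len(words) else None
            if entry is not None and entry["cat"] == label:
                yield from traverse(net, target, words, pos + 1,
                                    {**regs, register: words[pos]})
        elif kind == "PUSH":
            # Leave this network, run the sub-network from its start state,
            # then store what it popped in the named register.
            for sub, after in traverse(label, label, words, pos, {}):
                yield from traverse(net, target, words, after,
                                    {**regs, register: sub})


def parse(sentence):
    """Return the first structure that uses up the whole sentence, or None."""
    words = sentence.lower().split()
    for structure, end in traverse("S", "S", words, 0, {}):
        if end == len(words):
            return structure
    return None


print(parse("The girl cried"))
print(parse("The girl cried the man"))   # ruled out by the transitivity test
```

In this sketch the test on the PUSH arc at state S/V is what rules out a sentence like (6c): "cried" is marked intransitive, so the parser never looks for an object NP and cannot account for "the man".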
Other kinds of arcs include WRD arcs, which allow a phrase structure rule to specify that a particular word be in the sentence; JUMP arcs, which allow for processing to proceed to a different state without any actions or checking being done; MEM arcs, which require the word of input to be one of a particular set of words; and POP arcs which signal that a network is complete and provide for building larger structures out of the constituents most recently processed. A special kind of arc called the VIR arc helps to account for movement in English. There are certain English sentences, such as wh-questions, in which a constituent moves from its original position in the sentence into a new position at surface structure. The object of the sentence might be moved out of object position and replaced with a wh-word, as in the sentence (9) What did John eat? The underlying structure of sentence (9) is (10) John did eat what. The ATN processes (9) by using a "hold-list" and VIR arcs to return the moved constituent to its original position. When the computer encounters the wh-word "what" it is processed as an NP and put on the hold-list. A VIR arc occurs in the grammar at the place where the constituent has moved from (i.e., in object position of sentence (10)). When a VIR arc is encountered in the grammar, instead of looking for a constituent in the string of input, the NP is taken from the hold-list to satisfy the phrase structure rules. With this mechanism, the ATN can undo transformations that have occurred to derive the surface structure it is processing. The VIR arc is used to signify the positions from which a constituent could have originated and the "hold-list" allows the parser to wait before assigning a constituent its position in the final sentence structure. This process is used whenever sentences are left with "holes" after movement has occurred, as is the case with relative clauses as well as the wh-movement explained here. 
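The hold-list mechanism can be sketched for sentence (9) as follows. This is a deliberately small Python illustration under several assumptions: the single question network, its state names, the lexicon, and the function `parse_question` are hypothetical, and placing a constituent on the hold-list is modeled here as a pseudo-arc named "HOLD" for brevity rather than as an action attached to an ordinary arc.

```python
LEXICON = {"what": "WH", "did": "AUX", "john": "N", "eat": "V", "apples": "N"}

# Arcs are (kind, label, register, target). "HOLD" is a simplification of
# this sketch: it consumes a wh-word and puts it on the hold-list as an NP.
ARCS = {
    "Q":      [("HOLD", "WH", None, "Q/WH")],
    "Q/WH":   [("CAT", "AUX", "AUX", "Q/AUX")],
    "Q/AUX":  [("CAT", "N", "SUBJ", "Q/SUBJ")],
    "Q/SUBJ": [("CAT", "V", "V", "Q/V")],
    "Q/V":    [("CAT", "N", "OBJ", "Q/OBJ"),    # object found in the input
               ("VIR", "NP", "OBJ", "Q/OBJ")],  # object taken from hold-list
    "Q/OBJ":  [("POP", None, None, None)],
}


def traverse(state, words, pos, regs, hold):
    """Yield the registers of every complete traversal from `state`."""
    word = words[pos] if pos < len(words) else None
    for kind, label, register, target in ARCS[state]:
        if kind == "CAT" and word is not None and LEXICON.get(word) == label:
            yield from traverse(target, words, pos + 1,
                                {**regs, register: word}, hold)
        elif kind == "HOLD" and word is not None and LEXICON.get(word) == label:
            yield from traverse(target, words, pos + 1,
                                regs, hold + (("NP", word),))
        elif kind == "VIR" and hold and hold[-1][0] == label:
            # No input is consumed: the held constituent fills the gap.
            yield from traverse(target, words, pos,
                                {**regs, register: hold[-1]}, hold[:-1])
        elif kind == "POP" and not hold and pos == len(words):
            # Succeed only if all input is used and nothing is still held.
            yield regs


def parse_question(sentence):
    words = sentence.lower().rstrip("?").split()
    return next(traverse("Q", words, 0, {}, ()), None)


regs = parse_question("What did John eat?")
print(regs)
# Reading the registers in subject-aux-verb-object order recovers (10).
print(regs["SUBJ"], regs["AUX"], regs["V"], regs["OBJ"][1])
```

The final POP requires an empty hold-list, so in this sketch a string such as "What did John eat apples?" fails: the object position is filled from the input and the held "what" is never given a position.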
2.3 The Prediction Problem

As I mentioned previously, the ATN has proven very useful for problems in natural language processing. It is not useful for prediction, however, because it follows one parse at a time and backtracks if it reaches a dead end. To do prediction, the system must take a partial sentence and return the features and category of the next input word. But because the ATN does not follow all parses at once, it does not have access to all possible next words of input. This is especially clear when words with category ambiguity are used in sentences. For example, consider the simplified grammar network below:

(11) [ATN diagram here]

If the system only has the partial sentence "the" and the word "gold" is entered, the parser does not know whether "gold" is an adjective or a noun. The ATN parser, as I have described it, would choose one path down the network and follow it. Consequently it may not adequately predict the category of the word that follows "gold": with network (11) it will predict a noun to be next, as if the sentence were "the gold ring is beautiful." It is just as likely, however, that a verb could be next, as in the sentence "the gold is in the bank." As a result of this "one-at-a-time" method of parsing, the ATN may be forced into continual backtracking each time a word is entered. With each path change, possible predictions would be unaccounted for because the computer would only be following one path at a time. If the computer took "gold" to be an adjective, at that point in the processing it cannot predict that the next word could be a verb as well as a noun. This means that the prediction would be incomplete in a significant number of cases because a typical grammar requires a large number of paths to account for the many structures in English. In addition, this would make the processing much slower and therefore it would be difficult to use this system for spontaneous communication.
The way to solve this problem is to change the way the ATN parses so that it will complete all possible parses at once. This is done by making the processing do a top-down, breadth-first traversal of the search space; in this way the ATN simulates a parallel processing mechanism that will generate all possible parses at once. Now the ATN analyses "gold" as a noun in one parse and as an adjective in another. When the next word is entered, it may eliminate one of these interpretations, or else the ATN will continue both parses until the entire sentence has been entered. Either way, the parser is able to know at any point in the sentence what type of word could be next, because it is holding all possible structures for the words entered thus far.

2.4 Implementation

The parser I have built solves the prediction problem by traversing the search space in Figure 2.1 in a breadth-first, rather than depth-first, manner. This means that it completes the first transition in each phrase structure rule before going deeper in the tree. Essentially, the depth-first parser needs to maintain only one parse at a time; this breadth-first parser, however, constantly maintains all the partial parses so that at each point it knows all the categories that could be used to complete a grammatical sentence. When a new word is given, each parse incorporates it into the structure it has been building. If the word cannot be included in a parse, that parse is eliminated from further consideration. This means the processing is done in a non-deterministic fashion, and therefore complete predictions can be made because the computer has not committed itself to a particular parse that may turn out to be different from what the user intended. This also means that when the entire sentence has been entered, the parser may have built more than one structure for a particular sequence of words.
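The idea of holding every partial parse at once can be sketched as follows. This Python fragment is only an illustration of the principle, not the breadth-first ATN described here: it uses a small context-free grammar without left recursion (an assumption, standing in for network (11)), represents each partial parse simply as the tuple of categories still expected, and builds no tree structure. The names `expand`, `advance`, `predict`, and `filter_words` are hypothetical.

```python
# Assumed toy grammar (no left recursion) and an ambiguous lexicon entry.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"], ["V", "PP"]],
    "PP": [["PREP", "NP"]],
}
LEXICON = {
    "the": {"DET"}, "gold": {"N", "ADJ"}, "ring": {"N"}, "bank": {"N"},
    "is": {"V"}, "in": {"PREP"}, "beautiful": {"ADJ"},
}


def expand(stack):
    """Rewrite the leftmost symbol until it is a word category.

    A stack is the tuple of categories one partial parse still expects.
    """
    if not stack:
        return
    first, rest = stack[0], stack[1:]
    if first in GRAMMAR:
        for rhs in GRAMMAR[first]:
            yield from expand(tuple(rhs) + rest)
    else:
        yield stack


def advance(states, word):
    """Give one word to every partial parse; drop those that cannot use it."""
    new_states = set()
    for stack in states:
        for expanded in expand(stack):
            if expanded[0] in LEXICON.get(word, set()):
                new_states.add(expanded[1:])
    return new_states


def predict(states):
    """Return every category that could come next in some partial parse."""
    return {expanded[0] for stack in states for expanded in expand(stack)}


def filter_words(states, candidates):
    """Keep the candidate words that some partial parse could accept next."""
    allowed = predict(states)
    return [w for w in candidates if LEXICON.get(w, set()) & allowed]


states = {("S",)}
for word in ["the", "gold"]:
    states = advance(states, word)

print(sorted(predict(states)))
print(filter_words(states, ["ring", "is", "the", "in"]))
```

After "the gold", two partial parses survive in this sketch: one that took "gold" as a noun and now expects a verb, and one that took it as an adjective and now expects a noun. Both categories are therefore predicted, which is the behavior a single-path parser cannot provide.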
Because of this exhaustive analysis of the search space, the parser can also account for different structures underlying the same words. For example, consider the sentence:

(12) The man told the woman that he loved the story.

The user could have meant either that the indirect object is "the woman that he loved" and the object is "the story" or that the indirect object is "the woman" and the object is "that he loved the story." The predictor will output both these structures so that they could easily be analyzed further by a semantic or pragmatic processor that may eliminate one interpretation based on the context the user has been building.

This predictor has been implemented in SUN Common LISP. There is also an early implementation in Franz Lisp. It is intended as a component in a more complex communication system and as such, there has been little attention paid to the user interface. Presently the system is activated with the command "predict" and a partial sentence given as its argument. The system goes as far as it can with that partial sentence and then goes into a "break package" where the user can decide between two methods of proceeding. The first method allows the next word in the sentence to be entered. It incorporates that word into the partial parses already created by the system and then reenters the break package. At each point when a parse is completed, the system prints out that parse tree. These parses are not final analyses, as they can still be given additional words that will be incorporated into them. The system halts only when there is no possible way of continuing the parse given the input it already has. In this case the predictor returns "nil." The second method for continuing from the break package is to enter a series of words (e.g., possible abbreviation expansions) and the system returns those words which could possibly be next, given the partial sentence it has already processed.
Once the eliminations have been made, the break package resumes and the user is again given the two choices for proceeding until he or she signals a wish to quit. An example of this operation is given in the following section. The grammar that the predictor uses to create and judge grammaticality is described in more detail in Chapter 3.

Recall that part of the function of the grammar arcs is to carry out tests of particular features on the input words to determine if it is efficient to carry out a particular rule. These features are encoded in the dictionary entries for each word that the computer knows. The dictionary and the features within it are described in more detail in Chapter 4 on the lexicon. In order to help with adding words that the user wants to use but that the computer does not have in its dictionary, there is an auxiliary package used at run-time to check each word entered against those in the dictionary. When the computer finds a word it does not know, this package allows that word to be added automatically to the dictionary. The package gives the user directions for entering the appropriate features for each word to ensure that the dictionary entry is of the form the grammar expects (cf. Chapter 4).

2.5 Performance

What follows is a brief demonstration of the way the predictor works. This is actual output from the system as the sentence "The gold key on the table opened the door easily" is entered in parts. Note that it is possible to enter more than a single word at once, as was the case with the phrases "on the table" and "opened the door" in the example. During the course of entering this sentence, various lists of words were given to the system for it to choose those that could grammatically follow the part of the sentence that the system has already processed. These lists are meant to show how the predictor can eliminate inappropriate input, and do not reflect what might be logically entered in a discourse.
When a complete sentence is formed with any of the words entered via method 1 or with words in a list of possibilities, the sentence parse incorporating that word is printed. (predict `(the)) The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. If you would like to quit this program type 0. Enter your selection now: 1 Enter the next word: (gold) This word was not in the dictionary GOLD Do you wish to enter it? Choose (y) or (n) (y) It will be entered now. Enter the categories for this word. Possibilities: DET PRO PREP ADV N ADJ V CONJ Enter the category in list form (i.e., enter `(N)'). If a word belongs to more than one category make this a list also (i.e., `(DET PRO)') (N ADJ) Enter the features for the word that have values. Your choices are: FOR N: NUMBER FOR ADJ: NUMBER ZONE Remember that all values for different interpretations of the word must be entered in the same dictionary entry. Enter these features with their appropriate arguments in list form. For example `(TAKES (SGCT) NUMBER (SG))'. (NUMBER (SG) ZONE (3)) Enter the toggled features for the word. Possibilities are: FOR ADJ: QUANT, CENTRAL-DET, PRE-DET, PRED FOR N: CARDINAL, MASS, COUNT, PROPER, PRE-DET, MENSURAL Remember to include all that apply to this word in all of its meanings. Enter them in list form like `(ART CENTRAL-DET)' (MASS) Enter the root of the current word. 
Only VERBS require roots, but other inflected words may have them. If this word has not root, enter `nil'. nil The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE GOLD) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. If you would like to quit this program type 0. Enter your selection now: 2 Enter the list of possible next words: (key keys on in open the a) [5] (DCL (CP (SPEC NIL) (CBAR (HEAD NIL) (COMP (IP (SPEC (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD GOLD) (NU (SG)))))))) (IBAR (HEAD (AGR 3PLPRES)) (COMP (VP (SPEC NIL) (VBAR (HEAD OPEN)))))))))) The grammatical possibilities are: (KEY KEYS ON IN OPEN) The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE GOLD) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. If you would like to quit this program type 0. 
Enter your selection now: 1 Enter the next word: (key) The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE GOLD KEY) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. If you would like to quit this program type 0. Enter your selection now: 2 Enter the list of possible next words: (key keys open opened on in a the) (DCL (CP (SPEC NIL) (CBAR (HEAD NIL) (COMP (IP (SPEC (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (ATTRIB (DEGP (SPEC NIL) (DEGBAR (HEAD NIL) (COMP (AP (SPEC NIL) (ABAR (HEAD GOLD))))))) (HEAD KEY) (NU (SG)))))))) (IBAR (HEAD (AGR 3SGPAST)) The grammatical possibilities are: (OPENED ON IN) The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE GOLD KEY) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. If you would like to quit this program type 0. 
Enter your selection now: 1 Enter the next word: (on the table) The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE GOLD KEY ON THE TABLE) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. If you would like to quit this program type 0. Enter your selection now: 2 Enter the list of possible next words: (open opened in a the) (DCL (CP (SPEC NIL) (CBAR (HEAD NIL) (COMP (IP (SPEC (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (ATTRIB (DEGP (SPEC NIL) (DEGBAR (HEAD NIL) (COMP (AP (SPEC NIL) (ABAR (HEAD GOLD))))))) (HEAD KEY) (NU (SG)) (COMP (PP (SPEC NIL) (PBAR (HEAD ON) (COMP (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD TABLE) (NU (SG))))))))))))))))) (IBAR (HEAD (AGR 3SGPAST)) (COMP (VP (SPEC NIL) (DCL (CP (SPEC NIL) (CBAR (HEAD NIL) (COMP (IP (SPEC (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (ATTRIB (DEGP (SPEC NIL) (DEGBAR (HEAD NIL) (COMP (AP (SPEC NIL) (ABAR (HEAD GOLD))))))) (HEAD KEY) (NU (SG)) (ADJUNCT (PP (SPEC NIL) (PBAR (HEAD ON) (COMP (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD TABLE) (NU (SG))))))))))))))))) (IBAR (HEAD (AGR 3SGPAST)) (COMP (VP (SPEC NIL) (VBAR (HEAD OPEN)))))))))) The grammatical possibilities are: (OPENED IN) The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. 
The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE GOLD KEY ON THE TABLE) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. If you would like to quit this program type 0. Enter your selection now: 1 (ENTER THE NEXT WORD) (opened the door) This word was not in the dictionary DOOR Do you wish to enter it? Choose (y) or (n) (y) It will be entered now. Enter the categories for this word. Possibilities: DET PRO PREP ADV N ADJ V CONJ Enter the category in list form (i.e., enter `(N)'). If a word belongs to more than one category make this a list also (i.e., `(DET PRO)') (N) Enter the features for the word that have values. Your choices are: FOR N: NUMBER Remember that all values for different interpretations of the word must be entered in the same dictionary entry. Enter these features with their appropriate arguments in list form. For example `(TAKES (SGCT) NUMBER (SG))'. (NUMBER (SG)) Enter the toggled features for the word. Possibilities are: FOR N: CARDINAL, MASS, COUNT, PROPER, PRE-DET, MENSURAL Remember to include all that apply to this word in all of its meanings. Enter them in list form like `(ART CENTRAL-DET)' (COUNT) Enter the root of the current word. Only VERBS require roots, but other inflected words may have them. If this word has not root, enter `nil'. 
nil (DCL (CP (SPEC NIL) (CBAR (HEAD NIL) (COMP (IP (SPEC (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (ATTRIB (DEGP (SPEC NIL) (DEGBAR (HEAD NIL) (COMP (AP (SPEC NIL) (ABAR (HEAD GOLD))))))) (HEAD KEY) (NU (SG)) (COMP (PP (SPEC NIL) (PBAR (HEAD ON) (COMP (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD TABLE) (NU (SG))))))))))))))))) (IBAR (HEAD (AGR 3SGPAST)) (COMP (VP (SPEC NIL) (VBAR (HEAD OPEN) (COMP (OBJ (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD DOOR) (NU (SG)))))))))))))))))) (DCL (CP (SPEC NIL) (CBAR (HEAD NIL) (COMP (IP (SPEC (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (ATTRIB (DEGP (SPEC NIL) (DEGBAR (HEAD NIL) (COMP (AP (SPEC NIL) (ABAR (HEAD GOLD))))))) (HEAD KEY) (NU (SG)) (ADJUNCT (PP (SPEC NIL) (PBAR (HEAD ON) (COMP (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD TABLE) (NU (SG))))))))))))))))) (IBAR (HEAD (AGR 3SGPAST)) (COMP (VP (SPEC NIL) (VBAR (HEAD OPEN) (COMP (OBJ (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD DOOR) (NU (SG)))))))))))))))))) The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE GOLD KEY ON THE TABLE OPENED THE DOOR) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. If you would like to quit this program type 0. 
Enter your selection now: 1 Enter the next word: (easily) (DCL (CP (SPEC NIL) (CBAR (HEAD NIL) (COMP (IP (SPEC (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (ATTRIB (DEGP (SPEC NIL) (DEGBAR (HEAD NIL) (COMP (AP (SPEC NIL) (ABAR (HEAD GOLD)))))) (HEAD KEY) (NU (SG)) (ADJUNCT (PP (SPEC NIL) (PBAR (HEAD ON) (COMP (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD TABLE) (NU (SG))))))))))))))))) (IBAR (HEAD (AGR 3SGPAST)) (COMP (VP (SPEC NIL) (VBAR (HEAD OPEN) (COMP (OBJ (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD DOOR) (NU (SG))))))))) (ADJUNCT (ADVP (SPEC NIL) (ADVBAR (HEAD EASILY) (COMP NIL))))))))))))) (DCL (CP (SPEC NIL) (CBAR (HEAD NIL) (COMP (IP (SPEC (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (ATTRIB (DEGP (SPEC NIL) (DEGBAR (HEAD NIL) (COMP (AP (SPEC NIL) (ABAR (HEAD GOLD))))))) (HEAD KEY) (NU (SG)) (COMP (PP (SPEC NIL) (PBAR (HEAD ON) (COMP (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD TABLE (NU (SG))))))))))))))))) (IBAR (HEAD (AGR 3SGPAST)) (COMP (VP (SPEC NIL) (VBAR (HEAD OPEN) (COMP (OBJ (DP (SPEC NIL) (DBAR (HEAD THE) (COMP (NP (SPEC NIL) (NBAR (HEAD DOOR) (NU (SG))))))))) (ADJUNCT (ADVP (SPEC NIL) (ADVBAR (HEAD EASILY) (COMP NIL))))))))))))) The parser has gone as far as it can with this sentence. You can choose to proceed in two ways: Method 1 allows you to enter the next word to be completed. The computer will then advance the parse as far as it can with this new word and then bring you back to this point. Method 2 allows you to enter a list of words and the computer checks these words and tells you which of them could be used to advance the parse a further step. At that point you are returned to this point and you can choose to take one of those words or to check another list. The sentence parsed so far is: (THE GOLD KEY ON THE TABLE OPENED THE DOOR EASILY) Now if you would like to use Method 1 type 1. If you would like to use Method 2 type 2. 
If you would like to quit this program type 0. Enter your selection now: 0 NIL >

Figure 3.2 LUNAR grammar

Chapter 3 THE GRAMMAR

The linguistic theory I have adopted to guide construction of my grammar is discussed in this chapter. I explain my implementation of particular concepts from this theory and the implications of this approach for building a computational grammar of English. Some example parses are given to demonstrate the range of structures the grammar can produce.

3.1 Motivation

This project aims to provide disabled users with a communication tool they can use for everyday speech. Because of this, it is necessary that the system be able to handle a wide variety of sentence structures; the grammar must be complete. Up to now, the biggest objection to using grammars for augmentative communication has been that a sufficiently complete one is thought to be difficult to construct by hand. I have confronted this objection by making my grammar the embodiment of a syntactic theory from which I can derive an abstract, generalized description for a multitude of structures. This model of syntax is called X' (pronounced "X-bar") theory and is borrowed from Government and Binding Theory. Its conventions make a complete grammar easy to construct and modify, while also providing a mechanism to describe the specific restrictions on what kinds of constituents can occur where. These restrictions are crucial to this project because the success of the syntactic predictor depends on its being able to eliminate categories that are not possible in a particular context. Thus, the use of X' syntax facilitates both completeness and restrictiveness in the grammar for this system.

3.2 X' syntax

All three of the most popular syntactic theories, Government and Binding (GB), Generalized Phrase Structure Grammar (GPSG), and Lexical-Functional Grammar (LFG), have adopted forms of X' theory because of its explanatory power (Sells, 1985).
This power comes from abstracting away the content of the phrase structure rules I discussed in Chapter 2 so that only the structural description is left. Consider the phrase structure (PS) rules below: (1) NP --> N NP --> N PP VP --> V VP --> V NP VP --> V PP Notice that these rules serve two purposes: to tell what particular constituents the phrases on the left hand side of the rule can be broken into and also to give the position, or structure, of these constituents. Notice that the structure is similar among different phrases, for example both an NP and VP can be rewritten as just an N or V, respectively. In addition, they can both be rewritten with the N or V plus another constituent to the right. Thus, there is a fair amount of uniformity in the structures of different kinds of phrases. X' Theory tries to capture this similarity by claiming that the basic syntactic structure is given by the following template: (2) [Parse tree here] This generalized structure is motivated by the similar patterns found in the internal structure of different kinds of phrases (i.e., noun phrases, prepositional phrases, verb phrases): they all have a head constituent, complements and various other modifiers that can come either before or after the head. In the template, the head is represented by the variable X. This is the element that gives the phrase its character; for example, the head of the NP is an N, the head of a PP is a P, and the head of the VP is the V. The entire phrase is said to be a "projection" of the head; a structure built up using this template is called a "maximal projection" because the entire template structure has been used. It is also referred to as an "X-double-bar", reflecting the fact that it is the highest level of the template and includes all head modifiers. [6] I have adopted a formulation of X' theory that takes the intermediate X' level as the site where modifiers like adjectives and prepositional phrases are attached (Radford, 1988). 
The modifier, also called an adjunct, can be attached on either side of the X', so that it can be in either a pre-head position, as is the case with adjectives, or a post-head position, as with prepositional phrases. The X' is recursive in that modifiers expand it into another X' level, making the structure: [7]
(3) [Parse trees here]
This means the intermediate X' level plays a crucial role in the syntactic structure. But this level is also the least intuitive: we would normally talk about a noun as the head of a phrase and about the entire noun phrase, but we would not consider any predefined intermediate level. There is convincing evidence for the existence of this level, especially in coordination data. Coordination occurs when two words or phrases are conjoined with a conjunction like "and" or "but". Only constituents, which are structural units whose lexical items are immediately dominated by the same node in the syntax tree, can be conjoined, and furthermore, the constituents must be of the same type (i.e., they both must be NP's, VP's, N's, etc.). Consider the data below:
(4) *John rang up his mother and up his sister. [Constituent and non-constituent conjoined]
(5) John rang up his mother and his sister. [Two constituents conjoined]
(6) *John wrote a letter and to Fred. *John wrote to Fred and a letter. [Constituents of different types conjoined]
(7) John wrote to Mary and to Fred. John wrote a letter and a postcard. [Constituents of the same type conjoined]
Now consider the noun phrase "The king of England" in the following example, taken from (Radford, 1988, pp. 174-175). Under X' theory, this noun phrase has the following structure:
(8) [Parse tree here]
The question is how we know that [king of England] is a predefined constituent called N'. In response to this question, observe the following coordination data:
(9) Who would have dared defy the [king of England] and [ruler of the Empire]?
(10) Who would have dared defy the [leader of the army] and [king of England]?
Given the restrictions on what kinds of structures can be conjoined, the grammaticality of (9-10) shows that [king of England] must be a constituent in itself. Notice too that sentences (11) and (12) are also grammatical. In these cases, the conjoined constituent is an N" (or NP) rather than an N'.
(11) Who would have dared defy [the king of England] and [the ruler of the Empire]?
(12) Who would have dared defy [the leader of the army] and [the king of England]?
It is also possible to find evidence for the intermediate N' constituent in Shared Coordination data. This is a type of coordination wherein a constituent is "shared" between two conjuncts, as in the sentences:
(13) John walked (and Mary ran) [up the hill.]
(14) John will, and Mary may, [go to the party.]
Analysis of these kinds of data has shown that Shared Coordination is only possible where the shared string is a possible constituent of each of the conjuncts (Radford, 1988, p. 78). Because of data like (15-16) below, we can conclude that [king of England] must make up a constituent in its own right, which is separate from either the head noun or the entire noun phrase. [8]
(15) He was the last (and some people say the best) [king of England].
(16) He was the last (and some people say best) [king of England].
Thus, it seems that there is good reason to accept the intermediate level posited in structure (2). [9] The most basic construction of this intermediate level (i.e., not considering adjuncts) includes the phrasal head and its complement, which is also called its argument and which is structurally its sister. The head of the phrase subcategorizes for its complement, meaning that it requires a particular kind of complement to occur with it. For example, if the phrasal head were a verb, it would subcategorize for a particular kind of object (often a noun phrase), and if the head were a preposition, it would subcategorize for a noun phrase.
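As a rough illustration of subcategorization, the following Python sketch records, for a handful of heads, which complement category each one accepts. The table `SUBCAT`, the function `licenses`, and the category labels are assumptions made for this example only; they are not the lexicon format used by the system described in this thesis.

```python
# Hypothetical subcategorization frames. Each head (word, category) maps to
# the set of complement categories it accepts; None means "no complement".
SUBCAT = {
    ("kill", "V"): {"NP"},   # transitive verb: requires an object
    ("cry", "V"): {None},    # intransitive verb: takes no complement
    ("on", "P"): {"NP"},     # preposition: requires a noun phrase
}


def licenses(head, category, complement):
    """Return True if `head` of `category` accepts `complement`.

    `complement` is a category label such as "NP", or None when the head
    appears with no complement at all.
    """
    return complement in SUBCAT.get((head, category), set())


print(licenses("kill", "V", "NP"))   # a transitive verb with an object
print(licenses("cry", "V", "NP"))    # an intransitive verb with an object
```

A head that is missing from the table licenses nothing, which is one simple way of treating unknown words; a real system would need a policy for adding them.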
I will say more about the kinds of complements heads can subcategorize for below and in Chapter 4. The sister to X' is called a specifier, which has the function of expanding the X' completely into a maximal projection. There is some debate among X' theorists about what kind of constituent can occur as a specifier. The position I am accepting for constructing this grammar is elucidated by Thomas Ernst, who treats the position as "the response of syntax to the need to give special status to some particular peripheral element: demonstratives, subjects, etc." (Ernst, 1990, p. 25). [10] He proposes that the specifier position is to be used to ensure that certain elements are always in phrase-initial position; for example, the following data, borrowed from (Ernst, 1990, p. 9), show that it is not always possible to reorder words to produce emphasis:
(17) a. A fancy new car. b. A new fancy car.
(18) a. The many honest men. b. *The honest many men.
On his definition, then, the specifier position is one that is used to describe ordering constraints like these. As such, it does not involve restrictions on the kinds of structures that can occur in this position (i.e., specifiers do not have to be maximal projections, contrary to a widely known Chomskian theory). [11] This interpretation is also described in (Quirk et al., 1985), where it is explained that some words, such as determiners and particles, are "single tokens, complete to themselves." I have adopted this interpretation of the specifier position because the words that show up in specifier position are often those that Quirk calls "single tokens", and therefore they do not form maximal projections of their own. This is not always the case, for I will show that a sentence's subject occupies a specifier position; however, the single tokens occur often enough for it to be computationally inefficient to require that all specifier positions be maximal projections.
In my grammar, the specifier position can contain either a lexical item or a maximal projection, depending on the head of a particular specifier's projection. In the syntactic "template" I have been discussing (i.e., structure (2) above) only the head element is required. The specifier, complement, and adjuncts are all optionally present for any given instantiation of the template. The presence of the complement is determined by the head itself. [12] For example consider the case of the head being a verb, which would make the XP a VP. A transitive verb like "kill" requires a complement, as shown in the following data: (19) John killed Mary. *John killed. Conversely, an intransitive verb like "cry" does not take a complement: (20) John cries. *John cries Mary. [13] The power of the phrasal head to dictate its complement is called subcategorization. [14] This ability is common to all phrasal heads, although there are some behaviors phrasal heads may or may not exhibit that have prompted syntacticians to distinguish different types. Here I will adopt the taxonomy explicated by Susan Rothstein (Rothstein, 1991), who distinguishes three kinds of heads: lexical, functional, and minor. Lexical heads are those like verbs or prepositions. They determine the character of their maximal projection so that if the head is a V, the projection is a verb phrase (VP), or if the head is a P, the projection is a prepositional phrase (PP). These words have very specific requirements on the number of complements they must have and this number must always be satisfied for the phrase to be realized. [15] The second kind of head is called a functional head because of its functional role in a phrase. These heads determine the nature of their maximal projection just as the lexical head does, but they are not necessarily realized as a lexical item. 
For instance, INFL, which is considered the head of a sentence, is typically said to be realized by the tense and agreement of the main verb, and sometimes as a modal. Since the head of the sentence is called INFL, X' theory describes the canonical sentence as an IP, or "Inflection Phrase." [16] The major heads in this category are INFL, which holds inflection; DET, which holds the determiner and also determines agreement; and COMP, which holds the complementizer "that" in embedded clauses and whose specifier position holds WH-question words. I will discuss each of these in more detail in the implementation section below. The minor heads are also functional heads in the sense that they are not frequently lexicalized, but unlike the previous two head types, the minor heads do not determine the nature of their projection. This is done instead by the complement they subcategorize for, which depends in some cases on their position in the sentence. Typical minor heads are degree words like "too" or "as", and therefore the head is called DEG, for degree. The "degree phrase" is discussed more completely in the implementation section below. [17] The grammar I have built is based on these heads and the structures they generate via subcategorization. Since there is a finite number of instantiations for these head types, I have eliminated the need for Phrase Structure rules that tell exactly what constituents go where. Instead, the structure of (2), reproduced below, provides the order and relationships of constituents to one another, and all other information comes from the requirements (i.e., the subcategorization) of the specific words themselves.
(2) [Parse tree here]
The grammar has both top-down and bottom-up motivation in the sense that while the template structure must be satisfied, the input itself determines how that will be done through subcategorization.
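The claim that one template can stand in for many category-specific rules can be pictured with a small Python sketch. It is only an illustration: the function names `project` and `to_sexp` are invented here, adjuncts are omitted, and the bracketed output merely imitates the style of the parses printed in the sample run (labels such as DP, DBAR, NP, NBAR).

```python
def project(cat, head, spec=None, comp=None):
    """Build the maximal projection of `head` from the single X' template.

    cat  -- category letter of the head, e.g. "N", "D" or "P"
    head -- the head word (None for an unlexicalized head)
    spec -- optional specifier (a word or a nested projection)
    comp -- optional complement (a nested projection)
    """
    bar = [cat + "BAR", ["HEAD", head]]
    if comp is not None:
        bar.append(["COMP", comp])
    return [cat + "P", ["SPEC", spec], bar]


def to_sexp(node):
    """Render a nested list as a bracketed expression, showing None as NIL."""
    if node is None:
        return "NIL"
    if isinstance(node, list):
        return "(" + " ".join(to_sexp(child) for child in node) + ")"
    return str(node).upper()


# The same function serves every category; only the arguments differ.
np = project("N", "table")
dp = project("D", "the", comp=np)
pp = project("P", "on", comp=dp)
print(to_sexp(pp))
```

Nothing in `project` mentions nouns, determiners, or prepositions. What fills each slot is decided by the caller, which is where the subcategorization requirements of individual words would enter.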
This approach simplifies writing the grammar, because it is no longer a question of including enough PS rules but of providing the structure and letting subcategorization determine when each part of it is appropriate.

3.3 Implementation of X' Theory

Recall that in my discussion of the ATN formalism I characterized a PUSH arc as the mechanism's way of handling frequently occurring constituents such as noun and prepositional phrases. The PUSH arc gives the ATN more power because it allows recursive processing (i.e., a given network can refer back to itself, as in the PS rule NP --> NP PP). It was created for its computational power; without recursion, grammars are enormous and frequently redundant. In this project I propose a linguistic motivation for the use of PUSH arcs, namely as the means for creating a projection in the X' theory template. All maximal projections are the result of 2 PUSH arcs: a PUSH arc for the X' level and a PUSH arc for the X'' level. Each PUSH arc adds more structure to the level below, so that the implementation of the template in (2) is done with the following networks:
(21) [ATN diagram here]
For every instantiation of X there is a pair of networks such as these. Each time a maximal projection appears in a network, it is obtained by PUSHing for that XP. Within that XP, an X' constituent is PUSHed for because it is a projection of the head. Only the head is a non-PUSHed element, just as only the head is a non-maximal projection in a rule. [18] Figures 3.1 and 3.2 on the following pages make clear how the structural template of X' theory, as implemented in the two networks in (21), has greatly simplified the task of writing a grammar. Figure 3.1 shows the series of networks that make up the entire grammar I have implemented. Figure 3.2, reproduced from (Bates, 1983, pp. 217-219), shows the networks for the LUNAR grammar, which was one of the earliest and most complete ATN grammars.
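The pairing of an XP network with an X' network can be imitated, very loosely, by two mutually recursive Python functions. This is a toy sketch and not an ATN: there are no registers, tests, or alternative arcs, specifiers and adjuncts are ignored, and the lexicon `LEXICON` and the function names `push_xp` and `push_xbar` are hypothetical.

```python
# Toy lexicon (hypothetical): word -> (category, complement category or None).
LEXICON = {
    "on": ("P", "D"),       # a preposition calls for a DP complement
    "the": ("D", "N"),      # a determiner calls for an NP complement
    "table": ("N", None),   # this noun takes no complement
}


def push_xp(cat, words, i):
    """XP network: its only arc in this sketch is a PUSH for the X' level."""
    result = push_xbar(cat, words, i)
    if result is None:
        return None
    xbar, j = result
    return [cat + "P", ["SPEC", None], xbar], j


def push_xbar(cat, words, i):
    """X' network: consume the head word, then PUSH for its complement, if any."""
    if i >= len(words) or words[i] not in LEXICON:
        return None
    word_cat, comp_cat = LEXICON[words[i]]
    if word_cat != cat:
        return None
    xbar = [cat + "BAR", ["HEAD", words[i].upper()]]
    j = i + 1
    if comp_cat is not None:
        result = push_xp(comp_cat, words, j)   # recursive PUSH for the complement XP
        if result is None:
            return None
        comp, j = result
        xbar.append(["COMP", comp])
    return xbar, j


tree, end = push_xp("P", ["on", "the", "table"], 0)
print(tree[0], end)
```

Each maximal projection in the result comes from exactly two calls, one per level, while the head word itself is read directly from the input. The same pair of functions handles PP, DP, and NP because the category is a parameter rather than part of the network.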
The X' theory grammar has greater coverage than the LUNAR grammar and is a sleeker implementation. This makes modifying it to cover more syntactic phenomena (e.g., topicalization, ellipsis, etc.) easier because it is immediately clear where new pieces of grammar need to be added to account for these structures. In the following sections I discuss the details of implementing network pairs for each of the heads. In Chapter 5, I will further discuss the implications of this implementation and its deviations from Government and Binding Syntax, whence I borrowed the X' theory grammar framework.

3.3.1 Sentence level

As I mentioned previously, GB theory posits a functional category called "INFL", for "inflection", in order to apply the X' template to the sentence level. This category holds the tense and person-number agreement information for the sentence. This information can be lexicalized in the form of a modal if the clause is finite or as a "to" in the case of non-finite (e.g., infinitival or participial) clauses. [19] INFL cannot hold a modal and a "to" at the same time because sentences must be either finite or non-finite. It is possible for INFL to be unlexicalized, as when there is just a single verb in the sentence. In this case the agreement and tense are considered to be in INFL, but lexicalized only on the main verb. The INFL head and the X' template of (2) produce an analysis for a sentence traditionally analyzed as [NP VP] with the following form:
(22) [Parse tree here]
The head of this structure is INFL, the complement of the head is the VP, and the specifier of I' is the NP. The overall structure is a maximal projection called an INFLection-phrase or IP. A problem with this structure arises when sentences have embedded IP's like
(23) The committee may insist [that the chairman resign].
because the IP structure in (22) cannot accommodate the "that", which is a complementizer that introduces the embedded clause.
It would not be correctly analyzed as a determiner like "the" because it introduces the entire embedded sentence. Similarly, it would not be an object of "insist" because it has no reference as an independent pronoun in this sentence. This problem is solved in X' theory with the functional head COMP, or Complementizer. The COMP is the head of the overall sentence, holding the "that" in embedded clauses or being empty for top-level sentences. [20] The IP is in the complement position in the template, making the structure of sentences, called CP's for "Complementizer Phrase", like the following:
(24) [Parse tree here]
Making the sentence head a Complementizer is supported by the fact that Complementizers influence the content of the INFL node. Any clause that has a Complementizer must have an INFL that is compatible with it in terms of being finite or non-finite. For example, a non-finite complementizer like "for" cannot introduce a sentence with a finite INFL:
(25) *They are anxious [for you make up your mind.]
But a non-finite INFL is acceptable:
(26) They are anxious [for you to make up your mind.]
My implementation of sentences reflects the structure shown in (24). This implementation has one major deviation from the GB formulation of X' theory, namely the content of the INFL node itself. In the structure my grammar produces, INFL holds a code showing tense and agreement, as X' theory stipulates, but it also holds the auxiliary verbs "have" and "be". This is counter to the notion that each X can only hold one lexical element, which is a central argument for why "have" and "be" do not appear in INFL in normal X' theory (cf. (Radford, 1988, p. 312)). [21] Aside from this, the only difference between modals and these auxiliary verbs seems to be that they determine verbal inflection on the verb that follows them (i.e., "have" requires a participial verb, and "be" requires a gerundive verb).
In an attempt to capture this, whenever these auxiliary verbs appear in my INFL, they are accompanied by the appropriate inflection marker. Thus, if a sentence includes all the auxiliary verbs (e.g., "The man could have been sleeping"), the INFL node my grammar will produce has the following form: (27) (INFL (AGR 3SGPAST) (MODAL COULD) (HAVE EN) (BE ING)) The constituent AGR is the agreement marker wherein number and person agreement is combined with the verb tense. This is different from the standard X' conception of INFL as containing two binary variables: one for agreement and one for tense. I have combined these variables in order to account for lexical items whose agreement changes with their tense. Consider the sentences below: (28) The man hit the ball yesterday. [Intended past tense.] (29) *The man hit the ball today. [Intended present tense.] The problem here is that the same lexical item can be past or present, but have different agreement requirements. The sentence in (29) is ungrammatical because, in the present tense, "hit" can only agree with non-third person singular subjects. It was necessary to find a separate way to account for these two lexicalizations because it is not possible for the lexicon to have two separate entries for the same lexical item. By using these single variables, agreement could be preserved: it is possible to specify that "hit" can agree with 1SGPRESENT, 2SGPRESENT, 1PLPRESENT, 2PLPRESENT, 3PLPRESENT, and all of the PAST values (i.e., 1SGPAST, 2SGPAST, etc.). [22] The details of these lexicon entries are explained more thoroughly in Chapter 4. Implementing the INFL in this way means that the Verb phrase only contains a single V which is the main verb in the sentence. This is in contrast to the branching complex VP structure that would be necessary to hold "have" and "be" when they occur. 
[23] Eliminating these branching structures means extra PUSHes have been eliminated and this makes the processing more computationally efficient. In preserving the participial agreement requirements of "have" and "be," and also checking for these requirements in tests on the grammar rules, I have maintained the ability of these verbs to determine structure without any extra computation. The same result is produced; the branching structure is just different from what X' theory predicts. Subsequently I will refer to sentences as CP's headed by COMP and sentences without complementizers as IP's headed by INFL (cf. footnote 15). 3.3.2 Determiner Phrases The noun phrase can be organized around a functional head of the same type as INFL. This was demonstrated by Steven Abney when he argued for a DET head that is lexicalized by the determiner (Abney, 1987). The rationale behind this analysis stems from the fact that the determiner can subcategorize for its complement like lexical categories and therefore must have a more central role in the noun phrase than simply specifier, as shown in (8). For example in English there are some determiners that either can or can not take complements. Consider: (30) That is terrific. [Complement not required.] (31) *The is terrific. [Complement required.] (32) The boy is terrific. [Complement required.] (33) *A boys is terrific. [Particular type of complement required.] Thus, it is evident that the determiner acts just like the verb in shaping its projection. The functionality of DET (i.e., its similarity to INFL) can be seen more clearly where the determiner not only specifies a complement, but also determines its form. For example in English, as was shown in (32-33), a determiner like "a" requires a singular noun to follow it, while a determiner like "the" will allow either a singular or plural noun. This is seen more clearly in languages which mark for agreement between the determiner and the noun. 
For example, in Turkish we see the following data, borrowed from (Abney, 1987, p. 49):
(34) a. el "the/a hand"
b. (sen-in) el-in "you-GEN hand-2s" "your hand"
c. (on-un) el-i "his hand"
In this example it must be noted that Abney considers the personal pronouns as lexicalizations of DET in that they function to determine the form of the noun. [24] In (34a), the generic, uninflected form of the word "hand," "el," is shown to have the indeterminate meaning "a/the hand". Datum (34b) shows that with the second person pronoun "your", the word for "hand" takes on a 2nd person inflection to agree with the pronoun: hence the form "el-in". A similar agreement is shown in (34c), where the third person pronoun-determiner requires a third person inflection on the head noun "hand". Thus, there is a relationship similar to that described by the INFL functional head between what serves as the determiner in DET and the noun. This example clearly shows that the noun is declined according to the specification of the determiner, just as INFL determines the inflection on the main verb at sentence level. [25] An analysis of DET as the functional head of the noun phrase serves to unify the X' theory analysis of the noun phrase with that of the sentence. The structure of the DP, reminiscent of IP, is shown below. Note that a relative clause can serve as either a complement or an adjunct. This is discussed in more detail in section 3.3.5.2 of this chapter.
(35) [Parse tree here]
The most useful effect of this structure for the purposes of this project is that with the specifier positions of DP and NP, there is extra structure to account for the various types of words that can occur in these positions. Without this structure they would have to be considered adjuncts before the determiner or between the determiner and noun; however, there would be no significant ordering among these words. Consider the following noun phrases:
(36) a. [DET a] [NP-SPEC dozen] roses
b. [DP-SPEC all] [DET 0] [NP-SPEC six] men
c. *[DP-SPEC six] [DET 0] [NP-SPEC many] men
d. [DP-SPEC all] [DET the] [NP-SPEC six thousand] men
e. [DP-SPEC all] [DET the] [NP-SPEC many] men
f. [DP-SPEC many] [DET the] [NP-SPEC 0] men
Furthermore, if adjectives can come between determiners and nouns in relatively free order, why can't other words such as the quantifier "many"? Recall the example given in (17-18), reproduced here:
(17) a. [DET A] [ADJS fancy new] car
b. [DET A] [ADJS new fancy] car
c. [DET The] [Q many] [ADJ honest] men
d. *[DET The] [ADJ honest] [Q many] men
It seems, therefore, that there are certain types of words that must appear in particular positions within the noun phrase, and the extra specifier positions provided by (35) provide for these words. Following this analysis, I have implemented a structure wherein the Determiner is the head of the noun phrase. This means the noun phrase is a projection of Det and hence called a DP (i.e., determiner phrase). I will refer to what have traditionally been called noun phrases as Determiner Phrases throughout the rest of this work. In addition to arguing that Det is the head of the noun phrase, as I mentioned previously, Abney argues for generating pronouns in the Determiner position. While this is not to say that pronouns are determiners, it is a way of accounting for the fact that, like determiners, pronouns have a primarily functional status. They provide agreement features like number, person, and gender and therefore influence the form of the verb that follows them. As we have seen in Turkish, they can also influence the form of the noun that follows them when the language is a highly inflected one. Pronouns are implemented in this grammar in the same position as determiners. This has the nice effect of allowing pronouns to be used with non-empty noun heads, as in:
(37) We students are tired.
Here, just as in Turkish, the number of the noun must agree with that of the pronoun (cf.
"*We student are tired." and "*We student is tired."). This implementation also allows the possibility for determiners that do not require NP complements, (i.e., "that") to stand alone in DP's: (38) That is ridiculous. Note that the specifier position of DP's contains only certain kinds of determiners like "all" which can precede the articles. The other positions in the X' theory template for DP's are filled as follows: articles and other pronouns are the only elements in the head of DP position. There is only one kind of complement for DP's: a NP. It is possible to have adjective phrases occurring before the NP (cf. next section) but these occur in adjunct position rather than complement position. [26] 3.3.3 Degree Phrases To accompany his interpretation of the DP as a projection of a functional head, Abney posits an abstract head to describe adjective phrases and adverb phrases. Abney's explanation of this head, which he calls DEG, for "degree", makes it a head of the kind that Rothstein calls functional. Rothstein herself further refined the analysis of the DEG head by calling it a "minor" functional head. While empty with simple adjectives and adverbs (i.e., phrases such as "white hair" or "run quickly"), it is lexicalized by words like "how", "this", "that", "so", "too", "as", "more", "less", "all", "most", and "least". [27] Like INFL and DET, DEG can also give inflection to its head because it is also the place where comparative -er and superlative -est are specified. With his functional head, Abney tries to capture the generalities between adjective and adverb phrases, claiming that they are the projections of the same node. The structure he posits, has an adjective phrase being the only kind of complement and the adverb phrase as either a subcategory of adjective phrases or in the specifier position of an otherwise empty structure. Rothstein argues against this analysis in her characterization of the degree phrase as a "minor" functional category. 
As a minor category, the DEG head does not determine the nature of the phrase (i.e., whether it is an adjective or adverb phrase). Instead, it subcategorizes for a particular kind of complement based on its position in the sentence, and this specifies the nature of the overall maximal projection. Rothstein accepts two kinds of complements that a DEG could call for: adjective and adverb. Examples where the same degree word can occur with different complements according to their sentence position are given below:

(39) Adjective phrases as part of a noun phrase:
     a. too red shirt
     b. more rich man
     c. so sleepy kitten

(40) Adverb phrases as part of a verb phrase:
     a. ran too quickly
     b. more frequently seen
     c. so completely exhausted

In addition to the adjectives and adverbs that occur with degree words, there is a class of words, including "many", "much", "few", "several", and "little", which Abney and Jackendoff before him call quantifiers (Jackendoff, 1977). On the basis of this distinction Abney posits a third complement that the DEG could subcategorize for, making it a quantifier phrase (QP), such as:

(41) a. too many
     b. as little
     c. +er few --> fewer

The quantifier can occur in the complement position of the degree phrase in the same way as adjectives and adverbs do. Rothstein argues against this analysis by rejecting the claim that the quantifier is a special kind of adjective. Consider the sentence below:

(42) Tom bought [too many books to carry them all home].

Rothstein holds that the "many" heads a QP which is in all other respects a noun phrase. In doing this she is envisioning a structure like the following, which is taken from (Rothstein, 1991):

(43) [Parse tree here]

By collapsing the distinction between quantifiers and adjectives, Rothstein is able to claim that the structure in (43) suggests that sentences like (44) below should be grammatical.

(44) *I met too stubborn children to help them.
She claims that because (44) is ungrammatical, "many" in (43) can not be analyzed as an adjective. In addition, she believes the data in (45) shows that "many" and "one" occur in the same structural position.

(45) a. many books
     b. two books
     c. one book
     d. a book
     e. the book
     f. *book
     g. *green book

She says that since "one" satisfies the requirement that singular nouns have a determiner, and therefore must be a determiner, that "many" must also be a determiner. I argue that Rothstein is correct in her analysis that words like "many" are not like adjectives, but I am unconvinced that they are determiners and especially that it is unreasonable to consider them "special" adjectives. If they were "special" it is reasonable to predict that (42) is grammatical while (44) is not. Her argument based on data in (45) is particularly unconvincing because "many" requires a plural noun phrase and plural noun phrases do not need determiners. Therefore it is possible for the data in (46a-c) to be grammatical and for the words "many", "few", "several", etc., to be something other than determiners:

(46) a. many books
     b. few books
     c. several books
     d. *many book
     e. *few book
     f. *several book

Thus, I am accepting Abney's analysis of these kinds of words as quantifiers and implementing them as a special kind of adjective that can be a complement to the DEG phrase. Consequently, my DEG phrase has three possible complements: adjective phrases, adverb phrases, and quantifier phrases. Each of these phrases is a full maximal projection which can have prepositional phrases as complements. [28]

In allowing the DEG word to subcategorize for a particular complement, I am accepting Rothstein's analysis for my implementation of degree phrases. A major deviation of my implementation from her work comes from the difficulty in representing a structure whose character is not determined until the complement is parsed.
Part of this problem stems from the fact that in suggesting this behavior for her minor heads, Rothstein must violate one of the most basic tenets of X' Theory: that a head always determines the category of its projection. She does this to maintain "forward" subcategorization between the head and the complement, rather than allow "backwards" subcategorization (cf. footnote 12). For reasons discussed previously relating to the requirements of prediction, I cannot allow backwards subcategorization and therefore must agree with Rothstein's analysis that a degree head chooses its complement which in turn determines the character of the phrase. This does not necessitate a completely "forward" subcategorization because the complement that is chosen is often dependent on the structural position of the degree phrase. Since I know what that position is, it is possible to only allow particular complements to occur in particular places (e.g., ADVP's do not occur as noun modifiers). In this case, subcategorization is not completely dependent on the head of the DEG phrase and so it does not have the same functional importance as the DET and INFL functional heads. The appeal of the DEGP analysis is that it provides a nice way to capture the similarities between the adjective, adverb, and quantifier phrases. Rothstein suggests that depending on the complement of the DEG phrase, it is called either an ADVP, ADJP or QP. This is a problem for the implementation because the complement is analyzed after PUSHing for a particular maximal projection, and therefore it is not possible to allow the character of the maximal projection to be determined after analyzing the complement. This means the structure of my degree phrase appears more like that of Abney's, in that it is always labeled "DEGP," rather than AP or ADVP as Rothstein would prefer. The character of the complement is explicit in the structure produced so that no explanatory power is lost. 
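
The two design decisions just described (restricting the complement by structural position, and always labeling the phrase "DEGP" while keeping the complement category explicit) can be illustrated with a minimal Python sketch. It is not the ATN implementation used in this work; the position names, the `ALLOWED_COMPLEMENTS` table, and the `DegP` and `build_degp` names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative table (an assumption, not the thesis's actual data): which
# complement categories a degree phrase may take in each structural position.
ALLOWED_COMPLEMENTS = {
    "noun_modifier": {"ADJP", "QP"},   # e.g. "too red", "too many"
    "verb_modifier": {"ADVP"},         # e.g. "too quickly"
}


@dataclass
class DegP:
    deg: Optional[str]        # lexical degree word, or None for an empty DEG head
    comp_category: str        # category of the complement, kept explicit
    comp_word: str
    label: str = "DEGP"       # the phrase label never changes with the complement


def build_degp(position: str, deg: Optional[str],
               comp_category: str, comp_word: str) -> Optional[DegP]:
    """Return a DEGP if the complement is allowed in this position, else None."""
    if comp_category not in ALLOWED_COMPLEMENTS[position]:
        return None
    return DegP(deg, comp_category, comp_word)
```

Under these assumptions, `build_degp("noun_modifier", "too", "ADVP", "quickly")` is rejected because the position is known before the complement is read, while the phrase built for "too quickly" inside a verb phrase is still labeled "DEGP" with its ADVP complement recorded.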
As part of my implementation of DEGP's and my decision to categorize quantifiers as special kinds of adjectives, I allow them to occur in the specifier position of the DEGP. This accounts for data like the following:

(47) a. I have [SPEC-DEGP much] [DEG too] [COMP-DEGP much] work to do.
     b. A [SPEC-DEGP few] [DEG too] [COMP-DEGP many] men attended the dance.
     c. A [SPEC-DEGP few] [DEG 0] [COMP-DEGP 0] men attended the dance.
     d. [SPEC-DEGP Several] [DEG 0] [COMP-DEGP 0] men attended the dance.

These data could also be accounted for if the quantifier were in the specifier of NP position. This would also allow a simplified structure for (47c-d) because there would not be an empty-headed degree phrase:

(48) a. A [SPEC-NP few] men attended the dance.
     b. [SPEC-NP Several] men attended the dance.

Because of this, I have also allowed quantifiers to occur in the specifier of NP position. This eliminates unnecessary computation and solves Abney's problem of having an empty NP specifier (cf. Abney, 1987, p. 341). I will discuss NP's in more detail in the following section.

One final type of phrase that Abney singles out is the "mensural phrase". These phrases have a cardinal or ordinal determiner and a mensural noun [29] as their head. Examples are:

(49) a. six weeks
     b. ten times
     c. a dozen

These phrases are closely related to the head of the DEG phrase when it is lexicalized, as in

(50) a. ten times as quickly
     b. six inches too long
     c. a dozen fewer books

Because these DP's have such a specific structure and because they closely modify the degree word, I have implemented mensural phrases as "MP's" in a separate network. They are allowed to occur in the specifier position of the DEG phrase, meaning that if the DEG is not lexicalized, the DEGP has an empty head. As was observed previously, this is not particularly difficult because the DEG head is often empty in the case of simple adjectives and adverbs.
Thus, the overall structure that I have implemented as a DEG phrase (DEGP) is as follows:

(51) [Parse tree here]

This single structure accounts for adjective, adverb, and quantifier phrases including all of their degree modification. This implementation facilitates modifying the grammar because the kinds of phrases that specify quality, quantity, and description are unified into one structure. Degree Phrases can occur as adjuncts to N' in DP's, as the specifier of PP's, and as adjuncts in VP's.

3.3.4 Adjective Phrases

In addition to this structural account of adjective phrases as complements of degree phrases, I have implemented a preference for adjective ordering. In this way I attempt to describe the scope particular adjectives have over others and explain why in the data below (52a) seems "better formed" than any of (52b-f).

(52) a. rich white American man
     b. ??white rich American man
     c. ??American rich white man
     d. ??rich American white man
     e. ??white American rich man
     f. ??American white rich man

Based on the work of Quirk and Bache, I have distinguished three types of adjectives (Quirk, 1985), (Bache, 1978) which occur in a particular order. I will discuss the specifics of this implementation in Chapter 4, as the distinctions are encoded as part of the lexicon entry of the adjective. With regard to implementation as Degree Phrases, each adjective is part of its own degree phrase so that the possibility of data like the following can be accounted for:

(53) The [DEGP six feet too [ADJ long]], [DEGP five feet too [ADJ wide]] table. [30]

I have implemented this ordering by assigning a number to each type of adjective. When an adjective degree phrase is encountered, its complement adjective must have a number equal to or larger than that of every previous adjective degree phrase; otherwise the adjective sequence is considered ungrammatical and the sentence will not parse.
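
The ordering check described above amounts to requiring that the type numbers of successive adjectives never decrease. The following Python sketch illustrates the idea only; the `ADJ_TYPE` table and the function name `adjective_order_ok` are hypothetical, and the particular numbers assigned to "rich", "white", and "American" are assumptions chosen to match the judgments in (52), not the actual lexicon entries described in Chapter 4.

```python
from typing import Dict, Sequence

# Hypothetical lexicon fragment: each adjective carries the number of its type.
# The specific values are assumptions consistent with (52a) being preferred.
ADJ_TYPE: Dict[str, int] = {
    "rich": 1,
    "white": 2,
    "American": 3,
}


def adjective_order_ok(adjectives: Sequence[str]) -> bool:
    """Accept a sequence only if each adjective's type number is greater than
    or equal to the number of every adjective that came before it."""
    highest_so_far = 0
    for adjective in adjectives:
        number = ADJ_TYPE[adjective]
        if number < highest_so_far:
            return False          # out of order: the parse would be blocked here
        highest_so_far = number
    return True
```

With this table, the sequence in (52a) is accepted and each of (52b-f) is rejected; two adjectives of the same type may follow one another because equal numbers are allowed.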
3.3.5 Noun Phrases

The implementation of the Determiner Phrase and Degree Phrase I have described above accounts for many structures normally thought of as part of noun phrases. This is a side effect of this version of X' theory, which considers the noun phrase to be a complement of the determiner phrase (cf. structure (35)). Nevertheless, the noun phrase is still a full maximal projection which has a specifier and complement of its own. As expected, the head of the noun phrase is a noun, and this head can have either prepositional phrases or relative clauses as complements. Restrictive relative clauses can also serve as adjuncts to N, as will be discussed in the section on relative clauses below (cf. 3.3.5.2). As I mentioned in the discussion of degree phrases, the specifier position of the NP can hold a quantifier, but is more often an empty position. The noun phrase fits into the overall structure of the determiner phrase as in the tree below:

(54) [Parse tree here]

In the following sections I will discuss the parts of this structure in more detail.

3.3.5.1 Prepositional Phrases

Prepositional phrases have a straightforward implementation as shown in the structure below:

(55) [Parse tree here]

The specifier position holds a Quantifier Phrase, which, as discussed previously, is implemented as a Degree Phrase with a quantifier complement. The head is a preposition and the complement a Determiner Phrase, which itself could contain prepositional phrases.

3.3.5.2 Relative Clauses

Relative clause implementation is somewhat trickier. Relative clauses are CP's (i.e., sentences in the X' notation) that are introduced either by "that" or a wh-pronoun, which is the head of the CP, or by nothing at all, in which case the CP has an empty head. The tricky part is that the CP is missing a determiner phrase, or other phrase (e.g., prepositional phrase), usually either in subject or object position.
This missing phrase is the one containing the noun that the relative clause is modifying. My implementation captures this relationship between the relative clause and the moved phrase by putting a copy of the moved noun head back into its original position in the relative clause. This copied noun serves as a kind of "trace" in the noun's original position and maintains number and reference in the relative clause. [31] Agreement occurs with this trace noun just as it would with a normal noun.

The schematic tree structure in (54) shows that, following the analysis given in (Radford, 1988), relative clauses can appear in two places in the noun phrase. This is to account for the differences seen in the following noun phrases, which are taken from (Radford, 1988, p. 218):

(56) a. the claim [CP [COMP that] you made a mistake]
     b. *the claim [CP [COMP which] you made a mistake]
     c. *the claim [CP [COMP 0] you made a mistake]
     d. the claim [CP [COMP that] you made]
     e. the claim [CP [COMP which] you made]
     f. the claim [CP [COMP 0] you made]

In this example, the clauses in (56a-c) are "Noun Complement Clauses" which occur as complements to the noun head. They require the complementizer "that" to introduce them and can be introduced by no other relative pronoun. Conversely, it is evident in (56d-f) that these noun phrases are grammatical regardless of which relative pronoun, if any, introduces them. These relative clauses are called "Restrictive Relative Clauses" and serve only to give extra information about the noun. They are therefore in adjunct position in the overall noun phrase structure. Thus, a noun phrase with a noun complement relative clause like that in (56a) will have the structure in (57a) in my implementation:

(56a) The claim that you made a mistake.
(57a) [Parse tree here]

For a noun phrase with a restrictive relative clause like that in (56d), my grammar will produce the structure in (57b):

(57b) [Parse tree here]

In practice, the grammar will produce both structures for all relative clauses having the "that" complementizer, and other factors like semantics must be applied to choose the correct interpretation. For structures with relative pronouns in them, the grammar will only produce structures like that in (57b).

The most important thing to note is that to analyze the relative clauses, this grammar uses the same structure and implementation of a main clause CP that I discussed in Section 3.3.1. Therefore anything that occurs at the level of the main clause, such as Degree Phrases, can also be accounted for in relative clauses. The only difference is that when the parsing reaches the place in the sentence where the noun is missing, it puts in the noun copy and continues through the parse. This allows for a great economy of structure and accounts for all possible variation within the constituents of relative clauses.

3.3.6 Verb Phrases and Complementation

Under the current X' theory interpretation of sentence structure, VP is the complement of the INFL head. INFL dictates the main verb's person, number, and tense but the main verb is still the head of its own maximal projection. The relationship between INFL and the verb phrase is shown below:

(58) [Parse tree here]

While syntacticians are not convinced about what structure actually occurs in the specifier position of the VP, based on the data given below, I have implemented an optional ADVP (i.e., DEGP with ADVP complement) in this position.

(59) a. John [DEGP [DEG 0] [ADVP quickly]] ran down the street.
     b. Jane was [DEGP [DEG so] [ADVP completely]] exhausted that she could barely walk.

Adverb degree phrases have also been implemented as adjuncts on V', as have prepositional phrases, finite clauses, and particle words such as "up" or "away".
[32] These are licensed to occur with particular kinds of intransitive verbs; because they are adjuncts, the verb does not subcategorize for them as arguments.

Recall that the head of the determiner phrase selects for an NP complement and that the head of the degree phrase selects for an adjective phrase, adverb phrase, or quantifier phrase complement. In the same way, the head of the verb phrase subcategorizes for its complement. Here, the range of possible complements is much greater, and when a particular kind of complement can occur depends on the verb itself. For example, the verb "believe" can be followed by a full sentence as in

(60) John believes Mary is sleeping.

but with the verb "take", this structure is ungrammatical:

(61) *John takes Mary is sleeping.

Instead, "take" needs a single noun phrase object and perhaps a prepositional phrase following it, such as:

(62) John takes Mary to the store.

Conversely, the verb "believe" cannot have this structure, but it can take a single noun phrase object such as:

(63) John believes Mary.

My implementation accounts for the different kinds of verbs and the different complements they can take with codes in the dictionary entry for each verb. The codes are based on the verb pattern codes in the Oxford Advanced Learner's Dictionary (OALD) (Cowie, 1989). The details of these codes are explained in Chapter 4, but here I will discuss the types of complements they can specify.

There are six types of verb complements which occur in various combinations according to the number of arguments a verb subcategorizes for. These are the adjective phrase (DEGP with AP complement), determiner phrase (DP), prepositional phrase (PP), sentential phrase (CP), small clause (SC), and exceptional clause (EC). Adjective phrases as complements occur primarily with linking, or copular, verbs such as

(64) a. John is intelligent.
     b. The sky became dark.
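
The way dictionary codes constrain complements for "believe", "take", and copular verbs in (60)-(64) can be illustrated with a small Python sketch. It does not reproduce the OALD verb pattern codes or the ATN grammar; the `VERB_FRAMES` table, its tuple encoding of complement sequences, and the function names are hypothetical and stand in for the coded dictionary entries.

```python
from typing import Dict, Sequence, Set, Tuple

Frame = Tuple[str, ...]

# Hypothetical subcategorization frames (an assumption for illustration):
# each verb lists the complement sequences it allows.
VERB_FRAMES: Dict[str, Set[Frame]] = {
    "believe": {("CP",), ("DP",)},          # (60), (63)
    "take": {("DP",), ("DP", "PP")},        # (62); a CP complement is excluded, cf. (61)
    "become": {("DEGP",)},                  # (64b): adjective phrase complement
}


def licenses(verb: str, complements: Sequence[str]) -> bool:
    """True if the verb's entry allows exactly this sequence of complements."""
    return tuple(complements) in VERB_FRAMES.get(verb, set())


def predicted_first_categories(verb: str) -> Set[str]:
    """Phrase categories that may begin the complement of this verb."""
    return {frame[0] for frame in VERB_FRAMES.get(verb, set())}
```

For a predictor, the second function is the more useful one: once the verb has been read, only the categories it returns need to be considered when proposing what the user may type next.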
Determiner phrases may also occur with copular verbs, as in (65a), but are most common as direct objects, as in (65b-c):

(65) a. John is a farmer.
     b. The dog eats his food.
     c. The man hit the ball.

Prepositional phrases are usually adjuncts to the verb phrase as in (66a), but they can also occur as objects, as in (66b):

(66) a. The man was crying in the living room.
     b. The meeting lasted for two hours.

In (66b) the prepositional phrase is a complement rather than an adjunct because the sentence "The meeting lasted" is ungrammatical without it (cf. "The man was crying"). It is evident that the verb "lasted" requires a complement because of sentences like "The meeting lasted a week."

Sentential complements such as that in (60) consist of an entire CP, even though there is no complementizer introducing the embedded clause "Mary is sleeping" in (60). Other examples of this complement are sentences like

(67) a. Jane thought [that Mary would take care of her]. [33]
     b. The man hoped the train would come on schedule.
     c. The man hoped that the train would come on schedule.

The implementation of this is simply to allow a CP to be PUSHed for in the complement position. Because it is a full CP, all of the structures possible in the main clause (i.e., degree phrases, relative clauses) are also possible in the complement clause.

Small and Exceptional Clauses appear only in complement positions and have therefore not been mentioned previously. They lack elements that are part of ordinary CP's: for example, a small clause does not have tense because it does not have an INFL node and therefore can not independently constitute a sentence. Small Clauses (SC) also do not have a Complementizer node, meaning that they can not be introduced with words like "that" and can not serve as relative clauses because there is no structural position for the relative pronoun.
Instead they are of the form [DP XP] where XP is any of the other phrasal possibilities (i.e., DP, DEGP, VP, and PP). Examples of Small Clause complements, taken from (Radford, 1988, p. 324), are given below:

(68) a. I believe [the President incapable of deception.] (DP DEGP)
     b. I consider [John extremely intelligent.] (DP DEGP)
     c. They want [Zola off the team.] (DP PP)
     d. Could you let [the cat into the house.] (DP PP)
     e. Most people find [Syntax a real drag.] (DP DP)
     f. Why not let [everyone go home.] (DP VP)

There is sufficient evidence in the syntax literature showing that these structures are in fact clauses rather than a sequence of different complements (cf. Radford, 1988, pp. 324-331, and references there). I will not go into this here except to stress that there is a difference between Small Clause structures and structures with multiple objects. This becomes clear with verbs that allow single complements versus those that allow more than one. The Small Clause is a single constituent and therefore can account for one role in the sentence (i.e., object, direct object, location, etc.). If a verb allows more than one complement to account for different roles, as in a verb that takes both a direct and an indirect object, the Small Clause could only fill one of these roles.

I have implemented Small Clauses in a separate network of the form [DP XP] where the XP can be a DEGP, a DP, a PP, or one of three kinds of VP's: gerundive V-ing forms, participial V-en forms, or infinitival V-0 forms. The network has the structure:

(69) [ATN diagram here]

The subject DP can be either overt or covert; in the latter case I fill this position with a "TRACE" marker, indicating that it is lexicalized elsewhere in the sentence. [34]

Exceptional Clauses also differ from ordinary CP's, as they lack the Complementizer position. They do have an INFL node, but it must always contain "to" and therefore requires the verb to have an infinitival head.
Consequently, their basic structure is of the form: (70) [Parse tree here] The verbs which normally take EC's as complements are usually "cognitive" verbs, such as those shown in (Radford, 1988, p. 317): (71) a. I believe [the President to be right.] b. I've never known [the Prime Minister to lie.] c. They reported [the patient to be in great pain]. d. I consider [my students to be conscientious.] Exceptional Clauses have been implemented as a separate network of the form [DP to VP] where the VP is infinitival and the DP can either be overt or the covert DP called "PRO". [35] The network has the form: (72) [ATN diagram here] As I mentioned previously, wh