A SYNTACTIC PREDICTOR TO ENHANCE COMMUNICATION FOR DISABLED USERS

Julie A. Van Dyke
Department of Computer and Information Sciences
University of Delaware
Newark, Delaware 19716

Technical Report 92-03
August 1991

(c) Julie A. Van Dyke
All Rights Reserved

ABSTRACT

Disorders such as Cerebral Palsy and Lou Gehrig's disease produce severe physical disabilities that leave their victims unable to communicate in typical ways. In order to overcome this barrier, rehabilitation engineers have developed communication aids which make use of electronic technology to shift the burden of communication away from the user. Some strategies that have been employed include prediction techniques that use statistics to predict the user's next keystrokes. Unfortunately, the statistical data used in these systems is often biased or incomplete and consequently, these systems have had only limited success. This document describes a solution which combines natural language processing techniques and linguistic theory to produce a prediction system that, unlike previous systems, models our natural rules of syntax. This allows the syntactic predictor to make rule-based, linguistic determinations about what words can follow those already processed. It can be used with other devices to reduce the effort required of the user by predicting what word forms he or she is likely to type next. Because this system models human linguistic knowledge, it provides a more natural solution to the communication problem than do many other systems currently available to disabled users.

CONTENTS

1 INTRODUCTION: Augmentative Communication
  1.1 The User
  1.2 Presently Available Devices
    1.2.1 Statistic-based
    1.2.2 Non-stochastic
  1.3 A Linguistic-based Solution
2 THE PREDICTOR
  2.1 Natural Language Processing Strategies
  2.2 Syntax Rules and the ATN Formalism
  2.3 The Prediction Problem
  2.4 Solving the Problem
  2.5 Implementation Details
3 THE GRAMMAR
  3.1 X' syntax
  3.2 Implementing X' Theory
    3.2.1 Sentence level
    3.2.2 Determiner Phrases
      3.2.2.1 Pronouns
    3.2.3 Degree Phrases
    3.2.4 Adjective Phrases
    3.2.5 Noun Phrases
      3.2.5.1 Prepositional Phrases
      3.2.5.2 Relative Clauses
    3.2.6 Verb Phrases and Complementation
  3.3 Example Parse Trees
4 LINGUISTIC IMPLICATIONS
  4.1 Government and Binding Syntax
  4.2 Human Language Parsing
5 OTHER APPLICATIONS
6 FUTURE WORK
7 CONCLUSION
CITED BIBLIOGRAPHY
REFERENCE BIBLIOGRAPHY

FIGURES

Figure 2.1 Search Space for a Context Free Grammar
Figure 3.1 X' Theory grammar
Figure 3.2 LUNAR grammar

1 INTRODUCTION: Augmentative Communication

1.1 The User

The typical user for the system developed here is cognitively intact and therefore has the mental capability and desire to use language the same way a non-disabled individual would. The user's disability affects his or her motor capability and muscular control in a way that produces limited dexterity. These users are typically non-speaking, and have difficulty typing, writing, or controlling a joy-stick to select letters. In the worst case the user is limited to using a single-switch interface which makes communication very slow.
There are two types of disorders that typically produce this condition: developmental, such as Cerebral Palsy, and degenerative, like Lou Gehrig's disease. Some stroke victims could also benefit from this technology; however, often they will have more severe linguistic impairments which make using this system inappropriate. The particular ailment is not important, however, because if the user can be characterized as linguistically, or cognitively, intact but with deficient motor skills, the system I have developed has the potential to facilitate his or her communication.

1.2 Presently Available Devices

The kinds of communication devices that I am describing here are known as Alternative or Augmentative Communication (AAC) devices. Electronic AAC devices try to exploit whatever motor capability the user might have and direct it toward composing messages on a computational machine. Because motor capability is frequently limited, the interfaces of these systems are often single switches. The switch is used to access letters on a one-to-one basis as the user composes a message. The machine is then either programmed or hard-wired with various strategies to help the user as he or she composes. Clearly an important issue in developing these devices is the speed with which the user can compose a message. One of the first single-switch devices to be developed was the Tufts Interactive Communicator (TIC) (Foulds, 1976). The average communication rate measured with this device was around 2-10 words per minute (Foulds, 1980). Compare this to non-disabled typing speeds of 60-70 words per minute or to speaking speeds which are easily twice that, and the extent of the communication deficiency for these users is clear.

1.2.1 Statistic-based

The strategies these AAC machines are equipped with are intended to increase the user's communication rate. For scanning systems like the TIC, this has principally meant devising variations in the order that letter-characters are offered to the user. Knowledge about the frequency of letter usage is used to rearrange the letters on the TIC display so that the more frequent ones are scanned before the less frequent ones. For instance, in a row-column scan, the most frequent letters of the alphabet, such as "S" or "T", could be placed in the upper left-hand corner of the display. This way the scan will cross these letters first and thereby avoid scanning through unlikely choices like "V" or "Q" most of the time. This technique was able to produce a 30% improvement over alphabetically-ordered letter displays (Foulds, 1976). [1]

Another improvement on this scanning technique is a type of letter prediction which uses n-gram statistics that tell which sequences of n letters are likely to occur together, such as "str" or "ing". With this system, if the user has already indicated an "s", the system refers to its statistics to identify the six letters that are most likely to follow the "s", for instance "t", "p", "r", "e", "i", or "h". These are immediately highlighted in sequence so that the user has the opportunity to select them before the normal row-column scanning process is resumed. Anticipating letter selection in this way has improved the communication rate by up to 50% (Foulds, 1976). This scanning technique can also work at the word level, where the ordering improvements are based on what words are the most frequent.
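As an illustration of this kind of letter prediction, the following is a minimal sketch in Common Lisp (the language in which the predictor of Section 2 is implemented). The bigram table and its counts are invented for illustration; a real system would derive them from a large body of text.

    ;;; A hypothetical bigram table: (previous-letter next-letter count).
    (defparameter *bigram-counts*
      '((#\s #\t 95) (#\s #\h 70) (#\s #\e 60) (#\s #\p 40)
        (#\t #\h 120) (#\t #\e 80) (#\t #\o 50)))

    (defun predict-next-letters (previous-letter &optional (n 6))
      "Return up to N letters most likely to follow PREVIOUS-LETTER, ordered by descending bigram count."
      (let ((candidates (remove-if-not
                         (lambda (entry) (char-equal (first entry) previous-letter))
                         *bigram-counts*)))
        (subseq (mapcar #'second (sort (copy-list candidates) #'> :key #'third))
                0 (min n (length candidates)))))

    ;; Example: (predict-next-letters #\s)  =>  (#\t #\h #\e #\p)

Letters returned by such a lookup are the ones a scanning display would highlight first; the same ranking applied to whole words gives the word-level ordering mentioned above.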
In contrast to TIC, which uses hard-wired letter grids, Meta4 is a software-based communication device that uses static word pages containing the most common words (Miller, 1990). Instead of having to spell out each word, the user navigates through the pages using the single switch. The system's first page might contain letter intervals such as "AA-AL" and "AL-AZ", and the scanning passes through these intervals until the user chooses the one containing the word he wants to use. Then the display changes to show a page containing vocabulary words that he or she can choose from using the same scanning technique. The words included on these pages are a vocabulary set, called a "book", that can be tailored for each individual user. It is possible for users to have several books to choose from, and in this way a large amount of vocabulary can be made available without requiring the user to spell out every word letter by letter. Meta4 is a dynamic communication device because the display changes as the user composes his or her message.

Word prediction systems are even more dynamic, as they actually attempt to guess the next word based on what has already been entered. The PAL system developed at the University of Dundee (Swiffin, et al. 1987) is typical of these prediction systems in that it uses frequency statistics to make its predictions. When the user types a letter, the system displays the five most frequent words beginning with that letter in a special scanning window. The user can choose one of these words or type another letter. With each keystroke the frequency statistics are checked and possible completions for the word are offered to the user in the scanning window. In this way the system attempts to predict what the word is before the user has typed it out entirely. The number of keystrokes required of the user is reduced because the system completes the word as soon as the user indicates that the right one has been found. Those who developed PAL claim that they have been able to obtain a 50% reduction in the keystrokes required, based on a dictionary of 1000 words, wherein each word has its own frequency data. It is important to point out, however, that 1000 words is a very small dictionary for a communication system meant to be used for everyday communication. Because of the way PAL uses its statistics, dictionaries of a much larger size are likely to severely degrade the performance: it will take longer to calculate the probability of each word using the word-pair statistics, and consequently it will take longer to determine the five most probable words.

This raises a problem that is common to all these systems I have been discussing: they depend on statistics, rather than the rule-based linguistic information that humans actually use when they communicate. This makes the system only as effective as the statistics are accurate and complete. As with the PAL frequency counts, statistics are typically collected over large texts, often derived from newspapers and published reading materials. This means they are liable to be skewed by the subject matter of the text. For example, the Brown Corpus of American English, which is a text of approximately one million words and one that is often used for deriving statistics for these systems, represents words like "eggs" and "bunny" and "Easter" as being common words in everyday language use. This is a result of the time of year that the corpus was compiled, not of actual facts about English usage.

This problem can be solved to some extent by using statistics derived from the user's own language use; however, the same problem can occur with these texts because a user does not always talk about the same topics, and so the word statistics could change depending on his or her topic of conversation. In school, frequently used words might be "homework" or "teacher", but when a child is playing these will be among the least likely words that he or she will use. A problem with statistically-based systems also arises when novel words are used. The system has no statistics for these words, and so despite the statistical information in his or her AAC device, the user will still have to completely spell out such words.
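Despite these limitations, the statistical approach itself is simple to state. The following minimal sketch shows the frequency-driven completion at the heart of a system like PAL; the tiny dictionary and its counts are invented for illustration.

    ;;; A hypothetical dictionary of (word . frequency) pairs.
    (defparameter *word-frequencies*
      '(("the" . 5000) ("that" . 1200) ("this" . 900) ("they" . 850)
        ("there" . 600) ("think" . 300) ("thanks" . 50)))

    (defun complete-word (prefix &optional (n 5))
      "Return up to N dictionary words beginning with PREFIX, most frequent first."
      (let ((matches (remove-if-not
                      (lambda (pair)
                        (let ((word (car pair)))
                          (and (>= (length word) (length prefix))
                               (string-equal prefix word :end2 (length prefix)))))
                      *word-frequencies*)))
        (subseq (mapcar #'car (sort (copy-list matches) #'> :key #'cdr))
                0 (min n (length matches)))))

    ;; Example: (complete-word "th")  =>  ("the" "that" "this" "they" "there")

Whatever words such a function returns are offered in the scanning window; the syntactic predictor developed in this report replaces this purely frequency-based ranking with a grammatical test on the candidates.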
1.2.2 Non-stochastic

Abbreviation systems endeavor to improve communication rates without statistics by allowing the user to abbreviate his or her words rather than spelling them out completely. The user indicates a few letters with the scanning device and the system assumes responsibility for expanding the abbreviations into full words. A major problem with these systems, however, is that the user is required to memorize specific abbreviations for words. This arises because such a system can only handle a one-to-one correspondence between a word and its abbreviation. Thus, the system may require that the word "work" be abbreviated "wrk" in order to differentiate it from the word "wake", which might be abbreviated "wk". But for the user, "wk" seems like an abbreviation that will work equally well for both words, and so he or she may not think to use "wrk" instead of the more easily constructed "wk". This setup means the user must undergo specialized training to learn the system's abbreviations before he or she can begin communicating. In addition, these abbreviation systems and scanning devices assume that the user knows how to spell the word he or she is trying to use.

A communication device called Minspeak (Baker, 1985) was an attempt to alleviate this problem as well as those associated with memorizing pre-determined abbreviations. This system used a keyboard of multi-meaning icons together with keys for morphological and rudimentary syntactic information to create sentences. For example, the user might use the key sequence [boy-image] + [noun-key] + [smiley face-image] + [verb-key] + [book-image] + [building-image] + [noun-key] + [declarative-sentence key] to compose the sentence "Boy like school." Once the sentence is composed, the user will press a "speak" key and the computer will speak the phrase the user created. This use of images allows the abbreviations to be semantically meaningful to the user and presumably easier for him or her to remember. Minspeak has proven useful for many members of the disabled community, but it has also been problematic for some because it still requires the user to understand and/or memorize the associations between the images and the English words.

A different approach to improving abbreviation systems has been attempted with flexible abbreviation systems such as the "Word Compansion" project described in (Demasco et al., 1989) and in (Stum et al., 1991). This system attempts to automate the methods humans use for creating abbreviations so that the computer can associate more than one abbreviation with a particular word. This means the computer will be able to handle "wk" as an abbreviation for any of "work", "wake", "walk", "wok", etc. The user can be freer with his or her abbreviations, and the system's success does not rely on how well the user remembers the abbreviation the computer knows for the word he or she desires. Thus, word compansion shifts the burden for abbreviation expansion away from the user and makes it a computational problem. In order to expand the abbreviations, the system assumes the letters in the abbreviation are in the same order that they occur in the word. This makes expansion similar to a matching task: it assumes variables between the known letters and tries to match this form to the more than 5000 words in the dictionary the system currently uses. The problem with this is that there may be a large number of matches for any given abbreviation, and so the user could still have to expend a considerable amount of time and effort to find the desired word among many possibilities.
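A minimal sketch of this kind of flexible expansion follows. It keeps the assumption just described, that the abbreviation's letters occur in the word in the same order, and uses a small dictionary invented for illustration; the deliberately long result shows why further pruning of the candidate list is needed.

    (defparameter *dictionary* '("work" "wake" "walk" "wok" "week" "wink"))

    (defun subsequence-p (abbrev word)
      "True if the characters of ABBREV occur in WORD in the same order."
      (let ((pos 0))
        (loop for ch across abbrev
              do (let ((found (position ch word :start pos :test #'char-equal)))
                   (if found
                       (setf pos (1+ found))
                       (return nil)))
              finally (return t))))

    (defun expand-abbreviation (abbrev)
      "Return every dictionary word that ABBREV could abbreviate."
      (remove-if-not (lambda (word) (subsequence-p abbrev word)) *dictionary*))

    ;; Example: (expand-abbreviation "wk")
    ;;   =>  ("work" "wake" "walk" "wok" "week" "wink")

Every word in this small dictionary matches "wk", which is exactly the problem noted above; the syntactic predictor described in Section 1.3 and Section 2 is one way of cutting such a candidate list down.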
1.3 A Linguistic-based Solution

I have discussed some ways communication devices have been improved using statistics and prediction techniques; however, it has been shown that all of these strategies have their limitations. One obvious solution which has not been fully developed in the field of Augmentative Communication is to exploit linguistic knowledge. A priori this seems to be the best solution because it uses exactly the knowledge the user draws on when he or she constructs sentences. I have developed a prediction technique that exploits a grammar of English to make its predictions: it is driven by the generalities of language rather than artifacts of the data a system's statistics were taken from.

A system like the one discussed here could easily be brought to bear on the abbreviation expansion problem: the number of abbreviation expansions can be greatly reduced by considering the syntactic categories of the expansions in relation to the syntactic structure of the words the system has already processed. For example, if the user has entered the partial sentence "The boys" and the next word abbreviation is "ht", then instead of offering the user a long list like "hit, hits, hot, hat, hate, hates, height, hunt, hunts, hurts, hurt, hut," the system will offer only the plural verbs in this list, because it knows that nouns and adjectives are not appropriate once the head noun of a noun phrase has been identified. [2] In this case, the user will only need to choose from the syntactically appropriate words in the list: "hit, hate, hunt, hurt." Notice that this also increases the user's communication rate because he or she has fewer words to scan through before finding the desired word.

In addition to its usefulness with flexible abbreviation expansion, the system I have developed is a prediction system that could be used to improve other communication devices by determining the syntactic form of the word that is likely to follow what the user has already entered. For example, in a dynamic system like Meta4, if the user has already entered a noun and then chooses the interval ST-SZ, the system could go directly to a page containing only verbs that begin with those letters. Thus, by modeling syntactic knowledge in the computer, I can produce a system that can improve existing communication devices. The improvement provided is a more natural one for the user because it comes from the information humans use anyway when they communicate. It is not an ad hoc solution to the communication problems these people face; it is a solution motivated by the nature of the problem: an inability to use language in a "natural", unconstrained way.
If we can make the machine use language the way a human does, then rather than being hindered by the technology the user's disabilities force him or her to use, both machine and human can cooperate to enhance the disabled person's communication.

2 THE PREDICTOR

2.1 Natural Language Processing Strategies

Natural Language Processing (NLP) provides a mechanism for the formal representation of syntax rules. These rules are applied to sentences in a process called "parsing" which breaks the sentence down into its component parts. The result is a "parse tree" that shows the syntactic categories and functional relationships between the constituents in the sentence. For example, applying NLP syntax rules to the sentence

(1) The man walked the dog.

gives the following parse tree, or "parse", shown in computational notation:

(2) (S (NP (DET the) (N man)) (VP (V walked) (NP (DET the) (N dog))))

A noun phrase is labeled "NP", verb phrases are "VP", and each word is given an appropriate category label such as "DET", "N", or "V". This structure represents the more commonly known tree structure below:

(3) [Parse tree here]

To generate this parse, the computer needs to search all the possible combinations of grammar rules. This becomes complicated because grammars normally have different ways of expanding constituents; for example, an NP could be composed of a determiner-adjective-noun sequence or it could simply be a proper noun. Many combinations might be possible, so the computer must try them all until it finds the right one. The final parse ends up being a subset of the overall search space, which can be very large, as in Figure 2.1. Vertical dots are used to indicate where parts of the search space have been left out. The complete search space is infinitely deep because of recursive elements like the NP. Each time an NP occurs it can be broken into three different groups of constituents, here represented by nodes 2, 3, and 4. Since the NP expansion in rule 4 also has an NP as part of its structure, the search tree can never be completely expanded.

In order to tackle a search space like this the computer might use a "top-down, depth-first" method, wherein processing starts at the top S node and goes down the tree as far as it can in a left-to-right direction. [3] When it reaches a primitive or a point where no rules apply to the input, the processing backs up and goes down another branch of the tree. Consider, for example, the processing that produced the parse in (3) from the search space in Figure 2.1. The computer first uses rule 1 to expand S into NP1. Then it tries rule 2 and finds that it needs an N. Since the first word of sentence (1) is "The", this path fails and the processing backs up to NP1. Next it tries rule 3 and finds that it must complete a DET, and this succeeds with the word "the". Because the DET is also connected to an N path, the processor must complete both paths before rule 3 will be successful. It therefore backs up to look for the N in the other part of rule 3. This succeeds with "man", and so rule 3 is completed and the processing returns to NP1. The next branch of the tree is that generated by rule 4. In this case, the input is the word "walked" and the computer will try this rule, fail, and processing will continue to the VP. Here again there are three possible rules for expanding the rest of the sentence. Taking the left-most branch gives a single verb generated by rule 6. This would work with the input "walked" and so it is taken. But now the rest of the sentence is "the dog", and the processing will continue trying rules 7 and 8 to account for that noun phrase. When the parser finds that these rules fail because they include verbs in their structures, it will back up and choose not to take rule 6 (undoing what it has already done). It will take rule 7 instead, and since this is composed of a V and an NP, this rule will succeed. Since there is no more input the processing will stop, the parse in (3) having been found. In this way the computer tries each path in the search space, beginning from the left-most one, until it completes a successful traversal through the search space.
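This top-down, depth-first strategy can be stated compactly in code. The sketch below uses a toy grammar and lexicon invented for illustration (not the grammar of Figure 2.1 itself): a list of goals is satisfied left to right, the alternative expansions of a category are tried in order, and an expansion that leaves input unaccounted for is undone, just as rule 6 is undone in the walkthrough above.

    ;;; Toy grammar: category -> alternative expansions, tried left to right.
    (defparameter *grammar*
      '((S  ((NP VP)))
        (NP ((N) (DET N) (DET ADJ N)))
        (VP ((V) (V NP)))))

    ;;; Toy lexicon: word -> categories.
    (defparameter *lexicon*
      '(("the" DET) ("man" N) ("dog" N) ("walked" V)))

    (defun word-category-p (word category)
      (member category (cdr (assoc word *lexicon* :test #'string-equal))))

    (defun parse-goals (goals words)
      "Satisfy GOALS (a list of categories) against WORDS depth-first; return one parse tree per goal covering all of WORDS, or :FAIL."
      (cond ((and (null goals) (null words)) nil)      ; all goals met, all input used
            ((or (null goals) (null words)) :fail)     ; leftover goals or leftover input
            (t (let* ((category (first goals))
                      (expansions (cadr (assoc category *grammar*))))
                 (if expansions
                     ;; Non-terminal: try each expansion, backing up on failure.
                     (dolist (expansion expansions :fail)
                       (let ((result (parse-goals (append expansion (rest goals)) words)))
                         (unless (eq result :fail)
                           ;; The first subtrees belong to this category; regroup them.
                           (return (cons (cons category
                                               (subseq result 0 (length expansion)))
                                         (nthcdr (length expansion) result))))))
                     ;; Terminal: the next input word must have this category.
                     (if (word-category-p (first words) category)
                         (let ((result (parse-goals (rest goals) (rest words))))
                           (if (eq result :fail)
                               :fail
                               (cons (list category (first words)) result)))
                         :fail))))))

    ;; Example: (parse-goals '(S) '("the" "man" "walked" "the" "dog")) returns a
    ;; one-element list containing
    ;;   (S (NP (DET "the") (N "man")) (VP (V "walked") (NP (DET "the") (N "dog"))))
    ;; after first trying, and abandoning, the VP expansion that is just a V.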
2.2 Syntax Rules and the ATN Formalism

The syntactic predictor runs on a grammar constructed as an Augmented Transition Network (ATN) (Woods, 1969), (Bates, 1978). This is a parsing formalism which represents syntax rules in the form of networks showing a transition from one state to the next. This transition is analogous to each step towards completing the rule; a phrase structure rule like "NP --> DET N" has a transition between NP and DET and one between DET and N. The transitions are depicted as arcs in a network as follows:

(4) [ATN network here]

The double-circle around the NP node identifies it as the start state of the network. The labels of the intermediate states show what constituents of the rule have been completed (i.e., NP/DET means a determiner has been processed already in the NP network). The final state is the one having the arc labeled "POP", which is an indication that the rule is complete.

The formalism provides different kinds of transitions between parts of a phrase structure syntax rule. The most useful is the CAT arc, which checks to see if the category specified by the phrase structure rule matches that of the input. The CAT arc might have the following form, given in LISP notation:

(5) (CAT DET t (setr DET *) (to NP/DET))

In this arc, "CAT" is a label telling the parser what sort of processing is necessary, in this case to check the category of the input word. The symbol "DET", for determiner, specifies the category that the phrase structure rule is looking for. The "t" is in the position where a test on the input might go. Such tests might include a check on noun-verb agreement, the presence of a particular feature in the word's lexical entry, or any other checks that might inform the parser of an ungrammatical sentence before it has gone too deep in the search space. In this case, the act of checking the category will tell whether or not the transition can be made, so no test is necessary and a dummy test (i.e., one that is always true) in this position allows processing to continue. The "(setr DET *)" is the action that stores the word of input, represented by *, under the name of its syntactic category. The "(to NP/DET)" tells the parser where to go next, in this case to the state after the DET transition has been made. Other transitions are programmed in the same way with appropriate tests and actions. The main difference is in the first label signifying what kind of processing the parser needs to do in order for the transition to be completed.

One of the most important kinds of transitions, or arcs, is the "PUSH" arc. This accounts for the recurrence of constituents like the NP in many rules. It signals the parser that it needs to temporarily leave the present rule and process the rules for expanding the NP.
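Concretely, the NP network in (4), and a simple sentence network that PUSHes for it, might be written out as lists of arcs in the style of (5). The state names, the getr register accessor, and the form of the POP actions are illustrative assumptions made for this sketch; the actual networks of the grammar appear in Figure 3.1.

    ;;; The NP network of (4), written as arcs in the notation of (5).
    (defparameter *np-network*
      '((NP
         (CAT DET t (setr DET *) (to NP/DET)))        ; e.g. "the"
        (NP/DET
         (CAT N t (setr N *) (to NP/N)))              ; e.g. "man"
        (NP/N
         (POP (list 'NP (getr DET) (getr N)) t))))    ; rule complete: build the NP

    ;;; A sentence network that PUSHes for the NP and VP sub-networks.
    (defparameter *s-network*
      '((S
         (PUSH NP t (setr SUBJ *) (to S/NP)))         ; leave S, run the NP network
        (S/NP
         (PUSH VP t (setr PRED *) (to S/VP)))
        (S/VP
         (POP (list 'S (getr SUBJ) (getr PRED)) t))))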
Such PUSHed-for constituents are represented by separate networks, and because they can be used over and over again, the size of the grammar is small in relation to the size of the sentence structure it can account for. When the NP is completed, the transition is complete and the parser returns to the original network to continue working on a particular phrase structure rule. Other kinds of arcs include WRD arcs, which allow a phrase structure rule to specify that a particular word be in the sentence; JUMP arcs, which allow for processing to proceed to a different state without any actions or checking being done; MEM arcs, which require the word of input to be one of a particular set of words; and POP arcs, which signal that a network is complete and provide for building larger structures out of the constituents most recently processed.

A special kind of arc called the VIR arc helps to account for movement in English. There are certain English sentences, such as wh-questions, in which a constituent moves from its original position in the sentence into a new position at surface structure. The object of the sentence might be moved out of object position and replaced with a wh-word, as in the sentence

(6) What did John eat?

The underlying structure of sentence (6) is

(7) John did eat what.

The ATN processes (6) by using a "hold-list" and VIR arcs to return the moved constituent to its original position. When the computer encounters the wh-word "what", it is processed as an NP and put on the hold-list. A VIR arc occurs in the grammar at the place where the constituent has moved from (i.e., in object position of sentence (7)). When a VIR arc is encountered in the grammar, instead of looking for a constituent in the string of input, the NP is taken from the hold-list to satisfy the phrase structure rules. With this mechanism, the ATN can undo transformations that have occurred to derive the surface structure it is processing. The VIR arc is used to signify the positions from which a constituent could have originated, and the "hold-list" allows the parser to wait before assigning a constituent its position in the final sentence structure. This process is used whenever sentences are left with "holes" after movement has occurred, as is the case with relative clauses as well as the wh-movement explained here.

2.3 The Prediction Problem

The ATN has proven very useful for problems in natural language processing. A simple parser that works as I described in the previous section is not useful for prediction, however, because it follows one parse at a time and backtracks if it reaches a dead-end. To do prediction, the system must take a partial sentence and return the features and category of the next input word. This cannot be achieved by following a single parse at a time because often there are category or attachment ambiguities in sentences that can only be resolved when the entire sentence is known. For example, consider the simplified grammar network below:

(8) [ATN network here]

If the system only has the partial sentence "the" and the word "gold" is entered, the parser does not know whether "gold" is an adjective or a noun. The ATN parser as I have described it would choose one path (e.g., the top one) and follow it down the network as far as it can go. Consequently it may not adequately predict the category of the word that follows "gold": with network (8) it will predict a noun to be next, as if the sentence were "the gold ring is beautiful."
However, it is just as likely that a verb could be next, as if the sentence were "the gold is in the bank." As a result of this "one-at-a-time" method of parsing, the ATN would be forced into continual back-tracking each time a word is entered. With each path change, possible predictions would be unaccounted for because the computer would only be following one path at a time. If the computer took "gold" to be an adjective, at that point in the processing it could not predict that the next word could be a verb as well as a noun. Given the wide variety of structures available in English, this means that the prediction would be incomplete for a significant number of cases. In addition, this would make the processing much slower, and therefore it would be difficult to use this system for the kind of spontaneous communication that AAC devices strive to offer.

2.4 Solving the Problem

The predictor I have built solves the prediction problem by traversing the search space in Figure 2.1 in a breadth-first, rather than depth-first, manner. This means that it completes the first transition in each phrase structure rule before going deeper in the tree. Using the previous example, the predictor will analyze "gold" as a noun in one parse and as an adjective in another. When the next word is entered, it may eliminate one of these interpretations, or else continue both parses until the entire sentence has been entered. Either way, the parser is able to know at any point in the sentence what type of word could be next, because it is holding all possible structures for the words entered thus far. This means the processing is done in a non-deterministic fashion, and therefore complete predictions can be made because the computer has not committed itself to a particular parse that may turn out to be different from what the user intended.

This also means that when the entire sentence has been entered, the parser may have built more than one structure for a particular sequence of words. For example, consider the sentence:

(9) The man told the woman that he loved the story.

The user could have meant either that the indirect object is "the woman that he loved" and the object is "the story", or that the indirect object is "the woman" and the object is "that he loved the story." The predictor will output both these structures so that they can easily be analyzed further by a semantic or pragmatic processor that can choose the correct interpretation based on the context the user has built.

2.5 Implementation Details

This predictor has been implemented in SUN Common LISP. There is also an early implementation in Franz Lisp. It is intended as a component in a more complex communication system and as such, there has been little attention paid to the user interface. Presently the system is activated with the command "predict", with a partial sentence given as its argument. The system goes as far as it can with that partial sentence and then goes into a "break package" where the user can decide between two methods of proceeding. The first method allows the next word in the sentence to be entered. It incorporates that word into the partial parses already created by the system and then reenters the break package. At each point when a parse is completed, the system prints out that parse tree. These parses are not final analyses, as they can still be given additional words that will be incorporated into them. The system halts only when there is no possible way of continuing the parse given the input it already has; in this case the predictor returns "nil."
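To make the breadth-first strategy of Section 2.4 concrete, the following sketch reduces each live analysis to a list of grammar goals still to be satisfied; a prediction is then simply the set of terminal categories that any surviving analysis could accept next. The toy grammar and lexicon are invented for illustration, and the real predictor of course builds full structures rather than just tracking goals.

    (defparameter *toy-grammar*
      '((S  ((NP VP)))
        (NP ((N) (DET N) (DET ADJ N)))
        (VP ((V) (V NP)))))

    (defparameter *toy-lexicon*
      '(("the" DET) ("gold" ADJ N) ("ring" N) ("is" V) ("beautiful" ADJ)))

    (defun terminal-p (category)
      (null (assoc category *toy-grammar*)))

    (defun expand-states (state)
      "Rewrite STATE (a list of pending goals) until its first goal is a terminal category, returning every alternative goal list."
      (if (or (null state) (terminal-p (first state)))
          (list state)
          (loop for expansion in (cadr (assoc (first state) *toy-grammar*))
                append (expand-states (append expansion (rest state))))))

    (defun advance (states word)
      "Keep every analysis whose next terminal category matches WORD."
      (let ((categories (cdr (assoc word *toy-lexicon* :test #'string-equal))))
        (loop for state in states
              append (loop for expanded in (expand-states state)
                           when (and expanded (member (first expanded) categories))
                             collect (rest expanded)))))

    (defun predict-categories (states)
      "Return the set of terminal categories that any analysis could accept next."
      (remove-duplicates
       (loop for state in states
             append (loop for expanded in (expand-states state)
                          when expanded collect (first expanded)))))

    ;; Example: after "the gold" both readings survive, so both a verb and a
    ;; noun are predicted:
    ;;   (predict-categories (advance (advance (list '(S)) "the") "gold"))
    ;;   =>  (V N)

Holding all of these alternatives at once is what makes it possible to rule words in or out without committing to a single parse.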
The second method is where the prediction is carried out. Presently it is tailored to eliminate inappropriate words (e.g., possible abbreviation expansions) from a list entered by the user. The system returns only those words which could be next according to the grammar it runs from and the partial sentence it has already processed. Once the eliminations have been made, the break package resumes and the user is again given the two choices for proceeding until he or she signals a desire to quit. [4]

The grammar that the predictor uses to create parses and judge grammaticality is described in more detail in Section 3. Recall that part of the function of the grammar arcs is to carry out tests of particular features on the input words to determine if it is efficient to carry out a particular rule. These features are encoded in the dictionary entries for each word that the computer knows. The dictionary and the features within it are described in more detail in Section 4 on the lexicon. In order to help with adding words to the computer's dictionary, there is an auxiliary package used at run-time to check each word entered against those in the dictionary. When the computer finds a word it does not know, this package allows that word to be added automatically to the dictionary. The package gives the user directions for entering the appropriate features for each word to ensure that the dictionary entry is of the form the grammar expects (cf. Section 4).

3 THE GRAMMAR

The substance of the syntactic predictions comes from the grammar the predictor runs from, and so if the predictions are to be complete, the system's grammar must be complete. Up to now, the biggest objection to using grammars for augmentative communication has been that a sufficiently complete one is thought to be difficult to construct by hand. I have confronted this objection by making my grammar the embodiment of a linguistic theory called X' (pronounced "X-bar") theory, which provides an abstract, generalized description for a multitude of structures. Its conventions make a complete grammar easy to construct and modify, while also providing a mechanism to describe the specific restrictions on what kinds of constituents can occur where. These restrictions are crucial to this project because the game of prediction is to eliminate syntactic categories that are not possible in a particular context.

3.1 X' syntax

All of the three most popular syntactic theories, Government and Binding (GB), Generalized Phrase Structure Grammar (GPSG), and Lexical-functional Grammar (LFG), have adopted forms of X' theory because of its explanatory power (Sells, 1985). This power comes from invoking a purely structural description of syntax rules. Before X' Theory it was common to talk about syntax rules as sets of phrase structure rules like those below:

(10) NP --> N
     NP --> N PP
     VP --> V
     VP --> V NP
     VP --> V PP

Notice that these rules serve two purposes: to tell what particular constituents the phrases on the left hand side of the rule can be broken into and to give the position, or structure, of these constituents. However, there are similar structures among different phrases; for example, both an NP and a VP can be rewritten as just an N or a V, respectively. In addition, they can both be rewritten with the N or V plus another constituent to the right.
X' Theory captures this similarity by claiming that the basic syntactic structure is given by the following template:

(11) [Diagram here]

This generalized structure captures patterns found in the internal structure of many different kinds of phrases (e.g., noun phrases, prepositional phrases, verb phrases): they all have a head constituent, complements, and various other modifiers that can come either before or after the head. In the template, the head is represented by the variable X. This is the element that gives the phrase its character; for example, the head of the NP is an N, the head of a PP is a P, and the head of the VP is the V. The entire phrase is said to be a "projection" of the head; a structure built up using this template is called a "maximal projection" because the entire template structure has been used. It is also referred to as an "X-double-bar", reflecting the fact that it is the highest level of the template and includes all head modifiers. [5]

I have adopted a formulation of X' theory that takes the intermediate X' level as the site where modifiers like adjectives and prepositional phrases are attached (Radford, 1988). The modifier, also called an adjunct, can be attached on either side of the X' so that it could be in either a pre-head position, as is the case with adjectives, or a post-head position, as with prepositional phrases. The X' is recursive in that modifiers expand it into another X' level, generating the structure below: [6]

(12) [Diagram here]

This means the intermediate X' level plays a crucial role in the syntactic structure. [7] The most basic construction of this intermediate level (i.e., not considering adjuncts) includes the phrasal head and its complement, which is also called its argument and which is structurally its sister. The head of the phrase subcategorizes for its complement, meaning that it requires a particular kind of complement to occur with it. For example, if the phrasal head were a verb, it would subcategorize for a particular kind of object (often a noun phrase), and if the head were a preposition, it would subcategorize for a noun phrase. More about the kinds of complements heads can subcategorize for will follow below, but for a complete discussion, cf. (Van Dyke, 1991b).

The sister to X' is called a specifier, which has the function of expanding the X' completely into a maximal projection. There is some discussion among X' theorists about what kind of constituent can occur as a specifier. The position I am accepting for constructing this grammar is elucidated by Thomas Ernst, who treats the position as "the response of syntax to the need to give special status to some particular peripheral element: demonstratives, subjects, etc." (Ernst, 1990, p. 25). [8] He proposes that the specifier position is to be used to ensure that certain elements are always in phrase-initial position; for example, the following data, borrowed from (Ernst, 1990, p. 9), show that it is not possible to reorder certain words to produce emphasis:

(13) a. A fancy new car.
     b. A new fancy car.

(14) a. The many honest men.
     b. *The honest many men.

Thus, under Ernst's definition and contrary to a widely known Chomskian theory, specifiers do not have to be maximal projections. [9] This formulation is also described in (Quirk et al., 1985), where it is explained that some words, such as determiners and particles, are "single tokens, complete to themselves."
While it is not always the case that specifiers are non-maximal projections (I will soon show that a sentence's subject occupies a specifier position), the single tokens occur often enough for it to be computationally inefficient to require that all specifier positions be maximal projections. In my grammar, the specifier position can contain either a lexical item or a maximal projection, depending on the head of a particular specifier's projection.

In the syntactic "template" I have been discussing (i.e., structure (11) above), only the head element is required. The specifier, complement, and adjuncts are all optionally present for any given instantiation of the template. The presence of the complement is determined by the head itself. [10] For example, consider the case of the head being a verb, which would make the XP a VP. A transitive verb like "kill" requires a complement, as shown in the following data:

(15) John killed Mary.
     *John killed.

Conversely, an intransitive verb like "cry" does not take a complement:

(16) John cries.
     *John cries Mary. [11]

The power of the phrasal head to dictate its complement is called subcategorization. [12] This ability is common to all phrasal heads, although there are some behaviors phrasal heads may or may not exhibit that have prompted syntacticians to distinguish different types. Here I will adopt the taxonomy explicated by Susan Rothstein (Rothstein, 1991), who distinguishes three kinds of heads: lexical, functional, and minor.

Lexical heads are those like verbs or prepositions. They determine the character of their maximal projection so that if the head is a V, the projection is a verb phrase (VP), or if the head is a P, the projection is a prepositional phrase (PP). These words have very specific requirements on the number of complements they must have, and this number must always be satisfied for the phrase to be realized. [13]

The second kind of head is called a functional head because of its functional role in a phrase. These heads determine the nature of their maximal projection just as the lexical head does, but they are not necessarily realized as a lexical item. For instance, the INFL, which is considered the head of a sentence, is typically said to be realized by the tense and agreement of the main verb, and sometimes as a modal. Since the head of the sentence is called INFL, X' theory describes the canonical, top-level sentence as an IP, or "Inflection Phrase." The three major heads in this category are INFL, which holds inflection, DET, which holds the determiner and also determines agreement, and COMP, which holds the complementizer "that" in embedded clauses and whose specifier position holds WH-question words. I will discuss each of these in more detail in the implementation section below.

The minor heads are also functional heads in the sense that they are not frequently lexicalized, but unlike the previous two head types, the minor heads do not determine the nature of their projection. This is determined instead by the complement they subcategorize for, which in some cases depends on their position in the sentence. Typical minor heads are degree words like "too" or "as", and the head is therefore called DEG, for degree. The "degree phrase" is discussed more completely in the implementation section below. [14]

The grammar I have built is based on these heads and the structures they generate via subcategorization. Since there is a finite number of instantiations for these head types, I have eliminated the need for phrase structure rules that tell exactly what constituents go where. Instead, the structure of (11), reproduced below, provides the order and relationships of constituents to one another, and all other information comes from the requirements (i.e., the subcategorization) of the specific words themselves.

(11) [Diagram here]

The grammar has both top-down and bottom-up motivation in the sense that while the template structure must be satisfied, the input itself determines how that will be done through subcategorization. This simplifies writing the grammar because now it's not a question of including enough phrase structure rules, but of providing the structure and allowing the subcategorization to determine when it is appropriate.
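As an illustration of what this reliance on subcategorization amounts to computationally, the following is a minimal sketch of complement requirements encoded as a lexical feature. The feature name, the entries, and the flat list-of-categories format are assumptions made for the sketch, not the actual lexicon format described in Section 4.

    ;;; Hypothetical lexical entries carrying a subcategorization frame.
    (defparameter *verb-entries*
      '((kill (subcat (NP)))      ; transitive: requires a noun phrase complement
        (cry  (subcat ()))        ; intransitive: takes no complement
        (put  (subcat (NP PP))))) ; requires both an object and a location

    (defun complements-allowed-p (verb complements)
      "True if the categories in COMPLEMENTS match VERB's subcategorization frame."
      (equal complements
             (cadr (assoc 'subcat (cdr (assoc verb *verb-entries*))))))

    ;; Examples, mirroring (15) and (16):
    ;;   (complements-allowed-p 'kill '(NP))  =>  T    ; "John killed Mary."
    ;;   (complements-allowed-p 'kill '())    =>  NIL  ; "*John killed."
    ;;   (complements-allowed-p 'cry  '(NP))  =>  NIL  ; "*John cries Mary."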
3.2 Implementing X' Theory

Recall in my discussion of the ATN formalism that I characterized a PUSH arc as the mechanism's way of handling frequently occurring constituents such as noun and prepositional phrases. The PUSH arc gives the ATN more power because it allows recursive processing (i.e., a given network can refer back to itself, as in the PS rule NP --> NP PP). It was created for its computational power; without recursion, grammars are enormous and frequently redundant. In this project I propose a linguistic motivation for the use of PUSH arcs: namely, as the means for creating a projection in the X' theory template. All maximal projections are the result of two PUSH arcs: a PUSH arc for the X' level and a PUSH for the X'' level. Each PUSH arc adds more structure to the level below, so that the implementation of the template in (11) is done with the following networks:

(17) [Network diagrams here]

For every instantiation of X there is a pair of networks like these. Each time a maximal projection appears in a network, it is done by PUSHing for that XP. Within that XP, an X' constituent is PUSHed for because it is a projection of the head. Only the head is a non-PUSHed element, just as only the head is a non-maximal projection in a rule. [15]

In Figures 3.1 and 3.2 on the following pages, it is clear how the structural template of X' theory, as implemented in the two networks in (17), has greatly simplified the task of writing a grammar. Figure 3.1 shows the series of networks that make up the entire grammar I have implemented. Figure 3.2, reproduced from (Bates, 1983, p. 217-219), shows the networks for the LUNAR grammar, which was one of the earliest and most complete ATN grammars. The X' theory grammar has greater coverage than the LUNAR grammar and is a sleeker implementation. This makes modifying it to cover more syntactic phenomena (e.g., topicalization, ellipsis) easier because it is immediately clear where new pieces of grammar should be added. In the following sections I discuss the details of implementing network pairs for each of the heads. In Section 5, I will further discuss the implications and deviations of this implementation from Government and Binding Syntax, whence I borrowed this X' theory grammar framework.

3.2.1 Sentence level

As I mentioned previously, GB theory posits a functional category called "INFL" for "inflection" in order to apply the X' template to the sentence level. This category holds the tense and person-number agreement information for the sentence. This information can be lexicalized in the form of a modal if the clause is finite or as a "to" in the case of non-finite (e.g., infinitival or participial) clauses.
INFL cannot hold a modal and a "to" at the same time because sentences must be either finite or non-finite. It is possible for INFL to be unlexicalized, as when there is just a single verb in the sentence. In this case the agreement and tense are considered to be in INFL, but lexicalized only on the main verb. The INFL head and the X' template of (11) produce an analysis for a sentence traditionally analyzed as [NP VP] with the following form:

(18) [Diagram here]

The head of this structure is INFL, the complement of the head is the VP, and the specifier of I' is the NP. The overall structure is a maximal projection called an INFLection-phrase or IP.

In this implementation, the INFL node holds a code showing tense and agreement, as X' theory stipulates, but it also holds modals and the auxiliary verbs "have" and "be", a characteristic that does not follow typical X' theory (cf. (Radford, 1988, p. 312)). [16] Thus, if a sentence includes all the auxiliary verbs (e.g., "The man could have been sleeping"), the INFL node my grammar will produce has the following form:

(19) (INFL (AGR 3SGPAST) (MODAL COULD) (HAVE EN) (BE ING))

The constituent AGR is the agreement marker wherein number and person agreement is combined with the verb tense. This is also different from the standard X' theory conception of INFL as containing two binary variables: one for agreement and one for tense. I have combined these variables in order to account for lexical items whose agreement changes with their tense. Consider the sentences below:

(20) The man hit the ball yesterday. [Intended past tense.]

(21) *The man hit the ball today. [Intended present tense.]

The problem here is that the same lexical item can be past or present, but have different agreement requirements. The sentence in (21) is ungrammatical because, in the present tense, "hit" can only agree with non-third person singular subjects. It was necessary to find a separate way to account for these two lexicalizations because it is not possible for the lexicon to have two separate entries for the same lexical item. By using these single variables, agreement can be preserved: it is possible to specify that "hit" can agree with 1SGPRESENT, 2SGPRESENT, 1PLPRESENT, 2PLPRESENT, 3PLPRESENT, and all of the PAST values (i.e., 1SGPAST, 2SGPAST, etc.). [17] The details of these lexicon entries are explained more thoroughly in Section 4.

Implementing the INFL in this way means that the verb phrase contains only a single V, which is the main verb in the sentence. This is in contrast to the branching complex VP structure that would be necessary to hold "have" and "be" when they occur [18]. Eliminating these branching structures means extra PUSHes have been eliminated, and this makes the processing more computationally efficient. In preserving the participial agreement requirements of "have" and "be," and also checking for these requirements in tests on the grammar rules, I have maintained the ability of these verbs to determine structure without any extra computation. The same result is produced; the branching structure is just different from what X' theory predicts.

A problem with the IP structure shown in (18) arises when sentences have embedded IP's. Consider:

(22) The committee may insist [that the chairman resign].

The IP structure in (18) cannot accommodate the "that", which is a complementizer introducing the embedded clause. It would be incorrectly analyzed as a determiner like "the" because it introduces the entire embedded sentence.
Similarly, it would not be an object of "insist" because it has no reference as an independent pronoun in this sentence. This problem is solved in X' theory with the functional head COMP, or Complementizer. The COMP is the head of the overall sentence, holding the "that" in embedded clauses or being empty for top-level sentences. [19] The IP is in the complement position in the template, making the structure of sentences, called CP's for "Complementizer Phrase", like the following:

(23) [Diagram here]

That the complementizer is the head of the sentence is supported by the fact that complementizers influence the content of the INFL node. Any clause that has a complementizer must have an INFL that is compatible with it in terms of being finite or non-finite. For example, a non-finite complementizer like "for" cannot introduce a sentence with a finite INFL:

(24) *They are anxious [for you make up your mind.]

But a non-finite INFL is acceptable:

(25) They are anxious [for you to make up your mind.]

For the rest of this discussion I will refer to sentences as CP's headed by COMP and sentences without complementizers as IP's headed by INFL (cf. footnote 19).

3.2.2 Determiner Phrases

The noun phrase can be organized around a functional head of the same type as INFL. This was demonstrated by Steven Abney when he argued for a DET head that is lexicalized by the determiner (Abney, 1987). The rationale behind this analysis stems from the fact that the determiner can subcategorize for its complement like lexical categories and therefore must have a more central role in the noun phrase than simply specifier, as shown in (8). For example, in English there are some determiners that either can or cannot take complements. Consider:

(26) That is terrific. [Complement not required.]

(27) *The is terrific. [Complement required.]

(28) The boy is terrific. [Complement required.]

(29) *A boys is terrific. [Particular type of complement required.]

Thus, it is evident that the determiner acts just like the verb in shaping its projection. [20] An analysis of DET as the functional head of the noun phrase serves to unify the X' theory analysis of the noun phrase with that of the sentence. The structure of the DP, reminiscent of IP, is shown below. Note that a relative clause can serve as either a complement or an adjunct. This is discussed in more detail in Section 3.2.5.2.

(30) [Diagram here]

The most useful effect of this structure for the purposes of this project is that the specifier positions of DP and NP provide structural positions to account for the various types of words that can occur there. Without these positions, such words would have to be considered adjuncts occurring before the determiner or between the determiner and the noun; however, that analysis would impose no significant ordering among these words. Consider the following noun phrases:

(31) a. [DET a] [NP-SPEC dozen] roses
     b. [DP-SPEC all] [DET 0] [NP-SPEC six] men
     c. *[DP-SPEC six] [DET 0] [NP-SPEC many] men
     d. [DP-SPEC all] [DET the] [NP-SPEC six thousand] men
     e. [DP-SPEC all] [DET the] [NP-SPEC many] men
     f. [DP-SPEC many] [DET the] [NP-SPEC 0] men

Furthermore, if adjectives can come between determiners and nouns in relatively free order, why can't other words such as the quantifier "many"? Recall the example given in (13-14), reproduced here as (32):

(32) a. [DET A] [ADJS fancy new] car
     b. [DET A] [ADJS new fancy] car
     c. [DET The] [Q many] [ADJ honest] men
     d. *[DET The] [ADJ honest] [Q many] men
It seems, therefore, that there are certain types of words that must appear in particular positions within the noun phrase, and the extra specifier positions provided by (30) allow for these words. Following this analysis, I have implemented a structure wherein the Determiner is the head of the noun phrase. This means the noun phrase is a projection of Det and hence called a DP (i.e., determiner phrase). I will refer to what have traditionally been called noun phrases as Determiner Phrases throughout the rest of this work.

3.2.2.1 Pronouns

In addition to arguing that Det is the head of the noun phrase, Abney argues for generating pronouns in the Determiner position. While this is not to say that pronouns are determiners, it is a way of accounting for the fact that, like determiners, pronouns have a primarily functional status. They provide agreement features like number, person, and gender and therefore influence the form of the verb that follows them. In this grammar, pronouns have been implemented in the same position as determiners. This has the nice effect of allowing pronouns to be used with non-empty noun heads, as in:

(33) We students are tired.

Here, just as in regular determiner phrases, the number of the noun must agree with the determiner phrase head, in this case the pronoun (cf. "*We student are tired." and "*We student is tired."). This implementation also allows the possibility for determiners like "that", which do not require NP complements, to stand alone in DP's:

(34) That is ridiculous.

Note that the specifier position of DP's contains only certain kinds of determiners, like "all", which can precede the articles. The other positions in the X' theory template for DP's are filled as follows: articles and pronouns are the only elements in the head-of-DP position. There is only one kind of complement for DP's: an NP. It is possible to have adjective phrases occurring before the NP (cf. the next section), but these occur in adjunct position rather than complement position. [21]

3.2.3 Degree Phrases

To accompany his interpretation of the DP as a projection of a functional head, Abney posits an abstract head to describe adjective phrases and adverb phrases. Abney's explanation of this head, which he calls DEG, for "degree", makes it a head of the kind that Rothstein calls functional. Rothstein herself further refined the analysis of the DEG head by calling it a "minor" functional head. While empty with simple adjectives and adverbs (i.e., phrases such as "white hair" or "run quickly"), it is lexicalized by words like "how", "this", "that", "so", "too", "as", "more", "less", "all", "most", and "least". [22] Like INFL and DET, DEG can also carry inflection, because it is the place where the comparative -er and superlative -est are specified.

With this functional head, Abney tries to capture the generalities between adjective and adverb phrases, claiming that they are the projections of the same node. The structure he posits, however, has the adjective phrase as the only kind of complement, with the adverb phrase treated either as a subcategory of adjective phrases or as occupying the specifier position of an otherwise empty structure. Rothstein argues against this analysis in her characterization of the degree phrase as a "minor" functional category. [23] In this implementation, I accept Abney's analysis of these kinds of words as quantifiers and classify them as a special kind of adjective that can be a complement to the DEG phrase.
Consequently, my DEG phrase has three possible complements: adjective phrases, adverb phrases, and quantifier phrases. Each of these phrases is a full maximal projection which can have prepositional phrases as complements. [24] I have also accepted significant portions of Rothstein's position, as this implementation allows the DEG word to subcategorize for a particular complement. The major deviation of my implementation from her analysis comes from the difficulty in representing a structure whose character is not determined until the complement is parsed. She suggests that, depending on the complement of the DEG phrase, it is called either an ADVP, an ADJP, or a QP. As a result, the structure of my degree phrase appears more like Abney's, in that it is always labeled "DEGP" rather than AP or ADVP as Rothstein would prefer. The character of the complement is explicit in the structure produced, however, so no explanatory power is lost.

Part of the problem her argument presents for this implementation stems from the fact that in suggesting this behavior for her minor heads, Rothstein must violate one of the most basic tenets of X' Theory and a notion crucial to the theory's usefulness in this project: that a head always determines the category of its projection. She does this to maintain "forward" subcategorization between the head and the complement, rather than allow "backwards" subcategorization (cf. footnote 10). As I discussed previously, the requirements of prediction make backwards subcategorization inappropriate for this implementation. Therefore I must agree with Rothstein's analysis that a degree head chooses its complement, which in turn determines the character of the phrase. However, this does not necessitate a completely "forward" subcategorization because the complement that is chosen is often dependent on the structural position of the degree phrase. Since the parser knows what that position is for any given degree phrase, it is possible to allow only particular complements to occur in particular places (e.g., ADVP's do not occur as noun modifiers). In this case, subcategorization is not completely dependent on the head of the DEG phrase, and so it does not have the same functional importance as the DET and INFL functional heads. Thus, while the DEGP analysis may not conform to the standards of a normal functional head, it is appealing because it provides a nice way to capture the similarities between the adjective, adverb, and quantifier phrases.

As part of my implementation of DEGP's and my decision to categorize quantifiers as special kinds of adjectives, I allow them to occur in the specifier position of the DEGP. This accounts for data like the following:

(35) a. I have [SPEC-DEGP much] [DEG too] [COMP-DEGP much] work to do.
     b. A [SPEC-DEGP few] [DEG too] [COMP-DEGP many] men attended the dance.
     c. A [SPEC-DEGP few] [DEG 0] [COMP-DEGP 0] men attended the dance.
     d. [SPEC-DEGP Several] [DEG 0] [COMP-DEGP 0] men attended the dance.

These data could also be accounted for if the quantifier were in the specifier of NP position. This would also allow a simplified structure for (35c-d) because there would not be an empty-headed degree phrase:

(36) a. A [SPEC-NP few] men attended the dance.
     b. [SPEC-NP several] men attended the dance.

Because of this, I have also allowed quantifiers to occur in the specifier of NP position. This eliminates unnecessary computation and solves Abney's problem of having an empty NP specifier (cf. (Abney, 1987, p. 341)).
I will discuss NP's in more detail in the following section. One final type of phrase that Abney singles out is the "mensural phrase". These phrases have a cardinal or ordinal determiner and a mensural noun [25] as their head. Examples are: (37) a. six weeks b. ten times c. a dozen These phrases are closely related to the head of the DEG phrase when it is lexicalized, as in (38) a. ten times as quickly b. six inches too long c. a dozen fewer books Because these DP's have such a specific structure and because they closely modify the degree word, I have implemented mensural phrases as "MP's" in a separate network. They are allowed to occur in the specifier position of the DEG phrase, meaning that if the DEG is not lexicalized, the DEGP has an empty head. As was observed previously, this is not particularly difficult because the DEG head is often empty in the case of simple adjectives and adverbs. Thus, the overall structure that I have implemented as a DEG phrase (DEGP) is like: (39) [Diagram here] This single structure accounts for adjective, adverb, and quantifier phrases, including all of their degree modification. This implementation facilitates modifying the grammar because the kinds of phrases that specify quality, quantity, and description are unified into one structure. Degree Phrases can occur as adjuncts to N' in DP's, as the specifier of PP's, and as adjuncts in VP's. 3.2.4 Adjective Phrases In addition to this structural account of adjective phrases as complements of degree phrases, I have implemented a preference for adjective ordering. In this way I attempt to describe the scope particular adjectives have over others and explain why, in the data below, (40a) seems "better formed" than any of (40b-f). (40) a. rich white American man b. ??white rich American man c. ??American rich white man d. ??rich American white man e. ??white American rich man f. ??American white rich man Based on the work of Quirk and Bache, I have distinguished three types of adjectives (Quirk, 1985), (Bache, 1978) which occur in a particular order. I will discuss the specifics of this implementation in Section 4, as the distinctions are encoded as part of the lexicon entry of the adjective. With regard to implementation as Degree Phrases, each adjective is part of its own degree phrase so that the possibility of data like the following can be accounted for: (41) The [DEGP six feet too [ADJ long]], [DEGP five feet too [ADJ wide]] table. [26] I have implemented this ordering by assigning a number to each type of adjective. When an adjective degree phrase is encountered, if its complement adjective does not have a number equal to or larger than that of any previous adjective degree phrase, the adjective sequence will be considered ungrammatical and the sentence will not parse. 3.2.5 Noun Phrases The implementation of the Determiner Phrase and Degree phrase I have described above accounts for many structures normally thought of as part of noun phrases. This is a side effect of this version of X' theory, which considers the noun phrase to be a complement of the determiner phrase (cf. structure (30)). Nevertheless, the noun phrase is still a full maximal projection which has a specifier and complement of its own. As expected, the head of the noun phrase is a noun, and this head can have either prepositional phrases or relative clauses as complements. Restrictive relative clauses can also serve as adjuncts to N, as will be discussed in the section on relative clauses below (cf. 3.2.5.2).
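As an aside, the adjective-ordering check described in section 3.2.4 above can be pictured with a minimal sketch. The three-way classification and the example class assignments below are assumptions standing in for the Quirk/Bache distinctions actually encoded in the lexicon; the real grammar performs this test on the arcs as each adjective degree phrase is parsed.

# Hypothetical lexicon fragment: each adjective type is assigned a number,
# and a prenominal sequence is accepted only if the numbers never decrease.
ADJ_CLASS = {
    "rich": 1,        # e.g. evaluative adjectives come first
    "white": 2,       # e.g. color adjectives come next
    "American": 3,    # e.g. provenance adjectives come last
}

def adjective_sequence_ok(adjectives):
    """Reject a sequence whose class numbers decrease at any point."""
    last = 0
    for adj in adjectives:
        rank = ADJ_CLASS.get(adj)
        if rank is None:          # no ordering information for this adjective
            continue
        if rank < last:           # out of order: the sentence will not parse
            return False
        last = rank
    return True

print(adjective_sequence_ok(["rich", "white", "American"]))   # True:  (40a)
print(adjective_sequence_ok(["white", "rich", "American"]))   # False: (40b)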
As I mentioned in the discussion of degree phrases, the specifier position of the NP can hold a quantifier, but is more often an empty position. The noun phrase fits into the overall structure of the determiner phrase as in the tree below: (42) [Diagram here] In the following sections I will discuss the parts of this structure in more detail. 3.2.5.1 Prepositional Phrases Prepositional phrases have a straightforward implementation as shown in the structure below: (43) [Diagram here] The specifier position holds a Quantifier Phrase, which, as discussed previously, is implemented as a Degree Phrase with a quantifier complement. The head is a preposition and the complement a Determiner phrase, which itself could contain prepositional phrases. 3.2.5.2 Relative Clauses Relative clause implementation is a little trickier. Relative clauses are CP's (i.e., a sentence in the X' notation) that are either introduced by "that" or a wh-pronoun, which is the head of the CP, or not introduced, in which case the CP has an empty head. The tricky part is that the CP is missing a determiner phrase, or other phrase (e.g., prepositional phrase), usually either in subject or object position. This missing phrase is the one containing the noun that the relative clause is modifying. My implementation captures this relationship between the relative clause and the moved phrase by putting a copy of the moved noun head back into its original position in the relative clause. This copied noun serves as a kind of "trace" in the noun's original position and maintains number and reference in the relative clause. [27] Agreement with this trace noun occurs just as it would with a normal noun. The schematic tree structures in (45a) and (45b) below show that, following the analysis given in (Radford, 1988), relative clauses can appear in two places in the noun phrase. This is to account for the differences seen in the following noun phrases, which are taken from (Radford, 1988, p. 218): (44) a. the claim [CP [COMP that] you made a mistake] b. *the claim [CP [COMP which] you made a mistake] c. *the claim [CP [COMP 0] you made a mistake] d. the claim [CP [COMP that] you made] e. the claim [CP [COMP which] you made] f. the claim [CP [COMP 0] you made] In this example, the NP's in (44a-c) are "Noun Complement Clauses" which occur as complements to the noun head. They require the complementizer "that" to introduce them and can be introduced by no other relative pronoun. Conversely, it is evident in (44d-f) that these noun phrases are grammatical regardless of what, or if any, relative pronoun introduces them. These relative clauses are called "Restrictive Relative Clauses" and serve only to give extra information about the noun. They are therefore in adjunct position in the overall noun phrase structure. Thus, a noun phrase with a noun complement relative clause like that in (44a) will have the structure in (45a) in my implementation: (44a) The claim that you made a mistake. (45a) [Diagram here] For a noun phrase with a restrictive relative clause like that in (44d), my grammar will produce the structure in (45b): (45b) [Diagram here] In practice, the grammar will produce both structures for all relative clauses having the "that" complementizer, and other factors like semantics must be applied to choose the correct interpretation. For structures with relative pronouns in them, the grammar will only produce structures like that in (45b).
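The copy-back mechanism just described can be sketched as follows; the data structures and the toy agreement test are hypothetical stand-ins for what the ATN actually does with its registers, but they show how restoring the head noun lets agreement inside the relative clause be checked.

# When the parser finds the subject position of a relative clause empty, it
# fills it with a copy of the modified noun (the "trace") so that number
# agreement with the relative clause verb can still be enforced.
def fill_relative_gap(head_noun, clause_tokens):
    """Insert a copy of the head noun after the relative pronoun/complementizer."""
    if clause_tokens and clause_tokens[0] in {"who", "which", "that"}:
        trace = {"TRACE": head_noun["form"], "number": head_noun["number"]}
        return [clause_tokens[0], trace] + clause_tokens[1:]
    return clause_tokens

def verb_agrees(trace, verb_form):
    """Toy check: a singular trace requires the -s present-tense verb form."""
    return verb_form.endswith("s") == (trace["number"] == "singular")

head = {"form": "man", "number": "singular"}
clause = fill_relative_gap(head, ["who", "walks", "the", "dog"])
print(verb_agrees(clause[1], "walks"))   # True:  "the man who walks the dog"
print(verb_agrees(clause[1], "walk"))    # False: "*the man who walk the dog"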
The most important thing to note is that to analyze the relative clauses, this grammar uses the same structure and implementation as a main clause CP, as I discussed in Section 3.2.1. Therefore, anything that occurs at the level of the main clause, like for example Degree Phrases, can also be accounted for in relative clauses. The only difference is that when the parsing reaches the place in the sentence where the noun is missing, it puts in the noun copy and continues through the parse. This allows for a great economy of structure and accounts for all possible variation within the constituents of relative clauses. 3.2.6 Verb Phrases and Complementation Under the current X' theory interpretation of sentence structure, VP is the complement of the INFL head. INFL dictates the main verb's person, number, and tense, but the main verb is still the head of its own maximal projection. The relationship between INFL and the verb phrase is shown below: (46) [Diagram here] While syntacticians are not convinced about what structure actually occurs in the specifier position of the VP, based on the data given below, I have implemented an optional ADVP (i.e., DEGP with ADVP complement) in this position. (47) a. John [DEGP [DEG 0] [ADVP quickly]] ran down the street. b. Jane was [DEGP [DEG so] [ADVP completely]] exhausted that she could barely walk. Adverb degree phrases have also been implemented as adjuncts on V', as have prepositional phrases, finite clauses, and particle words such as "up" or "away". [28] These are licensed to occur with particular kinds of intransitive verbs; because they are adjuncts, the verb does not subcategorize for them in argument positions. Recall that the head of the determiner phrase selects for an NP complement and that the head of the degree phrase selects for an adjective phrase, adverb phrase, or quantifier phrase complement. In the same way, the head of the verb phrase subcategorizes for its complement. Here, the range of possible complements is much greater, and whether a particular kind of complement can occur depends on the verb itself. For example, the verb "believe" can be followed by a full sentence as in (48) John believes Mary is sleeping. but with the verb "take", this structure is ungrammatical: (49) *John takes Mary is sleeping. Instead, "take" needs a single noun phrase object and perhaps a prepositional phrase following it, such as: (50) John takes Mary to the store. Conversely, the verb "believe" can not have this structure, but can also take a single noun phrase object, such as: (51) John believes Mary. My implementation accounts for the different kinds of verbs and the different complements they can take with codes in the dictionary entry for each verb. The codes are based on the verb pattern codes in the Oxford Advanced Learner's Dictionary (OALD) (Cowie, 1989). The details of these codes are explained in Section 4, but here I will discuss the types of complements they can specify. [29] There are six types of verb complements which occur in various combinations according to the number of arguments a verb subcategorizes for. These are the adjective phrase (DEGP with AP complement), determiner phrase (DP), prepositional phrase (PP), sentential phrase (CP), small clause (SC), and exceptional clause (EC). Adjective phrases as complements occur primarily with linking, or copular, verbs such as (52) a. John is intelligent. b. The sky became dark.
Determiner phrases may also occur with copular verbs as in (53a), but are most common as direct objects such as (53b-c): (53) a. John is a farmer. b. The dog eats his food. c. The man hit the ball. Prepositional phrases are usually adjuncts to the verb phrase, as in (54a), but they can also occur as objects, as in (54b): (54) a. The man was crying in the living room. b. The meeting lasted for two hours. In (54b) the prepositional phrase is a complement rather than an adjunct because the sentence "The meeting lasted" is ungrammatical without it (cf. "The man was crying"). It is evident that the verb "lasted" requires a complement because of sentences like "The meeting lasted a week." Sentential complements such as that in (48) consist of an entire CP, even though there is no complementizer introducing the embedded clause "Mary is sleeping" in (48). Other examples of this complement are sentences like (55) a. Jane thought [that Mary would take care of her]. [30] b. The man hoped the train would come on schedule. c. The man hoped that the train would come on schedule. The implementation of this is simply to allow a CP to be PUSHed for in the complement position. Because it is a full CP, all of the structures possible in the main clause (i.e., degree phrases, relative clauses) are also possible in the complement clause. Small and Exceptional Clauses only appear in complement positions and have therefore not been mentioned previously. They lack elements that are part of ordinary CP's: for example, a small clause does not have tense because it does not have an INFL node and therefore can not independently constitute a sentence. Small Clauses (SC) also do not have a Complementizer node, meaning that they can not be introduced with words like "that" and can not serve as relative clauses because there is no structural position for the relative pronoun. Instead, they are of the form [DP XP] where XP is any of the other phrasal possibilities (i.e., DP, DEGP, VP, and PP). Examples of Small Clause complements, taken from (Radford, 1988, p. 324), are given below: (56) a. I believe [the President incapable of deception.] (DP DEGP) b. I consider [John extremely intelligent.] (DP DEGP) c. They want [Zola off the team.] (DP PP) d. Could you let [the cat into the house.] (DP PP) e. Most people find [Syntax a real drag.] (DP DP) f. Why not let [everyone go home.] (DP VP) There is sufficient evidence in the syntax literature showing that these structures are in fact clauses rather than a sequence of different complements (cf. (Radford, 1988, p. 324-331) and references there). I will not go into this here except to stress that there is a difference between Small clause structures and structures with multiple objects. This becomes clear with verbs that allow single complements versus those that allow more than one. The Small clause is a single constituent and therefore can account for one role in the sentence (i.e., object, direct object, location, etc.). If a verb allows more than one complement to account for different roles, as in a verb that takes both a direct and an indirect object, the small clause could only fill one of these roles. I have implemented Small Clauses in a separate network of the form [DP XP] where the XP can be a DEGP, a DP, a PP, or one of three kinds of VP's: gerundive V-ing forms, participial V-en forms, or infinitival V-0 forms.
The network has the structure: (57) [Diagram here] It is possible for the subject DP to be either overt or covert, in which case I will fill this position with a "TRACE" marker, indicating that it is lexicalized elsewhere in the sentence. [31] Exceptional Clauses also differ from ordinary CP's, as they lack the Complementizer position. They do have an INFL node, but it must always contain "to" and therefore requires the verb to have an infinitival head. Consequently, their basic structure is of the form: (58) [Diagram here] The verbs which normally take EC's as complements are usually "cognitive" verbs, such as those shown in (Radford, 1988, p. 317): (59) a. I believe [the President to be right.] b. I've never known [the Prime Minister to lie.] c. They reported [the patient to be in great pain]. d. I consider [my students to be conscientious.] Exceptional Clauses have been implemented as a separate network of the form [DP to VP] where the VP is infinitival and the DP can either be overt or the covert DP called "PRO". [32] The network has the form: (60) [Diagram here] As I mentioned previously, when these complements occur depends on the particular verb in the sentence and the code it has in its lexicon. The lexicon is therefore crucial to determining the structure of a sentence. This is predicted in Government and Binding Theory by the Projection Principle, which states that "representations at each syntactic level are projected from the lexicon, in that they observe the subcategorization properties of lexical items. (Sells, 1985)." I will discuss this in more detail in Section 5, but would like to stress that this "lexical determinism" should in no way be considered a problem of the implementation. It does make using the grammar for alternate computational applications somewhat demanding on the computational environment because the lexical entries must be tailored as described in Section 4. This is not unexpected, because it is exactly this type of dependence that the linguistic theory predicts. [33] 3.3 Example Parse Trees A bountiful selection of parse trees showing the structures this grammar produces is available in (Van Dyke, 1991c). 4 LINGUISTIC IMPLICATIONS 4.1 Government and Binding Syntax Here I would like to characterize this implementation with respect to Government and Binding syntax, whose formulation of X' theory I have adopted in the grammar described here. I explained in Section 3 that I have made use of the functional categories INFL, COMP, DET, and DEG. The existence of these abstract categories is what distinguishes the GB interpretation of X' theory from that of other syntax theories such as Generalized Phrase Structure Grammar and Lexical-Functional Grammar. I chose to adopt these abstract, functional categories because they facilitated applying the X' structure template to all instantiations of X. This allowed subcategorization to be used for explaining not only the relationship between verbs and complements, but also that between determiners and head nouns. Using the DEG functional head enabled capturing the similarities between adjective and adverb phrases. But by far the strongest reason for adopting the DET and DEG functional heads, is because through them I was able to develop a structure to account for the various types of constituents that occur before the head determiner or between the determiner and the noun. 
The branching structure the functional heads provide allowed me to eliminate ordering tests on the grammar arcs; tests which would have been necessary to ensure well-formed word sequences. For example, without a structural position at the beginning of the determiner phrase (i.e., the specifier position), in order to account for the sequence "all the many men" I would have needed a looping determiner category arc. Instead I can implement a series of arcs with different category and feature requirements motivated by an overall grammatical theory. This makes the implementation something more than an ad hoc solution to the problem. It was also important to be able to apply the X' template to any position in a sentence, including the sentence level itself, because this facilitated producing a complete grammar. It was therefore possible to confront a common objection to using grammars in augmentative communication: that complete ones are difficult to construct. A complete grammar is crucial for communication devices because a user will be using the device to produce normal, everyday language. Consequently they must be able to produce all of the syntactic structures that a human language user could think of constructing. The X' template eliminates this difficulty because it gives a standard structure that underlies all syntactic structures. The problem is reduced to providing structures in the grammar and using subcategorization to eliminate those that are inappropriate for particular lexical items. Borrowing these concepts from GB, the system performs a number of functions in the way GB predicts. It accepts a surface structure and undoes the transformations that show up there so that the structure it produces is akin to the sentence's deep structure. Movement occurs from argument positions and is only allowed to land at appropriate landing sites. Landing sites can be easily determined with this formalism because the process of undoing a transformation must be explicitly invoked (i.e., a hold action is performed in a grammar arc). In this way, the grammar controls for what constituents can move and to where: if a constituent is encountered that is not in an appropriate landing site, then the parser will be unable to complete a parse for that sentence. In this way, Government and Binding theory's constraints on NP and WH-movement are obeyed, even though they are not overtly implemented as such (i.e., there is no instance in the grammar or processing when I invoke some procedure called "NP-movement.") [34] But even with these GB movement characteristics, a GB motivated X' structure, and adherence to subcategorization in the way that GB's over-arching Projection Principle suggests, the structures this grammar produces are often not those that GB theory would predict. [35] For example, this grammar analyzes a relative clause as the result of a movement of a DP out of an embedded clause and into a higher position in the tree. This analysis is significantly different from the Government and Binding theory analysis, which posits that the position seen in surface structure is the position where the noun phrase originated, or was "base generated." To represent the movement analysis, the grammar restores the "moved element" to its original position in the sentence while at the same time leaving a copy of that element in the position where it was found. [36] This has the strange side-effect of generating two occurrences of the moved item in the deep structure of the sentence. 
This action is crucial for the prediction to be effective. Consider, for example, a sentence where the subject of the relative clause is the subject of the main sentence: (1) The man who walks the dog was late today. When the predictor has the partial sentence "The man who", if the moved noun phrase "the man" is not put back into the position following "who", the predictor will not be able to eliminate the sentence (2) *The man who walk the dog was late today. Without the trace, the predictor will not know what kind of inflection the verb of the relative clause must have. This problem motivates the necessity of a deep structure with the form: (3) The man [who the man walks the dog] was late today. It is necessary for both occurrences of "the man" to be in the sentence in order for that clause's subject-verb agreement to be checked. [37] Conversely, there are some cases of movement where the deep structure produced by my grammar is faithful to the GB analysis. Consider, for example, the structure produced for control sentences like the following: (4) a. John expected Mary to wash the dishes. b. John expected to wash the dishes. The GB analysis would predict that the real surface structure of these sentences is like: (5) a. John expected [Mary to wash the dishes]. b. John expected [PRO to wash the dishes.] Here, PRO is an empty category that refers back to John. The structure that my grammar will produce is exactly that in (5a-b). The parser neither performed nor undid any movement to derive this structure; this is contrary to the GB analysis, in which "Mary" is the object of "expected" because the NP moves in order to satisfy the Theta Criterion. [38] The grammar developed here can therefore be characterized as one that borrows significantly from GB syntax, but is not a complete representation of the Government and Binding Theory of grammar. This stems from the fact that GB is a descriptive theory of grammar. Its definitions of C-command, government, and the empty category principle are theoretical definitions used to describe relationships between words or within syntax trees. The relationships must hold for a sentence to be grammatical: they are a way of describing what has gone wrong in an ungrammatical sentence's derivation. It became clear in this project that GB is not well suited for parsing or computational implementation. As a theoretical framework, its practitioners give little emphasis to the details of grammar structures (i.e., what gets attached where). Those structures that are analyzed in detail tend to be only the anomalous or "interesting" ones that test the limits of GB principles. The result is that there is no standard interpretation of X' theory attachments or of particular kinds of complements. For example, it is important for a computational implementation to know where adjectives and adverbs can be attached in the structure or what kinds of constituents can appear in the Specifier positions of all instantiations of X, but these are topics that have received little treatment from the theorists. Nevertheless, more than any other grammatical theory, Government and Binding syntax was easily adaptable to the requirements of the ATN computational formalism. The formulation of X' theory found in GB exploits the generality of the XP structural template to the fullest.
[39] In addition, it allows minimal changes to a generic lexicon of English (i.e., one that includes syntactic categories and little more than perhaps number and agreement information), and this is most often all the information that computational systems have access to. In contrast, grammatical theories like Generalized Phrase Structure Grammar and Lexical-Functional Grammar exploit syntactic features and complicated coding systems provided in the lexical entry of each word. Since subcategorization is the only real idiosyncrasy of a GB grammar, it is easy to integrate a GB-based grammar into other already existing systems, such as the flexible abbreviation system discussed in Section 1. 4.2 Human Language Parsing I have claimed that the system presented here offers a more "natural" solution to the communication problem facing disabled users. However, my purpose in building this grammar and basing this system on a linguistic theory is not to make claims about what our natural grammar or syntax rules might look like. The goal of this grammar is to direct the syntactic predictor so that a person using it will produce only grammatical sentences. I do not suggest that the sentence parsing this grammar facilitates is a model for what goes on in the user's head, only that the two procedures are exploiting the same kinds of regularities in language. The fact that the predictor can use simple characteristics about words rather than contrived statistics is what makes this a natural parsing solution. The distinction between what a grammar tells us about human language parsing and the parsing process itself has been discussed by Roger Berwick and Amy Weinberg as the "Type Transparency Hypothesis" (Berwick & Weinberg, 1983). They question to what extent a computational grammar of English can perform sentence parsing the way the theory of grammar predicts; in other words, whether or not the grammar theory is equivalent, or transparent, to the method of parsing. Assuming for this discussion that GB makes particular claims about the parsing process, I have not preserved a transparency between these two components in my grammar and parsing implementation. Rather than explicitly implementing Government and Binding Theory notions like Government, C-command, and the Empty category principle, I have used these principles to guide the construction of the grammar. This means that while the parser does not explicitly check for the relationships these principles denote, they are implicitly at work within it because of the way the grammar has been constructed. For example, government is a relationship that describes what constituents a head can determine (or influence): it defines the scope of the head. [40] In most cases, government amounts to the sister relationship that holds between the head and its complement. Among others, the concept of subcategorization is said to occur under the relationship of government. This is exactly the case in my implementation: a verb or determiner subcategorizes for only its complement because that is the only position it governs. Government and Binding theory is an excellent guide for constructing a grammar, but I have found in this project that it is insufficient to describe the predictable qualities in language. This explains the deviations from GB-predicted structures that occur in my grammar, an example being my analysis of relative clauses.
It also explains how my grammar is licensed to produce a deep structure for control sentences like those GB predicts, without adhering to the method that GB claims brings them about. These results come from a pragmatic usage of grammar theory to attack a real-world problem. Willingness to reject the Transparency Hypothesis, as I have done here, and as Berwick and Weinberg have argued must be done, has brought about a simple and efficient solution with potential for widespread use in natural language understanding systems. 5 OTHER APPLICATIONS In Section 1, I described this project in relation to its application in Augmentative Communication. I described its usefulness for improving a flexible abbreviation system and as a syntax module for prediction systems in general. Other uses within Augmentative Communication can be found because the system does not prohibit using statistical information in addition to the syntax it exploits. For example, statistics could be used to rank predicted categories: the next word in the partial sentence "the gold" might have a higher probability of being a noun than a verb. Significantly, this work also has application outside the field of Augmentative Communication. A speech recognition system addressing issues similar to those I have described here has been developed at Carnegie-Mellon University (CMU) (Hauptmann, et al., 1988). The ANGEL speech recognition system shares my goal of applying linguistic knowledge to solving problems in language processing: in this case, analyzing speech input so that speech can control a machine's actions. The problem the CMU researchers must overcome is that analyzing speech input is a computationally difficult and costly task. Initial solutions are reminiscent of the flexible abbreviation expansion I have discussed previously. For example, CMU's ANGEL speech recognition system tries to solve this task by generating several hundred word candidates for every word actually spoken. Researchers are currently working to efficiently reduce the number of these possibilities by applying linguistic constraints as early as possible. To that end, they have developed the MINDS system, a Multi-modal, INteractive Dialog System (Young, et al., 1989), (Hauptmann, et al., 1988). MINDS tries to use knowledge gained from studies of discourse, especially notions of focus, user goals, and dialog structure, to reduce the computer's search space for determining what speech patterns could mean. The MINDS system uses discourse where my project uses syntax, but both systems attempt to predict what the user will talk about next. My project uses prediction to reduce the searching required by a disabled user when trying to identify the word he or she wants to use. Using prediction in this way increases the communication rate the user can achieve while communicating with his or her AAC device. Similarly, the MINDS system is able to improve speech recognition by reducing the searching required by the machine to identify the word it has "heard". This allows the machine's speech processing rate to increase. In addition, the MINDS project comes from a background similar to the one found in Augmentative Communication. Until MINDS, speech recognition was done with statistics of word frequencies and collocations. These were based on sequences of two or three words, called Markov models.
These same Markov models were used in previous AAC prediction systems, and the speech recognition systems suffered from the same problems found there: the two- and three-word transition tables give only limited success because their look-ahead is too small, and so they erroneously eliminate interpretations that turn out to be correct. Also, they are dependent on word frequencies gathered from relatively small amounts of data and so they may not be accurate. MINDS runs primarily on semantics, or concepts, that its discourse-tracking capability identifies. It combines these concepts with "a set of syntactic networks" to derive possible sentence structures for the concepts. This means the only syntax done is to determine the lexical realizations of the concepts; the syntax is not comparable to that of natural language users. Consider, for example, that within their Navy ship knowledge base, the frigate "Spark" has been established as being disabled. MINDS predicts the user will ask about the Spark's capabilities next. The semantic concepts for the dialog exchange are identified as follows: the "shipname" concept is restricted to the value "Spark", and any "ship-capabilities" concept is allowed. They then expand these concepts into the syntactic realizations of ways to refer to the Spark: they allow "the ship", "this ship", "the ship's", "it", "its", "Spark" and "Spark's". The notion of "ship-capabilities" generates the syntactic realizations of "all capabilities", "radar", "sonar", "Harpoon", "Phalanx", etc. They then combine these to generate a highly constrained search space of phrases like "Does it/Spark/this ship/the ship have Phalanx/Harpoon/radar/sonar?" or "What capabilities/radar/sonar does the ship/this ship/it/Spark have?". This works well in their constrained environment, but in real-world, unconstrained speech recognition, this type of syntactic generation would be impossible as there could easily be an infinite number of lexicalizations. If the system could use a syntactic prediction system like the one I have outlined in conjunction with the discourse and focus information, then recognition could be improved without depending on a restricted domain. It is not clear what role syntax plays in the MINDS system because they are mainly concerned with issues at a higher level of language processing (i.e., discourse and focus). Nevertheless, it would seem that when trying to recognize individual spoken words, the system would benefit from some syntactic prediction that could give information about the structure of the partial sentence and use this to predict the category of the next word. This would limit the search space for speech recognition in the same way it does for abbreviation expansion. Given that the motivation for the speech recognition problem is so similar to that of the project I have described here, it is likely that syntactic prediction could be successfully applied to this field of research. 6 FUTURE WORK Here I have described my work aimed at making augmentative communication devices more efficient and "usable" for the disabled user. This work has focused on how syntax can be used to eliminate the possible expansions of a creative abbreviation entered at run-time. Using a parallel parsing strategy, I have found it possible to reduce the effort required of the user because he or she is offered only the grammatically appropriate words as abbreviation expansions.
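A minimal sketch of this filtering step is given below. The predictor interface and the toy lexicon are hypothetical stand-ins: in the real system the allowed categories come from the parallel ATN parse of the partial sentence, and the candidate words come from the flexible abbreviation component.

# Offer only those candidate expansions whose syntactic category the
# predictor allows after the words typed so far.
LEXICON = {
    "table": "NOUN", "tablet": "NOUN", "tumble": "VERB", "terribly": "ADVERB",
}

def predict_categories(partial_sentence):
    """Stand-in for the syntactic predictor: after a determiner such as
    'the', assume only nouns and adjectives can follow."""
    if partial_sentence and partial_sentence[-1] == "the":
        return {"NOUN", "ADJECTIVE"}
    return {"NOUN", "VERB", "ADJECTIVE", "ADVERB"}

def expand_abbreviation(partial_sentence, candidates):
    """Filter candidate expansions by the predicted categories."""
    allowed = predict_categories(partial_sentence)
    return [word for word in candidates if LEXICON.get(word) in allowed]

# "tbl" typed after "set the": the verb and adverb candidates never reach the user.
print(expand_abbreviation(["set", "the"], ["table", "tablet", "tumble", "terribly"]))
# -> ['table', 'tablet']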
A pleasant result of this is that the user is less likely to be confused by the words the computer offers as choices since they are always syntactically relevant to the situation. Other ways that the list of possibilities can be reduced and relevancy be maintained could come from applying other kinds of linguistic knowledge of the sort humans use to understand language. For example, discourse tracking is a kind of pragmatics that could be used to give the system knowledge like "since we have been talking about eating breakfast, it is probably the case that "tbl" stands for "table" and not "tablet". Semantics could also be used to reflect the fact that if the user has used the verb "drink" then we expect the following NP to be some inanimate, consumable object rather than a person's name or things like "table". This sort of information would add to the power that syntactic prediction gives the system and eventually the user will have an extremely small and precise set of words to choose from. More work could also be done at the syntax level, in the form of adding to the kinds of structures the grammar is able to handle. For example, currently the grammar can not handle coordination, ellipsis, or topicalization; all of which are reasonably common in spoken language. The appendix at the end of this work includes a test suite that demonstrates the coverage of the grammar as it stands at this writing. From this it is easy to see where additions to the grammar could be made. I feel that the present implementation could also be improved through a more critical analysis of the Degree phrase, especially regarding the lexicalizations of the degree head and the relationship of the head to the elements it subcategorizes for. In particular, it would be useful to re-analyze the status of the quantifier phrase and constituents that can occur in specifier position. In (Ernst, 1991), a positional interpretation of these items is given that may allow a more exact specification of word order in the pre-head noun positions of DP's. The task will be to find an explanation for these kinds of phrases that does not sacrifice capturing the generalities between them (cf. Section 3) Finally, in order for this system to be most useful, it must be implemented in conjunction with a large dictionary that includes the subcategorization codes it requires. Suggestions for carrying out this process are mentioned in (Van Dyke, 1991b), but most important will be to automate the process of assigning the subcategorization codes. I have mentioned previously that this is facilitated by working with learner's-type dictionaries like Longman's or Oxford's which exist in computerized form. I have provided the starting point for this by including references to the Brown corpus tags and explicit descriptions of the requirements for assigning a particular code to a word. On this basis, the task of generating a large dictionary for the system should prove easy to overcome. 7 CONCLUSION This thesis represents a successful application of linguistic information to the problem of augmentative communication. A syntactic predictor has been built which relies on a syntactic grammar of English to speed the communication rate possible with an AAC device. Because the system draws on the same rules for creating a sentence that the disabled user exercises as he or she forms sentences, the computer is able to intelligently anticipate the word-form the user will type next. 
This technology is a first step toward endowing the computer with the ability to disambiguate language in order to achieve understanding. Instrumental to the success of this system is how well the grammar it exploits has captured the generalities of the language. Through adopting a Government and Binding theory of English syntax, I have provided for a significant number of constructs, including relative clauses, yes-no and wh-questions, passives, and both matrix and embedded sentences with 39 different types of verb complements. The use of X' theory has also allowed my grammar to be uncomplicated and therefore amenable to additions. I believe that with this grammar, I have developed a strong base to which other constructions could easily be added. This makes the grammar highly applicable to many research problems, including analyses of English and modeling human language use in a machine. Thus, not only have I devised an enhancement for disabled users' communication, but I have proceeded toward a more complete computational model of language. CITED BIBLIOGRAPHY Abney, S. (1987). The English Noun Phrase in its Sentential Aspect. Ph.D. dissertation, MIT. Allen, J. (1987). Natural Language Understanding. CA: Benjamin/Commings. American Heritage Dictionary, Revised Second College Edition. (1976). Boston: Houghton Mifflin Company. Baker, B. R., & Stuart, S. (1985). Communication Mapping for Semantic Compaction Systems. Proceedings of the 8th Annual Conference on Rehabilitation Technology, Memphis, TN: RESNA, 122-124. Bache, C. (1978). The Order of Premodifying Adjectives in Present-Day English. Odense University Studies in English. vol. 3. Bates, M. (1978). The Theory and Practice of Augmented Transition Network Grammars. In L. Bloc (ed.), Natural Language Communication with Computers. New York: Springer. Berwick, R.C. (1981). Computational Complexity and Lexical Functional Grammar. Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics, Stanford, CA: ACL,7-12. Berwick, R.C. & Weinberg, A. (1983). The Role of Grammars in Models of Language Use. Cognition, vol. 13, 1-61. Chomsky, N. (1986). Barriers. Cambridge, MA: MIT Press. Cowie, A.P. (1989). Oxford Advanced Learner's Dictionary of Current English, Fourth Edition. Oxford: Oxford University Press. Demasco, P.W., Lillard, M.,& McCoy, K.F. (1989). Word Compansion: Allowing Dynamic Word Abbreviations. Proceedings of the 12th Annual Conference on Rehabilitation Technology, New Orleans, LA: RESNA, 282-283. Ernst, T. (1990). A Phrase Structure Theory for Tertiaries. In S. Rothstein, ed., Perspectives on Phrase Structure: Heads and Licensing. Syntax and Semantics 26, New York: Academic Press. Francis.W. & Kucera, H. (1982). Frequency Analysis of English Usage: Lexicon and Grammar. Boston: Houghton Mifflin Company. Frazier, L. (1991). Parsing Novel Words. Presented at Cognitive Science Colloquium, May 13, 1991. University of Delaware. Foulds, R.A. (1980). Communication rates for non-speech expression as a function of manual tasks and linguistic constraints. Proceedings of the International Conference on Rehabilitation Engineering, Toronto: RESNA, 83-87. Foulds, R.A., Baletsa, G., Crochetiere, W.J., & Meyer, C. (1976). The Tufts Non-vocal Communication Program. Presented at the Conference on Medical Devices in Rehabilitation. Boston. Garside, R., Leech, G., & Sampson, G., eds. (1987). The Computational Analysis of English. London: Longman. Griffith, H.W. (1985) Guide to Symptoms, Illness, and Surgery. 
Tucson, AZ: Body Press. Hauptmann, A.G., Young, S.R., & Ward, W.H. (1988). Using Dialog-Level Knowledge Sources to Improve Speech Recognition. Proceedings of the 7th National Conference on Artificial Intelligence, Saint Paul, MN: AAAI, 729-733. Jackendoff, R. (1977). X Syntax. Cambridge, MA: MIT Press. Kaplan, R. & Bresnan, J. (1981) Lexical-functional Grammar: A Formal System for Grammatical Representation. In Bresnan, ed., The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press. Keulen, F. (1986): The Dutch Computer Corpus Pilot Project. M.A. Thesis, University of Nijmegen. Marcus, M.P., Santorini, B., & Magerman, D. (1990). First Steps Toward an Annotated Databaseq of American English. Department of Computer and Information Science, Technical Report MS-CIS-90-46. Philadelphia, PA, University of Pennsylvania. McCoy, K.F., Demasco, P., Jones, M., Pennington, C., & Rowe, C. (1990). A Domain Independent Semantic Parser for Compansion. Proceedings of the 13th Annual Conference on Rehabilitation Technology, Washington, D.C.: RESNA, 187-188. Miller, L.J., Demasco, P.W., & Elkins, R.A. (1990). Automatic Data Collection and Analysis in an Augmentative Communication System. Proceedings of the 13th Annual Conference on Rehabilitation Technology, Washington, D.C.: RESNA, 99-100. Quirk, R., et al. (1985). A Comprehensive Grammar of the English Language. London: Longman. Radford, A. (1988). Transformational Grammar. Cambridge: Cambridge University Press. Rothstein, S. (1991). Heads, Projections, and Category Determination. To Appear in Kathleen Leffel and Denis Buchard (eds.), Anthology of Phrase Structure Theory (Tentative title). Dordrecht: Kluwer. Sells, P. (1985). Lectures on Contemporary Syntactic Theories. CSLI Lecture Notes, no. 3. Stum, G., Demasco, P.W., McCoy, K.F. (1991). Automatic Abbreviation Generation. Forthcoming, RESNA. Swiffin, A. L., Arnott, J.L., & Newell, A.F. (1987). The use of syntax in a predictive communication aid for the physically handicapped. Proceedings of the 10th Annual Conference on Rehabilitation Technology, San Jose, CA: RESNA, 124-126. Van Dyke, J. (1991a). Word Prediction for Disabled Users: Applying Natural Language Processing to Enhance Communication. Honors BA Thesis, University of Delaware. Van Dyke, J. (1991b). Tagging Guide for the X' Theory Grammar. Technical Report. Center for Applied Science and Engineering, A.I. DuPont Institute. Van Dyke, J. (1991c). An Annotated Test Suite for the X' Theory Grammar. Technical Report. Center for Applied Science and Engineering, A.I. DuPont Institute. Wehrli, E. (1988). Parsing with a GB Grammar. In U. Reyle & C. Rohrer, eds., Natural Language Parsing and Linguistic Theories. Dordrecht: Kluwer. Woods, W.A. (1969). Augmented Transition Networks for Natural Language Analysis. Harvard Computation Laboratory Report No. CS-1, Cambridge, MA: Harvard University. Yang, G., McCoy, K., Demasco, P. (1990). Word Prediction Using a Systemic Tree Adjoining Grammar. Proceedings of the 13th Annual Conference on Rehabilitation Technology, Washington, D.C.: RESNA, 185-186. Young, S.R., Hauptmann, A.G., Ward, W.H., Smith, E.T., Werner, P. (1989). High Level Knowledge Sources in Usable Speech Recognition Systems. Communications of the ACM, vol. 32, no. 2, 183-193. Zagona, K. (1988). Verb Phrase Syntax. Dordrecht, Holland: Kluwer. REFERENCE BIBLIOGRAPHY Baumgart, D., Johnson, J., & Helmstetter, E. (1990). Augmentative and Alternative Communication Systems for Persons with Moderate and Severe Disabilities. 
Baltimore: Brookes. Berwick, R. & Weinberg, A. (1983). Syntactic Constraints and Efficient Parsability. Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, Cambridge, MA: ACL, 119-122. Berwick, R. (1985). The Acquisition of Syntactic Knowledge. Cambridge, MA: MIT Press. Borer, H. (1990). V+ing: It Walks like an Adjective, It Talks like an Adjective. Linguistic Inquiry, vol. 21, no.1, 95-103. Bowers, J.S. (1981) Theory of Grammatical Relations. Ithaca: Cornell University Press. Bresnan, J.W. (1979). Theory of Complementation in English Syntax. New York: Garland Publishing. Dowty, D.R., Karttunen, L. & Zwicky, A.M. (1985). Natural Language Parsing. Cambridge: Cambridge University Press. Emonds, J.E. (1985). A Unified Theory of Syntactic Categories. Dordrecht, Holland: Foris Publications. Ernst, T. (1991). The Phrase Structure of English Negation. Unpublished manuscript, University of Delaware. Grimshaw, J. (1982). Subcategorization and Grammatical Relations. Subjects and Other Subjects: Proceedings of the Harvard Conference on the Representation of Grammatical Relations. Bloomington: IULC, 35-56. Hawkins, J. (1990). A Parsing Theory of Word Order Universals. Linguistic Inquiry, vol. 21, no. 2, 223-261. Hoekstra, T., van der Hulst, H., & Moortgat, M., eds. (1981). Lexical Grammar. Dordrecht: Foris. Hornby, A.S. (1975). Guide to Patterns and Usage in English. London: Oxford University Press. Hudson, R. (1984). Word Grammar. Oxford: Basil Blackwell. Jackendoff, R. (1990). On Larson's Treatment of the Double Object Construction. Linguistic Inquiry, vol. 21, no. 3, 427-455. Kimball, J. (1973). Seven Principles of Surface Structure Parsing in Natural Language. Cognition, vol. 2, 15-47. Lasnik, H. & Uriagereka, J. (1988). A Course in GB Syntax: Lectures on Binding and Empty Categories. Cambridge, MA: MIT Press. Li, Y. (1991). X0 Binding and Verb Incorporation. Linguistic Inquiry, vol. 21, no. 3, 399-426. Melcuk, I.A. (1988). Dependency Syntax. Albany, NY: SUNY Press. McCawley, J.D. (1981). The Syntax and Semantics of English Relative Clauses. Lingua 53, 99-149. Musselwhite, C.R. & St. Louis, K.W. (1988). Communication Programming for Persons with Severe Handicaps. Boston, MA: College-Hill. Rothstein, S. (1985). Syntactic Forms of Predication. Bloomington: IULC Sager, Naomi. (1981). Natural Language Information Processing: A Computer Grammar of English and Its Applications. London: Addison-Wesley. Siegel, M. (1980). Capturing the Adjective. New York: Garland Publishing. Speas, M.J. (1990). Phrase Structure in Natural Language. Dordrecht, Holland: Kluwer Academic Publishers. Tennant, H.R., Ross, K.M., Saenz, R.M., Thompson, C.W., & Miller, J.R. (1983). Menu-based Natural Language Understanding. Proceedings of the 21st Annual Conference of the Association for Computational Linguistics, Cambridge, MA: ACL, 151-158. ENDNOTES [1] This is based on knowing 3 prior characters so the statistics would take the form of quadgrams such as "stri". [2] For this example I am assuming noun-noun modification is not allowed. [3] I am describing a top-down method because the computational formalism I am using in this project as well as the predictor I have constructed from this formalism work in a top-down way. It is also possible to traverse the search space in a number of other ways, including a bottom-up method. 
[4] It would be just as easy for the prediction to be carried out in a more general way (i.e., to produce a grammatical specification of the next word) but my desire at this point is to show how this solution can be easily applied to present problems in AAC. [5] In this formulation, I am assuming that a maximal projection can only have one double-bar level. [6] In structure (2), "Specifier" and "Complement" are not syntactic categories or constituents. They are only labels describing the function of the constituents that hold these positions in the structure. [7] For arguments on the syntactic reality of the X' level, cf. (Van Dyke, 1991a) and references there. [8] Presently I will show that the subject occupies the specifier position when the X' template is applied at the sentence level. [9] Chomsky's Maximality principle described in (Chomsky, 1986) says that all non-head elements in a maximal projection (i.e., instance of the X' template) must themselves be maximal projections. This view is accepted by Susan Rothstein and other syntacticians working in Chomsky's tradition. [10] It is argued in (Ernst, 1990) that the head can also determine what is in the specifier position, meaning that it is able to do what I will call "backwards subcategorization." This suggestion falls out from the fact that Ernst is not accepting the DP analysis of the noun phrase (discussed below) but still needs to account for the agreement facts that raised the argument for the DET functional head in the first place. As will become clear in what follows, I have accepted the DP analysis of the noun phrase and consequently, I explain the agreement as the DET subcategorizing (forward) for the noun phrase. This is necessary to do prediction, because it would not be useful to have to wait for the head to be entered before being able to determine if what has already been parsed in the specifier position is properly licensed. Instead, I specify particular entities that can exist in the specifier position based on knowing the instantiation of XP. For example, if the XP is an NP, I know that only certain elements can occur as prenominal modifiers, and these are coded into the grammar. [11] In the sentence "John cries for Mary" the "for Mary" would be an adjunct preposition, rather than a complement. [12] The phrasal head selects its complement, but it does not select for adjuncts. They are present only as "extra" information in the sentence. [13] Here I am alluding to the fact that they have theta-grids and that all theta roles must be discharged. This comes from GB's conception that syntax is a projection of lexical properties and so each head gets exactly the number of arguments that is specified for it in the lexicon. The "Theta Criterion" of GB ensures that heads and their arguments are in proper distribution. [14] Conjunctions are also said to be minor functional heads. Although they are not implemented here, they could be done in a way similar to the degree phrase. [15] This explanation recalls Chomsky's Maximality constraint; however, as I mentioned previously, I am not adopting this because of my position that specifiers can also be lexical items. Hence, the statement about heads being the only non-PUSHed element must be qualified in the case of a specifier. Sometimes specifiers are maximal projections, and therefore PUSHed constituents, and sometimes they are unprojected words.
Not requiring specifiers to be maximal projections is computationally preferable because when the specifier is a single word, unnecessary PUSHes do not need to be done only for the sake of the Maximality Constraint. [16] GB does hold that "have" and "be" can appear in INFL at surface structure if there is no modal in INFL. This is the result of "have" or "be" raising into the INFL position from their original position as part of a complex VP. It is therefore not unheard of for these words to appear in INFL; the major deviation in my implementation is that more than one of them can appear in INFL. [17] Prima facie it seems that an indication that "hit" can occur with all forms except 3SGPRESENT might be a better way to explain its distribution. The lexicon is set up so that the actual lexical entry allows for short cuts like this; however, it is useful to be able to specify which exact combinations a word can occur with, especially in the case of the personal pronouns. This implementation seems to facilitate handling all agreement, even though some realism is lost through positing these "agreement codes". Quid pro quo. [18] This structure would look like: [Diagram here] where the first branching verb, "have" in this case, would move into INFL position from its position shown here (Zagona, 1988). [19] All sentences are CP's, but because top level sentences rarely have a lexicalized Complementizer, they are sometimes referred to simply as IP's. [20] Further arguments for the DET functional head can be found in (Van Dyke, 1991a) and references there. [21] This is a significant point of departure from Abney's discussion of the structure of the DP. His analysis makes the adjective a complement to DP, a move motivated by his opinion that a structure where an X' expands into an X' is undesirable. I have adopted this very structure based on the arguments of (Radford, 1988, p. 179-196). Consequently, I am inclined towards positing that the adjective phrase is in adjunct position, attached to an X'. [22] Abney explains that the head being empty is not problematic since the same thing happens to DET when there is no overt determiner in noun phrases. [23] For a discussion of these two views and my position regarding them, cf. (Van Dyke, 1991a). [24] The implementation of prepositional phrases as complements of ADJP, ADVP, or QP is motivated by the discussion in (Radford, 1988, p. 241-246). This discussion does not conceive of these phrases as part of DEGP and therefore represents the overall structure of the ADJP, ADVP, and QP differently from what I have been describing. Nevertheless, I have found no other explanation for what can serve as constituents in an ADJP, ADVP, or QP and so I have adopted the portion of Radford's analysis that is appropriate. As the implementation stands, the specifier and adjunct positions of these phrases are always empty. [25] A noun which specifies a countable unit, such as "dozen", "bushel", "bundle", "feet". [26] The grammar allows no punctuation so that a phrase like "six feet too long five feet too wide" may actually occur. [27] The noun head rather than the entire DP is sufficient as a trace because the noun is all that is necessary to maintain number and reference. It is used because it is the most accessible structure at the point in the parse where relative clauses have been encountered (i.e., a DP has not been completed because the relative clause is part of it and so a full DP is not available to move back to its original position).
The use of the word "trace" here is not equated with any of the traces in GB (i.e., NP-trace or WH-trace), and is only meant to recall that notion. [28] Particles are taken to be bare adverbs and therefore, like my implementation of specifiers, do not adhere to Chomsky's Maximality constraint. Refer to footnote 15 for a further discussion of this. [29] Example sentence parses for each verb code can be found in (Van Dyke, 1991c). [30] According to GB, this example shows that "that Mary would take care of her" is a clause separate from the main clause because "her" is a pronoun and not a reflexive, as in the ungrammatical sentence "*Jane(i) thought that Mary would take care of herself(i)." [31] This analysis accounts for the structure derived by subject raising even though the actual process of raising is not implemented. This structure is only possible with verbs having the code CNI. See (Van Dyke, 1991b) for details. [32] "Big PRO," as PRO is called, has a specific meaning and distribution in GB. This meaning is not pertinent to this project except to say that it is possible to have a PRO subject in an EC because the verb is always infinitival. [33] I would also like to note that of the three syntactic theories that use X' Theory (i.e., Generalized Phrase Structure Grammar (GPSG), Lexical-Functional Grammar (LFG), and Government and Binding Theory (GB)), GB demands the least amount of information of its lexicon. This is largely because it uses functional heads to build up structure according to X' theory. [34] This is a consequence of the fact that I am using a rule-based parser rather than a principle-based GB parser such as that described in (Wehrli, 1988). The principle-based parser works with a base-generated structure and explicitly applies the GB principles such as Binding, Theta-Criterion, Government, and the Empty Category Principle. In comparison, a rule-based parser focuses on surface structure, and applies pre-determined rules to assign a structure to the input sentence. [35] The Projection Principle, which applies at all levels of syntactic analysis (i.e., deep structure, surface structure, phonetic form, and logical form), was originally given by Chomsky in his Lectures on Government and Binding, 1981. The original formulation is given here, taken from (Sells, 1985): Representations at each syntactic level are projected from the lexicon, in that they observe the subcategorization properties of lexical items. [36] This occurs only with relative clauses and wh-questions. The movement done to analyze passive sentences is simply an exchange of argument positions. The trace DP's are generated only for clauses where there is a "hole" in the surface structure. [37] Recall from section 2 that in practice, only the noun "man" is replaced into the original position. The head noun of the determiner phrase holds all the agreement information necessary to correctly analyze the sentence and is therefore the only part of the moved constituent that must be maintained. [38] Notice that it is clear that "Mary" serves the object role in the sentence because if the name were to be replaced with the female pronoun, it would be the accusative pronoun "her" rather than a nominative "she". The Theta Criterion explains that verbs have particular theta roles which must always be discharged. The verb "expect" requires an object, so it discharges that role by causing the subject of the embedded clause to move into object position in the main clause.
[39] The versions of X' theory used in Generalized Phrase Structure Grammar and Lexical-Functional Grammar do not make use of functional heads and therefore they can not apply the abstract template to the sentence level or to minor categories. [40] I am assuming the definition given in (Sells, 1985): a governs b iff (a) a c-commands b, and (b) a is an X, i.e., (N, V, P, A, INFL), and (c) every maximal projection dominating b dominates a.
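The definition of government in [40] can be made concrete with a small sketch. The tree representation and the is_head/is_maximal flags below are hypothetical simplifications (the flags stand in for membership in (N, V, P, A, INFL) and for XP status); the sketch is only an illustration of the definition, not part of the implementation described in this report.

# Toy constituent tree with a check of the government relation from [40]:
# a governs b iff a c-commands b, a is a head, and every maximal projection
# dominating b also dominates a.
class Node:
    def __init__(self, label, children=None, is_head=False, is_maximal=False):
        self.label = label
        self.children = children or []
        self.is_head = is_head          # X0 category (N, V, P, A, INFL)
        self.is_maximal = is_maximal    # maximal projection (XP)
        self.parent = None
        for child in self.children:
            child.parent = self

def ancestors(node):
    while node.parent is not None:
        node = node.parent
        yield node

def dominates(a, b):
    return a in ancestors(b)

def c_commands(a, b):
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    branching = next((n for n in ancestors(a) if len(n.children) > 1), None)
    return branching is not None and dominates(branching, b)

def governs(a, b):
    if not (a.is_head and c_commands(a, b)):
        return False
    return all(dominates(m, a) for m in ancestors(b) if m.is_maximal)

# Toy VP "[VP [V hit] [DP the ball]]": the verb governs its sister DP, but not
# the noun inside it, because the DP is a maximal projection dominating "ball".
dp = Node("DP", [Node("the"), Node("ball")], is_maximal=True)
v = Node("V", is_head=True)
vp = Node("VP", [v, dp], is_maximal=True)
print(governs(v, dp))              # True
print(governs(v, dp.children[1]))  # False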