A SYNTACTIC PREDICTOR TO ENHANCE COMMUNICATION FOR DISABLED USERS

Julie A. Van Dyke
Department of Computer and Information Sciences
University of Delaware
Newark, Delaware 19716

Technical Report 92-03
August 1991

(c) Julie A. Van Dyke
All Rights Reserved

ABSTRACT

Disorders such as Cerebral Palsy and Lou Gehrig's disease produce severe physical disabilities that leave their victims unable to communicate in typical ways. In order to overcome this barrier, rehabilitation engineers have developed communication aids which make use of electronic technology to shift the burden of communication away from the user. Some strategies that have been employed include prediction techniques that use statistics to predict the user's next keystrokes. Unfortunately, the statistical data used in these systems is often biased or incomplete and consequently, these systems have had only limited success. This document describes a solution which combines natural language processing techniques and linguistic theory to produce a prediction system that, unlike previous systems, models our natural rules of syntax. This allows the syntactic predictor to make rule-based, linguistic determinations about what words can follow those already processed. It can be used with other devices to reduce the effort required of the user by predicting what word forms he or she is likely to type next. Because this system models human linguistic knowledge, it provides a more natural solution to the communication problem than do many other systems currently available to disabled users.

CONTENTS

1 INTRODUCTION: Augmentative Communication
  1.1 The User
  1.2 Presently Available Devices
    1.2.1 Statistic-based
    1.2.2 Non-stochastic
  1.3 A Linguistic-based Solution
2 THE PREDICTOR
  2.1 Natural Language Processing Strategies
  2.2 Syntax Rules and the ATN Formalism
  2.3 The Prediction Problem
  2.4 Solving the Problem
  2.5 Implementation Details
3 THE GRAMMAR
  3.1 X' syntax
  3.2 Implementing X' Theory
    3.2.1 Sentence level
    3.2.2 Determiner Phrases
      3.2.2.1 Pronouns
    3.2.3 Degree Phrases
    3.2.4 Adjective Phrases
    3.2.5 Noun Phrases
      3.2.5.1 Prepositional Phrases
      3.2.5.2 Relative Clauses
    3.2.6 Verb Phrases and Complementation
  3.3 Example Parse Trees
4 LINGUISTIC IMPLICATIONS
  4.1 Government and Binding Syntax
  4.2 Human Language Parsing
5 OTHER APPLICATIONS
6 FUTURE WORK
7 CONCLUSION
CITED BIBLIOGRAPHY
REFERENCE BIBLIOGRAPHY

FIGURES

Figure 2.1 Search Space for a Context Free Grammar
Figure 3.1 X' Theory grammar
Figure 3.2 LUNAR grammar

1 INTRODUCTION: Augmentative Communication

1.1 The User

The typical user for the system developed here is cognitively intact and therefore has the mental capability and desire to use language the same way a non-disabled individual would. The user's disability affects his or her motor capability and muscular control in a way that produces limited dexterity. These users are typically non-speaking, and have difficulty typing, writing, or controlling a joy-stick to select letters. In the worst case the user is limited to using a single-switch interface which makes communication very slow.
There are two types of disorders that typically produce this condition: developmental, such as Cerebral Palsy, and degenerative, like Lou Gehrig's disease. Some stroke victims could also benefit from this technology; however, often they will have more severe linguistic impairments which make using this system inappropriate. The particular ailment is not important, however, because if the user can be characterized as linguistically, or cognitively, intact but with deficient motor skills, the system I have developed has the potential to facilitate his or her communication.

1.2 Presently Available Devices

The kinds of communication devices that I am describing here are known as Alternative or Augmentative Communication (AAC) devices. Electronic AAC devices try to exploit whatever motor capability the user might have and direct it toward composing messages on a computational machine. Because motor capability is frequently limited, the interfaces of these systems are often single switches. The switch is used to access letters on a one-to-one basis as the user composes a message. The machine is then either programmed or hard-wired with various strategies to help the user as he or she composes. Clearly an important issue in developing these devices is the speed with which the user can compose a message. One of the first single-switch devices to be developed was the Tufts Interactive Communicator (TIC) (Foulds, 1976). The average communication rate measured with this device was around 2-10 words per minute (Foulds, 1980). Compare this to non-disabled typing speeds of 60-70 words per minute or to speaking speeds which are easily twice that, and the extent of the communication deficiency for these users is clear.

1.2.1 Statistic-based

The strategies these AAC machines are equipped with are intended to increase the user's communication rate. For scanning systems like the TIC, this has principally meant devising variations in the order that letter-characters are offered to the user. Knowledge about the frequency of letter usage is used to rearrange the letters on the TIC display so that the more frequent ones are scanned before the less frequent ones. For instance, in a row-column scan, the most frequent letters of the alphabet, such as "S" or "T", could be placed in the upper left-hand corner of the display. This way the scan will cross these letters first and thereby avoid scanning through unlikely choices like "V" or "Q" most of the time. This technique was able to produce a 30% improvement over alphabetically-ordered letter displays (Foulds, 1976). [1]

Another improvement on this scanning technique is a type of letter prediction which uses n-gram statistics that tell which sequences of n letters are likely to occur together, such as "str" or "ing". With this system, if the user has already indicated an "s", the system refers to its statistics to identify the six letters that are most likely to follow the "s", for instance "t", "p", "r", "e", "i", or "h". These are immediately highlighted in sequence so that the user has the opportunity to select them before the normal row-column scanning process is resumed. Anticipating letter selection in this way has improved the communication rate by up to 50% (Foulds, 1976). This scanning technique can also work at the word level, where the ordering improvements are based on what words are the most frequent.
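As an illustration of this kind of letter prediction, the following is a minimal sketch in Common Lisp (the language in which the predictor of Section 2 is implemented). The bigram table and its counts are invented for illustration; a real system would derive them from a large body of text.

    ;;; A hypothetical bigram table: (previous-letter next-letter count).
    (defparameter *bigram-counts*
      '((#\s #\t 95) (#\s #\h 70) (#\s #\e 60) (#\s #\p 40)
        (#\t #\h 120) (#\t #\e 80) (#\t #\o 50)))

    (defun predict-next-letters (previous-letter &optional (n 6))
      "Return up to N letters most likely to follow PREVIOUS-LETTER, ordered by descending bigram count."
      (let ((candidates (remove-if-not
                         (lambda (entry) (char-equal (first entry) previous-letter))
                         *bigram-counts*)))
        (subseq (mapcar #'second (sort (copy-list candidates) #'> :key #'third))
                0 (min n (length candidates)))))

    ;; Example: (predict-next-letters #\s)  =>  (#\t #\h #\e #\p)

Letters returned by such a lookup are the ones a scanning display would highlight first; the same ranking applied to whole words gives the word-level ordering mentioned above.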
In contrast to TIC, which uses hard-wired letter grids, Meta4 is a software-based communication device that uses static word pages containing the most common words (Miller, 1990). Instead of having to spell out each word, the user navigates through the pages using the single switch. The system's first page might contain letter intervals such as "AA-AL" and "AL-AZ", and the scanning passes through these intervals until the user chooses the one containing the word he wants to use. Then the display changes to show a page containing vocabulary words that he or she can choose from using the same scanning technique. The words included on these pages are a vocabulary set, called a "book", that can be tailored for each individual user. It is possible for users to have several books to choose from, and in this way a large amount of vocabulary can be made available without requiring the user to spell out every word letter by letter. Meta4 is a dynamic communication device because the display changes as the user composes his or her message.

Word prediction systems are even more dynamic, as they actually attempt to guess the next word based on what has already been entered. The PAL system developed at the University of Dundee (Swiffin, et al. 1987) is typical of these prediction systems in that it uses frequency statistics to make its predictions. When the user types a letter, the system displays the five most frequent words beginning with that letter in a special scanning window. The user can choose one of these words or type another letter. With each keystroke the frequency statistics are checked and possible completions for the word are offered to the user in the scanning window. In this way the system attempts to predict what the word is before the user has typed it out entirely. The number of keystrokes required of the user is reduced because the system completes the word as soon as the user indicates that the right one has been found. Those who developed PAL claim that they have been able to obtain a 50% reduction in the keystrokes required, based on a dictionary of 1000 words, wherein each word has its own frequency data. It is important to point out, however, that 1000 words is a very small dictionary for a communication system meant to be used for everyday communication. Because of the way PAL uses its statistics, dictionaries of a much larger size are likely to severely degrade the performance: it will take longer to calculate the probability of each word using the word-pair statistics, and consequently it will take longer to determine the five most probable words.

This raises a problem that is common to all these systems I have been discussing: they depend on statistics, rather than the rule-based linguistic information that humans actually use when they communicate. This makes the system only as effective as the statistics are accurate and complete. As with the PAL frequency counts, statistics are typically collected over large texts, often derived from newspapers and published reading materials. This means they are liable to be skewed by the subject matter of the text. For example, the Brown Corpus of American English, which is a text of approximately one million words and one that is often used for deriving statistics for these systems, represents words like "eggs" and "bunny" and "Easter" as being common words in everyday language use. This is a result of the time of year that the corpus was compiled, not of actual facts about English usage.

This problem can be solved to some extent by using statistics derived from the user's own language use; however, the same problem can occur with these texts because a user does not always talk about the same topics, and so the word statistics could change depending on his or her topic of conversation. In school, frequently used words might be "homework" or "teacher", but when a child is playing these will be among the least likely words that he or she will use. A problem with statistically-based systems also arises when novel words are used. The system has no statistics for these words, and so despite the statistical information in his or her AAC device, the user will still have to completely spell out such words.
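Despite these limitations, the statistical approach itself is simple to state. The following minimal sketch shows the frequency-driven completion at the heart of a system like PAL; the tiny dictionary and its counts are invented for illustration.

    ;;; A hypothetical dictionary of (word . frequency) pairs.
    (defparameter *word-frequencies*
      '(("the" . 5000) ("that" . 1200) ("this" . 900) ("they" . 850)
        ("there" . 600) ("think" . 300) ("thanks" . 50)))

    (defun complete-word (prefix &optional (n 5))
      "Return up to N dictionary words beginning with PREFIX, most frequent first."
      (let ((matches (remove-if-not
                      (lambda (pair)
                        (let ((word (car pair)))
                          (and (>= (length word) (length prefix))
                               (string-equal prefix word :end2 (length prefix)))))
                      *word-frequencies*)))
        (subseq (mapcar #'car (sort (copy-list matches) #'> :key #'cdr))
                0 (min n (length matches)))))

    ;; Example: (complete-word "th")  =>  ("the" "that" "this" "they" "there")

Whatever words such a function returns are offered in the scanning window; the syntactic predictor developed in this report replaces this purely frequency-based ranking with a grammatical test on the candidates.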
1.2.2 Non-stochastic

Abbreviation systems endeavor to improve communication rates without statistics by allowing the user to abbreviate his or her words rather than spelling them out completely. The user indicates a few letters with the scanning device and the system assumes responsibility for expanding the abbreviations into full words. A major problem with these systems, however, is that the user is required to memorize specific abbreviations for words. This arises because such a system can only handle a one-to-one correspondence between a word and its abbreviation. Thus, the system may require that the word "work" be abbreviated "wrk" in order to differentiate it from the word "wake", which might be abbreviated "wk". But for the user, "wk" seems like an abbreviation that will work equally well for both words, and so he or she may not think to use "wrk" instead of the more easily constructed "wk". This setup means the user must undergo specialized training to learn the system's abbreviations before he or she can begin communicating. In addition, these abbreviation systems and scanning devices assume that the user knows how to spell the word he or she is trying to use.

A communication device called Minspeak (Baker, 1985) was an attempt to alleviate this problem as well as those associated with memorizing pre-determined abbreviations. This system used a keyboard of multi-meaning icons together with keys for morphological and rudimentary syntactic information to create sentences. For example, the user might use the key sequence [boy-image] + [noun-key] + [smiley face-image] + [verb-key] + [book-image] + [building-image] + [noun-key] + [declarative-sentence key] to compose the sentence "Boy like school." Once the sentence is composed, the user will press a "speak" key and the computer will speak the phrase the user created. This use of images allows the abbreviations to be semantically meaningful to the user and presumably easier for him or her to remember. Minspeak has proven useful for many members of the disabled community, but it has also been problematic for some because it still requires the user to understand and/or memorize the associations between the images and the English words.

A different approach to improving abbreviation systems has been attempted with flexible abbreviation systems such as the "Word Compansion" project described in (Demasco et al., 1989) and in (Stum et al., 1991). This system attempts to automate the methods humans use for creating abbreviations so that the computer can associate more than one abbreviation with a particular word. This means the computer will be able to handle "wk" as an abbreviation for any of "work", "wake", "walk", "wok", etc. The user can be freer with his or her abbreviations, and the system's success does not rely on how well the user remembers the abbreviation the computer knows for the word he or she desires. Thus, word compansion shifts the burden for abbreviation expansion away from the user and makes it a computational problem. In order to expand the abbreviations, the system assumes the letters in the abbreviation are in the same order that they occur in the word. This makes expansion similar to a matching task: it assumes variables between the known letters and tries to match this form to the more than 5000 words in the dictionary the system currently uses. The problem with this is that there may be a large number of matches for any given abbreviation, and so the user could still have to expend a considerable amount of time and effort to find the desired word among many possibilities.
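A minimal sketch of this kind of flexible expansion follows. It keeps the assumption just described, that the abbreviation's letters occur in the word in the same order, and uses a small dictionary invented for illustration; the deliberately long result shows why further pruning of the candidate list is needed.

    (defparameter *dictionary* '("work" "wake" "walk" "wok" "week" "wink"))

    (defun subsequence-p (abbrev word)
      "True if the characters of ABBREV occur in WORD in the same order."
      (let ((pos 0))
        (loop for ch across abbrev
              do (let ((found (position ch word :start pos :test #'char-equal)))
                   (if found
                       (setf pos (1+ found))
                       (return nil)))
              finally (return t))))

    (defun expand-abbreviation (abbrev)
      "Return every dictionary word that ABBREV could abbreviate."
      (remove-if-not (lambda (word) (subsequence-p abbrev word)) *dictionary*))

    ;; Example: (expand-abbreviation "wk")
    ;;   =>  ("work" "wake" "walk" "wok" "week" "wink")

Every word in this small dictionary matches "wk", which is exactly the problem noted above; the syntactic predictor described in Section 1.3 and Section 2 is one way of cutting such a candidate list down.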
1.3 A Linguistic-based Solution

I have discussed some ways communication devices have been improved using statistics and prediction techniques; however, it has been shown that all of these strategies have their limitations. One obvious solution which has not been fully developed in the field of Augmentative Communication is to exploit linguistic knowledge. A priori this seems to be the best solution because it uses exactly the knowledge the user draws on when he or she constructs sentences. I have developed a prediction technique that exploits a grammar of English to make its predictions: it is driven by the generalities of language rather than artifacts of the data a system's statistics were taken from.

A system like the one discussed here could easily be brought to bear on the abbreviation expansion problem: the number of abbreviation expansions can be greatly reduced by considering the syntactic categories of the expansions in relation to the syntactic structure of the words the system has already processed. For example, if the user has entered the partial sentence "The boys" and the next word abbreviation is "ht", then instead of offering the user a long list like "hit, hits, hot, hat, hate, hates, height, hunt, hunts, hurts, hurt, hut," the system will offer only the plural verbs in this list, because it knows that nouns and adjectives are not appropriate once the head noun of a noun phrase has been identified. [2] In this case, the user will only need to choose from the syntactically appropriate words in the list: "hit, hate, hunt, hurt." Notice that this also increases the user's communication rate because he or she has fewer words to scan through before finding the desired word.

In addition to its usefulness with flexible abbreviation expansion, the system I have developed is a prediction system that could be used to improve other communication devices by determining the syntactic form of the word that is likely to follow what the user has already entered. For example, in a dynamic system like Meta4, if the user has already entered a noun and then chooses the interval ST-SZ, the system could go directly to a page containing only verbs that begin with those letters. Thus, by modeling syntactic knowledge in the computer, I can produce a system that can improve existing communication devices. The improvement provided is a more natural one for the user because it comes from the information humans use anyway when they communicate. It is not an ad hoc solution to the communication problems these people face; it is a solution motivated by the nature of the problem: an inability to use language in a "natural", unconstrained way.
If we can make the machine use language the way a human does, then rather than being hindered by the technology the user's disabilities force him or her to use, both machine and human can cooperate to enhance the disabled person's communication.

2 THE PREDICTOR

2.1 Natural Language Processing Strategies

Natural Language Processing (NLP) provides a mechanism for the formal representation of syntax rules. These rules are applied to sentences in a process called "parsing" which breaks the sentence down into its component parts. The result is a "parse tree" that shows the syntactic categories and functional relationships between the constituents in the sentence. For example, applying NLP syntax rules to the sentence

(1) The man walked the dog.

gives the following parse tree, or "parse", shown in computational notation:

(2) (S (NP (DET the) (N man)) (VP (V walked) (NP (DET the) (N dog))))

A noun phrase is labeled "NP", verb phrases are "VP", and each word is given an appropriate category label such as "DET", "N", or "V". This structure represents the more commonly known tree structure below:

(3) [Parse tree here]

To generate this parse, the computer needs to search all the possible combinations of grammar rules. This becomes complicated because grammars normally have different ways of expanding constituents; for example, an NP could be composed of a determiner-adjective-noun sequence or it could simply be a proper noun. Many combinations might be possible, so the computer must try them all until it finds the right one. The final parse ends up being a subset of the overall search space, which can be very large, as in Figure 2.1. Vertical dots are used to indicate where parts of the search space have been left out. The complete search space is infinitely deep because of recursive elements like the NP. Each time an NP occurs it can be broken into three different groups of constituents, here represented by nodes 2, 3, and 4. Since the NP expansion in rule 4 also has an NP as part of its structure, the search tree can never be completely expanded.

In order to tackle a search space like this the computer might use a "top-down, depth-first" method, wherein processing starts at the top S node and goes down the tree as far as it can in a left-to-right direction. [3] When it reaches a primitive or a point where no rules apply to the input, the processing backs up and goes down another branch of the tree. Consider, for example, the processing that produced the parse in (3) from the search space in Figure 2.1. The computer first uses rule 1 to expand S into NP1. Then it tries rule 2 and finds that it needs an N. Since the first word of sentence (1) is "The", this path fails and the processing backs up to NP1. Next it tries rule 3 and finds that it must complete a DET, and this succeeds with the word "the". Because the DET is also connected to an N path, the processor must complete both paths before rule 3 will be successful. It therefore backs up to look for the N in the other part of rule 3. This succeeds with "man", and so rule 3 is completed and the processing returns to NP1. The next branch of the tree is that generated by rule 4. In this case, the input is the word "walked" and the computer will try this rule, fail, and processing will continue to the VP. Here again there are three possible rules for expanding the rest of the sentence. Taking the left-most branch gives a single verb generated by rule 6. This would work with the input "walked" and so it is taken. But now the rest of the sentence is "the dog", and the processing will continue trying rules 7 and 8 to account for that noun phrase. When the parser finds that these rules fail because they include verbs in their structures, it will back up and choose not to take rule 6 (undoing what it has already done). It will take rule 7 instead, and since this is composed of a V and an NP, this rule will succeed. Since there is no more input the processing will stop, the parse in (3) having been found. In this way the computer tries each path in the search space, beginning from the left-most one, until it completes a successful traversal through the search space.
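This top-down, depth-first strategy can be stated compactly in code. The sketch below uses a toy grammar and lexicon invented for illustration (not the grammar of Figure 2.1 itself): a list of goals is satisfied left to right, the alternative expansions of a category are tried in order, and an expansion that leaves input unaccounted for is undone, just as rule 6 is undone in the walkthrough above.

    ;;; Toy grammar: category -> alternative expansions, tried left to right.
    (defparameter *grammar*
      '((S  ((NP VP)))
        (NP ((N) (DET N) (DET ADJ N)))
        (VP ((V) (V NP)))))

    ;;; Toy lexicon: word -> categories.
    (defparameter *lexicon*
      '(("the" DET) ("man" N) ("dog" N) ("walked" V)))

    (defun word-category-p (word category)
      (member category (cdr (assoc word *lexicon* :test #'string-equal))))

    (defun parse-goals (goals words)
      "Satisfy GOALS (a list of categories) against WORDS depth-first; return one parse tree per goal covering all of WORDS, or :FAIL."
      (cond ((and (null goals) (null words)) nil)      ; all goals met, all input used
            ((or (null goals) (null words)) :fail)     ; leftover goals or leftover input
            (t (let* ((category (first goals))
                      (expansions (cadr (assoc category *grammar*))))
                 (if expansions
                     ;; Non-terminal: try each expansion, backing up on failure.
                     (dolist (expansion expansions :fail)
                       (let ((result (parse-goals (append expansion (rest goals)) words)))
                         (unless (eq result :fail)
                           ;; The first subtrees belong to this category; regroup them.
                           (return (cons (cons category
                                               (subseq result 0 (length expansion)))
                                         (nthcdr (length expansion) result))))))
                     ;; Terminal: the next input word must have this category.
                     (if (word-category-p (first words) category)
                         (let ((result (parse-goals (rest goals) (rest words))))
                           (if (eq result :fail)
                               :fail
                               (cons (list category (first words)) result)))
                         :fail))))))

    ;; Example: (parse-goals '(S) '("the" "man" "walked" "the" "dog")) returns a
    ;; one-element list containing
    ;;   (S (NP (DET "the") (N "man")) (VP (V "walked") (NP (DET "the") (N "dog"))))
    ;; after first trying, and abandoning, the VP expansion that is just a V.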
2.2 Syntax Rules and the ATN Formalism

The syntactic predictor runs on a grammar constructed as an Augmented Transition Network (ATN) (Woods, 1969), (Bates, 1978). This is a parsing formalism which represents syntax rules in the form of networks showing a transition from one state to the next. This transition is analogous to each step towards completing the rule; a phrase structure rule like "NP --> DET N" has a transition between NP and DET and one between DET and N. The transitions are depicted as arcs in a network as follows:

(4) [ATN network here]

The double-circle around the NP node identifies it as the start state of the network. The labels of the intermediate states show what constituents of the rule have been completed (i.e., NP/DET means a determiner has been processed already in the NP network). The final state is the one having the arc labeled "POP", which is an indication that the rule is complete.

The formalism provides different kinds of transitions between parts of a phrase structure syntax rule. The most useful is the CAT arc, which checks to see if the category specified by the phrase structure rule matches that of the input. The CAT arc might have the following form, given in LISP notation:

(5) (CAT DET t (setr DET *) (to NP/DET))

In this arc, "CAT" is a label telling the parser what sort of processing is necessary, in this case to check the category of the input word. The symbol "DET", for determiner, specifies the category that the phrase structure rule is looking for. The "t" is in the position where a test on the input might go. Such tests might include a check on noun-verb agreement, the presence of a particular feature in the word's lexical entry, or any other checks that might inform the parser of an ungrammatical sentence before it has gone too deep in the search space. In this case, the act of checking the category will tell whether or not the transition can be made, so no test is necessary and a dummy test (i.e., one that is always true) in this position allows processing to continue. The "(setr DET *)" is the action that stores the word of input, represented by *, under the name of its syntactic category. The "(to NP/DET)" tells the parser where to go next, in this case to the state after the DET transition has been made. Other transitions are programmed in the same way with appropriate tests and actions. The main difference is in the first label signifying what kind of processing the parser needs to do in order for the transition to be completed.

One of the most important kinds of transitions, or arcs, is the "PUSH" arc. This accounts for the recurrence of constituents like the NP in many rules. It signals the parser that it needs to temporarily leave the present rule and process the rules for expanding the NP.
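Concretely, the NP network in (4), and a simple sentence network that PUSHes for it, might be written out as lists of arcs in the style of (5). The state names, the getr register accessor, and the form of the POP actions are illustrative assumptions made for this sketch; the actual networks of the grammar appear in Figure 3.1.

    ;;; The NP network of (4), written as arcs in the notation of (5).
    (defparameter *np-network*
      '((NP
         (CAT DET t (setr DET *) (to NP/DET)))        ; e.g. "the"
        (NP/DET
         (CAT N t (setr N *) (to NP/N)))              ; e.g. "man"
        (NP/N
         (POP (list 'NP (getr DET) (getr N)) t))))    ; rule complete: build the NP

    ;;; A sentence network that PUSHes for the NP and VP sub-networks.
    (defparameter *s-network*
      '((S
         (PUSH NP t (setr SUBJ *) (to S/NP)))         ; leave S, run the NP network
        (S/NP
         (PUSH VP t (setr PRED *) (to S/VP)))
        (S/VP
         (POP (list 'S (getr SUBJ) (getr PRED)) t))))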
Such PUSHed-for constituents are represented by separate networks, and because they can be used over and over again, the size of the grammar is small in relation to the size of the sentence structure it can account for. When the NP is completed, the transition is complete and the parser returns to the original network to continue working on a particular phrase structure rule. Other kinds of arcs include WRD arcs, which allow a phrase structure rule to specify that a particular word be in the sentence; JUMP arcs, which allow for processing to proceed to a different state without any actions or checking being done; MEM arcs, which require the word of input to be one of a particular set of words; and POP arcs, which signal that a network is complete and provide for building larger structures out of the constituents most recently processed.

A special kind of arc called the VIR arc helps to account for movement in English. There are certain English sentences, such as wh-questions, in which a constituent moves from its original position in the sentence into a new position at surface structure. The object of the sentence might be moved out of object position and replaced with a wh-word, as in the sentence

(6) What did John eat?

The underlying structure of sentence (6) is

(7) John did eat what.

The ATN processes (6) by using a "hold-list" and VIR arcs to return the moved constituent to its original position. When the computer encounters the wh-word "what", it is processed as an NP and put on the hold-list. A VIR arc occurs in the grammar at the place where the constituent has moved from (i.e., in object position of sentence (7)). When a VIR arc is encountered in the grammar, instead of looking for a constituent in the string of input, the NP is taken from the hold-list to satisfy the phrase structure rules. With this mechanism, the ATN can undo transformations that have occurred to derive the surface structure it is processing. The VIR arc is used to signify the positions from which a constituent could have originated, and the "hold-list" allows the parser to wait before assigning a constituent its position in the final sentence structure. This process is used whenever sentences are left with "holes" after movement has occurred, as is the case with relative clauses as well as the wh-movement explained here.

2.3 The Prediction Problem

The ATN has proven very useful for problems in natural language processing. A simple parser that works as I described in the previous section is not useful for prediction, however, because it follows one parse at a time and backtracks if it reaches a dead-end. To do prediction, the system must take a partial sentence and return the features and category of the next input word. This cannot be achieved by following a single parse at a time because often there are category or attachment ambiguities in sentences that can only be resolved when the entire sentence is known. For example, consider the simplified grammar network below:

(8) [ATN network here]

If the system only has the partial sentence "the" and the word "gold" is entered, the parser does not know whether "gold" is an adjective or a noun. The ATN parser as I have described it would choose one path (e.g., the top one) and follow it down the network as far as it can go. Consequently it may not adequately predict the category of the word that follows "gold": with network (8) it will predict a noun to be next, as if the sentence were "the gold ring is beautiful."
However, it is just as likely that a verb could be next, as if the sentence were "the gold is in the bank." As a result of this "one-at-a-time" method of parsing, the ATN would be forced into continual back-tracking each time a word is entered. With each path change, possible predictions would be unaccounted for because the computer would only be following one path at a time. If the computer took "gold" to be an adjective, at that point in the processing it could not predict that the next word could be a verb as well as a noun. Given the wide variety of structures available in English, this means that the prediction would be incomplete for a significant number of cases. In addition, this would make the processing much slower, and therefore it would be difficult to use this system for the kind of spontaneous communication that AAC devices strive to offer.

2.4 Solving the Problem

The predictor I have built solves the prediction problem by traversing the search space in Figure 2.1 in a breadth-first, rather than depth-first, manner. This means that it completes the first transition in each phrase structure rule before going deeper in the tree. Using the previous example, the predictor will analyze "gold" as a noun in one parse and as an adjective in another. When the next word is entered, it may eliminate one of these interpretations, or else continue both parses until the entire sentence has been entered. Either way, the parser is able to know at any point in the sentence what type of word could be next, because it is holding all possible structures for the words entered thus far. This means the processing is done in a non-deterministic fashion, and therefore complete predictions can be made because the computer has not committed itself to a particular parse that may turn out to be different from what the user intended.

This also means that when the entire sentence has been entered, the parser may have built more than one structure for a particular sequence of words. For example, consider the sentence:

(9) The man told the woman that he loved the story.

The user could have meant either that the indirect object is "the woman that he loved" and the object is "the story", or that the indirect object is "the woman" and the object is "that he loved the story." The predictor will output both these structures so that they can easily be analyzed further by a semantic or pragmatic processor that can choose the correct interpretation based on the context the user has built.

2.5 Implementation Details

This predictor has been implemented in SUN Common LISP. There is also an early implementation in Franz Lisp. It is intended as a component in a more complex communication system and as such, there has been little attention paid to the user interface. Presently the system is activated with the command "predict", with a partial sentence given as its argument. The system goes as far as it can with that partial sentence and then goes into a "break package" where the user can decide between two methods of proceeding. The first method allows the next word in the sentence to be entered. It incorporates that word into the partial parses already created by the system and then reenters the break package. At each point when a parse is completed, the system prints out that parse tree. These parses are not final analyses, as they can still be given additional words that will be incorporated into them. The system halts only when there is no possible way of continuing the parse given the input it already has; in this case the predictor returns "nil."
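To make the breadth-first strategy of Section 2.4 concrete, the following sketch reduces each live analysis to a list of grammar goals still to be satisfied; a prediction is then simply the set of terminal categories that any surviving analysis could accept next. The toy grammar and lexicon are invented for illustration, and the real predictor of course builds full structures rather than just tracking goals.

    (defparameter *toy-grammar*
      '((S  ((NP VP)))
        (NP ((N) (DET N) (DET ADJ N)))
        (VP ((V) (V NP)))))

    (defparameter *toy-lexicon*
      '(("the" DET) ("gold" ADJ N) ("ring" N) ("is" V) ("beautiful" ADJ)))

    (defun terminal-p (category)
      (null (assoc category *toy-grammar*)))

    (defun expand-states (state)
      "Rewrite STATE (a list of pending goals) until its first goal is a terminal category, returning every alternative goal list."
      (if (or (null state) (terminal-p (first state)))
          (list state)
          (loop for expansion in (cadr (assoc (first state) *toy-grammar*))
                append (expand-states (append expansion (rest state))))))

    (defun advance (states word)
      "Keep every analysis whose next terminal category matches WORD."
      (let ((categories (cdr (assoc word *toy-lexicon* :test #'string-equal))))
        (loop for state in states
              append (loop for expanded in (expand-states state)
                           when (and expanded (member (first expanded) categories))
                             collect (rest expanded)))))

    (defun predict-categories (states)
      "Return the set of terminal categories that any analysis could accept next."
      (remove-duplicates
       (loop for state in states
             append (loop for expanded in (expand-states state)
                          when expanded collect (first expanded)))))

    ;; Example: after "the gold" both readings survive, so both a verb and a
    ;; noun are predicted:
    ;;   (predict-categories (advance (advance (list '(S)) "the") "gold"))
    ;;   =>  (V N)

Holding all of these alternatives at once is what makes it possible to rule words in or out without committing to a single parse.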
The second method is where the prediction is carried out. Presently it is tailored to eliminate inappropriate words (e.g., possible abbreviation expansions) from a list entered by the user. The system returns only those words which could be next according to the grammar it runs from and the partial sentence it has already processed. Once the eliminations have been made, the break package resumes and the user is again given the two choices for proceeding until he or she signals a desire to quit. [4]

The grammar that the predictor uses to create parses and judge grammaticality is described in more detail in Section 3. Recall that part of the function of the grammar arcs is to carry out tests of particular features on the input words to determine if it is efficient to carry out a particular rule. These features are encoded in the dictionary entries for each word that the computer knows. The dictionary and the features within it are described in more detail in Section 4 on the lexicon. In order to help with adding words to the computer's dictionary, there is an auxiliary package used at run-time to check each word entered against those in the dictionary. When the computer finds a word it does not know, this package allows that word to be added automatically to the dictionary. The package gives the user directions for entering the appropriate features for each word to ensure that the dictionary entry is of the form the grammar expects (cf. Section 4).

3 THE GRAMMAR

The substance of the syntactic predictions comes from the grammar the predictor runs from, and so if the predictions are to be complete, the system's grammar must be complete. Up to now, the biggest objection to using grammars for augmentative communication has been that a sufficiently complete one is thought to be difficult to construct by hand. I have confronted this objection by making my grammar the embodiment of a linguistic theory called X' (pronounced "X-bar") theory, which provides an abstract, generalized description for a multitude of structures. Its conventions make a complete grammar easy to construct and modify, while also providing a mechanism to describe the specific restrictions on what kinds of constituents can occur where. These restrictions are crucial to this project because the game of prediction is to eliminate syntactic categories that are not possible in a particular context.

3.1 X' syntax

All of the three most popular syntactic theories, Government and Binding (GB), Generalized Phrase Structure Grammar (GPSG), and Lexical-functional Grammar (LFG), have adopted forms of X' theory because of its explanatory power (Sells, 1985). This power comes from invoking a purely structural description of syntax rules. Before X' Theory it was common to talk about syntax rules as sets of phrase structure rules like those below:

(10) NP --> N
     NP --> N PP
     VP --> V
     VP --> V NP
     VP --> V PP

Notice that these rules serve two purposes: to tell what particular constituents the phrases on the left hand side of the rule can be broken into and to give the position, or structure, of these constituents. However, there are similar structures among different phrases; for example, both an NP and a VP can be rewritten as just an N or a V, respectively. In addition, they can both be rewritten with the N or V plus another constituent to the right.
X' Theory captures this similarity by claiming that the basic syntactic structure is given by the following template:

(11) [Diagram here]

This generalized structure captures patterns found in the internal structure of many different kinds of phrases (e.g., noun phrases, prepositional phrases, verb phrases): they all have a head constituent, complements, and various other modifiers that can come either before or after the head. In the template, the head is represented by the variable X. This is the element that gives the phrase its character; for example, the head of the NP is an N, the head of a PP is a P, and the head of the VP is the V. The entire phrase is said to be a "projection" of the head; a structure built up using this template is called a "maximal projection" because the entire template structure has been used. It is also referred to as an "X-double-bar", reflecting the fact that it is the highest level of the template and includes all head modifiers. [5]

I have adopted a formulation of X' theory that takes the intermediate X' level as the site where modifiers like adjectives and prepositional phrases are attached (Radford, 1988). The modifier, also called an adjunct, can be attached on either side of the X' so that it could be in either a pre-head position, as is the case with adjectives, or a post-head position, as with prepositional phrases. The X' is recursive in that modifiers expand it into another X' level, generating the structure below: [6]

(12) [Diagram here]

This means the intermediate X' level plays a crucial role in the syntactic structure. [7] The most basic construction of this intermediate level (i.e., not considering adjuncts) includes the phrasal head and its complement, which is also called its argument and which is structurally its sister. The head of the phrase subcategorizes for its complement, meaning that it requires a particular kind of complement to occur with it. For example, if the phrasal head were a verb, it would subcategorize for a particular kind of object (often a noun phrase), and if the head were a preposition, it would subcategorize for a noun phrase. More about the kinds of complements heads can subcategorize for will follow below, but for a complete discussion, cf. (Van Dyke, 1991b).

The sister to X' is called a specifier, which has the function of expanding the X' completely into a maximal projection. There is some discussion among X' theorists about what kind of constituent can occur as a specifier. The position I am accepting for constructing this grammar is elucidated by Thomas Ernst, who treats the position as "the response of syntax to the need to give special status to some particular peripheral element: demonstratives, subjects, etc." (Ernst, 1990, p. 25). [8] He proposes that the specifier position is to be used to ensure that certain elements are always in phrase-initial position; for example, the following data, borrowed from (Ernst, 1990, p. 9), show that it is not possible to reorder certain words to produce emphasis:

(13) a. A fancy new car.
     b. A new fancy car.

(14) a. The many honest men.
     b. *The honest many men.

Thus, under Ernst's definition and contrary to a widely known Chomskian theory, specifiers do not have to be maximal projections. [9] This formulation is also described in (Quirk et al., 1985), where it is explained that some words, such as determiners and particles, are "single tokens, complete to themselves."
While it is not always the case that specifiers are non-maximal projections (I will soon show that a sentence's subject occupies a specifier position), the single tokens occur often enough for it to be computationally inefficient to require that all specifier positions be maximal projections. In my grammar, the specifier position can contain either a lexical item or a maximal projection, depending on the head of a particular specifier's projection.

In the syntactic "template" I have been discussing (i.e., structure (11) above), only the head element is required. The specifier, complement, and adjuncts are all optionally present for any given instantiation of the template. The presence of the complement is determined by the head itself. [10] For example, consider the case of the head being a verb, which would make the XP a VP. A transitive verb like "kill" requires a complement, as shown in the following data:

(15) John killed Mary.
     *John killed.

Conversely, an intransitive verb like "cry" does not take a complement:

(16) John cries.
     *John cries Mary. [11]

The power of the phrasal head to dictate its complement is called subcategorization. [12] This ability is common to all phrasal heads, although there are some behaviors phrasal heads may or may not exhibit that have prompted syntacticians to distinguish different types. Here I will adopt the taxonomy explicated by Susan Rothstein (Rothstein, 1991), who distinguishes three kinds of heads: lexical, functional, and minor.

Lexical heads are those like verbs or prepositions. They determine the character of their maximal projection so that if the head is a V, the projection is a verb phrase (VP), or if the head is a P, the projection is a prepositional phrase (PP). These words have very specific requirements on the number of complements they must have, and this number must always be satisfied for the phrase to be realized. [13]

The second kind of head is called a functional head because of its functional role in a phrase. These heads determine the nature of their maximal projection just as the lexical head does, but they are not necessarily realized as a lexical item. For instance, the INFL, which is considered the head of a sentence, is typically said to be realized by the tense and agreement of the main verb, and sometimes as a modal. Since the head of the sentence is called INFL, X' theory describes the canonical, top-level sentence as an IP, or "Inflection Phrase." The three major heads in this category are INFL, which holds inflection, DET, which holds the determiner and also determines agreement, and COMP, which holds the complementizer "that" in embedded clauses and whose specifier position holds WH-question words. I will discuss each of these in more detail in the implementation section below.

The minor heads are also functional heads in the sense that they are not frequently lexicalized, but unlike the previous two head types, the minor heads do not determine the nature of their projection. This is determined instead by the complement they subcategorize for, which in some cases depends on their position in the sentence. Typical minor heads are degree words like "too" or "as", and the head is therefore called DEG, for degree. The "degree phrase" is discussed more completely in the implementation section below. [14]

The grammar I have built is based on these heads and the structures they generate via subcategorization. Since there is a finite number of instantiations for these head types, I have eliminated the need for phrase structure rules that tell exactly what constituents go where. Instead, the structure of (11), reproduced below, provides the order and relationships of constituents to one another, and all other information comes from the requirements (i.e., the subcategorization) of the specific words themselves.

(11) [Diagram here]

The grammar has both top-down and bottom-up motivation in the sense that while the template structure must be satisfied, the input itself determines how that will be done through subcategorization. This simplifies writing the grammar because now it's not a question of including enough phrase structure rules, but of providing the structure and allowing the subcategorization to determine when it is appropriate.
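As an illustration of what this reliance on subcategorization amounts to computationally, the following is a minimal sketch of complement requirements encoded as a lexical feature. The feature name, the entries, and the flat list-of-categories format are assumptions made for the sketch, not the actual lexicon format described in Section 4.

    ;;; Hypothetical lexical entries carrying a subcategorization frame.
    (defparameter *verb-entries*
      '((kill (subcat (NP)))      ; transitive: requires a noun phrase complement
        (cry  (subcat ()))        ; intransitive: takes no complement
        (put  (subcat (NP PP))))) ; requires both an object and a location

    (defun complements-allowed-p (verb complements)
      "True if the categories in COMPLEMENTS match VERB's subcategorization frame."
      (equal complements
             (cadr (assoc 'subcat (cdr (assoc verb *verb-entries*))))))

    ;; Examples, mirroring (15) and (16):
    ;;   (complements-allowed-p 'kill '(NP))  =>  T    ; "John killed Mary."
    ;;   (complements-allowed-p 'kill '())    =>  NIL  ; "*John killed."
    ;;   (complements-allowed-p 'cry  '(NP))  =>  NIL  ; "*John cries Mary."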
3.2 Implementing X' Theory

Recall in my discussion of the ATN formalism that I characterized a PUSH arc as the mechanism's way of handling frequently occurring constituents such as noun and prepositional phrases. The PUSH arc gives the ATN more power because it allows recursive processing (i.e., a given network can refer back to itself, as in the PS rule NP --> NP PP). It was created for its computational power; without recursion, grammars are enormous and frequently redundant. In this project I propose a linguistic motivation for the use of PUSH arcs: namely, as the means for creating a projection in the X' theory template. All maximal projections are the result of two PUSH arcs: a PUSH arc for the X' level and a PUSH for the X'' level. Each PUSH arc adds more structure to the level below, so that the implementation of the template in (11) is done with the following networks:

(17) [Network diagrams here]

For every instantiation of X there is a pair of networks like these. Each time a maximal projection appears in a network, it is done by PUSHing for that XP. Within that XP, an X' constituent is PUSHed for because it is a projection of the head. Only the head is a non-PUSHed element, just as only the head is a non-maximal projection in a rule. [15]

In Figures 3.1 and 3.2 on the following pages, it is clear how the structural template of X' theory, as implemented in the two networks in (17), has greatly simplified the task of writing a grammar. Figure 3.1 shows the series of networks that make up the entire grammar I have implemented. Figure 3.2, reproduced from (Bates, 1983, p. 217-219), shows the networks for the LUNAR grammar, which was one of the earliest and most complete ATN grammars. The X' theory grammar has greater coverage than the LUNAR grammar and is a sleeker implementation. This makes modifying it to cover more syntactic phenomena (e.g., topicalization, ellipsis) easier because it is immediately clear where new pieces of grammar should be added. In the following sections I discuss the details of implementing network pairs for each of the heads. In Section 5, I will further discuss the implications and deviations of this implementation from Government and Binding Syntax, whence I borrowed this X' theory grammar framework.

3.2.1 Sentence level

As I mentioned previously, GB theory posits a functional category called "INFL" for "inflection" in order to apply the X' template to the sentence level. This category holds the tense and person-number agreement information for the sentence. This information can be lexicalized in the form of a modal if the clause is finite or as a "to" in the case of non-finite (e.g., infinitival or participial) clauses.
INFL cannot hold a modal and a "to" at the same time because sentences must be either finite or non-finite. It is possible for INFL to be unlexicalized, as when there is just a single verb in the sentence. In this case the agreement and tense are considered to be in INFL, but lexicalized only on the main verb. The INFL head and the X' template of (11) produce an analysis for a sentence traditionally analyzed as [NP VP] with the following form:

(18) [Diagram here]

The head of this structure is INFL, the complement of the head is the VP, and the specifier of I' is the NP. The overall structure is a maximal projection called an INFLection-phrase or IP.

In this implementation, the INFL node holds a code showing tense and agreement, as X' theory stipulates, but it also holds modals and the auxiliary verbs "have" and "be", a characteristic that does not follow typical X' theory (cf. (Radford, 1988, p. 312)). [16] Thus, if a sentence includes all the auxiliary verbs (e.g., "The man could have been sleeping"), the INFL node my grammar will produce has the following form:

(19) (INFL (AGR 3SGPAST) (MODAL COULD) (HAVE EN) (BE ING))

The constituent AGR is the agreement marker wherein number and person agreement is combined with the verb tense. This is also different from the standard X' theory conception of INFL as containing two binary variables: one for agreement and one for tense. I have combined these variables in order to account for lexical items whose agreement changes with their tense. Consider the sentences below:

(20) The man hit the ball yesterday. [Intended past tense.]

(21) *The man hit the ball today. [Intended present tense.]

The problem here is that the same lexical item can be past or present, but have different agreement requirements. The sentence in (21) is ungrammatical because, in the present tense, "hit" can only agree with non-third person singular subjects. It was necessary to find a separate way to account for these two lexicalizations because it is not possible for the lexicon to have two separate entries for the same lexical item. By using these single variables, agreement can be preserved: it is possible to specify that "hit" can agree with 1SGPRESENT, 2SGPRESENT, 1PLPRESENT, 2PLPRESENT, 3PLPRESENT, and all of the PAST values (i.e., 1SGPAST, 2SGPAST, etc.). [17] The details of these lexicon entries are explained more thoroughly in Section 4.

Implementing the INFL in this way means that the verb phrase contains only a single V, which is the main verb in the sentence. This is in contrast to the branching complex VP structure that would be necessary to hold "have" and "be" when they occur [18]. Eliminating these branching structures means extra PUSHes have been eliminated, and this makes the processing more computationally efficient. In preserving the participial agreement requirements of "have" and "be," and also checking for these requirements in tests on the grammar rules, I have maintained the ability of these verbs to determine structure without any extra computation. The same result is produced; the branching structure is just different from what X' theory predicts.

A problem with the IP structure shown in (18) arises when sentences have embedded IP's. Consider:

(22) The committee may insist [that the chairman resign].

The IP structure in (18) cannot accommodate the "that", which is a complementizer introducing the embedded clause. It would be incorrectly analyzed as a determiner like "the" because it introduces the entire embedded sentence.
Similarly, it would not be an object of "insist" because it has no reference as an independent pronoun in this sentence. This problem is solved in X' theory with the functional head COMP, or Complementizer. The COMP is the head of the overall sentence, holding the "that" in embedded clauses or being empty for top-level sentences. [19] The IP is in the complement position in the template, making the structure of sentences, called CP's for "Complementizer Phrase", like the following:

(23) [Diagram here]

That the complementizer is the head of the sentence is supported by the fact that complementizers influence the content of the INFL node. Any clause that has a complementizer must have an INFL that is compatible with it in terms of being finite or non-finite. For example, a non-finite complementizer like "for" cannot introduce a sentence with a finite INFL:

(24) *They are anxious [for you make up your mind.]

But a non-finite INFL is acceptable:

(25) They are anxious [for you to make up your mind.]

For the rest of this discussion I will refer to sentences as CP's headed by COMP and sentences without complementizers as IP's headed by INFL (cf. footnote 19).

3.2.2 Determiner Phrases

The noun phrase can be organized around a functional head of the same type as INFL. This was demonstrated by Steven Abney when he argued for a DET head that is lexicalized by the determiner (Abney, 1987). The rationale behind this analysis stems from the fact that the determiner can subcategorize for its complement like lexical categories and therefore must have a more central role in the noun phrase than simply specifier, as shown in (8). For example, in English there are some determiners that either can or cannot take complements. Consider:

(26) That is terrific. [Complement not required.]

(27) *The is terrific. [Complement required.]

(28) The boy is terrific. [Complement required.]

(29) *A boys is terrific. [Particular type of complement required.]

Thus, it is evident that the determiner acts just like the verb in shaping its projection. [20] An analysis of DET as the functional head of the noun phrase serves to unify the X' theory analysis of the noun phrase with that of the sentence. The structure of the DP, reminiscent of IP, is shown below. Note that a relative clause can serve as either a complement or an adjunct. This is discussed in more detail in Section 3.2.5.2.

(30) [Diagram here]

The most useful effect of this structure for the purposes of this project is that the specifier positions of DP and NP provide structural positions to account for the various types of words that can occur there. Without these positions, such words would have to be considered adjuncts occurring before the determiner or between the determiner and the noun; however, that analysis would impose no significant ordering among these words. Consider the following noun phrases:

(31) a. [DET a] [NP-SPEC dozen] roses
     b. [DP-SPEC all] [DET 0] [NP-SPEC six] men
     c. *[DP-SPEC six] [DET 0] [NP-SPEC many] men
     d. [DP-SPEC all] [DET the] [NP-SPEC six thousand] men
     e. [DP-SPEC all] [DET the] [NP-SPEC many] men
     f. [DP-SPEC many] [DET the] [NP-SPEC 0] men

Furthermore, if adjectives can come between determiners and nouns in relatively free order, why can't other words such as the quantifier "many"? Recall the example given in (13-14), reproduced here as (32):

(32) a. [DET A] [ADJS fancy new] car
     b. [DET A] [ADJS new fancy] car
     c. [DET The] [Q many] [ADJ honest] men
     d. *[DET The] [ADJ honest] [Q many] men
It seems, therefore, that there are certain types of words that must appear in particular positions within the noun phrase, and the extra specifier positions provided by (30) allow for these words. Following this analysis, I have implemented a structure wherein the Determiner is the head of the noun phrase. This means the noun phrase is a projection of Det and hence called a DP (i.e., determiner phrase). I will refer to what have traditionally been called noun phrases as Determiner Phrases throughout the rest of this work.

3.2.2.1 Pronouns

In addition to arguing that Det is the head of the noun phrase, Abney argues for generating pronouns in the Determiner position. While this is not to say that pronouns are determiners, it is a way of accounting for the fact that, like determiners, pronouns have a primarily functional status. They provide agreement features like number, person, and gender and therefore influence the form of the verb that follows them. In this grammar, pronouns have been implemented in the same position as determiners. This has the nice effect of allowing pronouns to be used with non-empty noun heads, as in:

(33) We students are tired.

Here, just as in regular determiner phrases, the number of the noun must agree with the determiner phrase head, in this case the pronoun (cf. "*We student are tired." and "*We student is tired."). This implementation also allows the possibility for determiners like "that", which do not require NP complements, to stand alone in DP's:

(34) That is ridiculous.

Note that the specifier position of DP's contains only certain kinds of determiners, like "all", which can precede the articles. The other positions in the X' theory template for DP's are filled as follows: articles and pronouns are the only elements in the head-of-DP position. There is only one kind of complement for DP's: an NP. It is possible to have adjective phrases occurring before the NP (cf. the next section), but these occur in adjunct position rather than complement position. [21]

3.2.3 Degree Phrases

To accompany his interpretation of the DP as a projection of a functional head, Abney posits an abstract head to describe adjective phrases and adverb phrases. Abney's explanation of this head, which he calls DEG, for "degree", makes it a head of the kind that Rothstein calls functional. Rothstein herself further refined the analysis of the DEG head by calling it a "minor" functional head. While empty with simple adjectives and adverbs (i.e., phrases such as "white hair" or "run quickly"), it is lexicalized by words like "how", "this", "that", "so", "too", "as", "more", "less", "all", "most", and "least". [22] Like INFL and DET, DEG can also carry inflection, because it is the place where the comparative -er and superlative -est are specified.

With this functional head, Abney tries to capture the generalities between adjective and adverb phrases, claiming that they are the projections of the same node. The structure he posits, however, has the adjective phrase as the only kind of complement, with the adverb phrase treated either as a subcategory of adjective phrases or as occupying the specifier position of an otherwise empty structure. Rothstein argues against this analysis in her characterization of the degree phrase as a "minor" functional category. [23] In this implementation, I accept Abney's analysis of these kinds of words as quantifiers and classify them as a special kind of adjective that can be a complement to the DEG phrase.
Consequently, my DEG phrase has three possible complements: adjective phrases, adverb phrases, and quantifier phrases. Each of these phrases is a full maximal projection which can have prepositional phrases as complements. [24] I have also accepted significant portions of Rothstein's position, as this implementation allows the DEG word to subcategorize for a particular complement. The major deviation of my implementation from her analysis comes from the difficulty in representing a structure whose character is not determined until the complement is parsed. She suggests that, depending on the complement of the DEG phrase, it is called either an ADVP, an ADJP, or a QP. As a result, the structure of my degree phrase appears more like Abney's, in that it is always labeled "DEGP" rather than AP or ADVP as Rothstein would prefer. The character of the complement is explicit in the structure produced, however, so no explanatory power is lost.

Part of the problem her argument presents for this implementation stems from the fact that in suggesting this behavior for her minor heads, Rothstein must violate one of the most basic tenets of X' Theory and a notion crucial to the theory's usefulness in this project: that a head always determines the category of its projection. She does this to maintain "forward" subcategorization between the head and the complement, rather than allow "backwards" subcategorization (cf. footnote 10). As I discussed previously, the requirements of prediction make backwards subcategorization inappropriate for this implementation. Therefore I must agree with Rothstein's analysis that a degree head chooses its complement, which in turn determines the character of the phrase. However, this does not necessitate a completely "forward" subcategorization because the complement that is chosen is often dependent on the structural position of the degree phrase. Since the parser knows what that position is for any given degree phrase, it is possible to allow only particular complements to occur in particular places (e.g., ADVP's do not occur as noun modifiers). In this case, subcategorization is not completely dependent on the head of the DEG phrase, and so it does not have the same functional importance as the DET and INFL functional heads. Thus, while the DEGP analysis may not conform to the standards of a normal functional head, it is appealing because it provides a nice way to capture the similarities between the adjective, adverb, and quantifier phrases.

As part of my implementation of DEGP's and my decision to categorize quantifiers as special kinds of adjectives, I allow them to occur in the specifier position of the DEGP. This accounts for data like the following:

(35) a. I have [SPEC-DEGP much] [DEG too] [COMP-DEGP much] work to do.
     b. A [SPEC-DEGP few] [DEG too] [COMP-DEGP many] men attended the dance.
     c. A [SPEC-DEGP few] [DEG 0] [COMP-DEGP 0] men attended the dance.
     d. [SPEC-DEGP Several] [DEG 0] [COMP-DEGP 0] men attended the dance.

These data could also be accounted for if the quantifier were in the specifier of NP position. This would also allow a simplified structure for (35c-d) because there would not be an empty-headed degree phrase:

(36) a. A [SPEC-NP few] men attended the dance.
     b. [SPEC-NP several] men attended the dance.

Because of this, I have also allowed quantifiers to occur in the specifier of NP position. This eliminates unnecessary computation and solves Abney's problem of having an empty NP specifier (cf. (Abney, 1987, p. 341)).
I will discuss NP's in more detail in the following section. One final type of phrase that Abney singles out is the "mensural phrase". These phrases have a cardinal or ordinal determiner and a mensural noun [25] as their head. Examples are: (37) a. six weeks b. ten times c. a dozen These phrases are closely related to the head of the DEG phrase when it is lexicalized, as in (38) a. ten times as quickly b. six inches too long c. a dozen fewer books Because these DP's have such a specific structure and because they closely modify the degree word, I have implemented mensural phrases as "MP's" in a separate network. They are allowed to occur in the specifier position of the DEG phrase, meaning that if the DEG is not lexicalized, the DEGP has an empty head. As was observed previously, this is not particularly difficult because the DEG head is often empty in the case of simple adjectives and adverbs. Thus, the overall structure that I have implemented as a DEG phrase (DEGP) is like: (39) [Diagram here] This single structure accounts for adjective, adverb, and quantifier phrases, including all of their degree modification. This implementation facilitates modifying the grammar because the kinds of phrases that specify quality, quantity, and description are unified into one structure. Degree Phrases can occur as adjuncts to N' in DP's, as the specifier of PP's, and as adjuncts in VP's. 3.2.4 Adjective Phrases In addition to this structural account of adjective phrases as complements of degree phrases, I have implemented a preference for adjective ordering. In this way I attempt to describe the scope particular adjectives have over others and explain why, in the data below, (40a) seems "better formed" than any of (40b-f). (40) a. rich white American man b. ??white rich American man c. ??American rich white man d. ??rich American white man e. ??white American rich man f. ??American white rich man Based on the work of Quirk and Bache, I have distinguished three types of adjectives (Quirk, 1985), (Bache, 1978) which occur in a particular order. I will discuss the specifics of this implementation in Section 4, as the distinctions are encoded as part of the lexicon entry of the adjective. With regard to implementation as Degree Phrases, each adjective is part of its own degree phrase so that the possibility of data like the following can be accounted for: (41) The [DEGP six feet too [ADJ long]], [DEGP five feet too [ADJ wide]] table. [26] I have implemented this ordering by assigning a number to each type of adjective. When an adjective degree phrase is encountered, if its complement adjective does not have a number equal to or larger than that of any previous adjective degree phrase, the adjective sequence will be considered ungrammatical and the sentence will not parse. 3.2.5 Noun Phrases The implementation of the Determiner Phrase and Degree phrase I have described above accounts for many structures normally thought of as part of noun phrases. This is a side effect of this version of X' theory, which considers the noun phrase to be a complement of the determiner phrase (cf. structure (30)). Nevertheless, the noun phrase is still a full maximal projection which has a specifier and complement of its own. As expected, the head of the noun phrase is a noun, and this head can have either prepositional phrases or relative clauses as complements. Restrictive relative clauses can also serve as adjuncts to N, as will be discussed in the section on relative clauses below (cf. 3.2.5.2).
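As an aside, the adjective-ordering check described in section 3.2.4 above can be pictured with a minimal sketch. The three-way classification and the example class assignments below are assumptions standing in for the Quirk/Bache distinctions actually encoded in the lexicon; the real grammar performs this test on the arcs as each adjective degree phrase is parsed.

# Hypothetical lexicon fragment: each adjective type is assigned a number,
# and a prenominal sequence is accepted only if the numbers never decrease.
ADJ_CLASS = {
    "rich": 1,        # e.g. evaluative adjectives come first
    "white": 2,       # e.g. color adjectives come next
    "American": 3,    # e.g. provenance adjectives come last
}

def adjective_sequence_ok(adjectives):
    """Reject a sequence whose class numbers decrease at any point."""
    last = 0
    for adj in adjectives:
        rank = ADJ_CLASS.get(adj)
        if rank is None:          # no ordering information for this adjective
            continue
        if rank < last:           # out of order: the sentence will not parse
            return False
        last = rank
    return True

print(adjective_sequence_ok(["rich", "white", "American"]))   # True:  (40a)
print(adjective_sequence_ok(["white", "rich", "American"]))   # False: (40b)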
As I mentioned in the discussion of degree phrases, the specifier position of the NP can hold a quantifier, but is more often an empty position. The noun phrase fits into the overall structure of the determiner phrase as in the tree below: (42) [Diagram here] In the following sections I will discuss the parts of this structure in more detail. 3.2.5.1 Prepositional Phrases Prepositional phrases have a straightforward implementation as shown in the structure below: (43) [Diagram here] The specifier position holds a Quantifier Phrase, which, as discussed previously, is implemented as a Degree Phrase with a quantifier complement. The head is a preposition and the complement a Determiner phrase, which itself could contain prepositional phrases. 3.2.5.2 Relative Clauses Relative clause implementation is a little trickier. Relative clauses are CP's (i.e., a sentence in the X' notation) that are either introduced by "that" or a wh-pronoun, which is the head of the CP, or not introduced, in which case the CP has an empty head. The tricky part is that the CP is missing a determiner phrase, or other phrase (e.g., prepositional phrase), usually either in subject or object position. This missing phrase is the one containing the noun that the relative clause is modifying. My implementation captures this relationship between the relative clause and the moved phrase by putting a copy of the moved noun head back into its original position in the relative clause. This copied noun serves as a kind of "trace" in the noun's original position and maintains number and reference in the relative clause. [27] Agreement with this trace noun occurs just as it would with a normal noun. The schematic tree structures in (45a) and (45b) below show that, following the analysis given in (Radford, 1988), relative clauses can appear in two places in the noun phrase. This is to account for the differences seen in the following noun phrases, which are taken from (Radford, 1988, p. 218): (44) a. the claim [CP [COMP that] you made a mistake] b. *the claim [CP [COMP which] you made a mistake] c. *the claim [CP [COMP 0] you made a mistake] d. the claim [CP [COMP that] you made] e. the claim [CP [COMP which] you made] f. the claim [CP [COMP 0] you made] In this example, the NP's in (44a-c) are "Noun Complement Clauses" which occur as complements to the noun head. They require the complementizer "that" to introduce them and can be introduced by no other relative pronoun. Conversely, it is evident in (44d-f) that these noun phrases are grammatical regardless of what, or if any, relative pronoun introduces them. These relative clauses are called "Restrictive Relative Clauses" and serve only to give extra information about the noun. They are therefore in adjunct position in the overall noun phrase structure. Thus, a noun phrase with a noun complement relative clause like that in (44a) will have the structure in (45a) in my implementation: (44a) The claim that you made a mistake. (45a) [Diagram here] For a noun phrase with a restrictive relative clause like that in (44d), my grammar will produce the structure in (45b): (45b) [Diagram here] In practice, the grammar will produce both structures for all relative clauses having the "that" complementizer, and other factors like semantics must be applied to choose the correct interpretation. For structures with relative pronouns in them, the grammar will only produce structures like that in (45b).
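The copy-back mechanism just described can be sketched as follows; the data structures and the toy agreement test are hypothetical stand-ins for what the ATN actually does with its registers, but they show how restoring the head noun lets agreement inside the relative clause be checked.

# When the parser finds the subject position of a relative clause empty, it
# fills it with a copy of the modified noun (the "trace") so that number
# agreement with the relative clause verb can still be enforced.
def fill_relative_gap(head_noun, clause_tokens):
    """Insert a copy of the head noun after the relative pronoun/complementizer."""
    if clause_tokens and clause_tokens[0] in {"who", "which", "that"}:
        trace = {"TRACE": head_noun["form"], "number": head_noun["number"]}
        return [clause_tokens[0], trace] + clause_tokens[1:]
    return clause_tokens

def verb_agrees(trace, verb_form):
    """Toy check: a singular trace requires the -s present-tense verb form."""
    return verb_form.endswith("s") == (trace["number"] == "singular")

head = {"form": "man", "number": "singular"}
clause = fill_relative_gap(head, ["who", "walks", "the", "dog"])
print(verb_agrees(clause[1], "walks"))   # True:  "the man who walks the dog"
print(verb_agrees(clause[1], "walk"))    # False: "*the man who walk the dog"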
The most important thing to note is that to analyze the relative clauses, this grammar uses the same structure and implementation as a main clause CP, as I discussed in Section 3.2.1. Therefore, anything that occurs at the level of the main clause, like for example Degree Phrases, can also be accounted for in relative clauses. The only difference is that when the parsing reaches the place in the sentence where the noun is missing, it puts in the noun copy and continues through the parse. This allows for a great economy of structure and accounts for all possible variation within the constituents of relative clauses. 3.2.6 Verb Phrases and Complementation Under the current X' theory interpretation of sentence structure, VP is the complement of the INFL head. INFL dictates the main verb's person, number, and tense, but the main verb is still the head of its own maximal projection. The relationship between INFL and the verb phrase is shown below: (46) [Diagram here] While syntacticians are not convinced about what structure actually occurs in the specifier position of the VP, based on the data given below, I have implemented an optional ADVP (i.e., DEGP with ADVP complement) in this position. (47) a. John [DEGP [DEG 0] [ADVP quickly]] ran down the street. b. Jane was [DEGP [DEG so] [ADVP completely]] exhausted that she could barely walk. Adverb degree phrases have also been implemented as adjuncts on V', as have prepositional phrases, finite clauses, and particle words such as "up" or "away". [28] These are licensed to occur with particular kinds of intransitive verbs; because they are adjuncts, the verb does not subcategorize for them in argument positions. Recall that the head of the determiner phrase selects for an NP complement and that the head of the degree phrase selects for an adjective phrase, adverb phrase, or quantifier phrase complement. In the same way, the head of the verb phrase subcategorizes for its complement. Here, the range of possible complements is much greater, and whether a particular kind of complement can occur depends on the verb itself. For example, the verb "believe" can be followed by a full sentence as in (48) John believes Mary is sleeping. but with the verb "take", this structure is ungrammatical: (49) *John takes Mary is sleeping. Instead, "take" needs a single noun phrase object and perhaps a prepositional phrase following it, such as: (50) John takes Mary to the store. Conversely, the verb "believe" can not have this structure, but can also take a single noun phrase object, such as: (51) John believes Mary. My implementation accounts for the different kinds of verbs and the different complements they can take with codes in the dictionary entry for each verb. The codes are based on the verb pattern codes in the Oxford Advanced Learner's Dictionary (OALD) (Cowie, 1989). The details of these codes are explained in Section 4, but here I will discuss the types of complements they can specify. [29] There are six types of verb complements which occur in various combinations according to the number of arguments a verb subcategorizes for. These are the adjective phrase (DEGP with AP complement), determiner phrase (DP), prepositional phrase (PP), sentential phrase (CP), small clause (SC), and exceptional clause (EC). Adjective phrases as complements occur primarily with linking, or copular, verbs such as (52) a. John is intelligent. b. The sky became dark.
Determiner phrases may also occur with copular verbs as in (53a), but are most common as direct objects such as (53b-c): (53) a. John is a farmer. b. The dog eats his food. c. The man hit the ball. Prepositional phrases are usually adjuncts to the verb phrase, as in (54a), but they can also occur as objects, as in (54b): (54) a. The man was crying in the living room. b. The meeting lasted for two hours. In (54b) the prepositional phrase is a complement rather than an adjunct because the sentence "The meeting lasted" is ungrammatical without it (cf. "The man was crying"). It is evident that the verb "lasted" requires a complement because of sentences like "The meeting lasted a week." Sentential complements such as that in (48) consist of an entire CP, even though there is no complementizer introducing the embedded clause "Mary is sleeping" in (48). Other examples of this complement are sentences like (55) a. Jane thought [that Mary would take care of her]. [30] b. The man hoped the train would come on schedule. c. The man hoped that the train would come on schedule. The implementation of this is simply to allow a CP to be PUSHed for in the complement position. Because it is a full CP, all of the structures possible in the main clause (i.e., degree phrases, relative clauses) are also possible in the complement clause. Small and Exceptional Clauses only appear in complement positions and have therefore not been mentioned previously. They lack elements that are part of ordinary CP's: for example, a small clause does not have tense because it does not have an INFL node and therefore can not independently constitute a sentence. Small Clauses (SC) also do not have a Complementizer node, meaning that they can not be introduced with words like "that" and can not serve as relative clauses because there is no structural position for the relative pronoun. Instead, they are of the form [DP XP] where XP is any of the other phrasal possibilities (i.e., DP, DEGP, VP, and PP). Examples of Small Clause complements, taken from (Radford, 1988, p. 324), are given below: (56) a. I believe [the President incapable of deception.] (DP DEGP) b. I consider [John extremely intelligent.] (DP DEGP) c. They want [Zola off the team.] (DP PP) d. Could you let [the cat into the house.] (DP PP) e. Most people find [Syntax a real drag.] (DP DP) f. Why not let [everyone go home.] (DP VP) There is sufficient evidence in the syntax literature showing that these structures are in fact clauses rather than a sequence of different complements (cf. (Radford, 1988, p. 324-331) and references there). I will not go into this here except to stress that there is a difference between Small clause structures and structures with multiple objects. This becomes clear with verbs that allow single complements versus those that allow more than one. The Small clause is a single constituent and therefore can account for one role in the sentence (i.e., object, direct object, location, etc.). If a verb allows more than one complement to account for different roles, as in a verb that takes both a direct and an indirect object, the small clause could only fill one of these roles. I have implemented Small Clauses in a separate network of the form [DP XP] where the XP can be a DEGP, a DP, a PP, or one of three kinds of VP's: gerundive V-ing forms, participial V-en forms, or infinitival V-0 forms.
The network has the structure: (57) [Diagram here] It is possible for the subject DP to be either overt or covert, in which case I will fill this position with a "TRACE" marker, indicating that it is lexicalized elsewhere in the sentence. [31] Exceptional Clauses also differ from ordinary CP's, as they lack the Complementizer position. They do have an INFL node, but it must always contain "to" and therefore requires the verb to have an infinitival head. Consequently, their basic structure is of the form: (58) [Diagram here] The verbs which normally take EC's as complements are usually "cognitive" verbs, such as those shown in (Radford, 1988, p. 317): (59) a. I believe [the President to be right.] b. I've never known [the Prime Minister to lie.] c. They reported [the patient to be in great pain]. d. I consider [my students to be conscientious.] Exceptional Clauses have been implemented as a separate network of the form [DP to VP] where the VP is infinitival and the DP can either be overt or the covert DP called "PRO". [32] The network has the form: (60) [Diagram here] As I mentioned previously, when these complements occur depends on the particular verb in the sentence and the code it has in its lexicon. The lexicon is therefore crucial to determining the structure of a sentence. This is predicted in Government and Binding Theory by the Projection Principle, which states that "representations at each syntactic level are projected from the lexicon, in that they observe the subcategorization properties of lexical items. (Sells, 1985)." I will discuss this in more detail in Section 5, but would like to stress that this "lexical determinism" should in no way be considered a problem of the implementation. It does make using the grammar for alternate computational applications somewhat demanding on the computational environment because the lexical entries must be tailored as described in Section 4. This is not unexpected, because it is exactly this type of dependence that the linguistic theory predicts. [33] 3.3 Example Parse Trees A bountiful selection of parse trees showing the structures this grammar produces is available in (Van Dyke, 1991c). 4 LINGUISTIC IMPLICATIONS 4.1 Government and Binding Syntax Here I would like to characterize this implementation with respect to Government and Binding syntax, whose formulation of X' theory I have adopted in the grammar described here. I explained in Section 3 that I have made use of the functional categories INFL, COMP, DET, and DEG. The existence of these abstract categories is what distinguishes the GB interpretation of X' theory from that of other syntax theories such as Generalized Phrase Structure Grammar and Lexical-Functional Grammar. I chose to adopt these abstract, functional categories because they facilitated applying the X' structure template to all instantiations of X. This allowed subcategorization to be used for explaining not only the relationship between verbs and complements, but also that between determiners and head nouns. Using the DEG functional head enabled capturing the similarities between adjective and adverb phrases. But by far the strongest reason for adopting the DET and DEG functional heads, is because through them I was able to develop a structure to account for the various types of constituents that occur before the head determiner or between the determiner and the noun. 
The branching structure the functional heads provide allowed me to eliminate ordering tests on the grammar arcs; tests which would have been necessary to ensure well-formed word sequences. For example, without a structural position at the beginning of the determiner phrase (i.e., the specifier position), in order to account for the sequence "all the many men" I would have needed a looping determiner category arc. Instead I can implement a series of arcs with different category and feature requirements motivated by an overall grammatical theory. This makes the implementation something more than an ad hoc solution to the problem. It was also important to be able to apply the X' template to any position in a sentence, including the sentence level itself, because this facilitated producing a complete grammar. It was therefore possible to confront a common objection to using grammars in augmentative communication: that complete ones are difficult to construct. A complete grammar is crucial for communication devices because a user will be using the device to produce normal, everyday language. Consequently they must be able to produce all of the syntactic structures that a human language user could think of constructing. The X' template eliminates this difficulty because it gives a standard structure that underlies all syntactic structures. The problem is reduced to providing structures in the grammar and using subcategorization to eliminate those that are inappropriate for particular lexical items. Borrowing these concepts from GB, the system performs a number of functions in the way GB predicts. It accepts a surface structure and undoes the transformations that show up there so that the structure it produces is akin to the sentence's deep structure. Movement occurs from argument positions and is only allowed to land at appropriate landing sites. Landing sites can be easily determined with this formalism because the process of undoing a transformation must be explicitly invoked (i.e., a hold action is performed in a grammar arc). In this way, the grammar controls for what constituents can move and to where: if a constituent is encountered that is not in an appropriate landing site, then the parser will be unable to complete a parse for that sentence. In this way, Government and Binding theory's constraints on NP and WH-movement are obeyed, even though they are not overtly implemented as such (i.e., there is no instance in the grammar or processing when I invoke some procedure called "NP-movement.") [34] But even with these GB movement characteristics, a GB motivated X' structure, and adherence to subcategorization in the way that GB's over-arching Projection Principle suggests, the structures this grammar produces are often not those that GB theory would predict. [35] For example, this grammar analyzes a relative clause as the result of a movement of a DP out of an embedded clause and into a higher position in the tree. This analysis is significantly different from the Government and Binding theory analysis, which posits that the position seen in surface structure is the position where the noun phrase originated, or was "base generated." To represent the movement analysis, the grammar restores the "moved element" to its original position in the sentence while at the same time leaving a copy of that element in the position where it was found. [36] This has the strange side-effect of generating two occurrences of the moved item in the deep structure of the sentence. 
This action is crucial for the prediction to be effective. Consider, for example, a sentence where the subject of the relative clause is the subject of the main sentence: (1) The man who walks the dog was late today. When the predictor has the partial sentence "The man who", if the moved noun phrase "the man" is not put back into the position following "who", the predictor will not be able to eliminate the sentence (2) *The man who walk the dog was late today. Without the trace, the predictor will not know what kind of inflection the verb of the relative clause must have. This problem motivates the necessity of a deep structure with the form: (3) The man [who the man walks the dog] was late today. It is necessary for both occurrences of "the man" to be in the sentence in order for that clause's subject-verb agreement to be checked. [37] Conversely, there are some cases of movement where the deep structure produced by my grammar is faithful to the GB analysis. Consider, for example, the structure produced for control sentences like the following: (4) a. John expected Mary to wash the dishes. b. John expected to wash the dishes. The GB analysis would predict that the real surface structure of these sentences is like: (5) a. John expected [Mary to wash the dishes]. b. John expected [PRO to wash the dishes.] Here, PRO is an empty category that refers back to John. The structure that my grammar will produce is exactly that in (5a-b). The parser neither performed nor undid any movement to derive this structure; this is contrary to the GB analysis, in which "Mary" is the object of "expected" because the NP moves in order to satisfy the Theta Criterion. [38] The grammar developed here can therefore be characterized as one that borrows significantly from GB syntax, but is not a complete representation of the Government and Binding Theory of grammar. This stems from the fact that GB is a descriptive theory of grammar. Its definitions of C-command, government, and the empty category principle are theoretical definitions used to describe relationships between words or within syntax trees. The relationships must hold for a sentence to be grammatical: they are a way of describing what has gone wrong in an ungrammatical sentence's derivation. It became clear in this project that GB is not well suited for parsing or computational implementation. As a theoretical framework, its practitioners give little emphasis to the details of grammar structures (i.e., what gets attached where). Those structures that are analyzed in detail tend to be only the anomalous or "interesting" ones that test the limits of GB principles. The result is that there is no standard interpretation of X' theory attachments or of particular kinds of complements. For example, it is important for a computational implementation to know where adjectives and adverbs can be attached in the structure or what kinds of constituents can appear in the Specifier positions of all instantiations of X, but these are topics that have received little treatment from the theorists. Nevertheless, more than any other grammatical theory, Government and Binding syntax was easily adaptable to the requirements of the ATN computational formalism. The formulation of X' theory found in GB exploits the generality of the XP structural template to the fullest.
[39] In addition, it allows minimal changes to a generic lexicon of English (i.e., one that includes syntactic categories and little more than perhaps number and agreement information), and this is most often all the information that computational systems have access to. In contrast, grammatical theories like Generalized Phrase Structure Grammar and Lexical-Functional Grammar exploit syntactic features and complicated coding systems provided in the lexical entry of each word. Since subcategorization is the only real idiosyncrasy of a GB grammar, it is easy to integrate a GB-based grammar into other already existing systems, such as the flexible abbreviation system discussed in Section 1. 4.2 Human Language Parsing I have claimed that the system presented here offers a more "natural" solution to the communication problem facing disabled users. However, my purpose in building this grammar and basing this system on a linguistic theory is not to make claims about what our natural grammar or syntax rules might look like. The goal of this grammar is to direct the syntactic predictor so that a person using it will produce only grammatical sentences. I do not suggest that the sentence parsing this grammar facilitates is a model for what goes on in the user's head, only that the two procedures are exploiting the same kinds of regularities in language. The fact that the predictor can use simple characteristics about words rather than contrived statistics is what makes this a natural parsing solution. The distinction between what a grammar tells us about human language parsing and the parsing process itself has been discussed by Roger Berwick and Amy Weinberg as the "Type Transparency Hypothesis" (Berwick & Weinberg, 1983). They question to what extent a computational grammar of English can perform sentence parsing the way the theory of grammar predicts; in other words, whether or not the grammar theory is equivalent, or transparent, to the method of parsing. Assuming for this discussion that GB makes particular claims about the parsing process, I have not preserved a transparency between these two components in my grammar and parsing implementation. Rather than explicitly implementing Government and Binding Theory notions like Government, C-command, and the Empty category principle, I have used these principles to guide the construction of the grammar. This means that while the parser does not explicitly check for the relationships these principles denote, they are implicitly at work within it because of the way the grammar has been constructed. For example, government is a relationship that describes what constituents a head can determine (or influence): it defines the scope of the head. [40] In most cases, government amounts to the sister relationship that holds between the head and its complement. Among others, the concept of subcategorization is said to occur under the relationship of government. This is exactly the case in my implementation: a verb or determiner subcategorizes for only its complement because that is the only position it governs. Government and Binding theory is an excellent guide for constructing a grammar, but I have found in this project that it is insufficient to describe the predictable qualities in language. This explains the deviations from GB-predicted structures that occur in my grammar, an example being my analysis of relative clauses.
It also explains how my grammar is licensed to produce a deep structure for control sentences like those GB predicts, without adhering to the method that GB claims brings them about. These results come from a pragmatic usage of grammar theory to attack a real-world problem. Willingness to reject the Transparency Hypothesis, as I have done here, and as Berwick and Weinberg have argued must be done, has brought about a simple and efficient solution with potential for widespread use in natural language understanding systems. 5 OTHER APPLICATIONS In Section 1, I described this project in relation to its application in Augmentative Communication. I described its usefulness for improving a flexible abbreviation system and as a syntax module for prediction systems in general. Other uses within Augmentative Communication can be found because the system does not prohibit using statistical information in addition to the syntax it exploits. For example, statistics could be used to rank predicted categories: the next word in the partial sentence "the gold" might have a higher probability of being a noun than a verb. Significantly, this work also has application outside the field of Augmentative Communication. A speech recognition system addressing issues similar to those I have described here has been developed at Carnegie-Mellon University (CMU) (Hauptmann, et al., 1988). The ANGEL speech recognition system shares my goal of applying linguistic knowledge to solving problems in language processing: in this case, analyzing speech input so that speech can control a machine's actions. The problem the CMU researchers must overcome is that analyzing speech input is a computationally difficult and costly task. Initial solutions are reminiscent of the flexible abbreviation expansion I have discussed previously. For example, CMU's ANGEL speech recognition system tries to solve this task by generating several hundred word candidates for every word actually spoken. Researchers are currently working to efficiently reduce the number of these possibilities by applying linguistic constraints as early as possible. To that end, they have developed the MINDS system, a Multi-modal, INteractive Dialog System (Young, et al., 1989), (Hauptmann, et al., 1988). MINDS tries to use knowledge gained from studies of discourse, especially notions of focus, user goals, and dialog structure, to reduce the computer's search space for determining what speech patterns could mean. The MINDS system uses discourse where my project uses syntax, but both systems attempt to predict what the user will talk about next. My project uses prediction to reduce the searching required by a disabled user when trying to identify the word he or she wants to use. Using prediction in this way increases the communication rate the user can achieve while communicating with his or her AAC device. Similarly, the MINDS system is able to improve speech recognition by reducing the searching required by the machine to identify the word it has "heard". This allows the machine's speech processing rate to increase. In addition, the MINDS project comes from a background similar to the one found in Augmentative Communication. Until MINDS, speech recognition was done with statistics of word frequencies and collocations. These were based on sequences of two or three words, called Markov models.
These same Markov models were used in previous AAC prediction systems, and the speech recognition systems suffered from the same problems found there: the two- and three-word transition tables give only limited success because their look-ahead is too small, and so they erroneously eliminate interpretations that turn out to be correct. Also, they are dependent on word frequencies gathered from relatively small amounts of data and so they may not be accurate. MINDS runs primarily on semantics, or concepts, that its discourse-tracking capability identifies. It combines these concepts with "a set of syntactic networks" to derive possible sentence structures for the concepts. This means the only syntax done is to determine the lexical realizations of the concepts; the syntax is not comparable to that of natural language users. Consider, for example, that within their Navy ship knowledge base, the frigate "Spark" has been established as being disabled. MINDS predicts the user will ask about the Spark's capabilities next. The semantic concepts for the dialog exchange are identified as follows: the "shipname" concept is restricted to the value "Spark", and any "ship-capabilities" concept is allowed. They then expand these concepts into the syntactic realizations of ways to refer to the Spark: they allow "the ship", "this ship", "the ship's", "it", "its", "Spark" and "Spark's". The notion of "ship-capabilities" generates the syntactic realizations of "all capabilities", "radar", "sonar", "Harpoon", "Phalanx", etc. They then combine these to generate a highly constrained search space of phrases like "Does it/Spark/this ship/the ship have Phalanx/Harpoon/radar/sonar?" or "What capabilities/radar/sonar does the ship/this ship/it/Spark have?". This works well in their constrained environment, but in real-world, unconstrained speech recognition, this type of syntactic generation would be impossible as there could easily be an infinite number of lexicalizations. If the system could use a syntactic prediction system like the one I have outlined in conjunction with the discourse and focus information, then recognition could be improved without depending on a restricted domain. It is not clear what role syntax plays in the MINDS system because they are mainly concerned with issues at a higher level of language processing (i.e., discourse and focus). Nevertheless, it would seem that when trying to recognize individual spoken words, the system would benefit from some syntactic prediction that could give information about the structure of the partial sentence and use this to predict the category of the next word. This would limit the search space for speech recognition in the same way it does for abbreviation expansion. Given that the motivation for the speech recognition problem is so similar to that of the project I have described here, it is likely that syntactic prediction could be successfully applied to this field of research. 6 FUTURE WORK Here I have described my work aimed at making augmentative communication devices more efficient and "usable" for the disabled user. This work has focused on how syntax can be used to eliminate the possible expansions of a creative abbreviation entered at run-time. Using a parallel parsing strategy, I have found it possible to reduce the effort required of the user because he or she is offered only the grammatically appropriate words as abbreviation expansions.
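A minimal sketch of this filtering step is given below. The predictor interface and the toy lexicon are hypothetical stand-ins: in the real system the allowed categories come from the parallel ATN parse of the partial sentence, and the candidate words come from the flexible abbreviation component.

# Offer only those candidate expansions whose syntactic category the
# predictor allows after the words typed so far.
LEXICON = {
    "table": "NOUN", "tablet": "NOUN", "tumble": "VERB", "terribly": "ADVERB",
}

def predict_categories(partial_sentence):
    """Stand-in for the syntactic predictor: after a determiner such as
    'the', assume only nouns and adjectives can follow."""
    if partial_sentence and partial_sentence[-1] == "the":
        return {"NOUN", "ADJECTIVE"}
    return {"NOUN", "VERB", "ADJECTIVE", "ADVERB"}

def expand_abbreviation(partial_sentence, candidates):
    """Filter candidate expansions by the predicted categories."""
    allowed = predict_categories(partial_sentence)
    return [word for word in candidates if LEXICON.get(word) in allowed]

# "tbl" typed after "set the": the verb and adverb candidates never reach the user.
print(expand_abbreviation(["set", "the"], ["table", "tablet", "tumble", "terribly"]))
# -> ['table', 'tablet']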
A pleasant result of this is that the user is less likely to be confused by the words the computer offers as choices since they are always syntactically relevant to the situation. Other ways that the list of possibilities can be reduced and relevancy be maintained could come from applying other kinds of linguistic knowledge of the sort humans use to understand language. For example, discourse tracking is a kind of pragmatics that could be used to give the system knowledge like "since we have been talking about eating breakfast, it is probably the case that "tbl" stands for "table" and not "tablet". Semantics could also be used to reflect the fact that if the user has used the verb "drink" then we expect the following NP to be some inanimate, consumable object rather than a person's name or things like "table". This sort of information would add to the power that syntactic prediction gives the system and eventually the user will have an extremely small and precise set of words to choose from. More work could also be done at the syntax level, in the form of adding to the kinds of structures the grammar is able to handle. For example, currently the grammar can not handle coordination, ellipsis, or topicalization; all of which are reasonably common in spoken language. The appendix at the end of this work includes a test suite that demonstrates the coverage of the grammar as it stands at this writing. From this it is easy to see where additions to the grammar could be made. I feel that the present implementation could also be improved through a more critical analysis of the Degree phrase, especially regarding the lexicalizations of the degree head and the relationship of the head to the elements it subcategorizes for. In particular, it would be useful to re-analyze the status of the quantifier phrase and constituents that can occur in specifier position. In (Ernst, 1991), a positional interpretation of these items is given that may allow a more exact specification of word order in the pre-head noun positions of DP's. The task will be to find an explanation for these kinds of phrases that does not sacrifice capturing the generalities between them (cf. Section 3) Finally, in order for this system to be most useful, it must be implemented in conjunction with a large dictionary that includes the subcategorization codes it requires. Suggestions for carrying out this process are mentioned in (Van Dyke, 1991b), but most important will be to automate the process of assigning the subcategorization codes. I have mentioned previously that this is facilitated by working with learner's-type dictionaries like Longman's or Oxford's which exist in computerized form. I have provided the starting point for this by including references to the Brown corpus tags and explicit descriptions of the requirements for assigning a particular code to a word. On this basis, the task of generating a large dictionary for the system should prove easy to overcome. 7 CONCLUSION This thesis represents a successful application of linguistic information to the problem of augmentative communication. A syntactic predictor has been built which relies on a syntactic grammar of English to speed the communication rate possible with an AAC device. Because the system draws on the same rules for creating a sentence that the disabled user exercises as he or she forms sentences, the computer is able to intelligently anticipate the word-form the user will type next. 
This technology is a first step toward endowing the computer with the ability to disambiguate language in order to achieve understanding. Instrumental to the success of this system is how well the grammar it exploits has captured the generalities of the language. Through adopting a Government and Binding theory of English syntax, I have provided for a significant number of constructs, including relative clauses, yes-no and wh-questions, passives, and both matrix and embedded sentences with 39 different types of verb complements. The use of X' theory has also allowed my grammar to be uncomplicated and therefore amenable to additions. I believe that with this grammar, I have developed a strong base to which other constructions could easily be added. This makes the grammar highly applicable to many research problems, including analyses of English and modeling human language use in a machine. Thus, not only have I devised an enhancement for disabled users' communication, but I have proceeded toward a more complete computational model of language. CITED BIBLIOGRAPHY Abney, S. (1987). The English Noun Phrase in its Sentential Aspect. Ph.D. dissertation, MIT. Allen, J. (1987). Natural Language Understanding. CA: Benjamin/Commings. American Heritage Dictionary, Revised Second College Edition. (1976). Boston: Houghton Mifflin Company. Baker, B. R., & Stuart, S. (1985). Communication Mapping for Semantic Compaction Systems. Proceedings of the 8th Annual Conference on Rehabilitation Technology, Memphis, TN: RESNA, 122-124. Bache, C. (1978). The Order of Premodifying Adjectives in Present-Day English. Odense University Studies in English. vol. 3. Bates, M. (1978). The Theory and Practice of Augmented Transition Network Grammars. In L. Bloc (ed.), Natural Language Communication with Computers. New York: Springer. Berwick, R.C. (1981). Computational Complexity and Lexical Functional Grammar. Proceedings of the 19th Annual Meeting of the Association for Computational Linguistics, Stanford, CA: ACL,7-12. Berwick, R.C. & Weinberg, A. (1983). The Role of Grammars in Models of Language Use. Cognition, vol. 13, 1-61. Chomsky, N. (1986). Barriers. Cambridge, MA: MIT Press. Cowie, A.P. (1989). Oxford Advanced Learner's Dictionary of Current English, Fourth Edition. Oxford: Oxford University Press. Demasco, P.W., Lillard, M.,& McCoy, K.F. (1989). Word Compansion: Allowing Dynamic Word Abbreviations. Proceedings of the 12th Annual Conference on Rehabilitation Technology, New Orleans, LA: RESNA, 282-283. Ernst, T. (1990). A Phrase Structure Theory for Tertiaries. In S. Rothstein, ed., Perspectives on Phrase Structure: Heads and Licensing. Syntax and Semantics 26, New York: Academic Press. Francis.W. & Kucera, H. (1982). Frequency Analysis of English Usage: Lexicon and Grammar. Boston: Houghton Mifflin Company. Frazier, L. (1991). Parsing Novel Words. Presented at Cognitive Science Colloquium, May 13, 1991. University of Delaware. Foulds, R.A. (1980). Communication rates for non-speech expression as a function of manual tasks and linguistic constraints. Proceedings of the International Conference on Rehabilitation Engineering, Toronto: RESNA, 83-87. Foulds, R.A., Baletsa, G., Crochetiere, W.J., & Meyer, C. (1976). The Tufts Non-vocal Communication Program. Presented at the Conference on Medical Devices in Rehabilitation. Boston. Garside, R., Leech, G., & Sampson, G., eds. (1987). The Computational Analysis of English. London: Longman. Griffith, H.W. (1985) Guide to Symptoms, Illness, and Surgery. 
Tucson, AZ: Body Press. Hauptmann, A.G., Young, S.R., & Ward, W.H. (1988). Using Dialog-Level Knowledge Sources to Improve Speech Recognition. Proceedings of the 7th National Conference on Artificial Intelligence, Saint Paul, MN: AAAI, 729-733. Jackendoff, R. (1977). X Syntax. Cambridge, MA: MIT Press. Kaplan, R. & Bresnan, J. (1981) Lexical-functional Grammar: A Formal System for Grammatical Representation. In Bresnan, ed., The Mental Representation of Grammatical Relations. Cambridge, MA: MIT Press. Keulen, F. (1986): The Dutch Computer Corpus Pilot Project. M.A. Thesis, University of Nijmegen. Marcus, M.P., Santorini, B., & Magerman, D. (1990). First Steps Toward an Annotated Databaseq of American English. Department of Computer and Information Science, Technical Report MS-CIS-90-46. Philadelphia, PA, University of Pennsylvania. McCoy, K.F., Demasco, P., Jones, M., Pennington, C., & Rowe, C. (1990). A Domain Independent Semantic Parser for Compansion. Proceedings of the 13th Annual Conference on Rehabilitation Technology, Washington, D.C.: RESNA, 187-188. Miller, L.J., Demasco, P.W., & Elkins, R.A. (1990). Automatic Data Collection and Analysis in an Augmentative Communication System. Proceedings of the 13th Annual Conference on Rehabilitation Technology, Washington, D.C.: RESNA, 99-100. Quirk, R., et al. (1985). A Comprehensive Grammar of the English Language. London: Longman. Radford, A. (1988). Transformational Grammar. Cambridge: Cambridge University Press. Rothstein, S. (1991). Heads, Projections, and Category Determination. To Appear in Kathleen Leffel and Denis Buchard (eds.), Anthology of Phrase Structure Theory (Tentative title). Dordrecht: Kluwer. Sells, P. (1985). Lectures on Contemporary Syntactic Theories. CSLI Lecture Notes, no. 3. Stum, G., Demasco, P.W., McCoy, K.F. (1991). Automatic Abbreviation Generation. Forthcoming, RESNA. Swiffin, A. L., Arnott, J.L., & Newell, A.F. (1987). The use of syntax in a predictive communication aid for the physically handicapped. Proceedings of the 10th Annual Conference on Rehabilitation Technology, San Jose, CA: RESNA, 124-126. Van Dyke, J. (1991a). Word Prediction for Disabled Users: Applying Natural Language Processing to Enhance Communication. Honors BA Thesis, University of Delaware. Van Dyke, J. (1991b). Tagging Guide for the X' Theory Grammar. Technical Report. Center for Applied Science and Engineering, A.I. DuPont Institute. Van Dyke, J. (1991c). An Annotated Test Suite for the X' Theory Grammar. Technical Report. Center for Applied Science and Engineering, A.I. DuPont Institute. Wehrli, E. (1988). Parsing with a GB Grammar. In U. Reyle & C. Rohrer, eds., Natural Language Parsing and Linguistic Theories. Dordrecht: Kluwer. Woods, W.A. (1969). Augmented Transition Networks for Natural Language Analysis. Harvard Computation Laboratory Report No. CS-1, Cambridge, MA: Harvard University. Yang, G., McCoy, K., Demasco, P. (1990). Word Prediction Using a Systemic Tree Adjoining Grammar. Proceedings of the 13th Annual Conference on Rehabilitation Technology, Washington, D.C.: RESNA, 185-186. Young, S.R., Hauptmann, A.G., Ward, W.H., Smith, E.T., Werner, P. (1989). High Level Knowledge Sources in Usable Speech Recognition Systems. Communications of the ACM, vol. 32, no. 2, 183-193. Zagona, K. (1988). Verb Phrase Syntax. Dordrecht, Holland: Kluwer. REFERENCE BIBLIOGRAPHY Baumgart, D., Johnson, J., & Helmstetter, E. (1990). Augmentative and Alternative Communication Systems for Persons with Moderate and Severe Disabilities. 
Baltimore: Brookes. Berwick, R. & Weinberg, A. (1983). Syntactic Constraints and Efficient Parsability. Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, Cambridge, MA: ACL, 119-122. Berwick, R. (1985). The Acquisition of Syntactic Knowledge. Cambridge, MA: MIT Press. Borer, H. (1990). V+ing: It Walks like an Adjective, It Talks like an Adjective. Linguistic Inquiry, vol. 21, no.1, 95-103. Bowers, J.S. (1981) Theory of Grammatical Relations. Ithaca: Cornell University Press. Bresnan, J.W. (1979). Theory of Complementation in English Syntax. New York: Garland Publishing. Dowty, D.R., Karttunen, L. & Zwicky, A.M. (1985). Natural Language Parsing. Cambridge: Cambridge University Press. Emonds, J.E. (1985). A Unified Theory of Syntactic Categories. Dordrecht, Holland: Foris Publications. Ernst, T. (1991). The Phrase Structure of English Negation. Unpublished manuscript, University of Delaware. Grimshaw, J. (1982). Subcategorization and Grammatical Relations. Subjects and Other Subjects: Proceedings of the Harvard Conference on the Representation of Grammatical Relations. Bloomington: IULC, 35-56. Hawkins, J. (1990). A Parsing Theory of Word Order Universals. Linguistic Inquiry, vol. 21, no. 2, 223-261. Hoekstra, T., van der Hulst, H., & Moortgat, M., eds. (1981). Lexical Grammar. Dordrecht: Foris. Hornby, A.S. (1975). Guide to Patterns and Usage in English. London: Oxford University Press. Hudson, R. (1984). Word Grammar. Oxford: Basil Blackwell. Jackendoff, R. (1990). On Larson's Treatment of the Double Object Construction. Linguistic Inquiry, vol. 21, no. 3, 427-455. Kimball, J. (1973). Seven Principles of Surface Structure Parsing in Natural Language. Cognition, vol. 2, 15-47. Lasnik, H. & Uriagereka, J. (1988). A Course in GB Syntax: Lectures on Binding and Empty Categories. Cambridge, MA: MIT Press. Li, Y. (1991). X0 Binding and Verb Incorporation. Linguistic Inquiry, vol. 21, no. 3, 399-426. Melcuk, I.A. (1988). Dependency Syntax. Albany, NY: SUNY Press. McCawley, J.D. (1981). The Syntax and Semantics of English Relative Clauses. Lingua 53, 99-149. Musselwhite, C.R. & St. Louis, K.W. (1988). Communication Programming for Persons with Severe Handicaps. Boston, MA: College-Hill. Rothstein, S. (1985). Syntactic Forms of Predication. Bloomington: IULC Sager, Naomi. (1981). Natural Language Information Processing: A Computer Grammar of English and Its Applications. London: Addison-Wesley. Siegel, M. (1980). Capturing the Adjective. New York: Garland Publishing. Speas, M.J. (1990). Phrase Structure in Natural Language. Dordrecht, Holland: Kluwer Academic Publishers. Tennant, H.R., Ross, K.M., Saenz, R.M., Thompson, C.W., & Miller, J.R. (1983). Menu-based Natural Language Understanding. Proceedings of the 21st Annual Conference of the Association for Computational Linguistics, Cambridge, MA: ACL, 151-158. ENDNOTES [1] This is based on knowing 3 prior characters so the statistics would take the form of quadgrams such as "stri". [2] For this example I am assuming noun-noun modification is not allowed. [3] I am describing a top-down method because the computational formalism I am using in this project as well as the predictor I have constructed from this formalism work in a top-down way. It is also possible to traverse the search space in a number of other ways, including a bottom-up method. 
[4] It would be just as easy for the prediction to be carried out in a more general way (i.e., to produce a grammatical specification of the next word) but my desire at this point is to show how this solution can be easily applied to present problems in AAC. [5] In this formulation, I am assuming that a maximal projection can only have one double-bar level. [6] In structure (2), "Specifier" and "Complement" are not syntactic categories or constituents. They are only labels describing the function of the constituents that hold these positions in the structure. [7] For arguments on the syntactic reality of the X' level, cf. (Van Dyke, 1991a) and references there. [8] Presently I will show that the subject occupies the specifier position when the X' template is applied at the sentence level. [9] Chomsky's Maximality principle described in (Chomsky, 1986) says that all non-head elements in a maximal projection (i.e., instance of the X' template) must themselves be maximal projections. This view is accepted by Susan Rothstein and other syntacticians working in Chomsky's tradition. [10] It is argued in (Ernst, 1990) that the head can also determine what is in the specifier position, meaning that it is able to do what I will call "backwards subcategorization." This suggestion falls out from the fact that Ernst is not accepting the DP analysis of the noun phrase (discussed below) but still needs to account for the agreement facts that raised the argument for the DET functional head in the first place. As will become clear in what follows, I have accepted the DP analysis of the noun phrase and consequently, I explain the agreement as the DET subcategorizing (forward) for the noun phrase. This is necessary to do prediction, because it would not be useful to have to wait for the head to be entered before being able to determine if what has already been parsed in the specifier position is properly licensed. Instead, I specify particular entities that can exist in the specifier position based on knowing the instantiation of XP. For example, if the XP is an NP, I know that only certain elements can occur as prenominal modifiers, and these are coded into the grammar. [11] In the sentence "John cries for Mary" the "for Mary" would be an adjunct preposition, rather than a complement. [12] The phrasal head selects its complement, but it does not select for adjuncts. They are present only as "extra" information in the sentence. [13] Here I am alluding to the fact that they have theta-grids and that all theta roles must be discharged. This comes from GB's conception that syntax is a projection of lexical properties and so each head gets exactly the number of arguments that is specified for it in the lexicon. The "Theta Criterion" of GB ensures that heads and their arguments are in proper distribution. [14] Conjunctions are also said to be minor functional heads. Although they are not implemented here, they could be done in a way similar to the degree phrase. [15] This explanation recalls Chomsky's Maximality constraint; however, as I mentioned previously, I am not adopting this because of my position that specifiers can also be lexical items. Hence, the statement about heads being the only non-PUSHed element must be qualified in the case of a specifier. Sometimes specifiers are maximal projections, and therefore PUSHed constituents, and sometimes they are unprojected words.
Not requiring specifiers to be maximal projections is computationally preferable because when the specifier is a single word, unnecessary PUSHes do not need to be done only for the sake of the Maximality Constraint. [16] GB does hold that "have" and "be" can appear in INFL at surface structure if there is no modal in INFL. This is the result of "have" or "be" raising into the INFL position from their original position as part of a complex VP. It is therefore not unheard of for these words to appear in INFL; the major deviation in my implementation is that more than one of them can appear in INFL. [17] Prima facie it seems that an indication that "hit" can occur with all forms except 3SGPRESENT might be a better way to explain its distribution. The lexicon is set up so that the actual lexical entry allows for short cuts like this; however, it is useful to be able to specify which exact combinations a word can occur with, especially in the case of the personal pronouns. This implementation seems to facilitate handling all agreement, even though some realism is lost through positing these "agreement codes". Quid pro quo. [18] This structure would look like: [Diagram here] where the first branching verb, "have" in this case, would move into INFL position from its position shown here (Zagona, 1988). [19] All sentences are CP's, but because top level sentences rarely have a lexicalized Complementizer, they are sometimes referred to simply as IP's. [20] Further arguments for the DET functional head can be found in (Van Dyke, 1991a) and references there. [21] This is a significant point of departure from Abney's discussion of the structure of the DP. His analysis makes the adjective a complement to DP, a move motivated by his opinion that a structure where an X' expands into an X' is undesirable. I have adopted this very structure based on the arguments of (Radford, 1988, p. 179-196). Consequently, I am inclined towards positing that the adjective phrase is in adjunct position, attached to an X'. [22] Abney explains that the head being empty is not problematic since the same thing happens to DET when there is no overt determiner in noun phrases. [23] For a discussion of these two views and my position regarding them, cf. (Van Dyke, 1991a). [24] The implementation of prepositional phrases as complements of ADJP, ADVP, or QP is motivated by the discussion in (Radford, 1988, p. 241-246). This discussion does not conceive of these phrases as part of DEGP and therefore represents the overall structure of the ADJP, ADVP, and QP differently from what I have been describing. Nevertheless, I have found no other explanation for what can serve as constituents in an ADJP, ADVP, or QP and so I have adopted the portion of Radford's analysis that is appropriate. As the implementation stands, the specifier and adjunct positions of these phrases are always empty. [25] A noun which specifies a countable unit, such as "dozen", "bushel", "bundle", "feet". [26] The grammar allows no punctuation so that a phrase like "six feet too long five feet too wide" may actually occur. [27] The noun head rather than the entire DP is sufficient as a trace because the noun is all that is necessary to maintain number and reference. It is used because it is the most accessible structure at the point in the parse where relative clauses have been encountered (i.e., a DP has not been completed because the relative clause is part of it and so a full DP is not available to move back to its original position).
The use of the word "trace" here is not equated with any of the traces in GB (i.e., NP-trace or WH-trace), and is only meant to recall that notion. [28] Particles are taken to be bare adverbs and therefore, like my implementation of specifiers, do not adhere to Chomsky's Maximality constraint. Refer to footnote 15 for a further discussion of this. [29] Example sentence parses for each verb code can be found in (Van Dyke, 1991c). [30] According to GB, this example shows that "that Mary would take care of her" is a clause separate from the main clause because "her" is a pronoun and not a reflexive, as in the ungrammatical sentence "*Jane(i) thought that Mary would take care of herself(i)." [31] This analysis accounts for the structure derived by subject raising even though the actual process of raising is not implemented. This structure is only possible with verbs having the code CNI. See (Van Dyke, 1991b) for details. [32] "Big PRO," as PRO is called, has a specific meaning and distribution in GB. This meaning is not pertinent to this project except to say that it is possible to have a PRO subject in an EC because the verb is always infinitival. [33] I would also like to note that of the three syntactic theories that use X' Theory (i.e., Generalized Phrase Structure Grammar (GPSG), Lexical-Functional Grammar (LFG), and Government and Binding Theory (GB)), GB demands the least amount of information of its lexicon. This is largely because it uses functional heads to build up structure according to X' theory. [34] This is a consequence of the fact that I am using a rule-based parser rather than a principle-based GB parser such as that described in (Wehrli, 1988). The principle-based parser works with a base-generated structure and explicitly applies the GB principles such as Binding, Theta-Criterion, Government, and the Empty Category Principle. In comparison, a rule-based parser focuses on surface structure, and applies pre-determined rules to assign a structure to the input sentence. [35] The Projection Principle, which applies at all levels of syntactic analysis (i.e., deep structure, surface structure, phonetic form, and logical form), was originally given by Chomsky in his Lectures on Government and Binding, 1981. The original formulation is given here, taken from (Sells, 1985): Representations at each syntactic level are projected from the lexicon, in that they observe the subcategorization properties of lexical items. [36] This occurs only with relative clauses and wh-questions. The movement done to analyze passive sentences is simply an exchange of argument positions. The trace DP's are generated only for clauses where there is a "hole" in the surface structure. [37] Recall from section 2 that in practice, only the noun "man" is replaced into the original position. The head noun of the determiner phrase holds all the agreement information necessary to correctly analyze the sentence and is therefore the only part of the moved constituent that must be maintained. [38] Notice that it is clear that "Mary" serves the object role in the sentence because if the name were to be replaced with the female pronoun, it would be the accusative pronoun "her" rather than a nominative "she". The Theta Criterion explains that verbs have particular theta roles which must always be discharged. The verb "expect" requires an object, so it discharges that role by causing the subject of the embedded clause to move into object position in the main clause.
[39] The versions of X' theory used in Generalized Phrase Structure Grammar and Lexical-Functional Grammar do not make use of functional heads and therefore they can not apply the abstract template to the sentence level or to minor categories. [40] I am assuming the definition given in (Sells, 1985): a governs b iff (a) a c-commands b, and (b) a is an X, i.e., (N, V, P, A, INFL), and (c) every maximal projection dominating b dominates a.
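The definition of government in [40] can be made concrete with a small sketch. The tree representation and the is_head/is_maximal flags below are hypothetical simplifications (the flags stand in for membership in (N, V, P, A, INFL) and for XP status); the sketch is only an illustration of the definition, not part of the implementation described in this report.

# Toy constituent tree with a check of the government relation from [40]:
# a governs b iff a c-commands b, a is a head, and every maximal projection
# dominating b also dominates a.
class Node:
    def __init__(self, label, children=None, is_head=False, is_maximal=False):
        self.label = label
        self.children = children or []
        self.is_head = is_head          # X0 category (N, V, P, A, INFL)
        self.is_maximal = is_maximal    # maximal projection (XP)
        self.parent = None
        for child in self.children:
            child.parent = self

def ancestors(node):
    while node.parent is not None:
        node = node.parent
        yield node

def dominates(a, b):
    return a in ancestors(b)

def c_commands(a, b):
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    branching = next((n for n in ancestors(a) if len(n.children) > 1), None)
    return branching is not None and dominates(branching, b)

def governs(a, b):
    if not (a.is_head and c_commands(a, b)):
        return False
    return all(dominates(m, a) for m in ancestors(b) if m.is_maximal)

# Toy VP "[VP [V hit] [DP the ball]]": the verb governs its sister DP, but not
# the noun inside it, because the DP is a maximal projection dominating "ball".
dp = Node("DP", [Node("the"), Node("ball")], is_maximal=True)
v = Node("V", is_head=True)
vp = Node("VP", [v, dp], is_maximal=True)
print(governs(v, dp))              # True
print(governs(v, dp.children[1]))  # False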