[MUSIIC]

Multimodal User Supervised Interface and Intelligent Control

A GENERAL OVERVIEW OF THE ROBOT ARM SYSTEM FOR THE MULTIMODAL PROJECT

Suggestions by Dan

The Multi-Modal User Direction (MMUD) system consists of five subsystems, each with its own computer:

* Speech recognition (SR) subsystem

* Robot arm control (RAC) subsystem

* Space and object mapping (SOM) subsystem

* Planning and execution supervisor (PES) subsystem

* Graphic simulations (GS) subsystem

The subsystems are described in more detail below, with special attention to what needs to be specified.

1.0 Speech recognition (SR) subsystem

1.1 PURPOSE

Accept speech input from user and send a representation of what was said to PES.

1.2 INPUTS

1. Speech is entered through the Dragon Dictate system.

1.3 OUTPUTS

A list of words that were spoken without long pauses between them is sent to PES. SR probably should not try to parse the list because (i) parsing might interfere with the recognition of the individual words and (ii) a knowledge base in PES might be needed to assist the parsing, e.g., to recognize user-defined terms and to supply context information. [Are my assumptions correct?]

1.4 SPECIFICATIONS NEEDED

1. Format for sending the list. Perhaps just a "(" followed by the words (separated by spaces), followed by a ")".
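
As a minimal sketch of the parenthesized format suggested above, the following Python functions encode and decode a word list. The function names are hypothetical, and the sketch assumes that a word never contains a space or a parenthesis; the actual format is still to be specified.

```python
def encode_word_list(words):
    """Encode recognized words as "(" + space-separated words + ")"."""
    for word in words:
        # Assumption: a word is non-empty and has no spaces or parentheses.
        if not word or any(ch in " ()" for ch in word):
            raise ValueError(f"invalid word: {word!r}")
    return "(" + " ".join(words) + ")"


def decode_word_list(message):
    """Recover the list of words from a string built by encode_word_list."""
    if len(message) < 2 or message[0] != "(" or message[-1] != ")":
        raise ValueError(f"malformed word list: {message!r}")
    # Splitting on whitespace yields an empty list for "()".
    return message[1:-1].split()
```

Under these assumptions, SR would send the string returned by encode_word_list(["move", "the", "cup"]), and PES would call decode_word_list on it before parsing.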

2.0 Robot arm control (RAC) subsystem

2.1 PURPOSE

Operate the Zebra robot arm.

2.2 INPUTS

1. Plans are received from PES for execution.

2. Requests for information about the state of the arm are received from PES. (These might be considered degenerate cases of plans.)

2.3 OUTPUTS

1. Reports about the results of plan executions. These might be reports of success, with any updating parameters that would be routinely useful to PES, or reports of failure, with details of the nature of the failure.

2. Replies to requests from PES. These would be parameter information about the arm that PES might need for planning purposes.

2.4 SPECIFICATIONS NEEDED

1. Specification of the plan language.

2. Specification of the information requests made by PES.

3. Specification of the reports made by RAC to PES regarding plan executions.

4. Specification of the replies made by RAC to PES to information requests.

5. Specification of a network protocol for transmitting data (plans, reports, replies, word lists, etc.) between subsystems. This should be based on the transmission of ASCII strings to make the protocol as machine-independent as possible. (The internal representation of numbers varies from machine to machine, and pointers are clearly out of the question.)
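
To illustrate the ASCII-string idea in item 5, here is a rough sketch of one possible line-oriented message format. Everything about it is an assumption made for illustration: the message kind "REPORT", the space-separated fields, the newline terminator, and the function names are hypothetical and not part of any agreed protocol.

```python
def encode_message(kind, fields):
    """Build one newline-terminated ASCII line: kind, then space-separated fields."""
    parts = [kind]
    for value in fields:
        # Numbers travel as decimal text, so no machine-specific binary
        # representation (and no pointer) ever crosses the network.
        parts.append(repr(value) if isinstance(value, float) else str(value))
    for text in parts:
        if not text or any(ch.isspace() for ch in text):
            raise ValueError(f"field is empty or contains whitespace: {text!r}")
    # encode("ascii") raises UnicodeEncodeError for any non-ASCII character.
    return (" ".join(parts) + "\n").encode("ascii")


def decode_message(data):
    """Split a received line into its kind and its fields (all as strings)."""
    line = data.decode("ascii")
    if not line.endswith("\n"):
        raise ValueError("message is not newline-terminated")
    kind, *fields = line.split()
    return kind, fields
```

The receiver converts each field back to a number with int() or float() as the message kind requires, so each machine keeps its own internal representation.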

3.0 Space and object mapping (SOM) subsystem

3.1 PURPOSE

Obtain information about the environment (topology and location of objects) through a vision system.

3.2 INPUTS

1. Command from PES to take pictures and locate laser spot.

2. Command from PES to take pictures and make topological map of environment.

3.3 OUTPUTS

1. Three-dimensional coordinates of laser spot sent to PES.

2. Reply after making a topological map of environment. Question: should SOM send the whole map to PES, or should it keep it to itself and allow PES to ask questions about the map as it needs to? In other words, should the calculations involving the map (such as object recognition and path checking) be done in PES or SOM?

3.4 SPECIFICATIONS NEEDED

1. Specification of commands from PES to SOM.

2. Specification of replies to PES. This might include replies transmitting the whole map, or it might include the commands and replies involving map calculations done by SOM.
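
The two alternatives raised in 3.3 can be contrasted with a small sketch. It is only an illustration: the class name, the method names, and the representation of the map as a set of occupied grid cells are all assumptions, since the form of the map has not been decided.

```python
class MapService:
    """Hypothetical stand-in for SOM, holding the map as occupied grid cells."""

    def __init__(self, occupied_cells):
        self._occupied = set(occupied_cells)

    # Option A: SOM sends the whole map and PES does its own calculations.
    def whole_map(self):
        return sorted(self._occupied)

    # Option B: SOM keeps the map and answers specific questions from PES.
    def is_occupied(self, cell):
        return cell in self._occupied

    def path_is_clear(self, cells):
        """Return True if no cell along a proposed path is occupied."""
        return not any(cell in self._occupied for cell in cells)
```

Under option A the reply specification must describe the whole map; under option B it must instead list each question PES may ask and the form of each answer.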

4.0 Planning and execution supervisor (PES) subsystem

4.1 PURPOSE

Plan the actions of the robot arm, supervise the execution of the plan, and carry out a dialog with the user.

4.2 INPUTS

1. List of words from SR. These are parsed into user commands and replies.

2. Replies from RAC. These are reports of the success or failure of an executed plan, or replies to information requests.

3. Replies from SOM.

4.3 OUTPUTS

1. Commands to RAC. (See RAC specifications.)

2. Commands to SOM. (See SOM specifications.)

3. Commands, requests, questions directed to the user.

4.4 SPECIFICATIONS NEEDED

(in addition to those listed for other subsystems)

1. A grammar for all the commands, requests and questions coming from the user.

2. A grammar for all the commands, requests and questions directed to the user.

3. Specification of the dialogs that can take place between user and MMUD. This also might look like a grammar.
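
As a toy illustration of what a grammar for user commands might look like, the sketch below parses a word list from SR against a single rule, command := VERB ["the"] OBJECT. The rule, the vocabulary, and the function name are hypothetical placeholders; the real grammar is one of the specifications still needed.

```python
VERBS = {"move", "grasp", "release"}   # hypothetical command verbs
OBJECTS = {"cup", "book", "pen"}       # hypothetical object names


def parse_command(words):
    """Parse  command := VERB ["the"] OBJECT  and return (verb, object) or None."""
    if not words or words[0] not in VERBS:
        return None
    rest = words[1:]
    if rest and rest[0] == "the":
        rest = rest[1:]  # the article is optional and carries no meaning here
    if len(rest) == 1 and rest[0] in OBJECTS:
        return (words[0], rest[0])
    return None
```

A return value of None marks input that the grammar does not accept, which is the point at which PES might ask the user to repeat or rephrase the command.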

4.4.1 Some general questions

What will the form of feedback to the user be?

Synthesized speech?

Text displayed on a monitor?

A graphics simulation (GS)?

On which computer should synthesized speech be generated or text displayed?

Last Updated: March 5, by Zunaid Kazi <kazi@asel.udel.edu>