The primary objective of MUSIIC is the development of a multimodally controlled assistive robot that operates in an unstructured environment. The long-term goal of the project is to attach the assistive robotic arm to a wheelchair, freeing the user from being limited to a fixed workcell.
The main philosophy of MUSIIC is based on the synergy between humans and machines. In spite of our physical limitations, we excel in:
Machines, on the other hand, excel in:
We contend that by harnessing the best abilities of human and machine, we can arrive at an assistive robot that is feasible, usable and, most importantly, practical.
MUSIIC has four main components:
A pair of cameras takes a stereoscopic snapshot of the domain of interest and obtains information about the shape, pose and location of objects in the domain.
The user instructs the robot arm by means of a multimodal interface that combines speech and gesture. By gesture we currently mean deictic gesture, i.e., pointing. This allows the user's focus of interest to be identified without performing the computationally difficult process of real-time object identification.
The speech input is a restricted subset of Natural Language [NL] called Pseudo Natural Language [PNL]. The basic grammar has the following structure:
Since this is a manipulative domain, the language structure is essentially an action on an object to/at a location. As an example, we may have inputs like:
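The action-on-an-object-to/at-a-location structure can be sketched as a small slot-filling parser. The regular expression, slot names, and preposition set below are illustrative assumptions for this sketch, not the project's actual PNL grammar or lexicon.

```python
import re

# Hypothetical PNL pattern: <action> <object phrase> [to/at <location phrase>].
# The preposition set and slot names are illustrative, not MUSIIC's actual grammar.
PNL_PATTERN = re.compile(
    r"^(?P<action>\w+)\s+(?P<object>.+?)(?:\s+(?:to|at)\s+(?P<location>.+))?$",
    re.IGNORECASE,
)

def parse_pnl(utterance):
    """Split a PNL utterance into its action/object/location slots."""
    match = PNL_PATTERN.match(utterance.strip())
    if match is None:
        return None
    # Drop the location slot when the utterance has none.
    return {k: v for k, v in match.groupdict().items() if v is not None}
```

For instance, an utterance of the form "move that to there" would fill all three slots, while "grasp that" fills only the action and object slots.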
A parser then generates the semantics of the user's intentions from the combined speech and gesture input. This semantic interpretation is then passed on to the planner.
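One way the two modalities can be fused is to resolve each deictic word against the pointing target recorded closest in time to it. The data structures and field layout below are assumptions made for this sketch, not MUSIIC's actual interfaces.

```python
def resolve_deictics(parse, gesture_events):
    """Replace deictic words ('that', 'there') with the object or location
    identified by the nearest-in-time pointing gesture.

    `parse` maps slot names to (word, timestamp) pairs; `gesture_events`
    is a list of (timestamp, target) pairs from the gesture subsystem.
    Both formats are hypothetical stand-ins for the real system's data.
    """
    DEICTICS = {"that", "this", "there", "here"}
    resolved = {}
    for slot, (word, t_word) in parse.items():
        if word.lower() in DEICTICS and gesture_events:
            # Pick the gesture whose timestamp is closest to the spoken word.
            _, target = min(gesture_events, key=lambda ev: abs(ev[0] - t_word))
            resolved[slot] = target
        else:
            resolved[slot] = word
    return resolved
```

Matching on temporal proximity is only one possible fusion policy; it has the virtue of requiring no object recognition, in line with the deictic approach described above.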
The adaptive planner executes user intentions by generating the appropriate robot control commands. It reduces the cognitive load on the user by taking over most high-level as well as all low-level planning tasks. The planner is also able to adapt to changing situations and events: it either replans on its own or interacts with the user when it is unable to construct a plan by itself.
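The replan-or-ask behaviour can be sketched as a simple control loop. The four callables here are hypothetical placeholders for the real planner, executor, and user-interaction subsystems, not MUSIIC's actual API.

```python
def execute_intention(goal, make_plan, execute_step, ask_user, max_replans=3):
    """Carry out `goal`: plan, execute, and on failure either replan from
    the changed world state or fall back to interacting with the user.

    All four callables are hypothetical stand-ins for the real subsystems:
    `make_plan` returns a list of steps or None, `execute_step` reports
    success, and `ask_user` hands control back to the user.
    """
    for attempt in range(max_replans + 1):
        plan = make_plan(goal)
        if plan is None:
            # The planner cannot construct a plan on its own: involve the user.
            return ask_user(goal)
        for step in plan:
            if not execute_step(step):
                break  # the world changed; replan from the current state
        else:
            return "done"
    return ask_user(goal)
```

The bounded retry count is a design choice for the sketch: it keeps the planner from replanning forever before deferring to the user.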
Underlying MUSIIC is a pair of knowledge bases. One is a user-extendible knowledge base of actions, both primitive and complex. The other is a knowledge base of objects, which organizes object information in an abstraction hierarchy of four tiers.
The knowledge bases allow the planner not only to make plans for manipulating objects about which it knows nothing a priori except what the vision system provides, but also to make plans that interpret user intentions more accurately when more information is available.
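A tiered object knowledge base of this kind can be sketched as a chain of increasingly specific classes, where the planner merges properties from abstract to specific so that the most specific tier it knows about wins. The tier names and grasp properties below are invented for illustration; the source does not describe the actual contents of MUSIIC's tiers.

```python
# Hypothetical four-tier object hierarchy, most specific first.
# Tier names and properties are illustrative, not MUSIIC's actual KB contents.
OBJECT_KB = {
    "shape:cylinder": {"grasp": "side"},                    # most abstract: raw shape from vision
    "container": {"grasp": "side", "keep_upright": True},   # functional class
    "cup": {"grasp": "handle", "keep_upright": True},       # object class
    "user's red mug": {"grasp": "handle", "owner": "user"}, # specific instance
}

HIERARCHY = ["user's red mug", "cup", "container", "shape:cylinder"]

def plan_properties(known_tiers):
    """Merge properties from abstract to specific, so the most specific
    tier the system has information about overrides abstract defaults."""
    props = {}
    for tier in reversed(HIERARCHY):  # abstract -> specific
        if tier in known_tiers:
            props.update(OBJECT_KB[tier])
    return props
```

With only the vision system's shape estimate, the planner falls back on generic shape-based properties; as more tiers become known, the resulting plan reflects the user's intentions more accurately, which is the behaviour described above.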
A testbed has been constructed to validate our proof of concept. The backbone computing machine for the vision interface is an SGI XS-24 IRIS Indigo. Pictures are taken by two CCD color cameras, model VCP-920, with a resolution of 450 TV lines and 768x494 picture elements; each camera is equipped with a motorized TV zoom lens, model Computer M61212MSP. The cameras are connected to the SGI Galileo graphics board, which provides up to three input channels; in this system we use two channels with S-Video inputs. The Noesis Visilog-4 software package installed on the SGI machine serves as the image-processing engine for developing the vision interface software. The speech system is Dragon Dictate running on a PC. A six-degree-of-freedom robot manipulator, the Zebra ZERO, is employed as the manipulation device. The planner and knowledge bases reside on a Sun SPARCstation 5. Communication between the planner and the subsystems is supported by the RPC (Remote Procedure Call) protocol.
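The planner-to-subsystem communication pattern can be sketched with a minimal RPC round trip. Python's built-in XML-RPC is used here purely as a stand-in for the Sun RPC protocol the testbed actually uses, and the `snapshot` method and its return format are invented for illustration.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Stand-in subsystem: a vision interface exposing one remote call.
# The method name and reply format are invented for this sketch.
def snapshot():
    return {"objects": [{"shape": "cylinder", "pose": [0.1, 0.2, 0.0]}]}

server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
port = server.server_address[1]  # let the OS pick a free port
server.register_function(snapshot, "snapshot")
threading.Thread(target=server.serve_forever, daemon=True).start()

# The planner, possibly on another machine, invokes the subsystem remotely
# as if it were a local function call.
vision = ServerProxy(f"http://localhost:{port}")
scene = vision.snapshot()
```

The appeal of RPC in an architecture like this is exactly what the sketch shows: each subsystem (vision, speech, manipulator) can live on its own machine while the planner calls it like a local procedure.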
Last Updated: June 15, 1996 Zunaid Kazi <kazi@asel.udel.edu>