A MUSIIC Overview

I thought the RESNA presentation that I made might become a
concise, non-technical description of our work. I am
attaching a cleaned-up version of my presentation. Feedback
will be most useful. I was thinking that I could use this as
a basic hand-out for visitors as well as put it up on the
MUSIIC web pages.


--- begin attachment ----

MUSIIC: Multimodal User Supervised Interface and Intelligent Control


The primary objective of MUSIIC is the development of a multimodally
controlled assistive robot for operation in an unstructured environment. The
long-term goal of the project is to attach the assistive robotic arm to a
wheelchair, freeing the user from being limited to a fixed workcell.


The main philosophy of MUSIIC is based on the synergy between humans and
machines. In spite of our physical limitations, we excel in

   * Creativity
   * Problem Solving Ability
   * Common Sense
   * and Flexibility

Machines, on the other hand, excel in

   * Speed of Computation
   * Mechanical Power
   * Ability to Persevere

We contend that by harnessing the best abilities of human and machine, we
can arrive at an assistive robot that is feasible, usable and, most
importantly, practical.


MUSIIC has four main components:

Stereovision Component

A pair of cameras takes a stereoscopic snapshot of the domain of interest
and obtains information about the shape, pose and location of objects in
the domain.

Multimodal Interface

The user instructs the robot arm by means of a multimodal interface that
combines speech and gesture. By gesture we currently mean deictic gesture,
i.e., pointing. This allows the identification of the user's focus of
interest without having to perform the computationally difficult process of
real-time object identification.

The speech input is a restricted subset of Natural Language [NL], which we
call Pseudo Natural Language [PNL]. The basic grammar has the following form:

     Verb Object Subject

Since this is a manipulative domain, the language structure is essentially
an action on an object to/at a location. For example, we may have inputs

     Put that right of the blue book.
     Insert the straw into the cup.

A parser then generates the semantics of the user intentions from the
combined speech and gesture input. The semantic interpretation of user input
is then passed on to the planner.
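As an illustration of this fusion step, a toy parser for the verb-object-location pattern might resolve a deictic word like "that" against the object the user is pointing at. This is a hypothetical sketch, not the actual MUSIIC parser; the function and field names are assumptions.

```python
# Hypothetical sketch of fusing a PNL utterance with a deictic gesture.
# Names and the preposition list are illustrative assumptions, not the
# actual MUSIIC implementation.

def parse_pnl(utterance, pointed_at):
    """Parse a verb-object-location utterance, resolving deictic
    words ("that") against the object identified by the gesture."""
    words = utterance.lower().rstrip(".").split()
    verb = words[0]
    rest = " ".join(words[1:])
    # Split the object phrase from the location phrase on a preposition.
    for prep in ("right of", "left of", "into", "onto", "on", "at"):
        if f" {prep} " in rest:
            obj, loc = rest.split(f" {prep} ", 1)
            break
    else:
        obj, loc, prep = rest, None, None
    # Deictic resolution: "that" refers to the pointed-at object.
    if obj.strip() == "that":
        obj = pointed_at
    return {"action": verb, "object": obj, "relation": prep, "location": loc}

frame = parse_pnl("Put that right of the blue book.", pointed_at="red cup")
# frame == {"action": "put", "object": "red cup",
#           "relation": "right of", "location": "the blue book"}
```

The key point of the sketch is the last step: the speech channel alone underdetermines the referent, and the gesture channel fills the gap, which is what lets the system avoid real-time object identification.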

Adaptive Planner

The adaptive planner executes user intentions by generating the appropriate
robot control commands. The adaptive planner reduces cognitive load on the
user by taking over most high-level as well as all low level planning tasks.
The planner is also able to adapt to changing situations and events, and
either replans on its own or interacts with the user when it is unable to
construct a plan on its own.
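The replan-or-ask behavior described above can be pictured as a simple control loop. The following is a hypothetical sketch under assumed interfaces (plan_for, execute and ask_user stand in for components the overview does not specify), not the planner's actual code.

```python
# Hypothetical sketch of the adaptive planner's replan-or-ask loop.
# plan_for, execute, and ask_user are assumed stand-ins for components
# the overview does not specify.

def execute_intention(intention, world, plan_for, execute, ask_user,
                      max_replans=3):
    """Try to plan and execute a user intention; on execution failure,
    replan against the changed situation, and fall back to the user
    when no plan can be constructed."""
    for attempt in range(max_replans):
        plan = plan_for(intention, world)
        if plan is None:
            # The planner cannot construct a plan on its own:
            # interact with the user for a clarified intention.
            intention = ask_user(intention, world)
            continue
        if execute(plan, world):
            return True  # plan carried out successfully
        # Execution failed (changed situation or event): replan.
    return False
```

The design point is that the user is consulted only when autonomous replanning fails, which is how the planner keeps the user's cognitive load low.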

Object Oriented Knowledge Bases

Underlying MUSIIC are a pair of knowledge bases. One is a knowledge base of
actions, both primitive and complex, which is user extendible. The other is
a knowledge base of objects, which contains information about objects in an
abstraction hierarchy. The hierarchy has four tiers arranged on the basis of

Generic Blob
This is the top tier of the hierarchy and contains just enough
information to allow the construction of correct plans based solely on
information obtained from the vision system, with no a priori
information about the objects in question.
Shape Based
At this level, objects are classified in terms of general shapes, such
as cuboid, cylindrical, conical, etc. This classification was chosen
because in a manipulative domain, the shape of objects affects the
planning process the most.
Object Type
This tier is derived from the previous one and groups together classes
of objects, such as cups, straws, pencils, boxes, etc. Information
contained in this tier is used to make plans that are more indicative
of user intentions and also constrains planning parameters such as
approach position, grasp position, orientation to maintain, etc. Other
information needed for correct planning is obtained from the vision
system.
Instantiated Object
     This is the final tier on the abstraction hierarchy, and contains
     information about actual objects in the domain.

The knowledge bases allow the planner not only to make plans for
manipulating objects about which it knows nothing a priori except what is
obtained by the vision system, but also to make plans that are a more
accurate interpretation of user intentions when more information is
available from the knowledge bases.
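The four tiers can be pictured as a class hierarchy in which each tier refines the one above it. The sketch below is illustrative only: the tier names come from the description above, but the specific shape classes, object types and attributes are assumptions.

```python
# Illustrative sketch of the four-tier object abstraction hierarchy.
# Only the tier names come from the overview; the attributes and the
# Cylindrical/Cup refinements are assumptions.

class GenericBlob:
    """Top tier: only what the vision system provides."""
    def __init__(self, shape_data, pose, location):
        self.shape_data = shape_data  # raw geometry from stereo vision
        self.pose = pose
        self.location = location

class ShapeBased(GenericBlob):
    """Second tier: classified by general shape (cuboid, cylinder, ...)."""
    shape_class = None

class Cylindrical(ShapeBased):
    shape_class = "cylinder"

class ObjectType(Cylindrical):
    """Third tier: classes of everyday objects, constraining planning
    parameters such as approach and grasp positions."""
    approach = None
    grasp = None

class Cup(ObjectType):
    approach = "from above"
    grasp = "rim or handle"
    keep_upright = True  # orientation constraint for the planner

class InstantiatedObject(Cup):
    """Fourth tier: an actual object observed in the domain."""
    def __init__(self, shape_data, pose, location, label):
        super().__init__(shape_data, pose, location)
        self.label = label

blue_cup = InstantiatedObject("point cloud", "upright", (0.4, 0.1), "blue cup")
```

The point of the structure is graceful degradation: a plan can always be built at the GenericBlob level from vision data alone, and each deeper tier, when available, adds constraints that make the plan a better match for the user's intention.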

Test Bed

A test bed has been constructed to validate our proof of concept. The
backbone computing machine for the vision interface is an SGI XS-24 IRIS
Indigo. Pictures are taken by two CCD color cameras, model VCP-920, with a
resolution of 450 TV lines and 768x494 picture elements. Each camera is
equipped with a motorized TV zoom lens, model Computer M61212MSP. The
cameras are connected to the SGI Galileo graphics board, which provides up
to three input channels; in this system we use two channels with s-video
inputs. The Noesis Visilog-4 software package installed on the SGI machine
is used as an image processing engine to assist in developing the vision
interface software. The speech system is Dragon Dictate running on a PC. A
six degree-of-freedom robot manipulator, the Zebra ZERO, is employed as the
manipulation device. The planner and knowledge bases reside on a Sun Sparc
Station 5. Communication between the planner and the sub-systems is
supported by the RPC (Remote Procedure Call) protocol.


We claim that the MUSIIC method is one of the better ways to extend the
functionality of a person with a disability, given that our system endows
the user with

   * Flexibility: the user is not constrained to a fixed workcell and is
     able to handle manipulation in an unstructured domain.
   * Reduced Cognitive Load: the adaptive planner shares with the user the
     chore of planning out complex tasks and is intelligent enough to
     handle high-level tasks on its own.
   * Domain Independence: the MUSIIC method is not limited to a fixed
     domain.
   * Ease of Use: the multimodal interface allows the user to interact
     with the system easily and intuitively.

--- end attachment ----
Zunaid Kazi                                        kazi@asel.udel.edu
AI & Robotics                         http://www.asel.udel.edu/~kazi/