A MUSIIC Overview
I thought the RESNA presentation that I made might serve as a
concise, non-technical description of our work. I am
attaching a cleaned-up version of my presentation. Feedback
will be most useful. I was thinking that I could use this as
a basic hand-out for visitors as well as put it up on the
MUSIIC web pages.
--- begin attachment ----
MUSIIC: Multimodal User Supervised Interface and Intelligent Control
The primary objective of MUSIIC is the development of a multimodally
controlled assistive robot for operation in an unstructured environment. The
long-term goal of the project is to attach the assistive robotic arm to a
wheelchair, freeing the user from being limited to a fixed workcell.
The main philosophy of MUSIIC is based on the synergy between humans and
machines. In spite of our physical limitations, we excel in
* Problem Solving Ability
* Common Sense
* and Flexibility
Machines, on the other hand, excel in
* Speed of Computation
* Mechanical Power
* Ability to Persevere
We contend that by harnessing the best abilities of human and machine, we
can arrive at an assistive robot that is both feasible and usable.
MUSIIC has four main components: a stereo vision system, a multimodal user
interface, an adaptive planner, and object-oriented knowledge bases.
A pair of cameras takes a stereoscopic snap-shot of the domain of interest
and obtains information about the shape, pose and location of objects in
the scene.
The user instructs the robot arm by means of a multimodal interface which
combines speech and gesture. For gesture, we are currently focusing
on deictic gesture, i.e. pointing. This allows the identification of
the user's focus of interest without having to perform the computationally
difficult process of real-time object identification.
The speech input is a restricted subset of Natural Language [NL] and is
called Pseudo Natural Language [PNL]. The basic grammar has the following
structure:
Verb Object Subject
Since this is a manipulative domain, the language structure is essentially
an action on an object to/at a location. As examples, we may have inputs
such as:
Put that right of the blue book.
Insert the straw into the cup.
A parser then generates the semantics of the user intentions from the
combined speech and gesture input. The semantic interpretation of user input
is then passed on to the planner.
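As an illustrative sketch only (not the project's actual parser), the Verb-Object-Location pattern combined with a pointing gesture might be turned into a semantic frame along these lines; all names, vocabulary, and data structures here are hypothetical:

```python
# Hypothetical sketch: fuse a PNL utterance with a deictic gesture
# into a semantic frame for the planner.  The grammar and vocabulary
# are illustrative only, not the MUSIIC implementation.

VERBS = {"put", "insert", "pour"}
PREPOSITIONS = {"into", "onto", "right of", "left of", "at"}

def parse_pnl(utterance, gesture_target=None):
    """Parse a Verb-Object-Location utterance into a semantic frame.

    Deictic words such as "that" are resolved to the object the user
    was pointing at, so no real-time object recognition is needed.
    """
    words = utterance.lower().rstrip(".").split()
    verb = words[0]
    if verb not in VERBS:
        raise ValueError(f"unknown action: {verb}")
    rest = " ".join(words[1:])
    # Split the object phrase from the location phrase at a preposition
    # (longest match first, so "right of" beats "at").
    for prep in sorted(PREPOSITIONS, key=len, reverse=True):
        if f" {prep} " in rest:
            obj, loc = rest.split(f" {prep} ", 1)
            break
    else:
        prep, obj, loc = None, rest, None
    # Resolve deictic references using the gesture channel.
    if obj.strip() in {"that", "this"}:
        obj = gesture_target
    return {"action": verb, "object": obj, "relation": prep, "location": loc}

frame = parse_pnl("Put that right of the blue book.", gesture_target="straw")
# "that" is resolved via the gesture to the straw the user pointed at.
```

The point of the sketch is the fusion step: the speech channel supplies the action and spatial relation, while the gesture channel supplies the referent of deictic words.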
The adaptive planner executes user intentions by generating the appropriate
robot control commands. It reduces the cognitive load on the user by taking
over most high-level as well as all low-level planning tasks.
The planner is also able to adapt to changing situations and events, and
either replans on its own or interacts with the user when it is unable to
construct a plan on its own.
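The replan-or-ask-the-user behaviour described above can be sketched as a simple control loop; the function names are assumptions for illustration, not the actual MUSIIC planner API:

```python
# Hypothetical control loop for adaptive planning: plan, execute,
# replan on failure, and fall back to the user when no plan can be
# constructed.  All callables here are stand-ins, not MUSIIC code.

def execute_intention(frame, plan_fn, execute_fn, ask_user_fn,
                      max_attempts=3):
    """Plan and execute a user intention, adapting as needed.

    plan_fn(frame)       -> list of robot commands, or None on failure
    execute_fn(commands) -> True on success, False if the world changed
    ask_user_fn(frame)   -> a clarified frame supplied by the user
    """
    for _ in range(max_attempts):
        plan = plan_fn(frame)
        if plan is None:
            # Planner cannot construct a plan on its own:
            # interact with the user for clarification.
            frame = ask_user_fn(frame)
            continue
        if execute_fn(plan):
            return True        # intention satisfied
        # Execution failed (changed situation): loop and replan.
    return False
```

The loop captures the division of labour: the planner handles routine replanning itself and only consults the user when it is genuinely stuck.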
Object Oriented Knowledge Bases
Underlying MUSIIC is a pair of knowledge bases. One is a knowledge base of
actions, both primitive and complex, which is user-extendible. The other is
a knowledge base of objects, which contains information about objects in an
abstraction hierarchy. The hierarchy has four tiers arranged on the basis of
increasing specificity.
This is the top level of the hierarchy and contains just enough information
to allow the construction of correct plans based solely on what
information is obtained from the vision system, with no a priori
information about the objects in question.
At this level, objects are classified in terms of general shapes, such
as cuboid, cylindrical, conical etc. This classification was chosen
because in a manipulative domain, the shape of objects affects the
planning process the most.
This is derived from the previous tier and groups together classes of
objects, such as cups, straws, pencils, boxes etc. Information
contained in this tier is used to make plans that are more indicative
of user intentions and also constrains planning parameters such as
approach position, grasp position, orientation to maintain, etc. Other
information is obtained from the vision system for correct plan
construction.
This is the final tier on the abstraction hierarchy, and contains
information about actual objects in the domain.
The knowledge bases allow the planner not only to make plans for
manipulating objects about which it knows nothing a priori except what is
obtained by the vision system, but also to make plans that are a more
accurate interpretation of user intentions when more information is
available from the knowledge bases.
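The four-tier idea can be sketched with an inheritance hierarchy; the tier contents and attribute names below are hypothetical stand-ins, chosen only to show how more specific tiers add planning constraints to vision-only knowledge:

```python
# Illustrative sketch of a four-tier object abstraction hierarchy
# like the one described above.  Class and attribute names are
# assumptions, not the actual MUSIIC knowledge base.

class GenericObject:
    """Tier 1: only what stereo vision supplies (shape, pose, location)."""
    def __init__(self, shape, pose, location):
        self.shape, self.pose, self.location = shape, pose, location
        self.grasp_strategy = "default"
        self.keep_upright = False

class Cylinder(GenericObject):
    """Tier 2: shape class -- shape constrains grasp planning."""
    def __init__(self, pose, location):
        super().__init__("cylinder", pose, location)
        self.grasp_strategy = "side-grasp"

class Cup(Cylinder):
    """Tier 3: object class -- adds intent-level planning constraints."""
    def __init__(self, pose, location):
        super().__init__(pose, location)
        self.keep_upright = True   # orientation to maintain while moving

# Tier 4: an actual object in the domain is an instance carrying
# vision-supplied pose and location.
blue_cup = Cup(pose=(0.0, 0.0, 0.0), location=(0.4, 0.1, 0.0))

# An unrecognised object still supports planning via tier 1 alone:
unknown = GenericObject("cuboid", (0.0, 0.0, 0.0), (0.2, 0.0, 0.0))
```

The sketch mirrors the claim in the text: a completely unknown object can be planned for using vision data alone, while each deeper tier refines the plan toward the user's intentions.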
A test-bed has been constructed to determine the validity of our proof of
concept. The backbone computing machine for the vision interface is an SGI
XS-24 IRIS Indigo computer. Pictures are taken by two CCD color cameras,
model VCP-920, with a resolution of 450 TV lines and 768x494 picture
elements. Each camera is equipped with a motorized TV zoom lens, model
Computer M61212MSP. The cameras are connected to the SGI Galileo graphics
board, which provides up to three input channels; in this system we use two
channels with S-Video inputs. The Noesis Visilog-4 software package
installed on the SGI machine is used as an image processing engine to assist
developing the vision interface software. The speech system used is Dragon
Dictate running on a PC. A six degree of freedom robot manipulator, Zebra
ZERO, is employed as the manipulation device. The planner and knowledge
bases reside on a Sun Sparc Station 5. Communication between the planner and
the sub-systems is supported by the RPC (Remote Procedure Call) protocol.
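The RPC pattern that links the planner to its sub-systems can be illustrated with a minimal sketch. As a hedged stand-in (the test-bed uses the classic RPC protocol, not XML-RPC), Python's standard-library xmlrpc shows the same idea: a sub-system exports a procedure that the planner invokes as if it were local. The `locate_object` service and its return values are invented for illustration:

```python
# Minimal sketch of planner <-> sub-system communication via RPC,
# using Python's stdlib xmlrpc as a stand-in for the test-bed's RPC
# protocol.  The vision service and its data are hypothetical.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def locate_object(name):
    """Toy vision service: return a made-up pose for a named object."""
    return {"name": name, "x": 0.4, "y": 0.1, "z": 0.0}

# The vision sub-system exports its procedure (port 0 = any free port).
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(locate_object)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The planner, possibly on another machine, calls it remotely.
vision = ServerProxy(f"http://localhost:{port}")
pose = vision.locate_object("blue cup")
server.shutdown()
```

The design point is the same one the test-bed exploits: each sub-system (vision, speech, manipulator) can run on its own machine, with the planner coordinating them through remote procedure calls.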
We claim that the MUSIIC method is one of the better ways to extend the
functionality of a person with a disability, given that our system endows
the user with:
* Flexibility: the user is not constrained to work in a fixed workcell,
  and the system can handle manipulation in an unstructured domain.
* Easing of Cognitive Load: the adaptive planner shares with the user
  the chore of planning out complex tasks and is intelligent
  enough to handle high-level tasks on its own.
* Domain Independence: the MUSIIC method is not limited to a fixed
  domain.
* Ease of Use: The multimodal interface allows the user to easily
interact with the system in an intuitive manner.
--- end attachment ----
Zunaid Kazi firstname.lastname@example.org
AI & Robotics http://www.asel.udel.edu/~kazi/