Volume 1 Contents

ThA1LP -- Opening Ceremony and Plenary Lecture

Chairs: H. Timothy Bunnell, Alfred I. duPont Institute; and Richard A. Foulds, Alfred I. duPont Institute

The Comparative Study of Spoken-Language Processing Anne Cutler

ThA2L1 -- Large Vocabulary

Chair: Michael D. Riley, AT&T Labs - Research

New Developments in the INRS Continuous Speech Recognition System Z. Li, M. Heon, Douglas O'Shaughnessy
On Designing Pronunciation Lexicons for Large Vocabulary, Continuous Speech Recognition Lori Lamel, Gilles Adda
Word Graph Rescoring Using Confidence Measures Pablo Fetter, Frédéric Dandurand, Peter Regel-Brietzmann
A Bottom-up Approach for Handling Unseen Triphones in Large Vocabulary Continuous Speech Recognition X.L. Aubert, Peter Beyerlein, Meinhard Ullrich
Discriminative Optimisation of Large Vocabulary Recognition Systems V. Valtchev, P.C. Woodland, S. J. Young
Japanese Large-vocabulary Continuous-speech Recognition using a Business-newspaper Corpus Tatsuo Matsuoka, Katsutoshi Ohtsuki, Takeshi Mori, Sadaoki Furui, Katsuhiko Shirai
Handling Compound Nouns in a Swedish Speech-understanding System David Carter, Jaan Kaja, Leonardo Neumeyer, Manny Rayner, Fuliang Weng, Mats Wiren
Initial Evaluation of a Preselection Module for a Flexible Large Vocabulary Speech Recognition System in Telephone Environment J. Macias-Guarasa, A. Gallardo, J. Ferreiros, Jose M. Pardo, L. Villarrubia

ThA2L2 -- Multimodal ASR (Face and Lips)

Chair: Eric Petajan, Bell Labs - Lucent Technologies

Asynchronous Integration of Visual Information in an Automatic Speech Recognition System Mamoun Alissali, Paul Deleglise, Alexandrina Rogozan
Audiovisual Speech Recognition using Multiscale Nonlinear Image Decomposition I.A. Matthews, J. Bangham, S.J. Cox
Robust Audiovisual Integration using Semicontinuous Hidden Markov Models Qin Su, Peter L. Silsbee
The Effect of Visual Information on Word Initial Consonant Perception of Dysarthric Speech Richard P. Schumeyer, Kenneth E. Barner
A Multiple Deformable Template Approach for Visual Speech Recognition Devi Chandramohan, Peter L. Silsbee
Speaker Independent Bimodal Phonetic Recognition Experiments P. Cosi, E. Magno Caldognetto, F. Ferrero, M. Dugatto, K. Vagges
Speechreading using Shape and Intensity Information Juergen Luettin, Neil A. Thacker, Steve W. Beet
Speaker Identification by Lipreading Juergen Luettin, Neil A. Thacker, Steve W. Beet

ThA2L3 -- Perception of Words

Chair: Sharon Manuel, Emerson College and Massachusetts Institute of Technology

How Word Onsets Drive Lexical Access and Segmentation: Evidence from Acoustics, Phonology and Processing David W. Gow Jr., Janis Melvold, Sharon Manuel
RAW: A Real-speech Model for Human Word Recognition David van Kuijk, Peter Wittenburg, Ton Dijkstra
How Facilitatory can Lexical Information Be During Word Recognition? Evidence from Moroccan Arabic Mehdi Meftah, Sami Boudelaa
Effects of Frequency on the Auditory Perception of Open- Versus Closed-class Words Alette P. Haveman
Phonotactic and Metrical Influences on Adult Ratings of Spoken Nonsense Words Michael S. Vitevitch, Paul A. Luce, Jan Charles-Luce, David Kemmerer
Lipreading Supplemented by Voice Fundamental Frequency: To What Extent Does the Addition of Voicing Increase Lexical Uniqueness for the Lipreader? Edward T. Auer Jr., Lynne E. Bernstein
Strategies Used in Rhyme-Monitoring S. te Riele, S.G. Nooteboom, H. Quené
How do Dutch Listeners Process Words with Epenthetic Schwa? Wilma van Donselaar, Cecile Kuijpers, Anne Cutler

ThA2P1 -- Phonetics, Transcription, and Analysis

Chair: Jim Hieronymus, Bell Labs - Lucent Technologies

Whole-word Phonetic Distances and the PGPfone Alphabet Patrick Juola, Philip Zimmermann
Automatic Vowel Quality Description using a Variable Mapping to an Eight Cardinal Vowel Reference Set Shuping Ran, J. Bruce Millar, Phil Rose
Automatic Detection and Segmentation of Pronunciation Variants in German Speech Corpora Andreas Kipp, Maria-Barbara Wesenick, Florian Schiel
ANGIE: A New Framework for Speech Analysis Based on Morpho-phonological Modelling Stephanie Seneff, Raymond Lau, Helen Meng
Perceptual Contrast in the Korean and English Vowel System Normalized Byunggon Yang
On Phonetic Characteristics of Pause in the Korean Read Speech Yong-Ju Lee, Sook-hyang Lee
Cross-Language Effects of Lexical Stress in Word Recognition: The Case of Arabic English Bilinguals Sami Boudelaa, Mehdi Meftah
Automatic Generation of German Pronunciation Variants Maria-Barbara Wesenick
Estimating the Quality of Phonetic Transcriptions and Segmentations of Speech Signals Maria-Barbara Wesenick, Andreas Kipp
An Acoustic Analysis of Contemporary Vowels of the Standard Slovenian Language Bojan Petek, Rastislav Sustarsic, Smiljana Komar
Using Decision Trees to Construct Optimal Acoustic Cues Sandrine Robbe, Anne Bonneau, Sylvie Coste, Yves Laprie
Maximum Jaw Displacement in Contrastive Emphasis Donna Erickson, Osamu Fujimura
Subglottal Pressure and Final Lowering in English Rebecca Herman, Mary Beckman, Kiyoshi Honda
Phonological Variation: Epenthesis and Deletion of Schwa in Dutch Cecile Kuijpers, Wilma van Donselaar, Anne Cutler
Can a Moraic Nasal Occur Word-initially in Japanese? Takashi Otake, Kiyoko Yoneyama

ThA2P2 -- Spoken Language Processing for Special Populations

Chair: Valerie Hazan, University College London

Feedback Considerations for Speech Training Systems James J. Mahshie
Clinical Applications of Computer-Based Speech Training for Children with Hearing Impairment Anne-Marie Öster
Enhancing Information-rich Regions of Natural VCV and Sentence Materials Presented in Noise Valerie Hazan, Andrew Simpson
Speech Perceptual Abilities of Children with Specific Reading Difficulty (Dyslexia) Valerie Hazan, Alan Adlard
Bimodal Perception of Spectrum Compressed Speech Larry D. Paarmann, Michael K. Wynne
Effect of Sentential Context on Syllabic Stress Perception by Hearing-impaired Listeners Dragana Barac-Cikoja, Sally Revoile
Applications of Automatic Speech Recognition to Speech and Language Development in Young Children Martin Russell, Catherine Brown, Adrian Skilling, Rob Series, Julie Wallace, Bill Bohnam, Paul Barker
Sub-band Adaptive Speech Enhancement for Hearing Aids D. R. Campbell
Adapting a TTS System to a Reading Machine for the Blind Thomas Portele, Juergen Kraemer

ThA2S1 -- Dialogue Special Session I

Chairs: James R. Glass, MIT Laboratory for Computer Science; and Yasuhisa Niimi, Kyoto Institute of Technology

Modeling of Spoken Dialogue with and without Visual Information Katsuhiko Shirai
Multimodal Discourse Modelling in a Multi-user Multi-domain Environment Stephanie Seneff, David Goddeau, Christine Pao, Joseph Polifroni
Automatic Acquisition of Probabilistic Dialogue Models Kenji Kita, Yoshikazu Fukui, Masaaki Nagata, Tsuyoshi Morimoto
Units of Dialogue Management: An Example Paul Heisterkamp, Scott McGlashan
Error Resolution During Multimodal Human-computer Interaction Sharon Oviatt, Robert VanGent
Improved Spontaneous Dialogue Recognition Using Dialogue and Utterance Triggers by Adaptive Probability Boosting Ramesh R. Sarukkai, Dana H. Ballard
Speech Recognition for Spontaneously Spoken German Dialogues Kai Hübener, Uwe Jost, Henrik Heine
Using Prosodic Information to Constrain Language Models for Spoken Dialogue Paul Taylor, Hiroshi Shimodaira, Stephen Isard, Simon King, Jacqueline Kowtko

ThP1L1 -- Language Modeling I

Chair: Roberto Pieraccini, AT&T Labs - Research

Combination of Word-based and Category-based Language Models T.R. Niesler, P.C. Woodland
A Multi-level Lexical-semantics Based Language Model Design for Guided Integrated Continuous Speech Recognition Francisco J. Valverde-Albacete, Jose M. Pardo
A Category Based Approach for Recognition of Out-of-Vocabulary Words Florian Gallwitz, Elmar Noeth, Heinrich Niemann
Scalable Backoff Language Models Kristie Seymore, Ronald Rosenfeld
Modeling Long Distance Dependence in Language: Topic Mixtures vs. Dynamic Cache Models R. Iyer, Mari Ostendorf
Bayesian Estimation Methods for N-Gram Language Model Adaptation Marcello Federico

ThP1L2 -- Feature Extraction for Speech Recognition I

Chair: Shubha Kadambe, Atlantic Aerospace Electronics Corp.

Feature Dimension Reduction Using Reduced-Rank Maximum Likelihood Estimation for Hidden Markov Models Don X. Sun
Using Multi-Level Segmentation Coefficients to Improve HMM Speech Recognition Kai Hübener
A Comparative Study of Linear Feature Transformation Techniques for Automatic Speech Recognition T. Eisele, R. Haeb-Umbach, D. Langmann
Inclusion of Temporal Information into Features for Speech Recognition Ben Milner
New Cepstral Representation using Wavelet Analysis and Spectral Transformation for Robust Speech Recognition Hubert Wassner, Gérard Chollet
Wavelet Based Feature Extraction for Phoneme Recognition C.J. Long, S. Datta

ThP1L3 -- Speech Production - Measurement and Modeling

Chair: Terrance M. Nearey, University of Alberta

Extraction of Tongue Contours in X-ray Images with Minimal User Interaction Yves Laprie, Marie-Odile Berger
Three-dimensional Measurement of the Vocal Tract by MRI Didier Demolin, Thierry Metens, Alain Soquet
Syllable Affiliation of Final Consonant Clusters Undergoes a Phase Transition Over Speaking Rates Philip Gleason, Betty Tuller, J. A. Scott Kelso
Towards a Biomechanical Model of the Larynx Arthur Lobo, Michael O'Malley
Effects of Auditory Feedback on F0 Trajectory Generation Hideki Kawahara, Hiroko Kato, J. C. Williams

ThP1P1 -- Speech Coding / HMMs and NNs in ASR

Chair: Jean-Luc Gauvain, LIMSI-CNRS

On the Effects of Accent and Language on Low Rate Speech Coders I. S. Burnett, J. J. Parry
VQ Codevector Index Assignment Using Genetic Algorithms for Noisy Channels J.S. Pan, Fergus R. McInnes, Mervyn A. Jack
An Improved Vector Quantization Algorithm for Speech Transmission Over Noisy Channels Gavin C. Cawley
Very Low Delay and High Quality Coding of 20 Hz-15 kHz Speech Signals at 64 kbit/s C. Murgia, G. Feng, A. Le Guyader, C. Quinquis
Application of Speaker Modification Techniques to Phonetic Vocoding Carlos M. Ribeiro, Isabel M. Trancoso
Entropy Coded Vector Quantization with Hidden Markov Models Tadashi Yonezaki, Kiyohiro Shikano
An Application of Recurrent Neural Networks to Low Bit Rate Speech Coding Minoru Kohata
CELP Coding System Based on Mel-Generalized Cepstral Analysis Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai
Wideband Re-synthesis of Narrowband CELP-coded Speech Using Multiband Excitation Model Cheung-Fat Chan, Wai-Kwong Hui
Recurrent Neural Networks for Phoneme Recognition Takuya Koizumi, Mikio Mori, Shuji Taniguchi, Mitsutoshi Maruya
A Model for the Acoustic Phonetic Structure of Arabic Language using a Single Ergodic Hidden Markov Model M.A. Mokhtar, A. Zein-el-Abddin
Modelling Long Term Variability Information in Mixture Stochastic Trajectory Framework Yifan Gong, Irina Illina, Jean-Paul Haton
Segmental Phonetic Features Recognition by means of Neural-fuzzy Networks and Integration in an N-best Solutions Post-processing T. Moudenc, R. Sokol, G. Mercier
Stochastic Trajectory Model with State-Mixture for Continuous Speech Recognition Irina Illina, Yifan Gong
Recognition of Spelled Names over the Telephone Hermann Hild, Alex Waibel
Optimal Tying of HMM Mixture Densities using Decision Trees Gilles Boulianne, Patrick Kenny
Speech Recognition Using an Enhanced FVQ Based on a Codeword Dependent Distribution Normalization and Codeword Weighting by Fuzzy Objective Function Hwan Jin Choi, Yung Hwan Oh
Using the Self-Organizing Map to Speed up the Probability Density Estimation for Speech Recognition with Mixture Density HMMs Mikko Kurimo, Panu Somervuo

ThP1S1 -- Dialogue Special Session II

Chairs: Patti Price, SRI International; and Akira Kurematsu, University of Electro-Communications

Combining the Detection and Correction of Speech Repairs Peter A. Heeman, Kyung-ho Loken-Kim, James F. Allen
Generating Spontaneous Elliptical Utterance Yuji Sagawa, Wataru Sugimoto, Noboru Ohnishi
Developing the Modelling of Swedish Prosody in Spontaneous Dialogue Gösta Bruce, Marcus Filipsson, Johan Frid, Björn Granström, Kjell Gustafson, Merle Horne, David House, Birgitta Lastow, Paul Touati
Spoken Language Generation in a Multimedia System Shimei Pan, Kathleen R. McKeown
Synthesizing Dialogue Speech of Japanese Based on the Quantitative Analysis of Prosodic Features Keikichi Hirose, Mayumi Sakata, Hiromichi Kawanami
Spoken Dialogue Interface in a Dual Task Situation Shuichi Tanaka, Shu Nakazato, Keiichiro Hoashi, Katsuhiko Shirai

ThP1S2 -- Neural Models of Speech Processing I

Chair: Eric D. Young, Johns Hopkins University

How is Information About Speech Encoded in the Peripheral Auditory System? Eric D. Young
Spectral Shape Analysis in the Central Auditory System Shihab Shamma

ThP2L1 -- Language Modeling II

Chair: Jerome R. Bellegarda, Apple Computer, Inc.

Modeling Disfluencies in Conversational Speech Man-hung Siu, Mari Ostendorf
Evaluation of a Language Model using a Clustered Model Backoff John Miller, Fil Alleva
Language Modeling Using X-grams Antonio Bonafonte, José B. Mariño
Class Phrase Models For Language Modelling Klaus Ries, Finn Dag Buo, Alex Waibel
Introducing Linguistic Constraints into Statistical Language Modeling Petra Geutner
Language Modeling with Stochastic Automata Jianying Hu, William Turin, Michael K. Brown

ThP2L2 -- Feature Extraction for Speech Recognition II

Chair: Shubha Kadambe, Atlantic Aerospace Electronics Corp.

New Fast Wavelet Packet Transform Algorithms for Frame Synchronized Speech Processing Andrzej Drygajlo
Frequency-Warping in Speech S. Umesh, L. Cohen, N. Marinovic, D. Nelson
Extracting Speech Features from Human Speech-like Noise Daisuke Kobayashi, Shoji Kajita, Kazuya Takeda, Fumitada Itakura
Subband-Crosscorrelation Analysis for Robust Speech Recognition Shoji Kajita, Kazuya Takeda, Fumitada Itakura
A New ASR Approach Based on Independent Processing and Recombination of Partial Frequency Bands Hervé Bourlard, Stéphane Dupont
Frequency and Time Filtering of Filter-bank Energies for HMM Speech Recognition Climent Nadeu, José B. Mariño, Javier Hernando, Albino Nogueiras

ThP2L3 -- Vowels

Chair: John Ohala, University of California, Berkeley

Temporal Cues for Vowels and Universals of Vowel Inventories Carrie E. Lang, John J. Ohala
Acoustic Variability in Spontaneous Conversational Speech of American English Talkers Ann K. Syrdal
Cross-language Speech Perception: Swedish, English, and Spanish Speakers' Perception of Front Rounded Vowels Raquel Willerman, Patricia K. Kuhl
Inter-language Vowel Perception and Production by Korean and Japanese Listeners John C.L. Ingram, See-Gyoon Park
Intelligibility and Acoustic Correlates of Japanese Accented English Vowels Diane Kewley-Port, Reiko Akahane-Yamada, Kiyoaki Aikawa
Segmentation Strategies for Spoken Language Recognition: Evidence from Semi-bilingual Japanese Speakers of English Kiyoko Yoneyama

ThP2P1 -- NNs and Stochastic Modeling

Chair: Wu Chou, Bell Labs - Lucent Technologies

Integrating Connectionist, Statistical and Symbolic Approaches for Continuous Spoken Korean Processing Geunbae Lee, Jong-Hyeok Lee, Kyubong Park, Byung-Chang Kim
Towards ASR on Partially Corrupted Speech Hynek Hermansky, Sangita Timberwala, Misha Pavel
Parametric Trajectory Models for Speech Recognition Herbert Gish, Kenney Ng
Use of Gaussian Selection in Large Vocabulary Continuous Speech Recognition Using HMMs K.M. Knill, M.J.F. Gales, S. J. Young
Cross Phone State Clustering using Lexical Stress and Context J. Hogberg, K. Sjolander
Likelihood Ratio Decoding and Confidence Measures for Continuous Speech Recognition Eduardo Lleida, Richard C. Rose
A Study on Continuous Chinese Speech Recognition Based on Stochastic Trajectory Models Xiaohui Ma, Yifan Gong, Yuqing Fu, Jiren Lu, Jean-Paul Haton
A Proposal for a New Algorithm of Reference Interval-free Continuous DP for Real-time Speech or Text Retrieval Yoshiaki Itoh, Jiro Kiyama, Hiroshi Kojima, Susumu Seki, Ryuichi Oka
Language Modeling by String Pattern N-gram for Japanese Speech Recognition Akinori Ito, Masaki Kohda
Statistical Language Modeling using a Variable Context Length Reinhard Kneser
A Comparison of Hybrid HMM Architectures Using Global Discriminative Training Finn Tore Johansen
Improved Probability Estimation with Neural Network Models Wei Wei, Etienne Barnard, Mark Fanty
A Neural Network Using Acoustic Sub-word Units for Continuous Speech Recognition Ha-Jin Yu, Yung-Hwan Oh
On the Error Criteria in Neural Networks as a Tool for Human Classification Modelling Louis F. M. ten Bosch, Roel Smits
A Non-linear Filtering Approach to Stochastic Training of the Articulatory-acoustic Mapping Using the EM Algorithm Gordon Ramsay
A Tool for Automated Design of Language Models Y.P. Yang, J.R. Deller Jr.
Acoustic-phonetic Decoding Based on Elman Predictive Neural Networks F. Freitag, E. Monte
On Improving Discrimination Capability of an RNN Based Recognizer Tan Lee, P.C. Ching
An Evaluation of Statistical Language Modeling for Speech Recognition using a Mixed Category of Both Words and Parts-of-speech Yumi Wakita, Jun Kawai, Hitoshi Iida

ThP2S1 -- Dialogue Special Session III

Chairs: Paul Dalsgaard, Aalborg University; and Hiroya Fujisaki, Science University of Tokyo

A Dialogue Control Strategy Based on the Reliability of Speech Recognition Yasuhisa Niimi, Yutaka Kobayashi
SpeechWear: A Mobile Speech System Alexander I. Rudnicky, Stephen Reed, Eric H. Thayer
WHEELS: A Conversational System in the Automobile Classifieds Domain Helen Meng, Senis Busayapongchai, James Glass, David Goddeau, Lee Hetherington, Edward Hurley, Christine Pao, Joseph Polifroni, Stephanie Seneff, Victor Zue
Effective Human-computer Cooperative Spoken Dialogue: The AGS Demonstrator M.D. Sadek, A. Ferrieux, A. Cozannet, P. Bretier, F. Panaget, J. Simonin
Dialog in the RAILTEL Telephone-based System S.K. Bennacef, L. Devillers, S. Rosset, Lori Lamel
Dialogue Processing in a Conversational Speech Translation System Alon Lavie, Lori Levin, Yan Qu, Alex Waibel, Donna Gates, Marsal Gavaldà, Laura Mayfield, Maite Taboada

ThP2S2 -- Neural Models of Speech Processing II

Chair: Eric D. Young, Johns Hopkins University

Novel Speech Processing Mechanism Derived from Auditory Neocortical Circuit Analysis Boris Aleksandrovsky, James Whitson, Gretchen Andes, Gary Lynch, Richard Granger
Modeling Neurons in the Anteroventral Cochlear Nucleus for Amplitude Modulation (AM) Processing: Application to Speech Sound Ping Tang, Jean Rouat
Noise Suppression and Loudness Normalization in an Auditory Model-based Acoustic Front-end Halewijn Vereecken, Jean-Pierre Martens
A Psychoacoustic Model for the Noise Masking of Voiceless Plosive Bursts Jim Hant, Brian Strope, Abeer Alwan
Training Machine Classifiers to Match the Performance of Human Listeners in a Natural Vowel Classification Task Martin Hunke, Thomas Holton
A Neural Matrix Model for Active Tracking of Frequency-modulated Tones Kiyoaki Aikawa, Hideki Kawahara, Minoru Tsuzaki