Volume 4 Contents

SuA1L1 -- Robust Speech Processing

Chair: Mazin Rahim, AT&T Labs - Research

Channel and Noise Normalization Using Affine Transformed Cepstrum Xiaoyu Zhang, Richard J. Mammone
Spectral Estimation and Normalisation for Robust Speech Recognition Tom Claes, Fei Xie, Dirk Van Compernolle
Trellis Encoded Vector Quantization for Robust Speech Recognition Wu Chou, Nambi Seshadri, Mazin Rahim
Phone Clustering using the Bhattacharyya Distance Brian Mak, Etienne Barnard
Variability of Lombard Effects Under Different Noise Conditions Atsushi Wakao, Kazuya Takeda, Fumitada Itakura
Lombard Effect Compensation and Noise Suppression for Noisy Lombard Speech Recognition Sang-mun Chi, Yung-Hwan Oh

SuA1L2 -- Dialects and Speaking Styles

Chair: Jim Hieronymus, Bell Labs - Lucent Technologies

The Use of Shibboleth Words for Automatically Classifying Speakers by Dialect A.W.F. Huggins, Yogen Patel
The Organization of Dialect Diversity in North America William Labov
Data Collection of Japanese Dialects and its Influence into Speech Recognition Ikuo Kudo, Takao Nakama, Tomoko Watanabe, Reiko Kameyama
Statistical Dialect Classification Based on Mean Phonetic Features David R. Miller, James Trischitta
Norwegian Numerals: a Challenge to Automatic Speech Recognition Knut Kvale
Evaluation of the Telefónica I+D Natural Numbers Recognizer over Different Dialects of Spanish from Spain and America C. de la Torre, J. Caminero-Gil, J. Alvarez, C. Martín del Alamo, L. Hernández-Gómez

SuA1L3 -- Production and Perception of Prosody

Chair: Gunnar Fant, KTH

Rhythmic Constraints on English Stress Timing Fred Cummins, Robert F. Port
On the Interaction of Clash, Focus and Phonological Phrasing Irene Vogel, Steve Hoskins
On the Quantal Nature of Speech Timing Gunnar Fant, Anita Kruckenberg
Differential Perception of Tonal Contours Through the Syllable David House
Pitch, Loudness, and Segmental Duration Correlates: Towards a Model for the Phonetic Aspects of Finnish Prosody Martti Vainio, Toomas Altosaar
Prosodic Manipulation System of Speech Material for Perceptual Experiments Nobuaki Minematsu, Seiichi Nakagawa, Keikichi Hirose

SuA1P1 -- Topics in ASR and Search

Chair: Enrico Bocchieri, AT&T Labs - Research

Clustered Language Models with Context-Equivalent States J.P. Ueberla, I. R. Gransden
Modeling of Contextual Effects and its Application to Word Spotting Yuji Yonezawa, Masato Akagi
A New Keyword Spotting Algorithm with Pre-calculated Optimal Thresholds J. Junkawitsch, L. Neubauer, H. Höge, G. Ruske
Detection of Ambiguous Portions of Signal Corresponding to OOV Words or Misrecognized Portions of Input Roxane Lacouture, Yves Normandin
Techniques for Approximating a Trigram Language Model Fabio Brugnara, Marcello Federico
Unsupervised and Incremental Speaker Adaptation under Adverse Environmental Conditions Keizaburo Takagi, Koichi Shinoda, Hiroaki Hattori, Takao Watanabe
An Adaptive-Beam Pruning Technique for Continuous Speech Recognition Hugo Van hamme, Filip Van Aelten
Data Based Filter Design for RASTA-like Channel Normalization in ASR Carlos Avendano, Sarel van Vuuren, Hynek Hermansky
A Comparison of Time Conditioned and Word Conditioned Search Techniques for Large Vocabulary Speech Recognition S. Ortmanns, H. Ney, Frank Seide, I. Lindam
Language-model Look-ahead for Large Vocabulary Speech Recognition S. Ortmanns, H. Ney, A. Eiden
A New Search Algorithm in Segmentation Lattices of Speech Signals Jean-Luc Husson, Yves Laprie
LR-Parser-driven Viterbi Search with Hypotheses Merging Mechanism Using Context-dependent Phone Models Tomokazu Yamada, Shigeki Sagayama
Discrete-Utterance Recognition with a Fast Match Based on Total Data Reduction Jan Nouza
On-line Garbage Modeling with Discriminant Analysis for Utterance Verification J. Caminero, C. de la Torre, L. Villarrubia, C. Martín, L. Hernández
Cheating with Imperfect Transcripts Paul Placeway, John Lafferty
Novel Training Method for Classifiers used in Speaker Adaptation Naoto Iwahashi
Large Vocabulary Word Recognition based on a Graph-structured Dictionary Katsuki Minamino
A Word Graph Based N-Best Search in Continuous Speech Recognition Bach-Hiep Tran, Frank Seide, Volker Steinbiss
Viterbi Beam Search with Layered Bigrams David M. Goblirsch
A Wave Decoder for Continuous Speech Recognition Eric Buhrke, Wu Chou, Qiru Zhou
Long Term On-line Speaker Adaptation for Large Vocabulary Dictation Eric Thelen
Incremental Generation of Word Graphs Gerhard Sagerer, Heike Rautenstrauch, G. A. Fink, Bernd Hildebrandt, A. Jusek, Franz Kummert
Improvement in N-Best Search for Continuous Speech Recognition Irina Illina, Yifan Gong
Sethos: The UPC Speech Understanding System Antonio Bonafonte, José B. Mariño, Albino Nogueiras
Segmental Search for Continuous Speech Recognition Pietro Laface, Luciano Fissore, A. Maro, Franco Ravera

SuA1P2 -- Multimodal Dialogue/HCI

Chair: Donald Hindle, AT&T Labs - Research

An Investigation into the Generation of Mouth Shapes for a Talking Head A. P. Breen, E. Bowers, W. Welsh
A Text-to-audiovisual-speech Synthesizer for French Bertrand Le Goff, Christian Benoît
Analysis of Head Movements and its Role in Spoken Dialogue Yuri Iwano, Shioya Kageyama, Emi Morikawa, Shu Nakazato, Katsuhiko Shirai
RWC Multimodal Database for Interactions by Integration of Spoken Language and Visual Information Satoru Hayamizu, Osamu Hasegawa, Katunobu Itou, Katuhiko Sakaue, Kazuyo Tanaka, Shigeki Nagaya, Masayuki Nakazawa, T. Endoh, Fumio Togawa, Kenji Sakamoto, Kazuhiko Yamamoto
About the Relationship Between Eyebrow Movements and Fo Variations Christian Cavé, Isabelle Guaïtella, Roxane Bertrand, Serge Santi, Françoise Harlay, Robert Espesser
How Many Words is a Picture Really Worth? Laurel Fais, Kyung-ho Loken-Kim, Tsuyoshi Morimoto
Visual Synthesis of Source Acoustic Speech Through Kohonen Neural Networks A. Laganà, F. Lavagetto, A. Storace
Audio-visual Speech Perception Without Speech Cues Helena M. Saldaña, David B. Pisoni, Jennifer M. Fellowes, Robert E. Remez

SuA1S1 -- Multilingual Speech Processing I

Chair: Alex Waibel, Carnegie Mellon University

Multilingual Speech Recognition at Dragon Systems Jim Barnett, A. Corrada, G. Gao, L. Gillick, Y. Ito, S. Lowe, L. Manganaro, B. Peskin
Multi-lingual Phoneme Recognition Exploiting Acoustic-phonetic Similarities of Sounds Joachim Köhler
Japanese Speech Databases for Robust Speech Recognition Atsushi Nakamura, Shoichi Matsunaga, Tohru Shimizu, Masahiro Tonomura, Yoshinori Sagisaka
Spoken Language Processing in a Multilingual Context Lori F. Lamel, M. Adda-Decker, Jean Luc Gauvain, G. Adda
Multilingual Human-computer Interactions: From Information Access to Language Learning Victor Zue, Stephanie Seneff, Joseph Polifroni, Helen Meng, James Glass
SpeeData: Multilingual Spoken Data Entry U. Ackermann, B. Angelini, F. Brugnara, M. Federico, D. Giuliani, R. Gretter, G. Lazzari, H. Niemann

SuA2L1 -- Acoustics in Synthesis

Chair: Michael Macon, Georgia Institute of Technology

Pseudo-articulatory Representations in Speech Synthesis and Recognition William H. Edmondson, Jon P. Iles, Dorota J. Iskra
Synthesis of Initial (/s/-) Stop-liquid Clusters using HLsyn David R. Williams
Synthesis of Trill Chilin Shih
Phone-based Speech Synthesis with Neural Network and Articulatory Control W.K. Lo, P.C. Ching
Analysis of Ten Vowel Sounds Across Gender and Regional/Cultural Accent P. Martland, S.P. Whiteside, Steve W. Beet, L. Baghai-Ravary
Speech Morphing by Gradually Changing Spectrum Parameter and Fundamental Frequency Masanobu Abe

SuA2L2 -- Pitch and Rate

Chair: David Talkin, Entropic Research Laboratory

The Multi-Lag-Window Method for Robust Extended-range F0 Determination Edouard Geoffrois
Nonlinear Estimation of DEGG Signals with Applications to Speech Pitch Detection Kenneth E. Barner
Pitch Analysis Methods for Cross-Speaker Comparison John A. Maidment, M. Luisa Garcia-Lecumberri
Continuous Adaptation of Linear Models with Impulsive Excitation Steve W. Beet, L. Baghai-Ravary
Quantitative Analysis of the Local Speech Rate and its Application to Speech Synthesis Sumio Ohno, Masamichi Fukumiya, Hiroya Fujisaki
A Fast and Reliable Rate of Speech Detector Jan P. Verhasselt, Jean-Pierre Martens

SuA2L3 -- Acoustic Modeling II

Chair: Li Deng, University of Waterloo

Context Modeling and Clustering in Continuous Speech Recognition Jean-Claude Junqua, Lorenzo Vassallo
Hierarchical Partition of the Articulatory State Space for Overlapping-feature Based Speech Recognition Li Deng, Jim Jian-Xiong Wu
A Fuzzy Acoustic-phonetic Decoder for Speech Recognition Olivier Oppizzi, David Fournier, Philippe Gilles, Henri Méloni
Syllable-level Desynchronisation of Phonetic Features for Speech Recognition Katrin Kirchhoff
A Probabilistic Framework for Feature-based Speech Recognition James Glass, Jane Chang, Michael McCandless
Modeling Context-dependent Phonetic Units in a Continuous Speech Recognition System for Mandarin Chinese Jim Jian-Xiong Wu, Li Deng, Jacky Chan

SuA2P1 -- General ASR Posters

Chair: Lori Lamel, LIMSI-CNRS

JANUS-II: Towards Spontaneous Spanish Speech Recognition Puming Zhan, Klaus Ries, Marsal Gavaldà, Donna Gates, Alon Lavie, Alex Waibel
Reduced Semi-continuous Models for Large Vocabulary Continuous Speech Recognition in Dutch Kris Demuynck, Jacques Duchateau, Dirk Van Compernolle
Validating Different Flexible Vocabulary Approaches on the Swiss French PolyPhone and PolyVar Databases Andrei Constantinescu, Olivier Bornet, Gilles Caloz, Gérard Chollet
Use of a Reliability Coefficient in Noise Cancelling by Neural Net and Weighted Matching Algorithms Nestor Becerra Yoma, Fergus R. McInnes, Mervyn A. Jack
Likelihood Normalization Using an Ergodic HMM for Continuous Speech Recognition Kazuhiko Ozeki
Dynamic Control of a Production Model Laurence Candille, Henri Méloni
Speech Recognition Using Sub-word Units Dependent On Phonetic Contexts Of Both Training and Recognition Vocabularies Hiroaki Hattori, Eiko Yamada
Hidden Markov Models Merging Acoustic and Articulatory Information to Automatic Speech Recognition Bruno Jacob, Christine Senac
Creation of Unseen Triphones from Diphones and Monophones using a Speech Production Approach Mats Blomberg, Kjell Elenius
Speaker-independent Dictation of Chinese Speech with 32K Vocabulary Bo Xu, Bing Ma, Shuwu Zhang, Fei Qu, Taiyi Huang
Using Accent-specific Pronunciation Modelling for Robust Speech Recognition J.J. Humphries, P.C. Woodland, D. Pearce
Dictionary Learning for Spontaneous Speech Recognition Tilo Sloboda, Alex Waibel
Comparison of Channel Normalisation Techniques for Automatic Speech Recognition Over the Phone Johan de Veth, Louis Boves
Anchor Point Detection for Continuous Speech Recognition in Spanish: The Spotting of Phonetic Events Manuel A. Leandro, Jose M. Pardo
Cepstral Compensation by Polynomial Approximation for Environment-independent Speech Recognition Bhiksha Raj, Evandro B. Gouvêa, Pedro J. Moreno, Richard M. Stern
Effect of Speech Coders on Speech Recognition Performance B.T. Lilly, K.K. Paliwal
Wavelet Transforms For Non-uniform Speech Recognition Systems Léonard Janer, Josep Martí, Climent Nadeu, Eduardo Lleida-Solano
A Binaural Model as a Front-end for Isolated Word Recognition Tsuyoshi Usagawa, Markus Bodden, Klaus Rateitschek
A New Speech Enhancement: Speech Stream Segregation Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata

SuA2S1 -- Multilingual Speech Processing II

Chair: Alex Waibel, Carnegie Mellon University

Head Automata for Speech Translation Hiyan Alshawi
Word Clustering with Parallel Spoken Language Corpora Ye-Yi Wang, John Lafferty, Alex Waibel
Toward Translating Korean Speech Into Other Languages Jae-Woo Yang, Youngjik Lee
VERBMOBIL: The Evolution of a Complex Large Speech-to-Speech Translation System Thomas Bub, Johannes Schwinn
Translation of Conversational Speech with JANUS-II Alon Lavie, Alex Waibel, Lori Levin, Donna Gates, Marsal Gavaldà, Torsten Zeppenfeld, Puming Zhan, Oren Glickman

SuP1L1 -- Data-based Synthesis

Chair: Yoshinori Sagisaka, ATR Interpreting Telecommunications Research Laboratory

Non-segmental Analysis and Synthesis Based on a Speech Database Andrew Slater, John Coleman
Microsegment Synthesis - Economic Principles in a Low-cost Solution Ralf Benzmüller, William J. Barry
Whistler: A Trainable Text-to-Speech System X.D. Huang, A. Acero, J. Adcock, H.W. Hon, J. Goldsmith, J. Liu, Mike Plumpe
Generation of Multiple Synthesis Inventories by a Bootstrapping Procedure Thomas Portele, Karl-Heinz Stöber, Horst Meyer, Wolfgang Hess
Modeling Segmental Duration in German Text-to-Speech Synthesis Bernd Möbius, Jan P.H. van Santen
Autolabelling Japanese ToBI Nick Campbell

SuP1L2 -- Speaker Identification and Verification

Chair: Doug Reynolds, MIT Lincoln Laboratory

General Phrase Speaker Verification Using Sub-word Background Models and Likelihood-ratio Scoring S. Parthasarathy, A.E. Rosenberg
Unknown-Multiple Signal Source Clustering Problem Using Ergodic HMM and Applied to Speaker Classification J. Murakami, M. Sugiyama, H. Watanabe
GMM and ARVM Cooperation and Competition for Text-independent Speaker Recognition on Telephone Speech J.-L. Le Floch, C. Montacié, M.-J. Caraty
Selective use of the Speech Spectrum and a VQGMM Method for Speaker Identification Qiguang Lin, Ea-Ee Jan, ChiWei Che, Dong-Suk Yuk, James L. Flanagan
Speaker Verification through Large Vocabulary Continuous Speech Recognition Michael Newman, Larry Gillick, Yoshiko Ito, Don McAllaster, Barbara Peskin
Predictive Neural Networks in Text Independent Speaker Verification: an Evaluation on the SIVA Database Andrea Paoloni, Susanna Ragazzini, G. Ravaioli

SuP1L3 -- Acoustic Phonetics

Chair: Nick Campbell, ATR Interpreting Telecommunications Research Laboratory

Durational Characteristics of Hindi Consonant Clusters Nisheeth Shrotriya, Rajesh Verma, S.K. Gupta, S.S. Agrawal
The Use of Wavelet Transforms in Phoneme Recognition Beng T. Tan, Minyue Fu, Andrew Spray, Phillip Dermody
Acoustic Properties of Phonemes in Continuous Speech for Different Speaking Rate Hisao Kuwabara
Prosodic Parameterization of Spoken Japanese Based on a Model of the Generation Process of F0 Contours Hiroya Fujisaki, Sumio Ohno
A Logistic Regression Model for Detecting Prominences Arman Maghbouleh
High-quality Prosodic Modification of Speech Signals Beat Pfister

SuP1P1 -- Perception of Vowels and Consonants

Chair: Doug Whalen, Haskins Laboratories

On the Syllable Structures of Chinese Relating to Speech Recognition Jialu Zhang
Perceptual Assimilation of American English Vowels by Japanese Listeners W. Strange, Reiko Akahane-Yamada, B.H. Fitzgerald, R. Kubo
Context and Speaker Effects in the Perceptual Assimilation of German Vowels by American Listeners W. Strange, O.-S. Bohn, S. A. Trent, M.C. McNair, K.C. Bielec
Examination of a Perceptual Non-native Speech Contrast: Pharyngealized/Non-pharyngealized Discrimination by French-speaking Adults Mohamed Zahid
Context-dependent Relevance of Burst and Transitions for Perceived Place in Stops: It's in Production, not Perception Roel Smits
The Perception of Morae in Long Vowels: Comparison Among Japanese, Korean and English Speakers Ryoji Baba, Kaori Omuro, Hiromitsu Miyazono, Tsuyoshi Usagawa, Masahiko Higuchi
Juncture Cues to Disfluency Robin J. Lickley
Effects of Duration and Formant Movement on Vowel Perception James R. Sawusch
Benchmarking Human Performance for Continuous Speech Recognition N. Deshmukh, R.J. Duncan, A. Ganapathiraju, J. Picone
Intelligibility of Speech with Filtered Time Trajectories of Spectral Envelopes Takayuki Arai, Misha Pavel, Hynek Hermansky, Carlos Avendano
Perceptual Use of Vowel and Speaker Information in Breath Sounds D. H. Whalen, Sonya M. Sheffert
The Role of Neighborhood Relative Frequency in Spoken Word Recognition Philippe Mousty, Monique Radeau, Ronald Peereman, Paul Bertelson
Transitional Probability and Phoneme Monitoring James M. McQueen, Mark A. Pitt
Identification of Vowel Features from French Stop Bursts Anne Bonneau
Listening in a Second Language Z.S. Bond, Thomas J. Moore, Beverley Gable
Acoustic Correlates to the Effects of Talker Variability on the Perception of English /r/ and /l/ by Japanese Listeners James S. Magnuson, Reiko Akahane-Yamada
Perception of Lexical Tone Across Languages: Evidence for a Linguistic Mode of Processing Denis Burnham, Elizabeth Francis, Di Webster, Sudaporn Luksaneeyanawin, Chayada Attapaiboon, Francisco Lacerda, Peter Keller

SuP2LP -- Closing Ceremony and Plenary Lecture

Chairs: H. Timothy Bunnell, Alfred I. duPont Institute; and Richard A. Foulds, Alfred I. duPont Institute

Natural Communication with Machines - Progress and Challenge James L. Flanagan