Table of Contents
Volume 1
ThA1LP -- Opening Ceremony and Plenary Lecture
1 The Comparative Study of Spoken-Language Processing
Anne Cutler
ThA2L1 -- Large Vocabulary
2 New Developments in the INRS Continuous Speech Recognition System
Z. Li, M. Heon, Douglas O'Shaughnessy
6 On Designing Pronunciation Lexicons for Large Vocabulary, Continuous Speech Recognition
Lori Lamel, Gilles Adda
10 Word Graph Rescoring Using Confidence Measures
Pablo Fetter, Frédéric Dandurand, Peter Regel-Brietzmann
14 A Bottom-up Approach for Handling Unseen Triphones in Large Vocabulary Continuous Speech Recognition
X.L. Aubert, Peter Beyerlein, Meinhard Ullrich
18 Discriminative Optimisation of Large Vocabulary Recognition Systems
V. Valtchev, P.C. Woodland, S. J. Young
22 Japanese Large-vocabulary Continuous-speech Recognition using a Business-newspaper Corpus
Tatsuo Matsuoka, Katsutoshi Ohtsuki, Takeshi Mori, Sadaoki Furui, Katsuhiko Shirai
26 Handling Compound Nouns in a Swedish Speech-understanding System
David Carter, Jaan Kaja, Leonardo Neumeyer, Manny Rayner, Fuliang Weng, Mats Wiren
30 Initial Evaluation of a Preselection Module for a Flexible Large Vocabulary Speech Recognition System in Telephone Environment
J. Macias-Guarasa, A. Gallardo, J. Ferreiros, Jose M. Pardo, L. Villarrubia
ThA2L2 -- Multimodal ASR (Face and Lips)
34 Asynchronous Integration of Visual Information in an Automatic Speech Recognition System
Mamoun Alissali, Paul Deleglise, Alexandrina Rogozan
38 Audiovisual Speech Recognition using Multiscale Nonlinear Image Decomposition
I.A. Matthews, J. Bangham, S.J. Cox
42 Robust Audiovisual Integration using Semicontinuous Hidden Markov Models
Qin Su, Peter L. Silsbee
46 The Effect of Visual Information on Word Initial Consonant Perception of Dysarthric Speech
Richard P. Schumeyer, Kenneth E. Barner
50 A Multiple Deformable Template Approach for Visual Speech Recognition
Devi Chandramohan, Peter L. Silsbee
54 Speaker Independent Bimodal Phonetic Recognition Experiments
P. Cosi, E. Magno Caldognetto, F. Ferrero, M. Dugatto, K. Vagges
58 Speechreading using Shape and Intensity Information
Juergen Luettin, Neil A. Thacker, Steve W. Beet
62 Speaker Identification by Lipreading
Juergen Luettin, Neil A. Thacker, Steve W. Beet
ThA2L3 -- Perception of Words
66 How Word Onsets Drive Lexical Access and Segmentation: Evidence from Acoustics, Phonology and Processing
David W. Gow Jr., Janis Melvold, Sharon Manuel
70 RAW: A Real-speech Model for Human Word Recognition
David van Kuijk, Peter Wittenburg, Ton Dijkstra
74 How Facilitatory can Lexical Information Be During Word Recognition? Evidence from Moroccan Arabic
Mehdi Meftah, Sami Boudelaa
78 Effects of Frequency on the Auditory Perception of Open- Versus Closed-class Words
Alette P. Haveman
82 Phonotactic and Metrical Influences on Adult Ratings of Spoken Nonsense Words
Michael S. Vitevitch, Paul A. Luce, Jan Charles-Luce, David Kemmerer
86 Lipreading Supplemented by Voice Fundamental Frequency: To What Extent Does the Addition of Voicing Increase Lexical Uniqueness for the Lipreader?
Edward T. Auer Jr., Lynne E. Bernstein
90 Strategies Used in Rhyme-Monitoring
S. te Riele, S.G. Nooteboom, H. Quené
94 How do Dutch Listeners Process Words with Epenthetic Schwa?
Wilma van Donselaar, Cecile Kuijpers, Anne Cutler
ThA2P1 -- Phonetics, Transcription, and Analysis
98 Whole-word Phonetic Distances and the PGPfone Alphabet
Patrick Juola, Philip Zimmermann
102 Automatic Vowel Quality Description using a Variable Mapping to an Eight Cardinal Vowel Reference Set
Shuping Ran, J. Bruce Millar, Phil Rose
106 Automatic Detection and Segmentation of Pronunciation Variants in German Speech Corpora
Andreas Kipp, Maria-Barbara Wesenick, Florian Schiel
110 ANGIE: A New Framework for Speech Analysis Based on Morpho-phonological Modelling
Stephanie Seneff, Raymond Lau, Helen Meng
114 Perceptual Contrast in the Korean and English Vowel System Normalized
Byunggon Yang
118 On Phonetic Characteristics of Pause in the Korean Read Speech
Yong-Ju Lee, Sook-hyang Lee
121 Cross-Language Effects of Lexical Stress in Word Recognition: The Case of Arabic English Bilinguals
Sami Boudelaa, Mehdi Meftah
125 Automatic Generation of German Pronunciation Variants
Maria-Barbara Wesenick
129 Estimating the Quality of Phonetic Transcriptions and Segmentations of Speech Signals
Maria-Barbara Wesenick, Andreas Kipp
133 An Acoustic Analysis of Contemporary Vowels of the Standard Slovenian Language
Bojan Petek, Rastislav Sustarsic, Smiljana Komar
137 Using Decision Trees to Construct Optimal Acoustic Cues
Sandrine Robbe, Anne Bonneau, Sylvie Coste, Yves Laprie
141 Maximum Jaw Displacement in Contrastive Emphasis
Donna Erickson, Osamu Fujimura
145 Subglottal Pressure and Final Lowering in English
Rebecca Herman, Mary Beckman, Kiyoshi Honda
149 Phonological Variation: Epenthesis and Deletion of Schwa in Dutch
Cecile Kuijpers, Wilma van Donselaar, Anne Cutler
ThA2P2 -- Spoken Language Processing for Special Populations
153 Feedback Considerations for Speech Training Systems
James J. Mahshie
157 Clinical Applications of Computer-Based Speech Training for Children with Hearing Impairment
Anne-Marie Öster
161 Enhancing Information-rich Regions of Natural VCV and Sentence Materials Presented in Noise
Valerie Hazan, Andrew Simpson
165 Speech Perceptual Abilities of Children with Specific Reading Difficulty (Dyslexia)
Valerie Hazan, Alan Adlard
169 Bimodal Perception of Spectrum Compressed Speech
Larry D. Paarmann, Michael K. Wynne
173 Effect of Sentential Context on Syllabic Stress Perception by Hearing-impaired Listeners
Dragana Barac-Cikoja, Sally Revoile
176 Applications of Automatic Speech Recognition to Speech and Language Development in Young Children
Martin Russell, Catherine Brown, Adrian Skilling, Rob Series, Julie Wallace, Bill Bohnam, Paul Barker
180 Sub-band Adaptive Speech Enhancement for Hearing Aids
D. R. Campbell
184 Adapting a TTS System to a Reading Machine for the Blind
Thomas Portele, Juergen Kraemer
ThA2S1 -- Dialogue Special Session I
188 Modeling of Spoken Dialogue with and without Visual Information
Katsuhiko Shirai
192 Multimodal Discourse Modelling in a Multi-user Multi-domain Environment
Stephanie Seneff, David Goddeau, Christine Pao, Joseph Polifroni
196 Automatic Acquisition of Probabilistic Dialogue Models
Kenji Kita, Yoshikazu Fukui, Masaaki Nagata, Tsuyoshi Morimoto
200 Units of Dialogue Management: An Example
Paul Heisterkamp, Scott McGlashan
204 Error Resolution During Multimodal Human-computer Interaction
Sharon Oviatt, Robert VanGent
208 Improved Spontaneous Dialogue Recognition Using Dialogue and Utterance Triggers by Adaptive Probability Boosting
Ramesh R. Sarukkai, Dana H. Ballard
212 Speech Recognition for Spontaneously Spoken German Dialogues
Kai Hübener, Uwe Jost, Henrik Heine
216 Using Prosodic Information to Constrain Language Models for Spoken Dialogue
Paul Taylor, Hiroshi Shimodaira, Stephen Isard, Simon King, Jacqueline Kowtko
ThP1L1 -- Language Modeling I
220 Combination of Word-based and Category-based Language Models
T.R. Niesler, P.C. Woodland
224 A Multi-level Lexical-semantics Based Language Model Design for Guided Integrated Continuous Speech Recognition
Francisco J. Valverde-Albacete, Jose M. Pardo
228 A Category Based Approach for Recognition of Out-of-Vocabulary Words
Florian Gallwitz, Elmar Noeth, Heinrich Niemann
232 Scalable Backoff Language Models
Kristie Seymore, Ronald Rosenfeld
236 Modeling Long Distance Dependence in Language: Topic Mixtures vs. Dynamic Cache Models
R. Iyer, Mari Ostendorf
240 Bayesian Estimation Methods for N-Gram Language Model Adaptation
Marcello Federico
ThP1L2 -- Feature Extraction for Speech Recognition I
244 Feature Dimension Reduction Using Reduced-Rank Maximum Likelihood Estimation for Hidden Markov Models
Don X. Sun
248 Using Multi-Level Segmentation Coefficients to Improve HMM Speech Recognition
Kai Hübener
252 A Comparative Study of Linear Feature Transformation Techniques for Automatic Speech Recognition
T. Eisele, R. Haeb-Umbach, D. Langmann
256 Inclusion of Temporal Information into Features for Speech Recognition
Ben Milner
260 New Cepstral Representation using Wavelet Analysis and Spectral Transformation for Robust Speech Recognition
Hubert Wassner, Gérard Chollet
264 Wavelet Based Feature Extraction for Phoneme Recognition
C.J. Long, S. Datta
ThP1L3 -- Speech Production - Measurement and Modeling
268 Extraction of Tongue Contours in X-ray Images with Minimal User Interaction
Yves Laprie, Marie-Odile Berger
272 Three-dimensional Measurement of the Vocal Tract by MRI
Didier Demolin, Thierry Metens, Alain Soquet
276 Syllable Affiliation of Final Consonant Clusters Undergoes a Phase Transition Over Speaking Rates
Philip Gleason, Betty Tuller, J. A. Scott Kelso
279 Towards a Biomechanical Model of the Larynx
Arthur Lobo, Michael O'Malley
283 Generating Intonation by Superposing Gestures
Yann Morlec, Gérard Bailly, Véronique Aubergé
287 Effects of Auditory Feedback on F0 Trajectory Generation
Hideki Kawahara, Hiroko Kato, J. C. Williams
ThP1P1 -- Speech Coding / HMMs and NNs in ASR
291 On the Effects of Accent and Language on Low Rate Speech Coders
I. S. Burnett, J. J. Parry
295 VQ Codevector Index Assignment Using Genetic Algorithms for Noisy Channels
J.S. Pan, Fergus R. McInnes, Mervyn A. Jack
299 An Improved Vector Quantization Algorithm for Speech Transmission Over Noisy Channels
Gavin C. Cawley
302 Very Low Delay and High Quality Coding of 20 Hz-15 kHz Speech Signals at 64 kbit/s
C. Murgia, G. Feng, A. Le Guyader, C. Quinquis
306 Application of Speaker Modification Techniques to Phonetic Vocoding
Carlos M. Ribeiro, Isabel M. Trancoso
310 Entropy Coded Vector Quantization with Hidden Markov Models
Tadashi Yonezaki, Kiyohiro Shikano
314 An Application of Recurrent Neural Networks to Low Bit Rate Speech Coding
Minoru Kohata
318 CELP Coding System Based on Mel-Generalized Cepstral Analysis
Kazuhito Koishida, Keiichi Tokuda, Takao Kobayashi, Satoshi Imai
322 Wideband Re-synthesis of Narrowband CELP-coded Speech Using Multiband Excitation Model
Cheung-Fat Chan, Wai-Kwong Hui
326 Recurrent Neural Networks for Phoneme Recognition
Takuya Koizumi, Mikio Mori, Shuji Taniguchi, Mitsutoshi Maruya
330 A Model for the Acoustic Phonetic Structure of Arabic Language using a Single Ergodic Hidden Markov Model
M.A. Mokhtar, A. Zein-el-Abddin
334 Modelling Long Term Variability Information in Mixture Stochastic Trajectory Framework
Yifan Gong, Irina Illina, Jean-Paul Haton
338 Segmental Phonetic Features Recognition by means of Neural-fuzzy Networks and Integration in an N-best Solutions Post-processing
T. Moudenc, R. Sokol, G. Mercier
342 Stochastic Trajectory Model with State-Mixture for Continuous Speech Recognition
Irina Illina, Yifan Gong
346 Recognition of Spelled Names over the Telephone
Hermann Hild, Alex Waibel
350 Optimal Tying of HMM Mixture Densities using Decision Trees
Gilles Boulianne, Patrick Kenny
354 Speech Recognition Using an Enhanced FVQ Based on a Codeword Dependent Distribution Normalization and Codeword Weighting by Fuzzy Objective Function
Hwan Jin Choi, Yung Hwan Oh
358 Using the Self-Organizing Map to Speed up the Probability Density Estimation for Speech Recognition with Mixture Density HMMs
Mikko Kurimo, Panu Somervuo
ThP1S1 -- Dialogue Special Session II
362 Combining the Detection and Correction of Speech Repairs
Peter A. Heeman, Kyung-ho Loken-Kim, James F. Allen
366 Generating Spontaneous Elliptical Utterance
Yuji Sagawa, Wataru Sugimoto, Noboru Ohnishi
370 Developing the Modelling of Swedish Prosody in Spontaneous Dialogue
Gösta Bruce, Marcus Filipsson, Johan Frid, Björn Granström, Kjell Gustafson, Merle Horne, David House, Birgitta Lastow, Paul Touati
374 Spoken Language Generation in a Multimedia System
Shimei Pan, Kathleen R. McKeown
378 Synthesizing Dialogue Speech of Japanese Based on the Quantitative Analysis of Prosodic Features
Keikichi Hirose, Mayumi Sakata, Hiromichi Kawanami
382 Spoken Dialogue Interface in a Dual Task Situation
Shuichi Tanaka, Shu Nakazato, Keiichiro Hoashi, Katsuhiko Shirai
ThP1S2 -- Neural Models of Speech Processing I
* How is Information About Speech Encoded in the Peripheral Auditory System?
Eric D. Young
* Spectral Shape Analysis in the Central Auditory System
Shihab Shamma
ThP2L1 -- Language Modeling II
386 Modeling Disfluencies in Conversational Speech
Man-hung Siu, Mari Ostendorf
390 Evaluation of a Language Model using a Clustered Model Backoff
John Miller, Fil Alleva
394 Language Modeling Using X-grams
Antonio Bonafonte, José B. Mariño
398 Class Phrase Models For Language Modelling
Klaus Ries, Finn Dag Buo, Alex Waibel
402 Introducing Linguistic Constraints into Statistical Language Modeling
Petra Geutner
406 Language Modeling with Stochastic Automata
Jianying Hu, William Turin, Michael K. Brown
ThP2L2 -- Feature Extraction for Speech Recognition II
410 New Fast Wavelet Packet Transform Algorithms for Frame Synchronized Speech Processing
Andrzej Drygajlo
414 Frequency-Warping in Speech
S. Umesh, L. Cohen, N. Marinovic, D. Nelson
418 Extracting Speech Features from Human Speech-like Noise
Daisuke Kobayashi, Shoji Kajita, Kazuya Takeda, Fumitada Itakura
422 Subband-Crosscorrelation Analysis for Robust Speech Recognition
Shoji Kajita, Kazuya Takeda, Fumitada Itakura
426 A New ASR Approach Based on Independent Processing and Recombination of Partial Frequency Bands
Hervé Bourlard, Stéphane Dupont
430 Frequency and Time Filtering of Filter-bank Energies for HMM Speech Recognition
Climent Nadeu, José B. Mariño, Javier Hernando, Albino Nogueiras
ThP2L3 -- Vowels
434 Temporal Cues for Vowels and Universals of Vowel Inventories
Carrie E. Lang, John J. Ohala
438 Acoustic Variability in Spontaneous Conversational Speech of American English Talkers
Ann K. Syrdal
442 Cross-language Speech Perception: Swedish, English, and Spanish Speakers' Perception of Front Rounded Vowels
Raquel Willerman, Patricia K. Kuhl
446 Inter-language Vowel Perception and Production by Korean and Japanese Listeners
John C.L. Ingram, See-Gyoon Park
450 Intelligibility and Acoustic Correlates of Japanese Accented English Vowels
Diane Kewley-Port, Reiko Akahane-Yamada, Kiyoaki Aikawa
454 Segmentation Strategies for Spoken Language Recognition: Evidence from Semi-bilingual Japanese Speakers of English
Kiyoko Yoneyama
ThP2P1 -- NNs and Stochastic Modeling
458 Integrating Connectionist, Statistical and Symbolic Approaches for Continuous Spoken Korean Processing
Geunbae Lee, Jong-Hyeok Lee, Kyubong Park, Byung-Chang Kim
462 Towards ASR on Partially Corrupted Speech
Hynek Hermansky, Sangita Timberwala, Misha Pavel
466 Parametric Trajectory Models for Speech Recognition
Herbert Gish, Kenney Ng
470 Use of Gaussian Selection in Large Vocabulary Continuous Speech Recognition Using HMMs
K.M. Knill, M.J.F. Gales, S. J. Young
474 Cross Phone State Clustering using Lexical Stress and Context
J. Hogberg, K. Sjolander
478 Likelihood Ratio Decoding and Confidence Measures for Continuous Speech Recognition
Eduardo Lleida, Richard C. Rose
482 A Study on Continuous Chinese Speech Recognition Based on Stochastic Trajectory Models
Xiaohui Ma, Yifan Gong, Yuqing Fu, Jiren Lu, Jean-Paul Haton
486 A Proposal for a New Algorithm of Reference Interval-free Continuous DP for Real-time Speech or Text Retrieval
Yoshiaki Itoh, Jiro Kiyama, Hiroshi Kojima, Susumu Seki, Ryuichi Oka
490 Language Modeling by String Pattern N-gram for Japanese Speech Recognition
Akinori Ito, Masaki Kohda
494 Statistical Language Modeling using a Variable Context Length
Reinhard Kneser
498 A Comparison of Hybrid HMM Architectures Using Global Discriminative Training
Finn Tore Johansen
502 Improved Probability Estimation with Neural Network Models
Wei Wei, Etienne Barnard, Mark Fanty
506 A Neural Network Using Acoustic Sub-word Units for Continuous Speech Recognition
Ha-Jin Yu, Yung-Hwan Oh
510 On the Error Criteria in Neural Networks as a Tool for Human Classification Modelling
Louis F. M. ten Bosch, Roel Smits
514 A Non-linear Filtering Approach to Stochastic Training of the Articulatory-acoustic Mapping Using the EM Algorithm
Gordon Ramsay
518 A Tool for Automated Design of Language Models
Y.P. Yang, J.R. Deller Jr.
522 Acoustic-phonetic Decoding Based on Elman Predictive Neural Networks
F. Freitag, E. Monte
526 On Improving Discrimination Capability of an RNN Based Recognizer
Tan Lee, P.C. Ching
530 An Evaluation of Statistical Language Modeling for Speech Recognition using a Mixed Category of Both Words and Parts-of-speech
Yumi Wakita, Jun Kawai, Hitoshi Iida
ThP2S1 -- Dialogue Special Session III
534 A Dialogue Control Strategy Based on the Reliability of Speech Recognition
Yasuhisa Niimi, Yutaka Kobayashi
538 SpeechWear: A Mobile Speech System
Alexander I. Rudnicky, Stephen Reed, Eric H. Thayer
542 WHEELS: A Conversational System in the Automobile Classifieds Domain
Helen Meng, Senis Busayapongchai, James Glass, David Goddeau, Lee Hetherington, Edward Hurley, Christine Pao, Joseph Polifroni, Stephanie Seneff, Victor Zue
546 Effective Human-computer Cooperative Spoken Dialogue: The AGS Demonstrator
M.D. Sadek, A. Ferrieux, A. Cozannet, P. Bretier, F. Panaget, J. Simonin
550 Dialog in the RAILTEL Telephone-based System
S.K. Bennacef, L. Devillers, S. Rosset, Lori Lamel
554 Dialogue Processing in a Conversational Speech Translation System
Alon Lavie, Lori Levin, Yan Qu, Alex Waibel, Donna Gates, Marsal Gavaldà, Laura Mayfield, Maite Taboada
ThP2S2 -- Neural Models of Speech Processing II
558 Novel Speech Processing Mechanism Derived from Auditory Neocortical Circuit Analysis
Boris Aleksandrovsky, James Whitson, Gretchen Andes, Gary Lynch, Richard Granger
562 Modeling Neurons in the Anteroventral Cochlear Nucleus for Amplitude Modulation (AM) Processing: Application to Speech Sound
Ping Tang, Jean Rouat
566 Noise Suppression and Loudness Normalization in an Auditory Model-based Acoustic Front-end
Halewijn Vereecken, Jean-Pierre Martens
570 A Psychoacoustic Model for the Noise Masking of Voiceless Plosive Bursts
Jim Hant, Brian Strope, Abeer Alwan
574 Training Machine Classifiers to Match the Performance of Human Listeners in a Natural Vowel Classification Task
Martin Hunke, Thomas Holton
578 A Neural Matrix Model for Active Tracking of Frequency-modulated Tones
Kiyoaki Aikawa, Hideki Kawahara, Minoru Tsuzaki
Volume 2
FrA1L1 -- Utterance Verification and Word Spotting
582 A User-Configurable System for Voice Label Recognition
Richard C. Rose, Eduardo Lleida, G.W. Erhart, R.V. Grubbe
586 Keyword Spotting Enhancement for Video Soundtrack Indexing
Philippe Gelin, Chris. J. Wellekens
590 New Efficient Fillers for Unlimited Word Recognition and Keyword Spotting
Rachida El Méliani, Douglas O'Shaughnessy
594 Automatic Transcription of General Audio Data: Preliminary Analyses
Michelle S. Spina, Victor Zue
598 Transcribing Radio News
Francis Kubala, Tasos Anastasakos, Hubert Jin, Long Nguyen, Richard Schwartz
602 Correcting Recognition Errors via Discriminative Utterance Verification
Anand R. Setlur, Rafid A. Sukkar, John Jacob
FrA1L2 -- Acquisition/Learning Training L2 Learners
606 Does Training in Speech Perception Modify Speech Production?
Reiko Akahane-Yamada, Yoh'ichi Tohkura, Ann R. Bradlow, David B. Pisoni
610 Phrase-Final Lengthening and Stress-Timed Shortening in the Speech of Native Speakers and Japanese Learners of English
Motoko Ueyama
614 Japanese Accentuations by Foreign Students and Japanese Speakers of Non-Tokyo Dialect
Nobuko Yamada
618 Devoicing of Japanese Vowels by Taiwanese Learners of Japanese
J. Kevin Varden, Tsutomu Sato
622 Fluency and Use of Segmental Dialect Features in the Acquisition of a Second Language (French) by English Speakers
Danièle Archambault, Catherine Foucher, Blagovesta Maneva
626 Estimating Child and Adolescent Formant Frequency Values From Adult Data
P. Martland, S.P. Whiteside, Steve W. Beet, L. Baghai-Ravary
FrA1L3 -- Focus, Stress and Accent
630 Acoustic Correlates of Linguistic Stress and Accent in Dutch and American English
Agaath M.C. Sluijter, Vincent J. van Heuven
634 On the Levels of Accentuation in Spoken Japanese
Hiroya Fujisaki, Sumio Ohno, Osamu Tomita
638 Tonal Distinctions Between Emphatic Stress and Pretonic Lengthening in Quebec French
Linda Thibault, Marise Ouellet
642 Distinction Between 'Normal' Focus and 'Contrastive/Emphatic' Focus
Anja (Petzold) Elsner
646 Perception of Tonal Accent by Americans Learning Japanese
Yukihiro Nishinuma, Masako Arai, Takako Ayusawa
650 Modeling Intra-Speaker Pitch Range Variation: Predicting F0 Targets when "Speaking Up"
Elizabeth Shriberg, D. Robert Ladd, Jacques Terken
FrA1P1 -- Spoken Language Dialogue and Conversation
654 Predicting Dialogue Acts for a Speech-To-Speech Translation System
Norbert Reithinger, Ralf Engel, Michael Kipp, Martin Klesen
658 Automatic Speech Translation Based on the Semantic Structure
Johannes Müller, Holger Stahl, Manfred Lang
662 A Methodology for Application Development for Spoken Language Systems
Lewis M. Norton, Carl E. Weir, K.W. Scholz, Deborah A. Dahl, Ahmed Bouzid
665 A New Restaurant Guide Conversational System: Issues in Rapid Prototyping for Specialized Domains
Stephanie Seneff, Joseph Polifroni
669 Semantic Interpretation of a Japanese Complex Sentence in an Advisory Dialogue - Focused on the Postpositional Word "KEDO," Which Works as a Conjunction Between Clauses
Tadahiko Kumamoto, Akira Ito
673 A Korean Morphological Analyzer for Speech Translation System
Youngkuk Hong, Myoung-Wan Koo, Gijoo Yang
677 Generic and Domain-specific Aspects of the Waxholm NLP and Dialog Modules
Rolf Carlson, Sheri Hunnicutt
681 A Real-Time System for Summarizing Human-Human Spontaneous Spoken Dialogues
Megumi Kameyama, Goh Kawai, Isao Arima
685 Evaluation of Spoken Language Understanding and Dialogue Systems
Bernd Hildebrandt, Heike Rautenstrauch, Gerhard Sagerer
689 Inter-Speaker Interaction of F0 in Dialogs
Kuniko Kakita
693 A Robust Dialogue System for Making an Appointment
Hans Brandt-Pook, Gernot A. Fink, Bernd Hildebrandt, Franz Kummert, Gerhard Sagerer
697 Segmentation of Spoken Dialogue by Interjections, Disfluent Utterances and Pauses
Kazuyuki Takagi, Shuichi Itahashi
701 A Form-Based Dialogue Manager for Spoken Language Applications
David Goddeau, Helen Meng, Joe Polifroni, Stephanie Seneff, Senis Busayapongchai
705 The Design of Complex Telephony Applications Using Large Vocabulary Speech Technology
S.J. Whittaker, D.J. Attwater
709 Building 10,000 Spoken Dialogue Systems
Stephen Sutton, David G. Novick, Ronald A. Cole, Pieter Vermeulen, Jacques de Villiers, Johan Schalkwyk, Mark Fanty
713 Speaker Intention Modeling for Large Vocabulary Mandarin Spoken Dialogues
Yen-Ju Yang, Lee-Feng Chien, Lin-Shan Lee
717 Hybrid Language Models and Spontaneous Legal Discourse
P.E. Kenne, Mary O'Kane
721 Topic Change and Local Perplexity in Spoken Legal Dialogue
P.E. Kenne, Mary O'Kane
725 Intonational Cues to Discourse Structure in Japanese
Jennifer J. Venditti, Marc Swerts
729 Principles for the Design of Cooperative Spoken Human-Machine Dialogue
Niels Ole Bernsen, Hans Dybkjær, Laila Dybkjær
733 Development and Comparison of Three Syllable Stress Classifiers
Karen L. Jenkin, Michael S. Scordilis
FrA1P2 -- Speech Disorders
737 Interaction of Speech Disorders with Speech Coders: Effects on Speech Intelligibility
D.G. Jamieson, Li Deng, M. Price, Vijay Parsa, J. Till
741 Detecting Arytenoid Cartilage Misplacement through Acoustic and Electroglottographic Jitter Analysis
Maurílio N. Vieira, Arnold G. D. Maran, Fergus R. McInnes, Mervyn A. Jack
745 Robust F0 and Jitter Estimation in Pathological Voices
Maurílio N. Vieira, Fergus R. McInnes, Mervyn A. Jack
749 Speech Monitoring of Infective Laryngitis
F. Plante, H. Kessler, B.M.G. Cheetham, J. Earis
753 Searching for Nonlinear Relations in Whitened Jitter Time Series
J. Schoentgen, R. De Guchteneere
757 Vocal Fold Pathology Assessment using AM Autocorrelation Analysis of the Teager Energy Operator
Liliana Gavidia-Ceballos, John H.L. Hansen, James F. Kaiser
761 Continuous Positive Airway Pressure (CPAP) in the Treatment of Hypernasality
David P. Kuehn
764 Enhancement of Alaryngeal Speech by Adaptive Filtering
Carol Y. Espy-Wilson, Venkatesh R. Chari, Caroline B. Huang
768 Simulation of Disordered Speech Using a Frequency-Domain Vocal Tract Model
Li Deng, Xuemin Shen, D.G. Jamieson, J. Till
772 A Stochastic Model of Fundamental Period Perturbation and Its Application to Perception of Pathological Voice Quality
Yasuo Endo, Hideki Kasuya
776 A Screening Test for Speech Pathology Assessment Using Objective Quality Measures
Eric J. Wallen, John H.L. Hansen
780 Recent Advances in Hypernasal Speech Detection using the Nonlinear Teager Energy Operator
Douglas A. Cairns, John H.L. Hansen, James F. Kaiser
FrA1S1 -- Vocal Tract Geometry I
784 Human Palate and Related Structures: Their Articulatory Consequences
Kiyoshi Honda, Shinji Maeda, Michiko Hashi, Jim Dembowski, John R. Westbury
788 A Continuum Mechanics Representation of Tongue Deformation
Edward P. Davis, Andrew Douglas, Maureen Stone
793 From MRI and Acoustic Data to Articulatory Synthesis: A Case Study of the Lateral Approximants in American English
Philbert Bangayan, Abeer Alwan, Shrikanth Narayanan
797 Liquids in Tamil
Shrikanth Narayanan, Abigail Kaun, Dani Byrd, Peter Ladefoged, Abeer Alwan
FrA2L1 -- Prosody in ASR and Segmentation
801 Modeling Hyperarticulate Speech during Human-computer Error Resolution
Sharon Oviatt, Gina-Anne Levow, Margaret MacEachern, Karen Kuhn
805 Using Stress to Disambiguate Spoken Thai Sentences Containing Syntactic Ambiguity
Siripong Potisuk, Mary P. Harper, Jackson T. Gandour
809 Use of Prosodic Information to Integrate Acoustic and Linguistic Knowledge in Continuous Mandarin Speech Recognition with Very Large Vocabulary
Hung-yun Hsieh, Ren-yuan Lyu, Lin-shan Lee
813 Word Boundary Detection using Pitch Variations
G.V. Ramana Rao, J. Srichand
817 Detection of Phrase Boundaries in Japanese by Low-Pass Filtering of Fundamental Frequency Contours
Atsuhiro Sakurai, Keikichi Hirose
821 A New Method for Speech Delexicalization, and its Application to the Perception of French Prosody
V. Pagel, N. Carbonell, Yves Laprie
FrA2L2 -- Acquisition and Learning by Machine
825 Task Adaptation for Dialogues Via Telephone Lines
Udo Bub
829 The Influence of Bigram Constraints on Word Recognition by Humans: Implications for Computer Speech Recognition
Ronald A. Cole, Yonghong Yan, Troy Bailey
833 ALICE: Acquisition of Language In Conversational Environment - An Approach to Weakly Supervised Training of Spoken Language System for Language Porting
Tetsunori Kobayashi
837 Pitch Pattern Clustering of User Utterances in Human-Machine Dialogue
Takashi Yoshimura, Satoru Hayamizu, Hiroshi Ohmura, Kazuyo Tanaka
841 Simplifying Language through Error-correcting Decoding
J.C. Amengual, E. Vidal, J.M. Benedí
845 A Mixed Approach to Speech Understanding
Mauro Cettolo, Anna Corazza, Renato De Mori
FrA2L3 -- Dialogue Systems
849 Speech Recognition for an Information Kiosk
J.L. Gauvain, J.J. Gangolf, L. Lamel
853 Localizing an Automatic Inquiry System for Public Transport Information
Helmer Strik, Albert Russel, Henk van den Heuvel, Catia Cucchiarini, Louis Boves
857 Prompt Constrained Natural Language - Evolving the Next Generation of Telephony Services
Stephen M. Marcus, Deborah W. Brown, Randy G. Goldberg, Max S. Schoeffler, William R. Wetzel, Richard R. Rosinski
861 Key-Phrase Detection and Verification for Flexible Speech Understanding
Tatsuya Kawahara, Chin-Hui Lee, Biing-Hwang Juang
865 Interactive Recovery from Speech Recognition Errors in Speech User Interfaces
Bernhard Suhm, Brad Myers, Alex Waibel
869 Estimation of Language Models for New Spoken Language Applications
Sunil Issar
FrA2P1 -- Speech Enhancement and Robust Processing
873 H-infinity Filtering for Speech Enhancement
Xuemin Shen, Li Deng, Anisa Yasmin
877 A Comparative Analysis of Channel-Robust Features and Channel Equalization Methods for Speech Recognition
Saeed V. Vaseghi, Ben Milner
881 Robust Speech Recognition Features Based on Temporal Trajectory Filtering of Frequency Band Spectrum
Jia-lin Shen, Wen-liang Hwang, Lin-shan Lee
885 Durational Modelling for Improved Connected Digit Recognition
Kevin Power
889 Study on the Dereverberation of Speech Based on Temporal Envelope Filtering
Carlos Avendano, Hynek Hermansky
893 Estimating Markov Model Structures
Thorsten Brants
897 A Fertility Channel Model for Post-Correction of Continuous Speech Recognition
Eric K. Ringger, James F. Allen
901 Restoration of Wide Band Signal from Telephone Speech using Linear Prediction Error Processing
Hiroshi Yasukawa
905 Smoothed Spectral Subtraction for a Frequency-Weighted HMM in Noisy Speech Recognition
Hiroshi Matsumoto, Noboru Naitoh
909 A Simple Architecture for using Multiple Cues in Sound Separation
William S. Woods, Martin Hansen, Thomas Wittkop, Birger Kollmeier
913 On the Robust Automatic Segmentation of Spontaneous Speech
Bojan Petek, Ove Andersen, Paul Dalsgaard
917 Bayesian Adaptation of Speech Recognizers to Field Speech Data
C.G. Miglietta, C. Mokbel, D. Jouvet, J. Monné
921 Sub-band Adaptive Filtering Applied to Speech Enhancement
A. J. Darlington, D. J. Campbell
925 Noise Robust Estimate of Speech Dynamics for Speaker Recognition
J.P. Openshaw, John S. Mason
929 Overview of Speech Enhancement Techniques for Automatic Speaker Recognition
Javier Ortega-García, Joaquín González-Rodríguez
933 Dynamic Features for Segmental Speech Recognition
Naomi Harte, Saeed V. Vaseghi, Ben Milner
937 Speech Recognition Based on a Model of Human Auditory System
Takuya Koizumi, Mikio Mori, Shuji Taniguchi
941 APVQ Encoder Applied to Wideband Speech Coding
J.M. Salavedra, E. Masgrau
945 Simple Fast Vector Quantization of the Line Spectral Frequencies
Jin Zhou, Yair Shoham, Ali Akansu
FrA2S1 -- Vocal Tract Geometry II
949 Speaker Individualities of Vocal Tract Shapes of Japanese Vowels Measured by Magnetic Resonance Images
Chang-Sheng Yang, Hideki Kasuya
953 Vocal Tract Acoustics Using the Transmission Line Matrix (TLM) Method
S. El-Masri, X. Pelorson, P. Saguet, P. Badin
957 Building Sensori-motor Prototypes from Audiovisual Exemplars
Gérard Bailly
961 Parameterized VT Area Function Inversion
Mats Båvegård, Gunnar Fant
965 An Improved Vocal Tract Model of Vowel Production Implementing Piriform Resonance and Transvelar Nasal Coupling
Jianwu Dang, Kiyoshi Honda
969 Pseudo-articulatory Speech Synthesis for Recognition using Automatic Feature Extraction from X-Ray Data
C. S. Blackburn, S. J. Young
FrP1L1 -- Speaker Adaptation and Normalization I
973 N-best-based Instantaneous Speaker Adaptation Method for Speech Recognition
Tomoko Matsui, Sadaoki Furui
977 Mixture Splitting Technic and Temporal Control in a HMM-based Recognition System
C. Montacié, M.-J. Caraty, C. Barras
981 A Unified Spectral Transformation Adaptation Approach for Robust Speech Recognition
Lei Yao, Dong Yu, Taiyi Huang
985 On-line Adaptive Learning of the Correlated Continuous Density Hidden Markov Models for Speech Recognition
Qiang Huo, Chin-Hui Lee
989 Speaker Adaptation by Modeling the Speaker Variation in a Continuous Speech Recognition System
Nikko Ström
993 An Enquiring System of Unknown Words in TV News by Spontaneous Repetition (Application of Speaker Normalization by Speaker Subspace Projection)
Yasuo Ariki, Shigeaki Tagashira
FrP1L2 -- Spoken Language and NLP I
997 Language Understanding using Hidden Understanding Models
Richard Schwartz, Scott Miller, David Stallard, John Makhoul
1001 Processing of Semantic Information in Fluently Spoken Language
Allen L. Gorin
1005 Automatic Linguistic Segmentation of Conversational Speech
Andreas Stolcke, Elizabeth Shriberg
1009 Towards Understanding Spontaneous Speech: Word Accuracy vs. Concept Accuracy
M. Boros, W. Eckert, Florian Gallwitz, G. Görz, G. Hanrieder, Heinrich Niemann
1013 A Stochastic Case Frame Approach for Natural Language Understanding
Wolfgang Minker, S.K. Bennacef, J.L. Gauvain
1017 Improving Speech Understanding by Incorporating Database Constraints and Dialogue History
Frank Seide, Bernhard Rüber, Andreas Kellner
FrP1L3 -- Spoken Discourse Analysis/Synthesis
1021 A New Discourse Structure Model for Spontaneous Spoken Dialogue
Tetsuro Chino, Hiroyuki Tsuboi
1025 An Architecture for Spoken Dialogue Management
David Duff, Barbara Gates, Susann LuperFoy
1029 Pausing Strategies in Discourse in Dutch
Monique E. van Donzel, Florien J. Koopmans-van Beinum
1033 Filled Pauses as Markers of Discourse Structure
Marc Swerts, Anne Wichmann, Robbert-Jan Beun
1037 The Prosodic Analysis of Korean Dialogue Speech - Through a Comparative Study with Read Speech
Cheol-jae Seong, Minsoo Hahn
1041 Changing the Topic: How Long Does it Take?
Mary O'Kane, P.E. Kenne
FrP1P1 -- Acoustic Modeling I
1045 Learning Pronunciation Dictionary from Speech Data
Christian-Michael Westendorf, Jens Jelitto
1049 The Trended HMM with Discriminative Training for Phonetic Classification
C. Rathinavelu, Li Deng
1053 Improving Decision Trees for Acoustic Modeling
Ariane Lazaridès, Yves Normandin, Roland Kuhn
1057 An Improved Training Algorithm in HMM-based Speech Recognition
Gongjun Li, Taiyi Huang
1061 Speech Recognition Using a Strong Correlation Assumption for the Instantaneous Spectra
J. Ming, P. O'Boyle, J. McMahon, F. J. Smith
1065 On Parameter Filtering in Continuous Subword-unit-based Speech Recognition
Pau Pachès-Leal, Climent Nadeu
1069 Estimation of Statistical Phoneme Center Considering Phonemic Environments
Shigeki Okawa, Katsuhiko Shirai
1073 Integration of Context-dependent Durational Knowledge into HMM-based Speech Recognition
Xue Wang, Louis F. M. ten Bosch, Louis C. W. Pols
1077 Speech Recognition Based on Acoustically Derived Segment Units
T. Fukada, M. Bacchiani, K.K. Paliwal, Yoshinori Sagisaka
1081 Robust Gender-dependent Acoustic-phonetic Modelling in Continuous Speech Recognition Based on a New Automatic
Male/Female Classification
Rivarol Vergin, Azarshid Farhat, Douglas O'Shaughnessy
1085 A Codebook Adaptation Algorithm for SCHMM Using Formant Distribution
Tae Young Yang, Won Ho Shin, Weon Goo Kim, Dae Hee Youn
1089 Parameter Tying for Flexible Speech Recognition
J. Simonin, S. Bodin, D. Jouvet, K. Bartkova
1093 Word-spotting Based on Inter-word and Intra-word Diphone Models
Tsuneo Nitta, Shin'ichi Tanaka, Yasuyuki Masai, Hiroshi Matsu'ura
1097 Duration Modeling with Expanded HMM Applied to Speech Recognition
Antonio Bonafonte, Josep Vidal, Albino Nogueiras
1101 Different Strategies for Distribution Clustering using Discrete, Semicontinuous and Continuous HMMs in CSR
Ricardo de Córdoba, José M. Pardo
1105 Improved HMM Phone and Triphone Models for Realtime ASR Telephony Applications
Ilija Zeljkovic, Shrikanth Narayanan
1109 Improved Extended HMM Composition by Incorporating Power Variance
Yasuhiro Minami, Sadaoki Furui
1113 Optimal Filtering and Smoothing for Speech Recognition using a Stochastic Target Model
Gordon Ramsay, Li Deng
1117 Speech Recognition Using Syllable-Like Units
Zhihong Hu, Johan Schalkwyk, Etienne Barnard, Ronald A. Cole
FrP1S1 -- Physics and Simulation of the Vocal Tract I
1121 Search for Unexplored Effects in Speech Production
C.H. Coker, M.H. Krane, B.Y. Reis, R.A. Kubli
* Computational Models for Speech Generation
S. Levinson
1125 Articulatory Synthesis from X-rays and Inversion for an Adaptive Speech Robot
P. Badin, C. Abry
FrP2L1 -- Speaker Adaptation and Normalization II
1129 Adaptive Recognition Method Based on Posterior Use of Distribution Pattern of Output Probabilities
Jin-Song Zhang, Beiqian Dai, Changfu Wang, Hingkeung Kwan, Keikichi Hirose
1133 Iterative Unsupervised Adaptation Using Maximum Likelihood Linear Regression
P.C. Woodland, D. Pye, M.J.F. Gales
1137 A Compact Model for Speaker-Adaptive Training
Tasos Anastasakos, John McDonough, Richard Schwartz, John Makhoul
1141 Iterative Unsupervised Speaker Adaptation for Batch Dictation
Shigeru Homma, Jun-ichi Takahashi, Shigeki Sagayama
1145 Rapid Unsupervised Adaptation to Children's Speech on a Connected-Digit Task
Daniel C. Burnett, Mark Fanty
1149 Speaker Adaptation Using Tree Structured Shared-State HMMs
Jun Ishii, Masahiro Tonomura, Shoichi Matsunaga
FrP2L2 -- Spoken Language and NLP II
1153 Learning to Parse Spontaneous Speech
Finn Dag Buo, Alex Waibel
1157 Spontaneous Speech and Natural Language Processing. ALPES: A Robust Semantic-led Parser
Jean-Yves Antoine
1161 The Natural Language Processing Module for a Voice Assisted Operator at Telefónica I+D
J. Alvarez-Cercadillo, J. Caminero-Gil, C. Crespo-Casas, D. Tapias-Merino
1165 Compound Words in Large-Vocabulary German Speech Recognition Systems
André Berton, Pablo Fetter, Peter Regel-Brietzmann
1169 Prosody, Empty Categories and Parsing - A Success Story
Anton Batliner, A. Feldhaus, S. Geissler, T. Kiss, Ralf Kompe, Elmar Nöth
1173 "Almost Parsing" Technique for Language Modeling
B. Srinivas
FrP2L3 -- Duration and Rhythm
1177 From Segmental Duration Properties to Rhythmic Structure: A Study of Interactions Between High and Low Level
Constraints
Marise Ouellet, Benoît Tardif
1181 Analysis of Context-dependent Segmental Duration for Automatic Speech Recognition
Xue Wang, Louis C. W. Pols, Louis F. M. ten Bosch
1185 The Role of the Rhythmic Groups in the Segmentation of Continuous French Speech
Delphine Dahan
1189 The Implications of Temporal Patterns for the Prosody of Boundary Signaling in Connected Speech
Zita McRobbie-Utasi
1193 Experimental Phonetic Study of the Syllable Duration of Korean with Respect to the Positional Effect
Hyunbok Lee, Cheol-jae Seong
1197 Timing of Pitch Movements and Accentuation of Syllables
Dik J. Hermes
FrP2P1 -- Acoustic Analysis
1201 A Probabilistic Approach to AMDF Pitch Detection
Goangshiuan S. Ying, Leah H. Jamieson, Carl D. Mitchell
1205 From Sagittal Cut to Area Function: An RMI Investigation
Alain Soquet, Véronique Lecuit, Thierry Metens, Didier Demolin
1209 Pitch Detection and Voiced/Unvoiced Decision Algorithm Based on Wavelet Transforms
Léonard Janer, Juan José Bonet, Eduardo Lleida-Solano
1213 Decomposition of Speech Signals into a Deterministic and a Stochastic Part
Yannis Stylianou
1217 Improved Glottal Closure Instant Detector based on Linear Prediction and Standard Pitch Concept
Cheol-Woo Jo, Ho-Gyun Bang, W.A. Ainsworth
1221 Analysis of Speech Segments using Variable Spectral/Temporal Resolution
Xihong Wang, Stephen A. Zahorian, Stefan Auberg
1225 Time-based Clustering for Phonetic Segmentation
Brian Eberman, William Goldenthal
1229 Formant Analysis Using Mixtures of Gaussians
Parham Zolfaghari, Tony Robinson
1233 Deriving Articulatory Representations from Speech with Various Excitation Modes
Hywel B. Richards, John S. Mason, Melvyn J. Hunt, John S. Bridle
1237 "Blind" Speech Segmentation: Automatic Segmentation of Speech Without Linguistic Knowledge
Manish Sharma, Richard J. Mammone
1241 Speech Synthesis Using a Nonlinear Energy Damping Model for the Vocal Folds Vibration Effect
Hiroshi Ohmura, Kazuyo Tanaka
1245 Neural Networks Learning with L1 Criteria and Its Efficiency in Linear Prediction of Speech Signals
Munehiro Namba, Hiroyuki Kamata, Yoshihisa Ishida
1249 Preprocessing and Neural Classification of English Stop Consonants [b,d,g,p,t,k]
A. Esposito, C. E. Ezin, M. Ceccarelli
1253 A Comparison of Modified k-means(MKM) and NN based Real Time Adaptive Clustering Algorithms for Articulatory Space
Codebook Formation
K.S. Ananthakrishnan
1257 A Novel Approach to the Estimation of Voice Source and Vocal Tract Parameters from Speech Signals
Wen Ding, Hideki Kasuya
1261 Syllable Detection in Read and Spontaneous Speech
Hartmut R. Pfitzinger, Susanne Burger, Sebastian Heid
1265 Maximum Likelihood Learning of Auditory Feature Maps for Stationary Vowels
Kuansan Wang, Chin-Hui Lee, Biing-Hwang Juang
1269 Explicit Segmentation of Speech using Gaussian Models
Antonio Bonafonte, Albino Nogueiras, Antonio Rodriguez-Garrido
1273 A Comparison of Several Recent Methods of Fundamental Frequency and Voicing Decision Estimation
E. Mousset, W.A. Ainsworth, José A. R. Fonollosa
1277 Robust Pitch Estimation with Harmonics Enhancement in Noisy Environments Based on Instantaneous Frequency
Toshihiko Abe, Takao Kobayashi, Satoshi Imai
1281 Integrated Polispectrum on Speech Recognition
Asunción Moreno, Miquel Rutllán
FrP2S1 -- Physics and Simulation of the Vocal Tract II
1285 Analysis of Acoustic Properties of the Nasal Tract Using 3-D FEM
Hisayoshi Suzuki, Takayoshi Nakai, Hirosi Sakakibara
1289 Experiments with Analysis By Synthesis of Glottal Airflow
Johan Liljencrants
Volume 3
SaA1L1 -- Speech Recognition Using HMMs and NNs
1293 An Incremental Speaker-Adaptation Technique for Hybrid HMM-MLP Recognizer
Joao P. Neto, Ciro A. Martins, Luís B. Almeida
1297 Phoneme Segmentation of Continuous Speech using Multi-layer Perceptron
Youngjoo Suh, Youngjik Lee
1301 Stochastic Perceptual Speech Models with Durational Dependence
Jeff Bilmes, Nelson Morgan, Su-Lin Wu, Hervé Bourlard
1305 Boosting the Performance of Connectionist Large Vocabulary Speech Recognition
G.D. Cook, A.J. Robinson
1309 HMMs and OWE Neural Network for Continuous Speech Recognition
Nicolas Pican, Dominique Fohr, Jean-François Mari
1313 Smoothed Local Adaptation of Connectionist Systems
Steve Waterhouse, Dan Kershaw, Tony Robinson
SaA1L2 -- Adverse Environments and Multiple Microphones
1317 Robust Speech Recognition with Speaker Localization by a Microphone Array
Takeshi Yamada, Satoshi Nakamura, Kiyohiro Shikano
1321 Sound Source Localization in Reverberant Environments using an Outlier Elimination Algorithm
Ea-Ee Jan, James L. Flanagan
1325 The 1995 Abbot LVCSR System for Multiple Unknown Microphones
Dan Kershaw, Tony Robinson, Steve Renals
1329 Experiments of Speech Recognition in a Noisy and Reverberant Environment using a Microphone Array and HMM
Adaptation
D. Giuliani, M. Omologo, P. Svaizer
1333 Increasing Robustness in GMM Speaker Recognition Systems for Noisy and Reverberant Speech with Low Complexity
Microphone Arrays
Joaquín González-Rodríguez, Javier Ortega-García, César Martin, Luis Hernández
1337 Robust Automatic Speech Recognition Using a Multi-channel Signal Separation Front-End
Kuan-Chieh Yen, Yunxin Zhao
SaA1L3 -- Prosodic Synthesis in Dialogue
1341 Prosody Generation in Text-to-Speech Conversion Using Dependency Graphs
Anders Lindström, Ivan Bretan, Mats Ljungqvist
1345 Extraction Method of Non-restrictive Modification in Japanese as a Marked Factor of Prosody
Hisako Asano, Hisashi Ohara, Yoshifumi Ooyama
1349 Modeling Contrast in the Generation and Synthesis of Spoken Language
Scott Prevost
1353 A Left-to-right Processing Model of Pausing in Japanese Based on Limited Syntactic Information
Hajime Tsukada
1357 Modeling of Intonation Bearing Emphasis for TTS-Synthesis of Greek Dialogues
D. Galanis, V. Darsinos, G. Kokkinakis
1361 Synthesizing Prosody: a Prominence-based Approach
Barbara Heuft, Thomas Portele
SaA1P1 -- Speech Synthesis
1365 Multilingual Text Analysis for Text-to-Speech Synthesis
Richard Sproat
1369 Spoken-style Explanation Generator for Japanese Kanji using a Text-to-speech System
Yoshifumi Ooyama, Hisako Asano, Koji Matsuoka
1373 A Method for Estimating Prosodic Symbol from Text for Japanese Text-To-Speech Synthesis
Ken-ichi Magata, Tomoki Hamagami, Mitsuo Komura
1377 Statistical Methods in Data-driven Modeling of Spanish Prosody for Text to Speech
E. López-Gonzalo, J.M. Rodríguez-García
1381 Intonation Processing for TTS Using Stylization and Neural Network Learning Method
Jung-Chul Lee, Youngjik Lee, Sang-Hun Kim, Minsoo Hahn
1385 Generating F0 Contours from ToBI Labels using Linear Regression
Alan W. Black, Andrew J. Hunt
1389 The Broad Study of Homograph Disambiguity for Mandarin Speech Synthesis
Wern-Jun Wang, Shaw-Hwa Hwang, Sin-Horng Chen
1393 The MBROLA project: Towards a Set of High Quality Speech Synthesizers Free of Use for Non Commercial Purposes
T. Dutoit, V. Pagel, N. Pierret, F. Bataille, O. Van der Vrecken
1397 Training Data Selection for Voice Conversion Using Speaker Selection and Vector Field Smoothing
Makoto Hashimoto, Norio Higuchi
1401 A New Voice Transformation Method Based on Both Linear and Nonlinear Prediction Analysis
Ki Seung Lee, Dae Hee Youn, Il Whan Cha
1405 On the Transformation of the Speech Spectrum for Voice Conversion
G. Baudoin, Yannis Stylianou
1409 Spectral Analysis of Synthetic Speech and Natural Speech with Noise over the Telephone Line
Cristina Delogu, Andrea Paoloni, Susanna Ragazzini, Paola Ridolfi
1413 A New Speech Synthesis System Based on the ARX Speech Production Model
Weizhong Zhu, Hideki Kasuya
1417 Speech Synthesis Using the CELP Algorithm
Geraldo Lino de Campos, Evandro Bacci Gouvêa
1421 A Mandarin Text-to-Speech System
Shaw-Hwa Hwang, Sin-Horng Chen, Yih-Ru Wang
1425 Residual-based Speech Modification Algorithms for Text-to-Speech Synthesis
M.D. Edgington, A. Lowry
1429 A Generalized LR Parser for Text-to-speech Synthesis
Per Olav Heggtveit
1433 Enhanced Shape-invariant Pitch and Time-scale Modification for Concatenative Speech Synthesis
M.P. Pollard, B.M.G. Cheetham, C.C. Goodyear, M.D. Edgington, A. Lowry
1437 An Excitation Synchronous Pitch Waveform Extraction Method and its Application to the VCV-concatenation Synthesis
of Japanese Spoken Words
Yasuhiko Arai, Ryo Mochizuki, Hirofumi Nishimura, Takashi Honda
1441 A New Chinese Text-to-Speech System with High Naturalness
Ren-Hua Wang, Qinfeng Liu, Difei Tang
1445 Voice Conversion Based on Topological Feature Maps and Time-variant Filtering
Ansgar Rinscheid
SaA1P2 -- Instructional Technology for Spoken Language
1449 Language Training System Utilizing Speech Modification
Yoram Meron, Keikichi Hirose
1453 Perception of English /r/ and /l/ Speech Contrasts by Native Korean Listeners with Extensive English-language
Experience
D.G. Jamieson, K. Yu
1457 Automatic Text-independent Pronunciation Scoring of Foreign Language Student Speech
Leonardo Neumeyer, Horacio Franco, Mitchel Weintraub, Patti Price
1461 Assessing the Contribution of Instructional Technology in the Teaching of Pronunciation
Antônio Simões
1465 Detection of Foreign Speakers' Pronunciation Errors for Second Language Training - Preliminary Results
Maxine Eskenazi
1469 Foreign Accent in Intonation Patterns - A Contrastive Study Applying a Quantitative Model of the F0 Contour
Hansjörg Mixdorff
1473 Input Modality Effects in Foreign Accent
Duncan J. Markham, Yasuko Nagano-Madsen
SaA1S1 -- Multimodal Spoken Language Processing I
1477 For Speech Perception by Humans or Machines, Three Senses are Better than One
Lynne E. Bernstein, Christian Benoît
1481 A Few Factors Which Affect the Degree of Incorporating Lip-read Information into Speech Perception
Kaoru Sekiyama, Yoh'ichi Tohkura, Michio Umeda
1485 Characterizing Audiovisual Information During Speech
E. Vatikiotis-Bateson, K.G. Munhall, Y. Kasahara, F. Garcia, H. Yehia
1489 The Implications of the Tadoma Method of Speechreading for Spoken Language Processing
Charlotte M. Reed
1493 Seeing Speech in Space and Time: Psychological and Neurological Findings
Ruth Campbell
SaA2L1 -- Prosody - Phonological/Phonetic Measures
1497 What's in the "Pure" Prosody?
Volker Strom, Christina Widera
1501 F0 Declination in Read-aloud and Spontaneous Speech
Marc Swerts, Eva Strangert, Mattias Heldner
1505 Prediction of Prosodic Phrase Boundaries Considering Variable Speaking Rate
Yeon-jun Kim, Yung-hwan Oh
1509 Prediction of F0 Parameter of Contextualized Utterances in Dialogue
Yoichi Yamashita, Riichiro Mizoguchi
1513 The Production and Perception of Potentially Ambiguous Intonation Contours by Speakers of Russian and Japanese
V. Makarova, J. Matsui
1517 What is Invariant and What is Optional in the Realization of a FOCUSED Word? A Cross-dialectal Study of Swedish
Sentences With Moving Focus
Robert Eklund
SaA2L2 -- Phonetics and Perception
1521 Quantifying Spectral Characteristics of Fricatives
Christine H. Shadle, Sheila J. Mair
1525 Acoustic Characteristics of Ejectives in Ingush
Natasha Warner
1529 An Acoustic Profile of Consonant Reduction
R.J.J.H. van Son, Louis C. W. Pols
1533 Devoicing in Post-vocalic Canadian-French Obstruants
Danièle Archambault, Blagovesta Maneva
1537 Paying Attention to Speaking Rate
Alexander L. Francis, Howard C. Nusbaum
1541 The Lack of Invariance Problem and the Goal of Speech Perception
Irene Appelbaum
SaA2L3 -- Language Acquisition
1545 The Acoustic Structure of Vowels in Mothers' Speech to Infants and Adults
Jean E. Andruski, Patricia K. Kuhl
1549 Acoustical Characteristics of Sound Production of Deaf and Normally Hearing Infants
Chris J. Clement, Florien J. Koopmans-van Beinum, Louis C. W. Pols
1553 Learning Non-native Vowel Categories
John Kingston, Christine Bartels, José Benkí, Deanna Moore, Jeremy Rice, Rachel Thorburn, Neil Macmillan
1557 Word Recognition by Japanese Infants
P.A. Halle, Toshisada Deguchi, Yuji Tamekawa, B. Boysson-Bardies, Shigeru Kiritani
1561 Investigations of the Word Segmentation Abilities of Infants
Peter W. Jusczyk
1565 Developmental Change in Perception of Clause Boundaries by 6- and 10-Month-old Japanese Infants
Akiko Hayashi, Yuji Tamekawa, Toshisada Deguchi, Shigeru Kiritani
SaA2P1 -- Production and Prosody Posters
1569 A Frequency Domain Method for Parametrization of the Voice Source
Paavo Alku, Erkki Vilkman
1573 Glottal Correlates of the Word Stress and the Tense/Lax Opposition in German
Krzysztof Marasek
1577 Coarticulatory Stability in American English /r/
Suzanne Boyce, Carol Y. Espy-Wilson
1581 An MRI-based Analysis of the English /r/ and /l/ Articulations
Shinobu Masaki, Reiko Akahane-Yamada, Mark K. Tiede, Yasuhiro Shimada, Ichiro Fujimoto
1585 Does Lexical Stress or Metrical Stress Better Predict Word Boundaries in Dutch?
David van Kuijk
1589 Optopalatograph (OPG): A New Apparatus for Speech Production Analysis
A. A. Wrench, A. D. McIntosh, W. J. Hardcastle
1593 Prediction of Vowel Systems using a Deductive Approach
René Carré
1597 Distinctions Between [t] and [tch] using Electropalatography Data
Sheila J. Mair, Celia Scully, Christine H. Shadle
1601 Relating Formants and Articulation in Intelligibility Test Words
Michiko Hashi, Raymond D. Kent, John R. Westbury, Mary J. Lindstrom
1605 The Role of Coarticulation in the Perception of Vowel Quality in Modern Standard Arabic
Imad Znagui, Mohamed Yeou
1609 Updating the Reading EPG
Simon Arnfield, Wilf Jones
1612 Lexical Stress Detection on Stress-minimal Word Pairs
Goangshiuan S. Ying, Leah H. Jamieson, Ruxin Chen, Carl D. Mitchell
1616 An Acoustic Study of the Interaction Between Stressed and Unstressed Syllables in Spoken Mandarin
Jing Wang
1620 Automatic Detection of Accent Nuclei at the Head of Words for Speech Recognition
Nobuaki Minematsu, Seiichi Nakagawa
1624 Automatic Generation of Prosodic Structure for High Quality Mandarin Speech Synthesis
Fu-chiang Chou, Chiu-yu Tseng, Lin-shan Lee
1628 A Study on Japanese Prosodic Pattern and its Modeling in Restricted Speech
Tomoki Hamagami, Ken-ichi Magata, Mitsuo Komura
1632 A Phonetic Study of Focus in Intransitive Verb Sentences
Steve Hoskins
* Variation in Vocal Fold Vibration Associated with Prosodic Conditions
Shigeru Kiritani, Hiroshi Imagawa, Seiji Niimi
1636 Goethe for Prosody
Stefan Rapp
1640 Prosodic Cues in Syntactically Ambiguous Strings; An Interactive Speech Planning Mechanism
K.A. Straub
1644 A Functional Model for Generation of the Local Components of F0 Contours in Chinese
Jinfu Ni, Ren-Hua Wang, Deyu Xia
1648 The Acquisition of Voiceless Stops in the Interlanguage of Second Language Learners of English and Spanish
Marie Fellbaum
SaA2S1 -- Multimodal Spoken Language Processing II
1652 Studies of the McGurk Effect: Implications for Theories of Speech Perception
Kerry P. Green
1656 Using the Visual Component in Automatic Speech Recognition
N. M. Brooke
1660 Perceptual Organization of Speech in One and Several Modalities: Common Functions, Common Resources
Robert E. Remez
1664 Multi-modal Encoding of Speech in Memory: A First Report
David B. Pisoni, Helena M. Saldaña, Sonya M. Sheffert
SaP1L1 -- User-Machine Interfaces
1668 Evaluating Automatic Speech Recognition as a Component of a Multi-input Device Human-computer Interface
B.A. Mellor, C. Baber, C. Tunley
1672 Data Collection for the MASK Kiosk: WOz vs Prototype System
A. Life, I. Salter, J.N. Temem, F. Bernard, S. Rosset, S.K. Bennacef, Lori Lamel
1676 An Experimental Japanese/English Interpreting Video Phone System
M. Karaorman, T.H. Applebaum, T. Itoh, M. Endo, Y. Ohno, M. Hoshimi, T. Kamai, K. Matsui, K. Hata, S.
Pearson, J.-C. Junqua
1680 User Participation and Compliance in Speech Automated Telecommunications Applications
Sara Basson, Stephen Springer, Cynthia Fong, Hong Leung, Ed Man, Michele Olson, John Pitrelli, Ranvir
Singh, Suk Wong
1684 Embedding Speech in Web Interfaces
Samuel Bayer
1688 Voice-activated Home Banking System and its Field Trial
Toshihiro Isobe, Masatoshi Morishima, Fuminori Yoshitani, Nobuo Koizumi, Ken'ya Murakami
SaP1L2 -- TTS Systems and Rules
1692 A Text Analyzer for Korean Text-to-Speech Systems
Sangho Lee, Yung-Hwan Oh
1696 Design and Evaluation of a Phonological Phrase Parser for Spanish Text-to-Speech
Helen E. Karn
1700 Comparison of Two Tree-Structured Approaches for Grapheme-to-Phoneme Conversion
Ove Andersen, Roland Kuhn, Ariane Lazaridès, Paul Dalsgaard, Jürgen Haas, Elmar Nöth
1704 A Recurrent Network that Learns to Pronounce English Text
M.J. Adamson, R.I. Damper
1708 Archisegment-based Letter-to-Phone Conversion for Concatenative Speech Synthesis in Portuguese
Eleonora Cavalcante Albano, Agnaldo Antonio Moreira
1712 A New Method of Generating Speech Synthesis Units Based on Phonological Knowledge and Clustering Technique
Yuki Yoshida, Shin'ya Nakajima, Kazuo Hakoda, Tomohisa Hirokawa
SaP1L3 -- Prosody and Labeling
1716 Consistency in Transcription and Labelling of German Intonation with GToBI
Martine Grice, Matthias Reyelt, Ralf Benzmüller, Jörg Mayer, Anton Batliner
1720 Syntactic-prosodic Labeling of Large Spontaneous Speech Data-bases
Anton Batliner, R. Kompe, A. Kiessling, H. Niemann, E. Nöth
1724 Relationship Between Discourse Structure and Dynamic Speech Rate
Florien J. Koopmans-van Beinum, Monique E. van Donzel
1728 Using Prosodic Clues to Decide When to Produce Back-channel Utterances
Nigel Ward
1732 Dialog Act Classification with the Help of Prosody
Marion Mast, Ralf Kompe, Stefan Harbeck, Andreas Kiessling, Heinrich Niemann, Elmar Nöth, E. G.
Schukat-Talamazzini, V. Warnke
1736 Using Lexical Stress in Continuous Speech Recognition for Dutch
David van Kuijk, Henk van den Heuvel, Louis Boves
SaP1P1 -- Speaker/Language Identification and Verification
1740 Automatic Accent Classification of Foreign Accented Australian English Speech
Karsten Kumpf, Robin W. King
1744 Discriminative Adaptation for Speaker Verification
F. Korkmazskiy, Biing-Hwang Juang
1748 Perceptual Features of Unknown Foreign Languages as Revealed by Multi-dimensional Scaling
V. Stockmal, D. Muljani, Z.S. Bond
1752 On-line Incremental Adaptation for Speaker Verification using Maximum Likelihood Estimates of CDHMM Parameters
Kin Yu, John S. Mason
1756 Combining Methods to Improve Speaker Verification Decision
Dominique Genoud, Frédéric Bimbot, Guillaume Gravier, Gérard Chollet
1760 Incremental Speaker Adaptation with Minimum Error Discriminative Training for Speaker Identification
Cesar Martín del Alamo, J. Alvarez, C. de la Torre, F.J. Poyatos, L. Hernández
1764 Frame Level Likelihood Normalization for Text-independent Speaker Identification using Gaussian Mixture Models
Konstantin P. Markov, Seiichi Nakagawa
1768 On Using Prosodic Cues in Automatic Language Identification
Ann E. Thymé-Gobbel, Sandra E. Hutchins
1772 Speaker Recognition Model using Two-dimensional Mel-Cepstrum and Predictive Neural Network
Tadashi Kitamura, Shinsai Takei
1776 Unknown Language Rejection in Language Identification System
Hingkeung Kwan, Keikichi Hirose
1780 Spoken Language Identification using Large Vocabulary Speech Recognition
James L. Hieronymus, Shubha Kadambe
1784 Accent Identification
Carlos Teixeira, Isabel M. Trancoso, António Serralheiro
1788 Comparison of Text-independent Speaker Recognition Methods on Telephone Speech with Acoustic Mismatch
Sarel van Vuuren
1792 On the Sources of Inter- and Intra-speaker Variability in the Acoustic Dynamics of Speech
Xue Yang, J. Bruce Millar, Iain Macleod
1796 Language Identification with Inaccurate String Matching
Kay M. Berkling, Etienne Barnard
1800 Robust Prosodic Features for Speaker Identification
M.J. Carey, E.S. Parris, H. Lloyd-Thomas, S.J. Bennett
1804 Text Independent Speaker Identification on Noisy Environments by Means of Self Organizing Maps
E. Monte, J. Hernando, X. Miró, A. Adolf
1808 Language-identification Using Language-dependent Phonemes and Language-independent Speech Units
Paul Dalsgaard, Ove Andersen, Hanne Hesselager, Bojan Petek
SaP1S1 -- Large Vocabulary Speech Recognition: The Switchboard Domain
* Large Vocabulary Speech Recognition: The Switchboard Domain
Ronald Rosenfeld, Hervé Bourlard
SaP1S2 -- Emotion in Recognition and Synthesis I
* Adding the Affective Dimension: A New Look in Speech Analysis and Synthesis
Klaus R. Scherer
1812 Ethological Theory and the Expression of Emotion in the Voice
John J. Ohala
1816 Synthesizing Emotions in Speech: Is it Time to Get Excited?
Iain R. Murray, John L. Arnott
SaP2L1 -- Stochastic Techniques in Robust Speech Recognition
1820 A Study on Task-independent Subword Selection and Modeling for Speech Recognition
Chin-Hui Lee, Biing-Hwang Juang, Wu Chou, J.J. Molina-Perez
1824 Simultaneous ANN Feature and HMM Recognizer Design using String-based Minimum Classification Error (MCE) Training
Mazin G. Rahim, Chin-Hui Lee
1828 Quantizing Mixture-weights in a Tied-mixture HMM
Sunil K. Gupta, Frank K. Soong, Raziel Haimi-Cohen
1832 Variance Compensation within the MLLR Framework for Robust Speech Recognition and Speaker Adaptation
M.J.F. Gales, D. Pye, P.C. Woodland
1836 Maximum-likelihood Stochastic Matching Approach to Non-linear Equalization for Robust Speech Recognition
A.C. Surendran, Chin-Hui Lee, Mazin G. Rahim
1840 Estimation of Channel Bias for Telephone Speech Recognition
Jen-Tzung Chien, Hsiao-Chuan Wang, Lee-Min Lee
SaP2L2 -- Prosodic Synthesis in Text to Speech
1844 Synthesis of English Intonation using Explicit Models of Reading and Spontaneous Speech
M. E. Johnson
1848 Implementation and Evaluation of a Model for Synthesis of Swedish Intonation
Merle Horne, Marcus Filipsson
1852 Natural Prosody Generation for Domain Specific Text-to-Speech Systems
Nobuyuki Katae, Shinta Kimura
1856 Improving Text-to-Speech Synthesis
Mark Tatham, Eric Lewis
1860 Synthesis of Stressed Speech from Isolated Neutral Speech Using HMM-based Models
Sahar E. Bou-Ghazale, John H.L. Hansen
1864 Modeling Segment Intonation for Slovene TTS System
Ales Dobnikar
SaP2L3 -- Dialogue Events
1868 Word Predictability After Hesitations: A Corpus-based Study
Elizabeth Shriberg, Andreas Stolcke
1872 Interruptions and Intonation
Li-chiung Yang
1876 On not Recognizing Disfluencies in Dialogue
Robin J. Lickley, Ellen Gurman Bard
1880 A Theory of Word Frequencies and its Application to Dialogue Move Recognition
Phil Garner, Sue Browning, Roger Moore, Martin Russell
1884 Utterance Units and Grounding in Spoken Dialogue
David R. Traum, Peter A. Heeman
1888 Coordinating Turn-taking with Gaze
David G. Novick, Brian Hansen, Karen Ward
SaP2P1 -- Databases and Tools
1892 BABEL: An Eastern European Multi-language Database
Peter Roach, Simon Arnfield, W. Barry, J. Baltova, M. Boldea, A. Fourcin, W. Gonet, R. Gubrynowicz, E.
Hallum, L. Lamel, K. Marasek, A. Marchal, E. Meister, K. Vicsi
1894 USTC95 - A Putonghua Corpus
Ren-Hua Wang, Deyu Xia, Jinfu Ni, Bicheng Liu
1898 Telephone Data Collection using the World Wide Web
Edward Hurley, Joseph Polifroni, James Glass
1902 The "SIVA" Speech Database for Speaker Verification: Description and Evaluation
M. Falcone, A. Gallo
1906 A Multi-level Description of Date Expressions in German Telephone Speech
Christoph Draxler
1910 Viterbi Search Visualization Using Vista: A Generic Performance Visualization Tool
Robert H. Halstead Jr., Ben Serridge, Jean-Manuel Van Thong, William Goldenthal
1914 A Multilingual Phonetic Representation and Analysis System for Different Speech Databases
Toomas Altosaar, Matti Karjalainen, Martti Vainio
1918 FRESCO: The French Telephone Speech Data Collection - Part of the European SpeechDat(M) Project
D. Langmann, R. Haeb-Umbach, Louis Boves, E. den Os
1922 Predicting the Out-of-Vocabulary Rate and the Required Vocabulary Size for Speech Processing Applications
Johannes Müller, Holger Stahl, Manfred Lang
1926 AMULET: Automatic MUltisensor Speech Labelling and Event Tracking: Study of the Spatio-temporal Correlations in
Voiceless Plosive Production
Nathalie Parlangeau, Alain Marchal
1930 Constructing Multi-level Speech Database for Spontaneous Speech Processing
Minsoo Hahn, Sanghun Kim, Jung-Chul Lee, Yong-Ju Lee
1934 Preliminaries to a Romanian Speech Database
Marian Boldea, Alin Doroga, Tiberiu Dumitrescu, Maria Pescaru
1938 Labelled Data Bank of Spoken Standard German: The Kiel Corpus of Read/Spontaneous Speech
Klaus J. Kohler
1942 SAPPHIRE: An Extensible Speech Analysis and Recognition Tool Based on Tcl/Tk
Lee Hetherington, Michael McCandless
1946 Automatic Detection of Topic Boundaries and Keywords in Arbitrary Speech Using Incremental Reference Interval-free
Continuous DP
Jiro Kiyama, Yoshiaki Itoh, Ryuichi Oka
1950 Very-large-vocabulary Mandarin Voice Message File Retrieval using Speech Queries
Bo-Ren Bai, Lee-Feng Chien, Lin-Shan Lee
1954 Gandalf - A Swedish Telephone Speaker Verification Database
H. Melin
1958 The DCIEM Map Task Corpus: Spontaneous Dialogue Under Sleep Deprivation and Drug Treatment
Ellen Gurman Bard, C. Sotillo, A. H. Anderson, M. M. Taylor
1962 The Nemours Database of Dysarthric Speech
Xavier Menéndez-Pidal, James B. Polikoff, Shirley M. Peters, Jennie E. Leonzio, H.T. Bunnell
1966 POST: Parallel Object-oriented Speech Toolkit
Jean Hennebert, Dijana Petrovska Delacrétaz
SaP2S2 -- Emotion in Recognition and Synthesis II
1970 Recognizing Emotion in Speech
Frank Dellaert, Thomas Polzin, Alex Waibel
1974 Emotions in Time Domain Synthesis
Barbara Heuft, Thomas Portele, Monika Rauth
1978 Word Class Driven Synthesis of Prosodic Annotations
Simon Arnfield
1981 Dynamical Modelling of Vowel Sounds as a Synthesis Tool
M. Banbrook, S. McLaughlin
1985 Emotional Speech Elicited using Computer Games
Tom Johnstone
1989 Automatic Statistical Analysis of the Signal and Prosodic Signs of Emotion in Speech
Roddy Cowie, Ellen Douglas-Cowie
Volume 4
SuA1L1 -- Robust Speech Processing
1993 Channel and Noise Normalization Using Affine Transformed Cepstrum
Xiaoyu Zhang, Richard J. Mammone
1997 Spectral Estimation and Normalisation for Robust Speech Recognition
Tom Claes, Fei Xie, Dirk Van Compernolle
2001 Trellis Encoded Vector Quantization for Robust Speech Recognition
Wu Chou, Nambi Seshadri, Mazin Rahim
2005 Phone Clustering using the Bhattacharyya Distance
Brian Mak, Etienne Barnard
2009 Variability of Lombard Effects Under Different Noise Conditions
Atsushi Wakao, Kazuya Takeda, Fumitada Itakura
2013 Lombard Effect Compensation and Noise Suppression for Noisy Lombard Speech Recognition
Sang-mun Chi, Yung-Hwan Oh
SuA1L2 -- Dialects and Speaking Styles
2017 The Use of Shibboleth Words for Automatically Classifying Speakers by Dialect
A.W.F. Huggins, Yogen Patel
* The Organization of Dialect Diversity in North America
William Labov
2021 Data Collection of Japanese Dialects and its Influence into Speech Recognition
Ikuo Kudo, Takao Nakama, Tomoko Watanabe, Reiko Kameyama
2025 Statistical Dialect Classification Based on Mean Phonetic Features
David R. Miller, James Trischitta
2028 Norwegian Numerals: a Challenge to Automatic Speech Recognition
Knut Kvale
2032 Evaluation of the Telefónica I+D Natural Numbers Recognizer over Different Dialects of Spanish from Spain and
America
C. de la Torre, J. Caminero-Gil, J. Alvarez, C. Martín del Alamo, L. Hernández-Gómez
SuA1L3 -- Production and Perception of Prosody
2036 Rhythmic Constraints on English Stress Timing
Fred Cummins, Robert F. Port
2040 On the Interaction of Clash, Focus and Phonological Phrasing
Irene Vogel, Steve Hoskins
2044 On the Quantal Nature of Speech Timing
Gunnar Fant, Anita Kruckenberg
2048 Differential Perception of Tonal Contours Through the Syllable
David House
2052 Pitch, Loudness, and Segmental Duration Correlates: Towards a Model for the Phonetic Aspects of Finnish Prosody
Martti Vainio, Toomas Altosaar
2056 Prosodic Manipulation System of Speech Material for Perceptual Experiments
Nobuaki Minematsu, Seiichi Nakagawa, Keikichi Hirose
SuA1P1 -- Topics in ASR and Search
2060 Clustered Language Models with Context-Equivalent States
J.P. Ueberla, I. R. Gransden
2063 Modeling of Contextual Effects and its Application to Word Spotting
Yuji Yonezawa, Masato Akagi
2067 A New Keyword Spotting Algorithm with Pre-calculated Optimal Thresholds
J. Junkawitsch, L. Neubauer, H. Höge, G. Ruske
2071 Detection of Ambiguous Portions of Signal Corresponding to OOV Words or Misrecognized Portions of Input
Roxane Lacouture, Yves Normandin
2075 Techniques for Approximating a Trigram Language Model
Fabio Brugnara, Marcello Federico
2079 Unsupervised and Incremental Speaker Adaptation under Adverse Environmental Conditions
Keizaburo Takagi, Koichi Shinoda, Hiroaki Hattori, Takao Watanabe
2083 An Adaptive-Beam Pruning Technique for Continuous Speech Recognition
Hugo Van hamme, Filip Van Aelten
2087 Data Based Filter Design for RASTA-like Channel Normalization in ASR
Carlos Avendano, Sarel van Vuuren, Hynek Hermansky
2091 A Comparison of Time Conditioned and Word Conditioned Search Techniques for Large Vocabulary Speech Recognition
S. Ortmanns, H. Ney, Frank Seide, I. Lindam
2095 Language-model Look-ahead for Large Vocabulary Speech Recognition
S. Ortmanns, H. Ney, A. Eiden
2099 A New Search Algorithm in Segmentation Lattices of Speech Signals
Jean-Luc Husson, Yves Laprie
2103 LR-Parser-driven Viterbi Search with Hypotheses Merging Mechanism Using Context-dependent Phone Models
Tomokazu Yamada, Shigeki Sagayama
2107 Discrete-Utterance Recognition with a Fast Match Based on Total Data Reduction
Jan Nouza
2111 On-line Garbage Modeling with Discriminant Analysis for Utterance Verification
J. Caminero, C. de la Torre, L. Villarrubia, C. Martín, L. Hernández
2115 Cheating with Imperfect Transcripts
Paul Placeway, John Lafferty
2119 Novel Training Method for Classifiers used in Speaker Adaptation
Naoto Iwahashi
2123 Large Vocabulary Word Recognition based on a Graph-structured Dictionary
Katsuki Minamino
2127 A Word Graph Based N-Best Search in Continuous Speech Recognition
Bach-Hiep Tran, Frank Seide, Volker Steinbiss
2131 Viterbi Beam Search with Layered Bigrams
David M. Goblirsch
2135 A Wave Decoder for Continuous Speech Recognition
Eric Buhrke, Wu Chou, Qiru Zhou
2139 Long Term On-line Speaker Adaptation for Large Vocabulary Dictation
Eric Thelen
2143 Incremental Generation of Word Graphs
Gerhard Sagerer, Heike Rautenstrauch, G. A. Fink, Bernd Hildebrandt, A. Jusek, Franz Kummert
2147 Improvement in N-Best Search for Continuous Speech Recognition
Irina Illina, Yifan Gong
2151 Sethos: The UPC Speech Understanding System
Antonio Bonafonte, José B. Mariño, Albino Nogueiras
2155 Segmental Search for Continuous Speech Recognition
Pietro Laface, Luciano Fissore, A. Maro, Franco Ravera
SuA1P2 -- Multimodal Dialogue/HCI
2159 An Investigation into the Generation of Mouth Shapes for a Talking Head
A. P. Breen, E. Bowers, W. Welsh
2163 A Text-to-audiovisual-speech Synthesizer for French
Bertrand Le Goff, Christian Benoît
2167 Analysis of Head Movements and its Role in Spoken Dialogue
Yuri Iwano, Shioya Kageyama, Emi Morikawa, Shu Nakazato, Katsuhiko Shirai
2171 RWC Multimodal Database for Interactions by Integration of Spoken Language and Visual Information
Satoru Hayamizu, Osamu Hasegawa, Katunobu Itou, Katuhiko Sakaue, Kazuyo Tanaka, Shigeki Nagaya, Masayuki
Nakazawa, T. Endoh, Fumio Togawa, Kenji Sakamoto, Kazuhiko Yamamoto
2175 About the Relationship Between Eyebrow Movements and F0 Variations
Christian Cavé, Isabelle Guaïtella, Roxane Bertrand, Serge Santi, Françoise Harlay, Robert Espesser
2179 How Many Words is a Picture Really Worth?
Laurel Fais, Kyung-ho Loken-Kim, Tsuyoshi Morimoto
2183 Visual Synthesis of Source Acoustic Speech Through Kohonen Neural Networks
A. Laganà, F. Lavagetto, A. Storace
2187 Audio-visual Speech Perception Without Speech Cues
Helena M. Saldaña, David B. Pisoni, Jennifer M. Fellowes, Robert E. Remez
SuA1S1 -- Multilingual Speech Processing I
2191 Multilingual Speech Recognition at Dragon Systems
Jim Barnett, A. Corrada, G. Gao, L. Gillick, Y. Ito, S. Lowe, L. Manganaro, B. Peskin
2195 Multi-lingual Phoneme Recognition Exploiting Acoustic-phonetic Similarities of Sounds
Joachim Köhler
2199 Japanese Speech Databases for Robust Speech Recognition
Atsushi Nakamura, Shoichi Matsunaga, Tohru Shimizu, Masahiro Tonomura, Yoshinori Sagisaka
2203 Spoken Language Processing in a Multilingual Context
Lori F. Lamel, M. Adda-Decker, Jean Luc Gauvain, G. Adda
2207 Multilingual Human-computer Interactions: From Information Access to Language Learning
Victor Zue, Stephanie Seneff, Joseph Polifroni, Helen Meng, James Glass
2211 SpeeData: Multilingual Spoken Data Entry
U. Ackermann, B. Angelini, F. Brugnara, M. Federico, D. Giuliani, R. Gretter, G. Lazzari, H. Niemann
SuA2L1 -- Acoustics in Synthesis
2215 Pseudo-articulatory Representations in Speech Synthesis and Recognition
William H. Edmondson, Jon P. Iles, Dorota J. Iskra
2219 Synthesis of Initial (/s/-) Stop-liquid Clusters using HLsyn
David R. Williams
2223 Synthesis of Trill
Chilin Shih
2227 Phone-based Speech Synthesis with Neural Network and Articulatory Control
W.K. Lo, P.C. Ching
2231 Analysis of Ten Vowel Sounds Across Gender and Regional/Cultural Accent
P. Martland, S.P. Whiteside, Steve W. Beet, L. Baghai-Ravary
2235 Speech Morphing by Gradually Changing Spectrum Parameter and Fundamental Frequency
Masanobu Abe
SuA2L2 -- Pitch and Rate
2239 The Multi-Lag-Window Method for Robust Extended-range F0 Determination
Edouard Geoffrois
2243 Nonlinear Estimation of DEGG Signals with Applications to Speech Pitch Detection
Kenneth E. Barner
2247 Pitch Analysis Methods for Cross-Speaker Comparison
John A. Maidment, M. Luisa Garcia-Lecumberri
2250 Continuous Adaptation of Linear Models with Impulsive Excitation
Steve W. Beet, L. Baghai-Ravary
2254 Quantitative Analysis of the Local Speech Rate and its Application to Speech Synthesis
Sumio Ohno, Masamichi Fukumiya, Hiroya Fujisaki
2258 A Fast and Reliable Rate of Speech Detector
Jan P. Verhasselt, Jean-Pierre Martens
SuA2L3 -- Acoustic Modeling II
2262 Context Modeling and Clustering in Continuous Speech Recognition
Jean-Claude Junqua, Lorenzo Vassallo
2266 Hierarchical Partition of the Articulatory State Space for Overlapping-feature Based Speech Recognition
Li Deng, Jim Jian-Xiong Wu
2270 A Fuzzy Acoustic-phonetic Decoder for Speech Recognition
Olivier Oppizzi, David Fournier, Philippe Gilles, Henri Méloni
2274 Syllable-level Desynchronisation of Phonetic Features for Speech Recognition
Katrin Kirchhoff
2277 A Probabilistic Framework for Feature-based Speech Recognition
James Glass, Jane Chang, Michael McCandless
2281 Modeling Context-dependent Phonetic Units in a Continuous Speech Recognition System for Mandarin Chinese
Jim Jian-Xiong Wu, Li Deng, Jacky Chan
SuA2P1 -- General ASR Posters
2285 JANUS-II: Towards Spontaneous Spanish Speech Recognition
Puming Zhan, Klaus Ries, Marsal Gavaldà, Donna Gates, Alon Lavie, Alex Waibel
2289 Reduced Semi-continuous Models for Large Vocabulary Continuous Speech Recognition in Dutch
Kris Demuynck, Jacques Duchateau, Dirk Van Compernolle
2293 Validating Different Flexible Vocabulary Approaches on the Swiss French PolyPhone and PolyVar Databases
Andrei Constantinescu, Olivier Bornet, Gilles Caloz, Gérard Chollet
2297 Use of a Reliability Coefficient in Noise Cancelling by Neural Net and Weighted Matching Algorithms
Néstor Becerra Yoma, Fergus R. McInnes, Mervyn A. Jack
2301 Likelihood Normalization Using an Ergodic HMM for Continuous Speech Recognition
Kazuhiko Ozeki
2305 Dynamic Control of a Production Model
Laurence Candille, Henri Méloni
2309 Speech Recognition Using Sub-word Units Dependent On Phonetic Contexts Of Both Training and Recognition
Vocabularies
Hiroaki Hattori, Eiko Yamada
2313 Hidden Markov Models Merging Acoustic and Articulatory Information to Automatic Speech Recognition
Bruno Jacob, Christine Senac
2316 Creation of Unseen Triphones from Diphones and Monophones using a Speech Production Approach
Mats Blomberg, Kjell Elenius
2320 Speaker-independent Dictation of Chinese Speech with 32K Vocabulary
Bo Xu, Bing Ma, Shuwu Zhang, Fei Qu, Taiyi Huang
2324 Using Accent-specific Pronunciation Modelling for Robust Speech Recognition
J.J. Humphries, P.C. Woodland, D. Pearce
2328 Dictionary Learning for Spontaneous Speech Recognition
Tilo Sloboda, Alex Waibel
2332 Comparison of Channel Normalisation Techniques for Automatic Speech Recognition Over the Phone
Johan de Veth, Louis Boves
2336 Anchor Point Detection for Continuous Speech Recognition in Spanish: The Spotting of Phonetic Events
Manuel A. Leandro, Jose M. Pardo
2340 Cepstral Compensation by Polynomial Approximation for Environment-independent Speech Recognition
Bhiksha Raj, Evandro B. Gouvêa, Pedro J. Moreno, Richard M. Stern
2344 Effect of Speech Coders on Speech Recognition Performance
B.T. Lilly, K.K. Paliwal
2348 Wavelet Transforms For Non-uniform Speech Recognition Systems
Léonard Janer, Josep Martí, Climent Nadeu, Eduardo Lleida-Solano
2352 A Binaural Model as a Front-end for Isolated Word Recognition
Tsuyoshi Usagawa, Markus Bodden, Klaus Rateitschek
2356 A New Speech Enhancement: Speech Stream Segregation
Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata
SuA2S1 -- Multilingual Speech Processing II
2360 Head Automata for Speech Translation
Hiyan Alshawi
2364 Word Clustering with Parallel Spoken Language Corpora
Ye-Yi Wang, John Lafferty, Alex Waibel
2368 Toward Translating Korean Speech Into Other Languages
Jae-Woo Yang, Youngjik Lee
2371 VERBMOBIL: The Evolution of a Complex Large Speech-to-Speech Translation System
Thomas Bub, Johannes Schwinn
2375 Translation of Conversational Speech with JANUS-II
Alon Lavie, Alex Waibel, Lori Levin, Donna Gates, Marsal Gavaldà, Torsten Zeppenfeld, Puming Zhan, Oren
Glickman
SuP1L1 -- Data-based Synthesis
2379 Non-segmental Analysis and Synthesis Based on a Speech Database
Andrew Slater, John Coleman
2383 Microsegment Synthesis - Economic Principles in a Low-cost Solution
Ralf Benzmüller, William J. Barry
2387 Whistler: A Trainable Text-to-Speech System
X.D. Huang, A. Acero, J. Adcock, H.W. Hon, J. Goldsmith, J. Liu, Mike Plumpe
2391 Generation of Multiple Synthesis Inventories by a Bootstrapping Procedure
Thomas Portele, Karl-Heinz Stöber, Horst Meyer, Wolfgang Hess
2395 Modeling Segmental Duration in German Text-to-Speech Synthesis
Bernd Möbius, Jan P.H. van Santen
2399 Autolabelling Japanese ToBI
Nick Campbell
SuP1L2 -- Speaker Identification and Verification
2403 General Phrase Speaker Verification Using Sub-word Background Models and Likelihood-ratio Scoring
S. Parthasarathy, A.E. Rosenberg
2407 Unknown-Multiple Signal Source Clustering Problem Using Ergodic HMM and Applied to Speaker Classification
J. Murakami, M. Sugiyama, H. Watanabe
2411 GMM and ARVM Cooperation and Competition for Text-independent Speaker Recognition on Telephone Speech
J.-L. Le Floch, C. Montacié, M.-J. Caraty
2415 Selective use of the Speech Spectrum and a VQGMM Method for Speaker Identification
Qiguang Lin, Ea-Ee Jan, ChiWei Che, Dong-Suk Yuk, James L. Flanagan
2419 Speaker Verification through Large Vocabulary Continuous Speech Recognition
Michael Newman, Larry Gillick, Yoshiko Ito, Don McAllaster, Barbara Peskin
2423 Predictive Neural Networks in Text Independent Speaker Verification: an Evaluation on the SIVA Database
Andrea Paoloni, Susanna Ragazzini, G. Ravaioli
SuP1L3 -- Acoustic Phonetics
2427 Durational Characteristics of Hindi Consonant Clusters
Nisheeth Shrotriya, Rajesh Verma, S.K. Gupta, S.S. Agrawal
2431 The Use of Wavelet Transforms in Phoneme Recognition
Beng T. Tan, Minyue Fu, Andrew Spray, Phillip Dermody
2435 Acoustic Properties of Phonemes in Continuous Speech for Different Speaking Rate
Hisao Kuwabara
2439 Prosodic Parameterization of Spoken Japanese Based on a Model of the Generation Process of F0 Contours
Hiroya Fujisaki, Sumio Ohno
2443 A Logistic Regression Model for Detecting Prominences
Arman Maghbouleh
2446 High-quality Prosodic Modification of Speech Signals
Beat Pfister
SuP1P1 -- Perception of Vowels and Consonants
2450 On the Syllable Structures of Chinese Relating to Speech Recognition
Jialu Zhang
2454 Can a Moraic Nasal Occur Word-initially in Japanese?
Takashi Otake, Kiyoko Yoneyama
2458 Perceptual Assimilation of American English Vowels by Japanese Listeners
W. Strange, Reiko Akahane-Yamada, B.H. Fitzgerald, R. Kubo
2462 Context and Speaker Effects in the Perceptual Assimilation of German Vowels by American Listeners
W. Strange, O.-S. Bohn, S. A. Trent, M.C. McNair, K.C. Bielec
2466 Examination of a Perceptual Non-native Speech Contrast: Pharyngealized/Non-pharyngealized Discrimination by
French-speaking Adults
Mohamed Zahid
2470 Context-dependent Relevance of Burst and Transitions for Perceived Place in Stops: It's in Production, not
Perception
Roel Smits
2474 The Perception of Morae in Long Vowels: Comparison Among Japanese, Korean and English Speakers
Ryoji Baba, Kaori Omuro, Hiromitsu Miyazono, Tsuyoshi Usagawa, Masahiko Higuchi
2478 Juncture Cues to Disfluency
Robin J. Lickley
2482 Effects of Duration and Formant Movement on Vowel Perception
James R. Sawusch
2486 Benchmarking Human Performance for Continuous Speech Recognition
N. Deshmukh, R.J. Duncan, A. Ganapathiraju, J. Picone
2490 Intelligibility of Speech with Filtered Time Trajectories of Spectral Envelopes
Takayuki Arai, Misha Pavel, Hynek Hermansky, Carlos Avendano
2494 Perceptual Use of Vowel and Speaker Information in Breath Sounds
D. H. Whalen, Sonya M. Sheffert
2498 The Role of Neighborhood Relative Frequency in Spoken Word Recognition
Philippe Mousty, Monique Radeau, Ronald Peereman, Paul Bertelson
2502 Transitional Probability and Phoneme Monitoring
James M. McQueen, Mark A. Pitt
2506 Identification of Vowel Features from French Stop Bursts
Anne Bonneau
2510 Listening in a Second Language
Z.S. Bond, Thomas J. Moore, Beverley Gable
2514 Perception of Lexical Tone Across Languages: Evidence for a Linguistic Mode of Processing
Denis Burnham, Elizabeth Francis, Di Webster, Sudaporn Luksaneeyanawin, Chayada Attapaiboon, Francisco
Lacerda, Peter Keller
2518 Acoustic Correlates to the Effects of Talker Variability on the Perception of English /r/ and /l/ by Japanese
Listeners
James S. Magnuson, Reiko Akahane-Yamada
SuP2LP -- Closing Ceremony and Plenary Lecture
2522 Natural Communication with Machines - Progress and Challenge
James L. Flanagan
* Unavailable at time of printing