Speech Research
duPont Hospital for Children / University of Delaware


Speech Synthesis Projects

The speech synthesis projects seek to create high quality, natural sounding synthesized speech. Although a number of high quality synthesizers exist today, most offer only a small set of voices with no regional dialects. Few high quality female-voice speech synthesizers exist, and even fewer offer a realistic sounding child's voice. A nonspeaking person using speech synthesis as a surrogate voice in an AAC (augmentative and alternative communication) device must therefore choose between an unlimited vocabulary in one of the existing voices and a limited vocabulary in the voice of choice. This project seeks to eliminate that trade-off by developing a method for creating high quality, unlimited-vocabulary synthesized speech in any male, female, or child's voice.

To meet this objective, diphones have been chosen as the unit of concatenation for synthesis. Diphones are segments of speech that extend from a relatively stationary region of one phoneme to a similar region in the adjacent phoneme. Thus, diphones begin and end roughly in the middle of phonemes and span the transition between them. First suggested by Peterson, Wang, and Sivertsen in 1958, diphones solve a number of problems encountered with other synthesis units. Phonemes, the most obvious synthesis unit, require very little storage space and allow generation of an unlimited vocabulary. However, synthesizing speech from phonemes requires complex algorithms to model the complicated transitions between them, and if those transitions are not modeled appropriately the resulting speech can sound quite unnatural. Moreover, this approach has had only limited success in modeling female and child speakers, and using it to model the speech of a particular talker is especially difficult.
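For readers unfamiliar with diphone concatenation, the following minimal Python sketch illustrates the idea. The phoneme symbols, the "SIL" padding, and the dictionary-based inventory are illustrative assumptions, not a description of the ModelTalker implementation; a real synthesizer would also smooth pitch and amplitude at each join.

    from typing import Dict, List


    def phonemes_to_diphones(phonemes: List[str]) -> List[str]:
        """Pair each phoneme with its successor, padding with silence ("SIL")
        so every transition, including into and out of the utterance, is covered."""
        padded = ["SIL"] + phonemes + ["SIL"]
        return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


    def synthesize(phonemes: List[str], inventory: Dict[str, List[float]]) -> List[float]:
        """Concatenate the stored waveform samples for each diphone.
        (Hypothetical inventory: diphone name -> list of samples.)"""
        samples: List[float] = []
        for diphone in phonemes_to_diphones(phonemes):
            samples.extend(inventory[diphone])  # a KeyError signals a missing unit
        return samples


    # The word "cat": /k ae t/ becomes SIL-K, K-AE, AE-T, T-SIL.
    print(phonemes_to_diphones(["K", "AE", "T"]))

Because each diphone begins and ends mid-phoneme, the joins fall in relatively steady-state regions of the signal, which is why simple concatenation can sound natural without the complex transition modeling that phoneme-based synthesis requires.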

The ASEL speech research group has several active synthesis projects. One component of our work is the development of automatic diphone extraction techniques to speed the creation of diphone inventories and improve the quality of the synthetic speech. We are continuing to improve our Text-to-Phonetics translation software to generate accurate word pronunciations and sentence-level prosodic structure. Additionally, we are investigating coding methods for improving the quality of speech synthesized from diphones. To make our TTS system available to others, we are developing Windows and Unix APIs and TTS servers that allow our software to be used in standard applications and communication devices. Finally, in collaboration with the University of Puerto Rico, we are developing Spanish diphone inventories and will be designing new Spanish rule systems compatible with our existing TTS system.
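As a rough illustration of how an application might use a TTS server, here is a hedged sketch of a client that sends text over a TCP socket and reads back synthesized audio. The host, port, and line-oriented protocol are assumptions made up for this example; they do not describe ASEL's actual server interface or API.

    import socket


    def speak(text: str, host: str = "localhost", port: int = 7000) -> bytes:
        """Send a line of text to a (hypothetical) TTS server and return the
        synthesized audio as raw bytes."""
        with socket.create_connection((host, port)) as conn:
            conn.sendall(text.encode("ascii") + b"\n")
            conn.shutdown(socket.SHUT_WR)  # signal that the request is complete
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks)


    if __name__ == "__main__":
        audio = speak("Hello from the TTS demo.")
        print(f"Received {len(audio)} bytes of audio")

Keeping the synthesizer behind a simple server interface like this is one way an AAC device or desktop application could request speech without linking the synthesis engine directly into the application.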

A demo of our TTS system, ModelTalker, is now online if you would like to hear what our synthesizer and automatically extracted voices sound like.



This document was last updated on April 7, 1998
Web Comments/Questions: yarringt@asel.udel.edu