
Speech Research Laboratory
A.I. duPont Hospital for Children
and the
University of Delaware


Research Info

Biphone Constrained Concatenation

The current version of the ModelTalker synthesizer is rooted in the diphone concatenation approach but borrows some of the advantages of more general unit concatenation schemes. It selects biphones, units longer than diphones that each comprise two adjacent phonemes, from a corpus of speech and stores them in a synthesis database. During synthesis, ModelTalker searches all available biphones for those from which contextually appropriate diphones can be extracted, then concatenates those diphones to form the synthetic output. Because the selection of diphones is postponed until an utterance is to be synthesized, when all of the context factors are known, finding an optimal set of diphones is greatly simplified.
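The idea of deferring unit selection until the full context is known can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `BiphoneCandidate` record, the `#` silence marker, and the cost (a count of mismatched neighbouring phonemes) are hypothetical and are not taken from ModelTalker itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiphoneCandidate:
    """Hypothetical record for one stored biphone and its recorded context."""
    first: str     # first phoneme of the biphone
    second: str    # second phoneme of the biphone
    prev_ctx: str  # phoneme that preceded the biphone in the recording
    next_ctx: str  # phoneme that followed the biphone in the recording
    source: str    # identifier of the recording it came from


def context_cost(cand, prev_ctx, next_ctx):
    """Count how many neighbouring phonemes differ from the target context."""
    return int(cand.prev_ctx != prev_ctx) + int(cand.next_ctx != next_ctx)


def select_candidates(phonemes, db):
    """Pick, for each adjacent phoneme pair, the stored biphone whose
    recorded context best matches the target utterance.

    `db` maps (first, second) to a list of BiphoneCandidate objects.
    "#" is an assumed marker for silence at the utterance edges.
    """
    padded = ["#"] + list(phonemes) + ["#"]
    chosen = []
    for i in range(1, len(padded) - 2):
        pair = (padded[i], padded[i + 1])
        candidates = db.get(pair)
        if not candidates:
            raise KeyError(f"no biphone stored for {pair}")
        prev_ctx, next_ctx = padded[i - 1], padded[i + 2]
        chosen.append(
            min(candidates, key=lambda c: context_cost(c, prev_ctx, next_ctx))
        )
    return chosen
```

Because the choice is made only once the whole target phoneme string is available, each candidate can be scored against its actual left and right neighbours instead of against a context guessed in advance.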

A simple extension of the algorithm we use for this process also lets us store complete utterances in the synthesis database alongside the biphones needed for general synthesis. If the synthesizer is asked to produce an utterance that is stored in its entirety in the database, it can reproduce the stored utterance exactly. ModelTalker can therefore blend smoothly between utterances that have the quality of recorded speech and utterances that sound more synthetic but nonetheless retain the voice characteristics of the talker who originally recorded the speech.
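One way to picture how stored utterances and general synthesis can share a single search is a greedy longest-match cover of the target phoneme string. This is only a sketch under stated assumptions: the greedy strategy, the function names, and joining at phoneme boundaries (rather than inside phonemes, as diphone concatenation does) are simplifications chosen for illustration, not the algorithm ModelTalker uses.

```python
def longest_stored_run(target, start, recordings):
    """Return (recording_id, offset, length) for the longest stretch of any
    recording that matches target[start:]. Length is 0 if nothing matches."""
    best = (None, 0, 0)
    for rec_id, phones in recordings.items():
        for off in range(len(phones)):
            n = 0
            while (start + n < len(target)
                   and off + n < len(phones)
                   and phones[off + n] == target[start + n]):
                n += 1
            if n > best[2]:
                best = (rec_id, off, n)
    return best


def cover(target, recordings):
    """Cover the target phoneme list with stored stretches, longest first.

    `recordings` maps a recording id to its phoneme list (hypothetical
    layout). A target stored in full comes back as a single span.
    """
    spans = []
    i = 0
    while i < len(target):
        rec_id, off, n = longest_stored_run(target, i, recordings)
        if n == 0:
            raise KeyError(f"phoneme {target[i]!r} is not in the database")
        spans.append((rec_id, off, n))
        i += n
    return spans
```

With this view, an utterance that was recorded whole yields one span and needs no joins, while a novel utterance yields several shorter spans with a join between each pair.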

The heart of this system is the speech database, which is created using software we refer to as BCCdb. BCCdb takes two inputs: the inventory of speech created by the InvTool software, and a file (also used by InvTool) that lists wavefiles with their corresponding phoneme strings, syllable boundary information, and syllable stress information. BCCdb converts these into a database that can be searched efficiently both for long stretches of matching speech and for how well two segments that might be joined during synthesis match acoustically.
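The two kinds of lookup described above can be sketched in Python as a biphone index plus a simple join measure. The input layout (pairs of wavefile name and phoneme list), the function names, and the use of Euclidean distance between feature vectors as the "goodness of match" are all assumptions for illustration; the actual BCCdb file format and acoustic measure are not specified here.

```python
import math
from collections import defaultdict


def build_index(entries):
    """Index every adjacent phoneme pair by where it occurs.

    `entries` is an iterable of (wavefile_name, phoneme_list) pairs, a
    hypothetical stand-in for the wavefile list that BCCdb reads.
    Returns a dict mapping (first, second) to a list of
    (wavefile_name, position) tuples in input order.
    """
    index = defaultdict(list)
    for wav, phones in entries:
        for i in range(len(phones) - 1):
            index[(phones[i], phones[i + 1])].append((wav, i))
    return dict(index)


def join_cost(features_a, features_b):
    """Euclidean distance between two equal-length feature vectors taken
    at a candidate join point. Lower values mean a closer acoustic match.
    """
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(features_a, features_b)))
```

Precomputing an index like this trades memory for lookup speed: every occurrence of a biphone is found with one dictionary access, and the join measure is evaluated only for the candidates that survive the lookup.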
