Speech Research Lab

Biphone Constrained Concatenation

The current version of the ModelTalker synthesizer is rooted in the diphone concatenation approach, but borrows some of the advantages of more general unit concatenation schemes. Units longer than diphones, called biphones and each comprising two adjacent phonemes, are selected from a corpus of speech and stored in a synthesis database. During synthesis, ModelTalker searches all available biphones for those from which contextually appropriate diphones can be extracted, and then concatenates those diphones to form the synthetic output. Because the selection of diphones is postponed until an utterance is to be synthesized, when all of the context factors are known, finding an optimal set of diphones is greatly simplified.
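The deferred selection described above can be illustrated with a minimal Python sketch. It is not ModelTalker's implementation: the `Biphone` record, its fields, the `#` silence marker, and the scoring rule (one point for each neighbouring phoneme that matches the target) are all assumptions made for illustration, and only pairs of adjacent target phonemes are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Biphone:
    """Hypothetical record for one stored biphone and its recorded context."""
    left: str     # first phoneme of the biphone
    right: str    # second phoneme of the biphone
    before: str   # phoneme that preceded `left` in the recording
    after: str    # phoneme that followed `right` in the recording
    source: str   # illustrative identifier of the source recording


def target_units(phonemes):
    """Return (before, left, right, after) for each adjacent phoneme pair."""
    padded = ["#"] + list(phonemes) + ["#"]  # "#" stands for silence (assumed)
    return [tuple(padded[i - 1:i + 3]) for i in range(1, len(padded) - 2)]


def context_score(candidate, before, after):
    """Count how many neighbouring phonemes match the target context."""
    return (candidate.before == before) + (candidate.after == after)


def select_units(phonemes, database):
    """Pick, for each target pair, the stored biphone with the best context."""
    chosen = []
    for before, left, right, after in target_units(phonemes):
        candidates = [b for b in database if (b.left, b.right) == (left, right)]
        if not candidates:
            raise LookupError(f"no stored biphone covers {left}-{right}")
        chosen.append(
            max(candidates, key=lambda b: context_score(b, before, after))
        )
    return chosen
```

Because the choice is made only once the whole target phoneme string is known, the context on both sides of every unit is available at selection time, which is the simplification the paragraph above refers to.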

A simple extension of the algorithm we use for this process also allows us to store complete utterances in the synthesis database alongside the biphones needed for general synthesis. If the synthesizer is then asked to produce an utterance that happens to be stored in its entirety in the database, it can reproduce the stored utterance exactly. This makes it possible for ModelTalker to blend smoothly between utterances that have the quality of recorded speech and utterances that sound more synthetic but nonetheless retain the voice characteristics of the talker who originally recorded the speech.
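One way to picture how a single search can serve both cases is a greedy longest-match cover, sketched below in Python. This is an assumed simplification, not the algorithm ModelTalker uses: it works on whole phonemes rather than diphones, and the function names and the dictionary of recordings are hypothetical.

```python
def longest_match(target, start, recordings):
    """Find the longest run of target[start:] inside any stored recording.

    Returns (length, recording name, offset within that recording).
    """
    best = (0, None, 0)
    for name, phones in recordings.items():
        for offset in range(len(phones)):
            length = 0
            while (start + length < len(target)
                   and offset + length < len(phones)
                   and target[start + length] == phones[offset + length]):
                length += 1
            if length > best[0]:
                best = (length, name, offset)
    return best


def cover(target, recordings):
    """Cover the target phoneme string with the longest stored stretches."""
    spans, pos = [], 0
    while pos < len(target):
        length, name, offset = longest_match(target, pos, recordings)
        if length == 0:
            raise LookupError(f"no stored speech for {target[pos]!r}")
        spans.append((name, offset, length))
        pos += length
    return spans
```

When the requested utterance is stored in full, `cover` returns a single span and the recording can be played back unchanged. Otherwise it returns several shorter spans that must be concatenated, so the output moves gradually from recorded to synthetic quality as the matched stretches get shorter.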

The heart of this system is the speech database, which is created using software we refer to as BCCdb. BCCdb takes two inputs: the inventory of speech created by the InvTool software, and a file (also used by InvTool) that lists the wavefiles together with the corresponding phoneme string, syllable boundary information, and syllable stress information. BCCdb converts these into a database of speech that can be searched efficiently both for long stretches of matching speech and for how well two segments of speech match acoustically when they may be joined for synthesis.
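The text does not say how the goodness of acoustical match is measured. As a purely illustrative sketch, the Python below assumes each segment edge is summarized by a fixed-length feature vector and uses the Euclidean distance between the two edges as the join cost. The function names and the feature representation are hypothetical and are not taken from BCCdb.

```python
import math


def join_cost(left_edge, right_edge):
    """Euclidean distance between the feature vectors on each side of a join.

    Smaller values mean a smoother join (assumed measure).
    """
    if len(left_edge) != len(right_edge):
        raise ValueError("feature vectors must have the same length")
    return math.dist(left_edge, right_edge)


def best_join(left_edge, candidates):
    """Return the name of the candidate whose leading edge matches best.

    `candidates` maps a segment name to the feature vector at its start.
    """
    return min(candidates, key=lambda name: join_cost(left_edge, candidates[name]))
```

For example, `best_join([1.0, 2.0], {"a": [1.0, 2.5], "b": [4.0, 6.0]})` selects `"a"`, the candidate whose starting features lie closest to the end of the preceding segment.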
