A.I. duPont Hospital for Children
University of Delaware
One issue we are exploring is how the amount and content of recorded speech affect the quality of the resulting synthesized speech. The composition of the inventory directly affects synthesis quality. In general, the more words, phrases, and examples of each biphone that are recorded, the better the synthesized speech. However, especially for people with ALS, there is a limit on how much speech a person can record before becoming tired. The inventory must contain the recorded speech necessary for unrestricted English synthesis as well as the words and phrases that the user wants synthesized with "recording quality" (e.g., "Have a nice day!", family names), yet it should be compact and manageable in size.
Our goal is to discover and eliminate redundancies in the current inventory, which consists of about 1400 words and phrases. It is very likely that a number of those words and phrases can be removed without any noticeable effect on the quality of the synthesized speech. It is also possible that users may be willing to sacrifice some quality in exchange for the ease of recording a smaller inventory. We intend to quantify the degradation in synthetic speech quality for inventories of different sizes, giving users a choice of inventory size and a good sense of the speech quality they can expect.
Ultimately, the aim is to select the smallest list that provides high-quality speech, allows synthesis of any possible English utterance, and leads to user satisfaction.
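One way to look for redundant items is to treat the problem as a set cover over biphones. The sketch below uses a greedy heuristic, which is our illustrative assumption and not necessarily the method used in this project. The function names, the phone symbols, and the toy word list are all hypothetical. It also addresses only the coverage constraint: it does not model the quality gained from recording several examples of each biphone, and a greedy choice is not guaranteed to give the smallest possible list.

```python
from typing import Dict, List, Set, Tuple

Biphone = Tuple[str, str]


def biphones(phones: List[str]) -> Set[Biphone]:
    """Return the set of adjacent phone pairs in one transcription."""
    return set(zip(phones, phones[1:]))


def greedy_inventory(transcriptions: Dict[str, List[str]]) -> List[str]:
    """Greedily choose items until every biphone in the full list is covered.

    `transcriptions` maps each candidate word or phrase to its phone
    sequence (a hypothetical input format, assumed for illustration).
    """
    coverage = {item: biphones(p) for item, p in transcriptions.items()}
    uncovered: Set[Biphone] = set()
    for pairs in coverage.values():
        uncovered |= pairs

    chosen: List[str] = []
    while uncovered:
        # Pick the item that covers the most still-uncovered biphones;
        # sorting first makes ties resolve alphabetically.
        best = max(sorted(coverage), key=lambda item: len(coverage[item] & uncovered))
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen


# Toy candidate list with made-up phone symbols.
TOY_LIST = {
    "cat": ["k", "ae", "t"],
    "at": ["ae", "t"],
    "tack": ["t", "ae", "k"],
}

if __name__ == "__main__":
    print(greedy_inventory(TOY_LIST))
```

In this toy list, "at" contributes no biphone that "cat" does not already supply, so it is the kind of item that could be dropped without losing coverage. A real study would still need listening tests to measure how each reduction affects perceived quality.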