Here are some general guidelines about how to record the sentences so that the quality of your synthetic voice is as high as it can get. Keep these guidelines in mind as you record.
What’s In The Inventory:
Just to warn you, not all of the items in the "sentence" list that you will be recording are actually complete sentences. Many of them are, and were chosen because they are sentences frequently used by individuals who rely on voice synthesizers. Other list items consist of frequently used phrases, and a few are single words. In many sentences, the words are in roughly grammatical English order, but the whole doesn’t really make sense, such as "Afternoons collect the counselor." Finally, some of the "sentences" contain nonsense words (see below). All of these different types of "sentences" follow the same general recording guidelines.
In addition, there will be a few sentences for which no audio prompt will be given. Instead, a female voice will say, "Please record the written prompt." The visual prompt will be displayed, and you are to record it as well as possible. A few such sentences are in the inventory for research purposes. This will also occur if you decide to add words or sentences of your own to the inventory (see the next section).
It is important that your recording conditions be as consistent and as constant as possible over the course of all your recordings. This is one of the purposes of the Performance Meters: to make sure that all of your recordings fall within the same narrow ranges of pitch and volume. But there are other conditions that need to be held fairly constant as well.
One obvious way of ensuring constant conditions is to do all of your recording in one session, but this will probably not be possible. Currently, it takes a healthy individual at least 3 hours to record 1650 sentences; an individual who is already losing his or her voice will take more time, and may also tire more easily. You will therefore probably need several recording sessions to finish. Try to record as much as you can in each session, so that you need fewer sessions overall.
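The figures above (at least 3 hours for 1650 sentences) imply a rough pace that may help when planning your sessions. The short sketch below works through that arithmetic. The 45-minute session length and the 1.5x slowdown factor are purely illustrative assumptions, not figures from this guide; substitute whatever suits you.

```python
import math

# Figures taken from the guideline above.
TOTAL_SENTENCES = 1650
HEALTHY_TOTAL_MINUTES = 3 * 60  # "at least 3 hours"


def seconds_per_sentence():
    """Average time per sentence at the baseline pace, in seconds."""
    return HEALTHY_TOTAL_MINUTES * 60 / TOTAL_SENTENCES


def sessions_needed(session_minutes, slowdown=1.0):
    """Number of sessions needed to finish the whole inventory.

    session_minutes: how long you plan to record per session (assumption).
    slowdown: multiplier on the baseline time (assumption), e.g. 1.5 if
              recording takes you half again as long as the baseline.
    """
    total_minutes = HEALTHY_TOTAL_MINUTES * slowdown
    return math.ceil(total_minutes / session_minutes)


if __name__ == "__main__":
    print(f"About {seconds_per_sentence():.1f} seconds per sentence")
    print("Sessions at the baseline pace:", sessions_needed(45))
    print("Sessions at 1.5x the time:", sessions_needed(45, slowdown=1.5))
```

Because the 3-hour figure is a minimum, treat the result as a lower bound on the number of sessions.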
Pick a consistent recording environment that you will use for all your recording sessions. Obviously, you want to pick a very quiet area, as free as possible from outside noises like TVs, radios, children, and open windows. If your recording room has windows, we suggest hanging some curtains or blankets over them to dampen any sound coming from outside. In addition, try to pick a consistent recording time for all of your sessions—preferably in the morning, when you are alert and well-rested.
As discussed back in Section 2, it is critical that your microphone be in the same position relative to your mouth from session to session. Try not to alter or bend your head-mounted microphone in any way during or between sessions. Relevant computer settings, such as the Recording Volume, should also be kept constant.
Also, it is possible for some general characteristics of your voice and speech to change from session to session. For example, you may speak slowly on some days, more quickly on others. It is a good idea to go back and listen to many of your recordings from previous sessions before you start a new session, to try to match the sound of your own voice as much as possible.
It is also very important that you pronounce the sentences accurately and precisely, and exactly as InvTool expects you to pronounce them. Let the audio prompts be your guide. Their purpose is to instruct you in exactly how to pronounce each word and sentence. Obviously, you shouldn’t try to imitate the prompt voice—for example, don’t try to sound male if you are female! After all, the purpose of InvTool is to capture your own voice. However, you should try to pronounce the sentences as the prompt voice does, as closely as possible.
For example, there are thousands of words in the English language that have multiple pronunciations, depending on who speaks them or the context in which they are spoken. For instance, you may pronounce the word ‘either’ roughly as "EE-ther", while someone else may pronounce it more like "EYE-ther". In cases such as these, you should pronounce the words just as the audio prompt does. If it says "EYE-ther", you should also say "EYE-ther".
There are also many cases of nonsense words in the sentence list, such as ‘fothe’ and ‘aggs’. Listen closely to the audio prompts to learn exactly how to say these words.
You should also try to imitate the intonation of the audio prompts. All of the sentences should be pronounced with the ordinary falling intonation of a declarative sentence—even if the prompt is a question. (ModelTalker adds the appropriate rising intonation to questions that it synthesizes.) None of the words in the sentence should be given extra emphasis.
In general, the sentences should sound casual but clear: not too stilted, but not mumbled either. The words should flow together smoothly, as they do in ordinary speech. Do not put artificial moments of silence between the words, or your synthetic voice will sound just as artificial. Again, let the audio prompts be your guide.
One consonant you should be especially careful in pronouncing is the letter 'T'. This consonant has somewhat different pronunciations in different cases. For example, at the beginning of a word, as in 'top', it sounds hard, clear, and breathy. But in the middle of many words, like 'atom' and 'butter', it is much softer, and sounds almost like a 'D'. And at the end of a word, as in 'ant' or 'heat', you may not hear it very distinctly at all.
As you are recording, you may be tempted to over-enunciate your T's, pronouncing them all like the one in 'top'. Resist this temptation. Pronounce them as you normally would, because this is what the Pronunciation Meter will be expecting.
As mentioned in Section 4, be careful not to chop off your recordings by pressing the "Record" button too late, or the "Stop" button too early. Make sure you leave about half a second of silence at the beginning and end of each recording.
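If you are curious how the half-second rule could be checked on a finished recording, the sketch below measures the silence at each end of a list of audio samples. It is not how InvTool itself works; the function names, the amplitude threshold of 0.02, and the assumption that samples are scaled between -1.0 and 1.0 are all hypothetical choices made for illustration.

```python
def speech_bounds(samples, threshold=0.02):
    """Return (first, last) indices of samples at or above the threshold.

    Returns None if every sample is below the threshold, i.e. the
    recording appears to contain no speech at all.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return None
    return loud[0], loud[-1]


def edge_silence_seconds(samples, sample_rate, threshold=0.02):
    """Return (leading, trailing) silence in seconds, or None if no speech."""
    bounds = speech_bounds(samples, threshold)
    if bounds is None:
        return None
    first, last = bounds
    leading = first / sample_rate
    trailing = (len(samples) - 1 - last) / sample_rate
    return leading, trailing


def has_enough_padding(samples, sample_rate, min_silence=0.5, threshold=0.02):
    """True if both ends of the recording have at least min_silence seconds."""
    edges = edge_silence_seconds(samples, sample_rate, threshold)
    if edges is None:
        return False
    leading, trailing = edges
    return leading >= min_silence and trailing >= min_silence


if __name__ == "__main__":
    rate = 1000  # samples per second, kept small for the demonstration
    good = [0.0] * 500 + [0.5] * 1000 + [0.0] * 500
    clipped = [0.0] * 100 + [0.5] * 1000 + [0.0] * 500
    print("Well-padded recording passes:", has_enough_padding(good, rate))
    print("Late 'Record' press passes:", has_enough_padding(clipped, rate))
```

Since the guideline asks for "about" half a second, a slightly lower `min_silence` would be a reasonable choice in practice. A real room is never perfectly silent, so the threshold would also need adjusting to your own background noise level.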