Source Characteristics



Voicing Source

Glottal Volume Velocity

The source for voiced speech is generated by the quasi-periodic vibration of the vocal folds. Contraction of the chest and/or diaphragm muscles creates an air pressure difference across the closed vocal folds. When the pressure difference becomes sufficiently large, the vocal folds are forced apart and air begins to flow through the glottis; this is the abduction phase of the glottal cycle. When the pressure difference between the subglottal and supraglottal passages is sufficiently reduced, air flow begins to decrease and the glottis begins to close; this is the adduction phase of the glottal cycle. Adduction occurs more rapidly than abduction, apparently because the tension on the vocal folds is reinforced by Bernoulli effects. The glottis quickly closes, resulting in the closed phase of the glottal cycle. These phases are shown in the figure below as "abduct", "adduct", and "closed" respectively.

The figure plots a modeled glottal volume velocity function (ordinate) against time (abscissa); the units on both axes are arbitrary. The model which generated this function was described by Rosenberg (1970), and is implemented in the program synsrc included with this tutorial. Rosenberg tested several model source functions to identify those which produced the most natural sounding synthetic vowels. The model used in synsrc is the preferred model, which uses a cubic function to describe the abduction phase and a quadratic function to describe the adduction phase. An example of this waveform is also included as file rint.wav.
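As a rough illustration of a pulse with a cubic abduction phase and a quadratic adduction phase, the sketch below builds one glottal cycle in Python. It assumes the polynomial form commonly attributed to Rosenberg's model; the function name rosenberg_pulse, its parameters, and the default phase fractions are hypothetical choices made for this example and are not taken from synsrc.

```python
import numpy as np

def rosenberg_pulse(n_samples, open_frac=0.4, close_frac=0.16):
    """Return one cycle of glottal volume velocity, peak-normalized to 1.

    open_frac  -- fraction of the period spent in abduction (assumed value)
    close_frac -- fraction of the period spent in adduction (assumed value)
    The rest of the period is the closed phase, where flow is zero.
    """
    n_open = int(round(open_frac * n_samples))
    n_close = int(round(close_frac * n_samples))
    flow = np.zeros(n_samples)

    # Abduction: cubic rise from 0 toward 1, flat at both ends.
    x = np.arange(n_open) / n_open
    flow[:n_open] = 3.0 * x**2 - 2.0 * x**3

    # Adduction: quadratic fall from 1 toward 0, steepest just before closure.
    y = np.arange(n_close) / n_close
    flow[n_open:n_open + n_close] = 1.0 - y**2

    # Closed phase: the remaining samples stay at zero.
    return flow

# One 200-sample cycle; time and amplitude units are arbitrary.
cycle = rosenberg_pulse(200)
```

Because close_frac is smaller than open_frac, the falling segment is shorter than the rising one, which mirrors the observation that adduction is faster than abduction. Repeating the cycle (for example with np.tile) gives a quasi-periodic source.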

Pressure

While it is easiest to envision the physical process of vocal fold vibration in terms of the volume velocity function, the actual excitation of the vocal tract is generated by the pressure changes associated with the cyclic variation in volume velocity. The following figure shows both the flow (the same function as above) and the pressure waveform estimated from the flow. The units on both axes are again arbitrary. Pressure increases during the abduction phase, drops sharply during the adduction phase, and returns to zero during the closed phase of each glottal cycle (for this model function). This function was also generated by the program synsrc.
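The text does not say how the pressure waveform is estimated from the flow. A common approximation, assumed here, is to take the time derivative of the volume velocity. The sketch below applies that assumption to a toy flow cycle with a cubic rise and a quadratic fall; the function name flow_to_pressure and the segment lengths are hypothetical, and synsrc may compute the waveform differently.

```python
import numpy as np

def flow_to_pressure(flow, dt=1.0):
    """Approximate pressure as d(flow)/dt (an assumed relation)."""
    return np.gradient(flow, dt)

# Toy flow cycle in arbitrary units.
x = np.linspace(0.0, 1.0, 81)
rise = 3.0 * x**2 - 2.0 * x**3            # abduction: cubic rise, 81 samples
y = np.linspace(0.0, 1.0, 33)[1:]
fall = 1.0 - y**2                         # adduction: quadratic fall, 32 samples
closed = np.zeros(87)                     # closed phase: zero flow
flow = np.concatenate([rise, fall, closed])

pressure = flow_to_pressure(flow)
```

Under this assumption the estimate is positive while the flow is rising, swings to its largest negative value just before closure, and is zero throughout the closed phase.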

An Alternative Model

A more recent model, which directly estimates the pressure waveform, has been described by Fant, Liljencrants, and Lin (1985). An example of the output of this model is shown below, plotted in arbitrary units of pressure against time, and can be found in file lsrc.wav (again, generated by program synsrc). In this model, parameters control the total period duration, the point of maximum rate of closure, the rate of return to zero flow from the point of maximum closure rate, and the amplitude and angular frequency of the sinusoidal part of the pitch cycle. Since some of these parameters are not easily estimated from an inverse filtered glottal waveform, Lin (1990) provides a FORTRAN program which accepts information about the timing of waveform features (more like Rosenberg's model) to generate the model source function. It is this FORTRAN program which synsrc uses.
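For reference, the waveform of this model is usually written in two pieces, as sketched below. The symbols are the conventional ones from the literature and are assumptions here, since this tutorial does not name them: t_c is the total period duration, t_e the point of maximum rate of closure, t_a the time constant of the return to zero, and E_0 and \omega_g the amplitude and angular frequency of the sinusoidal part. E_e is the magnitude of the waveform at t_e.

```latex
E(t) =
\begin{cases}
  E_0 \, e^{\alpha t} \sin(\omega_g t), & 0 \le t \le t_e \\[6pt]
  -\dfrac{E_e}{\varepsilon t_a}
    \left[ e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)} \right],
    & t_e < t \le t_c
\end{cases}
```

The growth factor \alpha and the decay constant \varepsilon are not free parameters. In the usual formulation they follow from requiring the two pieces to meet at t_e, so that E(t_e) = -E_e, from the condition \varepsilon t_a = 1 - e^{-\varepsilon (t_c - t_e)}, and from requiring E(t) to integrate to zero over the cycle. These implicit conditions are one reason the parameters are awkward to set directly from a measured waveform.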

References

Fant, G., Liljencrants, J., and Lin, Q. (1985). A four-parameter model
    of glottal flow. STL-QPSR 4/1985, pp. 1-13.

Lin, Q. (1990). Speech Production Theory and Articulatory Speech
    Synthesis. TRITA-TOM-90-1, Appendix B.

Rosenberg, S. (1970). Glottal pulse shape and vowel quality. J. Acoust.
    Soc. Am., 49, 2 (part 2), 583-590.