Source Characteristics
Voicing Source
Glottal Volume Velocity
The source for voiced speech is generated by the quasi-periodic
vibration of the vocal folds. An air pressure difference is created
across the closed vocal folds by contraction of the chest and/or
diaphragm muscles. When the pressure difference becomes sufficiently
large, the vocal folds are forced apart and air begins to flow through
the glottis; this is the abduction phase of the glottal cycle. When
the pressure difference between the subglottal and supraglottal
passages is sufficiently reduced, air flow begins to reduce and the
glottis begins to close. This is the adduction phase of the glottal
cycle. Adduction occurs more rapidly than abduction, apparently
because the tension on the vocal folds is reinforced by Bernoulli
effects. The glottis quickly closes, resulting in the closed phase of
the glottal cycle. These phases are shown in the figure below as
"abduct", "adduct", and "closed" respectively.
In this figure, a modeled glottal volume velocity function (ordinate)
is plotted against time (abscissa); both axes are in arbitrary units.
The model that generated this function was described by Rosenberg
(1970) and is implemented in the program synsrc included with this
tutorial. Rosenberg tested several model source functions to identify
those that produced the most natural-sounding synthetic vowels. The
model used in synsrc is the preferred one, which uses a cubic function
to describe the abduction phase and a quadratic function to describe
the adduction phase. An example of this waveform is also included as
file rint.wav.
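The cubic-plus-quadratic shape described above can be sketched in a few lines of Python. This is only an illustration and not the synsrc source: the polynomial coefficients follow the form commonly attributed to Rosenberg's polynomial pulse, and the function names and the default opening and closing fractions (40% and 16% of the period) are assumptions chosen for the example.

```python
def rosenberg_flow(t, t_open, t_close, amplitude=1.0):
    """Glottal volume velocity at time t within one pitch period.

    t_open  -- duration of the abduction (opening) phase
    t_close -- duration of the adduction (closing) phase
    The form of the two polynomials is an assumption, not taken from synsrc.
    """
    if 0.0 <= t <= t_open:
        # Abduction: cubic rise from 0 to the peak amplitude.
        x = t / t_open
        return amplitude * (3.0 * x**2 - 2.0 * x**3)
    if t_open < t <= t_open + t_close:
        # Adduction: quadratic fall from the peak back to 0.
        x = (t - t_open) / t_close
        return amplitude * (1.0 - x**2)
    # Closed phase: no flow.
    return 0.0


def rosenberg_period(n_samples, open_frac=0.4, close_frac=0.16):
    """One pitch period sampled at n_samples points (fractions are illustrative)."""
    t_open = open_frac * n_samples
    t_close = close_frac * n_samples
    return [rosenberg_flow(n, t_open, t_close) for n in range(n_samples)]


pulse = rosenberg_period(100)
```

With these illustrative settings the closing phase is less than half as long as the opening phase, which mirrors the observation that adduction is faster than abduction.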
Pressure
While it is easiest to envision the physical process of vocal fold
vibration in terms of the volume velocity function, the actual
excitation of the vocal tract is generated by the pressure changes
associated with the cyclic variation in volume velocity. The following
figure shows both the flow (same function as above) and the pressure
waveform estimated from the flow. The X and Y axis units are again
arbitrary. Pressure increases during the abduction phase, drops
sharply during the adduction phase, and returns to zero during the
closed phase of each glottal cycle (for this model function). This
function was also generated by the program synsrc.
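The text does not say how the pressure waveform is estimated from the flow. A common assumption is that it is proportional to the time derivative of the volume velocity, and the sketch below uses a first difference to approximate it. The flow function, sample counts, and variable names are hypothetical and chosen only for illustration; they are not taken from synsrc.

```python
def flow(n, n_open=40, n_close=16):
    """Illustrative cubic/quadratic glottal flow at sample n (assumed form)."""
    if n <= n_open:
        x = n / n_open
        return 3 * x**2 - 2 * x**3          # abduction: cubic rise
    if n <= n_open + n_close:
        x = (n - n_open) / n_close
        return 1 - x**2                     # adduction: quadratic fall
    return 0.0                              # closed phase


period = 100
u = [flow(n) for n in range(period)]

# Assumption: pressure is proportional to the first difference of the flow.
pressure = [0.0] + [u[n] - u[n - 1] for n in range(1, period)]
```

Under this assumption the estimate is positive while the flow is rising, swings sharply negative as the glottis closes (the largest excursion falls at the instant of closure), and is zero throughout the closed phase.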
An Alternative Model
A more recent model which directly estimates the pressure
waveform has been described by Fant, Liljencrants, and Lin (1985). An
example of the output of this model is shown below, plotted in
arbitrary units of pressure against time, and can be found in file
lsrc.wav (again, generated by the program synsrc). In this model,
parameters control
the total period duration, the point of maximum rate of closure, the
rate of return to zero flow from the point of maximum closure rate,
and the amplitude and angular frequency of the sinusoidal part of the
pitch cycle. Since some of these parameters are not easily estimated
from an inverse filtered glottal waveform, Lin (1990) provides a
FORTRAN program which accepts information about the timing of waveform
features (more like Rosenberg's model) to generate the model source
function. It is this FORTRAN function that synsrc uses.
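For reference, the model is commonly written as a two-segment function of time. The notation below is the one usually seen in the literature and is an assumption here, since this tutorial does not give the equations. The symbol E(t) is the quantity this tutorial calls pressure; it is usually described as the derivative of glottal flow.

```latex
E(t) =
\begin{cases}
  E_0 \, e^{\alpha t} \sin(\omega_g t), & 0 \le t \le t_e \\[6pt]
  -\dfrac{E_e}{\varepsilon T_a}
    \left[ e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)} \right],
    & t_e < t \le t_c
\end{cases}
```

Here t_c marks the end of the period and t_e is the point of maximum rate of closure, where E(t_e) = -E_e. T_a sets the rate of return to zero after t_e. E_0 and omega_g are the amplitude and angular frequency of the sinusoidal part, and alpha is its exponential growth factor. The constant epsilon is not free: it follows from the continuity condition epsilon * T_a = 1 - exp(-epsilon * (t_c - t_e)). The remaining parameters are usually constrained so that E(t) integrates to zero over one period. These implicit conditions are one reason the parameters are hard to read directly off an inverse-filtered waveform.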
References
Fant, G., Liljencrants, J., and Lin, Q. (1985). A four-parameter model
of glottal flow. STL-QPSR 4/1985, pp. 1-13.
Lin, Q. (1990). Speech Production Theory and Articulatory Speech
Synthesis. TRITA-TOM-90-1, Appendix B.
Rosenberg, S. (1970). Glottal pulse shape and vowel quality. J. Acoust.
Soc. Am., 49, 2 (part 2), 583-590.