Speech Production


The vocal tract may be viewed as a set of resonating cavities bounded by anatomical structures which are either fixed or movable.


Sound is generated in several ways and at several locations in the human vocal tract. The most common sound sources are the quasi-periodic vibration of the vocal cords and turbulent noise generated by the passage of air through a narrow constriction, usually in the oral cavity. More rarely, sounds are generated by plosive release of air (following the buildup of pressure behind an obstruction in the vocal tract), implosion (following the creation of a vacuum behind an obstruction in the vocal tract), and clicks created by the action of the tongue pulling away from the roof of the mouth.

Wherever generated, the sounds underlying speech are relatively unstructured. For instance, the buzz-like sound of vocal cord vibration has relatively simple spectral properties, with harmonics at frequencies corresponding to integer multiples of the fundamental frequency. Similarly, friction noise generated by turbulence has a relatively broad frequency distribution with a somewhat high-pass characteristic.
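
The harmonic structure of the voiced source can be checked numerically. The sketch below is only an illustration, not a model of real glottal flow: it assumes an idealized source made of one unit pulse per pitch period, and the sample rate (8000 Hz), fundamental (100 Hz), and function names are all chosen for this example. It computes a direct discrete Fourier transform and lists the frequencies at which energy appears.

```python
import cmath

SAMPLE_RATE = 8000  # Hz (assumed for illustration)
F0 = 100            # Hz, fundamental frequency (assumed)
N_SAMPLES = 800     # ten full periods, so each DFT bin is 10 Hz wide


def impulse_train(n_samples, period):
    """Idealized voiced source: one unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def dft_magnitudes(signal):
    """Magnitude of the DFT for bins 0 .. N/2 (direct evaluation)."""
    n = len(signal)
    # Only nonzero samples contribute, which keeps the direct DFT cheap.
    nonzero = [(i, x) for i, x in enumerate(signal) if x != 0.0]
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in nonzero))
        for k in range(n // 2 + 1)
    ]


period = SAMPLE_RATE // F0                 # 80 samples per pitch period
source = impulse_train(N_SAMPLES, period)
magnitudes = dft_magnitudes(source)
bin_hz = SAMPLE_RATE / N_SAMPLES           # frequency spacing of DFT bins

# Keep every non-DC bin that carries energy; all other bins cancel to ~0.
harmonic_freqs = [k * bin_hz for k, m in enumerate(magnitudes) if k > 0 and m > 0.5]
```

Under these assumptions, `harmonic_freqs` contains only multiples of 100 Hz up to the 4000 Hz Nyquist limit. A real glottal waveform would show the same harmonic spacing, but with harmonic amplitudes that fall off as frequency rises rather than staying equal.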

Despite the simple characteristics of the sound sources used in speech, the speech signal itself is complexly structured in both frequency and time. This structure derives from the response characteristics of the vocal tract, whose resonances (poles) and anti-resonances (zeros) are located at frequencies determined by a variety of factors, but primarily by the length and cross-sectional area of the vocal tract above the location of the sound source. The signal is further structured in time by the motions of the articulators, which constantly effect changes in the vocal tract response characteristics.
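
To make the dependence of resonance frequency on tract length concrete, a common textbook simplification treats the vocal tract as a uniform tube closed at the source end and open at the lips. This is an assumption added here for illustration; it ignores the variation in cross-sectional area mentioned above and produces no anti-resonances. Such a tube resonates at odd multiples of c / (4L), where c is the speed of sound and L is the tube length. The function name, the speed of sound (350 m/s), and the two lengths below are illustrative values, not measurements.

```python
SPEED_OF_SOUND = 350.0  # m/s, approximate value for warm, moist air (assumed)


def uniform_tube_resonances(length_m, count=3, c=SPEED_OF_SOUND):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L), n = 1, 2, ...
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


# Illustrative tract lengths in metres (assumed, not measured).
longer_tract = uniform_tube_resonances(0.175)
shorter_tract = uniform_tube_resonances(0.14)

for label, freqs in (("17.5 cm", longer_tract), ("14.0 cm", shorter_tract)):
    print(label, [round(f) for f in freqs])
```

For the 17.5 cm tube the first three resonances come out at roughly 500, 1500, and 2500 Hz, and shortening the tube raises every resonance in proportion. Movement of the articulators changes the tube's shape away from uniform, which is what shifts these resonances over time in real speech.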
