PROSTRAN -- Local Version 1.0

Name

Prostran - Prosody transplantation program.

Synopsis

prostran [options] <source> <template> <output>

Options

-f0:<percent>: Percentage of the difference between source and template f0 to impose on the output. -f0:0 would cause the output to have the f0 contour of the source. -f0:100 would cause the output to have the f0 contour of the template. -f0:50 would interpolate an f0 contour midway between that of the source and template. Default is 100%
-rms:<percent>: Interpolation parameter analogous to -f0 for waveform amplitude. Default is 100%
-time:<percent>: Interpolation parameter analogous to -f0 for speech timing. Default is 100%
-jitter:<percent>: Introduces complementary period-to-period variations in duration, expressed as a percentage of the pitch period duration. Default 0%
-smooth:<npass>: Number of passes with a binomial smoothing function to apply to the computed distance matrix. Default 0.
-win:<msec>: Analysis window length in msec for computing the distance matrix. Default 25.
-ord:<model_order>: Number of coefficients in the cepstrum analysis used to compute the distance matrix. Default 8.
-pun:<cost>: Cost associated with aligning epochs outside of corresponding segments in each waveform. Default 0.0
-dwt:<cost>: Cost of taking a diagonal step in traversing the distance matrix. Default is based on number of epochs in each utterance. When the utterances are of equal numbers of epochs, the default value is 2.0
-vwt:<cost>: Cost of aligning a voiced epoch with a voiceless epoch. Default 2.0
-debug: Prints information about the waveform files and the mapping function.
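
As an illustration of what the -smooth option describes, the following is a minimal sketch of repeated binomial smoothing of a distance matrix. It assumes a 3-point (1, 2, 1)/4 kernel applied along rows and then columns, with edge values replicated; the kernel width, the edge handling, and the function names are assumptions made for this example and are not taken from the prostran source.

```python
def binomial_pass_1d(values):
    """One pass of a (1, 2, 1)/4 kernel over a list, replicating edge values."""
    n = len(values)
    out = []
    for i in range(n):
        left = values[max(i - 1, 0)]
        right = values[min(i + 1, n - 1)]
        out.append((left + 2.0 * values[i] + right) / 4.0)
    return out


def smooth_matrix(dist, npass):
    """Apply npass binomial passes along rows, then along columns.

    dist is a list of equal-length rows; npass plays the role of the
    <npass> argument to -smooth (0 leaves the matrix unchanged).
    """
    rows = [list(r) for r in dist]
    for _ in range(npass):
        rows = [binomial_pass_1d(r) for r in rows]
        cols = [binomial_pass_1d(list(c)) for c in zip(*rows)]
        rows = [list(r) for r in zip(*cols)]
    return rows
```

Each additional pass widens the effective smoothing kernel, so larger <npass> values blur isolated low-distance cells more strongly.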

Description

Prostran copies speech, one pitch period at a time, from an input (source) file to an output file. In the process, prostran can alter the duration of each pitch period to change f0, delete or duplicate pitch periods to change segmental durations (or to preserve segmental durations when f0 is being changed), and modify the amplitude of pitch periods to change the amplitude envelope of the source speech. All of the possible modifications are based upon the values of corresponding parameters in a template waveform file.

The program uses dynamic time warping to compute the period by period alignment between the source and template waveforms. Once the alignment is computed, prostran estimates target values for each of the prosodic parameters (f0, rms amplitude, and timing) by interpolating between the corresponding source and template pitch periods (or voiceless epochs).
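
The interpolation controlled by -f0, -rms, and -time can be sketched as follows. This is a minimal illustration that assumes plain linear interpolation on the raw parameter value; whether prostran interpolates f0 in Hz, in period duration, or on some other scale is not stated here, and the function name is hypothetical.

```python
def interpolate(source_value, template_value, percent):
    """Move from the source value toward the template value by percent.

    percent = 0 returns the source value, percent = 100 returns the
    template value, and percent = 50 returns the midpoint.
    """
    return source_value + (percent / 100.0) * (template_value - source_value)


# Example: an aligned pair of pitch periods with f0 of 100 Hz (source)
# and 140 Hz (template), processed as if with -f0:50.
target_f0 = interpolate(100.0, 140.0, 50)
```

Under this assumption, the example above yields a target f0 of 120 Hz for that pitch period.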

The dynamic time warping finds a mapping function which relates each epoch (a pitch period or a voiceless frame of speech) of the template file to one (not necessarily unique) epoch of the source file such that the total Bark cepstral distance between related epochs in the two utterances is minimal. Several weighting factors additionally influence the distance calculations. In particular, there are:

additional costs associated with aligning a voiced and voiceless epoch in the two utterances. The -vwt parameter specifies this cost.
additional costs associated with choosing a diagonal path from one node to the next in the distance matrix. Since a diagonal path is "shorter" in terms of the total number of nodes which must be traversed, it will be chosen unless non-diagonal paths are much less costly. The shortest possible path through the matrix (in terms of the number of steps) is equal to the number of epochs in the longer of the two utterances. The longest possible path (that does not involve any backward steps) is equal to the total number of epochs in both utterances. The diagonal weight that will equalize the two paths is thus (NX + NY) / MAX(NX, NY), where NX and NY are the numbers of epochs in the two utterances. This is the default -dwt value. It makes all possible paths equal in length, which is sometimes undesirable because it may encourage paths in which a single frame is duplicated many times. When this occurs, the speech takes on a very buzzy tonal quality. To reduce the problem, either reduce the size of -dwt (don't use values less than 1.0), or increase the amount of smoothing in the distance matrix, or both.
an optional cost which punishes alignment of epochs that lie outside the bounds of corresponding segments in the two utterances. The -pun parameter specifies this cost. For this to be used, the utterances must each contain all and only the same segments, and the segment definitions must collectively span each waveform (i.e., there can be no gaps or overlaps among the segments in a single waveform).
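
The weighting scheme described above can be sketched as a small dynamic time warping routine. This is an illustration under stated assumptions, not prostran's implementation: the voicing cost is assumed to be added to the local distance, the diagonal weight is assumed to multiply the local distance on diagonal steps, the segment penalty (-pun) is omitted, and all function and parameter names are hypothetical.

```python
def default_dwt(nx, ny):
    """Default diagonal weight from the text: (NX + NY) / MAX(NX, NY)."""
    return (nx + ny) / max(nx, ny)


def align(dist, voiced_x, voiced_y, dwt=None, vwt=2.0):
    """Return (path, total_cost) for a minimum-cost path through dist.

    dist[i][j] is the distance between epoch i of one utterance and
    epoch j of the other; voiced_x and voiced_y are lists of booleans.
    """
    nx, ny = len(dist), len(dist[0])
    if dwt is None:
        dwt = default_dwt(nx, ny)

    def local(i, j):
        # Assumed additive penalty for a voiced/voiceless mismatch.
        cost = dist[i][j]
        if voiced_x[i] != voiced_y[j]:
            cost += vwt
        return cost

    inf = float("inf")
    total = [[inf] * ny for _ in range(nx)]
    back = [[None] * ny for _ in range(nx)]
    total[0][0] = local(0, 0)
    for i in range(nx):
        for j in range(ny):
            if i == 0 and j == 0:
                continue
            c = local(i, j)
            candidates = []
            if i > 0 and j > 0:
                # Diagonal step, weighted by dwt (assumed multiplicative).
                candidates.append((total[i - 1][j - 1] + dwt * c, (i - 1, j - 1)))
            if i > 0:
                candidates.append((total[i - 1][j] + c, (i - 1, j)))
            if j > 0:
                candidates.append((total[i][j - 1] + c, (i, j - 1)))
            # min() keeps the first candidate on ties, so diagonal wins ties.
            total[i][j], back[i][j] = min(candidates, key=lambda t: t[0])

    # Trace the path back from the final node to the origin.
    path = [(nx - 1, ny - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        path.append(back[path[-1][0]][path[-1][1]])
    path.reverse()
    return path, total[nx - 1][ny - 1]
```

With equal numbers of epochs, default_dwt returns 2.0, which matches the default stated for -dwt. Passing a smaller dwt (but not below 1.0) makes diagonal steps cheaper and so discourages paths that repeat a single epoch many times.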

To use prostran, it is necessary to have both the .wav files and .pps (pitch period) files present in the working directory. The .pps files may be created with wedw by pitch tracking the files. Prostran creates a .pps file for the output waveform file. See the wedw documentation for more information about .pps files.

Authors

H.T.Bunnell, S.R.Hoskins