PROSTRAN -- Local Version 1.0

Name

Prostran - Prosody transplantation program.

Synopsis

prostran [options] <source> <template> <output>

Options

-f0:<percent>
Percentage of the difference between source and template f0 to impose on the output. -f0:0 would cause the output to have the f0 contour of the source. -f0:100 would cause the output to have the f0 contour of the template. -f0:50 would interpolate an f0 contour midway between that of the source and template. Default is 100%
-rms:<percent>
Interpolation parameter analogous to -f0 for waveform amplitude. Default is 100%
-time:<percent>
Interpolation parameter analogous to -f0 for speech timing. Default is 100%
-jitter:<percent>
Introduces complementary period-to-period variations in duration, expressed as a percentage of the pitch period duration. Default 0%
-smooth:<npass>
Number of passes with a binomial smoothing function to apply to the computed distance matrix. Default 0.
-win:<msec>
Analysis window length in msec for computing the distance matrix. Default 25.
-ord:<model_order>
Number of coefficients in the cepstrum analysis used to compute the distance matrix. Default 8.
-pun:<cost>
Cost associated with aligning epochs outside of corresponding segments in each waveform. Default 0.0
-dwt:<cost>
Cost of taking a diagonal step in traversing the distance matrix. Default is based on number of epochs in each utterance. When the utterances are of equal numbers of epochs, the default value is 2.0
-vwt:<cost>
Cost of aligning a voiced epoch with a voiceless epoch. Default 2.0
-debug
Prints information about the waveform files and the mapping function.
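
The -smooth option above applies repeated passes of a binomial smoothing function to the distance matrix. The following Python sketch illustrates one way such a pass could work. It is not taken from the prostran source: the [1, 2, 1] / 4 kernel, the separable row-then-column application, and the edge replication are all assumptions, and the names binomial_pass and smooth are hypothetical.

```python
def binomial_pass(matrix):
    """One smoothing pass with an assumed [1, 2, 1] / 4 binomial kernel.

    The kernel is applied along rows, then along columns. Edge cells reuse
    their own value in place of the missing neighbour (an assumption).
    """
    def smooth_line(line):
        n = len(line)
        return [
            (line[max(k - 1, 0)] + 2.0 * line[k] + line[min(k + 1, n - 1)]) / 4.0
            for k in range(n)
        ]

    rows = [smooth_line(list(r)) for r in matrix]
    cols = [smooth_line(list(c)) for c in zip(*rows)]
    # Transpose back so the result has the same orientation as the input.
    return [list(r) for r in zip(*cols)]


def smooth(matrix, npass):
    """Apply npass binomial passes; npass = 0 leaves the matrix unchanged."""
    for _ in range(npass):
        matrix = binomial_pass(matrix)
    return matrix


# A single isolated peak is spread over its neighbours after one pass.
spike = [[0.0, 0.0, 0.0],
         [0.0, 4.0, 0.0],
         [0.0, 0.0, 0.0]]
smoothed = smooth(spike, 1)
```

Each additional pass widens the effective kernel, so larger values of npass blur local detail in the distance matrix more strongly.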

Description

Prostran copies speech, one pitch period at a time, from an input (source) file to an output file. In the process, prostran can alter the duration of each pitch period to change f0, delete or duplicate pitch periods to change segmental durations (or to preserve segmental durations when f0 is being changed), and modify the amplitude of pitch periods to change the amplitude envelope of the source speech. All of the possible modifications are based upon the values of corresponding parameters in a template waveform file.

The program uses dynamic time warping to compute the period by period alignment between the source and template waveforms. Once the alignment is computed, prostran estimates target values for each of the prosodic parameters (f0, rms amplitude, and timing) by interpolating between the corresponding source and template pitch periods (or voiceless epochs).
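
The percentage options (-f0, -rms, -time) describe how far each target value moves from the source value toward the template value. A minimal Python sketch of that interpolation follows. It assumes a simple linear blend; whether prostran interpolates f0 in Hz, in period duration, or on some other scale is not stated here, and the function name and sample values are hypothetical.

```python
def interpolate(source_value, template_value, percent):
    """Move `percent` of the way from the source value to the template value.

    Linear interpolation is an assumption made for illustration.
    """
    return source_value + (percent / 100.0) * (template_value - source_value)


# Hypothetical f0 values (Hz) for one aligned pair of pitch periods.
source_f0 = 100.0
template_f0 = 140.0

f0_at_0 = interpolate(source_f0, template_f0, 0)      # keeps the source value
f0_at_50 = interpolate(source_f0, template_f0, 50)    # midway between the two
f0_at_100 = interpolate(source_f0, template_f0, 100)  # takes the template value
```

The same function would apply unchanged to rms amplitude or to epoch duration, with the percentage taken from -rms or -time.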

The dynamic time warping finds a mapping function which relates each epoch (a pitch period or a voiceless frame of speech) of the template file to one (not necessarily unique) epoch of the source file such that the total bark cepstral distance between related epochs in the two utterances is minimal. Several weighting factors additionally influence the distance calculations; see the -pun, -dwt, and -vwt options above.
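
As an illustration of this kind of alignment, here is a minimal dynamic time warping sketch in Python. It is not prostran's implementation. The epochs are stand-in scalar values compared by absolute difference rather than bark cepstral distance, the diagonal weight is assumed to multiply the local distance (as in the classic symmetric DTW step pattern), and the segment and voicing penalties are not modelled. The names dtw_align and diag_cost are hypothetical.

```python
def dtw_align(source, template, dist, diag_cost=2.0):
    """Return (total_cost, path) for a minimal-cost monotonic alignment.

    path is a list of (source_index, template_index) pairs. Treating
    diag_cost as a multiplier on the local distance is an assumption.
    """
    n, m = len(source), len(template)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]   # accumulated cost per cell
    back = [[None] * m for _ in range(n)]  # predecessor of each cell
    cost[0][0] = dist(source[0], template[0])

    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = dist(source[i], template[j])
            candidates = []
            if i > 0 and j > 0:  # diagonal step: advance in both files
                candidates.append((cost[i - 1][j - 1] + diag_cost * d, (i - 1, j - 1)))
            if i > 0:            # advance in the source only
                candidates.append((cost[i - 1][j] + d, (i - 1, j)))
            if j > 0:            # advance in the template only
                candidates.append((cost[i][j - 1] + d, (i, j - 1)))
            cost[i][j], back[i][j] = min(candidates)

    # Trace the best path back from the final cell to the first.
    path = []
    cell = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return cost[n - 1][m - 1], path


# Stand-in epoch features; the template repeats its first epoch.
source = [1.0, 2.0, 3.0]
template = [1.0, 1.0, 2.0, 3.0]
total, path = dtw_align(source, template, lambda a, b: abs(a - b))
```

In this toy case the path pairs source epoch 0 with template epochs 0 and 1, which shows how one source epoch can be related to more than one template epoch.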

To use prostran, it is necessary to have both the .wav files and .pps (pitch period) files present in the working directory. The .pps files may be created with wedw by pitch tracking the files. Prostran creates a .pps file for the output waveform file. See the wedw documentation for more information about .pps files.

Authors

H.T.Bunnell, S.R.Hoskins