PROSTRAN -- Local Version 1.0
Name
Prostran - Prosody transplantation program.
Synopsis
prostran [options] <source> <template> <output>
Options
- -f0:<percent>
- Percentage of the difference between source and
template f0 to impose on the output. -f0:0 would
cause the output to have the f0 contour of the
source. -f0:100 would cause the output to have the
f0 contour of the template. -f0:50 would interpolate
an f0 contour midway between that of the source and
template. Default is 100%
- -rms:<percent>
- Interpolation parameter analogous to -f0 for
waveform amplitude. Default is 100%
- -time:<percent>
- Interpolation parameter analogous to -f0 for speech
timing. Default is 100%
- -jitter:<percent>
- Introduces complementary period-to-period variations
in duration, expressed as a percentage of the pitch
period duration. Default is 0%.
- -smooth:<npass>
- Number of passes with a binomial smoothing function
to apply to the computed distance matrix. Default 0.
- -win:<msec>
- Analysis window length in msec for computing the
distance matrix. Default 25.
- -ord:<model_order>
- Number of coefficients in the cepstrum analysis used
to compute the distance matrix. Default 8.
- -pun:<cost>
- Cost associated with aligning epochs outside
of corresponding segments in each waveform. Default 0.0
- -dwt:<cost>
- Cost of taking a diagonal step in traversing the
distance matrix. The default is based on the number of
epochs in each utterance. When the two utterances have
equal numbers of epochs, the default value is 2.0.
- -vwt:<cost>
- Cost of aligning a voiced epoch with a
voiceless epoch. Default 2.0
- -debug
- Prints information about the waveform files and the
mapping function.
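As an illustration of what -smooth:<npass> might do, the sketch below applies repeated passes of a 3-point binomial filter to a small distance matrix. The (1, 2, 1)/4 kernel, the row-then-column order, the repeated end points, and the names binomial_pass and smooth_matrix are all assumptions made for this example; this document does not specify how prostran implements its smoothing.

```python
def binomial_pass(values):
    """One pass of a 3-point binomial (1, 2, 1)/4 filter.

    End points are handled by repeating the edge value (an assumption).
    """
    n = len(values)
    out = []
    for k in range(n):
        left = values[max(k - 1, 0)]
        right = values[min(k + 1, n - 1)]
        out.append((left + 2 * values[k] + right) / 4.0)
    return out


def smooth_matrix(dist, npass):
    """Smooth a distance matrix (list of rows) npass times, rows then columns."""
    m = [list(row) for row in dist]
    for _ in range(npass):
        m = [binomial_pass(row) for row in m]                # along each row
        cols = [binomial_pass(list(col)) for col in zip(*m)]  # along each column
        m = [list(row) for row in zip(*cols)]                 # back to row order
    return m


if __name__ == "__main__":
    # Hypothetical 3 x 3 distance matrix with one isolated spike.
    spike = [[0.0, 0.0, 0.0],
             [0.0, 4.0, 0.0],
             [0.0, 0.0, 0.0]]
    for row in smooth_matrix(spike, 1):
        print(row)
```

With npass set to 0 the matrix is returned unchanged, matching the documented default of no smoothing.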
Description
Prostran copies speech, one pitch period at a time, from an
input (source) file to an output file. In the process, prostran can
alter the duration of each pitch period to change f0, delete or
duplicate pitch periods to change segmental durations (or to preserve
segmental durations when f0 is being changed), and modify the
amplitude of pitch periods to change the amplitude envelope of the
source speech. All of the possible modifications are based upon the
values of corresponding parameters in a template waveform
file.
The program uses dynamic time warping to compute the period by period
alignment between the source and template waveforms. Once the
alignment is computed, prostran estimates target values for each of
the prosodic parameters (f0, rms amplitude, and timing) by
interpolating between the corresponding source and template pitch
periods (or voiceless epochs).
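The percentage options (-f0, -rms, -time) can be read as a linear interpolation between a source value and the corresponding template value. The following is a minimal sketch of that reading, not prostran's actual code: the function name interpolate and the sample f0 values are hypothetical, and whether prostran interpolates f0 in Hz or in pitch period duration is not stated in this document.

```python
def interpolate(source_value, template_value, percent):
    """Move `percent` percent of the way from the source value to the template value."""
    return source_value + (percent / 100.0) * (template_value - source_value)


if __name__ == "__main__":
    # Hypothetical f0 values (Hz) for one aligned pair of pitch periods.
    source_f0, template_f0 = 100.0, 140.0
    for pct in (0, 50, 100):
        print(pct, interpolate(source_f0, template_f0, pct))
```

Under this reading, 0 reproduces the source value, 100 reproduces the template value, and 50 gives the midpoint, as described for -f0 above.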
The dynamic time warping finds a mapping function which relates each
epoch (a pitch period or a voiceless frame of speech) of the template
file to one (not necessarily unique) epoch of the source file such
that the total bark cepstral distance between related epochs in the
two utterances is minimal. Several weighting factors additionally
influence the distance calculations. In particular, there are:
- an additional cost for aligning a voiced epoch in one utterance with
a voiceless epoch in the other. The -vwt parameter specifies this cost.
- additional costs associated with choosing a diagonal path from one
node to the next in the distance matrix. Since a diagonal path is
"shorter" in terms of the total number of nodes which must be
traversed, it will be chosen unless non-diagonal paths are much less
costly. The shortest possible path through the matrix (in terms of the
number of steps) is equal to the number of epochs in the longer of the
two utterances. The longest possible path (that does not involve any
backward steps) is equal to the total number of epochs in both
utterances. The diagonal weight that will equalize the two paths is
thus (NX + NY) / MAX(NX, NY), where NX and NY are the numbers of
epochs in the two utterances. This is the default -dwt value. Because
it makes all possible paths equal in weighted length, it can encourage
paths in which a single frame is duplicated many times, which gives
the speech a very buzzy tonal quality. To reduce the problem, either
lower -dwt (but do not use values less than 1.0), or increase the
amount of smoothing applied to the distance matrix (-smooth), or both.
- an optional cost which punishes alignment of epochs that lie
outside the bounds of corresponding segments in the two
utterances. For this to be used, the utterances must each contain all
and only the same segments, and the segment definitions must
collectively span each waveform (i.e., there can be no gaps or
overlaps among the segments in a single waveform).
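The weighting scheme described above can be sketched as a small dynamic time warping routine. This is an illustration under stated assumptions, not prostran's implementation: the names default_dwt and align are hypothetical, the diagonal weight is assumed to multiply the local distance, the voicing cost is assumed to be added to it, the segment penalty (-pun) is omitted, and the distance matrix is supplied directly rather than computed from bark cepstra.

```python
def default_dwt(nx, ny):
    """Default diagonal weight from the text: (NX + NY) / MAX(NX, NY)."""
    return (nx + ny) / max(nx, ny)


def align(dist, voiced_x, voiced_y, dwt=None, vwt=2.0):
    """Find a minimum-cost monotonic path through dist (a list of rows).

    dist[i][j] is the distance between epoch i of one utterance and
    epoch j of the other; voiced_x and voiced_y are per-epoch voicing flags.
    Returns (total_cost, path) where path is a list of (i, j) pairs.
    """
    nx, ny = len(dist), len(dist[0])
    if dwt is None:
        dwt = default_dwt(nx, ny)

    def local(i, j):
        # Assumption: the voiced/voiceless mismatch cost is additive.
        penalty = vwt if voiced_x[i] != voiced_y[j] else 0.0
        return dist[i][j] + penalty

    inf = float("inf")
    cost = [[inf] * ny for _ in range(nx)]
    back = [[None] * ny for _ in range(nx)]
    cost[0][0] = local(0, 0)
    for i in range(nx):
        for j in range(ny):
            if i == 0 and j == 0:
                continue
            d = local(i, j)
            steps = []
            if i > 0 and j > 0:
                # Assumption: a diagonal step scales the local distance by dwt.
                steps.append((cost[i - 1][j - 1] + dwt * d, (i - 1, j - 1)))
            if i > 0:
                steps.append((cost[i - 1][j] + d, (i - 1, j)))
            if j > 0:
                steps.append((cost[i][j - 1] + d, (i, j - 1)))
            cost[i][j], back[i][j] = min(steps)

    # Trace the best path back from the final node to the first.
    path = [(nx - 1, ny - 1)]
    while path[-1] != (0, 0):
        i, j = path[-1]
        path.append(back[i][j])
    path.reverse()
    return cost[nx - 1][ny - 1], path


if __name__ == "__main__":
    # Hypothetical 3 x 3 distances: matching epochs are identical (distance 0).
    dist = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    voiced = [True, True, True]
    print(default_dwt(3, 3))
    print(align(dist, voiced, voiced))
```

Passing a smaller dwt makes diagonal steps cheaper relative to horizontal and vertical ones, which is the adjustment suggested above for reducing repeated frames.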
To use prostran, both the .wav files and the .pps (pitch period) files
for the source and template must be present in the working directory.
The .pps files may be created by pitch tracking the waveform files
with wedw. Prostran creates
a .pps file for the output waveform file. See the wedw documentation
for more information about .pps files.
Authors
H. T. Bunnell, S. R. Hoskins