PTRACK -- Local Version 0.5 (Beta)


Ptrack - Pitch tracking program.


ptrack [options] <file.wav> [<file2.wav> <file3.wav> ...]


Set the length of the pitch period (PP) detecting wavelet in msec. This wavelet is the function that ptrack convolves with the waveform to estimate where pitch periods begin. The default is 8.0. For very high-pitched voices, a smaller window may work better, but try the default first.
Set the percentage of increase in duration allowed from one pitch period to the next. The default is 130 (i.e., up to a 30% increase is acceptable).
Set the percentage of decrease in duration allowed from one pitch period to the next. The default is 70 (i.e., up to a 30% decrease in period length is acceptable).
Set the threshold for the voiced/voiceless decision, which is based on the output energy of the PP detector wavelet. The default is 1000.0 (a fairly arbitrary number related to the internal scaling of the waveform data). Increase the threshold if ptrack puts too many voiced markers in voiceless regions. Decrease it if ptrack misses too many real (but low-amplitude) pitch periods. Note that if ptrack misses many high-amplitude pitch periods, the problem is more likely to be the window length than the threshold.
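
The duration limits and the voicing threshold described above can be read as simple acceptance tests. The following Python sketch is illustrative only and is not taken from the ptrack source: the function names are hypothetical, and it assumes that both percentages are applied to the duration of the preceding pitch period and that a frame counts as voiced when the detector energy reaches the threshold.

```python
# Illustrative sketch only; not ptrack's source. All names are hypothetical.

MAX_INCREASE_PCT = 130.0    # default upper limit on period-to-period duration
MAX_DECREASE_PCT = 70.0     # default lower limit on period-to-period duration
VOICING_THRESHOLD = 1000.0  # default PP detector energy threshold


def period_is_acceptable(prev_ms, next_ms,
                         max_pct=MAX_INCREASE_PCT,
                         min_pct=MAX_DECREASE_PCT):
    """Return True if next_ms is within the allowed percentage range of prev_ms."""
    ratio_pct = 100.0 * next_ms / prev_ms
    return min_pct <= ratio_pct <= max_pct


def is_voiced(detector_energy, threshold=VOICING_THRESHOLD):
    """Assumed rule: voiced when the detector energy reaches the threshold."""
    return detector_energy >= threshold
```

Under these assumptions and the default limits, a pitch period that follows an 8.0 msec period is accepted only if it lasts between roughly 5.6 and 10.4 msec.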


Ptrack attempts to locate and mark all pitch periods in a waveform file. The location of each pitch period is stored in an output .pps file, which ptrack creates using the basename of the .wav file being analyzed. Regions of speech identified as unvoiced are also marked, in small, arbitrarily sized epochs whose durations roughly match those of the pitch periods in adjacent regions of voiced speech.
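
As a small illustration of the naming rule, the Python sketch below derives the expected .pps file name from a .wav file name. The helper name is hypothetical, and the sketch returns only the file name because this document does not say which directory ptrack writes the output to.

```python
from pathlib import Path


def pps_name_for(wav_path):
    """Hypothetical helper: basename of the .wav file with a .pps extension."""
    return Path(wav_path).stem + ".pps"
```

For example, `pps_name_for("vowel.wav")` gives the name to look for after running ptrack on vowel.wav.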

A two-pass algorithm is used for the pitch tracking. In the first pass, F0 is estimated every 20 msec from a 40 msec window using a comb filtering algorithm to detect harmonic spacing. The F0 data are smoothed with a 5-point median filter and passed to the second pass, which attempts to locate the onset of each pitch period in the time-domain signal, constrained by the F0 information from the frequency-domain analysis. The time-domain algorithm convolves a wavelet kernel function with the waveform; the convolution typically produces its strongest output at the positive-going onset of a pitch period. As pitch periods are detected, the shape of the wavelet is modified to optimize detection of local pitch periods. The success of this process depends crucially on the polarity of the speech waveform. By convention, we store speech waveforms with polarity such that the initial (usually steepest slope and/or highest amplitude) waveform excursion is positive-going. Waveforms not in this polarity should be inverted before attempting to use ptrack.
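
The first-pass framing, the median smoothing, and the polarity inversion mentioned above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ptrack's implementation: all names are hypothetical, the comb filtering step itself is omitted, only windows that fit entirely within the signal are kept, and the median filter simply shortens its window at the ends of the F0 track.

```python
from statistics import median

FRAME_STEP_MS = 20.0    # one F0 estimate every 20 msec
FRAME_LENGTH_MS = 40.0  # each estimate uses a 40 msec window


def frame_starts_ms(duration_ms, step_ms=FRAME_STEP_MS, length_ms=FRAME_LENGTH_MS):
    """Start times of the analysis windows that fit entirely within the signal."""
    starts = []
    t = 0.0
    while t + length_ms <= duration_ms:
        starts.append(t)
        t += step_ms
    return starts


def median_smooth(values, width=5):
    """Median filter; the window is truncated at the edges (assumed edge policy)."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


def invert_polarity(samples):
    """Flip waveform polarity by negating every sample.

    For 16-bit PCM data, a sample of -32768 would need clipping to 32767.
    """
    return [-s for s in samples]
```

A median filter of this kind removes isolated outliers, such as a single frame whose F0 estimate jumps to about twice that of its neighbors, without blurring the rest of the track.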


The Windows/DOS version requires the cwavwdll.dll, cudll.dll, and cdspdll.dll dynamic link libraries to run. These libraries should be installed in the /Windows/System directory.

Unix versions require the libcwav, libcdsp, and libcu shared libraries somewhere on your LD_LIBRARY_PATH (or equivalent).


H.T.Bunnell, S.P.Eberhardt