Ptrack attempts to locate and mark all pitch periods in a waveform file. The location of each pitch period is stored in an output .pps file, which ptrack names after the basename of the .wav file being analyzed. Regions of speech identified as unvoiced are also marked, using small, arbitrarily sized epochs whose durations roughly match those of the pitch periods in adjacent regions of voiced speech.
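The exact rule ptrack uses to size unvoiced epochs is not described here. As a rough illustration only, the hypothetical helper below (fill_unvoiced is not part of ptrack) tiles an unvoiced gap with epochs whose length is about the mean of the pitch periods on either side of it. All positions and periods are in samples, which is an assumption of this sketch.

```python
def fill_unvoiced(start, end, period_before, period_after):
    """Return epoch onsets (sample indices) tiling the unvoiced gap [start, end).

    Hypothetical sketch: the target epoch length is the mean of the voiced
    pitch periods adjacent to the gap; epochs are then stretched slightly so
    that a whole number of them exactly fills the gap.
    """
    if end <= start:
        return []
    period = max(1, round((period_before + period_after) / 2))
    n = max(1, (end - start) // period)  # whole epochs of the target length that fit
    step = (end - start) / n  # actual epoch length after stretching to fill the gap
    return [start + round(i * step) for i in range(n)]
```

For an 800-sample gap between voiced regions with 100- and 120-sample periods, `fill_unvoiced(1000, 1800, 100, 120)` places seven epochs of about 114 samples each, starting at sample 1000.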
A two-pass algorithm is used for pitch tracking. In the first pass, F0 is estimated every 20 msec from a 40 msec window, using a comb-filtering algorithm to detect harmonic spacing. The F0 estimates are smoothed with a 5-point median filter and passed to the second pass, which attempts to locate the onset of each pitch period in the time-domain signal, constrained by the F0 information from the frequency-domain analysis. The time-domain algorithm convolves a wavelet kernel function with the waveform; the output is typically strongest at the positive-going onset of a pitch period. As pitch periods are detected, the shape of the wavelet is adapted to optimize detection of local pitch periods.

The success of this process depends crucially on the polarity of the speech waveform. By convention, we store speech waveforms with the polarity such that the initial waveform excursion (usually the one with the steepest slope and/or highest amplitude) is positive-going. Waveforms with the opposite polarity should be inverted before running ptrack.
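The first pass can be sketched roughly as follows. This is not ptrack's code: it is a minimal harmonic-sum ("spectral comb") estimator in Python with NumPy, assuming a mono float signal. The function names (comb_f0, median5, first_pass_f0) and the parameters (a 60 to 400 Hz search range in 1 Hz steps, the first five harmonics, a Hann window, an 8192-point FFT) are hypothetical choices made for illustration. Only the 20 msec hop, the 40 msec window, and the 5-point median filter come from the description above.

```python
import numpy as np


def comb_f0(frame, fs, f0_min=60.0, f0_max=400.0, n_harm=5, nfft=8192):
    """Return the candidate F0 (Hz) whose first n_harm harmonics carry the most
    spectral magnitude, i.e. a comb of teeth at k*F0 laid over the spectrum.
    Hypothetical parameters; no voicing decision is made."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), nfft))
    bin_hz = fs / nfft
    candidates = np.arange(f0_min, f0_max, 1.0)
    scores = np.zeros(len(candidates))
    for i, f0 in enumerate(candidates):
        # Nearest FFT bin to each harmonic k*f0, k = 1..n_harm.
        bins = np.rint(np.arange(1, n_harm + 1) * f0 / bin_hz).astype(int)
        bins = bins[bins < len(spec)]
        scores[i] = spec[bins].sum()
    return candidates[np.argmax(scores)]


def median5(track):
    """5-point running median; edges are padded by repeating the end values."""
    track = np.asarray(track, dtype=float)
    if track.size == 0:
        return track
    padded = np.pad(track, 2, mode="edge")
    return np.array([np.median(padded[i:i + 5]) for i in range(len(track))])


def first_pass_f0(x, fs, hop_s=0.020, win_s=0.040):
    """Estimate F0 every hop_s seconds from win_s-second windows, then smooth."""
    hop, win = int(round(hop_s * fs)), int(round(win_s * fs))
    raw = [comb_f0(x[s:s + win], fs) for s in range(0, len(x) - win + 1, hop)]
    return median5(raw)
```

Summing a fixed number of harmonics, rather than every harmonic below the Nyquist frequency, keeps a subharmonic candidate such as F0/2 from collecting the same peaks as the true F0. A real tracker would also need a voicing decision, which this sketch omits: it assigns an F0 to every frame.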
The Windows/DOS version requires the cwavwdll.dll, cudll.dll, and cdspdll.dll dynamic link libraries to run. These libraries should be installed in the /Windows/System directory.
Unix versions require the libcwav, libcdsp, and libcu shared libraries to be somewhere on your LD_LIBRARY_PATH (or equivalent).