ALIGN -- Local Version 1.2

NAME

align - waveform label alignment program

SYNOPSIS

align [-ana:<type> [-dist:<n>] [-cepstrum:<n>]] [-nfea:<n>] [-diag:<n>] [-npass:<n>] [-wipeout] [-noforce] <template_file> <target_file>

DESCRIPTION

align uses dynamic time warping (DTW) to place segment markers in target per the locations of segment markers in template. Both target and template must be waveform files in either ASEL or RIFF format and template must have at least one segment defined. It also helps if the two files are different renditions of the same utterance since align will attempt to match the two files and locate segment boundaries in target that are equivalent to those defined in template. On completion, all segments defined in template will also be defined in target.
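The marker transfer described above can be sketched as follows. This is only an illustration, assuming that DTW has already produced a warp path as a list of (template_frame, target_frame) pairs and that segment markers are held as frame indices. The function name map_markers and the data layout are hypothetical and are not taken from align itself.

```python
def map_markers(path, template_markers):
    """Map template marker frames to target frames along a DTW warp path.

    path: list of (template_frame, target_frame) pairs, non-decreasing in
          both coordinates (assumed form of the DTW result).
    template_markers: dict of segment name -> (start_frame, end_frame).
    """
    # A template frame may be paired with several target frames; this sketch
    # keeps the first pairing, which is an arbitrary choice for illustration.
    first_match = {}
    for template_frame, target_frame in path:
        first_match.setdefault(template_frame, target_frame)

    return {
        name: (first_match[start], first_match[end])
        for name, (start, end) in template_markers.items()
    }


# Hypothetical warp path in which the target is slightly longer.
path = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4), (4, 4)]
print(map_markers(path, {"seg1": (1, 3)}))  # {'seg1': (1, 4)}
```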

Options

-ana:<type>

Specifies the type of acoustic analysis to be performed. align currently recognizes the following analysis types:

lpc - standard autocorrelation-method LPC analysis (e.g., Markel & Gray, 1975);
plp - perceptual linear prediction (Hermansky, 1990);
spc - spectral principal components [see splpc (1)];
bwf - Bark-weighted filterbank.

The last three of these are all closely related in that they provide a spectral representation of the speech which is adjusted to account for aspects of human speech perception. The perception-based analyses also tend to produce better results than simple LPC analysis. The default is bwf.

-dist:<n>

Distance measure. This parameter is meaningful only if the analysis type is LPC or PLP (i.e., -ana:lpc or -ana:plp). It selects one of four likelihood-ratio-based distance measures. <n> may be a digit from 1 to 4, indicating respectively:

1 - ignore gain and average the likelihood ratios;
2 - set spectra to zero mean levels and average the likelihood ratios;
3 - incorporate model gain into the averaged distance measures;
4 - ignore gain and use a geometric mean of the likelihood ratios.

See lrdist (3) for a more complete description. If -cepstrum is in effect, -dist:4 is meaningless and the first three distance options determine how gain effects are incorporated into the cepstral distance.

-cepstrum:<n>

Modifies the LPC and PLP analysis types to produce <n> cepstral parameters rather than the default autoregressive parameters. Note that the <n> here becomes the number of features in the spectral model rather than the <n> of -nfea, while -nfea is still interpreted as the model order for the plp or lpc solution. For cepstral analysis, <n> should correspond to about 2 msec of signal (e.g., 20 samples at 10 kHz rate). Due to array dimension constraints, <n> should not exceed 32.
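The 2 msec rule of thumb and the limit of 32 can be worked through with a few lines of arithmetic. The helper name suggested_cepstrum_count is hypothetical; it only restates the guidance above and is not part of align.

```python
MAX_CEPSTRA = 32  # upper limit on <n> stated in the text


def suggested_cepstrum_count(sample_rate_hz):
    """Return the number of samples in about 2 msec, capped at the limit."""
    samples_in_2_msec = round(sample_rate_hz * 2 / 1000)
    return min(samples_in_2_msec, MAX_CEPSTRA)


print(suggested_cepstrum_count(10000))  # 20, matching the example in the text
print(suggested_cepstrum_count(16000))  # 32
print(suggested_cepstrum_count(22050))  # 32 (44 samples would exceed the limit)
```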

-nfea:<n>

Number of spectral features to derive from the acoustical analysis for use in computing the distance matrix. For the default bwf analysis, n is the number of filter bands. For the other analyses, n is the number of components to use. Default in either case is 8. The maximum number of features for bwf, spc, and -cepstrum analyses is 32. For standard lpc and plp the maximum is 30.

-diag:<cost>

Normally, there is an extra cost (a factor of 2.0) associated with diagonal steps in the DTW algorithm. The default factor of 2 makes a diagonal move as costly as the two moves needed to reach the same point without a diagonal move. When this cost factor is reduced (say to 1.2 or so), the algorithm shows a bias toward an alignment path that lies along the diagonal of the distance matrix. In the extreme case (-diag:0), this would lead to the program performing a very costly linear interpolation of time in the template file to time in the target file. Sometimes it is useful to coerce the DTW algorithm by reducing the diagonal cost, but it is not something to make a habit of.
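The effect of the diagonal cost factor can be seen in a small DTW sketch. The weighting used here is an assumption based on the description above: horizontal and vertical steps add the local distance once, and a diagonal step adds it multiplied by the factor. The function name dtw_path and the tiny distance matrix are hypothetical, and align's actual recursion may differ.

```python
def dtw_path(dist, diag=2.0):
    """Return the lowest-cost path through a local distance matrix.

    dist[i][j] is the distance between template frame i and target frame j.
    Horizontal and vertical steps add dist[i][j]; diagonal steps add
    diag * dist[i][j] (assumed weighting, see the lead-in).
    """
    rows, cols = len(dist), len(dist[0])
    cost = [[float("inf")] * cols for _ in range(rows)]
    back = [[None] * cols for _ in range(rows)]
    cost[0][0] = dist[0][0]

    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1] + diag * dist[i][j], (i - 1, j - 1)))
            if i > 0:
                candidates.append((cost[i - 1][j] + dist[i][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1] + dist[i][j], (i, j - 1)))
            cost[i][j], back[i][j] = min(candidates)

    # Trace back from the forced end point to the origin.
    i, j = rows - 1, cols - 1
    path = [(i, j)]
    while back[i][j] is not None:
        i, j = back[i][j]
        path.append((i, j))
    path.reverse()
    return path


dist = [[1, 1],
        [5, 2]]
print(dtw_path(dist, diag=2.0))  # [(0, 0), (0, 1), (1, 1)]
print(dtw_path(dist, diag=1.2))  # [(0, 0), (1, 1)]
```

With the factor at 2.0 the detour through (0, 1) is cheaper than the diagonal step; lowering the factor to 1.2 makes the diagonal step win, which is the bias toward the diagonal described above.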

-noforce

By default, align forces the alignment path to include all frames up to the last analysis frame of each file. If -noforce is in effect, align will search for alternatives that allow the end of the template file to correspond to a frame internal to the target file. This can be helpful if the target file has unnecessary silence at the end. However, the DTW procedure tends to work better when the corresponding beginnings and ends of the files are known and forced.
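The difference between a forced and an unforced end point can be sketched as a choice over the last template row of the accumulated cost matrix. How align actually scores candidate end points is not described here, so this sketch simply assumes the lowest accumulated cost wins. The function name choose_end_point and the cost values are hypothetical.

```python
def choose_end_point(cost, force=True):
    """Pick the (template_frame, target_frame) cell where the path ends.

    cost: accumulated DTW cost matrix, rows = template frames,
          columns = target frames.
    force: when True, the path must end at the last frame of both files.
    """
    last_row = len(cost) - 1
    if force:
        return last_row, len(cost[0]) - 1
    # Unforced: the template still ends on its last frame, but it may be
    # matched to any target frame (assumed rule: lowest accumulated cost).
    final_costs = cost[last_row]
    best_col = min(range(len(final_costs)), key=final_costs.__getitem__)
    return last_row, best_col


cost = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.5, 2.5, 3.5],
    [3.0, 2.5, 2.0, 6.0],  # last target frame matches poorly (e.g., silence)
]
print(choose_end_point(cost))               # (2, 3)
print(choose_end_point(cost, force=False))  # (2, 2)
```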

-wipeout

By default, align merges new segment definitions into the existing segment table for the target file, resolving name conflicts by appending digits to the end of segment names. If a conflicting name is already six characters long, the name is truncated to make room for the digit. Setting the -wipeout switch causes align to replace the existing segment table with only the new segment definitions.
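The name-conflict rule can be sketched as below. Several details are assumptions made for illustration: that names are limited to six characters, that a single digit is appended, and that digits are tried starting from 1. The function name resolve_name is hypothetical.

```python
MAX_NAME_LEN = 6  # assumed limit, implied by the truncation rule in the text


def resolve_name(name, existing):
    """Return `name`, or a digit-suffixed variant that is not in `existing`."""
    if name not in existing:
        return name
    for digit in range(1, 10):
        suffix = str(digit)
        # Truncate only when the suffixed name would exceed the limit.
        candidate = name[:MAX_NAME_LEN - len(suffix)] + suffix
        if candidate not in existing:
            return candidate
    raise ValueError(f"no free name available for {name!r}")


existing = {"vowel", "closur"}
print(resolve_name("burst", existing))   # burst
print(resolve_name("vowel", existing))   # vowel1
print(resolve_name("closur", existing))  # closu1
```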

<template_file>

Name of the waveform file with segment markers already defined that is to serve as a template for marking segments in the target file.

<target_file>

File to be marked.

NOTES

As compiled, align uses 20 msec analysis windows and 5 msec steps. Twenty msec is possibly suboptimal as a window length since it can allow moderate frame-to-frame amplitude variation due to low-frequency pitch periods. With the 5 msec step size there is uncertainty in the exact location of corresponding spectral features, and consequently the locations of assigned segment markers are only approximate. Also, as compiled, at most 200 analysis frames can be computed from each waveform. Hence, the step size will be larger than 5 msec for files longer than 1 second in duration.
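The relation between file duration and step size can be worked out as follows. This sketch assumes the step is stretched uniformly so that the whole file fits in 200 frames and ignores the window length; the exact rule align uses is not stated. The function name step_size_msec is hypothetical.

```python
MAX_FRAMES = 200       # frame limit stated in the text
BASE_STEP_MSEC = 5.0   # compiled-in step size stated in the text


def step_size_msec(duration_msec):
    """Return the assumed step size for a file of the given duration."""
    return max(BASE_STEP_MSEC, duration_msec / MAX_FRAMES)


print(step_size_msec(800))   # 5.0
print(step_size_msec(1000))  # 5.0
print(step_size_msec(2500))  # 12.5
```

Under this assumption a 2.5 second file is analyzed in 12.5 msec steps, so marker placement becomes correspondingly coarser.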

Before getting very fancy, try the default settings for all options, since the defaults are the ones that seemed to work best over the widest range of uses. If the program fails to assign reasonable marks, first try switching to -ana:plp and/or -ana:spc, then experiment with -nfea:. Change the -diag: value if marks tend to bunch up close to one end of the file. Try -ana:lpc as a last resort, since it rarely seems to work as well as the perceptually based analyses.
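The order of attempts suggested above can be written down as a list of command lines. The file names, the -nfea value of 12, and the -diag value of 1.2 are placeholders chosen for illustration, and the helper align_command is hypothetical; only the option names come from this page.

```python
def align_command(template, target, *options):
    """Build an align argument list with options before the file names."""
    return ["align", *options, template, target]


attempts = [
    [],                        # defaults first
    ["-ana:plp"],              # then the other perceptual analyses
    ["-ana:spc"],
    ["-ana:plp", "-nfea:12"],  # then vary the number of features
    ["-diag:1.2"],             # if marks bunch up near one end of the file
    ["-ana:lpc"],              # last resort
]

for options in attempts:
    print(" ".join(align_command("template.wav", "target.wav", *options)))
```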

This program is still under development. Expect it to change again fairly soon. (But it's getting closer.)

AUTHOR

H.T.Bunnell