ALIGN -- Local Version 1.2
NAME
align - waveform label alignment program
SYNOPSIS
align
[-ana:<type> [-dist:<n>] [-cepstrum:<n>]]
[-nfea:<n>] [-diag:<cost>] [-npass:<n>]
[-wipeout] [-noforce] <template_file>
<target_file>
DESCRIPTION
align
uses dynamic time warping (DTW) to place segment markers in target according
to the locations of segment markers in template. Both target
and template must be waveform files in either ASEL or RIFF
format, and template must have at least one segment defined. The
two files should also be different renditions of the same
utterance, since align attempts to match the two files and
locate segment boundaries in target that are equivalent to those
defined in template. On completion, all segments defined in
template will also be defined in target.
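The final step of the alignment, carrying template boundaries over to the target, can be pictured with a short sketch. It is not align's code: it assumes the DTW search has already produced a warping path as a list of (template_frame, target_frame) pairs, that frame k sits at k times the step size, and that segments are (start, end) times in msec. The names map_frame and transfer_segments are hypothetical.

```python
def map_frame(path, template_frame):
    """Return the first target frame that the warping path pairs with
    template_frame. The path runs from (0, 0) to the end and is monotonic."""
    for i, j in path:
        if i == template_frame:
            return j
    raise ValueError(f"template frame {template_frame} is not on the path")


def transfer_segments(path, segments, step_ms):
    """Map {name: (start_ms, end_ms)} from template time to target time."""
    last_template_frame = path[-1][0]
    result = {}
    for name, (start_ms, end_ms) in segments.items():
        # Quantize each boundary to the nearest template analysis frame.
        i_start = min(round(start_ms / step_ms), last_template_frame)
        i_end = min(round(end_ms / step_ms), last_template_frame)
        # Follow the path to the matching target frame, then back to msec.
        result[name] = (map_frame(path, i_start) * step_ms,
                        map_frame(path, i_end) * step_ms)
    return result
```

For example, with the path [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)] and a 5 msec step, a template segment spanning (5, 15) msec maps to (5.0, 20.0) msec in the target. Taking the first target frame when several pair with one template frame is an arbitrary choice in this sketch; averaging them would be equally defensible.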
Options
-ana:<type>
- Specifies the type of acoustic analysis to perform. align
currently recognizes the following analysis types:
- lpc - the standard autocorrelation method LPC analysis (e.g., Markel & Gray, 1975);
- plp - Perceptual linear prediction (Hermansky, 1990);
- spc - Spectral Principal Components [see splpc (1)];
- bwf - Bark weighted filterbank.
The last three of these are closely
related: each provides a spectral representation of speech
that is adjusted to account for aspects of human speech
perception. These perception-based analyses also tend to produce better
results than simple LPC analysis. The default is bwf.
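For reference, the standard autocorrelation-method LPC analysis named in the lpc entry can be sketched as: window a frame, compute its autocorrelations, and solve the normal equations by the Levinson-Durbin recursion. This is a generic textbook sketch, not align's analysis code; the Hamming window, the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, and the function names are assumptions.

```python
import math


def autocorrelation(frame, order):
    """Hamming-window the frame, then return r[0..order] with
    r[k] = sum over n of x[n] * x[n + k]."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    x = [s * w for s, w in zip(frame, window)]
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err): a[0] == 1, A(z) = sum of a[k] z^-k is the inverse
    filter that minimizes the prediction error, and err is that error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err
```

At a 10 kHz sampling rate a 20 msec frame is 200 samples, and levinson_durbin(autocorrelation(frame, 8), 8) would give an 8th-order model for it.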
-dist:<n>
- Distance measure. This parameter is
meaningful only if the analysis type is LPC or PLP (i.e.,
-ana:lpc or -ana:plp). It selects which of four
possible likelihood-ratio-based distance measures to use. Values of
<n> may be the digits 1 to 4, indicating, respectively:
- ignore gain and average the likelihood ratios,
- set spectra to zero mean levels and average likelihood ratios,
- incorporate model gain into the averaged distance measures,
- ignore gain using a geometric mean of the likelihood ratios (see
lrdist (3) for a more complete description).
If -cepstrum is in effect, -dist:4 is meaningless and the first
three distance options relate to how gain effects are incorporated
into the cepstral distance.
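The idea behind options 1 and 4 can be illustrated with the classic Itakura-style likelihood ratio: the residual energy of one frame's inverse filter applied to the other frame's autocorrelations, divided by that frame's own minimum residual energy. Averaging the two one-sided ratios (option 1) or taking their geometric mean (option 4) makes the measure symmetric. This is our reading of the option list, not lrdist(3) itself; lrdist may scale or log-transform the ratios differently, options 2 and 3 (gain handling) are not sketched, and all names are hypothetical.

```python
import math


def quadratic_form(a, r):
    """a^T R a, where R is the Toeplitz autocorrelation matrix built from r."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))


def likelihood_ratio(a_test, a_ref, r_ref):
    """Residual energy of the test inverse filter on the reference frame,
    divided by the reference model's own (minimum) residual energy.
    Both filters have a[0] == 1, so overall gain does not enter."""
    return quadratic_form(a_test, r_ref) / quadratic_form(a_ref, r_ref)


def symmetric_lr(a_x, r_x, a_y, r_y, geometric=False):
    """Combine the two one-sided ratios by arithmetic or geometric mean."""
    d_xy = likelihood_ratio(a_x, a_y, r_y)
    d_yx = likelihood_ratio(a_y, a_x, r_x)
    if geometric:
        return math.sqrt(d_xy * d_yx)
    return 0.5 * (d_xy + d_yx)
```

When each model is the optimal LPC solution for its own autocorrelations, each one-sided ratio is at least 1 and equals 1 for identical spectra; the geometric mean never exceeds the arithmetic mean.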
-cepstrum:<n>
- Modifies the LPC and PLP analysis types to produce <n> cepstral
parameters rather than the default autoregressive parameters. Note
that this <n>, not the <n> of -nfea, becomes the number of features in
the spectral model; -nfea is still interpreted as the model order for
the PLP or LPC solution. For cepstral analysis, <n> should correspond
to about 2 msec of signal (e.g., 20 samples at a 10 kHz sampling rate).
Because of array dimension constraints, <n> should not exceed 32.
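Converting an all-pole (LPC or PLP) model to cepstral parameters is usually done with the standard recursion sketched below, together with the 2 msec rule of thumb from this entry. That align uses exactly this recursion is an assumption; the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, the omission of the gain term c[0], and the function names are choices made for the sketch.

```python
def lpc_to_cepstrum(a, n_cep):
    """Cepstrum c[1..n_cep] of the all-pole model 1/A(z), where
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p and a[0] == 1.
    The gain term c[0] is omitted."""
    p = len(a) - 1
    c = [0.0] * (n_cep + 1)  # c[0] is left unused
    for n in range(1, n_cep + 1):
        acc = -a[n] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


def cepstrum_count(sample_rate_hz, span_ms=2.0, limit=32):
    """Rule of thumb from this page: about 2 msec of signal, at most 32."""
    return min(limit, round(sample_rate_hz * span_ms / 1000))
```

For a single-pole model a = [1.0, -0.5], the recursion reproduces the closed form c[n] = 0.5**n / n, and cepstrum_count(10000) gives the 20 coefficients quoted above.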
-nfea:<n>
- Number of spectral features to derive from the acoustic analysis for
use in computing the distance matrix. For the default bwf
analysis, <n> is the number of filter bands. For the other
analyses, <n> is the number of components to use. The default in
either case is 8. The maximum number of features for bwf,
spc, and -cepstrum analyses is 32. For standard lpc and
plp, the maximum is 30.
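The limits and interactions spread across -ana, -dist, -cepstrum, and -nfea can be collected in one place. The checker below is hypothetical: feature_count is not part of align, the page does not say whether align rejects, clamps, or silently ignores out-of-range or meaningless values (raising ValueError is only this sketch's choice), and the lower bound of 1 is an assumption.

```python
MAX_FEATURES = {"bwf": 32, "spc": 32, "lpc": 30, "plp": 30}
MAX_CEPSTRA = 32


def feature_count(ana="bwf", nfea=8, cepstrum=None, dist=None):
    """Return the number of features in the spectral model for one set of
    options, raising ValueError where this page says a value is out of
    range or meaningless."""
    if ana not in MAX_FEATURES:
        raise ValueError(f"unknown analysis type {ana!r}")
    lpc_like = ana in ("lpc", "plp")
    if cepstrum is not None and not lpc_like:
        raise ValueError("-cepstrum modifies only the lpc and plp analyses")
    if dist is not None:
        if not lpc_like:
            raise ValueError("-dist is meaningful only with -ana:lpc or -ana:plp")
        if dist not in (1, 2, 3, 4):
            raise ValueError("-dist takes the digits 1 to 4")
        if cepstrum is not None and dist == 4:
            raise ValueError("-dist:4 is meaningless with -cepstrum")
    if cepstrum is not None:
        # -nfea stays the model order; the cepstrum count sets the features.
        if not 1 <= cepstrum <= MAX_CEPSTRA:
            raise ValueError(f"-cepstrum must be 1 to {MAX_CEPSTRA}")
        return cepstrum
    if not 1 <= nfea <= MAX_FEATURES[ana]:
        raise ValueError(f"-nfea must be 1 to {MAX_FEATURES[ana]} for {ana}")
    return nfea
```

With the defaults this returns 8, and feature_count("lpc", nfea=12, cepstrum=20) returns 20 because the cepstrum count, not -nfea, sets the feature count.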
-diag:<cost>
- Normally, there is an extra cost (a factor of 2.0) associated with
diagonal steps in the DTW algorithm. The default factor of 2 makes a
diagonal move as costly as the two moves (one horizontal, one vertical)
needed to reach the same point without a diagonal move. When this cost
factor is reduced (say to 1.2 or so), the algorithm shows a bias toward
an alignment path that lies along the diagonal of the distance matrix.
In the extreme case (-diag:0), the program would in effect perform a
computationally expensive linear interpolation of time in the template
file to time in the target file. Reducing the diagonal cost is
sometimes a useful way to coerce the DTW algorithm, but it is not
something to make a habit of.
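The effect of the diagonal cost can be seen in a minimal forced-end DTW sketch. It assumes the common symmetric recursion in which horizontal and vertical steps add the local distance d(i, j) and a diagonal step adds diag times d(i, j); whether align's recursion matches it exactly, and the precomputed dist[i][j] matrix (template frame i, target frame j), are assumptions. The function name dtw is hypothetical.

```python
def dtw(dist, diag=2.0):
    """Forced-end DTW over a local distance matrix dist[i][j]
    (template frame i, target frame j).

    Horizontal and vertical steps add d(i, j); a diagonal step adds
    diag * d(i, j). Returns (total_cost, path), where the path runs from
    (0, 0) to the last frame of both files."""
    n, m = len(dist), len(dist[0])
    D = [[float("inf")] * m for _ in range(n)]
    D[0][0] = diag * dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = dist[i][j]
            options = []
            if i > 0:
                options.append(D[i - 1][j] + d)
            if j > 0:
                options.append(D[i][j - 1] + d)
            if i > 0 and j > 0:
                options.append(D[i - 1][j - 1] + diag * d)
            D[i][j] = min(options)

    # Backtrack from the forced end point, re-deriving each best predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        d = dist[i][j]
        steps = []
        if i > 0 and j > 0:
            steps.append((D[i - 1][j - 1] + diag * d, (i - 1, j - 1)))
        if i > 0:
            steps.append((D[i - 1][j] + d, (i - 1, j)))
        if j > 0:
            steps.append((D[i][j - 1] + d, (i, j - 1)))
        _, (i, j) = min(steps)
        path.append((i, j))
    path.reverse()
    return D[n - 1][m - 1], path
```

On a 3 by 3 matrix of equal distances, diag=2 gives every corner-to-corner path the same cost (6), so nothing favors the diagonal; with diag=0 the straight diagonal path [(0, 0), (1, 1), (2, 2)] costs nothing, which is the bias the text describes.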
-noforce
- By default, align forces the alignment path to include all frames up
to the last analysis frame of each file. If -noforce is in
effect, align also searches alternative paths that allow the end of the
template file to correspond to a frame internal to the target
file. This can be helpful if the target file has extra silence
at the end. However, the DTW procedure tends to work better when the
corresponding beginnings and ends of the files are known and forced.
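The open-end idea behind -noforce can be sketched as follows: fill in the same accumulated-cost matrix, then let the last template frame end on whichever target frame gives the lowest normalized cost. This sketch assumes the symmetric recursion with a diagonal weight and normalizes each candidate by its total path weight (i + j + 2 when diag is 2) so that ending early is not automatically cheaper. How align actually ranks alternative end points is not documented on this page, and the names accumulate and open_end are hypothetical.

```python
def accumulate(dist, diag=2.0):
    """Accumulated cost D[i][j] of the best path from (0, 0) to (i, j);
    horizontal/vertical steps add d(i, j), diagonal steps add diag * d(i, j)."""
    n, m = len(dist), len(dist[0])
    D = [[float("inf")] * m for _ in range(n)]
    D[0][0] = diag * dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            d = dist[i][j]
            options = []
            if i > 0:
                options.append(D[i - 1][j] + d)
            if j > 0:
                options.append(D[i][j - 1] + d)
            if i > 0 and j > 0:
                options.append(D[i - 1][j - 1] + diag * d)
            D[i][j] = min(options)
    return D


def open_end(dist, diag=2.0):
    """Let the last template frame end on any target frame.

    End points are compared by cost per unit of path weight, i + j + 2,
    which is exact for diag = 2 because every path to (i, j) then carries
    that total weight. Returns (target_end_frame, accumulated_cost)."""
    D = accumulate(dist, diag)
    last = len(dist) - 1
    end = min(range(len(dist[0])), key=lambda j: D[last][j] / (last + j + 2))
    return end, D[last][end]
```

With a template [0, 1, 2] and a target [0, 1, 2, 9, 9, 9] whose last three frames stand in for trailing silence, open_end picks target frame 2 at zero cost, whereas a forced end would have to absorb the mismatched tail.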
-wipeout
- By default, align merges new segment definitions into the existing
segment table for the target file, resolving name conflicts by
appending digits to the end of segment names. If a conflicting name is
already six characters long, the name is truncated to make room for
the digit. Setting the -wipeout switch causes align to
replace the existing segment table with only the new segment
definitions.
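The merge rule can be sketched as below. Several details are assumptions not stated on this page: segments are kept as a {name: (start, end)} table, the incoming (not the existing) segment is the one renamed, a single digit from 1 to 9 is tried in order, and names are limited to six characters. The function merge_segments is hypothetical.

```python
MAX_NAME_LEN = 6  # assumed from the six-character rule on this page


def merge_segments(existing, new, wipeout=False):
    """Merge new {name: (start, end)} definitions into an existing table.

    A clashing new name gets a digit appended (1 to 9 tried in order);
    a name already MAX_NAME_LEN characters long is truncated first.
    With wipeout=True the existing table is discarded."""
    if wipeout:
        return dict(new)
    merged = dict(existing)
    for name, bounds in new.items():
        final = name
        if final in merged:
            stem = name[:MAX_NAME_LEN - 1]  # leave room for one digit
            for digit in "123456789":
                if stem + digit not in merged:
                    final = stem + digit
                    break
            else:
                raise ValueError(f"no free variant of segment name {name!r}")
        merged[final] = bounds
    return merged
```

Under these assumptions a clashing "vowel" becomes "vowel1", and a clashing six-character "abcdef" becomes "abcde1".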
<template_file>
- Name of the waveform file, with segment markers already defined, that
serves as the template for marking segments in the target file.
<target_file>
- File to be marked.
NOTES
As compiled, align uses 20 msec analysis windows and 5 msec
steps. Twenty msec is possibly suboptimal as a window length, since it can
allow moderate frame-to-frame amplitude variation due to low-frequency
pitch periods. With the 5 msec step size, the exact locations of
corresponding spectral features are uncertain, and consequently
the locations of assigned segment markers are only approximate. Also,
as compiled, at most 200 analysis frames can be computed from each
waveform. Hence, the step size will be larger than 5 msec for files
longer than 1 second (200 frames at 5 msec) in duration.
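The frame arithmetic in this note can be made concrete. The sketch assumes that for long files the step is simply stretched evenly to duration / 200 (ignoring the window length at the end of the file); the page says only that the step grows beyond 5 msec, so that rule and the function names are assumptions.

```python
WINDOW_MS = 20.0   # analysis window, as compiled
MIN_STEP_MS = 5.0  # nominal frame step, as compiled
MAX_FRAMES = 200   # frame limit per waveform, as compiled


def step_ms(duration_ms):
    """Frame step that keeps a file within MAX_FRAMES frames (assumed rule:
    stretch the step evenly once the nominal 5 msec step would overflow)."""
    return max(MIN_STEP_MS, duration_ms / MAX_FRAMES)


def ms_to_samples(ms, sample_rate_hz):
    """Convert a duration in msec to a whole number of samples."""
    return round(ms * sample_rate_hz / 1000)
```

Under this rule a 3-second file would be analyzed with a 15 msec step, so frame quantization alone could shift a transferred marker by up to about half a step (7.5 msec). At 10 kHz, the 20 msec window and 5 msec step are 200 and 50 samples.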
Before getting very fancy, try the default settings for all options,
since the defaults are the ones that seemed to work best over the
widest range of uses. If the program fails to assign reasonable marks,
first try switching to -ana:plp or -ana:spc, then experiment
with -nfea. Change the -diag value if marks tend to bunch up close
to one end of the file. Use -ana:lpc only as a last resort, since it
rarely seems to work as well as the perceptually based analyses.
This program is still under development. Expect it to change again fairly
soon. (But it's getting closer.)
AUTHOR
H.T.Bunnell