General File Structure

On a computer, digitized waveform files resemble many other kinds of files that do not contain readable text. Commonly, a waveform file begins with a short 'header' section that describes properties of the file, such as the sampling rate and sample resolution, and possibly other information. Most of a waveform file, of course, is the waveform data itself, which is most commonly stored as a stream of machine-readable (binary) values, with successive values corresponding to successive samples.
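As a rough illustration of this header-plus-data arrangement, the sketch below packs and unpacks a made-up waveform layout in Python. The 8-byte header (sampling rate, bits per sample, channel count), the little-endian 16-bit sample encoding, and the names pack_waveform and unpack_waveform are all assumptions chosen for the example; they do not correspond to any particular real file format.

```python
import struct

# Hypothetical layout, invented for illustration only:
#   header: sampling rate (uint32), bits per sample (uint16), channel count (uint16)
#   data:   little-endian signed 16-bit samples, one after another
HEADER_FORMAT = "<IHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 8 bytes for this layout


def pack_waveform(sample_rate, samples):
    """Build the bytes of a monaural 16-bit waveform in the hypothetical layout."""
    header = struct.pack(HEADER_FORMAT, sample_rate, 16, 1)
    data = struct.pack("<%dh" % len(samples), *samples)
    return header + data


def unpack_waveform(blob):
    """Read the header first, then interpret the remaining bytes as samples."""
    sample_rate, bits, channels = struct.unpack_from(HEADER_FORMAT, blob, 0)
    if bits != 16:
        raise ValueError("this sketch only handles 16-bit samples")
    count = (len(blob) - HEADER_SIZE) // 2  # 2 bytes per 16-bit sample
    samples = list(struct.unpack_from("<%dh" % count, blob, HEADER_SIZE))
    return sample_rate, bits, channels, samples


blob = pack_waveform(16000, [0, 100, -100, 32767])
print(unpack_waveform(blob))
```

The reader cannot interpret the sample bytes until it has parsed the header, which is why the descriptive information comes first in a layout like this.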

While digitized speech waveforms are most often monaural, many waveform file structures allow for the storage of stereo sound as well. For stereo, each 'sample' really consists of two samples, one representing the 'left' channel and one the 'right' channel. Typically, such data is stored as a stream of two-value records, with each record containing one sample value for each of the stereo channels. In some cases this approach is extended to more than two channels. In speech and linguistic research, one or more additional channels of waveform data may contain related non-speech signals. These are often physiological signals such as an electroglottograph (EGG) signal, nasal flow measurements, or even measurements of articulator motion obtained along with the speech data. For such multi-channel data, each sample time corresponds to a record containing one sample value for each channel; thus, the sample rate for multi-channel data refers to the number of multi-sample records per second.
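The record-per-sample-time layout described above can be sketched in a few lines of Python. The function names (interleave, deinterleave, duration_seconds) and the tiny example values are hypothetical and serve only to show how the channels share one stream and why the sample rate counts records rather than individual values.

```python
def interleave(channels):
    """Merge equal-length channels into one stream of multi-value records."""
    if len({len(c) for c in channels}) > 1:
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) yields one record per sample time: (ch0, ch1, ...)
    return [value for record in zip(*channels) for value in record]


def deinterleave(stream, num_channels):
    """Split an interleaved stream back into one list per channel."""
    if len(stream) % num_channels != 0:
        raise ValueError("stream length is not a whole number of records")
    # Channel c occupies positions c, c + num_channels, c + 2 * num_channels, ...
    return [stream[c::num_channels] for c in range(num_channels)]


def duration_seconds(stream, num_channels, sample_rate):
    """Sample rate counts records per second, not individual values."""
    return (len(stream) // num_channels) / sample_rate


speech = [1, 2, 3]    # made-up speech samples
egg = [10, 20, 30]    # made-up EGG samples recorded at the same sample times
stream = interleave([speech, egg])
print(stream)
print(deinterleave(stream, 2))
```

Note that the stream holds six values but only three records, so at a rate of three records per second it would represent one second of data on each channel.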