GENSYN - Generate Synthesis Parameters


Syntax: gensyn input[.par] output[.ksp]

gensyn accepts input from a data file which you create with a text editor. This file contains instructions in a readable format for the synthesis of a speech waveform file based on Klatt's (1980) speech synthesis program. gensyn creates, as output, an unformatted binary file that is the input file to the klatlk, sf, and synsrc programs. By convention, the input file to gensyn has the extension .par, and the output file from gensyn (input to klatlk etc.) has the extension .ksp. Hereafter, these files will be referred to as par and ksp files respectively.

The contents of the input par file for gensyn specify, first, the general synthesizer configuration (e.g., cascade/parallel, sampling rate, number of samples per frame, etc.). Second, following configuration settings, the par file contains a synthesis script which lists values of time-variable control parameters at specified points in time. This script describes the time-course of formant frequencies, source amplitude, fundamental frequency, source switching, and so forth throughout the duration of the speech to be synthesized.

As a side note, the synthesis programs klatlk and sf implement (with some modifications) the code for a software cascade/parallel synthesizer (Klatt, 1980). The program klatlk is essentially the 1980 Klatt code with the only substantial modification being the ability to disable automatic formant amplitude adjustments in the all-parallel configuration. This is a capability that is sometimes useful in psychoacoustic experiments. The sf program is a further modification of the Klatt code with the original excitation generation removed. sf is essentially a time-varying filter which must have an excitation signal as well as time-varying control parameters as input. Thus, sf requires both the ksp control parameter file and an input waveform file containing the synthesis excitation signal. The excitation signals for sf can be generated using the program synsrc which uses the same parameter files (as output by gensyn).

The following is a detailed description of the two major sections of a par file for gensyn. The names, meanings, default values, and ranges for all possible parameters of the synthesizer are given in the parameter table at the end of this document. For additional details on the meaning of the synthesizer parameters see Klatt (1980). The following assumes some familiarity with Klatt's description of the synthesizer.

Configuration Section

The first section of a par file must be the configuration section. gensyn locates this section by finding the word CONFIG starting in column 1 of a line in the file. On subsequent lines of the configuration section, you may specify default settings for any of the 39 control parameters. The syntax to be used in setting a parameter value is simply:

parameter_name = value

where parameter_name is one of the names listed in the control parameter table and value is a numeric value within the range acceptable for that parameter.

Note that all 39 parameters have default settings imposed by the synthesis software. A value needs to be entered in the configuration section only if you wish to change the default setting shown in the parameter table.
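The way configuration values override the built-in defaults can be sketched in Python. This is an illustration only, not gensyn's own code: the names PARAMETER_TABLE and resolve_settings are hypothetical, only four of the 39 parameters are copied from the parameter table, and rejecting an out-of-range value with an error is an assumption about how such a value would be handled.

```python
# (default, minimum, maximum) for a few parameters, copied from the parameter table.
# Only a subset is listed here to keep the sketch short.
PARAMETER_TABLE = {
    "F1": (450, 150, 900),
    "SR": (10000, 5000, 20000),
    "NWS": (50, 1, 200),
    "GAI": (48, 0, 80),
}


def resolve_settings(overrides):
    """Start from the defaults and apply the values given in the CONFIG section."""
    settings = {name: row[0] for name, row in PARAMETER_TABLE.items()}
    for name, value in overrides.items():
        if name not in PARAMETER_TABLE:
            raise KeyError("unknown parameter: " + name)
        _, low, high = PARAMETER_TABLE[name]
        # Assumption: an out-of-range value is treated as an error.
        if not low <= value <= high:
            raise ValueError("%s=%s is outside %s..%s" % (name, value, low, high))
        settings[name] = value
    return settings


# Parameters that are not mentioned keep their defaults.
print(resolve_settings({"NWS": 40, "SR": 8000}))
```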

In addition to setting default values within the configuration section, you can specify whether a parameter will be constant or variable over time. By default, gensyn assumes that all parameters are constants, i.e., that they will not change their values from one time frame to the next during synthesis. Consequently, any parameter that you do want to vary over time (like formant frequency or amplitude) must be declared a variable in the configuration section. To declare a parameter variable, use the format:

parameter_name = VAR

Finally, the configuration section must end with the word ENDCONFIG starting in column 1 of a line.

For example, here is a short CONFIG section:

CONFIG
NWS=40; SR=8000
ENDCONFIG

Note that the format is fairly simple. One or more parameter settings can be specified on a line. If more than one is given on a line, semicolons are used to separate the individual parameter=value pairs. Order is not significant.

Additionally, gensyn allows comments to be placed within the parameter file. A comment is any text following an exclamation mark (!), or any text following an asterisk in column 1. This means that comments preceded by an exclamation mark can follow parameter specifications on the same line, while comments preceded by an asterisk must be placed on lines by themselves, since the asterisk must be the first character in the line.
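The configuration rules described so far (keywords in column 1, semicolon-separated settings, VAR declarations, and the two comment styles) can be summarized in a short Python sketch. It is not gensyn's actual parser: the function names strip_comment and parse_config and the returned data layout are assumptions made for this example, and details such as case sensitivity are not covered.

```python
def strip_comment(line):
    """Drop comment text: a whole line starting with '*', or anything after '!'."""
    if line.startswith("*"):
        return ""
    return line.split("!", 1)[0]


def parse_config(lines):
    """Return (defaults, variables) from the CONFIG ... ENDCONFIG block.

    defaults maps a parameter name to its numeric value; variables is the
    set of names declared with '= VAR'.  Hypothetical helper for illustration.
    """
    defaults, variables = {}, set()
    in_config = False
    for raw in lines:
        if raw.startswith("ENDCONFIG"):      # keyword must start in column 1
            return defaults, variables
        if raw.startswith("CONFIG"):
            in_config = True
            continue
        if not in_config:
            continue
        for spec in strip_comment(raw).split(";"):
            if not spec.strip():
                continue                     # blank line or comment-only line
            name, _, value = spec.partition("=")
            name, value = name.strip(), value.strip()
            if value == "VAR":
                variables.add(name)
            else:
                defaults[name] = float(value)
    raise ValueError("configuration section has no ENDCONFIG line")


sample = [
    "* a comment line",
    "CONFIG  !Begin Configuration section",
    "  GAI=70; F0=110     !All other defaults are ok",
    "F1 = VAR; F2=VAR",
    "ENDCONFIG",
]
print(parse_config(sample))
```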

Synthesis Script

Following the ENDCONFIG statement, gensyn expects a synthesis script consisting of two or more time-frame specifications. Each time frame starts with a TIME statement (either absolute, as TIME=nnn, or relative, as TIME+nnn, where nnn is the time in msec) and ends at the next TIME statement or at an END statement (indicating the end of the synthesis parameters). Within a time frame, you may give a new value for any variable parameter.

Although you can specify values for every time-variable parameter at every possible time frame, it is not generally necessary to do so. gensyn uses linear interpolation to fill in parameter values that are unspecified for any given time frame. This means that the values of parameters need only be specified at inflection points in their trajectories. For instance, an F2 transition may start at TIME=0 and end at TIME=40, while F1 may start at TIME=15 and end at TIME=40. In setting this up, it would not be necessary to specify a value for F2 at TIME=15, even though F1 (and possibly other parameters) must be specified at TIME=15.
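The interpolation in the F1/F2 illustration can be worked through with a small Python sketch. The function name interpolate and the formant values used here are hypothetical, and holding a parameter at its first or last specified value outside the specified range is an assumption, not documented gensyn behavior.

```python
def interpolate(breakpoints, t):
    """Linearly interpolate a parameter value at time t (msec).

    breakpoints is a list of (time, value) pairs with strictly increasing times.
    """
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]             # assumed: hold the first value
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]                # assumed: hold the last value


# Illustrative values only: F2 is specified at 0 and 40 msec, F1 at 15 and 40 msec.
f2 = [(0, 1500), (40, 1700)]
f1 = [(15, 500), (40, 650)]

# F2 at TIME=15 does not need to be written in the script; it is computed.
print(interpolate(f2, 15))   # 1575.0
print(interpolate(f1, 15))
```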

Formatting rules for the synthesis script are the same as those for the configuration section: semicolons must separate specifications placed on a single line, but spacing is unimportant, and the order of parameters within a time frame is unimportant (except that a time frame must start with a TIME statement). An asterisk or exclamation mark may be used to indicate comments. See the example below for more information.
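The script rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than gensyn's implementation: parse_script is a hypothetical name, and a relative TIME+nnn statement is assumed to be measured from the time of the previous frame.

```python
def parse_script(lines):
    """Return a list of (time_msec, {name: value}) frames, stopping at END."""
    frames = []
    now = 0.0
    for raw in lines:
        # Comments: a line starting with '*', or anything after '!'.
        text = "" if raw.startswith("*") else raw.split("!", 1)[0]
        for spec in text.split(";"):
            spec = "".join(spec.split())     # spacing is unimportant
            if not spec:
                continue
            if spec == "END":
                return frames
            if spec.startswith("TIME="):     # absolute time
                now = float(spec[5:])
                frames.append((now, {}))
            elif spec.startswith("TIME+"):   # relative time (assumed: from previous frame)
                now += float(spec[5:])
                frames.append((now, {}))
            else:
                if not frames:
                    raise ValueError("a time frame must start with a TIME statement")
                name, _, value = spec.partition("=")
                frames[-1][1][name] = float(value)
    return frames


script = [
    "TIME=0",
    "   AV=80;    F1=500",
    "*-a comment line",
    "TIME+30; F1=650   !end of transition",
    "TIME=50; F2=1775",
    "END",
]
for frame in parse_script(script):
    print(frame)
```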


At present, the most serious limitation in gensyn is that, because of the way parameters are interpolated, all computation must be buffered in memory. There is thus a maximum length (in number of time frames) for files created using gensyn. The actual limit is a function of the amount of available memory and of the number of variable parameters: the fewer variable parameters there are, the more time frames can be specified.

The current PC version of gensyn buffers frame data in an array of length 16000. The number of time frames which can be handled is (16000 / NVAR), where NVAR is the number of variable parameters.
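As a worked example of this limit, the Python sketch below computes the number of frames and the corresponding duration. The function names are hypothetical. It assumes that the division is rounded down and that each buffered frame spans NWS samples at sampling rate SR, using the default NWS=50 and SR=10000 from the parameter table.

```python
BUFFER_LENGTH = 16000  # frame-data array length quoted for the PC version


def max_frames(nvar):
    """Largest number of time frames for nvar variable parameters."""
    return BUFFER_LENGTH // nvar             # assumed: rounded down


def max_duration_sec(nvar, nws=50, sr=10000):
    """Longest signal in seconds, assuming one frame is nws samples at sr Hz."""
    return max_frames(nvar) * nws / sr


# With 5 variable parameters: 3200 frames, or 16 seconds at the default settings.
print(max_frames(5), max_duration_sec(5))
# With 10 variable parameters the limit halves.
print(max_frames(10), max_duration_sec(10))
```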

Example Parameter file for gensyn

*  This is an example input file for generation of Klatt synthesis parameters
* using the GENSYN program.  This file produces something like /bae/.
* To try:
*  Peak signal level:  -1.8dB
*  >
* Alternatively, instead of running KLATLK, try:
*  ?SYNSRC-I-Peak signal level:  -7.62
*  ?SF-I-Peak signal level:  -5.3
CONFIG  !Begin Configuration section
*-Initial values
  GAI=70; F0=110     !All other defaults are ok
*-Declare these parameters variable
  AV=VAR; F1=VAR; F2=VAR; F3=VAR; F0=VAR
ENDCONFIG
*-----Frame data (Indicate changes to any parameters over time)
TIME=0
   AV=80;    F1=500;   F2=1500; F3=2250; F0=120
*-Values at end of 30 msec transition
TIME=30
   AV=80;    F1=650;   F2=1700; F3=2500
*-A bit more change out to 50 msec (but only for F2)
TIME=50; F2=1775
*-Now formants will be constant to end of the syllable
*-Show end of signal at 250 msec with F0 dropping from previous
*-designation at TIME=50
TIME=250; F0=90
END

Klatt Synthesis Parameters

Name  Description                         Default   Min   Max
----  ----------------------------------  -------  ----  -----
AV    Amplitude of voicing                      0     0    80
AF    Amplitude of Frication                    0     0    80
AH    Amplitude of Aspiration                   0     0    80
AVS   Amplitude of Sinusoidal voicing           0     0    80
F0    Fundamental Frequency                     0     0   500
F1    First Formant                           450   150   900
F2    Second Formant                         1450   500  2500
F3    Third Formant                          2450  1300  3500
F4    Fourth Formant                         3300  2500  4500
FNZ   Frequency of Nasal Zero                 250   200   700
AN    Amplitude of Nasal formant                0     0    80
A1    Amplitude of F1 (Parallel only)           0     0    80
A2    Amplitude of F2 (Parallel only)           0     0    80
A3    Amplitude of F3 (Parallel only)           0     0    80
A4    Amplitude of F4 (Parallel only)           0     0    80
A5    Amplitude of F5 (Parallel only)           0     0    80
A6    Amplitude of F6 (Parallel only)           0     0    80
AB    Ampl of Cascade/Parallel Bypass           0     0    80
B1    Bandwidth of F1                          50    40   500
B2    Bandwidth of F2                          70    40   500
B3    Bandwidth of F3                         110    40   500
SW    Parallel/Cascade switch*                  0     0     2
FGP   Frequency of Glottal Pole                 0     0   600
BGP   Bandwidth of Glottal Pole               100   100  2000
FGZ   Frequency of Glottal Zero              1500     0  5000
BGZ   Bandwidth of Glottal Zero              6000   100  9000
B4    Bandwidth of F4                         250   100   500
F5    Fifth Formant Frequency                3850  3500  4900
B5    Bandwidth of F5                         200   150   700
F6    Sixth Formant Frequency                4900  4000  4999
B6    Bandwidth of F6                        1000   200  2000
FNP   Frequency of Nasal Pole                 250   200   500
BNP   Bandwidth of Nasal Pole                 100    50   500
BNZ   Bandwidth of Nasal Zero                 100    50   500
FRA   Second Glottal resonator bandwidth      200   100  1000
SR    Sampling rate                         10000  5000 20000
NWS   Number of samples per frame              50     1   200
GAI   Overall Gain control                     48     0    80
NFC   Number of cascaded formants               5     4     6
*Note that SW was binary in Klatt's implementation. Here, the values of SW have the following meaning: