GENSYN - Generate Synthesis Parameters



General


Syntax: gensyn input[.par] output[.ksp]

gensyn accepts input from a data file which you create with a text editor. This file contains instructions in a readable format for the synthesis of a speech waveform file based on Klatt's (1980) speech synthesis program. gensyn creates, as output, an unformatted binary file that is the input file to the klatlk, sf, and synsrc programs. By convention, the input file to gensyn has the extension .par, and the output file from gensyn (input to klatlk etc.) has the extension .ksp. Hereafter, these files will be referred to as par and ksp files respectively.

The contents of the input par file for gensyn specify, first, the general synthesizer configuration (e.g., cascade/parallel, sampling rate, number of samples per frame, etc.). Second, following configuration settings, the par file contains a listing of parameter values at specified points in time. These are the values of the 39 control parameters described in Klatt (1980).

While parameter values (e.g., frequencies of formants, amplitude of voicing, fundamental frequency, etc.) can be specified for every time-frame of the synthesis, it is not generally necessary to do so since gensyn will use linear interpolation to "fill in" all frames between non-adjacent time frames. An example par file is given later in this document.

As a side note, the synthesis programs klatlk and sf implement (with some modifications) the code for a software cascade/parallel synthesizer (Klatt, 1980). The program klatlk is essentially the 1980 Klatt code with the only substantial modification being the ability to disable automatic formant amplitude adjustments in the all-parallel configuration. This is a capability that is sometimes useful in psychoacoustic experiments. The sf program is a further modification of the Klatt code with the original excitation generation removed. sf is essentially a time-varying filter which must have an excitation signal as well as time-varying control parameters as input. Thus, sf requires both the ksp control parameter file and an input waveform file containing the synthesis excitation signal. The excitation signals for sf can be generated using the program synsrc which uses the same parameter files (as output by gensyn).

The following is a detailed description of the two major sections of a par file for gensyn. The names, meanings, default values, and ranges for all possible parameters of the synthesizer are given in the parameter table at the end of this document. For additional details on the meaning of the synthesizer parameters see Klatt (1980). The following assumes some familiarity with Klatt's description of the synthesizer.

Configuration Section

The first section of a par file must be the configuration section. gensyn locates this section by finding the word "CONFIG" starting in column 1 of a line in the file. On subsequent lines of the configuration section, you may specify default settings for any of the 39 control parameters. The syntax to be used in setting a parameter value is simply:

parameter_name=value

where parameter_name is one of the names listed in the control parameter table and value is a numeric value within the range acceptable for that parameter.

Note that all 39 parameters have default settings imposed by the synthesis software, and that values entered in the configuration section are only needed if you wish to change the default setting.

In addition to setting default values within the configuration section, you can specify that a parameter will be constant or variable over time. By default, gensyn will assume that all parameters are constants, i.e., that they will not change their values from one time-frame to the next during synthesis. Consequently, any parameter that you do want to vary over time (like formant frequency, or amplitude) must be declared a variable in the configuration section. To declare a parameter variable, use the format:

parameter_name=VAR

Finally, the configuration section must end with the word ENDCONFIG starting in column 1 of a line.

Within the configuration section, you may format parameter specifications rather freely. One or more specifications may be given per line. When multiple specifications are placed on a single line, each must be terminated with a semicolon. There are no constraints on the order in which parameter specifications are listed.

Additionally, gensyn allows comments to be placed within the parameter file to help you remember what you did and why you did it. A comment is any text following an exclamation mark (!), or text following an asterisk in column 1. This means that comments preceded by an exclamation mark can follow parameter specifications on the same line, while comments preceded by an asterisk must be placed on lines by themselves since the asterisk will be the first character in the line.
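The formatting and comment rules above can be illustrated with a minimal Python sketch. This is not gensyn's own parser; the function name split_statements is hypothetical, and the handling of whitespace and of bare keywords such as CONFIG is an assumption made for illustration.

```python
def split_statements(line):
    """Return the (name, value) pairs found on one par-file line."""
    # An asterisk in column 1 marks the whole line as a comment.
    if line.startswith("*"):
        return []
    # Anything after an exclamation mark is a trailing comment.
    code = line.split("!", 1)[0]
    pairs = []
    # Semicolons separate multiple specifications on one line.
    for chunk in code.split(";"):
        chunk = chunk.strip()
        if "=" not in chunk:
            continue  # empty chunk or a bare keyword (assumed to carry no value)
        name, value = chunk.split("=", 1)
        pairs.append((name.strip(), value.strip()))
    return pairs
```

Applied to the line "  GAI=70; F0=110     !All other defaults are ok", this sketch yields the two pairs GAI/70 and F0/110. A value of VAR is returned as the plain string "VAR", leaving it to the caller to treat it as a variable declaration.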

Frame Data

Following the ENDCONFIG statement, gensyn expects two or more time-frame specifications. Each time-frame starts with a TIME=nnn statement (where nnn is the time in msec) and ends at the next TIME=nnn statement or at an END statement (which marks the end of the synthesis parameters). Within a time frame, you may indicate a new value for any variable parameter.

gensyn will use linear interpolation to fill in parameter values for time frames that you do not specify in the file; however, you must specify at least the first (TIME=0) and last time-frame for the synthesis. The interpolation of values for each parameter is independent of the interpolation for other parameters. Consequently, it is unnecessary to specify the value of every varying parameter at times when others are specified. For instance, an F2 transition may start at TIME=0 and end at TIME=40, while F1 may start at TIME=15 and end at TIME=40. In setting this up, it would not be necessary to specify a value for F2 at TIME=15, even though F1 (and possibly other parameters) must be specified at TIME=15.
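The per-parameter interpolation described above can be sketched in Python as follows. This is an illustration only, not gensyn's actual code: the function name interpolate is hypothetical, and holding the first and last specified values outside the specified range is an assumption. Any rounding that gensyn may apply is not modeled.

```python
def interpolate(breakpoints, t):
    """Linearly interpolate one parameter at time t (msec).

    breakpoints is a list of (time_msec, value) pairs sorted by time,
    holding only the times at which this parameter was specified.
    """
    first_time, first_value = breakpoints[0]
    if t <= first_time:
        return first_value  # assumption: hold the first value before it
    # Find the pair of neighboring breakpoints that brackets t.
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]  # assumption: hold the last value after it


# Each parameter has its own breakpoint list, so the lists need not
# share the same times.
f1 = [(0, 500), (30, 650)]
f2 = [(0, 1500), (30, 1700), (50, 1775)]
print(interpolate(f1, 15))  # 575.0
print(interpolate(f2, 40))  # 1737.5
```

Because each parameter is interpolated from its own list, a time specified for one parameter places no requirement on the others.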

Formatting rules for the frame data section of the file are the same as those for the configuration section. Semicolons must separate specifications placed on a single line; however, spacing is generally ignored, and the order of parameters within a time frame is unimportant. Again, an asterisk or exclamation mark may be used to indicate comments.
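As a rough illustration of how frame data might be read, the Python sketch below walks the lines that follow ENDCONFIG and collects, for each parameter, the times at which it was given a value. The function name collect_breakpoints is hypothetical, and this is not how gensyn itself is known to be implemented. It assumes every value is numeric and that END appears as a statement of its own.

```python
def collect_breakpoints(lines):
    """Map each parameter name to its list of (time_msec, value) pairs."""
    breakpoints = {}
    current_time = None
    for line in lines:
        if line.startswith("*"):
            continue  # comment line (asterisk in column 1)
        code = line.split("!", 1)[0]  # drop any trailing comment
        for chunk in code.split(";"):
            chunk = chunk.strip()
            if chunk == "END":
                return breakpoints  # end of the synthesis parameters
            if "=" not in chunk:
                continue
            name, value = (part.strip() for part in chunk.split("=", 1))
            if name == "TIME":
                current_time = float(value)  # a new time frame begins
            elif current_time is not None:
                breakpoints.setdefault(name, []).append(
                    (current_time, float(value))
                )
    return breakpoints
```

The order of parameters within a time frame does not matter here, since each value is filed under its own name together with the time of the enclosing frame.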

Limitations

At present, the most serious limitation in gensyn is that, because of the way parameters are interpolated, all computation must be buffered in memory. There is thus a maximum length (in number of time frames) for files created using gensyn. The actual limit is a function of the amount of available memory and of the number of variable parameters. The fewer variable parameters there are, the more time frames can be specified.

The current PC version of gensyn buffers frame data in an array of length 16000. The number of time frames which can be handled is (16000 / NVAR), where NVAR is the number of variable parameters.
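The limit can be worked through with a short Python sketch. The frame count follows the formula given above, with integer division assumed. The duration estimate rests on a further assumption that is not stated in this document: that each buffered time frame corresponds to one synthesis frame of NWS samples at sampling rate SR. The function names are hypothetical.

```python
BUFFER_LENGTH = 16000  # array length quoted for the PC version


def max_time_frames(nvar):
    """Number of time frames that fit: 16000 / NVAR (integer division assumed)."""
    if nvar < 1:
        raise ValueError("at least one variable parameter is required")
    return BUFFER_LENGTH // nvar


def max_duration_msec(nvar, sr=10000, nws=50):
    """Rough upper bound on signal length, ASSUMING one buffered time
    frame equals one synthesis frame of NWS samples at rate SR."""
    frame_msec = 1000.0 * nws / sr
    return max_time_frames(nvar) * frame_msec


# Five variable parameters, with the default SR and NWS.
print(max_time_frames(5))    # 3200
print(max_duration_msec(5))  # 16000.0
```

Under these assumptions, declaring all 39 parameters variable would leave room for 410 time frames.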



Example Parameter File for gensyn

*  EXAMPLE.PAR
*  This is an example input file for generation of Klatt synthesis parameters
* using the GENSYN program.  This file produces something like /bae/.
*
* To try:
*  >GENSYN EXAMPLE EXAMPLE
*  >KLATLK EXAMPLE EXAMPLE
*  Peak signal level:  -1.8dB
*  >
* Alternatively, instead of running KLATLK, try:
*  >SYNSRC EXAMPLE EXSRC
*  ?SYNSRC-I-Peak signal level:  -7.62
*  >SF EXAMPLE,EXSRC EXAMPLE
*  ?SF-I-Peak signal level:  -5.3
*
CONFIG  !Begin Configuration section
*
*-Initial values
*
  GAI=70; F0=110     !All other defaults are ok
*
*-Declare these parameters variable
*
  F1=VAR; F2=VAR; F3=VAR; F0=VAR; AV=VAR
*
ENDCONFIG
*
*-----Frame data (Indicate changes to any parameters over time)
*
TIME=0  
   AV=80;    F1=500;   F2=1500; F3=2250; F0=120
*
*-Values at end of 30 msec transition
*
TIME=30
   AV=80;    F1=650;   F2=1700; F3=2500
*
*-A bit more change out to 50 msec (but only for F2)
*
TIME=50; F2=1775
*
*-Now formants will be constant to end of the syllable
*-Show end of signal at 250 msec with F0 dropping from previous
*-designation at TIME=50
*
TIME=250; F0=90
END


Klatt Synthesis Parameters

Name  Meaning                             Default  Min    Max
AV    Amplitude of voicing                0        0      80
AF    Amplitude of Frication              0        0      80
AH    Amplitude of Aspiration             0        0      80
AVS   Amplitude of Sinusoidal voicing     0        0      80
F0    Fundamental Frequency               0        0      500
F1    First Formant                       450      150    900
F2    Second Formant                      1450     500    2500
F3    Third Formant                       2450     1300   3500
F4    Fourth Formant                      3300     2500   4500
FNZ   Frequency of Nasal Zero             250      200    700
AN    Amplitude of Nasal formant          0        0      80
A1    Amplitude of F1 (Parallel only)     0        0      80
A2    Amplitude of F2 (Parallel only)     0        0      80
A3    Amplitude of F3 (Parallel only)     0        0      80
A4    Amplitude of F4 (Parallel only)     0        0      80
A5    Amplitude of F5 (Parallel only)     0        0      80
A6    Amplitude of F6 (Parallel only)     0        0      80
AB    Ampl of Cascade/Parallel Bypass     0        0      80
B1    Bandwidth of F1                     50       40     500
B2    Bandwidth of F2                     70       40     500
B3    Bandwidth of F3                     110      40     500
SW    Parallel/Cascade switch*            0        0      2
FGP   Frequency of Glottal Pole           0        0      600
BGP   Bandwidth of Glottal Pole           100      100    2000
FGZ   Frequency of Glottal Zero           1500     0      5000
BGZ   Bandwidth of Glottal Zero           6000     100    9000
B4    Bandwidth of F4                     250      100    500
F5    Fifth Formant Frequency             3850     3500   4900
B5    Bandwidth of F5                     200      150    700
F6    Sixth Formant Frequency             4900     4000   4999
B6    Bandwidth of F6                     1000     200    2000
FNP   Frequency of Nasal Pole             250      200    500
BNP   Bandwidth of Nasal Pole             100      50     500
BNZ   Bandwidth of Nasal Zero             100      50     500
FRA   Second Glottal resonator bandwidth  200      100    1000
SR    Sampling rate                       10000    5000   20000
NWS   Number of samples per frame         50       1      200
GAI   Overall Gain control                48       0      80
NFC   Number of cascaded formants         5        4      6
*Note that SW was binary in Klatt's implementation. Here, the values of SW have the following meaning: