GENSYN - Generate Synthesis Parameters

General

Syntax: gensyn input[.par] output[.ksp]

gensyn accepts input from a data file which you create with a text editor. This file contains instructions in a readable format for the synthesis of a speech waveform file based on Klatt's (1980) speech synthesis program. gensyn creates, as output, an unformatted binary file that is the input file to the klatlk, sf, and synsrc programs. By convention, the input file to gensyn has the extension .par, and the output file from gensyn (input to klatlk etc.) has the extension .ksp. Hereafter, these files will be referred to as par and ksp files respectively.

The contents of the input par file for gensyn specify, first, the general synthesizer configuration (e.g., cascade/parallel, sampling rate, number of samples per frame, etc.). Second, following configuration settings, the par file contains a synthesis script which lists values of time-variable control parameters at specified points in time. This script describes the time-course of formant frequencies, source amplitude, fundamental frequency, source switching, and so forth throughout the duration of the speech to be synthesized.

As a side note, the synthesis programs klatlk and sf implement (with some modifications) the code for a software cascade/parallel synthesizer (Klatt, 1980). The program klatlk is essentially the 1980 Klatt code with the only substantial modification being the ability to disable automatic formant amplitude adjustments in the all-parallel configuration. This is a capability that is sometimes useful in psychoacoustic experiments. The sf program is a further modification of the Klatt code with the original excitation generation removed. sf is essentially a time-varying filter which must have an excitation signal as well as time-varying control parameters as input. Thus, sf requires both the ksp control parameter file and an input waveform file containing the synthesis excitation signal. The excitation signals for sf can be generated using the program synsrc which uses the same parameter files (as output by gensyn).

The following is a detailed description of the two major sections of a par file for gensyn. The names, meanings, default values, and ranges for all possible parameters of the synthesizer are given in the parameter table at the end of this document. For additional details on the meaning of the synthesizer parameters see Klatt (1980). The following assumes some familiarity with Klatt's description of the synthesizer.

Configuration Section

The first section of a par file must be the configuration section. gensyn locates this section by finding the word CONFIG starting in column 1 of a line in the file. On subsequent lines of the configuration section, you may specify default settings for any of the 39 control parameters. The syntax to be used in setting a parameter value is simply:

parameter_name = value

where parameter_name is one of the names listed in the control parameter table and value is a numeric value within the range acceptable for that parameter.

Note that all 39 parameters have default settings imposed by the synthesis software. Values need to be entered in the configuration section only if you wish to change a default setting shown in the parameter table at the end of this document.
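The range rule can be illustrated with a small check. The following Python sketch is not part of gensyn: it copies three rows from the parameter table at the end of this document, the name check_value is hypothetical, and treating the Min and Max limits as inclusive is an assumption. How gensyn itself responds to an out-of-range value is not described here.

```python
# Illustrative only: (default, min, max) copied from the parameter table
# for three of the 39 parameters.
PARAM_TABLE = {
    "F1": (450, 150, 900),
    "SR": (10000, 5000, 20000),
    "NWS": (50, 1, 200),
}

def check_value(name, value):
    """Return True if value lies within the documented range for name.

    Limits are treated as inclusive (an assumption).
    """
    _default, lo, hi = PARAM_TABLE[name]
    return lo <= value <= hi

print(check_value("SR", 8000))   # True: within 5000..20000
print(check_value("F1", 1200))   # False: above the documented maximum of 900
```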

In addition to setting default values within the configuration section, you can specify that a parameter will be constant or variable over time. By default, gensyn will assume that all parameters are constants, i.e., that they will not change their values from one time-frame to the next during synthesis. Consequently, any parameter that you do want to vary over time (like formant frequency, or amplitude) must be declared a variable in the configuration section. To declare a parameter variable, use the format:

parameter_name = VAR

Finally, the configuration section must end with the word ENDCONFIG starting in column 1 of a line.

For example, here is a short CONFIG section:

CONFIG
F0=100
NWS=40; SR=8000
F1=VAR; F2=VAR; F3=VAR
AV=VAR
ENDCONFIG

Note that the format is fairly simple. One or more parameter settings can be specified on a line. If more than one is given on a line, semicolons are used to separate the individual parameter=value pairs. Order is not significant.

Additionally, gensyn allows comments to be placed within the parameter file. A comment is any text following an exclamation mark (!), or text following an asterisk in column 1. This means that comments preceded by an exclamation mark can follow parameter specifications on the same line, while comments preceded by an asterisk must be placed on lines by themselves since the asterisk will be the first character in the line.
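The formatting and comment rules can be summarized in a few lines of code. The Python sketch below is only an illustration of those rules and is not gensyn's actual parser; the function name parse_line is hypothetical. Values are left as text, so a caller can tell VAR from a number.

```python
def parse_line(line):
    """Split one par-file line into (name, value) pairs, dropping comments."""
    if line.startswith("*"):          # asterisk in column 1: whole-line comment
        return []
    line = line.split("!", 1)[0]      # text after '!' is a comment
    pairs = []
    for spec in line.split(";"):      # semicolons separate specifications
        spec = spec.strip()
        if "=" not in spec:
            continue                  # blank field or a keyword such as CONFIG
        name, value = (part.strip() for part in spec.split("=", 1))
        pairs.append((name, value))
    return pairs

print(parse_line("  GAI=70; F0=110     !All other defaults are ok"))
# [('GAI', '70'), ('F0', '110')]
print(parse_line("*-Initial values"))
# []
```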

Synthesis Script

Following the ENDCONFIG statement, gensyn assumes that it will see a synthesis script consisting of two or more time-frame specifications. Each time frame starts with a TIME statement (either absolute, as TIME=nnn, or relative, as TIME+nnn, where nnn is the time in msec) and ends at the next TIME statement or at an END statement (which marks the end of the synthesis parameters). Within a time frame, you may indicate a new value for any variable parameter.
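The difference between the two TIME forms can be shown by converting a sequence of TIME statements to absolute times. This Python sketch assumes that a relative TIME+nnn is measured from the time of the preceding TIME statement, and that a script opening with a relative statement counts from 0; neither detail is stated above. The function name resolve_times is hypothetical.

```python
def resolve_times(statements):
    """Convert TIME=nnn / TIME+nnn statements to absolute times in msec."""
    times = []
    current = 0.0   # assumed starting point if the first statement is relative
    for stmt in statements:
        stmt = stmt.replace(" ", "")       # spacing is unimportant
        if stmt.startswith("TIME="):
            current = float(stmt[5:])      # absolute time
        elif stmt.startswith("TIME+"):
            current += float(stmt[5:])     # offset from previous frame (assumed)
        else:
            raise ValueError(f"not a TIME statement: {stmt!r}")
        times.append(current)
    return times

print(resolve_times(["TIME=0", "TIME+30", "TIME+20", "TIME=250"]))
# [0.0, 30.0, 50.0, 250.0]
```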

Although you can specify values for every time-variable parameter at every possible time frame, it is not generally necessary to do so. gensyn will use linear interpolation to fill in parameter values that are unspecified for any given time frame. This means that the values of parameters need only be specified at inflection points in their trajectories. For instance, an F2 transition may start at TIME=0 and end at TIME=40, while F1 may start at TIME=15 and end at TIME=40. In setting this up, it would not be necessary to specify a value for F2 at TIME=15, even though F1 (and possibly other parameters) must be specified at TIME=15.
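The F1/F2 case just described can be worked through numerically. The Python sketch below shows ordinary linear interpolation between specified points; it is not taken from the gensyn source, the formant values are made up for illustration, and holding the first or last value outside the specified range is an assumption.

```python
def interpolate(breakpoints, t):
    """Linearly interpolate a parameter at time t (msec).

    breakpoints: (time_msec, value) pairs with strictly increasing times.
    """
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]          # hold first value (assumed)
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return breakpoints[-1][1]             # hold last value (assumed)

# Hypothetical values: F2 moves from TIME=0 to TIME=40; F1 is steady
# until TIME=15 and then moves until TIME=40.
f2 = [(0, 1500), (40, 1800)]
f1 = [(0, 400), (15, 400), (40, 650)]

print(interpolate(f2, 15))   # 1612.5, although F2 is never given at TIME=15
print(interpolate(f1, 10))   # 400.0, F1 has not started to move yet
```

F1 needs an entry at TIME=15 because its trajectory changes slope there; F2 does not, because its slope is the same on both sides of TIME=15.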

Formatting rules for the synthesis script are the same as those for the configuration section: semicolons must separate specifications placed on a single line, but spacing is unimportant, and the order of parameters within a time frame is unimportant (except that a time frame must start with a TIME statement). An asterisk or exclamation mark may be used to indicate comments. See the example below for more information.

Limitations

At present, the most serious limitation in gensyn is that, because of the way parameters are interpolated, all computation must be buffered in memory. There is thus a maximum length (in number of time frames) for files created using gensyn. The actual limit is a function of the amount of available memory and of the number of variable parameters. The fewer the number of variable parameters, the larger the number of time frames that can be specified.

The current PC version of gensyn buffers frame data in an array of length 16000. The number of time frames which can be handled is (16000 / NVAR), where NVAR is the number of variable parameters.
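As a worked example of this limit, the Python sketch below computes the frame capacity for a given number of variable parameters. It assumes that the quotient is truncated to a whole number and that a "time frame" here means one synthesis frame of NWS samples, so that a frame lasts NWS/SR seconds; both points are assumptions. The function names are hypothetical, and the NWS and SR defaults are taken from the parameter table.

```python
BUFFER_LENGTH = 16000   # frame-data array length in the PC version

def max_frames(nvar):
    """Largest number of time frames for nvar variable parameters."""
    return BUFFER_LENGTH // nvar   # integer division is an assumption

def max_duration_sec(nvar, nws=50, sr=10000):
    """Longest signal, taking one frame to be NWS samples at SR samples/sec."""
    return max_frames(nvar) * nws / sr

print(max_frames(5))          # 3200
print(max_duration_sec(5))    # 16.0
```

With the five variable parameters used in the example file below and the default NWS and SR, this works out to 3200 frames, or about 16 seconds under the stated assumptions.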

Example Parameter file for gensyn

*  EXAMPLE.PAR
*  This is an example input file for generation of Klatt synthesis parameters
* using the GENSYN program.  This file produces something like /bae/.
*
* To try:
*  >GENSYN EXAMPLE EXAMPLE
*  >KLATLK EXAMPLE EXAMPLE
*  Peak signal level:  -1.8dB
*  >
* Alternatively, instead of running KLATLK, try:
*  >SYNSRC EXAMPLE EXSRC
*  ?SYNSRC-I-Peak signal level:  -7.62
*  >SF EXAMPLE,EXSRC EXAMPLE
*  ?SF-I-Peak signal level:  -5.3
*
CONFIG  !Begin Configuration section
*
*-Initial values
*
  GAI=70; F0=110     !All other defaults are ok
*
*-Declare these parameters variable
*
  F1=VAR; F2=VAR; F3=VAR; F0=VAR; AV=VAR
*
ENDCONFIG
*
*-----Frame data (Indicate changes to any parameters over time)
*
TIME=0  
   AV=80;    F1=500;   F2=1500; F3=2250; F0=120
*
*-Values at end of 30 msec transition
*
TIME=30
   AV=80;    F1=650;   F2=1700; F3=2500
*
*-A bit more change out to 50 msec (but only for F2)
*
TIME=50; F2=1775
*
*-Now formants will be constant to end of the syllable
*-Show end of signal at 250 msec with F0 dropping from previous
*-designation at TIME=0
*
TIME=250; F0=90
END

Klatt Synthesis Parameters

Name	Meaning	Default	Min	Max
AV	Amplitude of voicing	0	0	80
AF	Amplitude of Frication	0	0	80
AH	Amplitude of Aspiration	0	0	80
AVS	Amplitude of Sinusoidal voicing	0	0	80
F0	Fundamental Frequency	0	0	500
F1	First Formant	450	150	900
F2	Second Formant	1450	500	2500
F3	Third Formant	2450	1300	3500
F4	Fourth Formant	3300	2500	4500
FNZ	Frequency of Nasal Zero	250	200	700
AN	Amplitude of Nasal formant	0	0	80
A1	Amplitude of F1 (Parallel only)	0	0	80
A2	Amplitude of F2 (Parallel only)	0	0	80
A3	Amplitude of F3 (Parallel only)	0	0	80
A4	Amplitude of F4 (Parallel only)	0	0	80
A5	Amplitude of F5 (Parallel only)	0	0	80
A6	Amplitude of F6 (Parallel only)	0	0	80
AB	Ampl of Cascade/Parallel Bypass	0	0	80
B1	Bandwidth of F1	50	40	500
B2	Bandwidth of F2	70	40	500
B3	Bandwidth of F3	110	40	500
SW	Parallel/Cascade switch*	0	0	2
FGP	Frequency of Glottal Pole	0	0	600
BGP	Bandwidth of Glottal Pole	100	100	2000
FGZ	Frequency of Glottal Zero	1500	0	5000
BGZ	Bandwidth of Glottal Zero	6000	100	9000
B4	Bandwidth of F4	250	100	500
F5	Fifth Formant Frequency	3850	3500	4900
B5	Bandwidth of F5	200	150	700
F6	Sixth Formant Frequency	4900	4000	4999
B6	Bandwidth of F6	1000	200	2000
FNP	Frequency of Nasal Pole	250	200	500
BNP	Bandwidth of Nasal Pole	100	50	500
BNZ	Bandwidth of Nasal Zero	100	50	500
FRA	Second Glottal resonator bandwidth	200	100	1000
SR	Sampling rate	10000	5000	20000
NWS	Number of samples per frame	50	1	200
GAI	Overall Gain control	48	0	80
NFC	Number of cascaded formants	5	4	6

*Note that SW was binary in Klatt's implementation. Here, the values of SW have the following meaning:

SW = 0 -- Use cascade branch
SW = 1 -- Use parallel branch with formant amplitudes automatically adjusted to mimic cascaded formant interactions (Klatt's code).
SW = 2 -- Use parallel branch with formant amplitude adjustment disabled.