gensyn accepts input from a data file which you create with a text editor. This file contains instructions in a readable format for the synthesis of a speech waveform file based on Klatt's (1980) speech synthesis program. gensyn creates, as output, an unformatted binary file that is the input file to the klatlk, sf, and synsrc programs. By convention, the input file to gensyn has the extension .par, and the output file from gensyn (input to klatlk etc.) has the extension .ksp. Hereafter, these files will be referred to as par and ksp files respectively.
The contents of the input par file for gensyn specify, first, the general synthesizer configuration (e.g., cascade/parallel, sampling rate, number of samples per frame, etc.). Second, following configuration settings, the par file contains a synthesis script which lists values of time-variable control parameters at specified points in time. This script describes the time-course of formant frequencies, source amplitude, fundamental frequency, source switching, and so forth throughout the duration of the speech to be synthesized.
As a side note, the synthesis programs klatlk and sf implement (with some modifications) the code for a software cascade/parallel synthesizer (Klatt, 1980). The program klatlk is essentially the 1980 Klatt code with the only substantial modification being the ability to disable automatic formant amplitude adjustments in the all-parallel configuration. This is a capability that is sometimes useful in psychoacoustic experiments. The sf program is a further modification of the Klatt code with the original excitation generation removed. sf is essentially a time-varying filter which must have an excitation signal as well as time-varying control parameters as input. Thus, sf requires both the ksp control parameter file and an input waveform file containing the synthesis excitation signal. The excitation signals for sf can be generated using the program synsrc which uses the same parameter files (as output by gensyn).
The following is a detailed description of the two major sections
of a par file for gensyn. The names, meanings, default
values, and ranges for all possible parameters of the synthesizer are
given in the parameter table at the end of
this document. For additional details on the meaning of the
synthesizer parameters see Klatt (1980). The following assumes some
familiarity with Klatt's description of the synthesizer.
Configuration Section
The first section of a par file must be the configuration
section. gensyn locates this section by finding the word
CONFIG starting in column 1 of a line in the file. On
subsequent lines of the configuration section, you may specify default
settings for any of the 39 control parameters.
The syntax to be used in setting a parameter value is simply:
parameter_name = value
where parameter_name is one of the names listed in the control
parameter table and value is a numeric value within the range
acceptable for that parameter.
Note that all 39 parameters have default settings imposed by the synthesis software, and that values entered in the configuration section are needed only if you wish to change a default setting shown in the parameter table.
In addition to setting default values within the configuration
section, you can specify that a parameter will be constant or variable
over time. By default, gensyn will assume that all parameters
are constants, i.e., that they will not change their values from one
time-frame to the next during synthesis. Consequently, any parameter
that you do want to vary over time (like formant frequency, or
amplitude) must be declared a variable in the configuration section.
To declare a parameter variable use the format:
parameter_name = VAR
Finally, the configuration section must end with the word ENDCONFIG starting in column 1 of a line.
For example, here is a short CONFIG section:
CONFIG
F0=100
NWS=40; SR=8000
F1=VAR; F2=VAR; F3=VAR
AV=VAR
ENDCONFIG
Note that the format is fairly simple. One or more parameter settings can be specified on a line. If more than one is given on a line, semicolons are used to separate the individual parameter=value pairs. Order is not significant.
Additionally, gensyn allows comments to be placed within the parameter file. A comment is any text following an exclamation mark (!), or text following an asterisk in column 1. This means that comments preceded by an exclamation mark can follow parameter specifications on the same line, while comments preceded by an asterisk must be placed on lines by themselves since the asterisk will be the first character in the line.
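The configuration rules described so far (keywords in column 1, semicolon-separated parameter=value pairs, VAR declarations, and the two comment styles) can be illustrated with a short Python sketch. This is not gensyn's own code: the names strip_comment and parse_config and the returned data structures are hypothetical, and the sketch assumes well-formed input, since gensyn's handling of malformed lines is not described here.

```python
def strip_comment(line):
    """Drop a column-1 '*' comment line or a trailing '!' comment."""
    if line.startswith("*"):
        return ""
    return line.split("!", 1)[0].strip()


def parse_config(text):
    """Return (defaults, variables) read from CONFIG ... ENDCONFIG.

    Hypothetical helper for illustration; assumes every specification
    has the form name=value.
    """
    defaults = {}      # parameter name -> numeric default value
    variables = set()  # parameter names declared with VAR
    in_config = False
    for raw in text.splitlines():
        # Keywords must start in column 1, so test the raw line.
        if raw.startswith("ENDCONFIG"):
            break
        if raw.startswith("CONFIG"):
            in_config = True
            continue
        if not in_config:
            continue
        for spec in strip_comment(raw).split(";"):
            if not spec.strip():
                continue
            name, value = (part.strip() for part in spec.split("=", 1))
            if value == "VAR":
                variables.add(name)
            else:
                defaults[name] = float(value)
    return defaults, variables
```

Keeping the default values and the VAR declarations in separate containers allows a parameter to have both a changed default and a VAR declaration, as F0 does in the example file later in this document.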
Following the ENDCONFIG statement, gensyn expects a synthesis script consisting of two or more time-frame specifications. Each time frame starts with a TIME statement (either absolute, as TIME=nnn, or relative, as TIME+nnn, where nnn is the time in msec) and ends at the next TIME statement or at an END statement (indicating the end of the synthesis parameters). Within a time frame, you may specify a new value for any variable parameter.
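The difference between the two TIME forms can be sketched as follows. The text above does not say what a relative TIME+nnn is measured from; this sketch assumes it is an offset from the time of the preceding frame, and the function name resolve_times is hypothetical.

```python
def resolve_times(statements):
    """Convert TIME=nnn / TIME+nnn statements to absolute times in msec.

    Assumption: TIME+nnn is relative to the preceding frame's time.
    """
    times = []
    current = 0.0
    for stmt in statements:
        stmt = stmt.replace(" ", "")  # spacing is unimportant
        if stmt.startswith("TIME="):
            current = float(stmt[5:])
        elif stmt.startswith("TIME+"):
            current += float(stmt[5:])
        else:
            raise ValueError("not a TIME statement: " + stmt)
        times.append(current)
    return times
```

Under that assumption, the sequence TIME=0, TIME+30, TIME+20, TIME=250 would describe the same frame times as TIME=0, TIME=30, TIME=50, TIME=250.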
Although you can specify values for every time-variable parameter at every possible time frame, it is not generally necessary to do so. gensyn will use linear interpolation to fill in parameter values if they are unspecified for any given time frame. This means that the values of parameters need only be specified at inflection points in their trajectories. For instance, an F2 transition may start at TIME=0 and end at TIME=40, while F1 may start at TIME=15 and end at TIME=40. In setting this up, it would not be necessary to specify a value for F2 at TIME=15, even though F1 (and possibly other parameters) must be specified at TIME=15.
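The F2 case above can be worked through with a minimal linear-interpolation sketch. The function name interpolate and the F2 values (1500 Hz at TIME=0, 1700 Hz at TIME=40) are illustrative assumptions, not taken from gensyn; the sketch also assumes strictly increasing times and does not model what gensyn does before the first or after the last specified value.

```python
def interpolate(breakpoints, t):
    """Linearly interpolate a parameter value at time t (msec).

    breakpoints: list of (time_msec, value) pairs, sorted by strictly
    increasing time. Illustrative only.
    """
    for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("time outside the specified breakpoints")


# Hypothetical F2 trajectory: 1500 Hz at TIME=0, 1700 Hz at TIME=40.
f2_breakpoints = [(0, 1500), (40, 1700)]
f2_at_15 = interpolate(f2_breakpoints, 15)  # 1500 + 200 * 15 / 40 = 1575.0
```

With these values, the F2 used at TIME=15 is 1575 Hz even though the script never states it.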
Formatting rules for the synthesis script are the same as those for
the configuration section: semicolons must separate specifications
placed on a single line, but spacing is unimportant, and the order of
parameters within a time frame is unimportant (except that a time
frame must start with a TIME statement). An asterisk or exclamation
mark may be used to indicate comments. See the example below for more information.
Limitations
At present, the most serious limitation in gensyn is that, because of
the way parameters are interpolated, all computation must be buffered
in memory. There is thus a maximum length (in number of time frames)
for files created using gensyn. The actual limit is a function of the
amount of available memory and of the number of variable parameters.
The fewer the number of variable parameters, the larger the number of
time frames that can be specified.
The current PC version of gensyn buffers frame data in an array of length 16000. The number of time frames which can be handled is (16000 / NVAR), where NVAR is the number of variable parameters.
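As a quick check of this limit, the arithmetic can be written out in Python. The function name max_time_frames is hypothetical, and rounding down to a whole number of frames is an assumption, since the text gives only the ratio 16000 / NVAR.

```python
BUFFER_LENGTH = 16000  # array length stated for the PC version


def max_time_frames(nvar):
    """Largest number of time frames for nvar variable parameters.

    Assumption: the quotient is rounded down to a whole frame count.
    """
    if nvar < 1:
        raise ValueError("at least one variable parameter is required")
    return BUFFER_LENGTH // nvar


frames_for_five = max_time_frames(5)    # 16000 // 5 = 3200
frames_for_seven = max_time_frames(7)   # 16000 // 7 = 2285
```

For instance, a file that declares five variable parameters would be limited to 3200 time frames under this reading.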
* EXAMPLE.PAR
* This is an example input file for generation of Klatt synthesis parameters
* using the GENSYN program. This file produces something like /bae/.
*
* To try:
* >GENSYN EXAMPLE EXAMPLE
* >KLATLK EXAMPLE EXAMPLE
* Peak signal level: -1.8dB
* >
* Alternatively, instead of running KLATLK, try:
* >SYNSRC EXAMPLE EXSRC
* ?SYNSRC-I-Peak signal level: -7.62
* >SF EXAMPLE,EXSRC EXAMPLE
* ?SF-I-Peak signal level: -5.3
*
CONFIG !Begin Configuration section
*
*-Initial values
*
GAI=70; F0=110 !All other defaults are ok
*
*-Declare these parameters variable
*
F1=VAR; F2=VAR; F3=VAR; F0=VAR; AV=VAR
*
ENDCONFIG
*
*-----Frame data (Indicate changes to any parameters over time)
*
TIME=0
AV=80; F1=500; F2=1500; F3=2250; F0=120
*
*-Values at end of 30 msec transition
*
TIME=30
AV=80; F1=650; F2=1700; F3=2500
*
*-A bit more change out to 50 msec (but only for F2)
*
TIME=50; F2=1775
*
*-Now formants will be constant to end of the syllable
*-Show end of signal at 250 msec with F0 dropping from previous
*-designation at TIME=50
*
TIME=250; F0=90
END
Name | Meaning | Default | Min | Max |
AV | Amplitude of voicing | 0 | 0 | 80 |
AF | Amplitude of Frication | 0 | 0 | 80 |
AH | Amplitude of Aspiration | 0 | 0 | 80 |
AVS | Amplitude of Sinusoidal voicing | 0 | 0 | 80 |
F0 | Fundamental Frequency | 0 | 0 | 500 |
F1 | First Formant | 450 | 150 | 900 |
F2 | Second Formant | 1450 | 500 | 2500 |
F3 | Third Formant | 2450 | 1300 | 3500 |
F4 | Fourth Formant | 3300 | 2500 | 4500 |
FNZ | Frequency of Nasal Zero | 250 | 200 | 700 |
AN | Amplitude of Nasal formant | 0 | 0 | 80 |
A1 | Amplitude of F1 (Parallel only) | 0 | 0 | 80 |
A2 | Amplitude of F2 (Parallel only) | 0 | 0 | 80 |
A3 | Amplitude of F3 (Parallel only) | 0 | 0 | 80 |
A4 | Amplitude of F4 (Parallel only) | 0 | 0 | 80 |
A5 | Amplitude of F5 (Parallel only) | 0 | 0 | 80 |
A6 | Amplitude of F6 (Parallel only) | 0 | 0 | 80 |
AB | Ampl of Cascade/Parallel Bypass | 0 | 0 | 80 |
B1 | Bandwidth of F1 | 50 | 40 | 500 |
B2 | Bandwidth of F2 | 70 | 40 | 500 |
B3 | Bandwidth of F3 | 110 | 40 | 500 |
SW | Parallel/Cascade switch* | 0 | 0 | 2 |
FGP | Frequency of Glottal Pole | 0 | 0 | 600 |
BGP | Bandwidth of Glottal Pole | 100 | 100 | 2000 |
FGZ | Frequency of Glottal Zero | 1500 | 0 | 5000 |
BGZ | Bandwidth of Glottal Zero | 6000 | 100 | 9000 |
B4 | Bandwidth of F4 | 250 | 100 | 500 |
F5 | Fifth Formant Frequency | 3850 | 3500 | 4900 |
B5 | Bandwidth of F5 | 200 | 150 | 700 |
F6 | Sixth Formant Frequency | 4900 | 4000 | 4999 |
B6 | Bandwidth of F6 | 1000 | 200 | 2000 |
FNP | Frequency of Nasal Pole | 250 | 200 | 500 |
BNP | Bandwidth of Nasal Pole | 100 | 50 | 500 |
BNZ | Bandwidth of Nasal Zero | 100 | 50 | 500 |
FRA | Second Glottal resonator bandwidth | 200 | 100 | 1000 |
SR | Sampling rate | 10000 | 5000 | 20000 |
NWS | Number of samples per frame | 50 | 1 | 200 |
GAI | Overall Gain control | 48 | 0 | 80 |
NFC | Number of cascaded formants | 5 | 4 | 6 |
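The Min and Max columns can be used to check a set of values before running gensyn. The sketch below copies only a few rows of the table; the names RANGES and out_of_range are hypothetical, and how gensyn itself responds to an out-of-range value is not described in this document.

```python
# (Min, Max) pairs copied from a subset of the parameter table above.
RANGES = {
    "AV": (0, 80),
    "F0": (0, 500),
    "F1": (150, 900),
    "F2": (500, 2500),
    "F3": (1300, 3500),
    "GAI": (0, 80),
}


def out_of_range(settings):
    """Return (name, value, min, max) for each value outside its range.

    Parameters missing from the RANGES subset are skipped, not flagged.
    """
    problems = []
    for name, value in settings.items():
        if name not in RANGES:
            continue
        lo, hi = RANGES[name]
        if not lo <= value <= hi:
            problems.append((name, value, lo, hi))
    return problems
```

Applied to the values in the example file (for instance F1=650, F2=1775, GAI=70), the check reports no problems, whereas F1=100 would be flagged as below the minimum of 150.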