gensyn accepts input from a data file which you create with a text editor. This file contains instructions in a readable format for the synthesis of a speech waveform file based on Klatt's (1980) speech synthesis program. gensyn creates, as output, an unformatted binary file that is the input file to the klatlk, sf, and synsrc programs. By convention, the input file to gensyn has the extension .par, and the output file from gensyn (input to klatlk etc.) has the extension .ksp. Hereafter, these files will be referred to as par and ksp files respectively. The contents of the input par file for gensyn specify, first, the general synthesizer configuration (e.g., cascade/parallel, sampling rate, number of samples per frame, etc.). Second, following configuration settings, the par file contains a listing of parameter values at specified points in time. These are the values of the 39 control parameters described in Klatt (1980). While parameter values (e.g., frequencies of formants, amplitude of voicing, fundamental frequency, etc.) can be specified for every time-frame of the synthesis, it is not generally necessary to do so since gensyn will use linear interpolation to "fill in" all frames between non-adjacent time frames. An example par file is given later in this document.
As a side note, the synthesis programs klatlk and sf implement (with some modifications) the code for a software cascade/parallel synthesizer (Klatt, 1980). The program klatlk is essentially the 1980 Klatt code; its only substantial modification is the ability to disable automatic formant amplitude adjustments in the all-parallel configuration, a capability that is sometimes useful in psychoacoustic experiments. The sf program is a further modification of the Klatt code with the original excitation generation removed. It is essentially a time-varying filter, so it requires two inputs: the ksp control parameter file and an input waveform file containing the synthesis excitation signal. The excitation signal for sf can be generated with the program synsrc, which reads the same ksp parameter file produced by gensyn.
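To make "time-varying filter" concrete, here is a minimal Python sketch of one second-order digital resonator of the kind described in Klatt (1980), with its frequency and bandwidth updated at each frame boundary (every NWS samples) while the filter state carries over. This is an illustration, not the sf source: the function names are hypothetical, the sampling rate is passed in explicitly, and sf itself chains several such resonators (and other filters) according to the synthesizer configuration.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, sample_rate_hz):
    """Coefficients A, B, C of a Klatt (1980) second-order digital resonator."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def time_varying_resonator(excitation, freqs, bws, samples_per_frame, sample_rate_hz):
    """Filter `excitation` through one resonator whose F and BW change per frame.

    freqs[k] and bws[k] are the values for frame k, which covers samples
    k*samples_per_frame .. (k+1)*samples_per_frame - 1. If the excitation is
    longer than the parameter tracks, the last frame's values are reused.
    """
    out = []
    y1 = y2 = 0.0  # y[n-1] and y[n-2]; the state is NOT reset between frames
    for start in range(0, len(excitation), samples_per_frame):
        k = min(start // samples_per_frame, len(freqs) - 1)
        a, b, c = resonator_coeffs(freqs[k], bws[k], sample_rate_hz)
        for x in excitation[start:start + samples_per_frame]:
            y = a * x + b * y1 + c * y2  # y[n] = A x[n] + B y[n-1] + C y[n-2]
            out.append(y)
            y1, y2 = y, y1
    return out
```

Because A + B + C = 1, a constant excitation passed through, for example, a 500-Hz resonator with a 60-Hz bandwidth settles to the input level once the transient has died away.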
The following is a detailed description of the two major sections of a par file for gensyn. The names, meanings, default values, and ranges for all possible parameters of the synthesizer are given in the parameter table at the end of this document. For additional details on the meaning of the synthesizer parameters see Klatt (1980). The following assumes some familiarity with Klatt's description of the synthesizer.
The first section of a par file must be the configuration section.
gensyn locates this section by finding the word "CONFIG" starting in column 1 of a line in the file. On subsequent lines of the configuration section, you may specify default settings for any of the 39 control parameters. The syntax for setting a parameter value is simply:

    parameter_name=value

where parameter_name is one of the names listed in the control parameter table and value is a numeric value within the range acceptable for that parameter.
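A minimal sketch of how one parameter_name=value specification could be parsed and checked against the allowed range. It is written in Python for illustration only; PARAM_TABLE is a hypothetical excerpt holding a few (default, min, max) rows copied from the control parameter table, and the error handling is an assumption rather than gensyn's actual behavior. VAR declarations are a separate case and are not handled here.

```python
# Hypothetical excerpt of the control parameter table: name -> (default, min, max)
PARAM_TABLE = {
    "AV": (0, 0, 80),
    "B1": (50, 40, 500),
    "NWS": (50, 1, 200),
    "GAI": (48, 0, 80),
}


def parse_setting(spec):
    """Parse one 'parameter_name=value' specification and range-check it."""
    name, sep, raw = spec.partition("=")
    name, raw = name.strip(), raw.strip()
    if not sep or name not in PARAM_TABLE:
        raise ValueError(f"unknown or malformed specification: {spec!r}")
    value = float(raw)  # raises ValueError for non-numeric text such as VAR
    _, lo, hi = PARAM_TABLE[name]
    if not lo <= value <= hi:
        raise ValueError(f"{name}={raw} is outside the range {lo}..{hi}")
    return name, value
```

For example, parse_setting("GAI=70") returns ("GAI", 70.0), while parse_setting("B1=30") is rejected because the minimum for B1 is 40.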
Note that all 39 parameters have default settings imposed by the synthesis software, and that values entered in the configuration section are only needed if you wish to change the default setting.
In addition to setting default values within the configuration section, you can specify whether a parameter will be constant or variable over time. By default, gensyn assumes that all parameters are constants, i.e., that their values do not change from one time frame to the next during synthesis. Consequently, any parameter that you do want to vary over time (such as a formant frequency or an amplitude) must be declared variable in the configuration section. To declare a parameter variable, use the format:

    parameter_name=VAR

Finally, the configuration section must end with the word ENDCONFIG starting in column 1 of a line.
Within the configuration section, you may format parameter specifications rather freely. One or more specifications may be given per line; when several are placed on a single line, they must be separated by semicolons. There are no constraints on the order in which parameter specifications are listed.
Additionally, gensyn allows comments to be placed within the parameter file to help you remember what you did and why you did it. A comment is any text following an exclamation mark (!), or any line with an asterisk (*) in column 1. This means that comments preceded by an exclamation mark can follow parameter specifications on the same line, while comments preceded by an asterisk must be placed on lines by themselves, since the asterisk must be the first character in the line.
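The configuration rules above (CONFIG and ENDCONFIG in column 1, semicolon-separated specifications, comments introduced by ! or by * in column 1, and =VAR declarations) can be sketched as a small parser. This is a hypothetical Python illustration, not gensyn's code; it assumes that nothing but a comment follows the CONFIG keyword on its line and that every non-VAR value is a plain number.

```python
def strip_comment(line):
    """Remove comments: '*' in column 1 comments out the whole line,
    and '!' comments out the rest of the line."""
    if line.startswith("*"):
        return ""
    return line.split("!", 1)[0]


def parse_config(lines):
    """Return (defaults, variables) from the CONFIG ... ENDCONFIG section.

    defaults maps parameter names to numeric values; variables is the set of
    names declared with =VAR.
    """
    defaults, variables = {}, set()
    inside = False
    for line in lines:
        if not inside:
            # CONFIG must start in column 1; the rest of that line is ignored here.
            inside = line.startswith("CONFIG")
            continue
        if line.startswith("ENDCONFIG"):
            return defaults, variables
        for spec in strip_comment(line).split(";"):
            if not spec.strip():
                continue  # empty piece, e.g. from a comment-only line
            name, _, value = (part.strip() for part in spec.partition("="))
            if value == "VAR":
                variables.add(name)
            else:
                defaults[name] = float(value)
    raise ValueError("no complete CONFIG ... ENDCONFIG section found")
```

Run on the configuration section of the example file later in this document, this yields the defaults GAI=70 and F0=110 and the variable set {F1, F2, F3, F0, AV}.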
Following the ENDCONFIG statement, gensyn expects two or more time-frame specifications. Each time frame begins with a TIME=nnn statement (where nnn is the time in msec) and ends at the next TIME=nnn statement or at the END statement that marks the end of the synthesis parameters. Within a time frame, you may give a new value for any variable parameter.
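A sketch, under the same caveats, of splitting the frame-data section into time frames. parse_frames is a hypothetical helper that takes the lines following ENDCONFIG and returns a mapping from each TIME value (msec) to the parameter values given in that frame; it also rejects files that lack a TIME=0 frame plus at least one later frame.

```python
def parse_frames(lines):
    """Map each TIME value (msec) to the parameter values set in that frame.

    `lines` are the lines after ENDCONFIG. A frame opens at TIME=nnn and
    runs until the next TIME=nnn or END.
    """
    frames = {}
    current = None
    for line in lines:
        if line.startswith("*"):  # asterisk in column 1: whole-line comment
            continue
        for spec in line.split("!", 1)[0].split(";"):  # drop '!' comment, split specs
            spec = spec.strip()
            if not spec:
                continue
            if spec == "END":
                if 0 not in frames or len(frames) < 2:
                    raise ValueError("need a TIME=0 frame and at least one later frame")
                return frames
            name, _, value = (part.strip() for part in spec.partition("="))
            if name == "TIME":
                current = frames.setdefault(float(value), {})
            elif current is None:
                raise ValueError(f"{spec!r} appears before the first TIME=")
            else:
                current[name] = float(value)
    raise ValueError("frame data has no END statement")
```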
gensyn uses linear interpolation to generate parameter values for time frames that you do not specify in the file; however, you must specify at least the first (TIME=0) and last time frames of the synthesis. Each parameter is interpolated independently of the others, so it is unnecessary to specify the value of every variable parameter at every time where some other parameter is specified. For instance, an F2 transition may start at TIME=0 and end at TIME=40, while F1 may start at TIME=15 and end at TIME=40. In setting this up, it would not be necessary to specify a value for F2 at TIME=15, even though F1 (and possibly other parameters) must be specified at TIME=15.
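The independent, per-parameter interpolation can be sketched as follows. Each variable parameter keeps its own set of (time, value) breakpoints, and a frame value is found by linear interpolation between the two breakpoints that bracket it. Two details are assumptions: values are held constant outside a parameter's first and last breakpoints (consistent with the formants staying constant to the end of the example file), and the 5-msec frame period corresponds to NWS=50 at an assumed 10-kHz sampling rate. The function names are hypothetical.

```python
from bisect import bisect_right


def interpolate_track(points, t):
    """Linearly interpolate one parameter's {time: value} breakpoints at time t.

    Outside the first/last breakpoint the nearest value is held (assumption).
    """
    times = sorted(points)
    if t <= times[0]:
        return points[times[0]]
    if t >= times[-1]:
        return points[times[-1]]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = points[t0], points[t1]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def tracks_from_frames(frames):
    """Regroup {time: {param: value}} into one {time: value} track per parameter."""
    tracks = {}
    for time, specs in frames.items():
        for name, value in specs.items():
            tracks.setdefault(name, {})[time] = value
    return tracks


def render(frames, end_ms, frame_ms=5.0):
    """Interpolated value of every parameter at frame times 0, frame_ms, ..., end_ms.

    frame_ms=5.0 assumes NWS=50 samples at a 10-kHz sampling rate.
    """
    tracks = tracks_from_frames(frames)
    n_frames = int(end_ms // frame_ms) + 1
    return {
        name: [interpolate_track(points, k * frame_ms) for k in range(n_frames)]
        for name, points in tracks.items()
    }
```

With the example file's F2 breakpoints (1700 at TIME=30, 1775 at TIME=50), the frame at 40 msec gets 1737.5, and F0 at 125 msec lies halfway between 120 and 90, i.e., 105. F2 never needs a value at TIME=250; its track simply has no breakpoint there.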
Formatting rules for the frame data section of the file are the same as those for the configuration section: semicolons must separate specifications placed on a single line, but spacing is generally ignored, and the order of parameters within a time frame is unimportant. Again, an asterisk in column 1 or an exclamation mark may be used to indicate comments.
The current PC version of gensyn buffers frame data in an array of length 16000. The number of time frames which can be handled is (16000 / NVAR), where NVAR is the number of variable parameters.
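The capacity rule is simple integer arithmetic. The small sketch below assumes that the division truncates and that a frame lasts NWS samples at a 10-kHz sampling rate (the sampling rate is not listed in the parameter table, so 10000 Hz is an assumption). The example par file declares five variable parameters (F1, F2, F3, F0, AV), so it can hold 16000 / 5 = 3200 frames, or 16 seconds of 5-msec frames.

```python
BUFFER_LENGTH = 16000  # length of the frame-data array in the PC version of gensyn


def max_frames(n_variable):
    """Maximum number of time frames, 16000 / NVAR (integer division assumed)."""
    if n_variable < 1:
        raise ValueError("need at least one variable parameter")
    return BUFFER_LENGTH // n_variable


def max_duration_ms(n_variable, samples_per_frame=50, sample_rate_hz=10000):
    """Longest synthesis that fits, with a frame duration of NWS / SR seconds."""
    frame_ms = 1000.0 * samples_per_frame / sample_rate_hz
    return max_frames(n_variable) * frame_ms
```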
* EXAMPLE.PAR
* This is an example input file for generation of Klatt synthesis parameters
* using the GENSYN program. This file produces something like /bae/.
*
* To try:
* >GENSYN EXAMPLE EXAMPLE
* >KLATLK EXAMPLE EXAMPLE
* Peak signal level: -1.8dB
* >
* Alternatively, instead of running KLATLK, try:
* >SYNSRC EXAMPLE EXSRC
* ?SYNSRC-I-Peak signal level: -7.62
* >SF EXAMPLE,EXSRC EXAMPLE
* ?SF-I-Peak signal level: -5.3
*
CONFIG !Begin Configuration section
*
*-Initial values
*
GAI=70; F0=110 !All other defaults are ok
*
*-Declare these parameters variable
*
F1=VAR; F2=VAR; F3=VAR; F0=VAR; AV=VAR
*
ENDCONFIG
*
*-----Frame data (Indicate changes to any parameters over time)
*
TIME=0
AV=80; F1=500; F2=1500; F3=2250; F0=120
*
*-Values at end of 30 msec transition
*
TIME=30
AV=80; F1=650; F2=1700; F3=2500
*
*-A bit more change out to 50 msec (but only for F2)
*
TIME=50; F2=1775
*
*-Now formants will be constant to end of the syllable
*-Show end of signal at 250 msec with F0 dropping from previous
*-designation at TIME=50
*
TIME=250; F0=90
END
|Name||Description||Default||Min||Max|
|AV||Amplitude of voicing||0||0||80|
|AF||Amplitude of Frication||0||0||80|
|AH||Amplitude of Aspiration||0||0||80|
|AVS||Amplitude of Sinusoidal voicing||0||0||80|
|FNZ||Frequency of Nasal Zero||250||200||700|
|AN||Amplitude of Nasal formant||0||0||80|
|A1||Amplitude of F1 (Parallel only)||0||0||80|
|A2||Amplitude of F2 (Parallel only)||0||0||80|
|A3||Amplitude of F3 (Parallel only)||0||0||80|
|A4||Amplitude of F4 (Parallel only)||0||0||80|
|A5||Amplitude of F5 (Parallel only)||0||0||80|
|A6||Amplitude of F6 (Parallel only)||0||0||80|
|AB||Amplitude of Cascade/Parallel Bypass||0||0||80|
|B1||Bandwidth of F1||50||40||500|
|B2||Bandwidth of F2||70||40||500|
|B3||Bandwidth of F3||110||40||500|
|FGP||Frequency of Glottal Pole||0||0||600|
|BGP||Bandwidth of Glottal Pole||100||100||2000|
|FGZ||Frequency of Glottal Zero||1500||0||5000|
|BGZ||Bandwidth of Glottal Zero||6000||100||9000|
|B4||Bandwidth of F4||250||100||500|
|F5||Fifth Formant Frequency||3850||3500||4900|
|B5||Bandwidth of F5||200||150||700|
|F6||Sixth Formant Frequency||4900||4000||4999|
|B6||Bandwidth of F6||1000||200||2000|
|FNP||Frequency of Nasal Pole||250||200||500|
|BNP||Bandwidth of Nasal Pole||100||50||500|
|BNZ||Bandwidth of Nasal Zero||100||50||500|
|FRA||Second Glottal resonator bandwidth||200||100||1000|
|NWS||Number of samples per frame||50||1||200|
|GAI||Overall Gain control||48||0||80|
|NFC||Number of cascaded formants||5||4||6|