Windows EDW (WEDW)

Differences from Unix/DOS EDW

Windows EDW (WEDW) is a fundamentally new program which attempts to provide similar functionality to the Unix/DOS version (EDW), but with a very different user interface. WEDW functions in a way that is most similar to EDW's "mark" mode, i.e., a cursor is always present and single-key segment marking and window positioning functions are always active. WEDW replaces EDW's command line interface with menu items for the selection of optional settings, extended mouse functions for segment label manipulation, and dialog boxes where text input is necessary. WEDW retains some of the appearance of EDW in that a waveform display region is always present while spectrogram and pitch marking windows can be toggled on and off as desired. Both EDW and WEDW read and write waveforms in an extended RIFF (Microsoft .WAV) format that includes waveform segment definitions and both are also able to read an older .WAV format that was the original format used by EDW. WEDW is still under development with several planned features presently inactive.

Appearance

This picture shows the screen appearance of WEDW when a waveform and spectrogram window are open. A waveform display window is always present; a spectrogram window (as shown here) and a pitch marker window (not shown) may be toggled on and off as desired. The overall WEDW window can be resized by clicking and dragging a window corner with the mouse, as with most windows. The horizontal scale of all windows and the vertical scale of the waveform display window are adjusted to the new overall window size. However, the vertical size of the spectrogram window is a constant number of pixels determined by a settable parameter.

In addition to the standard window controls, features to note are:

The waveform file being viewed is displayed in the window title bar along with an indication of the file format.
A menu bar contains pull-down menus for File access, Editing, display Options, Play, and (for multi-channel files) Channel settings.
A status/toolbar contains readouts of waveform and spectrogram information and several buttons to control the display format and play the viewed signal.

All these features are described below in more detail.

Keyboard editing functions

Any time the mouse is within one of the graphic display windows, the following keys are active (case is ignored):

B - The current vertical cursor location marks the beginning of a segment. A dialog box appears to accept the segment name.
E - The current vertical cursor location marks the end of a segment. A dialog box appears to accept the segment name.

Note: when either the B or E marker of a new segment is first entered, both markers appear at the same location. WEDW allows one of the marks to be changed to a new location to generate a segment rather than a location marker. Thereafter, if either the begin or end marker is to be moved, WEDW will request confirmation before allowing the change.

L - The waveform is repositioned within the display window so that the sample at the current vertical cursor location is moved to the left border of the display. The time-extent of the window is unchanged.
R - The waveform is repositioned within the display window so that the sample at the current vertical cursor location is moved to the right border of the display. The time-extent of the window is unchanged.
^L - (CTRL+L) The waveform is repositioned within the display window so that the sample at the current vertical cursor location is moved to the left border of the display. The waveform sample located at the right border of the display is unchanged and time-extent is reduced to "stretch" the waveform to fit within the display window.
^R - (CTRL+R) The waveform is repositioned within the display window so that the sample at the current vertical cursor location is moved to the right border of the display. The waveform sample located at the left border of the display is unchanged and time-extent is reduced to "stretch" the waveform to fit within the display window.
V - Set the selected pitch marker to indicate a voiced pitch event.
U - Set the selected pitch marker to indicate an unvoiced event.

Note: V and U function only when the pitch marker to modify has been selected by clicking the left mouse button on the marker.
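
The four window-positioning keys described above (L, R, CTRL+L, and CTRL+R) can be summarized as operations on the begin and end times of the display window. The following Python sketch is only an illustration of that description, not WEDW's actual code; the function name and the "^L"/"^R" key strings are hypothetical, and clamping to the limits of the file is not handled.

```python
def reposition(start_ms, end_ms, cursor_ms, key):
    """Return the new (start, end) of the display window in msec.

    Hypothetical helper: key is one of "L", "R", "^L", "^R".
    """
    extent = end_ms - start_ms
    if key == "L":    # cursor sample moves to the left border, extent kept
        return cursor_ms, cursor_ms + extent
    if key == "R":    # cursor sample moves to the right border, extent kept
        return cursor_ms - extent, cursor_ms
    if key == "^L":   # right border fixed, extent reduced
        return cursor_ms, end_ms
    if key == "^R":   # left border fixed, extent reduced
        return start_ms, cursor_ms
    raise ValueError("unknown key: " + key)
```

For a window spanning 100 to 300 msec with the cursor at 150 msec, L yields a window from 150 to 350 msec, while CTRL+L yields a window from 150 to 300 msec.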

Mouse functions

In addition to the standard use of the mouse with pull-down and pop-up menus, WEDW uses mouse events as follows:

Pitch Marker Window -

The pitch marker window as illustrated in this picture is a small window above the waveform display window in which markers are placed to indicate pitch and other events associated with the waveform. Markers that are associated with pitch periods in voiced regions of the signal have a small arrowhead at their base and are called voiced markers. Other markers may be placed to divide voiceless regions of the signal into smaller epochs of a size similar to a pitch period. These are termed voiceless markers.

When the pitch display is first enabled, WEDW searches the current working directory for a file that has the same base name as the waveform file being viewed but with the extension PPS. If a PPS file is found, it is read and information from the file is used to build the pitch marker display. However, WEDW is also able to estimate pitch marker information directly from the waveform (a process called pitch tracking) and use its estimates for the pitch marker display. The Pitch Tracking and Settings section below provides more details on using pitch tracking in WEDW.

When the pitch marker window is visible, the following mouse actions are used to edit the information:

Left_click on a marker to select it. The selected marker has a broader than usual top, as for example, the second voiced marker in the display above. Pressing the U key will set a selected marker to unvoiced; pressing the V key will set a selected marker to voiced.
Left_double_click on a marker to delete it.
Left_double_click on white space to insert a marker.
Left_press and drag to move a marker to a new location.

Waveform/Spectrogram Windows -

Left_click to place a marker for subsequent Paste/Insert.
Left_press and drag to select a region (it will be bounded by dashed lines when the mouse button is released). This region becomes the object of CUT/COPY/+/Play selections or buttons.
Left_press and drag on a segment label to drag the segment boundary to a new location. By default, any segment boundaries which lie at the same sample location as the selected label will be moved as well. (see Label Grouping below)
Left_double_click on a segment label to bring up a dialog box which allows the segment to be deleted or the segment name to be changed.
CTRL+Left_press and drag to select and move only a single marker from a group of markers assigned to the same sample location. (see Label Grouping below).

Note: At all times, status windows in the toolbar report the time-coordinate of the mouse pointer. When the pointer is in the Pitch or Waveform window, the waveform amplitude is also displayed as an unscaled digital sample value (designated dv units). When the mouse pointer is in the spectrogram window, the status window displays the frequency coordinate corresponding to the vertical pointer location.

F0/RMS Windows -

Left_click at a frequency or amplitude by time coordinate to place a sketch marker (a green X) at that point. A subsequent left_click at a new location within the window will draw a straight line from the location of the previous sketch marker to the present location and the sketch marker will be moved to the present location. This only occurs when Line Drawing mode is enabled in the Edit menu.

Toolbar Buttons

Play - clicking the play button will play the signal in the active channel for either the selected region, the waveform subsumed by the window, or the entire file depending on which mode is selected in the Play menu item. Note that play will always play the signal associated with the active channel whether it is visible in the display or not.
Ch:n - This button always displays the currently active channel number, that is, the channel which will be heard when play is pressed and from which a spectrogram will be computed (if the spectrogram is turned on). Pressing the channel button will pop up a dialog box which allows a different channel to be selected as the active channel. When viewing single channel files, the button always displays Ch:0 and pressing the button will have no effect.
+ - If a selected region is smaller than the present display window size, the + button zooms into that region. Otherwise, the display is zoomed by a factor of two (i.e., the temporal resolution of the display is doubled, so the window shows half as much "real" time). Pressing the Control key while clicking the + button will undo the last zoom (i.e., unzoom).
- - The display is unzoomed by a factor of two. Double clicking the - button will display the entire file in the display window.
Abs - Indicates that cursor time readout and end-time are in "absolute milliseconds" offset from the beginning of the waveform file. Clicking the Abs button will switch it to Rel.
Rel - Indicates that cursor time readout and end-time are in milliseconds relative to the begin time at the left edge of the display. Thus, the end-time is the duration of the current window and the cursor time is its position offset from the left edge of the display. Clicking the Rel button switches it to Abs.

Menu Items

Other items may appear in the menus; the following lists only those which are active.

File:

View... - select a waveform file to view.
Save - save any changes to the current file.
Save As... - save current file under a new name.
Stats... - show waveform file information (sampling rate/number of channels/segment definitions).
Quit - Exit WEDW without saving changes.
Exit - Exit WEDW after saving changes.

Edit:

Cut - Delete the currently selected region, but save it in a paste buffer.
Copy - Copy the selected region to the paste buffer.
Paste - Insert paste buffer contents at selected location.
Select All - As though the entire window was selected with the mouse.
Rename - Change the name of a segment.
Delete - Delete a segment definition (waveform is unchanged).
Insert - Insert the contents of another waveform file into the present file at the selected location. The inserted file also replaces the contents of the paste buffer.
Line Drawing - Selects line drawing mode for modifying F0 and RMS contours.
Modify - Applies changes to F0 and RMS contours.
Global... - Reports duration, average F0, and average RMS amplitude in the selected region and allows these to be altered.
Smooth - Runs a low pass filter over the F0 or RMS data and applies the smoothed contour to the speech.
Invert Active - Inverts the waveform for the active channel.

Options:

Waveform - Select among LINE|BAR|DOT waveform display formats and set scale multiplier for waveform display.
Labels:

Group - assign neighboring label boundaries in a selected region to a single sample location.
Settings... - Bring up dialog box for label settings.

Spectrogram:

Toggle On/Off - toggle spectrogram window.
Settings... - Bring up dialog box for spectrogram settings.

Pitch:

Toggle On/Off - Toggle pitch marker display.
Settings... - Adjust properties for pitch tracker (also reports F0 statistics). (See Pitch Tracking)
Track - Estimate pitch period locations for waveform visible in the window. (See Pitch Tracking)
Clear - Erase all existing pitch marks for waveform visible in the window. (See Pitch Tracking)

F0 Contour:

Toggle On/Off - toggle F0 window.
Settings... - Bring up dialog box for F0 settings.

RMS Contour:

Toggle On/Off - toggle RMS window.
Settings... - Bring up dialog box for RMS settings.

Save State - Save current parameter settings to the wedw.ini file in the current directory. These settings will be in effect whenever WEDW is started in that directory.

Play:

Window - Play button plays window contents (DEFAULT).
Region - Play button plays selected region.
File - Play button plays entire file.

Channel: The Channel menu only appears in the main menu bar when viewing a multi-channel file. By default, all channels of a multi-channel file are simultaneously displayed in the waveform window by vertically subdividing the window into as many regions as there are channels. However, the Channel menu allows any combination of channels to be selected for display. To select a single channel, simply select that channel in the menu; all other selected channels will be deselected. To add a channel to one or more channels already selected, press the control key while the mouse is in the waveform window, then, while still pressing the control key, select the channel to be added from the Channel menu. This awkward arrangement is necessary because at the moment WEDW will not detect the control key press unless the mouse pointer is in the waveform window. The items that appear on the Channel menu are:

All - When checked (the default) all channels of a multi-channel waveform file will be displayed simultaneously within the waveform window.
Channel n - Select channel n for display in the waveform window. There will be N (= the number of channels of data in the file) similar entries in the menu.

Spectrograms

The spectrogram displays speech information in a time-by-frequency-by-amplitude format. The time (X) axis is aligned with the waveform window time axis. The frequency (Y) axis indicates ascending frequency from a low frequency cutoff value (at the bottom of the display) to a high frequency cutoff value at the top of the display. Both these cutoff values can be set in the spectrogram settings dialog box (see below). Finally, sound amplitude at every time-by-frequency coordinate is displayed in shades of gray with white being the lowest amplitude and black the highest amplitude. The lowest and highest amplitude values can be set in the spectrogram settings dialog box, as can the choice of a linear or logarithmic amplitude scale. A logarithmic scale tends to bring out lower amplitude features of the spectrogram better than a linear scale, but a linear scale sometimes brings out formant frequency patterns better, especially in recordings that have substantial background noise.

Spectrogram Settings Dialog Box

Select "settings..." under the Options Spectrogram menu to bring up the dialog box for adjusting spectrogram parameters. Figure 2 shows this dialog box with its default settings as follows:

Screen Height - the vertical extent of the spectrogram window in pixels. The spectrogram drawing is fastest when this number is a power of two and equal to the number of frequency values in the FFT used to compute the spectrogram. The size of the FFT is determined by a combination of the analysis window length (for the spectrogram FFTs, not the display window length) and the waveform sampling rate. The FFT length is the smallest power of two that is at least 64 and greater than or equal to the length of the analysis window in samples. For instance, if the sampling rate is 10 kHz and the window length is longer than 6.4 msec but no longer than 12.8 msec (65 to 128 samples), the FFT length will be 128. When the screen height is not a power of two or not equal to the FFT length, WEDW must interpolate values of the spectrum to the screen pixel locations and this slows down the drawing process.
Window Length - the length of the FFT analysis window in msec. This parameter determines whether the spectrogram will be a narrow band or wideband spectrogram. Specifically, the bandwidth of the spectrogram in Hertz is approximately 2.0 divided by the window length in seconds (equivalently, 2000 divided by the length in msec). Thus, a window length of 6.0 msec corresponds to a bandwidth of about 333 Hz (2/.006 = 333), which is approximately the standard for a wideband spectrogram. For a narrow band spectrogram (45 Hz bandwidth) the window length would need to be about 44 msec; however, values in excess of 25 msec usually produce reasonable narrow band spectrograms.
Preemphasis - a coefficient in the range 0.0 - 1.0 which has the effect of emphasizing high frequency energy and de-emphasizing low frequency energy. Normally speech is heavily weighted to the low frequency end of the spectrum. A coefficient of 1.0 creates a highpass filter that removes all DC energy and has a 6 dB/Octave slope. Values of the preemphasis coefficient closer to zero produce less high frequency preemphasis.
Frequency Range - specifies the frequency coordinate at the bottom (low frequency) and top (high frequency) of the spectrogram. By default, the low frequency cutoff is 0 and the high frequency cutoff is 5000 Hz or one half the sampling rate (the Nyquist frequency), whichever is smaller. You may change these parameters to see more detail of a region within the spectrum; however, the high frequency value cannot exceed the Nyquist frequency.
Amplitude Range - the decibel range from lowest (white) to highest (black) in the spectrogram. All amplitudes below the low value will appear white and all amplitudes above the high value will appear black. Note that these values are in decibels whether or not the Log Magnitude check box is selected.
Log Magnitude - the scaling applied to the amplitude range. When this box is checked, the grayscale for the spectrogram will be assigned in logarithmic steps which has the effect of allowing small differences at low amplitudes to be more obvious in the display. Removing the check from this box will cause amplitude to be displayed in linear grayscale in which most lower amplitude regions of the spectrogram will appear white. On displays that allow only a small number of shades of gray, the linear scale can often provide a more readable spectrogram.
Gray Levels - the number of shades of gray between white and black. Most modern displays will easily support 16, the default. Use as many as you can; it will make no difference to the speed of the drawing but will improve its appearance as the number of shades is increased.
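
The relationships among window length, FFT length, bandwidth, and preemphasis described above can be worked through in a few lines of Python. This is a sketch under stated assumptions, not WEDW's code: the function names are hypothetical, and the preemphasis filter is assumed to be the common first-order difference y[n] = x[n] - a*x[n-1], a form this document does not state but which is consistent with the description (a coefficient of 1.0 removes all DC energy).

```python
import math

def fft_length(sample_rate_hz, window_ms):
    """Smallest power of two that is >= 64 and >= the window length in samples."""
    # round() guards against floating-point noise such as 64.00000000000001
    n_samples = math.ceil(round(sample_rate_hz * window_ms / 1000.0, 6))
    size = 64
    while size < n_samples:
        size *= 2
    return size

def bandwidth_hz(window_ms):
    """Approximate spectrogram bandwidth: 2.0 / window length in seconds."""
    return 2000.0 / window_ms

def window_ms_for_bandwidth(bw_hz):
    """Window length in msec that gives approximately the requested bandwidth."""
    return 2000.0 / bw_hz

def preemphasize(samples, coeff):
    """ASSUMED filter form: y[n] = x[n] - coeff * x[n-1]."""
    out = [samples[0]] if samples else []
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - coeff * prev)
    return out
```

With these definitions, a 10 msec window at 10 kHz gives an FFT length of 128, a 6.0 msec window gives a bandwidth of about 333 Hz, and a 45 Hz bandwidth calls for a window of about 44.4 msec.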

Pitch Tracking and Settings

To perform pitch tracking in WEDW, the pitch window must be toggled on. This enables the Track item in the Pitch submenu of the Options menu. There are two modes of operation for the pitch tracker. For both modes, pitch tracking is only applied to the portion of the waveform that is visible in the waveform window, and pitch marks, if found, will replace all previously defined pitch marks for the visible portion of the waveform. This makes it possible to use different pitch tracking settings for different parts of the waveform if necessary. In the supervised pitch tracking mode, an example pitch period must be located manually by using the mouse to select a region of the waveform corresponding to one pitch period. This example is then used to seed the pitch tracker which will search the portion of the waveform visible in the waveform window in both directions starting at the location of the example pitch period. For the other, unsupervised, mode of pitch tracking, it is not necessary to manually select a seed; WEDW automatically selects a seed period using waveform amplitude and the F0 Mean value given in the Pitch Settings dialog box. Once a seed is selected either manually or automatically, the pitch tracking algorithm (described below) is the same for both pitch tracking modes. A checkbox labeled Require Seed in the Pitch Settings dialog box determines which method WEDW uses.

By default, the Require Seed box is checked indicating that the region markers delimit a single pitch period for use as the seed. When the Track item is selected in the Pitch menu, WEDW checks to see that the selected region corresponds to a period associated with an F0 between the Min and Max F0 values given in the Pitch Settings dialog box. If this condition is not met, WEDW displays an error box and you must either adjust the markers by reselecting the region, or adjust the values for F0 Min or Max in the dialog box then select pitch tracking from the options menu again. Note that the F0 Min and Max values are used both to screen the seed selection, and to report statistics after pitch tracking has completed. Therefore their values will generally change after each call to the pitch tracker and in some cases, especially after a failure in tracking, they may inherit unrealistic values that would need to be adjusted by hand before a new seed will be accepted.

When the Require Seed box is unchecked, the selected region, F0 Min, and F0 Max are all irrelevant; however, the F0 Mean value is used in estimating the duration of the internally generated seed period. Because of this, it is important to set a realistic value in the F0 Mean field before starting unsupervised pitch tracking. As with F0 Min and Max, the Mean value is updated after each call to the pitch tracker to report the mean value obtained for the voiced pitch periods in the waveform. Thus, this field may also need to be reset to a realistic value if the tracker failed to run correctly.

The pitch tracking algorithm in WEDW uses a raised cosine windowed portion of the waveform centered around the onset of the seed period as the initial search template for an adjacent pitch period. The onset is assumed to be a positive-going zero crossing preceding the first (usually strongest) F1 fluctuation in the seed period. The search template is compared to the structure of the waveform within a reasonable range of distances away from the seed period using a correlation statistic. This range is determined by the amount of allowable jitter (period-to-period fluctuations in F0) in successive periods. The location of the subsequent period is taken as the location at which the search template correlates most strongly with the waveform in the region being searched. If this correlation is above a voiced/unvoiced threshold value, it is assumed that another pitch period has been detected and the onset of the new pitch period is windowed and averaged with the previous search template to form a new template that is used in searching for the next pitch period. If the maximum correlation value in the search region is below the voiced/unvoiced threshold, the present location is assumed to be unvoiced and an unvoiced marker is placed at a location corresponding to the average pitch period following the last pitch marker. When an unvoiced region of speech is encountered, the search template is replaced with a windowed inverted sine wave having a period corresponding to the duration of the window. The same algorithm is applied in both directions starting with the seed period to identify all predecessor as well as all successor periods to the seed period.

The pitch settings dialog box allows the parameters of this tracking algorithm to be adjusted to improve the performance of the algorithm with various talkers. The tracking parameters that can be adjusted are displayed in the pitch settings dialog box:

Window - The duration in msec of the cosine window used to select the search template. Values slightly less than the duration of an average pitch period seem to work fairly well.
Jitter Low - Shortest likely adjacent pitch period expressed as a percentage of the current period. The default value of 75% seems to work most of the time.
Jitter High - Longest likely adjacent pitch period expressed as a percentage of the current period. Something around 120% to 125% seems to work fairly well. Note that jitter values are always relative to the duration of the most recently identified pitch period. This makes it possible to set a fairly restricted amount of jitter and still track large variations in F0, as long as F0 changes fairly smoothly (i.e., the change from one period to the next does not exceed the allowed amount of jitter). The disadvantage of having jitter relative to the current pitch period is that errors, once made, tend to compound.
V/UV Threshold - Voiced/Unvoiced threshold in arbitrary units. If this value is 0.0, the pitch tracking algorithm sets a value based on the seed period, otherwise, the value in the edit box is used. Values in the neighborhood of 1000 seem to work reasonably well for speech of moderate amplitude.
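
A single forward search step of the tracking algorithm, using the Window, Jitter, and V/UV Threshold parameters, might be sketched in Python as follows. This is a simplified illustration, not WEDW's implementation. All function names are hypothetical. The correlation statistic is not specified in this document, so a normalized correlation (range -1 to 1) is assumed here, which means the threshold in this sketch is not in WEDW's arbitrary units. Template averaging and the inverted sine template used in unvoiced regions are omitted.

```python
import math

def raised_cosine(n):
    """Raised cosine (Hann) window of n points, n >= 2."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def windowed(signal, center, win):
    """Window-weighted samples centered on `center` (zero outside the signal)."""
    half = len(win) // 2
    out = []
    for i, w in enumerate(win):
        idx = center - half + i
        out.append(w * signal[idx] if 0 <= idx < len(signal) else 0.0)
    return out

def correlation(a, b):
    """ASSUMED statistic: normalized correlation; 0.0 if either input is silent."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def next_period(signal, onset, period, win_len,
                jitter_low=0.75, jitter_high=1.25, threshold=0.5):
    """Locate the period following `onset`; returns (sample_index, voiced)."""
    win = raised_cosine(win_len)
    template = windowed(signal, onset, win)
    best_lag, best_corr = None, -1.0
    # Search only the lags allowed by the jitter limits (relative to `period`).
    for lag in range(int(period * jitter_low), int(period * jitter_high) + 1):
        c = correlation(template, windowed(signal, onset + lag, win))
        if c > best_corr:
            best_lag, best_corr = lag, c
    if best_corr >= threshold:
        return onset + best_lag, True
    # Below threshold: unvoiced marker one average period after the last marker.
    return onset + period, False
```

Because the jitter limits are applied relative to the most recent period, a caller would pass the lag just found as `period` for the next step, which is also why tracking errors tend to compound.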

Label Settings

The Label Settings dialog provides control of the grouping of segment boundaries, the font used to display segment labels, and whether, by default, segment boundaries that share a location move together when a segment is moved by dragging the boundary label with the mouse. Generally, when a segment boundary is moved by dragging its associated label with the mouse, only the selected segment boundary changes. However, as a special case, if more than one segment has a boundary at exactly the same location as the selected boundary, all segment boundaries which share the location are moved together. This is especially useful when, for example, the boundary between adjacent phonemes in a labeled waveform is represented simultaneously by the end marker for the earlier segment and the begin marker for the subsequent segment. Logically, it is the boundary comprising both segment markers that one is probably trying to move.

The grouping feature is provided because it can sometimes be difficult to place two boundary markers at exactly the same sample location. By selecting the Group item in the Labels menu, segment markers which are quite close but not exactly overlapped can be automatically adjusted to overlap. This in turn will ensure that they will normally move together when any boundary is dragged.

Grouping Precision - How close two boundary markers must be (in msec) to be collapsed to a single location when the Labels Group menu item is selected. This parameter works in conjunction with the selected region of the waveform to determine which segment boundaries are to be grouped by assignment to a single sample location.
Font Name - Font used for displaying segment labels. Obviously, this can be a phonetic font if one is available.
Points - Point size for the label font.
Shared Boundaries Move Together - Check box which determines the default behavior of overlapping boundary markers when one is dragged. The status of this checkbox can be inverted by pressing CTRL while dragging a segment label. That is, when the box is checked, CTRL frees a boundary to move alone, and when the checkbox is clear, CTRL causes boundaries to move together.
Symbol Map File - File which specifies a mapping between characters or character strings in label names (which are typically standard ASCII) and a special symbol font such as an IPA font (see below). The file must be in the current directory or in the directory indicated by the EDWPATH environment variable.
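
The effect of the Group item with a given Grouping Precision can be illustrated with a short Python sketch. It is a guess at the behavior under stated assumptions rather than WEDW's code: the function name is hypothetical, boundaries are represented as sample indices, and each group is assumed to collapse onto its earliest boundary (this document does not say which location is chosen).

```python
def group_boundaries(positions, sample_rate_hz, precision_ms, region):
    """Snap boundaries inside `region` that lie within precision_ms of each other.

    positions: boundary locations as sample indices (any order).
    region: (first_sample, last_sample) of the selected region.
    Returns a new list in the same order as `positions`.
    """
    tolerance = precision_ms * sample_rate_hz / 1000.0   # msec -> samples
    lo, hi = region
    result = list(positions)
    inside = sorted((i for i, p in enumerate(positions) if lo <= p <= hi),
                    key=lambda i: positions[i])
    anchor = None
    for i in inside:
        p = positions[i]
        if anchor is not None and p - anchor <= tolerance:
            result[i] = anchor      # ASSUMPTION: collapse onto earliest boundary
        else:
            anchor = p              # start a new group
    return result
```

At a 10 kHz sampling rate with a precision of 0.5 msec (5 samples), boundaries at samples 1000 and 1003 would be collapsed to sample 1000, while a boundary outside the selected region is left alone.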

Symbol Mapping for Segment Labels

WEDW provides a way to display special symbols such as IPA phonetic symbols when a font for the symbols is available. This is done by mapping between standard ASCII letters in segment labels and special symbol codes. The mapping interprets strings of alphabetic characters as tokens which can be replaced by characters or character strings from a specific font. For example, one might map the letter 'x' to the character code for schwa in an IPA font, or the sequence of letters 'ae' to the character code for the joined 'ae' character in an IPA font. The mapping itself is read by WEDW from a user-constructed file which specifies the name of the special symbol font and the mappings from input (ASCII label characters) to output (character codes in the symbol set) using the format:

FontName

<input letter>[<input letter>...] <output code>[,<output code>...]

Table I. A portion of a symbol mapping table for the IPAPhon font.

IPAPhon
p	112
t	116
k	107
dx	228
b	98
d	100
g	103
q	214
m	109
n	110
ng	247
em	109,164
en	110,164
eng	247,164

For example, Table I shows a portion of a symbol mapping table for a font called IPAPhon. In some cases, single characters map to single codes. This is the case for p, t, k among others in Table I. Sometimes, multiple characters map to a single code as for dx and ng in this example. Sometimes, two or more codes are needed to represent a given input sequence as for em, en, and eng in the example.

When symbol mapping is enabled by specifying the name of a mapping file in the Label Settings dialog box, WEDW will break every segment label into one or more string tokens and search the mapping table for a matching token. If a match is found, the token is replaced by the output codes for the token; otherwise, the input token is assumed to correspond exactly to the output sequence. That is, tokens which are not found in the mapping table are displayed (in the symbol font) using the ASCII code of the input character(s). As a result of this strategy, Table I contains a number of single character mappings which are actually unnecessary. In particular, the mappings for p, t, k, b, d, g, m, and n are redundant since the symbol code for these letters is the same as their ASCII code. However, the ASCII letter 'q' maps to the phonetic symbol for a glottal stop, and its code does not correspond to the character code for 'q'; consequently, that entry is not redundant.

WEDW tokenizes segment labels using a very simple set of rules. All adjacent alphabetic characters (i.e., the letters a-z and A-Z) are assumed to be part of a single token and all other characters are treated as token delimiters. With one exception, delimiting characters are displayed in the output without mapping. The exceptional case is the '-' character, which may be used to introduce a diacritic. WEDW allows for the possibility that '-' introduces diacritics, but it makes no assumptions about the diacritics themselves, which must be specified (including the '-') in the mapping table if they are to be mapped. For instance, we use '-n' to mean nasalized and therefore would specify a nasalized schwa with the sequence ax-n. Our mapping table contains the lines:

allowing the sequence ax-n to appear as schwa with a tilde above it in the WEDW mapped display.
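
The mapping file format and the tokenizing rules above can be sketched in Python as follows. This is an illustration under assumptions, not WEDW's parser: the function names are hypothetical, and the handling of a '-' token that is not found in the table (emit the '-' unmapped, then look up the remaining letters) is a guess, since this document does not describe that case.

```python
import re

def parse_symbol_map(text):
    """Parse mapping-file text: font name first, then 'input code[,code...]' lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    font = lines[0]
    table = {}
    for ln in lines[1:]:
        token, codes = ln.split(None, 1)
        table[token] = [int(c) for c in codes.split(",")]
    return font, table

def map_token(token, table):
    """Return output codes for one token; unmapped characters keep their ASCII codes."""
    if token in table:
        return list(table[token])
    if token.startswith("-") and len(token) > 1:
        # ASSUMPTION: unmapped diacritic, so emit '-' and look up the letters alone
        return [ord("-")] + map_token(token[1:], table)
    return [ord(ch) for ch in token]

def map_label(label, table):
    """Tokenize a label (letter runs, optionally led by '-') and map every token."""
    codes = []
    for token in re.findall(r"-?[A-Za-z]+|.", label):
        codes.extend(map_token(token, table))
    return codes
```

Using a few of the Table I entries, the label "eng q" would map to the codes 247, 164, 32, 214, where 32 is the unmapped space delimiter.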

Prosodic Modification

As of September 3, 1997 WEDW provides the capability to modify the prosodic structure of speech. This feature is based on and requires pitch marker information and, if the prosodic modifications are to be successful, the pitch marker information must be accurate. When pitch marker information is available, WEDW is able to display either F0 or RMS amplitude data in place of the spectrogram display. All three of these display types (F0, RMS, and Spectrograms) inherit the same screen height parameters and selecting any one of these will cause the lower display window to appear (if it was not present) or to have its contents replaced by the selected display type. The pitch marker window may be present simultaneously with any of these, but it need not be as long as pitch marker data is available (e.g., in a .PPS file) for the waveform being edited. The following figure shows the F0 contour display enabled for the word "abnormal" produced by a male talker. The displayed contour consists of a series of red and blue line segments. Each line segment is equal in duration to the pitch epoch to which it corresponds. Red segments correspond to voiceless epochs while blue segments correspond to voiced epochs. In the figure a green X (called a sketch marker) indicates the start of a possible edit of the contour as described later.

All data in the F0 and RMS contour displays are linked to the pitch marker data such that changing the location of a pitch marker will immediately change the value of F0 and potentially the RMS value associated with the pitch epoch. Changing the location of a pitch marker does not change the speech waveform. Editing the F0 or RMS contour using the line drawing or other features described below does not immediately change either the waveform or the pitch marker data; however, once the F0/RMS changes have been applied, the speech waveform will be modified and the resulting modifications will then be reflected in the pitch marker data.
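
The link between pitch markers and the F0 contour can be made concrete with a small Python sketch. It assumes, as an illustration only, that markers are stored as (sample index, voiced flag) pairs, that the F0 of an epoch is the sampling rate divided by the epoch length in samples, and that an epoch takes its voicing from the marker that begins it. The function name is hypothetical.

```python
def f0_contour(marks, sample_rate_hz):
    """Build one contour segment per pitch epoch.

    marks: list of (sample_index, voiced) pairs sorted by sample index.
    Returns a list of (start_sample, end_sample, f0_hz, voiced) tuples.
    """
    segments = []
    for (start, voiced), (end, _unused) in zip(marks, marks[1:]):
        f0 = sample_rate_hz / (end - start)   # one period per epoch
        segments.append((start, end, f0, voiced))
    return segments
```

At a 10 kHz sampling rate, moving a marker so that an epoch shrinks from 100 samples to 80 samples raises the displayed F0 of that epoch from 100 Hz to 125 Hz, without touching the waveform.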

Prosodic features of duration, F0, and amplitude can be changed in three distinct ways. Two of these depend on selecting a region of the waveform over which modifications are applied. First, either F0 or RMS contours can be smoothed (i.e., low pass filtered) by selecting a region of the displayed contour and then selecting Smooth in the Edit menu. The second method that applies to a selected region allows additive changes to duration, F0, or amplitude. When a region is selected, the Global item in the Edit menu brings up a dialog box which reports the duration, average F0, and average RMS amplitude for the selected region. Within the dialog box, any of these values can be changed and the changes applied to the selected region. Duration is changed by replicating or deleting pitch periods from the selected region to approximately achieve the desired duration (note that the actual duration after applying the changes will be within one pitch period length of the requested duration). F0 is changed by altering the duration of each pitch period. Generally, when F0 is changed it is also necessary to add or delete pitch periods to maintain approximately the specified duration, and as a result, duration will also change slightly when F0 is changed. RMS amplitude is changed by increasing or decreasing the amplitude of each pitch period (taken separately) to achieve the requested amplitude. Because there is some interaction between adjacent pitch periods, the resulting amplitude changes are also likely to be only approximately those requested. Moreover, when RMS and duration or F0 are changed simultaneously, these changes interact. F0 changes are applied first, and then duration/RMS changes are applied in an attempt to minimize the consequences of the interaction. Still, this can result in deviations from the specified RMS value, especially when the number of pitch periods in the selected region has changed.

The third method for prosodic modification applies only to F0 and RMS contours since it involves changing the shape of the contour by sketching a new contour. For this method, it is first necessary to enable Line Drawing mode (under the Edit menu); thereafter, displayed values of F0 or amplitude are altered using the mouse. Once the desired contour has been drawn, the specified changes can be applied to the speech waveform by selecting the Modify item in the Edit menu. In Line Drawing mode, clicking the left mouse button when the mouse pointer is within the lower window will place a sketch mark (a green X) at the pointer location. Moving the pointer to a new location and pressing the left button again will draw a new F0 or RMS contour by linear interpolation between the present pointer location and the sketch mark, and the sketch mark will then be moved to the present pointer location. Using this method, the desired contour is drawn by piece-wise linear approximation.
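
The piece-wise linear approximation built up by successive clicks can be sketched in Python as follows. This is an illustration only, with a hypothetical function name; it assumes the clicks are made from left to right (strictly increasing time) and that epochs outside the sketched span are left unchanged, which is represented here by None.

```python
def sketch_contour(points, epoch_times):
    """Interpolate a sketched contour at the given epoch times.

    points: [(time_ms, value), ...] in click order, with strictly increasing time.
    epoch_times: times (msec) at which contour values are needed.
    Returns one value per epoch time, or None where no line was drawn.
    """
    values = []
    for t in epoch_times:
        value = None
        for (t0, v0), (t1, v1) in zip(points, points[1:]):
            if t0 <= t <= t1:
                # Linear interpolation between two successive sketch marks.
                value = v0 + (t - t0) / (t1 - t0) * (v1 - v0)
                break
        values.append(value)
    return values
```

For clicks at (0 msec, 100 Hz), (100 msec, 200 Hz), and (200 msec, 150 Hz), an epoch at 50 msec would receive 150 Hz and an epoch at 150 msec would receive 175 Hz.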

Bugs, Gotchas, and other Beasties

As of 9/5/97 -

Multi-file waveform displays are still not enabled.
There is no way within WEDW to write a portion of the waveform to an external file.
Unless the display is set to 256 color mode, spectrograms are not in gray scale; they are very colorful, but difficult to interpret.
Pitch tracker cannot identify pitch periods at the edges of the display. This can cause markers to get dropped when retracking a portion of a vocalic interval.
There are probably numerous additional small bugs and problems in the prosodic modification features, but the version with these features activated is experimental...caveat user