Correcting Low Frequency Phase Distortion

Personal Publications of Dr. Martin Rothenberg:

Accepted for publication in the Journal of Voice, Vol. 16, No. 1, 2002

Summary

Dynamic high-pass filtering with a -3dB frequency of a factor of ten or more below the voice fundamental frequency has a negligible effect on the amplitudes of the Fourier components of the waveform generated by an electroglottograph (sometimes referred to as a laryngograph). However, such a filter can significantly distort the waveform due to distortion in the phase or time alignment of these Fourier components. Such high-pass filtering can be introduced purposefully to stabilize the waveform by attenuating low frequency noise, or may be an undesired effect of using an amplification or data acquisition system designed for acoustic signals. For a given voice fundamental frequency, the amount of distortion depends greatly on the order or attenuation characteristics of the filter and on the type of EGG waveform. A high-order filter or a breathy voice tends to increase the amount of distortion. If the characteristics of the high-pass filter are known, there are a number of digital filter techniques that can be used to correct the phase distortion. However, it is shown that a relatively simple analog network can also be used to obtain a correction that suffices for most applications.

In the recent second edition of their book, Clinical Measurement of Speech and Voice (Singular Publishing Group [2000]), Baken and Orlikoff correctly point out that high-pass filtering from various sources can significantly affect the waveform of an electroglottograph (EGG) as a representation of vocal fold contact area (VFCA). This is not usually a problem in applications in which the EGG signal is used only for the measurement of the voice periodicity, such as the tracking of fundamental frequency (F0) or the synchronization of a stroboscope. However, if the details of the waveform are being studied for their implications about the nature of the vocal fold vibratory cycle, or if the glottal closed quotient (the period of vocal fold contact divided by the vibratory period) is being estimated from the EGG waveform, waveform distortion caused by high-pass filtering can be an important factor. The type of filter that Baken and Orlikoff refer to is a filter formed from linear dynamic elements, such as capacitors and inductors in an electronic filter, and not a digital filter specifically designed to avoid such distortion.

EGG waveforms can be adversely affected by ‘dynamic’ high-pass filtering even if the -3dB cutoff frequency is as low as 1/10 the fundamental frequency of the waveform. For example, for a value of F0 of 100 Hz, as could be attained by a low pitched male voice, a filter having a -3dB frequency as low as 10 Hz may cause some appreciable waveform distortion, depending on the shape of the waveform and the strength of the filter. A simple ‘one-pole’ filter (a network-theory categorization of the simplest type of dynamic filter, having an asymptotic attenuation of -6 dB per octave) with 3 dB attenuation at 10 Hz, as is used in some Glottal Enterprises EGG units to stabilize the waveform, will cause only a small distortion for a typical waveshape, while a stronger filter with the same -3dB break point, such as a commonly used 4-pole Butterworth filter (attenuation of -24 dB per octave), may have quite a large effect on the waveform.

The type of waveform distortion that can be caused by high-pass filtering is illustrated in Figure 1. The figure shows the distortion in a 150 Hz (approximately) square wave (topmost waveform), when processed by filters with -3dB ‘break points’ at 10, 20 and 40 Hz, both for a minimally distorting one-pole filter (waveforms B, C and D) and for a four-pole filter having a Butterworth configuration (waveforms E, F and G). The value of 150 Hz represents a typical voice fundamental frequency and the square wave represents the type of waveform for which the distortion is most pronounced for a given filter. It can be seen from the figure that the result of the filters is a tendency of the waveform to move toward the average value (or zero level) during periods of constant value or slow change in the undistorted waveform. It can also be seen that, for a given -3dB frequency, the distortion caused by the 4-pole filter is almost three times as great as the distortion caused by the one-pole filter.
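The droop toward the zero level described above can be reproduced numerically for the one-pole case. The following is a minimal sketch, not the procedure used to produce Figure 1: the 48 kHz sampling rate, the bilinear-transform discretization, and the function names are all assumptions chosen for illustration. It compares the simulated droop across one flat top of a 150 Hz square wave with the exponential estimate 1 - exp(-pi * fc / F0), which follows from a time constant of 1 / (2 * pi * fc) acting over half a period.

```python
import math

def one_pole_highpass(x, fc, fs):
    """One-pole high-pass H(s) = s / (s + 2*pi*fc), discretized with the bilinear transform."""
    k = 2.0 * fs              # bilinear-transform constant
    w = 2.0 * math.pi * fc    # break frequency in rad/s
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = (k * (xn - x_prev) - (w - k) * y_prev) / (k + w)
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def square_wave(f0, fs, n_cycles):
    """+1/-1 square wave; fs / f0 is assumed to be an even integer."""
    spc = int(round(fs / f0))  # samples per cycle
    return [1.0 if (n % spc) < spc // 2 else -1.0 for n in range(spc * n_cycles)]

def droop_fraction(fc, f0=150.0, fs=48000.0, n_cycles=30):
    """Fractional decay toward zero across one flat top of the filtered square wave."""
    spc = int(round(fs / f0))
    y = one_pole_highpass(square_wave(f0, fs, n_cycles), fc, fs)
    start = spc * (n_cycles - 1)   # first sample of the last positive half-cycle
    end = start + spc // 2 - 1     # last sample of that half-cycle
    return (y[start] - y[end]) / y[start]

for fc in (10.0, 20.0, 40.0):
    estimate = 1.0 - math.exp(-math.pi * fc / 150.0)
    print(f"{fc:4.0f} Hz: simulated {droop_fraction(fc):.3f}, exponential estimate {estimate:.3f}")
```

The square wave is the worst case named in the text, so droop figures from this sketch overstate what would be seen on a typical EGG waveform.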

The type of distortion introduced in an actual EGG waveform by high-pass filtering is shown in Figure 2. A typical EGG waveform from a male adult speaker vocalizing at roughly 160 Hz was obtained from a Glottal Enterprises model EG2 unit, with the switch-selectable internal one-pole high-pass filter set to 2 Hz, a -3dB frequency for which there is negligible distortion introduced, and captured on a wide bandwidth transient recorder. The traces in the figure show the resulting waveforms when the stored signal is replayed through a simple one-pole high-pass filter that was -3dB at 10 Hz (screen A) and 40 Hz (screen B). The high-pass filtered waveforms are shown superimposed on the original (correct) waveform. It can be seen from the waveforms in screen A that the phase distortion introduced by a one-pole filter with a breakpoint in the range of 10 to 20 Hz may often be acceptable, since its effect on the waveform is small for typical male voices, and is generally negligible for higher pitched male voices and for female and child voices.

Since a filter set for an attenuation of 3 dB (an amplitude of 70.7% of original) at 40 Hz can be expected to have a negligible effect on waveform Fourier components at or above 150 Hz (less than a half-dB), the waveform distortion introduced is clearly due to distortion in the phase, or alignment in time, introduced in the components of the waveforms in Figures 1 and 2. For this reason, this type of problem in accurate EGG waveform representation after high-pass filtering is often referred to as ‘low frequency phase distortion’.
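The contrast between a small amplitude change and a substantial phase shift can be checked directly from the filter transfer functions. The sketch below is an illustration under stated assumptions: it evaluates ideal analog Butterworth high-pass responses (the one-pole filter being the order-1 case), and the function names are hypothetical. For a 40 Hz break point it gives an attenuation of roughly 0.3 dB and a phase lead of roughly 15 degrees at 150 Hz for the one-pole filter, with a phase lead between two and three times larger for the 4-pole filter.

```python
import cmath
import math

def butterworth_highpass(f, fc, order):
    """Complex response at f (Hz) of an ideal Butterworth high-pass with -3 dB point fc."""
    s = 1j * f / fc            # frequency normalized so the break point is at |s| = 1
    h = s ** order
    for k in range(order):
        # Normalized Butterworth poles lie on the unit circle in the left half-plane.
        pole = cmath.exp(1j * math.pi * (2 * k + order + 1) / (2 * order))
        h /= (s - pole)
    return h

def gain_db(h):
    return 20.0 * math.log10(abs(h))

def phase_deg(h):
    return math.degrees(cmath.phase(h))

for order in (1, 4):
    h = butterworth_highpass(150.0, 40.0, order)
    print(f"{order}-pole, 40 Hz break point, at 150 Hz: "
          f"{gain_db(h):+.3f} dB, phase lead {phase_deg(h):.1f} degrees")
```

The harmonics of the waveform are shifted by progressively smaller angles than the fundamental, which is why the components no longer line up in time even though their amplitudes are almost unchanged.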

Potentially problematic low frequency phase distortion is commonly introduced in at least one of two ways. First, commercial EGG units often remove energy at frequencies below 10 or 20 Hz in order to stabilize the waveform display, since components below about 20 Hz primarily represent noise or movement artifacts that move the waveform vertically in a random fashion.

High-pass filtering also occurs in computer applications using the computer’s audio system to record waveforms, and not a special data acquisition card or module. The computer’s audio system may attenuate components below a break point that could vary anywhere between about 20 and 60 Hz, depending on the computer, since frequencies below this approximate range are not considered significant in most audio applications, and the resulting phase distortion at frequencies above the break point is not perceived by the auditory system. The filtering introduced by the computer’s audio system would be expected to be of the simple, one-pole type, though it is possible for such filtering to occur at more than one point in the audio amplification and digitization system. The 40 Hz, one-pole filter used for some of the figures in this paper was chosen to represent this type of high-pass filter distortion.

A Simple Method for Phase Correction

There are numerous possible methods for the correction of the low frequency phase distortion introduced by high-pass filtering, both by digital signal processing techniques and by means of analog networks. One of the simplest is the use of an analog network that can lower the break point of the offending filter by some arbitrary factor N. To correct for a simple one-pole high-pass filter, this correction can be accomplished by a passive analog network consisting of only two resistors and a capacitor. The values of resistance and capacitance can be chosen by the use of standard circuit design equations to place a pole-canceling zero at or near the pole frequency of the offending filter, and another pole at a frequency a factor of N lower. Figure 3 shows the circuit used for accomplishing this function in the Glottal Enterprises model C-1 compensator.
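As an illustration of the design equations mentioned above, the following sketch computes component values for one common three-element topology: a series resistor R1 followed by a shunt branch of R2 in series with C. This topology is an assumption and is not necessarily the circuit of Figure 3; the 1 microfarad capacitor is an arbitrary choice, and the source and load impedances are idealized. For this network the transfer function is (1 + s*R2*C) / (1 + s*(R1 + R2)*C), so the zero lies at 1 / (2*pi*R2*C), the pole lies a factor N = (R1 + R2) / R2 lower, and the gain at high frequencies is 1/N.

```python
import math

def lag_network_values(f_zero, n_factor, c_farads):
    """Return (R1, R2) in ohms for an assumed R1 / (R2 + C) lag network.

    f_zero   : frequency of the pole-canceling zero, in Hz
    n_factor : factor N by which the break point is to be lowered
    c_farads : chosen capacitor value
    """
    r2 = 1.0 / (2.0 * math.pi * f_zero * c_farads)
    r1 = (n_factor - 1.0) * r2
    return r1, r2

def lag_network_breakpoints(r1, r2, c_farads):
    """Return (zero frequency, pole frequency) in Hz for the same network."""
    f_zero = 1.0 / (2.0 * math.pi * r2 * c_farads)
    f_pole = 1.0 / (2.0 * math.pi * (r1 + r2) * c_farads)
    return f_zero, f_pole

r1, r2 = lag_network_values(f_zero=40.0, n_factor=2.5, c_farads=1e-6)
f_zero, f_pole = lag_network_breakpoints(r1, r2, 1e-6)
print(f"R1 = {r1:.0f} ohm, R2 = {r2:.0f} ohm, zero at {f_zero:.1f} Hz, pole at {f_pole:.1f} Hz")
```

Because a passive network of this kind attenuates the signal by the factor N above the zero frequency, some additional gain would normally be needed elsewhere in the signal chain.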

We have found that a correction network that reduces the break frequency by a factor of approximately 2.5 gives a good correction for typical PC audio systems. Thus with a factor of 2.5, an audio system with a -3 dB frequency of 40 Hz could be corrected to a new -3 dB frequency of roughly 16 Hz, with the precise extent of the compensation depending on the strength (number of poles) of the high-pass filtering in the audio system. Our experience has shown that a simple one-pole passive filter such as this can also yield good results correcting higher-order filters, if the compensator parameters are selected to optimize the compensation.

Figure 4 illustrates the effect on the resulting waveform of phase correction of the EGG signal used for Fig. 2, using a 3-element passive network. The EGG waveform used for Figure 2 was filtered by a 40 Hz single-pole high pass filter (screen 4A), then passed through the C-1 Compensator, with component values selected for optimal correction using a factor N of 2.5 (screen 4B). The resulting signal was captured, displayed on the computer monitor as an overlay on the true waveform, and printed. The compensator was optimized by first observing the compensated waveform for a similarly high-pass filtered square wave, and adjusting the compensator values for a good fit to the square wave. (See Figure 5 below.) This process is somewhat subjective, but the exact values selected did not appear to have much effect on the waveform with an actual voice signal.

A commercial square-wave generator was used for Figure 4. However, for the compensation of high-pass filtering within an EGG system, Glottal Enterprises can provide a "simulated larynx," the LS-1, which produces a square-wave shaped variation in electrical resistance that can be applied directly to the electrodes. The average resistance presented to the electrodes by the LS-1, and the percent change caused by the square wave, approximate the values expected when the electrodes are applied to the neck during voice production. It can be seen from Figure 4 that the high-pass filter distortion is essentially removed by the compensator. The maximum difference between the correct waveform and the compensated waveform is less than 2% of the amplitude of the waveform.

The effect of varying the amount of compensation on the resulting waveform is illustrated in Figure 5. The topmost panel in the figure (panel A) shows again the distortion caused by a 40 Hz one-pole high-pass filter, with the original square wave overlaid for comparison. In the lower three panels (B, C and D), a compensator with N = 2.5 is employed to reduce the distortion, with the zero introduced by the compensator increasing in frequency from B to C, and at an intermediate value in D that appeared to the writer to yield a minimum distortion.
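The effect of moving the compensator zero can be explored with a simple simulation. This is a sketch under several assumptions rather than a model of the measurements in Figure 5: the compensator is represented by the idealized transfer function (s + wz) / (s + wz/N) with its high-frequency gain normalized to one, both filters are discretized with the bilinear transform at an assumed 48 kHz sampling rate, and the trial zero frequencies of 30, 40 and 50 Hz are arbitrary. It reports the largest deviation from the original 150 Hz square wave, with and without compensation.

```python
import math

def first_order_section(x, zero_hz, pole_hz, fs):
    """H(s) = (s + wz) / (s + wp) via the bilinear transform.

    With zero_hz = 0 this is a one-pole high-pass with break point pole_hz.
    """
    k = 2.0 * fs
    wz = 2.0 * math.pi * zero_hz
    wp = 2.0 * math.pi * pole_hz
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = ((k + wz) * xn + (wz - k) * x_prev - (wp - k) * y_prev) / (k + wp)
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def max_error(zero_hz, n_factor=2.5, fc=40.0, f0=150.0, fs=48000.0, n_cycles=60):
    """Largest deviation from the original square wave over the final cycle.

    zero_hz = None means no compensator is applied.
    """
    spc = int(round(fs / f0))  # samples per cycle (assumed to be an even integer)
    x = [1.0 if (n % spc) < spc // 2 else -1.0 for n in range(spc * n_cycles)]
    y = first_order_section(x, 0.0, fc, fs)              # offending high-pass filter
    if zero_hz is not None:
        y = first_order_section(y, zero_hz, zero_hz / n_factor, fs)  # compensator
    last_cycle = range(spc * (n_cycles - 1), spc * n_cycles)
    return max(abs(y[n] - x[n]) for n in last_cycle)

print(f"uncompensated: {max_error(None):.3f}")
for zero_hz in (30.0, 40.0, 50.0):
    print(f"zero at {zero_hz:.0f} Hz: {max_error(zero_hz):.3f}")
```

Peak deviation is only one possible measure of fit. A visual judgment of the kind described above may weigh symmetry of the waveform more heavily, so the setting that minimizes this number need not match the one chosen by eye.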

The setting of the compensator parameters in D is referred to here as the setting resulting in optimum compensation for the value of N chosen. Those readers familiar with the details of Fourier analysis can deduce from the optimally compensated waveform that the phases of the Fourier components (the fundamental and harmonics) are now correctly aligned, since the compensated waveform is symmetrical. However, the amplitude of the fundamental frequency component is about 5% too great. Nevertheless, all three compensator settings, in B, C and D, clearly yield a marked improvement in the fit to the original square-wave waveform.

The distortion caused by high-pass filtering is more pronounced in a breathy voice in which there are briefer periods of glottal closure. This is illustrated in the lower pitch, breathy waveform shown in Figure 6A. This waveform exhibits a relatively flat open phase over about 70% of each cycle. The narrow vertical pulses show the periods during which the vocal folds are in contact. The Closed Quotient (contact interval divided by the glottal period), measured at the base of each pulse of the original waveform, is about 0.3. However, it can be seen that after high-pass filtering, CQ measurements would be more problematic and tend to yield values that are too small, at least for breathy voice.
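For readers who wish to compute a closed quotient numerically, the following is a minimal sketch of one possible criterion-level estimate. It is not the measurement procedure used for the figures: the rectangular pulse train is a synthetic stand-in for a breathy-voice EGG signal, the criterion level of 5% of the peak-to-peak range is an arbitrary approximation to measuring "at the base of each pulse", and vocal fold contact is assumed to be plotted upward.

```python
def closed_quotient(samples, level=0.5):
    """Fraction of samples lying above a criterion level.

    The level is expressed as a fraction of the peak-to-peak range, and
    `samples` is assumed to span a whole number of glottal cycles.
    """
    lo, hi = min(samples), max(samples)
    threshold = lo + level * (hi - lo)
    return sum(1 for v in samples if v > threshold) / len(samples)

# Synthetic 100 Hz pulse train with contact during 30% of each cycle (assumed values).
fs, f0, duty = 48000, 100, 0.3
spc = fs // f0                                   # samples per glottal cycle
contact = int(round(duty * spc))                 # samples of contact per cycle
cycle = [1.0 if n < contact else 0.0 for n in range(spc)]
egg = cycle * 20

print(closed_quotient(egg, level=0.05))
```

On a high-pass filtered waveform the baseline is no longer flat, so the result of such an estimate depends strongly on the criterion level chosen.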

Figure 6B shows the high pass filtered EGG waveform after phase correction by the optimal compensator, N = 2.5. (The glottal cycles shown in B were from a different part of the recorded segment than those in panel A, and differ slightly in period, shape and closed quotient from those in A.) Though some distortion remains in the compensated waveform, the relative flatness during the glottal open period is retained, and measurements of CQ from the original and compensated waveforms would be virtually identical.

Other Methods for Phase Correction

There are numerous possible methods for phase correction using digital filter techniques and we mention only one here as an example. Digital filters are more conveniently specified by their impulse response than by their frequency response, though these specifications are essentially equivalent. If the impulse response of the offending high-pass filter is known, it is possible to use a digital high-pass filter that has an impulse response similar to that of the offending filter, but reversed in time. This time reversal results in the same amplitude response, but reverses the phase response, thus cancelling the phase distortion. (This process is roughly analogous to the old technique of eliminating low-frequency phase distortion introduced in ordinary tape recording by replaying the recorded tape in the reverse direction.)
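The time-reversal idea can be sketched for a stored recording as follows. The example assumes that the offending filter is a one-pole high-pass with a known 40 Hz break point, models it with a bilinear-transform discretization at an assumed 48 kHz sampling rate, and uses a 150 Hz sine wave so that the phase can be read off directly; the function names are hypothetical. The recording is reversed in time, passed through a copy of the same filter, and reversed again, which is equivalent to filtering with the time-reversed impulse response.

```python
import math

def one_pole_highpass(x, fc, fs):
    """One-pole high-pass H(s) = s / (s + 2*pi*fc), discretized with the bilinear transform."""
    k = 2.0 * fs
    w = 2.0 * math.pi * fc
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = (k * (xn - x_prev) - (w - k) * y_prev) / (k + w)
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

def time_reversed_correction(distorted, fc, fs):
    """Filter the time-reversed record with the same high-pass, then restore time order."""
    return one_pole_highpass(distorted[::-1], fc, fs)[::-1]

def phase_deg(x, f0, fs, start, stop):
    """Phase of the f0 component relative to sin(2*pi*f0*t), over samples start..stop-1."""
    c = sum(x[n] * math.cos(2.0 * math.pi * f0 * n / fs) for n in range(start, stop))
    s = sum(x[n] * math.sin(2.0 * math.pi * f0 * n / fs) for n in range(start, stop))
    return math.degrees(math.atan2(c, s))

fs, f0, fc = 48000.0, 150.0, 40.0
spc = int(fs / f0)                                # samples per cycle
x = [math.sin(2.0 * math.pi * f0 * n / fs) for n in range(200 * spc)]
distorted = one_pole_highpass(x, fc, fs)
corrected = time_reversed_correction(distorted, fc, fs)

# Measure over the middle 100 cycles, away from the start-up transients at both ends.
mid = (50 * spc, 150 * spc)
print(f"phase after high-pass filtering: {phase_deg(distorted, f0, fs, *mid):.2f} degrees")
print(f"phase after correction:          {phase_deg(corrected, f0, fs, *mid):.2f} degrees")
```

Since the amplitude response of the filter is applied a second time, the low-frequency attenuation is doubled in dB while the phase shift is cancelled. The method also requires the whole record to be available, so it suits stored data rather than real-time display.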

If the nature of the offending high-pass filter is not known precisely, a manual or automatic optimization procedure can be readily derived for a digital-filter compensator, using a known input waveform, such as a square wave. For an electroglottograph, a square-wave shaped variation in electrical conductance can be used that is applied to the EGG electrodes.

Summary

Dynamic high-pass filtering with a -3 dB frequency that is a factor of ten or more below the voice fundamental frequency has a negligible effect on the amplitudes of the Fourier components of an EGG waveform. However, such a filter can significantly distort the waveform due to distortion in the phase or time alignment of these Fourier components. Such high-pass filtering can be introduced purposefully to stabilize the waveform by attenuating low-frequency noise, or may be an undesired effect of using an amplification or data acquisition system designed for acoustic signals. For a given voice fundamental frequency, the amount of distortion depends greatly on the order or the attenuation characteristics of the filter and on the type of EGG waveform. Both a high-order filter and a breathy voice tend to increase the amount of distortion.

If the characteristics of the high-pass filter are known, there are a number of digital filter techniques that can be used to reduce phase distortion. However, it is shown that a relatively simple analog network can be used to obtain a correction that suffices for most applications. If the precise characteristics of the filter are not known, the response to a square wave can be used to adjust the compensator parameters for an optimal correction.
