Cosi Fan Tutte and What it Means
Nonlinear Source-Tract Acoustic Interaction in the Soprano Voice and Some Implications for the Definition of Vocal Efficiency
F1 Tuning in the Soprano Voice
What happens when F1 is much less than three times Fo? To help answer this question, let us consider the upper range of a soprano, in which Fo approaches F1 for the vowel /a/. According to Sundberg (1975), professional sopranos tend to alter their vocal tract as they sing in this range so as to keep F1 close to Fo. According to the standard linear, noninteractive source-tract acoustic theory, this type of F1 tuning would improve the efficiency of voice production, since the fundamental frequency component of the source waveform would be greatly amplified. The resulting radiated sound would be quite strong, provided that the vibratory pattern of the vocal folds was not weakened by the pressure pattern within the vocal tract caused by F1, although the resulting vowel would be rather sinusoidal in waveform and devoid of the coloration caused by higher harmonics. However, if we consider the acoustic interaction between the glottal source and the vocal tract acoustic impedance, the picture changes significantly.
In order to understand the relationships between glottal area, flow, and pressure, which explain the effect of the source-tract interaction, we can start with the diagram of the projected glottal area in Figure 19-1. Note that if F1 and Fo are matched, the ac pressure variations just above the glottis will be almost exactly in phase with the waveform of projected glottal area, independent of the assumption we make for source-tract acoustic interaction. This phase relationship stems from a basic property of a resonant system, namely, that the input impedance tends to be purely dissipative (nonreactive) at the resonance frequency. For a dissipative acoustic system, the air flow and applied pressure will be in phase. To relate the phase of the glottal air flow to the phase of the area waveform and complete the argument, we must also assume that the Fo component of the pressure variation just above the glottis, caused by the F1 resonance, is much larger than the other pressure components that might affect the glottal flow pattern. These other components are the subglottal pressure variations and the supraglottal variations caused by any acoustic impedance factor not related to the F1 resonance (such as higher resonances, the radiation impedance at the mouth, and any inertive components due to the flow pattern within or near the glottis). This assumption can be justified if the damping of the first formant is very low, as would be the case for a nonnasalized vowel with a high first formant produced with a nonbreathy voice, conditions that appear to hold for a good soprano singing an open vowel in the upper part of her register.
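The resonance property invoked above can be checked numerically. The following is a minimal sketch, assuming the F1 resonance can be represented by a single parallel resonator in normalized units; the function names and the Q value of 15 are illustrative assumptions, not values from the text.

```python
import cmath
import math

def resonator_impedance(f, f1, q, r=1.0):
    """Input impedance of one parallel resonance (assumed model, normalized).

    f  : driving frequency in Hz
    f1 : resonance (formant) frequency in Hz
    q  : quality factor (hypothetical value, higher means less damping)
    r  : impedance magnitude at resonance (normalized to 1)
    """
    return r / (1 + 1j * q * (f / f1 - f1 / f))

def phase_deg(f, f1, q):
    """Phase of pressure relative to flow, in degrees."""
    return math.degrees(cmath.phase(resonator_impedance(f, f1, q)))

if __name__ == "__main__":
    f1, q = 750.0, 15.0  # illustrative numbers only
    for f in (700.0, 750.0, 800.0):
        print(f, round(phase_deg(f, f1, q), 1))
```

In this model the phase is zero when the driving frequency equals the resonance frequency, so pressure and flow are in phase. Below resonance the pressure leads the flow, and above resonance it lags.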
Under these conditions, we would have a variation in transglottal pressure (subglottal minus supraglottal) similar to that in the figure. Pressure during the most-closed portion of the glottal cycle would be increased by the "resonance pressure," while that during the open portion of the cycle would be decreased. The average transglottal pressure would remain approximately equal to the average subglottal pressure.
Now let us look at the implications for glottal air flow. Under the noninteractive assumption, the glottal air flow would approximately follow the variation in glottal area and be unaffected by the oscillations in transglottal pressure, as indicated in the sketch in Figure 19-2.
If linear acoustic interaction is assumed, the fundamental frequency component of the glottal flow waveform would be suppressed by the variations in transglottal pressure. As indicated by the shaded areas in Figure 19-2A, the result would be a suppression in the glottal source waveform by an amount that would have the waveform of a sinusoid at Fo. The average flow and higher harmonics would not be affected. Thus, with a linear interactive model, the enhancement of the Fo component is much less than in the noninteractive model, though some enhancement does occur. Also, the acoustic power (integral of flow times pressure) dissipated at the glottis decreases, even though average flow and pressure (determining the power supplied by the respiratory system) remain the same. Thus, the voice becomes acoustically more efficient.
However, there is one serious deficiency in the linear, interactive model, even though it is significantly better at predicting voice quality and efficiency than the noninteractive model. This deficiency is illustrated by the fact that it predicts a nonzero glottal air flow when the vocal folds are closed, even if we assume a complete vocal fold closure during this phase. But, if we assume that the greatly increased transglottal pressure during the closed phase does not disturb the pattern of complete closure (that the closure is firm enough to withstand the increased pressure), then the flow must be forced to zero during this period. By merely forcing the flow to go to zero as the closed period is approached, we get a first approximation to the flow predicted by the nonlinear interactive model, as shown in Figure 19-2B.
Note that in this first approximation to the nonlinear model, the Fo component has been greatly strengthened as compared to the linear model; the waveform looks more like a sinusoid at Fo. This would strengthen the radiated SPL at Fo. But, more significantly, the waveform components at frequencies other than Fo have also been altered. The high frequencies have been changed in a rather complex way, which would depend greatly upon the duty cycle of the glottal pulse, but not grossly increased or decreased in total for the rather typical duty cycle assumed here. In addition, the component at zero frequency, which is the average air flow, shows a significant decrease. This reduced average air flow not only causes a reduced glottal power dissipation as compared to the dissipation predicted by the linear model but decreased respiratory power as well.
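The comparison among the three models can be illustrated with a small numerical sketch. It assumes a half-sine area pulse with a 60 percent open quotient, a noninteractive flow proportional to area, and an arbitrary interaction strength `K`; all of these are illustrative choices rather than values from the text. The nonlinear first approximation is implemented here as a hard gate that zeroes the flow during the closed period, which is cruder than the smooth approach to zero described above.

```python
import math

N = 6000             # samples per glottal cycle (period normalized to 1)
OPEN_QUOTIENT = 0.6  # about 60 percent, as described for the figure
K = 0.3              # hypothetical amplitude of the Fo component removed from the flow

def area(t):
    """Half-sine projected glottal area pulse (assumed shape, peak of 1)."""
    return math.sin(math.pi * t / OPEN_QUOTIENT) if t < OPEN_QUOTIENT else 0.0

def fo_sinusoid(t):
    """Unit sinusoid at Fo, peaking at the center of the area pulse."""
    return math.cos(2 * math.pi * (t - OPEN_QUOTIENT / 2))

def flows():
    """Return sample times and the three model flow waveforms."""
    times = [(i + 0.5) / N for i in range(N)]
    # Noninteractive: flow simply follows the area.
    nonint = [area(t) for t in times]
    # Linear interactive: a sinusoid at Fo is subtracted over the whole cycle.
    linear = [area(t) - K * fo_sinusoid(t) for t in times]
    # Nonlinear, first approximation: flow forced to zero while the glottis is closed.
    nonlinear = [u if t < OPEN_QUOTIENT else 0.0 for t, u in zip(times, linear)]
    return times, nonint, linear, nonlinear

def mean_flow(u):
    """Zero-frequency component (average air flow)."""
    return sum(u) / len(u)

def fo_component(times, u):
    """Amplitude of the Fo component in phase with the area pulse."""
    return 2 * sum(x * fo_sinusoid(t) for t, x in zip(times, u)) / len(u)

if __name__ == "__main__":
    times, nonint, linear, nonlinear = flows()
    for name, u in (("noninteractive", nonint), ("linear", linear), ("nonlinear", nonlinear)):
        print(name, round(mean_flow(u), 3), round(fo_component(times, u), 3))
```

Under these assumptions the linear model leaves the average flow unchanged and weakens the Fo component, while gating the closed period both strengthens the Fo component relative to the linear model and lowers the average flow, which is the qualitative pattern described in the text.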
To get a more accurate estimate of the glottal flow waveform in the nonlinear, interactive case, we should take into account that the first-approximation waveform has a strengthened Fo component as compared to the waveform with linear interaction. This would increase the supraglottal Fo component (Figure 19-1) and, therefore, increase the degree of interaction with the F1 resonance. In Figure 19-2C this is indicated by a second-approximation waveform, in which another Fo component is removed from the first-approximation waveform during the open-glottis segment. This results in a further reduction in the average air flow. However, note that, if the open quotient is more than 50 percent (it is about 60 percent in the figure), the supraglottal pressure becomes negative as the vocal folds begin to open and as they approach closure, so as to increase the glottal air flow at the onset and offset of the glottal pulse. The net result will be a sharper onset and offset of the air flow pulse for the duty cycle assumed in the figure. This sharper onset and offset would increase the energy in the higher harmonics.
That our theoretical model of the effect of nonlinear acoustic interaction is plausible is illustrated by the actual glottal air flow waveform shown in Figure 19-3, from a professional soprano singing F#5 with a fairly high level of vocal effort during the vowel [a]. For a note in this vicinity, Fo is naturally close to F1 for the vowel [a], and therefore, a match between F1 and Fo is probably easiest to achieve for that vowel. The flow waveform was obtained from a circumferentially vented wire screen mask having an acceptable frequency response to about 3000 Hz. The mask output, measuring oral volume velocity, was inverse-filtered using a manually adjustable three-formant filter, while observing simultaneous air flow and EGG waveforms during the repetitive playback of a short segment, using a two-channel transient recorder. The inverse filter was adjusted to make the filter output equal to zero during the closed-glottis period, as indicated by the EGG. The adjustment thus obtained was unambiguous and repeatable, though it should be emphasized that it required a subject with a clear period of complete glottal closure at this pitch and a moderately strong EGG waveform, as ours did.
The Fo value shown in the figure, 762 Hz ± 10 Hz, was measured from the cycles caught by the transient recorder. (Some vibrato was present in the production.) The resulting inverse filter settings, also shown in the figure, indicate that the subject closely matched F1 (749 Hz ± 10 Hz) to Fo for this production. The glottal flow waveform indicates that the air flow during the open-glottis period was strongly suppressed by the supraglottal pressure variation at F1, which lagged the air flow pulse very slightly. An attempt to reproduce this result a few weeks later with the same subject produced a similar waveform except that the dip in the air flow pulse was not obvious; i.e., the flow waveform approximated a "square wave."
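As a quick check on how close the reported match is, the two values can be compared directly. This is a small sketch, not part of the original analysis: the function names are hypothetical, and treating the two ± 10 Hz tolerances as independent errors combined in quadrature is an assumption made only for illustration.

```python
import math

FO_HZ = 762.0   # measured fundamental frequency (from the text)
F1_HZ = 749.0   # first-formant setting of the inverse filter (from the text)
TOL_HZ = 10.0   # stated tolerance on each value (from the text)

def detuning_cents(f_a, f_b):
    """Musical interval between two frequencies, in cents."""
    return 1200.0 * math.log2(f_a / f_b)

def within_uncertainty(f_a, f_b, tol_a, tol_b):
    """True if the difference is no larger than the combined tolerance.

    Assumption: the two tolerances are independent and add in quadrature.
    """
    return abs(f_a - f_b) <= math.hypot(tol_a, tol_b)

if __name__ == "__main__":
    print(round(detuning_cents(FO_HZ, F1_HZ), 1), "cents")
    print(within_uncertainty(FO_HZ, F1_HZ, TOL_HZ, TOL_HZ))
```

With these numbers, Fo lies about 30 cents above F1, and the 13 Hz difference is within the combined tolerance under the stated assumption.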
It should be mentioned that one effect of the flow resistance of the pneumotachograph mask used (roughly 0.5 cm H2O-sec/liter) is to increase the damping of the vocal tract formants. Thus, the effect of F1 tuning shown in Figure 19-3 would probably be stronger without the mask. (The mask also results in a slight reduction of formant frequency; however, the singer may have taken this detuning into account in her production.) This would imply that with no mask in place, the supraglottal pressure variation can be strong enough to drive the flow during the center of the open-glottis phase even closer to zero than shown in Figure 19-3. For this to occur, the peak of the ac variation in supraglottal pressure would need to be similar in magnitude to the average subglottal pressure. Supraglottal pressures of this magnitude have been measured recently by Schutte and Miller (in press) using dual catheter-mounted miniature pressure transducers: one below the glottis and one above the glottis.
Peak pharyngeal pressures as high as the subglottal pressure, though not explained
by a linear interactive model, are entirely consistent with a nonlinear model.
In fact, the nonlinear model indicates that if the resonance were sufficiently
underdamped (efficient), the net transglottal pressure could actually reverse
for part of the glottal cycle, to cause a brief period of negative air flow
(from the pharynx to the trachea). Whether a resonance that efficient can be
attained, or whether proper vocal fold oscillatory behavior could be maintained
with such a flow pattern, is not clear at this time, though Schutte and Miller's
measurements appear to show at least one case in which this has occurred.
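The condition for pressure reversal can be made concrete with a short sketch. It assumes a constant subglottal pressure and a supraglottal pressure that is a pure sinusoid at Fo, peaking at the center of the open phase; the function names and the sample pressure ratios are hypothetical.

```python
import math

def min_transglottal(p_sub, p_peak):
    """Lowest transglottal pressure in the cycle (subglottal minus supraglottal).

    p_sub  : average subglottal pressure (assumed constant)
    p_peak : peak of the ac supraglottal pressure at Fo (assumed sinusoidal)
    """
    return p_sub - p_peak

def reversed_fraction(p_sub, p_peak):
    """Fraction of the glottal cycle with negative transglottal pressure.

    The pressure p_sub - p_peak*cos(theta) is negative where
    cos(theta) > p_sub/p_peak, an interval of width 2*acos(p_sub/p_peak)
    out of a full cycle of 2*pi.
    """
    if p_peak <= p_sub:
        return 0.0
    return math.acos(p_sub / p_peak) / math.pi

if __name__ == "__main__":
    for ratio in (0.5, 1.0, 1.2):  # illustrative peak-to-subglottal ratios
        print(ratio, min_transglottal(1.0, ratio), round(reversed_fraction(1.0, ratio), 3))
```

In this simplified picture the transglottal pressure just reaches zero when the peak supraglottal pressure equals the subglottal pressure, and any larger peak gives a reversal centered on the open phase.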