1.1 Nature of this Study
This work will define and explore a quantitative model of the interrelations among some of the more significant physiological variables in the dynamics of speech production. To specify a goal that might be significant and yet approachable, the following three criteria were set for the model:
(1) The model should be capable of describing the generation of the gross time and space variations of air pressure and volume velocity in the vocal mechanism.
These variations will be referred to hereafter as the breath-stream dynamics (PETERSON, 1957). The time integral of the product of the pressure difference across a constriction in the vocal tract and the volume velocity of the air flow through the constriction has the dimensions of energy, and therefore the breath-stream dynamics of the vocal mechanism may be thought of as a description of the space-time distribution of the energy associated with air flow in the vocal tract. In almost all cases the acoustic energy of speech is generated from this air flow energy. The breath-stream dynamics of the vocal mechanism are thus closely related to the space-time distribution of acoustic energy sources in speech.
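The dimensional statement above can be written out explicitly. The symbols used here ($\Delta P$ for the pressure difference across the constriction, $U$ for the volume velocity through it, and $t_1$, $t_2$ for the limits of the interval considered) are introduced only for this illustration and are not notation fixed by the text.

```latex
E \;=\; \int_{t_1}^{t_2} \Delta P(t)\, U(t)\, dt ,
\qquad
[E] \;=\; \frac{\mathrm{N}}{\mathrm{m}^{2}} \cdot \frac{\mathrm{m}^{3}}{\mathrm{s}} \cdot \mathrm{s}
\;=\; \mathrm{N}\,\mathrm{m} \;=\; \mathrm{J} .
```

The integrand $\Delta P(t)\,U(t)$ accordingly has the dimensions of power, so that the integral may be read as the energy delivered to the air flow at the constriction during the interval.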
(2) It will be required that the model be complete enough to describe the breath-stream dynamics of a large class of those speech sounds that may be referred to as simple-released-plosives. No attempt will be made to include other types of stops in the model, namely clicks, ejectives, and implosives.
Simple-released-plosives are very common across languages. In fact, most, if not all, spoken natural languages employ a class of speech sounds having a mechanism that can be grossly described as the creation of a pulmonic pressure difference across an occlusion somewhere in the vocal tract, followed by a sudden release due to a relatively fast opening of the occlusion. In the terminology of physiological phonetics, the above description roughly delimits the class of released plosives (i.e., the class of phonoaspirated, aspirated, and unaspirated plosives; PETERSON and SHOUP, 1966). In this class we find (physiological) differences based on (a) the polarity of the pressure buildup, which determines the direction of the initial air flow in the release, (b) the air mechanism for producing the pressure difference (in this study, pulmonic), and (c) the possible multiple articulations. A multiple articulation here refers to the occurrence of articulatory closures or approaches of the articulators at more than one place-of-articulation, except for a velopharyngeal approximation or closure. From the entire family of released-plosive phone types, I would like to differentiate those which are egressive, use a primarily pulmonic (respiratory) mechanism for producing pressure, and do not involve multiple articulations. It would appear to be within the bounds of general phonetic terminology to call this class of phone types simple-released-plosives, the word "simple" referring to the restriction on multiple articulations.
Though the simple-released-plosive classification does not cover many complex production mechanisms, multiple articulations, or certain consonant clusters, it does cover a multitude of productions in which rapid, coordinated movements of the respiratory, laryngeal, pharyngeal, velar, and oral articulatory structures are required. Most, though certainly not all, of the problems of breath-stream dynamics are encountered in a study of the simple-released-plosives.
(3) Though it is not possible at this time to specify a minimally redundant model (in which the separate parameters of the model operate independently to specify phonetically distinct plosive variants), it is desirable that the parameters which are specified be as independent as possible in the light of present day knowledge of the nature of speech.
PETERSON, in discussing the requirements for a physiologic phonetic theory, adds to the completeness and minimality requirements (embodied in his first, second and third requirements) the necessity of specifying the relation of the parameters of the model to the segmental phonetic units (PETERSON, 1964). He also observes that it is necessary to model the prosodic (suprasegmental) features of speech. Both of the latter requirements were purposely avoided in this study, though they would certainly be of importance in a general study of the breath-stream dynamics of speech.
A survey of the problem areas in the field of breath-stream dynamics, and a review of the literature through about 1956, have been presented previously by PETERSON (1957). As he points out, very few quantitative data were available until then, and many of the models discussed in the literature show little of the complexity of the actual system. For example, the pioneering measurements by STETSON (1951) were restricted by the instrumentation available, and were sometimes made with little or no amplitude or frequency calibration and linearity control. Occasionally the timing of multiple traces was suspect or difficult to determine, and in some cases it was not clear how the movements recorded were related to the physiological variables of interest. It is natural that the simple, qualitative models of breath-stream dynamics derived by STETSON from these measurements would not be entirely adequate when more quantitative exploration is employed (LADEFOGED, 1960; PETERSON, 1957, and Chapters 3 and 4 below).
Fortunately, in the eight years since PETERSON's survey, rapid advances in instrumentation techniques and a rebirth of interest in the physiological basis of speech production have resulted in a considerable body of quantitative data concerning many of those physiological variables of interest in the breath-stream dynamics of speech production. Measured quantities include average supraglottal and subglottal air pressure, air flow (volume and particle velocities), articulatory, laryngeal and respiratory movements, areas and volumes within the vocal mechanism, and muscle action potentials. However, in the study of breath-stream dynamics there appears to have been no successful effort so far to unify and relate the various findings in a model having much more explanatory power than STETSON's earlier qualitative observations.
A limited attempt to describe quantitatively the relation of breath-stream dynamics to plosive production was presented by STEVENS (1956) and later interpreted by FANT (1960, p. 276 ff.). Subsequently, a model for the subglottal mechanism which could be applied to the study of breath-stream dynamics in plosive production was presented by VAN DEN BERG (1960), who has constructed a detailed electrical analog of the trachea, lungs and tissues ("tissues" in this case apparently includes the bony structures and the body fluids). VAN DEN BERG simulated the body and air flow movement below the glottis by means of a lumped-parameter, linear, electrical circuit with manually adjustable parameters. The network contains sections simulating the actions of the trachea, bronchi, lobuli, lung tissues and fluids, thorax, and abdomen. The literature does not seem to describe any effort to relate this model to the problems of breath-stream dynamics in speech, aside from certain aspects of singing. Much of Sections 2.1, 2.2 and 2.3 follows from VAN DEN BERG's observations and computations.
The task then in the present study is to isolate a fairly independent set of physiological parameters that is complete enough to describe most of the phonetically significant variations in the breath-stream dynamics of simple-released-plosives, and to show the static and dynamic relationships between these parameters by means of a mathematical model. Toward this end the relevant anatomical features of the speech mechanism are reviewed in Chapter 2, and a linear, lumped-parameter model is developed in that chapter.
An attempt is made in Chapter 3 to extend the model of Chapter 2 to the underlying patterns of motor commands which are directed to the muscles of respiration. Due to the difficulty of this task, the primary goal in Chapter 3 is to define some of the problems and areas where more data are needed.
Though it is usually realized that the subglottal air pressure varies passively in response to variations in glottal-supraglottal air flow resistance (and, to a lesser extent, to certain supraglottal volume changes), these effects are often neglected in the literature because of the lack of quantitative concepts that could be used in predicting the extent of the variations. Those subglottal pressure variations which result from the rapid changes in air flow resistance in a simple-released-plosive are especially pronounced. In Chapter 4 the effect which supraglottal movements have on subglottal pressure is explored using the model from Chapter 2, and the theoretical results are compared with measurements of subglottal pressure taken from the literature.
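The passive character of such a pressure variation can be illustrated with a deliberately crude lumped-parameter sketch. It is not the model of Chapter 2. The circuit (a constant pressure source feeding a single subglottal compliance through a fixed internal resistance, with a time-varying outflow conductance standing in for the glottal-supraglottal resistance), the function name, and every numerical value are assumptions made purely for illustration.

```python
def simulate_subglottal_pressure(outflow_conductance, p_source=8.0,
                                 r_internal=2.0, compliance=0.02,
                                 p_initial=0.0, dt=0.001):
    """Forward-Euler integration of a one-compliance lumped model.

    All names and values are hypothetical. Assumed units: pressure in
    cm H2O, flow in liters/s, resistance in cm H2O/(liter/s), compliance
    in liters/cm H2O, time in seconds.

    outflow_conductance: one value per time step, the reciprocal of the
    glottal-supraglottal resistance (0.0 represents a complete occlusion).
    Returns the subglottal pressure after each time step.
    """
    p = p_initial
    trace = []
    for g_out in outflow_conductance:
        inflow = (p_source - p) / r_internal   # flow supplied by the source
        outflow = p * g_out                    # flow lost through the constriction
        p += dt * (inflow - outflow) / compliance
        trace.append(p)
    return trace


# Hypothetical plosive: 0.4 s of complete closure, then 0.3 s of release
# through an outflow resistance of 2 cm H2O/(liter/s).
closure = [0.0] * 400
release = [0.5] * 300
trace = simulate_subglottal_pressure(closure + release)
```

With the source held constant, the pressure in this sketch rises toward `p_source` during the closure and falls after the release toward `p_source * R_out / (r_internal + R_out)`, where `R_out` is the reciprocal of the release conductance. The fall occurs without any change in the driving source, which is the sense in which the variation is passive.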
Some pertinent aspects of the variation of articulatory and glottal air flow resistance are discussed in Chapters 5 and 6, respectively. A primary goal is to study the role in plosive production of fast opening and closing movements of the glottis and the timing of these movements with respect to the articulatory opening and closing movements. The patterns of neural activity which underlie the actions of the laryngeal and articulatory structures are considered beyond the scope of this study.
Though it has long been known that a voiced plosive requires a mechanism for the absorption of the transglottal air flow during the period of articulatory closure, there has been little effort to quantify the effects of the various mechanisms that may be employed. In Chapter 7, the use of passive supraglottal cavity expansion, active supraglottal cavity enlargement, and incomplete velopharyngeal closure in voiced plosive production is explored.
Chapter 8 contains some conclusions concerning the variables operative in the breath-stream dynamics of simple-released-plosive production, and presents a summary of areas in which further investigation is required.