Printed from Oxford Research Encyclopedias, Linguistics. Under the terms of the licence agreement, an individual user may print out a single article for personal use (for details see Privacy Policy and Legal Notice).

date: 28 June 2022

The Source–Filter Theory of Speech

  • Isao Tokuda, Ritsumeikan University

Summary

In the source-filter theory, the mechanism of speech production is described as a two-stage process: (a) The air flow coming from the lungs induces tissue vibrations of the vocal folds (i.e., two small muscular folds located in the larynx) and generates the “source” sound. Turbulent airflows are also created at the glottis or in the vocal tract to generate noisy sound sources. (b) Spectral structures of these source sounds are shaped by the vocal tract “filter.” Through the filtering process, frequency components corresponding to the vocal tract resonances are amplified, while the other frequency components are diminished. The source sound mainly characterizes the vocal pitch (i.e., fundamental frequency), while the filter forms the timbre. The source-filter theory provides a very accurate description of normal speech production and has been applied successfully to speech analysis, synthesis, and processing. Separate control of the source (phonation) and the filter (articulation) is advantageous for acoustic communications, especially for human language, which requires expression of various phonemes realized by flexible maneuvering of the vocal tract configuration. Based on this idea, articulatory phonetics focuses on the positions of the vocal organs to describe the produced speech sounds.

The source-filter theory elucidates the mechanism of “resonance tuning,” that is, a specialized way of singing. To increase efficiency of the vocalization, soprano singers adjust the vocal tract filter to tune one of the resonances to the vocal pitch. Consequently, the main source sound is strongly amplified to produce a loud voice, which is well perceived in a large concert hall over the orchestra.

It should be noted that the source–filter theory is based upon the assumption that the source and the filter are independent of each other. Under certain conditions, however, the source and the filter interact with each other. The source sound is influenced by the vocal tract geometry and by the acoustic feedback from the vocal tract. Such source–filter interaction induces various voice instabilities, for example, sudden pitch jump, subharmonics, resonance, quenching, and chaos.

Subjects

  • Phonetics/Phonology

1. Background

Human speech sounds are generated by a complex interaction of components of human anatomy. Most speech sounds begin with the respiratory system, which expels air from the lungs (figure 1). The air goes through the trachea and enters the larynx, where two small muscular folds, called “vocal folds,” are located. As the vocal folds are brought together to form a narrow air passage, the airstream causes them to vibrate in a periodic manner (Titze, 2008). The vocal fold vibrations modulate the air pressure and produce a periodic sound. Sounds produced while the vocal folds are vibrating are called “voiced sounds,” while those in which the vocal folds do not vibrate are called “unvoiced sounds.” The air passages above the larynx are called the “vocal tract.” Turbulent air flows generated at constricted parts of the glottis or the vocal tract also contribute aperiodic source sounds distributed over a wide range of frequencies. The shape of the vocal tract, and consequently the positions of the articulators (i.e., jaw, tongue, velum, lips, mouth, teeth, and hard palate), is a crucial factor in determining the acoustical characteristics of the speech sounds. The state of the vocal folds, as well as the positions, shapes, and sizes of the articulators, changes over time to produce various phonetic sounds sequentially.

Figure 1. Concept of the source-filter theory. Airflow from the lungs induces vocal fold vibrations, which create the glottal source sound. The vocal tract filter shapes the spectral structure of the source sound. The filtered speech sound is finally radiated from the mouth.

To systematically understand the mechanism of speech production, the source-filter theory divides this process into two stages (Chiba & Kajiyama, 1941; Fant, 1960) (see figure 1): (a) The air flow coming from the lungs induces tissue vibration of the vocal folds that generates the “source” sound. Turbulent noise sources are also created at constricted parts of the glottis or the vocal tract. (b) Spectral structures of these source sounds are shaped by the vocal tract “filter.” Through the filtering process, frequency components that correspond to the resonances of the vocal tract are amplified, while the other frequency components are diminished. The source sound characterizes mainly the vocal pitch, while the filter forms the overall spectral structure.

The source-filter theory provides a good approximation of normal human speech, under which the source sounds are only weakly influenced by the vocal tract filter, and has been applied successfully to speech analysis, synthesis, and processing (Atal & Schroeder, 1978; Markel & Gray, 2013). Independent control of the source (phonation) and the filter (articulation) is advantageous for acoustic communications with language, which requires expression of various phonemes with a flexible maneuver of the vocal tract configuration (Fitch, 2010; Lieberman, 1977).

2. Source-Filter Theory

2.1 Source

There are four main types of sound sources that provide an acoustic input to the vocal tract filter: glottal source, aspiration source, frication source, and transient source (Stevens, 1999, 2005).

The glottal source is generated by the vocal fold vibrations. The vocal folds are muscular folds located in the larynx. The opening space between the left and right vocal folds is called the “glottal area.” When the vocal folds are close to each other, the airflow coming from the lungs can cause the vocal fold tissues to vibrate. With combined effects of pressure, airflow, tissue elasticity, and collision between the left and right vocal folds, the vocal folds give rise to vibrations, which periodically modulate acoustic air pressure at the glottis. The number of the periodic glottal vibrations per second is called the “fundamental frequency (fo)” and is expressed in Hz or cycles per second. In the spectral space, the glottal source sound determines the strengths of the fundamental frequency and its integer multiples (harmonics). The glottal wave provides sources for voiced sounds such as vowels (e.g., [a],[e],[i],[o],[u]), diphthongs (i.e., combinations of two vowel sounds), and voiced consonants (e.g., [b],[d],[ɡ],[v],[z],[ð],[ʒ],[ʤ],[w],[n],[m],[r],[j],[ŋ],[l]).

In addition to the glottal source, noisy signals also serve as the sound sources for consonants. Here, air turbulence developed at constricted or obstructed parts of the airway contributes to random (aperiodic) pressure fluctuations over a wide range of frequencies. Among such noisy signals, the one generated through the glottis or immediately above the glottis is called “aspiration noise.” It is characterized by a strong burst of breath that accompanies either the release or the closure of some obstruents. “Frication noise,” on the other hand, is generated by forcing air through a supraglottal constriction created by placing two articulators close together (e.g., constrictions between lower lip and upper teeth, between back of the tongue and soft palate, and between side of the tongue and molars) (Shadle, 1985, 1991). When an airway in the vocal tract is completely closed and then released, “transient noise” is generated. By forming a closure in the vocal tract, a pressure is built up in the mouth behind the closure. As the closure is released, a brief burst of turbulence is produced, which lasts for a few milliseconds.

Some speech sounds may involve more than one sound source. For instance, a voiced fricative combines the glottal source and the frication noise. A breathy voice may come from the glottal source and the aspiration noise, whereas voiceless fricatives can combine two noise sources generated at the glottis and at the supralaryngeal constriction. These sound sources are fed into the vocal-tract filter to create speech sounds.

2.2 Filter

In the source-filter theory, the vocal tract acts as an acoustic filter to modify the source sound. Through this acoustic filter, certain frequency components are passed to the output speech, while the others are attenuated. The characteristics of the filter depend upon the shape of the vocal tract. As a simple case, consider the acoustic characteristics of a uniform tube of length L = 17.5 cm, that is, a standard length for a male vocal tract (see figure 2). At one end, the tube is closed (as the glottis), while, at the other end, it is open (as the mouth). Inside the tube, longitudinal sound waves travel either toward the mouth or toward the glottis. The wave propagates by alternately compressing and expanding the air in the tube segments. By this compression/expansion, the air molecules are slightly displaced from their rest positions. Accordingly, the acoustic air pressure inside the tube changes in time, depending upon the longitudinal displacement of the air along the direction of the traveling wave. The profile of the acoustic air pressure inside the tube is determined by the traveling waves going to the mouth or to the glottis. What is formed here is the “standing wave,” the peak amplitude profile of which does not move in space. The locations at which the absolute value of the displacement amplitude is minimum are called “nodes,” whereas the locations at which it is maximum are called “antinodes.” Since the air molecules cannot vibrate much at the closed end of the tube, the closed end becomes a node. The open end of the tube, on the other hand, becomes an antinode, since the air molecules can move freely there. Various standing waves that satisfy these boundary conditions can be formed. In figure 2, 1/4 (purple), 3/4 (green), and 5/4 (sky blue) waves indicate the first, second, and third resonances, respectively. Depending upon the number of nodes in the tube, the wavelengths of the standing waves are determined as λ = 4L, 4L/3, and 4L/5. The corresponding frequencies are obtained as f = c/λ = 490, 1470, and 2450 Hz, where c = 343 m/s represents the sound speed. These resonant frequencies are called “formants” in phonetics.
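The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch under the idealized assumptions of the text (lossless uniform tube, closed at the glottis, open at the mouth); the function name `quarter_wave_resonances` is chosen here for illustration only.

```python
def quarter_wave_resonances(length_m, count=3, sound_speed=343.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    # The n-th standing wave fits (2n - 1) quarter wavelengths into the tube,
    # so lambda_n = 4L / (2n - 1) and f_n = c / lambda_n.
    return [(2 * n - 1) * sound_speed / (4.0 * length_m)
            for n in range(1, count + 1)]


resonances = quarter_wave_resonances(0.175)  # L = 17.5 cm
print([round(f) for f in resonances])
```

Because every resonance scales as 1/L, a shorter tube shifts all of them upward by the same factor.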

Figure 2. Standing waves of a uniform tube. For a tube having one closed end (glottis) and one open end (mouth), only odd-numbered harmonics are available. 1/4 (purple), 3/4 (green), and 5/4 (sky blue) waves correspond to the first, second, and third resonances (“1/4 wave” means 1/4 of one-cycle waveform is inside the tube).

Next, consider that a source sound is input to this acoustic tube. In the source sound (voiced source or noise, or both), acoustic energy is distributed in a broad range of frequencies. The source sound induces vibrations of the air column inside the tube and produces a sound wave in the external air as the output. The strength at which an input frequency is output from this acoustic filter depends upon the characteristics of the tube. If the input frequency component is close to one of the formants, the tube resonates with the input and propagates the corresponding vibration. Consequently, the frequency components near the formant frequencies are passed to the output at their full strength. If the input frequency component is far from any of these formants, however, the tube does not resonate with the input. Such frequency components are strongly attenuated and achieve only low oscillation amplitudes in the output. In this way, the acoustic tube, or the vocal tract, filters the source sound. This filtering process can be characterized by a transfer function, which describes dependence of the amplification ratio between the input and output acoustic signals on the frequency. Physically, the transfer function is determined by the shape of the vocal tract.
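The frequency-dependent gain described above can be illustrated numerically. The sketch below does not compute the exact transfer function of the tube; it approximates the filter by a cascade of digital two-pole resonators placed at the three tube resonances. The bandwidth (80 Hz), the sampling rate (16 kHz), and the function name `filter_gain` are assumptions made for this illustration.

```python
import cmath
import math


def filter_gain(freq_hz, resonances_hz, bandwidth_hz=80.0, fs=16000.0):
    """Magnitude response of a cascade of two-pole resonators (unit gain at 0 Hz)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius set by the bandwidth
    gain = 1.0
    for f_res in resonances_hz:
        a1 = -2.0 * r * math.cos(2.0 * math.pi * f_res / fs)
        a2 = r * r
        # Each resonator: (1 + a1 + a2) / (1 + a1 z^-1 + a2 z^-2)
        gain *= abs((1.0 + a1 + a2) / (1.0 + a1 * z_inv + a2 * z_inv ** 2))
    return gain


tube_resonances = [490.0, 1470.0, 2450.0]
gain_on_resonance = filter_gain(490.0, tube_resonances)   # input at the first formant
gain_off_resonance = filter_gain(980.0, tube_resonances)  # input between two formants
print(gain_on_resonance, gain_off_resonance)
```

In this toy filter, an input component at the first resonance comes out several times stronger than one lying midway between the first and second resonances, which is the behavior the transfer function summarizes.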

Finally, the sound wave is radiated from the lips of the mouth and the nose. Their radiation characteristics are also included in the vocal-tract transfer function.

2.3 Convolution of the Source and the Filter

Humans are able to control phonation (source generation) and articulation (filtering process) largely independently. The speech sounds are therefore considered as the response of the vocal-tract filter, into which a sound source is fed. To model such source-filter systems for speech production, the sound source, or excitation signal x(t), is often implemented as a periodic impulse train for voiced speech, while white noise is used as a source for unvoiced speech. If the vocal-tract configuration does not change in time, the vocal-tract filter becomes a linear time-invariant (LTI) system, and the output signal y(t) can be expressed by a convolution of the input signal x(t) and the impulse response of the system h(t) as

$$ y(t) = h(t) * x(t), \qquad (1) $$

where the asterisk denotes the convolution. Equation (1), which is described in the time domain, can also be expressed in the frequency domain as

$$ Y(\omega) = H(\omega) X(\omega). \qquad (2) $$

The frequency domain formula states that the speech spectrum Y(ω) is modeled as a product of the source spectrum X(ω) and the spectrum of the vocal-tract filter H(ω). The spectrum of the vocal-tract filter H(ω) is represented by the product of the vocal-tract transfer function T(ω) and the radiation characteristics from the mouth and the nose R(ω), that is, H(ω) = T(ω)R(ω).
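The equivalence of the time-domain and frequency-domain descriptions can be checked numerically on a toy example. In the sketch below, the short impulse train `x` and the decaying impulse response `h` are made-up values, not measured speech data, and the helper names are chosen for illustration. The signals are zero-padded to the length of the convolution output so that the discrete Fourier transform of the linear convolution equals the product of the transforms.

```python
import cmath


def convolve(h, x):
    """Linear convolution y = h * x (time-domain description)."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, h_i in enumerate(h):
        for j, x_j in enumerate(x):
            y[i + j] += h_i * x_j
    return y


def dft(signal, n):
    """Naive n-point discrete Fourier transform of a zero-padded signal."""
    padded = list(signal) + [0.0] * (n - len(signal))
    return [sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


x = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # toy periodic impulse train (source)
h = [1.0, 0.5, 0.25, 0.125]                    # toy decaying impulse response (filter)

y = convolve(h, x)
n = len(y)
Y, H, X = dft(y, n), dft(h, n), dft(x, n)

# Largest deviation between Y and the product H X over all frequency bins
max_error = max(abs(Y[k] - H[k] * X[k]) for k in range(n))
print(max_error)
```

The deviation is at the level of floating-point rounding, so the spectrum of the convolved signal matches the product of the two spectra.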

There exist several ways to estimate the vocal-tract filter H(ω). The most popular approach is inverse filtering, in which autoregressive parameters are estimated from an acoustic speech signal by the method of least squares (Atal & Schroeder, 1978; Markel & Gray, 2013). The transfer function can then be recovered from the estimated autoregressive parameters. In practice, however, inverse filtering is limited to non-nasalized or slightly nasalized vowels. An alternative approach is based upon the measurement of the vocal tract shape. For a human subject, the cross-sectional area of the vocal tract can be measured by X-ray photography or magnetic resonance imaging (MRI). Once the area function of the vocal tract is obtained, the corresponding transfer function can be computed by the so-called transmission line model, which assumes one-dimensional plane-wave propagation inside the vocal tract (Sondhi & Schroeter, 1987; Story et al., 1996).
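The least-squares idea behind the autoregressive approach can be sketched on a deliberately small problem. The example below is not the pole-zero method of the cited works: it fits only a second-order autoregressive model to the impulse response of a single synthetic resonator and recovers the resonance frequency from the fitted coefficients. The resonance (805 Hz), bandwidth (80 Hz), sampling rate (8 kHz), and all function names are assumptions for illustration.

```python
import math


def resonator_impulse_response(f_hz, bw_hz, fs, n):
    """Impulse response of s[n] = -a1 s[n-1] - a2 s[n-2] + delta[n]."""
    r = math.exp(-math.pi * bw_hz / fs)
    a1 = -2.0 * r * math.cos(2.0 * math.pi * f_hz / fs)
    a2 = r * r
    s = []
    for i in range(n):
        value = 1.0 if i == 0 else 0.0
        if i >= 1:
            value -= a1 * s[i - 1]
        if i >= 2:
            value -= a2 * s[i - 2]
        s.append(value)
    return s


def fit_ar2(s):
    """Least-squares estimate of (a1, a2) minimising sum (s[n] + a1 s[n-1] + a2 s[n-2])^2."""
    idx = range(2, len(s))
    r11 = sum(s[n - 1] * s[n - 1] for n in idx)
    r12 = sum(s[n - 1] * s[n - 2] for n in idx)
    r22 = sum(s[n - 2] * s[n - 2] for n in idx)
    b1 = -sum(s[n] * s[n - 1] for n in idx)
    b2 = -sum(s[n] * s[n - 2] for n in idx)
    det = r11 * r22 - r12 * r12          # 2x2 normal equations, solved by Cramer's rule
    return (b1 * r22 - r12 * b2) / det, (r11 * b2 - r12 * b1) / det


fs = 8000.0
signal = resonator_impulse_response(805.0, 80.0, fs, 200)
a1_hat, a2_hat = fit_ar2(signal)

# The pole angle of z^2 + a1 z + a2 gives the resonance frequency.
pole_angle = math.acos(-a1_hat / (2.0 * math.sqrt(a2_hat)))
estimated_resonance_hz = pole_angle * fs / (2.0 * math.pi)
print(estimated_resonance_hz)
```

Real speech requires a higher model order, windowing, and pre-emphasis, none of which is attempted in this sketch.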

Figure 3. (a) Vocal tract area function for a male speaker’s vowel [a]. (b) Transfer function calculated from the area function of (a). (c) Power spectrum of the source sound generated from Liljencrants-Fant model. (d) Power spectrum of the speech signal generated from the source-filter theory.

As an example to illustrate the source-filter modeling, a sound of vowel /a/ is synthesized in figure 3. The vocal tract area function of figure 3 (a) was measured from a male subject by MRI (Story et al., 1996). By the transmission line model, the transfer function H(ω) is obtained as figure 3 (b). The first and the second formants are located at F1 = 805 Hz and F2 = 1205 Hz. By the inverse Fourier transform, the impulse response of the vocal tract system h(t) is derived. As a glottal source sound, the Liljencrants-Fant synthesis model (Fant et al., 1985) is utilized. The fundamental frequency is set to fo = 100 Hz, which gives rise to a sharp peak in the power spectrum in figure 3 (c). Except for the peaks appearing at higher harmonics of fo, the spectral structure of the glottal source is rather flat. As shown in figure 3 (d), convolution of the source signal with the vocal tract filter amplifies the higher harmonics of fo located close to the formants.
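Which harmonics fall near the formants in this example follows from simple arithmetic, sketched below. The function name `nearest_harmonic` is chosen for illustration; the values of fo, F1, and F2 are those quoted in the text.

```python
def nearest_harmonic(formant_hz, fo_hz):
    """Return (harmonic number, harmonic frequency) closest to a formant."""
    k = max(1, round(formant_hz / fo_hz))
    return k, k * fo_hz


fo = 100.0
for name, formant in (("F1", 805.0), ("F2", 1205.0)):
    k, freq = nearest_harmonic(formant, fo)
    print(f"{name} = {formant:.0f} Hz is closest to harmonic {k} at {freq:.0f} Hz")
```

With fo = 100 Hz, the 8th and 12th harmonics lie within 5 Hz of F1 and F2, so these are the components one would expect to be boosted most strongly by the filter.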

Since the source-filter modeling captures essence of the speech production, it has been successfully applied to speech analysis, synthesis, and processing (Atal & Schroeder, 1978; Markel & Gray, 2013). It was Chiba and Kajiyama (1941) who first explained the mechanisms of speech production based on the concept of phonation (source) and articulation (filter). Their idea was combined with Fant’s filter theory (Fant, 1960), which led to the “source-filter theory of vowel production” in the studies of speech production.

So far, the source-filter modeling has been applied only to the glottal source, in which the vocal fold vibrations provide the main source sounds. There are other sound sources, such as the frication noise. In the frication noise, air turbulence is developed at constricted (or obstructed) parts of the airway. Such a random source also excites the resonances of the vocal tract in a similar manner as the glottal source (Stevens, 1999, 2005). Its marked difference from the glottal source is that the filter property is determined by the vocal tract shape downstream from the constriction (or obstruction). For instance, if the constriction is at the lips, there exists no cavity downstream from the constriction, and therefore the acoustic source is radiated directly from the mouth opening with no filtering. When the constriction is upstream from the lips, the shape of the airway between the constriction and the lips determines the filter properties. It should also be noted that the turbulent source, generated at the constriction, depends sensitively on the three-dimensional geometry of the vocal tract. Therefore, the three-dimensional shape of the vocal tract (not the one-dimensional shape of the area function) should be taken into account to model the frication noise (Shadle, 1985, 1991).

3. Resonance Tuning

As an interesting application of the source-filter theory, “resonance tuning” (Sundberg, 1989) is illustrated. In female speech, the first and the second formants lie between 300 and 900 Hz and between 900 and 2,800 Hz, respectively. In soprano singing, the vocal pitch can reach these two ranges. To increase the efficiency of the vocalization at high fo, a soprano singer adjusts the shape of the vocal tract to tune the first or second resonance (R1 or R2) to the fundamental frequency fo. When one of the harmonics of fo coincides with a formant resonance, the resulting acoustic power (and musical success) is enhanced.

Figure 4. Resonance tuning. (a) The same transfer function as figure 3 (b). (b) Power spectrum of the source sound, whose fundamental frequency fo is tuned to the first resonance R1 of the vocal tract. (c) Power spectrum of the speech signal generated from the source-filter theory. (d) Dependence of the amplification rate (i.e., power ratio between the output speech and the input source) on the fundamental frequency fo.

Figure 4 shows an example of the resonance tuning, in which the fundamental frequency is tuned to the first resonance R1 of the vowel /a/ as fo=805Hz. As recognized in the output speech spectrum (figure 4 (c)), the vocal tract filter strongly amplifies the fundamental frequency component of the vocal source, while the other harmonics are attenuated. Since only a single frequency component is emphasized, the output speech sounds like a pure tone. Figure 4 (d) shows dependence of the amplification ratio (i.e., the power ratio between the output speech and the input source) on the fundamental frequency fo. Indeed, the power of the output speech is maximized at the resonance tuning point of fo=805Hz. Without losing the source power, loud voices can be produced with less effort from the singers and, moreover, they are well perceived in a large concert hall over the orchestra (Joliveau et al., 2004).
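The dependence of the amplification ratio on fo can be imitated with a toy model. The sketch below does not reproduce figure 4 (d): it replaces the measured transfer function by a single two-pole resonator at R1 = 805 Hz and assumes a harmonic source whose amplitude falls off as 1/k². The bandwidth, the sampling rate, the source tilt, the number of harmonics, and the function names are all assumptions for illustration.

```python
import cmath
import math


def resonator_power_gain(freq_hz, resonance_hz, bandwidth_hz=80.0, fs=16000.0):
    """Power gain |H|^2 of a single two-pole resonator (unit gain at 0 Hz)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    r = math.exp(-math.pi * bandwidth_hz / fs)
    a1 = -2.0 * r * math.cos(2.0 * math.pi * resonance_hz / fs)
    a2 = r * r
    return abs((1.0 + a1 + a2) / (1.0 + a1 * z_inv + a2 * z_inv ** 2)) ** 2


def amplification(fo_hz, resonance_hz, n_harmonics=6):
    """Output power divided by source power for a harmonic source."""
    source_power = 0.0
    output_power = 0.0
    for k in range(1, n_harmonics + 1):
        p_k = 1.0 / k ** 4  # assumed source tilt: harmonic amplitude ~ 1/k^2
        source_power += p_k
        output_power += p_k * resonator_power_gain(k * fo_hz, resonance_hz)
    return output_power / source_power


r1 = 805.0
candidates = [200.0 + 5.0 * i for i in range(201)]  # fo from 200 Hz to 1200 Hz
best_fo = max(candidates, key=lambda fo: amplification(fo, r1))
print(best_fo, amplification(best_fo, r1))
```

In this toy model the amplification peaks when fo sits at the resonance, with smaller secondary maxima where a higher harmonic of fo meets R1.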

Despite the significant increase in loudness, comprehensibility is sacrificed. With a strong enhancement of the fundamental frequency fo, its higher harmonics are weakened considerably, making it difficult to perceive the formant structure (figure 4 (c)). This explains why it is difficult to identify words sung in the high range by sopranos.

The resonance tuning discussed here has been based on the linear convolution of the source and the filter, which are assumed to be independent from each other. In reality, however, the source and the filter interact with each other. Depending upon the acoustic property of the vocal tract, it facilitates the vocal fold oscillations and makes the vocal source stronger. Consequently, this source-filter interaction can make the output speech sound even louder in addition to the linear resonance effect. Such interaction will be explained in more detail in section 4.

It is of interest to note that some animals such as songbirds and gibbons utilize the technique of resonance tuning in their vocalizations (Koda et al., 2012; Nowicki, 1987; Riede et al., 2006). It has been found through X-ray filming as well as via heliox experiments that these animals adjust the vocal tract resonance to track the fundamental frequency fo. This may facilitate acoustic communication by increasing the loudness of their vocalization. Again, higher harmonic components, which are needed to emphasize the formants in human language communications, are suppressed. Whether the animals utilize formant information in their communications is under debate (Fitch, 2010; Lieberman, 1977) but, at least in this context, production of a loud sound is more advantageous for long-distance alarm calls and pure-tone singing of animals.

4. Source-Filter Interaction

The linear source–filter theory, under which speech is represented as a convolution of the source and the filter, is based upon the assumption that the vocal fold vibrations as well as the turbulent noise sources are only weakly influenced by the vocal tract. Such an assumption is, however, valid mostly for male adult speech. The actual process of speech production is nonlinear. The vocal fold oscillations are due to combined effects of pressure, airflow, tissue elasticity, and tissue collision. It is natural that such a complex system obeys nonlinear equations of motion. Aerodynamics inside the glottis and the vocal tract is also governed by nonlinear equations in a strict sense. Moreover, there exists a mutual interaction between the source and the filter (Flanagan, 1968; Lucero et al., 2012; Rothenberg, 1981; Titze, 2008; Titze & Alipour, 2006). First, the source sound, which is generated from the vocal folds, is influenced by the vocal tract, since the vocal tract determines the pressure above the vocal folds and thereby changes the aerodynamics of the glottal flow. As described in section 2.3, the turbulent source is also very sensitive to the vocal tract geometry. Second, the source sound, which then propagates through the vocal tract, is not only radiated from the mouth but is also partially reflected back to the glottis through the vocal tract. Such reflection can influence the vocal fold oscillations, especially when the fundamental frequency or one of its harmonics lies close to one of the vocal tract resonances, for instance, in singing. The strong acoustic feedback makes the interrelation between the source and the filter nonlinear and induces various voice instabilities, for example, sudden pitch jump, subharmonics, resonance, quenching, and chaos (Hatzikirou et al., 2006; Lucero et al., 2012; Migimatsu & Tokuda, 2019; Titze et al., 2008).

Figure 5. Example of a glissando singing. A male subject glided the fundamental frequency (fo) from 120 Hz to 350 Hz and then back. The first resonance (R1=270Hz) is indicated by a black bold line. The pitch jump occurred when fo crossed R1.

Figure 5 shows a spectrogram that demonstrates such a pitch jump. The horizontal axis represents time, while the vertical axis represents frequency; the spectral power of the singing voice is displayed at each time-frequency point. In this recording, a male singer glided his pitch in a certain frequency range. Accordingly, the fundamental frequency increases from 120 Hz to 350 Hz and then decreases back to 120 Hz. Around 270 Hz, the fundamental frequency or one of its higher harmonics crosses one of the resonances of the vocal tract (black bold line of figure 5), and the pitch jumps abruptly. At such a frequency crossing point, acoustic reflection from the vocal tract to the vocal folds becomes very strong and non-negligible. The source-filter interaction has two aspects (Story et al., 2000). On one side, the vocal tract acoustics facilitates the vocal fold oscillations and contributes to the production of a loud vocal sound as discussed in the resonance tuning (section 3). On the other side, the vocal tract acoustics inhibits the vocal fold oscillations and consequently induces a voice instability. For instance, the vocal fold oscillation can stop suddenly or spontaneously jump to another fundamental frequency, as exemplified by the glissando singing of figure 5. To avoid such voice instabilities, singers must weaken the level of the acoustic coupling, possibly by adjusting the epilarynx, whenever the frequency crossing takes place (Lucero et al., 2012; Titze et al., 2008).

5. Conclusions

In summary, the source-filter theory has been described as a basic framework to model human speech production. The source is generated from the vocal fold oscillations and/or the turbulent airflows developed above the glottis. The vocal tract functions as a filter to modify the spectral structure of the source sounds. This filtering mechanism has been explained in terms of the resonances of the acoustical tube. Independence between the source and the filter is vital for language-based acoustic communications in humans, which require flexible maneuvering of the vocal tract configuration to express various phonemes sequentially and smoothly (Fitch, 2010; Lieberman, 1977). As an application of the source-filter theory, resonance tuning is explained as a technique utilized by soprano singers and some animals. Finally, the existence of the source-filter interaction has been described. It is inevitable that the source sound is aerodynamically influenced by the vocal tract, since the two are located close to each other. Moreover, the acoustic pressure wave reflecting back from the vocal tract to the glottis influences the vocal fold oscillations and can induce various voice instabilities. The source-filter interaction may become strong when the fundamental frequency or one of its higher harmonics crosses one of the vocal tract resonances, for example, in singing.

Further Reading

  • Atal, B. S., & Schroeder, M. (1978). Linear prediction analysis of speech based on a pole-zero representation. The Journal of the Acoustical Society of America, 64(5), 1310–1318.
  • Chiba, T., & Kajiyama, M. (1941). The vowel: Its nature and structure. Tokyo, Japan: Kaiseikan.
  • Fant, G. (1960). Acoustic theory of speech production. The Hague, The Netherlands: Mouton.
  • Lieberman, P. (1977). Speech physiology and acoustic phonetics: An introduction. New York: Macmillan.
  • Markel, J. D., & Gray, A. J. (2013). Linear prediction of speech (Vol. 12). New York: Springer Science & Business Media.
  • Stevens, K. N. (1999). Acoustic phonetics. Cambridge, MA: MIT Press.
  • Sundberg, J. (1989). The science of singing voice. DeKalb, IL: Northern Illinois University Press.
  • Titze, I. R. (1994). Principles of voice production. Englewood Cliffs, NJ: Prentice Hall.
  • Titze, I. R., & Alipour, F. (2006). The myoelastic aerodynamic theory of phonation. Iowa, IA: National Center for Voice and Speech.

References

  • Atal, B. S., & Schroeder, M. (1978). Linear prediction analysis of speech based on a pole-zero representation. The Journal of the Acoustical Society of America, 64(5), 1310–1318.
  • Chiba, T., & Kajiyama, M. (1941). The vowel: Its nature and structure. Tokyo, Japan: Kaiseikan.
  • Fant, G. (1960). Acoustic theory of speech production. The Hague, The Netherlands: Mouton.
  • Fant, G., Liljencrants, J., & Lin, Q. (1985). A four-parameter model of glottal flow. Speech Transmission Laboratory. Quarterly Progress and Status Report, 26(4), 1–13.
  • Fitch, W. T. (2010). The evolution of language. Cambridge, UK: Cambridge University Press.
  • Flanagan, J. L. (1968). Source-system interaction in the vocal tract. Annals of the New York Academy of Sciences, 155(1), 9–17.
  • Hatzikirou, H., Fitch, W. T., & Herzel, H. (2006). Voice instabilities due to source-tract interactions. Acta Acustica united with Acustica, 92, 468–475.
  • Joliveau, E., Smith, J., & Wolfe, J. (2004). Acoustics: Tuning of vocal tract resonance by sopranos. Nature, 427(6970), 116.
  • Koda, H., Nishimura, T., Tokuda, I. T., Oyakawa, C., Nihonmatsu, T., & Masataka, N. (2012). Soprano singing in gibbons. American Journal of Physical Anthropology, 149(3), 347–355.
  • Lieberman, P. (1977). Speech physiology and acoustic phonetics: An introduction. New York: Macmillan.
  • Lucero, J. C., Lourenço, K. G., Hermant, N., Van Hirtum, A., & Pelorson, X. (2012). Effect of source–tract acoustical coupling on the oscillation onset of the vocal folds. The Journal of the Acoustical Society of America, 132(1), 403–411.
  • Markel, J. D., & Gray, A. J. (2013). Linear prediction of speech (Vol. 12). New York: Springer Science & Business Media.
  • Migimatsu, K., & Tokuda, I. T. (2019). Experimental study on nonlinear source–filter interaction using synthetic vocal fold models. The Journal of the Acoustical Society of America, 146(2), 983–997.
  • Nowicki, S. (1987). Vocal tract resonances in oscine bird sound production: Evidence from birdsongs in a helium atmosphere. Nature, 325(6099), 53–55.
  • Riede, T., Suthers, R. A., Fletcher, N. H., & Blevins, W. E. (2006). Songbirds tune their vocal tract to the fundamental frequency of their song. Proceedings of the National Academy of Sciences, 103(14), 5543–5548.
  • Rothenberg, M. (1981). The voice source in singing. In J. Sundberg (Ed.), Research aspects on singing (pp. 15–33). Stockholm, Sweden: Royal Swedish Academy of Music.
  • Shadle, C. H. (1985). The acoustics of fricative consonants [Doctoral thesis]. Cambridge, MA: Massachusetts Institute of Technology, released as MIT-RLE Technical Report No. 506.
  • Shadle, C. H. (1991). The effect of geometry on source mechanisms of fricative consonants. Journal of Phonetics, 19(3–4), 409–424.
  • Sondhi, M., & Schroeter, J. (1987). A hybrid time-frequency domain articulatory speech synthesizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(7), 955–967.
  • Stevens, K. N. (1999). Acoustic phonetics. Cambridge, MA: MIT Press.
  • Stevens, K. N. (2005). The acoustic/articulatory interface. Acoustical Science and Technology, 26(5), 410–417.
  • Story, B. H., Laukkanen, A.-M., & Titze, I. R. (2000). Acoustic impedance of an artificially lengthened and constricted vocal tract. Journal of Voice, 14(4), 455–469.
  • Story, B. H., Titze, I. R., & Hoffman, E. A. (1996). Vocal tract area functions from magnetic resonance imaging. The Journal of the Acoustical Society of America, 100(1), 537–554.
  • Sundberg, J. (1989). The science of singing voice. DeKalb, IL: Northern Illinois University Press.
  • Titze, I. R. (1994). Principles of voice production. Englewood Cliffs, NJ: Prentice Hall.
  • Titze, I. R. (2008). Nonlinear source–filter coupling in phonation: Theory. The Journal of the Acoustical Society of America, 123(5), 2733–2749.
  • Titze, I. R., & Alipour, F. (2006). The myoelastic aerodynamic theory of phonation. Iowa, IA: National Center for Voice and Speech.
  • Titze, I., Riede, T., & Popolo, P. (2008). Nonlinear source–filter coupling in phonation: Vocal exercises. The Journal of the Acoustical Society of America, 123(4), 1902–1915.