1-10 of 10 Results

  • Keywords: speech production

Article

Formants  

Daniel Aalto, Jarmo Malinen, and Martti Vainio

Formant frequencies are the positions of the local maxima of the power spectral envelope of a sound signal. They arise from acoustic resonances of the vocal tract air column, and they provide substantial information about both consonants and vowels. In running speech, formants are crucial in signaling movements in the place of articulation. Formants are normally defined as accumulations of acoustic energy estimated from the spectral envelope of a signal. However, not all such peaks can be related to resonances in the vocal tract, as they can be caused by the acoustic properties of the environment outside the vocal tract, and some resonances are not visible as peaks in the spectrum. Such formants are called spurious and latent, respectively. By analogy, spectral maxima of synthesized speech are called formants, although they arise from a digital filter. Conversely, speech processing algorithms can detect formants in natural or synthetic speech by modeling its power spectral envelope with a digital filter. Such detection is most successful for male speech with a low fundamental frequency, where many harmonic overtones excite each of the vocal tract resonances lying at higher frequencies. For the same reason, reliable formant detection from high-pitched female or child speech is inherently difficult, and many algorithms fail to faithfully detect the formants corresponding to the lowest vocal tract resonant frequencies.
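The filter-based detection described above can be illustrated with a small linear-predictive (LPC) analysis. The sketch below is not from the article: it fits an all-pole model to a synthetic two-resonance signal (a stand-in for a vowel) and reads formant candidates off the pole angles; the model order and resonance values are illustrative assumptions.

```python
import numpy as np

def lpc_formants(signal, sr, order=8):
    """Estimate formant frequencies by fitting an all-pole (LPC) model to
    the power spectral envelope and converting pole angles to hertz."""
    s = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    s = s * np.hamming(len(s))
    # Autocorrelation method: solve the Yule-Walker equations for the
    # predictor coefficients a, so that A(z) = 1 - sum_k a_k z^-k.
    r = np.correlate(s, s, mode="full")[len(s) - 1 : len(s) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]            # one pole per conjugate pair
    freqs = np.angle(poles) * sr / (2 * np.pi)   # pole angle -> frequency (Hz)
    return np.sort(freqs[freqs > 50.0])          # discard near-DC poles

# Synthetic stand-in for a vowel: two damped resonances at 700 and 1200 Hz.
sr = 8000
t = np.arange(0, 0.05, 1.0 / sr)
sig = (np.exp(-300 * t) * np.sin(2 * np.pi * 700 * t)
       + 0.5 * np.exp(-300 * t) * np.sin(2 * np.pi * 1200 * t))
formants = lpc_formants(sig, sr)   # should include values near 700 and 1200 Hz
```

A common heuristic sets the model order to roughly 2 + sr/1000; as the abstract notes, high-pitched voices remain difficult because widely spaced harmonics sample the spectral envelope only sparsely.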

Article

Psycholinguistic Approaches to Morphology: Production  

Benjamin V. Tucker

Speech production is an important aspect of linguistic competence, and an attempt to understand linguistic morphology without speech production would be incomplete. A central research question develops from this perspective: what is the role of morphology in speech production? Speech production researchers collect many different types of data, much of which has informed how linguists and psycholinguists characterize the role of linguistic morphology in speech production. Models of speech production play an important role in the investigation of linguistic morphology, providing a framework within which researchers can explore the role of morphology. However, models of speech production generally focus on different aspects of the production process. They are split between phonetic models (which attempt to understand how the brain creates the motor commands for uttering and articulating speech) and psycholinguistic models (which attempt to understand the cognitive processes and representations underlying production). Models that merge these two types have the potential to let researchers make specific predictions about the effects of morphology on speech production. Many studies have explored models of speech production, but investigation of the role of morphology, and of how morphological properties may be represented in merged speech production models, remains limited.

Article

Sociophonetics  

Gerard Docherty

Sociophonetic research is located at the interface of sociolinguistics and experimental phonetics. Its primary focus is to shed new light on the social-indexical phonetic properties of speech. This work has revealed a wide range of phonetic parameters that map systematically onto social factors relevant to speakers and listeners, and has shown that many of these parameters involve particularly fine-grained control of both the spatial and temporal dimensions of speech production. Recent methodological developments in acoustic and articulatory methods have yielded new insights into the nature of sociophonetic variation, at the scale of entire speech communities as well as in the detailed speech production patterns of individual speakers. The key theoretical dimension of sociophonetic research is to consider how models of speech production, processing, and acquisition should be informed by rapidly increasing knowledge of the ubiquity of social-indexical phonetic variation carried by the speech signal. In particular, this work focuses on inferring from the performance of speakers and listeners how social-indexical phonetic properties are interwoven into phonological representation alongside those properties associated with the transmission and interpretation of lexical-propositional information.

Article

The Motor Theory of Speech Perception  

D. H. Whalen

The Motor Theory of Speech Perception is a proposed explanation of the fundamental relationship between the way speech is produced and the way it is perceived. Associated primarily with the work of Liberman and colleagues, it posited the active participation of the motor system in the perception of speech. Early versions of the theory contained elements that later proved untenable, such as the expectation that the neural commands to the muscles (as seen in electromyography) would be more invariant than the acoustics. Support drawn from categorical perception (in which discrimination is quite poor within linguistic categories but excellent across boundaries) was called into question by studies showing means of improving within-category discrimination and finding similar results for nonspeech sounds and for animals perceiving speech. Evidence for motor involvement in perceptual processes nonetheless continued to accrue, and related motor theories have been proposed. Neurological and neuroimaging results have yielded a great deal of evidence consistent with variants of the theory, but they highlight the issue that there is no single “motor system,” and so different components appear in different contexts. Assigning the appropriate amount of effort to the various systems that interact to result in the perception of speech is an ongoing process, but it is clear that some of the systems will reflect the motor control of speech.

Article

Second Language Phonetics  

Ocke-Schwen Bohn

The study of second language phonetics is concerned with three broad and overlapping research areas: the characteristics of second language speech production and perception, the consequences of perceiving and producing nonnative speech sounds with a foreign accent, and the causes and factors that shape second language phonetics. Second language learners and bilinguals typically produce and perceive the sounds of a nonnative language in ways that are different from native speakers. These deviations from native norms can be attributed largely, but not exclusively, to the phonetic system of the native language. Non-nativelike speech perception and production may have both social consequences (e.g., stereotyping) and linguistic–communicative consequences (e.g., reduced intelligibility). Research on second language phonetics over the past ca. 30 years has resulted in a fairly good understanding of causes of nonnative speech production and perception, and these insights have to a large extent been driven by tests of the predictions of models of second language speech learning and of cross-language speech perception. It is generally accepted that the characteristics of second language speech are predominantly due to how second language learners map the sounds of the nonnative to the native language. This mapping cannot be entirely predicted from theoretical or acoustic comparisons of the sound systems of the languages involved, but has to be determined empirically through tests of perceptual assimilation. The most influential learner factors which shape how a second language is perceived and produced are the age of learning and the amount and quality of exposure to the second language. 
A very important and far-reaching finding from research on second language phonetics is that age effects are not due to neurological maturation, which could result in attrition of the phonetic learning ability, but rather to the way phonetic categories develop as a function of experience with the surrounding sound systems.

Article

Phonetics  

D. H. Whalen

Phonetics is the branch of linguistics that deals with the physical realization of meaningful distinctions in spoken language. Phoneticians study the anatomy and physics of sound generation, acoustic properties of the sounds of the world’s languages, the features of the signal that listeners use to perceive the message, and the brain mechanisms involved in both production and perception. Therefore, phonetics connects most directly to phonology and psycholinguistics, but it also engages a range of disciplines that are not unique to linguistics, including acoustics, physiology, biomechanics, hearing, evolution, and many others. Early theorists assumed that phonetic implementation of phonological features was universal, but it has become clear that languages differ in their phonetic spaces for phonological elements, with systematic differences in acoustics and articulation. Such language-specific details place phonetics solidly in the domain of linguistics; any complete description of a language must include its specific phonetic realization patterns. The description of what phonetic realizations are possible in human language continues to expand as more languages are described; many of the under-documented languages are endangered, lending urgency to the phonetic study of the world’s languages. Phonetic analysis can consist of transcription, acoustic analysis, measurement of speech articulators, and perceptual tests, with recent advances in brain imaging adding detail at the level of neural control and processing. Because of its dual nature as a component of a linguistic system and a set of actions in the physical world, phonetics has connections to many other branches of linguistics, including not only phonology but syntax, semantics, sociolinguistics, and clinical linguistics as well. Speech perception has been shown to integrate information from both vision and tactile sensation, indicating an embodied system. 
Sign language, though primarily visual, has adopted the term “phonetics” to represent the realization component, highlighting the linguistic nature both of phonetics and of sign language. Such diversity offers many avenues for studying phonetics, but it presents challenges to forming a comprehensive account of any language’s phonetic system.

Article

Acquisition of L1 Phonology in the Romance Languages  

Yvan Rose, Laetitia Almeida, and Maria João Freitas

The field of study on the acquisition of phonological productive abilities by first-language learners in the Romance languages has been largely focused on three main languages: French, Portuguese, and Spanish, including various dialects of these languages spoken in Europe as well as in the Americas. In this article, we provide a comparative survey of this literature, with an emphasis on representational phonology. We also include in our discussion observations from the development of Catalan and Italian, and mention areas where these languages, as well as Romanian, another major Romance language, would provide welcome additions to our cross-linguistic comparisons. Together, the various studies we summarize reveal intricate patterns of development, in particular concerning the acquisition of consonants across different positions within the syllable and the word and in relation to stress, documented for both monolingual and bilingual first-language learners. The patterns observed across the different languages and dialects can generally be traced to formal properties of phone distributions, as entailed by mainstream theories of phonological representation, with variations also predicted by more functional aspects of speech, including phonetic factors and usage frequency. These results call for further empirical studies of phonological development, in particular concerning Romanian, in addition to Catalan and Italian, whose phonological and phonetic properties offer compelling grounds for the formulation and testing of models of phonology and phonological development.

Article

Articulatory Phonology  

Marianne Pouplier

One of the most fundamental problems in research on spoken language is to understand how the categorical, systemic knowledge that speakers have in the form of a phonological grammar maps onto the continuous, high-dimensional physical speech act that transmits the linguistic message. The invariant units of phonological analysis have no invariant analogue in the signal: any given phoneme can manifest itself in many possible variants, depending on context, speech rate, utterance position, and the like, and the acoustic cues for a given phoneme are spread out over time across multiple linguistic units. Speakers and listeners are highly knowledgeable about the lawfully structured variation in the signal, and they skillfully exploit articulatory and acoustic trading relations when speaking and perceiving. For the scientific description of spoken language understanding, this association between abstract, discrete categories and continuous speech dynamics remains a formidable challenge. Articulatory Phonology and the associated Task Dynamic model present one particular proposal on how to step up to this challenge using the mathematics of dynamical systems. The central insight is that spoken language is fundamentally based on the production and perception of linguistically defined patterns of motion. In Articulatory Phonology, the primitive units of phonological representation are called gestures. Gestures are defined by linear second-order differential equations, giving them inherent spatial and temporal specifications. Gestures control the vocal tract at a macroscopic level, harnessing the many degrees of freedom in the vocal tract into low-dimensional control units. Phonology, in this model, thus directly governs the spatial and temporal orchestration of vocal tract actions.
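The gestural dynamics described above can be sketched numerically. The following is an illustrative sketch (not from the article): a critically damped second-order system drives a hypothetical tract variable toward its target; the variable name ("lip aperture"), stiffness, and time step are all assumed values.

```python
import numpy as np

def gesture(x0, target, k=400.0, dt=0.001, steps=400):
    """Drive a tract variable toward its target with a critically damped
    second-order (mass-spring) system: x'' = -k*(x - target) - b*x'."""
    b = 2.0 * np.sqrt(k)      # critical damping: no overshoot, no oscillation
    x, v = x0, 0.0            # start at rest
    traj = []
    for _ in range(steps):    # semi-implicit Euler integration
        a = -k * (x - target) - b * v
        v += a * dt
        x += v * dt
        traj.append(x)
    return np.array(traj)

# Hypothetical "lip aperture" gesture: closing from 10 mm toward 0 mm.
traj = gesture(x0=10.0, target=0.0)
```

Because the system is critically damped, the trajectory approaches the target smoothly without overshooting; in the Task Dynamic model, the stiffness and target are a gesture's parameters, while timing is governed by gestural activation, which this sketch does not model.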

Article

The Source–Filter Theory of Speech  

Isao Tokuda

In the source–filter theory, the mechanism of speech production is described as a two-stage process: (a) the airflow coming from the lungs induces tissue vibrations of the vocal folds (i.e., two small muscular folds located in the larynx) and generates the “source” sound, while turbulent airflows created at the glottis or in the vocal tract provide additional, noisy sound sources; (b) the spectral structures of these source sounds are then shaped by the vocal tract “filter.” Through the filtering process, frequency components corresponding to the vocal tract resonances are amplified, while the other frequency components are diminished. The source sound mainly characterizes the vocal pitch (i.e., fundamental frequency), while the filter forms the timbre. The source–filter theory provides a very accurate description of normal speech production and has been applied successfully to speech analysis, synthesis, and processing. Separate control of the source (phonation) and the filter (articulation) is advantageous for acoustic communication, especially for human language, which requires the expression of various phonemes realized by flexible maneuvering of the vocal tract configuration. Based on this idea, articulatory phonetics focuses on the positions of the vocal organs to describe the produced speech sounds. The source–filter theory also elucidates the mechanism of “resonance tuning,” a specialized way of singing: to increase the efficiency of vocalization, soprano singers adjust the vocal tract filter to tune one of its resonances to the vocal pitch. Consequently, the main source sound is strongly amplified to produce a loud voice that is well perceived in a large concert hall over the orchestra. It should be noted that the source–filter theory rests on the assumption that the source and the filter are independent of each other. Under certain conditions, however, the source and the filter interact with each other. 
The source sound is influenced by the vocal tract geometry and by the acoustic feedback from the vocal tract. Such source–filter interaction induces various voice instabilities, for example, sudden pitch jump, subharmonics, resonance, quenching, and chaos.
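The two-stage process can be simulated in a few lines. The sketch below is illustrative, not from the article: the source is idealized as an impulse train at the pitch period, and the filter is a single second-order resonator standing in for one vocal tract resonance; all parameter values are assumptions.

```python
import numpy as np

sr, f0, dur = 16000, 120.0, 0.5          # sample rate, pitch, duration (assumed)
n = int(sr * dur)

# Stage (a): the source, idealized as one glottal impulse per pitch period.
source = np.zeros(n)
source[::int(sr / f0)] = 1.0

# Stage (b): the filter, a single second-order resonator standing in for
# one vocal tract resonance (center 700 Hz, bandwidth ~100 Hz).
fc, bw = 700.0, 100.0
r = np.exp(-np.pi * bw / sr)             # pole radius set by the bandwidth
theta = 2.0 * np.pi * fc / sr            # pole angle set by the center frequency
a1, a2 = -2.0 * r * np.cos(theta), r * r
out = np.zeros(n)
for i in range(n):                       # y[i] = x[i] - a1*y[i-1] - a2*y[i-2]
    out[i] = source[i]
    if i >= 1:
        out[i] -= a1 * out[i - 1]
    if i >= 2:
        out[i] -= a2 * out[i - 2]

# The output keeps the source's harmonics (multiples of f0), but the
# harmonic closest to the resonance is amplified the most.
spec = np.abs(np.fft.rfft(out))
peak_hz = np.argmax(spec) * sr / n       # strongest harmonic, near 700 Hz
```

This separation also illustrates resonance tuning: shifting fc toward a harmonic of f0 boosts that harmonic, which is the effect soprano singers exploit. Real source–filter interaction, as the abstract notes, violates the independence assumed here.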

Article

Discriminative Learning and the Lexicon: NDL and LDL  

Yu-Ying Chuang and R. Harald Baayen

Naive discriminative learning (NDL) and linear discriminative learning (LDL) are simple computational algorithms for lexical learning and lexical processing. Both NDL and LDL assume that learning is discriminative, driven by prediction error, and that it is this error that calibrates the association strength between input and output representations. Both words’ forms and their meanings are represented by numeric vectors, and mappings between forms and meanings are set up. For comprehension, form vectors predict meaning vectors. For production, meaning vectors map onto form vectors. These mappings can be learned incrementally, approximating how children learn the words of their language. Alternatively, optimal mappings representing the end state of learning can be estimated. The NDL and LDL algorithms are incorporated in a computational theory of the mental lexicon, the ‘discriminative lexicon’. The model shows good performance both with respect to production and comprehension accuracy, and for predicting aspects of lexical processing, including morphological processing, across a wide range of experiments. Since, mathematically, NDL and LDL implement multivariate multiple regression, the ‘discriminative lexicon’ provides a cognitively motivated statistical modeling approach to lexical processing.