Sign language phonology is the abstract grammatical component in which primitive structural units are combined to create an infinite number of meaningful utterances. Although the notion of phonology is traditionally based on sound systems, phonology also includes the equivalent component of the grammar in sign languages, because it is tied to grammatical organization rather than to particular content. On this definition, the term covers all phenomena organized by constituents such as the syllable, the phonological word, and higher-level prosodic units, as well as structural primitives such as features, timing units, and autosegmental tiers, regardless of whether the content is vocal or manual. The units of sign language phonology and their phonotactics therefore provide opportunities to observe the interaction between phonology and other components of the grammar in a different communication channel, or modality. This comparison allows us to better understand how the modality of a language influences its phonological system.
Diane Brentari, Jordan Fenlon, and Kearsy Cormier
The tongue is composed entirely of soft tissue: muscle, fat, and connective tissue. This unusual composition and the tongue’s 3D muscle fiber orientation result in many degrees of freedom. The lack of bones and cartilage means that muscle shortening creates deformations, particularly local deformations, as the tongue moves into and out of speech gestures. The tongue is also surrounded by the hard structures of the oral cavity, which both constrain its motion and support the rapid small deformations that create speech sounds. Anatomical descriptors and categories of tongue muscles do not correlate with tongue function, because speech movements use finely controlled co-contractions of antagonist muscles to move the oral structures during speech. Measurements of tongue muscle volume indicate that four muscles, the genioglossus, verticalis, transversus, and superior longitudinal, occupy the bulk of the tongue. They also comprise a functional muscle grouping that can shorten the tongue in the x, y, and z directions. Various 3D muscle shortening patterns produce large- or small-scale deformations in all directions of motion. The interdigitation of the tongue’s muscles is advantageous in allowing co-contraction of antagonist muscles and providing nimble deformational changes to move the tongue toward and away from any position.
Daniel Aalto, Jarmo Malinen, and Martti Vainio
Formant frequencies are the positions of the local maxima of the power spectral envelope of a sound signal. They arise from acoustic resonances of the vocal tract air column, and they provide substantial information about both consonants and vowels. In running speech, formants are crucial in signaling movements with respect to place of articulation. Formants are normally identified as peaks of acoustic energy in the spectral envelope estimated from a signal. However, not all such peaks can be related to resonances in the vocal tract, as they can be caused by the acoustic properties of the environment outside the vocal tract, and sometimes resonances are not visible in the spectrum. Such formants are called spurious and latent, respectively. By analogy, spectral maxima of synthesized speech are called formants, although they arise from a digital filter. Conversely, speech processing algorithms can detect formants in natural or synthetic speech by modeling its power spectral envelope with a digital filter. Such detection is most successful for male speech with a low fundamental frequency, where many harmonic overtones excite each of the vocal tract resonances lying at higher frequencies. For the same reason, reliable formant detection is inherently difficult for high-pitched female or child speech, and many algorithms fail to faithfully detect the formants corresponding to the lowest vocal tract resonant frequencies.
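The filter-based detection described above can be illustrated with a small linear-predictive-coding (LPC) sketch. This is a minimal illustration under simplifying assumptions, not a production formant tracker: the "vowel" is synthesized as the impulse response of two resonators placed at hypothetical formant frequencies (500 Hz and 1500 Hz), so a low-order all-pole fit can recover them; real speech would additionally require pre-emphasis, windowing, and a higher model order.

```python
import numpy as np

FS = 8000  # sampling rate in Hz

def resonate(x, freq, bw, fs=FS):
    """Pass a signal through a two-pole resonator (one 'formant')."""
    r = np.exp(-np.pi * bw / fs)                    # pole radius from bandwidth
    a1 = 2.0 * r * np.cos(2.0 * np.pi * freq / fs)  # pole angle from frequency
    a2 = -r * r
    y = np.zeros(len(x))
    for i in range(len(x)):
        y[i] = x[i] + a1 * y[i - 1] + a2 * y[i - 2]
    return y

def lpc_formants(signal, order, fs=FS):
    """Estimate formants as the pole angles of an all-pole (LPC) model."""
    n = len(signal)
    ac = np.correlate(signal, signal, "full")[n - 1:]  # autocorrelation
    R = np.array([[ac[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, ac[1:order + 1])            # Yule-Walker equations
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[poles.imag > 0]                      # one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2.0 * np.pi)
    return sorted(f for f in freqs if 50.0 < f < fs / 2 - 50.0)

# Synthesize an impulse response with resonances at 500 Hz and 1500 Hz,
# then recover them with a 4th-order LPC fit (two poles per resonance).
impulse = np.zeros(2000)
impulse[0] = 1.0
vowel = resonate(resonate(impulse, 500.0, 60.0), 1500.0, 90.0)
print([round(f) for f in lpc_formants(vowel, order=4)])
```

Because the synthetic signal really is an all-pole impulse response, the fit recovers the two resonances almost exactly; with a high-pitched harmonic source instead, the sparse sampling of the envelope would make the estimates far less reliable, as the abstract notes.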
Child phonology refers to virtually every phonetic and phonological phenomenon observable in the speech productions of children, including babbles. This includes qualitative and quantitative aspects of babbled utterances as well as behaviors such as the deletion or modification of the sounds and syllables contained in the adult (target) forms that the child is trying to reproduce in his or her spoken utterances. This research is also increasingly concerned with issues in speech perception, a field of investigation that has traditionally followed its own course; it is only recently that the two fields have started to converge. The recent history of research on child phonology, the theoretical approaches and debates surrounding it, as well as the research methods and resources that have been employed to address these issues empirically, parallel the evolution of phonology, phonetics, and psycholinguistics as general fields of investigation. Child phonology contributes important observations, often organized in terms of developmental time periods, which can extend from the child’s earliest babbles to the stage when he or she masters the sounds, sound combinations, and suprasegmental properties of the ambient (target) language. Central debates within the field of child phonology concern the nature and origins of phonological representations as well as the ways in which they are acquired by children. Since the mid-1900s, the most central approaches to these questions have tended to fall on each side of the general divide between generative vs. functionalist (usage-based) approaches to phonology. Traditionally, generative approaches have embraced a universal stance on phonological primitives and their organization within hierarchical phonological representations, assumed to be innately available as part of the human language faculty.
In contrast to this, functionalist approaches have utilized flatter (non-hierarchical) representational models and rejected nativist claims about the origin of phonological constructs. Since the beginning of the 1990s, this divide has been blurred significantly, both through the elaboration of constraint-based frameworks that incorporate phonetic evidence, from both speech perception and production, as part of accounts of phonological patterning, and through the formulation of emergentist approaches to phonological representation. Within this context, while controversies remain concerning the nature of phonological representations, debates are fueled by new outlooks on factors that might affect their emergence, including the types of learning mechanisms involved, the nature of the evidence available to the learner (e.g., perceptual, articulatory, and distributional), as well as the extent to which the learner can abstract away from this evidence. In parallel, recent advances in computer-assisted research methods and data availability, especially within the context of the PhonBank project, offer researchers unprecedented support for large-scale investigations of child language corpora. This combination of theoretical and methodological advances provides new and fertile grounds for research on child phonology and related implications for phonological theory.
A phonological inventory is a repertoire of contrastive articulatory or manual gestures shared by a community of users. Whether spoken or signed, all human languages have a phonological inventory. In spoken languages, the phonological inventory comprises a set of segments (consonants and vowels) and suprasegmentals (stress and intonation) that are linguistically contrastive, either lexically or grammatically, in a particular language or one of its dialects. Sign languages also have phonological inventories, which include a set of linguistically contrastive signs made from movement, hand shape, and location. The study of phonological inventories is interesting because they tell us about the distribution, frequency, and diversity of gestures that individuals acquire and produce in the world’s 7,000 or so languages. Their study has also raised important empirical questions about the comparability of linguistic concepts across different languages and modalities, the use of statistics and sampling in quantitative approaches to comparative linguistics, and the study of language ontogeny and phylogeny over the course of language evolution. As such, some recent research highlights include the following: quantitative approaches suggest causal relationships between phonological inventory composition and gene-culture and language-environment coevolution; the study of de novo sign languages provides important insights into the emergence of phonology; and comparative animal communication studies suggest evolutionary speech precursors in phonological repertoires of nonhuman primates, and potentially in extinct hominids, including Neanderthals.
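The kind of distributional question raised above, namely how frequent a given segment is across inventories, can be sketched in a few lines. The inventories below are toy examples invented for illustration; real cross-linguistic work would draw on a curated database such as PHOIBLE.

```python
from collections import Counter

# Toy inventories for three hypothetical languages (illustrative only).
inventories = {
    "LangA": {"p", "t", "k", "m", "n", "a", "i", "u"},
    "LangB": {"t", "k", "b", "n", "s", "a", "i", "o"},
    "LangC": {"p", "t", "m", "n", "s", "a", "e", "i"},
}

# In how many of the sampled inventories does each segment occur?
counts = Counter(seg for inv in inventories.values() for seg in inv)
freq = {seg: n / len(inventories) for seg, n in counts.items()}

print(freq["t"], freq["a"], freq["p"])
```

Even this toy sample shows the familiar pattern the quantitative literature investigates at scale: some segments (here /t/, /a/) recur in every inventory while others are rare, and sampling choices directly affect the resulting frequency estimates.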
Susan Rvachew and Abdulsalam Alhaidary
Babbling is made up of meaningless speechlike syllables called canonical syllables. Canonical syllables are characterized by the coordination of consonantal and vocalic elements in syllables that have speechlike timing, phonation, and resonance characteristics. Infants begin to babble on average at approximately seven months of age. Babbling continues in parallel with less mature noncanonical vocalizations that make up the majority of utterances through the first year. Babbling also continues in parallel with the emergence of meaningful speech during the second year. Regardless of the language that the infant is learning, most canonical syllables have a CV shape with the consonant being a labial or alveolar stop or nasal and the vowel most likely to be central or low- to mid-front in place (e.g., [bʌ], [da], [mæ]). Approximately 15% of canonical utterances consist of multisyllable strings; in other words, most babbled utterances contain only a single CV syllable. The onset of the canonical babbling stage is crucially dependent upon normal hearing, permitting access to language input and to auditory feedback from self-produced speech. Many studies have reported differences in the phonetic and acoustic characteristics of babble produced by infants learning different languages. These differences include the frequency with which certain consonants are produced; the location, size, and shape of the vowel space; and the rhythmic and intonation qualities of multisyllable babbles, in each case reflecting specificities of the input language. However, replications of these findings are rare and further research is required to better understand the learning mechanisms that underlie language-specific acquisition of articulatory representations during the prelinguistic stage of vocal development.
D. H. Whalen
Phonetics is the branch of linguistics that deals with the physical realization of meaningful distinctions in spoken language. Phoneticians study the anatomy and physics of sound generation, acoustic properties of the sounds of the world’s languages, the features of the signal that listeners use to perceive the message, and the brain mechanisms involved in both production and perception. Therefore, phonetics connects most directly to phonology and psycholinguistics, but it also engages a range of disciplines that are not unique to linguistics, including acoustics, physiology, biomechanics, hearing, evolution, and many others. Early theorists assumed that phonetic implementation of phonological features was universal, but it has become clear that languages differ in their phonetic spaces for phonological elements, with systematic differences in acoustics and articulation. Such language-specific details place phonetics solidly in the domain of linguistics; any complete description of a language must include its specific phonetic realization patterns. The description of what phonetic realizations are possible in human language continues to expand as more languages are described; many of the under-documented languages are endangered, lending urgency to the phonetic study of the world’s languages. Phonetic analysis can consist of transcription, acoustic analysis, measurement of speech articulators, and perceptual tests, with recent advances in brain imaging adding detail at the level of neural control and processing. Because of its dual nature as a component of a linguistic system and a set of actions in the physical world, phonetics has connections to many other branches of linguistics, including not only phonology but syntax, semantics, sociolinguistics, and clinical linguistics as well. Speech perception has been shown to integrate information from both vision and tactile sensation, indicating an embodied system. 
Sign language, though primarily visual, has adopted the term “phonetics” for its realization component, highlighting the linguistic nature both of phonetics and of sign language. Such breadth offers many avenues for studying phonetics, but it also presents challenges to forming a comprehensive account of any language’s phonetic system.
Corpus Phonology is an approach to phonology that places corpora at the center of phonological research. Some practitioners of corpus phonology see corpora as the only object of investigation; others use corpora alongside other available techniques (for instance, intuitions, psycholinguistic and neurolinguistic experimentation, laboratory phonology, the study of the acquisition of phonology or of language pathology, etc.). Whatever version of corpus phonology one advocates, corpora have become part and parcel of the modern research environment, and their construction and exploitation have been shaped by the multidisciplinary advances made within various fields. Indeed, for the study of spoken usage, the term ‘corpus’ should nowadays only be applied to bodies of data meeting certain technical requirements, even though corpora of spoken usage are by no means new, their history coinciding with the birth of recording techniques. It is therefore essential to understand what criteria must be met by a modern corpus (quality of recordings, diversity of speech situations, ethical guidelines, time-alignment with transcriptions and annotations, etc.) and what tools are available to researchers. Once these requirements are met, the way is open to varying and possibly conflicting uses of spoken corpora by phonological practitioners. A traditional stance in theoretical phonology sees the data as a degenerate version of a more abstract underlying system, but more and more researchers within various frameworks (e.g., usage-based approaches, exemplar models, stochastic Optimality Theory, sociophonetics) are constructing models that tightly bind phonological competence to language use, rely heavily on quantitative information, and attempt to account for intra-speaker and inter-speaker variation. This renders corpora essential to phonological research and not a mere adjunct to the phonological description of the languages of the world.
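One concrete payoff of the time-alignment requirement is that quantitative measures fall out almost for free. The sketch below assumes a hypothetical minimal format, one interval per line as label, start time, and end time separated by tabs (real corpora use richer formats such as Praat TextGrids), and computes mean segment durations from it:

```python
from collections import defaultdict

def read_intervals(lines):
    """Parse 'label<TAB>start<TAB>end' rows (a hypothetical minimal format)."""
    out = []
    for line in lines:
        label, start, end = line.strip().split("\t")
        out.append((label, float(start), float(end)))
    return out

def mean_durations(intervals):
    """Mean duration in seconds for each annotated label."""
    total = defaultdict(float)
    count = defaultdict(int)
    for label, start, end in intervals:
        total[label] += end - start
        count[label] += 1
    return {lab: total[lab] / count[lab] for lab in total}

# Three time-aligned segment annotations from a made-up recording.
rows = ["a\t0.10\t0.25", "t\t0.25\t0.31", "a\t0.31\t0.44"]
print(mean_durations(read_intervals(rows)))
```

Summaries of this kind, aggregated over thousands of time-aligned intervals and broken down by speaker or speech situation, are exactly the quantitative information that usage-based and variationist models rely on.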
One of the most fundamental problems in research on spoken language is to understand how the categorical, systemic knowledge that speakers have in the form of a phonological grammar maps onto the continuous, high-dimensional physical speech act that transmits the linguistic message. The invariant units of phonological analysis have no invariant analogue in the signal—any given phoneme can manifest itself in many possible variants, depending on context, speech rate, utterance position and the like, and the acoustic cues for a given phoneme are spread out over time across multiple linguistic units. Speakers and listeners are highly knowledgeable about the lawfully structured variation in the signal and they skillfully exploit articulatory and acoustic trading relations when speaking and perceiving. For the scientific description of spoken language understanding, this association between abstract, discrete categories and continuous speech dynamics remains a formidable challenge. Articulatory Phonology and the associated Task Dynamic model present one particular proposal for meeting this challenge using the mathematics of dynamical systems. The central insight is that spoken language is fundamentally based on the production and perception of linguistically defined patterns of motion. In Articulatory Phonology, the primitive units of phonological representation are called gestures. Gestures are defined by linear second-order differential equations, which give them inherent spatial and temporal specifications. Gestures control the vocal tract at a macroscopic level, harnessing the many degrees of freedom in the vocal tract into low-dimensional control units. Phonology, in this model, thus directly governs the spatial and temporal orchestration of vocal tract actions.
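The dynamical definition of a gesture can be made concrete with a small simulation. A gesture is standardly modeled as a damped mass-spring (point-attractor) system, m·x″ + b·x′ + k·(x − target) = 0, typically with critical damping so the articulator approaches its target without overshoot. The sketch below integrates this equation numerically; the parameter values and the "lip aperture" interpretation are illustrative assumptions, not values fitted to articulatory data.

```python
import numpy as np

def gesture(x_init, target, k=1600.0, m=1.0, dt=0.001, dur=0.3):
    """Simulate a critically damped gesture: m*x'' + b*x' + k*(x - target) = 0."""
    b = 2.0 * np.sqrt(m * k)  # critical damping: fastest approach, no overshoot
    x, v = x_init, 0.0
    traj = []
    for _ in range(int(dur / dt)):
        acc = (-b * v - k * (x - target)) / m  # restoring force toward the target
        v += acc * dt                          # semi-implicit Euler step
        x += v * dt
        traj.append(x)
    return np.array(traj)

# A lip-aperture-like tract variable moving from 0 (open) toward a closure target of 1.
traj = gesture(0.0, 1.0)
```

The same second-order equation yields both the spatial specification (the attractor at the target) and the temporal one (the stiffness k sets how fast the movement unfolds), which is what gives gestures their inherent space-time character in this framework.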