Article
The Motor Theory of Speech Perception
D. H. Whalen
The Motor Theory of Speech Perception is a proposed explanation of the fundamental relationship between the way speech is produced and the way it is perceived. Associated primarily with the work of Liberman and colleagues, it posited the active participation of the motor system in the perception of speech. Early versions of the theory contained elements that later proved untenable, such as the expectation that the neural commands to the muscles (as seen in electromyography) would be more invariant than the acoustics. Support drawn from categorical perception (in which discrimination is quite poor within linguistic categories but excellent across boundaries) was called into question by studies showing means of improving within-category discrimination and finding similar results for nonspeech sounds and for animals perceiving speech. Evidence for motor involvement in perceptual processes nonetheless continued to accrue, and related motor theories have been proposed. Neurological and neuroimaging results have yielded a great deal of evidence consistent with variants of the theory, but they highlight the issue that there is no single “motor system,” and so different components appear in different contexts. Assigning the appropriate amount of effort to the various systems that interact to result in the perception of speech is an ongoing process, but it is clear that some of the systems will reflect the motor control of speech.
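A convenient way to see what the categorical-perception argument claims is to predict discrimination from identification. The following minimal sketch (in Python) uses the classic two-category (“Haskins”) prediction that listeners discriminate a pair of stimuli only insofar as they label them differently; the continuum and its identification probabilities are hypothetical, chosen only to illustrate the within-category versus across-boundary pattern.

    # Predicted ABX discrimination from identification probabilities,
    # following the classic two-category ("Haskins") prediction.
    def predicted_abx(p1: float, p2: float) -> float:
        """Predicted proportion correct for two stimuli identified as
        category A with probabilities p1 and p2; 0.5 is chance."""
        return 0.5 + 0.5 * (p1 - p2) ** 2

    # Hypothetical /ba/-/pa/ continuum: identification flips sharply
    # between steps 4 and 5 (the category boundary).
    ident = [0.99, 0.97, 0.95, 0.85, 0.15, 0.05, 0.03, 0.01]
    for i in range(len(ident) - 1):
        print(f"steps {i + 1}-{i + 2}: {predicted_abx(ident[i], ident[i + 1]):.2f}")

Pairs within a category come out near chance (about 0.50), while the boundary pair is predicted at about 0.75; this is the pattern originally cited in support of the theory, and the pattern that improved within-category discrimination later called into question.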
Article
Language and Cognitive Aging
Lori E. James and Sara Anne Goring
The questions of whether and why language processes change in healthy aging require complicated answers. Although comprehension appears to be more stable across adulthood than does production, there is evidence for age-related changes and also for constancy within both input and output components of language. Further, these changes can be considered at various levels of the language hierarchy, such as sensory input, words, sentences, and discourse. As concluded in several other comprehensive reviews, older adults’ language production ability declines much more noticeably than does their comprehension, presumably because comprehension is able to benefit from contextual processing in a way that production cannot. Specifically, lexical and orthographic retrieval become more difficult during normal aging, and these changes appear to represent the most noticeable age-related declines in language production. Some theories of age-related decline focus on global deterioration of cognitive function, whereas other theories predict changes in specific processes related to language function. Both types of theories have received empirical support as applied to language performance, although additional theoretical development is still needed to capture the patterns of effects. Further, in order to truly understand how cognitive aging impacts the ability to understand and produce language, it is necessary to examine how age-related shifts in goals, expertise, and compensatory strategies influence language processes. There are important implications of research on language and cognitive aging, in that language can play a role in physical health and psychological well-being. In summary, our review of the existing literature on language and cognitive aging supports previous claims that language ability is asymmetrically impacted by age, with smaller overall effects of aging on comprehension than production processes.
Article
Dispersion Theory and Phonology
Edward Flemming
Dispersion Theory concerns the constraints that govern contrasts, the phonetic differences that can distinguish words in a language. Specifically, it posits that there are distinctiveness constraints that favor contrasts that are more perceptually distinct over less distinct contrasts. The preference for distinct contrasts is hypothesized to follow from a preference to minimize perceptual confusion: In order to recover what a speaker is saying, a listener must identify the words in the utterance. The more confusable words are, the more likely a listener is to make errors. Because contrasts are the minimal permissible differences between words in a language, banning indistinct contrasts reduces the likelihood of misperception.
The term ‘dispersion’ refers to the separation of sounds in perceptual space that results from maximizing the perceptual distinctiveness of the contrasts between those sounds, and is adopted from Lindblom’s Theory of Adaptive Dispersion, a theory of phoneme inventories according to which inventories are selected so as to maximize the perceptual differences between phonemes. These proposals follow a long tradition of explaining cross-linguistic tendencies in the phonetic and phonological form of languages in terms of a preference for perceptually distinct contrasts.
Flemming proposes that distinctiveness constraints constitute one class of constraints in an Optimality Theoretic model of phonology. In this context, distinctiveness constraints predict several basic phenomena, the first of which is the preference for maximal dispersion in inventories of contrasting sounds that first motivated the development of the Theory of Adaptive Dispersion. But distinctiveness constraints are formulated as constraints on the surface forms of possible words that interact with other phonological constraints, so they evaluate the distinctiveness of contrasts in context. As a result, Dispersion Theory predicts that contrasts can be neutralized or enhanced in particular phonological contexts. This prediction arises because the phonetic realization of sounds depends on their context, so the perceptual differences between contrasting sounds also depend on context. If the realization of a contrast in a particular context would be insufficiently distinct (i.e., it would violate a high-ranked distinctiveness constraint), there are two options: the offending contrast can be neutralized, or it can be modified (‘enhanced’) to make it more distinct.
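To make the dispersion idea concrete, here is a toy sketch (in Python) of the inventory-selection intuition behind the Theory of Adaptive Dispersion: among candidate inventories of a fixed size, prefer the one whose least distinct contrast is largest. The vowel set, the (F1, F2) values, and Euclidean distance as a stand-in for perceptual distinctiveness are all simplifying assumptions, not Flemming’s actual constraint formalism.

    # Toy dispersion: choose the 3-vowel inventory whose minimum
    # pairwise distance in a rough (F1, F2) Bark-like space is largest.
    from itertools import combinations
    from math import dist

    vowels = {  # rough, hypothetical vowel-quality positions
        "i": (2.7, 13.5), "e": (4.0, 12.0), "a": (7.0, 10.0),
        "o": (4.5, 7.0), "u": (3.0, 6.5), "y": (2.7, 10.5),
    }

    def least_distinct(inventory):
        # Evaluate the *worst* contrast, since a single indistinct
        # contrast is what a distinctiveness constraint penalizes.
        return min(dist(vowels[a], vowels[b])
                   for a, b in combinations(inventory, 2))

    best = max(combinations(vowels, 3), key=least_distinct)
    print(best, round(least_distinct(best), 2))  # expect the corner vowels i, a, u

A full Dispersion Theory grammar would instead rank distinctiveness constraints against other phonological constraints and evaluate contrasts in context, which is what yields the neutralization and enhancement predictions described above.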
A basic open question regarding Dispersion Theory concerns the proper formulation of distinctiveness constraints and the extent of variation in their rankings across languages, issues that are tied up with questions about the nature of perceptual distinctiveness. Another concerns the size and nature of the comparison set of contrasting word-forms required to evaluate whether a candidate output satisfies distinctiveness constraints.
Article
Cochlear Implants
Matthew B. Winn and Peggy B. Nelson
Cochlear implants (CIs) are the most successful sensory implant in history, restoring the sensation of sound to thousands of persons who have severe to profound hearing loss. Implants do not recreate acoustic sound as most of us know it, but they instead convey a rough representation of the temporal envelope of signals. This sparse signal, derived from the envelopes of narrowband frequency filters, is sufficient for enabling speech understanding in quiet environments for those who lose hearing as adults and is enough for most children to develop spoken language skills. The variability between users is huge, however, and is only partially understood.
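The “rough representation of the temporal envelope” can be illustrated with a noise-vocoder simulation of the kind often used to approximate implant hearing for normal-hearing listeners. The sketch below (in Python, using numpy and scipy) is such a simulation, not actual device processing, and the channel count and filter edges are arbitrary assumptions.

    # Noise-vocoder sketch: keep each band's temporal envelope,
    # discard its fine structure by re-imposing the envelope on noise.
    import numpy as np
    from scipy.signal import butter, sosfilt, hilbert

    def vocode(signal, fs, n_channels=8, lo=100.0, hi=8000.0):
        edges = np.geomspace(lo, hi, n_channels + 1)  # log-spaced band edges
        out = np.zeros_like(signal)
        for f1, f2 in zip(edges[:-1], edges[1:]):
            sos = butter(4, [f1, f2], btype="bandpass", fs=fs, output="sos")
            band = sosfilt(sos, signal)
            envelope = np.abs(hilbert(band))         # temporal envelope
            carrier = np.random.randn(len(signal))   # noise carrier
            out += sosfilt(sos, carrier * envelope)  # confine noise to the band
        return out

    fs = 16000
    t = np.arange(fs) / fs
    demo = 0.5 * np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 4 * t))
    processed = vocode(demo, fs)

With around eight channels, such simulations preserve syllable timing and other envelope cues while largely destroying pitch, consistent with the pattern of strengths and weaknesses described next.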
CIs provide acoustic information that is sufficient for the recognition of some aspects of spoken language, especially information that can be conveyed by temporal patterns, such as syllable timing, consonant voicing, and manner of articulation. They are insufficient for conveying pitch cues and separating speech from noise.
There is a great need to improve our understanding of functional outcomes of CI success beyond measuring percent correct for word and sentence recognition. Moreover, greater understanding of the variability experienced by children, especially children and families from various social and cultural backgrounds, is of paramount importance. Future developments will no doubt expand the use of this remarkable device.
Article
Audiovisual Speech Perception and the McGurk Effect
Lawrence D. Rosenblum
Research on visual and audiovisual speech information has profoundly influenced the fields of psycholinguistics, perception psychology, and cognitive neuroscience. Visual speech findings have provided some of the most important human demonstrations of our new conception of the perceptual brain as supremely multimodal. This “multisensory revolution” has seen a tremendous growth in research on how the senses integrate, cross-facilitate, and share their experience with one another.
The ubiquity and apparent automaticity of multisensory speech have led many theorists to propose that the speech brain is agnostic with regard to sense modality: it might not know or care from which modality speech information comes. Instead, the speech function may act to extract supramodal informational patterns that are common in form across energy streams. Alternatively, other theorists have argued that any common information existing across the modalities is minimal and rudimentary, so that multisensory perception largely depends on the observer’s associative experience between the streams. From this perspective, the auditory stream is typically considered primary for the speech brain, with visual speech simply appended to its processing. If the utility of multisensory speech is a consequence of a supramodal informational coherence, then cross-sensory “integration” may be primarily a consequence of the informational input itself. If true, then one would expect to see evidence for integration occurring early in the perceptual process, as well as in a largely complete and automatic/impenetrable manner. Alternatively, if multisensory speech perception is based on associative experience between the modal streams, then no constraints are dictated on how completely or automatically the senses integrate. There is behavioral and neurophysiological research supporting both perspectives.
Much of this research is based on testing the well-known McGurk effect, in which audiovisual speech information is thought to integrate to the extent that visual information can affect what listeners report hearing. However, there is now good reason to believe that the McGurk effect is not a valid test of multisensory integration. For example, there are clear cases in which responses indicate that the effect fails, while other measures suggest that integration is actually occurring. By mistakenly conflating the McGurk effect with speech integration itself, interpretations of the completeness and automaticity of multisensory integration may be incorrect. Future research should use more sensitive behavioral and neurophysiological measures of cross-modal influence to examine these issues.
Article
Speech Perception and Generalization Across Talkers and Accents
Kodi Weatherholtz and T. Florian Jaeger
The seeming ease with which we usually understand each other belies the complexity of the processes that underlie speech perception. One of the biggest computational challenges is that different talkers realize the same speech categories (e.g., /p/) in physically different ways. We review the mixture of processes that enable robust speech understanding across talkers despite this lack of invariance. These processes range from automatic pre-speech adjustments of the distribution of energy over acoustic frequencies (normalization) to implicit statistical learning of talker-specific properties (adaptation, perceptual recalibration) to the generalization of these patterns across groups of talkers (e.g., gender differences).
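One widely used normalization scheme, Lobanov’s z-score normalization of vowel formants, shows how talker differences can be factored out: each talker’s formant values are rescaled by that talker’s own mean and standard deviation. The sketch below (in Python) uses hypothetical formant values for two talkers; Lobanov normalization is one classic option among many, not the specific mechanism this review commits to.

    # Lobanov normalization: z-score each talker's formants so that
    # physically different realizations become comparable across talkers.
    import numpy as np

    def lobanov(formants_hz):
        f = np.asarray(formants_hz, dtype=float)
        return (f - f.mean(axis=0)) / f.std(axis=0)

    talker_a = [(300, 2300), (600, 1200), (350, 800)]   # /i a u/, lower voice
    talker_b = [(400, 2900), (850, 1600), (450, 1000)]  # same vowels, higher voice
    print(np.round(lobanov(talker_a), 2))
    print(np.round(lobanov(talker_b), 2))  # similar normalized patterns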
Article
Direct Perception of Speech
Carol A. Fowler
The theory of speech perception as direct derives from a general direct-realist account of perception. A realist stance on perception is that perceiving enables occupants of an ecological niche to know its component layouts, objects, animals, and events. “Direct” perception means that perceivers are in unmediated contact with their niche (mediated neither by internally generated representations of the environment nor by inferences made on the basis of fragmentary input to the perceptual systems). Direct perception is possible because energy arrays that have been causally structured by niche components and that are available to perceivers specify (i.e., stand in 1:1 relation to) components of the niche. Typically, perception is multi-modal; that is, perception of the environment depends on specifying information present in, or even spanning, multiple energy arrays.
Applied to speech perception, the theory begins with the observation that speech perception involves the same perceptual systems that, in a direct-realist theory, enable direct perception of the environment. Most notably the auditory system, but also the visual system and sometimes other perceptual systems, support speech perception. Perception of language forms (consonants, vowels, word forms) can be direct if the forms lawfully cause specifying patterning in the energy arrays available to perceivers. In Articulatory Phonology, the primitive language forms (constituting consonants and vowels) are linguistically significant gestures of the vocal tract, which cause patterning in air and on the face. Descriptions are provided of informational patterning in acoustic and other energy arrays. Evidence is next reviewed that speech perceivers make use of acoustic and cross-modal information about the phonetic gestures constituting consonants and vowels to perceive the gestures.
Significant problems arise for the viability of a theory of direct perception of speech. One is the “inverse problem,” the difficulty of recovering vocal tract shapes or actions from acoustic input. Two other problems arise because speakers coarticulate when they speak. That is, they temporally overlap the production of serially nearby consonants and vowels, so that there are no discrete segments in the acoustic signal corresponding to the discrete consonants and vowels that talkers intend to convey (the “segmentation problem”), and there is massive context-sensitivity in acoustic (as well as optical and other) patterning (the “invariance problem”). The present article suggests solutions to these problems.
The article also reviews signatures of a direct mode of speech perception, including that perceivers use cross-modal speech information when it is available and exhibit various indications of perception-production linkages, such as rapid imitation and a disposition to converge in dialect with interlocutors.
An underdeveloped domain within the theory concerns the very important role of longer- and shorter-term learning in speech perception. Infants develop language-specific modes of attention to acoustic speech signals (and optical information for speech), and adult listeners attune to novel dialects or foreign accents. Moreover, listeners make use of lexical knowledge and statistical properties of the language in speech perception. Some progress has been made in incorporating infant learning into a theory of direct perception of speech, but much less progress has been made in the other areas.
Article
The Psychology of Hearing Loss
Christopher J. Plack and Hannah H. Guest
The psychology of hearing loss brings together many different subdisciplines of psychology, including neurophysiology, perception, cognition, and mental health. Hearing loss is defined clinically in terms of pure-tone audiometric thresholds: the lowest sound pressure levels that an individual can detect when listening for pure tones at various frequencies. Audiometric thresholds can be elevated by damage to the sensitive hair cells of the cochlea (the hearing part of the inner ear) caused by aging, ototoxic drugs, noise exposure, or disease. This damage can also cause reductions in frequency selectivity (the ability of the ear to separate out the different frequency components of sounds) and abnormally rapid growth of loudness with sound level. However, hearing loss is a heterogeneous condition and audiometric thresholds are relatively insensitive to many of the disorders that affect real-world listening ability. Hair cell loss and damage to the auditory nerve can occur before audiometric thresholds are affected. Dysfunction of neurons in the auditory brainstem as a consequence of aging is associated with deficits in processing the rapid temporal fluctuations in sounds, causing difficulties in sound localization and in speech and music perception. The impact of hearing loss on an individual can be profound and includes problems in communication (particularly in noisy environments), social isolation, and depression. Hearing loss may also be an important contributor to age-related cognitive decline and dementia.
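As an illustration of how audiometric thresholds are summarized clinically, the sketch below (in Python) computes a pure-tone average (PTA) over 0.5, 1, 2, and 4 kHz and assigns a severity grade using cut-offs approximating the WHO (2021) scheme; the audiogram values are hypothetical and the snippet is illustrative, not a clinical tool.

    # Pure-tone average (dB HL) graded with approximate WHO-style cut-offs.
    GRADES = [(20, "normal"), (35, "mild"), (50, "moderate"),
              (65, "moderately severe"), (80, "severe"), (95, "profound")]

    def grade(thresholds_db_hl):
        pta = sum(thresholds_db_hl) / len(thresholds_db_hl)
        label = next((name for cut, name in GRADES if pta < cut), "complete")
        return pta, label

    print(grade([15, 20, 30, 45]))  # hypothetical audiogram -> (27.5, 'mild')

Note that, as the article emphasizes, such threshold-based summaries are relatively insensitive to disorders (e.g., auditory nerve or brainstem dysfunction) that degrade real-world listening while leaving the audiogram largely intact.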
Article
Mirror Neurons, Empathy, and the Other
Marco Iacoboni
Empathy is the ability to understand and share the feelings of other people. It also extends to the ability to understand and share the feelings of animals and fictional characters. Empathy is essential for functioning properly in social interactions. It is also typically higher for people who belong to one’s own social group and lower for people who belong to a different social group. Lack of empathy is associated with severe mental health conditions including psychopathy, narcissism, and antisocial personality disorder.
Empathy is a complex, flexible, adaptive, and nuanced function for navigating social settings that involves the interplay of multiple neural systems. A crucial neural system for empathy is the mirror neuron system, formed by cells with a variety of properties and the shared feature of being activated during the actions of the self and the perception of actions of other people. The mirroring of the actions of other people in one’s brain allows an understanding from within of the other’s intentions, motivation, and feelings.