The seeming ease with which we usually understand each other belies the complexity of the processes that underlie speech perception. One of the biggest computational challenges is that different talkers realize the same speech categories (e.g., /p/) in physically different ways. We review the mixture of processes that enable robust speech understanding across talkers despite this lack of invariance. These processes range from automatic pre-speech adjustments of the distribution of energy over acoustic frequencies (normalization) to implicit statistical learning of talker-specific properties (adaptation, perceptual recalibration) to the generalization of these patterns across groups of talkers (e.g., gender differences).
Kodi Weatherholtz and T. Florian Jaeger
Melissa Redford and Melissa Baese-Berk
Acoustic theories assume that speech perception begins with an acoustic signal transformed by auditory processing. In classical acoustic theory, this assumption entails perceptual primitives akin to those identified in the spectral analyses of speech. The research objective is to link these primitives with the phonological units of traditional descriptive linguistics via sound categories and then to understand how these units/categories are bound together in time to recognize words. Achieving this objective is challenging because the signal is replete with variation, making the mapping of signal to sound category nontrivial. Research that grapples with the mapping problem has led to many basic findings about speech perception, including the importance of cue redundancy for category identification and of differential cue weighting for category formation. Research that grapples with the related problem of binding categories into words for speech processing motivates current neuropsychological work on speech perception. The central focus on the mapping problem in classical theory has also led to an alternative type of acoustic theory, namely, exemplar-based theory. According to this type of acoustic theory, variability is critical for processing talker-specific information during speech processing. The problems associated with mapping acoustic cues to sound categories are not addressed because exemplar-based theories assume that perceptual traces of whole words are the perceptual primitives. Smaller units of speech sound representation, as well as the phonology as a whole, are emergent from these word-based representations. Yet, like classical acoustic theories, exemplar-based theories assume that production is mediated by a phonology that contains no inherent motor information. The presumed disconnect between acoustic and motor information during perceptual processing distinguishes acoustic theories as a class from other theories of speech perception.
The study of second language phonetics is concerned with three broad and overlapping research areas: the characteristics of second language speech production and perception, the consequences of perceiving and producing nonnative speech sounds with a foreign accent, and the causes and factors that shape second language phonetics. Second language learners and bilinguals typically produce and perceive the sounds of a nonnative language in ways that differ from native speakers. These deviations from native norms can be attributed largely, but not exclusively, to the phonetic system of the native language. Non-nativelike speech perception and production may have both social consequences (e.g., stereotyping) and linguistic–communicative consequences (e.g., reduced intelligibility). Research on second language phonetics over the past 30 years or so has resulted in a fairly good understanding of the causes of nonnative speech production and perception, and these insights have to a large extent been driven by tests of the predictions of models of second language speech learning and of cross-language speech perception. It is generally accepted that the characteristics of second language speech are predominantly due to how second language learners map the sounds of the nonnative language onto those of the native language. This mapping cannot be entirely predicted from theoretical or acoustic comparisons of the sound systems of the languages involved, but has to be determined empirically through tests of perceptual assimilation. The most influential learner factors that shape how a second language is perceived and produced are the age of learning and the amount and quality of exposure to the second language.
A very important and far-reaching finding from research on second language phonetics is that age effects are not due to neurological maturation that attenuates the capacity for phonetic learning, but rather to the way phonetic categories develop as a function of experience with the surrounding sound systems.