The Motor Theory of Speech Perception is a proposed explanation of the fundamental relationship between the way speech is produced and the way it is perceived. Associated primarily with the work of Liberman and colleagues, it posits the active participation of the motor system in the perception of speech. Early versions of the theory contained elements that later proved untenable, such as the expectation that the neural commands to the muscles (as seen in electromyography) would be more invariant than the acoustics. Support drawn from categorical perception (in which discrimination is quite poor within linguistic categories but excellent across boundaries) was called into question by studies that showed ways of improving within-category discrimination and that found similar results for nonspeech sounds and for animals perceiving speech. Evidence for motor involvement in perceptual processes nonetheless continued to accrue, and related motor theories have been proposed. Neurological and neuroimaging results have yielded a great deal of evidence consistent with variants of the theory, but they highlight the issue that there is no single “motor system,” and so different components appear in different contexts. Assigning the appropriate amount of effort to the various systems that interact to result in the perception of speech is an ongoing process, but it is clear that some of the systems will reflect the motor control of speech.
D. H. Whalen
Kodi Weatherholtz and T. Florian Jaeger
The seeming ease with which we usually understand each other belies the complexity of the processes that underlie speech perception. One of the biggest computational challenges is that different talkers realize the same speech categories (e.g., /p/) in physically different ways. We review the mixture of processes that enable robust speech understanding across talkers despite this lack of invariance. These processes range from automatic pre-speech adjustments of the distribution of energy over acoustic frequencies (normalization) to implicit statistical learning of talker-specific properties (adaptation, perceptual recalibration) to the generalization of these patterns across groups of talkers (e.g., gender differences).