The Motor Theory of Speech Perception is a proposed explanation of the fundamental relationship between the way speech is produced and the way it is perceived. Associated primarily with the work of Liberman and colleagues, it posited the active participation of the motor system in the perception of speech. Early versions of the theory contained elements that later proved untenable, such as the expectation that the neural commands to the muscles (as seen in electromyography) would be more invariant than the acoustics. Support drawn from categorical perception (in which discrimination is quite poor within linguistic categories but excellent across boundaries) was called into question by studies showing means of improving within-category discrimination and finding similar results for nonspeech sounds and for animals perceiving speech. Evidence for motor involvement in perceptual processes nonetheless continued to accrue, and related motor theories have been proposed. Neurological and neuroimaging results have yielded a great deal of evidence consistent with variants of the theory, but they highlight the issue that there is no single “motor system,” and so different components appear in different contexts. Assigning the appropriate amount of effort to the various systems that interact to result in the perception of speech is an ongoing process, but it is clear that some of the systems will reflect the motor control of speech.
D. H. Whalen
One of the most fundamental problems in research on spoken language is to understand how the categorical, systemic knowledge that speakers have in the form of a phonological grammar maps onto the continuous, high-dimensional physical speech act that transmits the linguistic message. The invariant units of phonological analysis have no invariant analogue in the signal—any given phoneme can manifest itself in many possible variants, depending on context, speech rate, utterance position and the like, and the acoustic cues for a given phoneme are spread out over time across multiple linguistic units. Speakers and listeners are highly knowledgeable about the lawfully structured variation in the signal and they skillfully exploit articulatory and acoustic trading relations when speaking and perceiving. For the scientific description of spoken language understanding this association between abstract, discrete categories and continuous speech dynamics remains a formidable challenge. Articulatory Phonology and the associated Task Dynamic model present one particular proposal on how to step up to this challenge using the mathematics of dynamical systems with the central insight being that spoken language is fundamentally based on the production and perception of linguistically defined patterns of motion. In Articulatory Phonology, primitive units of phonological representation are called gestures. Gestures are defined based on linear second order differential equations, giving them inherent spatial and temporal specifications. Gestures control the vocal tract at a macroscopic level, harnessing the many degrees of freedom in the vocal tract into low-dimensional control units. Phonology, in this model, thus directly governs the spatial and temporal orchestration of vocal tract actions.