Acoustic theories assume that speech perception begins with an acoustic signal transformed by auditory processing. In classical acoustic theory, this assumption entails perceptual primitives that are akin to those identified in the spectral analyses of speech. The research objective is to link these primitives with phonological units of traditional descriptive linguistics via sound categories and then to understand how these units/categories are bound together in time to recognize words. Achieving this objective is challenging because the signal is replete with variation, making the mapping of signal to sound category nontrivial. Research that grapples with the mapping problem has led to many basic findings about speech perception, including the importance of cue redundancy to category identification and of differential cue weighting to category formation. Research that grapples with the related problem of binding categories into words for speech processing motivates current neuropsychological work on speech perception. The central focus on the mapping problem in classical theory has also led to an alternative type of acoustic theory, namely, exemplar-based theory. According to this type of acoustic theory, variability is critical for processing talker-specific information during speech processing. The problems associated with mapping acoustic cues to sound categories is not addressed because exemplar-based theories assume that perceptual traces of whole words are perceptual primitives. Smaller units of speech sound representation, as well as the phonology as a whole, are emergent from the word-based representations. Yet, like classical acoustic theories, exemplar-based theories assume that production is mediated by a phonology that has no inherent motor information. The presumed disconnect between acoustic and motor information during perceptual processing distinguishes acoustic theories as a class from other theories of speech perception.
12
Article
Acoustic Theories of Speech Perception
Melissa Redford and Melissa Baese-Berk
Article
Discriminative Learning and the Lexicon: NDL and LDL
Yu-Ying Chuang and R. Harald Baayen
Naive discriminative learning (NDL) and linear discriminative learning (LDL) are simple computational algorithms for lexical learning and lexical processing. Both NDL and LDL assume that learning is discriminative, driven by prediction error, and that it is this error that calibrates the association strength between input and output representations. Both words’ forms and their meanings are represented by numeric vectors, and mappings between forms and meanings are set up. For comprehension, form vectors predict meaning vectors. For production, meaning vectors map onto form vectors. These mappings can be learned incrementally, approximating how children learn the words of their language. Alternatively, optimal mappings representing the end state of learning can be estimated. The NDL and LDL algorithms are incorporated in a computational theory of the mental lexicon, the ‘discriminative lexicon’. The model shows good performance both with respect to production and comprehension accuracy, and for predicting aspects of lexical processing, including morphological processing, across a wide range of experiments. Since, mathematically, NDL and LDL implement multivariate multiple regression, the ‘discriminative lexicon’ provides a cognitively motivated statistical modeling approach to lexical processing.
Article
The Phonetics of Prosody
Amalia Arvaniti
Prosody is an umbrella term used to cover a variety of interconnected and interacting phenomena, namely stress, rhythm, phrasing, and intonation. The phonetic expression of prosody relies on a number of parameters, including duration, amplitude, and fundamental frequency (F0). The same parameters are also used to encode lexical contrasts (such as tone), as well as paralinguistic phenomena (such as anger, boredom, and excitement). Further, the exact function and organization of the phonetic parameters used for prosody differ across languages. These considerations make it imperative to distinguish the linguistic phenomena that make up prosody from their phonetic exponents, and similarly to distinguish between the linguistic and paralinguistic uses of the latter. A comprehensive understanding of prosody relies on the idea that speech is prosodically organized into phrasal constituents, the edges of which are phonetically marked in a number of ways, for example, by articulatory strengthening in the beginning and lengthening at the end. Phrases are also internally organized either by stress, that is around syllables that are more salient relative to others (as in English and Spanish), or by the repetition of a relatively stable tonal pattern over short phrases (as in Korean, Japanese, and French). Both types of organization give rise to rhythm, the perception of speech as consisting of groups of a similar and repetitive pattern. Tonal specification over phrases is also used for intonation purposes, that is, to mark phrasal boundaries, and express information structure and pragmatic meaning. Taken together, the components of prosody help with the organization and planning of speech, while prosodic cues are used by listeners during both language acquisition and speech processing. Importantly, prosody does not operate independently of segments; rather, it profoundly affects segment realization, making the incorporation of an understanding of prosody into experimental design essential for most phonetic research.
Article
Psycholinguistic Approaches to Morphology: Production
Benjamin V. Tucker
Speech production is an important aspect of linguistic competence. An attempt to understand linguistic morphology without speech production would be incomplete. A central research question develops from this perspective: what is the role of morphology in speech production. Speech production researchers collect many different types of data and much of that data has informed how linguists and psycholinguists characterize the role of linguistic morphology in speech production. Models of speech production play an important role in the investigation of linguistic morphology. These models provide a framework, which allows researchers to explore the role of morphology in speech production. However, models of speech production generally focus on different aspects of the production process. These models are split between phonetic models (which attempt to understand how the brain creates motor commands for uttering and articulating speech) and psycholinguistic models (which attempt to understand the cognitive processes and representation of the production process). Models that merge these two model types, phonetic and psycholinguistic models, have the potential to allow researchers the possibility to make specific predictions about the effects of morphology on speech production. Many studies have explored models of speech production, but the investigation of the role of morphology and how morphological properties may be represented in merged speech production models is limited.
Article
Ideophones (Mimetics, Expressives)
Kimi Akita and Mark Dingemanse
Ideophones, also termed mimetics or expressives, are marked words that depict sensory imagery. They are found in many of the world’s languages, and sizable lexical classes of ideophones are particularly well-documented in the languages of Asia, Africa, and the Americas. Ideophones are not limited to onomatopoeia like meow and smack but cover a wide range of sensory domains, such as manner of motion (e.g., plisti plasta ‘splish-splash’ in Basque), texture (e.g., tsaklii ‘rough’ in Ewe), and psychological states (e.g., wakuwaku ‘excited’ in Japanese). Across languages, ideophones stand out as marked words due to special phonotactics, expressive morphology including certain types of reduplication, and relative syntactic independence, in addition to production features like prosodic foregrounding and common co-occurrence with iconic gestures.
Three intertwined issues have been repeatedly debated in the century-long literature on ideophones. (a) Definition: Isolated descriptive traditions and cross-linguistic variation have sometimes obscured a typologically unified view of ideophones, but recent advances show the promise of a prototype definition of ideophones as conventionalized depictions in speech, with room for language-specific nuances. (b) Integration: The variable integration of ideophones across linguistic levels reveals an interaction between expressiveness and grammatical integration, and has important implications for how to conceive of dependencies between linguistic systems. (c) Iconicity: Ideophones form a natural laboratory for the study of iconic form-meaning associations in natural languages, and converging evidence from corpus and experimental studies suggests important developmental, evolutionary, and communicative advantages of ideophones.
Article
Second Language Phonetics
Ocke-Schwen Bohn
The study of second language phonetics is concerned with three broad and overlapping research areas: the characteristics of second language speech production and perception, the consequences of perceiving and producing nonnative speech sounds with a foreign accent, and the causes and factors that shape second language phonetics. Second language learners and bilinguals typically produce and perceive the sounds of a nonnative language in ways that are different from native speakers. These deviations from native norms can be attributed largely, but not exclusively, to the phonetic system of the native language. Non-nativelike speech perception and production may have both social consequences (e.g., stereotyping) and linguistic–communicative consequences (e.g., reduced intelligibility). Research on second language phonetics over the past ca. 30 years has resulted in a fairly good understanding of causes of nonnative speech production and perception, and these insights have to a large extent been driven by tests of the predictions of models of second language speech learning and of cross-language speech perception. It is generally accepted that the characteristics of second language speech are predominantly due to how second language learners map the sounds of the nonnative to the native language. This mapping cannot be entirely predicted from theoretical or acoustic comparisons of the sound systems of the languages involved, but has to be determined empirically through tests of perceptual assimilation. The most influential learner factors which shape how a second language is perceived and produced are the age of learning and the amount and quality of exposure to the second language. A very important and far-reaching finding from research on second language phonetics is that age effects are not due to neurological maturation which could result in the attrition of phonetic learning ability, but to the way phonetic categories develop as a function of experience with surrounding sound systems.
Article
Iconicity
Irit Meir and Oksana Tkachman
Iconicity is a relationship of resemblance or similarity between the two aspects of a sign: its form and its meaning. An iconic sign is one whose form resembles its meaning in some way. The opposite of iconicity is arbitrariness. In an arbitrary sign, the association between form and meaning is based solely on convention; there is nothing in the form of the sign that resembles aspects of its meaning. The Hindu-Arabic numerals 1, 2, 3 are arbitrary, because their current form does not correlate to any aspect of their meaning. In contrast, the Roman numerals I, II, III are iconic, because the number of occurrences of the sign I correlates with the quantity that the numerals represent. Because iconicity has to do with the properties of signs in general and not only those of linguistic signs, it plays an important role in the field of semiotics—the study of signs and signaling. However, language is the most pervasive symbolic communicative system used by humans, and the notion of iconicity plays an important role in characterizing the linguistic sign and linguistic systems. Iconicity is also central to the study of literary uses of language, such as prose and poetry.
There are various types of iconicity: the form of a sign may resemble aspects of its meaning in several ways: it may create a mental image of the concept (imagic iconicity), or its structure and the arrangement of its elements may resemble the structural relationship between components of the concept represented (diagrammatic iconicity). An example of the first type is the word cuckoo, whose sounds resemble the call of the bird, or a sign such as RABBIT in Israeli Sign Language, whose form—the hands representing the rabbit's long ears—resembles a visual property of that animal. An example of diagrammatic iconicity is vēnī, vīdī, vīcī, where the order of clauses in a discourse is understood as reflecting the sequence of events in the world.
Iconicity is found on all linguistic levels: phonology, morphology, syntax, semantics, and discourse. It is found both in spoken languages and in sign languages. However, sign languages, because of the visual-gestural modality through which they are transmitted, are much richer in iconic devices, and therefore offer a rich array of topics and perspectives for investigating iconicity, and the interaction between iconicity and language structure.
Article
Defectiveness in Morphology
Antonio Fábregas
Morphological defectiveness refers to situations where one or more paradigmatic forms of a lexeme are not realized, without plausible syntactic, semantic, or phonological causes. The phenomenon tends to be associated with low-frequency lexemes and loanwords. Typically, defectiveness is gradient, lexeme-specific, and sensitive to the internal structure of paradigms.
The existence of defectiveness is a challenge to acquisition models and morphological theories where there are elsewhere operations to materialize items. For this reason, defectiveness has become a rich field of research in recent years, with distinct approaches that view it as an item-specific idiosyncrasy, as an epiphenomenal result of rule competition, or as a normal morphological alternation within a paradigmatic space.
Article
Direct Perception of Speech
Carol A. Fowler
The theory of speech perception as direct derives from a general direct-realist account of perception. A realist stance on perception is that perceiving enables occupants of an ecological niche to know its component layouts, objects, animals, and events. “Direct” perception means that perceivers are in unmediated contact with their niche (mediated neither by internally generated representations of the environment nor by inferences made on the basis of fragmentary input to the perceptual systems). Direct perception is possible because energy arrays that have been causally structured by niche components and that are available to perceivers specify (i.e., stand in 1:1 relation to) components of the niche. Typically, perception is multi-modal; that is, perception of the environment depends on specifying information present in, or even spanning, multiple energy arrays.
Applied to speech perception, the theory begins with the observation that speech perception involves the same perceptual systems that, in a direct-realist theory, enable direct perception of the environment. Most notably, the auditory system supports speech perception, but also the visual system, and sometimes other perceptual systems. Perception of language forms (consonants, vowels, word forms) can be direct if the forms lawfully cause specifying patterning in the energy arrays available to perceivers. In Articulatory Phonology, the primitive language forms (constituting consonants and vowels) are linguistically significant gestures of the vocal tract, which cause patterning in air and on the face. Descriptions are provided of informational patterning in acoustic and other energy arrays. Evidence is next reviewed that speech perceivers make use of acoustic and cross modal information about the phonetic gestures constituting consonants and vowels to perceive the gestures.
Significant problems arise for the viability of a theory of direct perception of speech. One is the “inverse problem,” the difficulty of recovering vocal tract shapes or actions from acoustic input. Two other problems arise because speakers coarticulate when they speak. That is, they temporally overlap production of serially nearby consonants and vowels so that there are no discrete segments in the acoustic signal corresponding to the discrete consonants and vowels that talkers intend to convey (the “segmentation problem”), and there is massive context-sensitivity in acoustic (and optical and other modalities) patterning (the “invariance problem”). The present article suggests solutions to these problems.
The article also reviews signatures of a direct mode of speech perception, including that perceivers use cross-modal speech information when it is available and exhibit various indications of perception-production linkages, such as rapid imitation and a disposition to converge in dialect with interlocutors.
An underdeveloped domain within the theory concerns the very important role of longer- and shorter-term learning in speech perception. Infants develop language-specific modes of attention to acoustic speech signals (and optical information for speech), and adult listeners attune to novel dialects or foreign accents. Moreover, listeners make use of lexical knowledge and statistical properties of the language in speech perception. Some progress has been made in incorporating infant learning into a theory of direct perception of speech, but much less progress has been made in the other areas.
Article
Acceptability Judgments
James Myers
Acceptability judgments are reports of a speaker’s or signer’s subjective sense of the well-formedness, nativeness, or naturalness of (novel) linguistic forms. Their value comes in providing data about the nature of the human capacity to generalize beyond linguistic forms previously encountered in language comprehension. For this reason, acceptability judgments are often also called grammaticality judgments (particularly in syntax), although unlike the theory-dependent notion of grammaticality, acceptability is accessible to consciousness. While acceptability judgments have been used to test grammatical claims since ancient times, they became particularly prominent with the birth of generative syntax. Today they are also widely used in other linguistic schools (e.g., cognitive linguistics) and other linguistic domains (pragmatics, semantics, morphology, and phonology), and have been applied in a typologically diverse range of languages. As psychological responses to linguistic stimuli, acceptability judgments are experimental data. Their value thus depends on the validity of the experimental procedures, which, in their traditional version (where theoreticians elicit judgments from themselves or a few colleagues), have been criticized as overly informal and biased. Traditional responses to such criticisms have been supplemented in recent years by laboratory experiments that use formal psycholinguistic methods to collect and quantify judgments from nonlinguists under controlled conditions. Such formal experiments have played an increasingly influential role in theoretical linguistics, being used to justify subtle judgment claims or new grammatical models that incorporate gradience or lexical influences. They have also been used to probe the cognitive processes giving rise to the sense of acceptability itself, the central finding being that acceptability reflects processing ease. Exploring what this finding means will require not only further empirical work on the acceptability judgment process, but also theoretical work on the nature of grammar.
12