Kodi Weatherholtz and T. Florian Jaeger
The seeming ease with which we usually understand each other belies the complexity of the processes that underlie speech perception. One of the biggest computational challenges is that different talkers realize the same speech categories (e.g., /p/) in physically different ways. We review the mixture of processes that enable robust speech understanding across talkers despite this lack of invariance. These processes range from automatic pre-linguistic adjustments of the distribution of energy over acoustic frequencies (normalization) to implicit statistical learning of talker-specific properties (adaptation, perceptual recalibration) to the generalization of these patterns across groups of talkers (e.g., gender differences).
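The abstract above discusses normalization only in general terms. One standard concrete method (not named in the abstract, and chosen here purely for illustration) is Lobanov z-scoring, which rescales each talker's formant values by that talker's own mean and standard deviation so that vowel tokens become comparable across talkers. The formant values below are made up.

```python
# Sketch of talker normalization via Lobanov z-scoring. Each talker's
# formant measurements are converted to z-scores computed from that
# talker's own distribution, factoring out talker-specific ranges.
# (Illustrative only; the data are hypothetical.)
from statistics import mean, stdev

def lobanov(formants):
    """Z-score one talker's formant measurements (in Hz)."""
    m, s = mean(formants), stdev(formants)
    return [(f - m) / s for f in formants]

# Two hypothetical talkers producing the "same" three vowels:
talker_a_f1 = [300.0, 500.0, 700.0]   # lower overall F1 range
talker_b_f1 = [400.0, 650.0, 900.0]   # higher overall F1 range

norm_a = lobanov(talker_a_f1)
norm_b = lobanov(talker_b_f1)
# After normalization, both talkers' vowels occupy the same relative
# positions despite different raw frequencies.
print(norm_a)
print(norm_b)
```

Note how the raw values differ across talkers while the normalized values coincide, which is the point of normalization as a solution to the lack of invariance.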
Patrice Speeter Beddor
In their conversational interactions with speakers, listeners aim to understand what a speaker is saying; that is, they aim to arrive at the linguistic message, interwoven with social and other information, that the input speech signal conveys. Across more than 60 years of speech perception research, a foundational issue has been to account for listeners’ ability to achieve stable linguistic percepts corresponding to the speaker’s intended message despite highly variable acoustic signals. Research has especially focused on acoustic variants attributable to the phonetic context in which a given phonological form occurs and on variants attributable to the particular speaker who produced the signal. These context- and speaker-dependent variants reveal the complex—albeit informationally rich—patterns that bombard listeners in their everyday interactions.
How do listeners deal with these variable acoustic patterns? Empirical studies that address this question provide clear evidence that perception is a malleable, dynamic, and active process. Findings show that listeners perceptually factor out, or compensate for, the variation due to context yet also use that same variation in deciding what a speaker has said. Similarly, listeners adjust, or normalize, for the variation introduced by speakers who differ in their anatomical and socio-indexical characteristics, yet listeners also use that socially structured variation to facilitate their linguistic judgments. Investigations of the time course of perception show that these perceptual accommodations occur rapidly, as the acoustic signal unfolds in real time. Thus, listeners closely attend to the phonetic details made available by different contexts and different speakers. The structured, lawful nature of this variation informs perception.
Speech perception changes over time not only in listeners’ moment-by-moment processing, but also across the life span of individuals as they acquire their native language(s), non-native languages, and new dialects and as they encounter other novel speech experiences. These listener-specific experiences contribute to individual differences in perceptual processing. However, even listeners from linguistically homogeneous backgrounds differ in their attention to the various acoustic properties that simultaneously convey linguistically and socially meaningful information. The nature and source of listener-specific perceptual strategies serve as an important window on perceptual processing and on how that processing might contribute to sound change.
Theories of speech perception aim to explain how listeners interpret the input acoustic signal as linguistic forms. A theoretical account should specify the principles that underlie accurate, stable, flexible, and dynamic perception as achieved by different listeners in different contexts. Current theories differ in their conception of the nature of the information that listeners recover from the acoustic signal, with one fundamental distinction being whether the recovered information is gestural or auditory. Current approaches also differ in their conception of the nature of phonological representations in relation to speech perception, although there is increasing consensus that these representations are more detailed than the abstract, invariant representations of traditional formal phonology. Ongoing work in this area investigates how both abstract information and detailed acoustic information are stored and retrieved, and how best to integrate these types of information in a single theoretical model.
Sónia Frota and Marina Vigário
The syntax–phonology interface refers to the way syntax and phonology are interconnected. Although syntax and phonology constitute different language domains, it seems undisputed that they relate to each other in nontrivial ways. There are different theories about the syntax–phonology interface. They differ in how far each domain is seen as relevant to generalizations in the other domain, and in the types of information from each domain that are available to the other.
Some theories see the interface as unlimited in the direction and types of syntax–phonology connections, with syntax impacting on phonology and phonology impacting on syntax. Other theories constrain mutual interaction to a set of specific syntactic phenomena (i.e., discourse-related) that may be influenced by a limited set of phonological phenomena (namely, heaviness and rhythm). In most theories, there is an asymmetrical relationship: specific types of syntactic information are available to phonology, whereas syntax is phonology-free.
The role that syntax plays in phonology, as well as the types of syntactic information that are relevant to phonology, is also a matter of debate. At one extreme, Direct Reference Theories claim that phonological phenomena, such as external sandhi processes, refer directly to syntactic information. However, approaches arguing for a direct influence of syntax differ on the types of syntactic information needed to account for phonological phenomena, from syntactic heads and structural configurations (like c-command and government) to feature checking relationships and phase units. The precise syntactic information that is relevant to phonology may depend on (the particular version of) the theory of syntax assumed to account for syntax–phonology mapping. At the other extreme, Prosodic Hierarchy Theories propose that syntactic and phonological representations are fundamentally distinct and that the output of the syntax–phonology interface is prosodic structure. Under this view, phonological phenomena refer to the phonological domains defined in prosodic structure. The structure of phonological domains is built from the interaction of a limited set of syntactic information with phonological principles related to constituent size, weight, and eurhythmic effects, among others. The kind of syntactic information used in the computation of prosodic structure distinguishes between different Prosodic Hierarchy Theories: the relation-based approach makes reference to notions like head-complement, modifier-head relations, and syntactic branching, while the end-based approach focuses on edges of syntactic heads and maximal projections. Common to both approaches is the distinction between lexical and functional categories, with the latter being invisible to the syntax–phonology mapping. Besides accounting for external sandhi phenomena, prosodic structure interacts with other phonological representations, such as metrical structure and intonational structure.
As shown by the theoretical diversity, the study of the syntax–phonology interface raises many fundamental questions. A systematic comparison among proposals with reference to empirical evidence is lacking. In addition, findings from language acquisition and development and language processing constitute novel sources of evidence that need to be taken into account. The syntax–phonology interface thus remains a challenging research field in the years to come.
Erich R. Round
The non-Pama-Nyungan, Tangkic languages were spoken until recently in the southern Gulf of Carpentaria, Australia. The most extensively documented are Lardil, Kayardild, and Yukulta. Their phonology is notable for its opaque, word-final deletion rules and extensive word-internal sandhi processes. The morphology contains complex relationships between sets of forms and sets of functions, due in part to major historical refunctionalizations, which have converted case markers into markers of tense and complementization and verbal suffixes into case markers. Syntactic constituency is often marked by inflectional concord, frequently resulting in affix stacking. Yukulta in particular possesses a rich set of inflection-marking possibilities for core arguments, including detransitivized configurations and an inverse system. These relate in interesting ways historically to argument marking in Lardil and Kayardild. Subordinate clauses are marked for tense across most constituents other than the subject, and such tense marking is also found in main clauses in Lardil and Kayardild, which have lost the agreement and tense-marking second-position clitic of Yukulta. Under specific conditions of co-reference between matrix and subordinate arguments, and under certain discourse conditions, clauses may be marked, on all or almost all words, by complementization markers, in addition to inflection for case and tense.
Paul de Lacy
Phonology has both a taxonomic/descriptive and a cognitive meaning. In the taxonomic/descriptive sense, it refers to speech sound systems. As a cognitive term, it refers to part of the brain’s ability to produce and perceive speech sounds. This article focuses on research in the cognitive domain.
The brain does not simply record speech sounds and “play them back.” It abstracts over speech sounds, and transforms the abstractions in nontrivial ways. Phonological cognition is about what those abstractions are, and how they are transformed in perception and production.
There are many theories about phonological cognition. Some theories see it as the result of domain-general mechanisms, such as analogy over a Lexicon. Other theories locate it in an encapsulated module that is genetically specified and has innate propositional content. In production, this module takes as its input phonological material from a Lexicon, and refers to syntactic and morphological structure in producing an output, which involves nontrivial transformation. In some theories, the output is instructions for articulator movement, which result in speech sounds; in other theories, the output goes to the Phonetic module. In perception, a continuous acoustic signal is mapped onto a phonetic representation, which the Phonological module then maps onto underlying forms, which are in turn matched to lexical entries.
Exactly which empirical phenomena phonological cognition is responsible for depends on the theory. At one extreme, it accounts for all human speech sound patterns and realization. At the other extreme, it is little more than a way of abstracting over speech sounds. In the most popular Generative conception, it explains some sound patterns, with other modules (e.g., the Lexicon and Phonetic module) accounting for others. There are many types of patterns, with names such as “assimilation,” “deletion,” and “neutralization”—a great deal of phonological research focuses on determining which patterns there are, which aspects are universal and which are language-particular, and whether/how phonological cognition is responsible for them.
Phonological computation connects with other cognitive structures. In the Generative T-model, the phonological module’s input includes morphs of Lexical items along with at least some morphological and syntactic structure; the output is sent to either a Phonetic module, or directly to the neuro-motor interface, resulting in articulator movement. However, other theories propose that these modules’ computation proceeds in parallel, and that there is bidirectional communication between them.
The study of phonological cognition is a young science, so many fundamental questions remain to be answered. There are currently many different theories, and theoretical diversity over the past few decades has increased rather than consolidated. In addition, new research methods have been developed and older ones have been refined, providing novel sources of evidence. Consequently, phonological research is both lively and challenging, and is likely to remain that way for some time to come.
When the phonological form of a morpheme—a unit of meaning that cannot be decomposed further into smaller units of meaning—involves a particular melodic pattern as part of its sound shape, this morpheme is specified for tone. In view of this definition, phrase- and utterance-level melodies—also known as intonation—are not to be interpreted as instances of tone. That is, whereas the question “Tomorrow?” may be uttered with a rising melody, this melody is not tone, because it is not a part of the lexical specification of the morpheme tomorrow. A language with morphemes that are specified for particular melodies is called a tone language. It is not the case that, in a tone language, every morpheme, content word, or syllable is specified for tone. Tonal specification can be highly restricted within the lexicon. Examples of such sparsely specified tone languages include Swedish, Japanese, and Ekagi (a language spoken in the Indonesian part of New Guinea); in these languages, only some syllables in some words are specified for tone. There are also tone languages where each and every syllable of each and every word has a specification. Vietnamese and Shilluk (a language spoken in South Sudan) illustrate this configuration. Tone languages also vary greatly in terms of the inventory of phonological tone forms. The smallest possible inventory contrasts one specification with the absence of specification. But there are also tone languages with eight or more distinctive tone categories. The physical (acoustic) realization of the tone categories is primarily fundamental frequency (F0), which is perceived as pitch. However, other phonetic correlates are often also involved, in particular voice quality. Tone plays a prominent role in the study of phonology because of its structural complexity.
That is, in many languages, the way a tone surfaces is conditioned by factors such as the segmental composition of the morpheme, the tonal specifications of surrounding constituents, morphosyntax, and intonation. On top of this, tone is diachronically unstable. This means that, when a language has tone, we can expect to find considerable variation between dialects, and more of it than in relation to other parts of the sound system.
Language is a system that maps meanings to forms, but the mapping is not always one-to-one. Variation means that one meaning corresponds to multiple forms, for example faster ~ more fast. The choice is not uniquely determined by the rules of the language, but is made by the individual at the time of performance (speaking, writing). Such choices abound in human language. They are usually not just a matter of free will, but involve preferences that depend on the context, including the phonological context. Phonological variation is a situation where the choice among expressions is phonologically conditioned, sometimes statistically, sometimes categorically. In this overview, we examine three studies of variable vowel harmony in three languages (Finnish, Hungarian, and Tommo So), formulated in three frameworks (Partial Order Optimality Theory, Stochastic Optimality Theory, and Maximum Entropy Grammar). For example, both Finnish and Hungarian have Backness Harmony: vowels must be all [+back] or all [−back] within a single word, with the exception of neutral vowels that are compatible with either. Surprisingly, some stems allow both [+back] and [−back] suffixes in free variation, for example, analyysi-na ~ analyysi-nä ‘analysis-ESS’ (essive case).
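Of the three frameworks named above, Maximum Entropy Grammar lends itself to a brief sketch of how free variation can be modeled: each candidate output is scored by weighted constraint violations, and its probability is proportional to the exponential of the negated score, so both variants of a disharmonic stem can receive nonzero probability. The constraint names, violation counts, and weights below are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the studies the overview discusses.

```python
import math

# Minimal Maximum Entropy Grammar sketch. A candidate's score is the
# weighted sum of its constraint violations; its probability is
# exp(-score) divided by the sum of exp(-score) over all candidates.
# Constraints, violations, and weights are hypothetical.
weights = {"AGREE-BACK": 2.0, "FAITH-SUFFIX": 1.5}

# Candidate suffixed forms for the disharmonic stem 'analyysi',
# with hypothetical violation counts per constraint:
candidates = {
    "analyysi-na": {"AGREE-BACK": 1, "FAITH-SUFFIX": 0},
    "analyysi-nä": {"AGREE-BACK": 0, "FAITH-SUFFIX": 1},
}

def maxent_probs(cands, w):
    scores = {c: sum(w[k] * v for k, v in viol.items())
              for c, viol in cands.items()}
    z = sum(math.exp(-s) for s in scores.values())
    return {c: math.exp(-s) / z for c, s in scores.items()}

probs = maxent_probs(candidates, weights)
# Both variants receive nonzero probability, modeling free variation;
# the relative weights determine which variant is preferred.
print(probs)
```

Under these made-up weights the front variant is favored but the back variant remains possible, which is the qualitative pattern a MaxEnt analysis of variable harmony aims to capture.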
Harry van der Hulst
The subject of this article is vowel harmony. In its prototypical form, this phenomenon involves agreement among all vowels in a word for some phonological property (such as palatality, labiality, height, or tongue root position). This agreement is evidenced by agreement patterns within morphemes and by alternations in vowels when morphemes are combined into complex words, thus creating allomorphic alternations. Agreement involves one or more harmonic features for which vowels form harmonic pairs, so that each vowel in one set has a harmonic counterpart in the other set. I will focus on vowels that fail to alternate and are thus neutral (either inherently or in a specific context), and that will be either opaque or transparent to the process. I will compare approaches that use underspecification of binary features and approaches that use unary features. In vowel harmony, vowels are either triggers or targets, and for each, specific conditions may apply. Vowel harmony can be bidirectional or unidirectional and can display either a root control pattern or a dominant/recessive pattern.
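A root-control pattern with transparent neutral vowels can be sketched as a simple suffix-selection procedure. The sketch below uses Finnish-style backness harmony (back a, o, u vs. front ä, ö, y, with e and i neutral and transparent); it is deliberately simplified and deterministic, whereas real stems, notably disharmonic loanwords, can show variation.

```python
# Sketch of root-controlled backness harmony, Finnish-style.
# The rightmost harmonic vowel in the stem selects the suffix
# variant; e and i are neutral and transparent, so the scan skips
# them. A stem containing only neutral vowels takes the front
# variant. Simplified for illustration; real usage can vary.
BACK, FRONT = set("aou"), set("äöy")

def harmonize(stem, suffix_pair):
    """suffix_pair = (back_variant, front_variant)."""
    back, front = suffix_pair
    for ch in reversed(stem):        # scan right-to-left past neutrals
        if ch in BACK:
            return stem + back
        if ch in FRONT:
            return stem + front
    return stem + front              # only neutral vowels: front wins

inessive = ("ssa", "ssä")            # 'in X'
print(harmonize("talo", inessive))   # → talossa ('in the house')
print(harmonize("kylä", inessive))   # → kylässä ('in the village')
print(harmonize("tie", inessive))    # → tiessä ('in the road')
```

The third example shows transparency: a stem with only neutral vowels still triggers a determinate (front) choice, which is one of the behaviors a theory of neutral vowels must account for.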
Eystein Dahl and Antonio Fábregas
Zero or null morphology refers to morphological units that are devoid of phonological content. Whether such entities should be postulated is one of the most controversial issues in morphological theory, with disagreement about how the concept should be delimited, what counts as an instance of zero morphology within a particular theory, and whether such objects should be allowed even as mere analytical instruments.
With respect to the first problem, given that zero morphology is a hypothesis that comes from certain analyses, delimiting what counts as a zero morpheme is not a trivial matter. The concept must be carefully differentiated from others that intuitively also involve situations where there is no overt morphological marking: cumulative morphology, phonological deletion, etc.
Regarding the second issue, what counts as null can also depend on the specific theory in which the proposal is made. In the strict sense, zero morphology involves a complete morphosyntactic representation that is associated with zero phonological content, but there are other notions of zero morphology that differ from the one discussed here, such as the absolute absence of morphological expression, in addition to specific theory-internal interpretations of what counts as null. Thus, it is also important to consider the different ways in which something can be morphologically silent.
Finally, with respect to the third issue, arguments are made for and against zero morphology, notably from the perspectives of falsifiability, acquisition, and psycholinguistics. Of particular impact is the question of which properties a theory must have in order to rule out zero morphology, and conversely which properties theories that accept zero morphology attribute to null morphemes.
An important ingredient in this debate involves two empirical domains: zero derivation and paradigmatic uniformity. Ultimately, the plausibility of zero morphemes depends on whether theories that posit them account for these two empirical patterns better than theories that ban zero morphology.