The word accent system of Tokyo Japanese might look quite complex with a number of accent patterns and rules. However, recent research has shown that it is not as complex as has been assumed if one incorporates the notion of markedness into the analysis: nouns have only two productive accent patterns, the antepenultimate and the unaccented pattern, and different accent rules can be generalized if one focuses on these two productive accent patterns.
The word accent system raises some interesting new issues. One of them concerns the fact that a majority of nouns are ‘unaccented,’ that is, they are pronounced with a rather flat pitch pattern, apparently violating the principle of obligatoriness. A careful analysis of noun accentuation reveals that this strange accent pattern occurs in some linguistically predictable structures. In morphologically simplex nouns, it tends to emerge in four-mora nouns ending in a sequence of light syllables. In compound nouns, on the other hand, it emerges due to multiple factors, such as compound-final deaccenting morphemes, deaccenting pseudo-morphemes, and some types of prosodic configurations.
Japanese pitch accent also interacts in interesting ways with other phonological and linguistic structures. For example, the accent of compound nouns is closely related to rendaku, or sequential voicing; the choice between the accented and unaccented patterns in certain types of compound nouns correlates with the presence or absence of sequential voicing. Moreover, whether the compound accent rule applies to a certain compound depends on its internal morphosyntactic configuration as well as its meaning; put differently, the compound accent rule is blocked in certain types of morphosyntactic and semantic structures.
Finally, careful analysis of word accent sheds new light on the syllable structure of the language, notably on two interrelated questions about diphthong-hood and super-heavy syllables. It provides crucial insight into ‘diphthongs,’ that is, the question of which vowel sequences constitute a diphthong as opposed to a vowel sequence spanning a syllable boundary. It also presents new evidence against trimoraic syllables in the language.
Acceptability judgments are reports of a speaker’s or signer’s subjective sense of the well-formedness, nativeness, or naturalness of (novel) linguistic forms. Their value comes in providing data about the nature of the human capacity to generalize beyond linguistic forms previously encountered in language comprehension. For this reason, acceptability judgments are often also called grammaticality judgments (particularly in syntax), although unlike the theory-dependent notion of grammaticality, acceptability is accessible to consciousness. While acceptability judgments have been used to test grammatical claims since ancient times, they became particularly prominent with the birth of generative syntax. Today they are also widely used in other linguistic schools (e.g., cognitive linguistics) and other linguistic domains (pragmatics, semantics, morphology, and phonology), and have been applied in a typologically diverse range of languages. As psychological responses to linguistic stimuli, acceptability judgments are experimental data. Their value thus depends on the validity of the experimental procedures, which, in their traditional version (where theoreticians elicit judgments from themselves or a few colleagues), have been criticized as overly informal and biased. Traditional responses to such criticisms have been supplemented in recent years by laboratory experiments that use formal psycholinguistic methods to collect and quantify judgments from nonlinguists under controlled conditions. Such formal experiments have played an increasingly influential role in theoretical linguistics, being used to justify subtle judgment claims or new grammatical models that incorporate gradience or lexical influences. They have also been used to probe the cognitive processes giving rise to the sense of acceptability itself, the central finding being that acceptability reflects processing ease. 
Exploring what this finding means will require not only further empirical work on the acceptability judgment process, but also theoretical work on the nature of grammar.
The Romance languages are characterized by the existence of pronominal clitics. Third person pronominal clitics are often, but not always, homophonous with the definite determiner series in the same language. Both pronominal and determiner clitics emerge early in child acquisition, but their path of development varies depending on clitic type and language. While determiner clitic acquisition is quite homogeneous across Romance, there is wide cross-linguistic variation for pronominal clitics (accusative vs. partitive vs. dative, first/second person vs. third person); the observed differences in acquisition correlate with syntactic differences between the pronouns. Acquisition of pronominal clitics is also affected if a language has both null objects and object clitics, as in European Portuguese. The interpretation of Romance pronominal clitics is generally target-like in child grammar, without the Pronoun Interpretation problems found in languages with strong pronouns. Studies on developmental language impairment show that, as in typical development, clitic production is subject to cross-linguistic variation. The divergent performance between determiners and pronominals in this population points to the syntactic (as opposed to phonological) nature of the deficit.
Marie K. Huffman
Articulatory phonetics is concerned with the physical mechanisms involved in producing spoken language. A fundamental goal of articulatory phonetics is to relate linguistic representations to articulator movements in real time and the consequent acoustic output that makes speech a medium for information transfer. Understanding the overall process requires an appreciation of the aerodynamic conditions necessary for sound production and the way that the various parts of the chest, neck, and head are used to produce speech. One descriptive goal of articulatory phonetics is the efficient and consistent description of the key articulatory properties that distinguish sounds used contrastively in language. There is fairly strong consensus in the field about the inventory of terms needed to achieve this goal. Despite this common, segmental, perspective, speech production is essentially dynamic in nature. Much remains to be learned about how the articulators are coordinated for production of individual sounds and how they are coordinated to produce sounds in sequence. Cutting across all of these issues is the broader question of which aspects of speech production are due to properties of the physical mechanism and which are the result of the nature of linguistic representations. A diversity of approaches is used to try to tease apart the physical and the linguistic contributions to the articulatory fabric of speech sounds in the world’s languages. A variety of instrumental techniques are currently available, and improvement in safe methods of tracking articulators in real time promises to soon bring major advances in our understanding of how speech is produced.
William R. Leben
Autosegments were introduced by John Goldsmith in his 1976 M.I.T. dissertation to represent tone and other suprasegmental phenomena. Goldsmith’s intuition, embodied in the term he created, was that autosegments constituted an independent, conceptually equal tier of phonological representation, with both tiers realized simultaneously like the separate voices in a musical score.
The analysis of suprasegmentals came late to generative phonology, even though it had been tackled in American structuralism with the long components of Harris’s 1944 article, “Simultaneous components in phonology,” and despite being a particular focus of Firthian prosodic analysis. The standard version of generative phonology of the era (Chomsky and Halle’s The Sound Pattern of English) made no special provision for phenomena that had been labeled suprasegmental or prosodic by earlier traditions.
An early sign that tones required a separate tier of representation was the phenomenon of tonal stability. In many tone languages, when vowels are lost historically or synchronically, their tones remain. The behavior of contour tones in many languages also falls into place when the contours are broken down into sequences of level tones on an independent level of representation. The autosegmental framework captured this naturally, since a sequence of elements on one tier can be connected to a single element on another. But the single most compelling aspect of the early autosegmental model was a natural account of tone spreading, a very common process that was only awkwardly captured by rules of whatever sort. Goldsmith’s autosegmental solution was the Well-Formedness Condition, requiring, among other things, that every tone on the tonal tier be associated with some segment on the segmental tier, and vice versa. Tones thus spread more or less automatically to segments lacking them. The Well-Formedness Condition, at the very core of the autosegmental framework, was a rare constraint, posited nearly two decades before Optimality Theory.
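The way the Well-Formedness Condition yields spreading and contour tones can be illustrated with a small sketch. It assumes a simplified left-to-right, one-to-one association convention followed by repair; the direction of association, the function names, and the toy forms are illustrative assumptions, not details given in the text above.

```python
def associate(tones, tbus):
    """Return association lines as (tone_index, tbu_index) pairs.

    Assumption: tones and tone-bearing units (TBUs) are first linked
    one-to-one from left to right; leftovers are then repaired so that
    every tone has a TBU and every TBU has a tone.
    """
    if not tones or not tbus:
        return []
    # One-to-one association, left to right.
    links = [(i, i) for i in range(min(len(tones), len(tbus)))]
    # Spreading: TBUs left without a tone receive the last tone.
    for j in range(len(tones), len(tbus)):
        links.append((len(tones) - 1, j))
    # Contours: tones left without a TBU dock on the last TBU.
    for i in range(len(tbus), len(tones)):
        links.append((i, len(tbus) - 1))
    return links


def surface(tones, tbus):
    """List the tone(s) realized on each TBU, in order."""
    links = associate(tones, tbus)
    return ["".join(tones[i] for i, j in links if j == k)
            for k in range(len(tbus))]


# Two tones, three vowels: the final H spreads to the toneless vowel.
print(surface(["L", "H"], ["a", "e", "i"]))
# Two tones, one vowel (e.g., after vowel loss): both tones survive as a contour.
print(surface(["H", "L"], ["a"]))
```

The second call mirrors tonal stability: when fewer vowels are available, no tone is deleted; it simply shares a vowel with its neighbor.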
One-to-many associations and spreading onto adjacent elements are characteristic of tone but not confined to it. Similar behaviors are widespread in long-distance phenomena, including intonation, vowel harmony, and nasal prosodies, as well as more locally with partial or full assimilation across adjacent segments.
The early autosegmental notion of tiers of representation that were distinct but conceptually equal soon gave way to a model with one basic tier connected to tiers for particular kinds of articulation, including tone and intonation, nasality, vowel features, and others. This has led to hierarchical representations of phonological features in current models of feature geometry, replacing the unordered distinctive feature matrices of early generative phonology. Autosegmental representations and processes also provide a means of representing non-concatenative morphology, notably the complex interweaving of roots and patterns in Semitic languages.
Later work modified many of the key properties of the autosegmental model. Optimality Theory has led to a radical rethinking of autosegmental mapping, delinking, and spreading as they were formulated under the earlier derivational paradigm.
Blending is a type of word formation in which two or more words are merged into one so that the blended constituents are either clipped or partially overlap. An example of a typical blend is brunch, in which the beginning of the word breakfast is joined with the ending of the word lunch. In many cases, such as motel (motor + hotel) or blizzaster (blizzard + disaster), the constituents of a blend overlap at segments that are phonologically or graphically identical. In some blends, both constituents retain their form as a result of overlap, for example, stoption (stop + option). These examples illustrate only a few of the many forms blends may take; more exotic examples include formations like Thankshallowistmas (Thanksgiving + Halloween + Christmas). The visual and auditory amalgamation in blends is reflected on the semantic level. It is common to form blends meaning a combination or a product of two objects or phenomena, such as an animal breed (e.g., zorse, a breed of zebra and horse), an interlanguage variety (e.g., franglais, which is a French blend of français and anglais meaning a mixture of French and English languages), or other type of mix (e.g., a shress is a garment having features of both a shirt and a dress).
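The two formal patterns described above, clipping and overlap, can be sketched on spelled forms. This is a minimal illustration that works on letters rather than phonological segments; the function names are hypothetical, and the overlap string and cut points are supplied by hand rather than predicted.

```python
def blend_at_overlap(first, second, overlap):
    """Join two words at a shared letter sequence chosen by the caller.

    Keeps `first` up to its last occurrence of `overlap`, then continues
    with `second` from its first occurrence of `overlap`.
    """
    i = first.rfind(overlap)
    j = second.find(overlap)
    if not overlap or i == -1 or j == -1:
        raise ValueError("overlap must occur in both words")
    return first[:i] + second[j:]


def blend_by_clipping(first, second, keep_first, drop_second):
    """Keep the first `keep_first` letters of `first` and drop the
    first `drop_second` letters of `second`."""
    return first[:keep_first] + second[drop_second:]


print(blend_at_overlap("motor", "hotel", "ot"))        # motel
print(blend_at_overlap("blizzard", "disaster", "a"))   # blizzaster
print(blend_at_overlap("stop", "option", "op"))        # stoption
print(blend_by_clipping("breakfast", "lunch", 2, 1))   # brunch
```

In stoption the overlap lets both source words survive intact, whereas in motel and blizzaster material on either side of the shared letters is lost.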
Blending as a word formation process can be regarded as a subtype of compounding because, like compounds, blends are formed of two (or sometimes more) content words and semantically either are hyponyms of one of their constituents, or exhibit some kind of paradigmatic relationships between the constituents. In contrast to compounds, however, the formation of blends is restricted by a number of phonological constraints given that the resulting formation is a single word. In particular, blends tend to be of the same length as the longest of their constituent words, and to preserve the main stress of one of their constituents. Certain regularities are also observed in terms of ordering of the words in a blend (e.g., shorter first, more frequent first), and in the position of the switch point, that is, where one blended word is cut off and switched to another (typically at the syllable boundary or at the onset/rime boundary). The regularities of blend formation can be related to the recognizability of the blended words.
Bracketing paradoxes, constructions whose morphosyntactic and morphophonological structures appear to be irreconcilably at odds (e.g., unhappier), are unanimously taken to point to truths about the derivational system that we have not yet grasped. Consider that the prefix un- must be structurally separate in some way from happier both for its own reasons (its [n] surprisingly does not assimilate in Place to a following consonant (e.g., u[n]popular)), and for reasons external to the prefix (the suffix -er must be insensitive to the presence of un-, as the comparative cannot attach to bases of three syllables or longer (e.g., *intelligenter)). But un- must simultaneously be present in the derivation before -er is merged, so that unhappier can have the proper semantic reading (‘more unhappy’, and not ‘not happier’). Bracketing paradoxes emerged as a problem for generative accounts of both morphosyntax and morphophonology only in the 1970s. With the rise of restrictions on affix behavior, and of the technology used to describe and represent it (e.g., the Affix-Ordering Generalization, Lexical Phonology and Morphology, the Prosodic Hierarchy), morphosyntacticians and phonologists were confronted with this type of inconsistent derivation in many unrelated languages.
Child phonology refers to virtually every phonetic and phonological phenomenon observable in the speech productions of children, including babbles. This includes qualitative and quantitative aspects of babbled utterances as well as all behaviors such as the deletion or modification of the sounds and syllables contained in the adult (target) forms that the child is trying to reproduce in his or her spoken utterances. This research is also increasingly concerned with issues in speech perception, a field of investigation that has traditionally followed its own course; it is only recently that the two fields have started to converge. The recent history of research on child phonology, the theoretical approaches and debates surrounding it, as well as the research methods and resources that have been employed to address these issues empirically, parallel the evolution of phonology, phonetics, and psycholinguistics as general fields of investigation. Child phonology contributes important observations, often organized in terms of developmental time periods, which can extend from the child’s earliest babbles to the stage when he or she masters the sounds, sound combinations, and suprasegmental properties of the ambient (target) language. Central debates within the field of child phonology concern the nature and origins of phonological representations as well as the ways in which they are acquired by children. Since the mid-1900s, the most central approaches to these questions have tended to fall on each side of the general divide between generative vs. functionalist (usage-based) approaches to phonology. Traditionally, generative approaches have embraced a universal stance on phonological primitives and their organization within hierarchical phonological representations, assumed to be innately available as part of the human language faculty. 
In contrast to this, functionalist approaches have utilized flatter (non-hierarchical) representational models and rejected nativist claims about the origin of phonological constructs. Since the beginning of the 1990s, this divide has been blurred significantly, both through the elaboration of constraint-based frameworks that incorporate phonetic evidence, from both speech perception and production, as part of accounts of phonological patterning, and through the formulation of emergentist approaches to phonological representation. Within this context, while controversies remain concerning the nature of phonological representations, debates are fueled by new outlooks on factors that might affect their emergence, including the types of learning mechanisms involved, the nature of the evidence available to the learner (e.g., perceptual, articulatory, and distributional), as well as the extent to which the learner can abstract away from this evidence. In parallel, recent advances in computer-assisted research methods and data availability, especially within the context of the PhonBank project, offer researchers unprecedented support for large-scale investigations of child language corpora. This combination of theoretical and methodological advances provides new and fertile grounds for research on child phonology and related implications for phonological theory.
Clinical linguistics is the branch of linguistics that applies linguistic concepts and theories to the study of language disorders. As the name suggests, clinical linguistics is a dual-facing discipline. Although the conceptual roots of this field are in linguistics, its domain of application is the vast array of clinical disorders that may compromise the use and understanding of language. Both dimensions of clinical linguistics can be addressed through an examination of specific linguistic deficits in individuals with neurodevelopmental disorders, craniofacial anomalies, adult-onset neurological impairments, psychiatric disorders, and neurodegenerative disorders. Clinical linguists are interested in the full range of linguistic deficits in these conditions, including phonetic deficits of children with cleft lip and palate, morphosyntactic errors in children with specific language impairment, and pragmatic language impairments in adults with schizophrenia.
Like many applied disciplines in linguistics, clinical linguistics sits at the intersection of a number of areas. Its relationships to the study of communication disorders and to speech-language pathology (speech and language therapy in the United Kingdom) are two particularly important points of intersection. Speech-language pathology is the area of clinical practice that assesses and treats children and adults with communication disorders. All language disorders restrict an individual’s ability to communicate freely with others in a range of contexts and settings. So language disorders are first and foremost communication disorders. To understand language disorders, it is useful to think of them in terms of points of breakdown on a communication cycle that tracks the progress of a linguistic utterance from its conception in the mind of a speaker to its comprehension by a hearer. This cycle permits the introduction of a number of important distinctions in language pathology, such as the distinction between a receptive and an expressive language disorder, and between a developmental and an acquired language disorder. The cycle is also a useful model with which to conceptualize a range of communication disorders other than language disorders. These other disorders, which include hearing, voice, and fluency disorders, are also relevant to clinical linguistics.
Clinical linguistics draws on the conceptual resources of the full range of linguistic disciplines to describe and explain language disorders. These disciplines include phonetics, phonology, morphology, syntax, semantics, pragmatics, and discourse. Each of these linguistic disciplines contributes concepts and theories that can shed light on the nature of language disorder. A wide range of tools and approaches are used by clinical linguists and speech-language pathologists to assess, diagnose, and treat language disorders. They include the use of standardized and norm-referenced tests, communication checklists and profiles (some administered by clinicians, others by parents, teachers, and caregivers), and qualitative methods such as conversation analysis and discourse analysis. Finally, clinical linguists can contribute to debates about the nosology of language disorders. In order to do so, however, they must have an understanding of the place of language disorders in internationally recognized classification systems such as the 2013 Diagnostic and Statistical Manual of Mental Disorders (DSM-5) of the American Psychiatric Association.
The study of coarticulation—namely, the articulatory modification of a given speech sound arising from coproduction or overlap with neighboring sounds in the speech chain—has attracted the close attention of phonetic researchers for at least the last 60 years. Knowledge about coarticulatory patterns in speech should provide information about the planning mechanisms of consecutive consonants and vowels and the execution of coordinative articulatory structures during the production of those segmental units. Coarticulatory effects involve changes in articulatory displacement over time toward the left (anticipatory) or the right (carryover) of the trigger, and their typology and extent depend on the articulator under investigation (lip, velum, tongue, jaw, larynx) and the articulatory characteristics of the individual consonants and vowels, as well as nonsegmental factors such as speech rate, stress, and language. A challenge for studying coarticulation is that different speakers may use different coarticulatory mechanisms when producing a given phonemic sequence and they also use coarticulatory information differently for phonemic identification in perception. More knowledge about all these research issues should contribute to a deeper understanding of coarticulation deficits in speakers with speech disorders, how the ability to coarticulate develops from childhood to adulthood, and the extent to which the failure to compensate for coarticulatory effects may give rise to sound change.
Jane Chandlee and Jeffrey Heinz
Computational phonology studies the nature of the computations necessary and sufficient for characterizing phonological knowledge. As a field it is informed by the theories of computation and phonology.
The computational nature of phonological knowledge is important because at a fundamental level it is about the psychological nature of memory as it pertains to phonological knowledge. Different types of phonological knowledge can be characterized as computational problems, and the solutions to these problems reveal their computational nature. In contrast to syntactic knowledge, there is clear evidence that phonological knowledge is computationally bounded to the so-called regular classes of sets and relations. These classes have multiple mathematical characterizations in terms of logic, automata, and algebra with significant implications for the nature of memory. In fact, there is evidence that phonological knowledge is bounded by particular subregular classes, with more restrictive logical, automata-theoretic, and algebraic characterizations, and thus by weaker models of memory.
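The link between subregular classes and weaker models of memory can be made concrete with a small sketch. It uses a strictly 2-local grammar, one well-known subregular class; the text above does not name specific classes, and the forbidden sequences below are a hypothetical toy constraint rather than a claim about any language.

```python
def sl2_accepts(word, forbidden_bigrams):
    """Check a word against a strictly 2-local grammar.

    The grammar is a set of forbidden two-symbol sequences. The word is
    padded with '#' so that word edges can be constrained as well.
    The only memory needed while scanning is the previous symbol.
    """
    padded = "#" + word + "#"
    return all(padded[i:i + 2] not in forbidden_bigrams
               for i in range(len(padded) - 1))


# Hypothetical constraint: no 'n' immediately before a labial stop.
NO_N_BEFORE_LABIAL = {"np", "nb"}

print(sl2_accepts("kampa", NO_N_BEFORE_LABIAL))  # True
print(sl2_accepts("kanpa", NO_N_BEFORE_LABIAL))  # False
```

Because the check inspects only adjacent pairs, it cannot express a dependency between symbols that are arbitrarily far apart; capturing such patterns requires a different, though still restricted, class.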
Connectionism is an important theoretical framework for the study of human cognition and behavior. Also known as Parallel Distributed Processing (PDP) or Artificial Neural Networks (ANN), connectionism advocates that learning, representation, and processing of information in the mind are parallel, distributed, and interactive in nature. It argues for the emergence of human cognition as the outcome of large networks of interactive processing units operating simultaneously. Inspired by findings from neuroscience and artificial intelligence, connectionism is a powerful computational tool, and it has had a profound impact on many areas of research, including linguistics. Since the beginning of connectionism, many connectionist models have been developed to account for a wide range of important linguistic phenomena observed in monolingual research, such as speech perception, speech production, semantic representation, and early lexical development in children. Recently, the application of connectionism to bilingual research has also gathered momentum. Connectionist models are often precise in the specification of modeling parameters and flexible in the manipulation of relevant variables to address theoretical questions; they can therefore provide significant advantages in testing mechanisms underlying language processes.
Daniel Currie Hall
The fundamental idea underlying the use of distinctive features in phonology is the proposition that the same phonetic properties that distinguish one phoneme from another also play a crucial role in accounting for phonological patterns. Phonological rules and constraints apply to natural classes of segments, expressed in terms of features, and involve mechanisms, such as spreading or agreement, that copy distinctive features from one segment to another.
Contrastive specification builds on this by taking seriously the idea that phonological features are distinctive features. Many phonological patterns appear to be sensitive only to properties that crucially distinguish one phoneme from another, ignoring the same properties when they are redundant or predictable. For example, processes of voicing assimilation in many languages apply only to the class of obstruents, where voicing distinguishes phonemic pairs such as /t/ and /d/, and ignore sonorant consonants and vowels, which are predictably voiced. In theories of contrastive specification, features that do not serve to mark phonemic contrasts (such as [+voice] on sonorants) are omitted from underlying representations. Their phonological inertness thus follows straightforwardly from the fact that they are not present in the phonological system at the point at which the pattern applies, though the redundant features may subsequently be filled in either before or during phonetic implementation.
In order to implement a theory of contrastive specification, it is necessary to have a means of determining which features are contrastive (and should thus be specified) and which ones are redundant (and should thus be omitted). A traditional and intuitive method involves looking for minimal pairs of phonemes: if [±voice] is the only property that can distinguish /t/ from /d/, then it must be specified on them. This approach, however, often identifies too few contrastive features to distinguish the phonemes of an inventory, particularly when the phonetic space is sparsely populated. For example, in the common three-vowel inventory /i a u/, there is more than one property that could distinguish any two vowels: /i/ differs from /a/ in both place (front versus back or central) and height (high versus low), /a/ from /u/ in both height and rounding, and /u/ from /i/ in both rounding and place.
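The failure of pairwise comparison on /i a u/ can be checked mechanically. The sketch below assumes binary features named `high`, `front`, and `round`, with values chosen to match the differences listed above; the feature names and the function are illustrative, not part of any particular theory's notation.

```python
from itertools import combinations


def pairwise_contrastive(inventory):
    """Mark a feature as contrastive on two phonemes only when it is
    the single feature that distinguishes that pair."""
    result = {seg: set() for seg in inventory}
    for a, b in combinations(inventory, 2):
        diff = [f for f in inventory[a] if inventory[a][f] != inventory[b][f]]
        if len(diff) == 1:
            result[a].add(diff[0])
            result[b].add(diff[0])
    return result


# Assumed full specifications for the three-vowel inventory.
VOWELS = {
    "i": {"high": True,  "front": True,  "round": False},
    "a": {"high": False, "front": False, "round": False},
    "u": {"high": True,  "front": False, "round": True},
}

# A minimal pair of phonemes differing only in voicing.
STOPS = {
    "t": {"voice": False},
    "d": {"voice": True},
}

print(pairwise_contrastive(STOPS))   # voice is contrastive on both
print(pairwise_contrastive(VOWELS))  # every set is empty
```

Every pair of vowels differs in two features, so no feature ever qualifies as the sole distinguishing property, and the method leaves all three vowels unspecified.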
Because pairwise comparison cannot identify any features as contrastive in such cases, much recent work in contrastive specification is instead based on a hierarchical sequencing of features, with specifications assigned by dividing the full inventory into successively smaller subsets. For example, if the inventory /i a u/ is first divided according to height, then /a/ is fully distinguished from the other two vowels by virtue of being low, and the second feature, either place or rounding, is contrastive only on the high vowels. Unlike pairwise comparison, this approach produces specifications that fully distinguish the members of the underlying inventory, while at the same time allowing for the possibility of cross-linguistic variation in the specifications assigned to similar inventories.
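The hierarchical procedure can be sketched as a recursive division of the inventory. The code below is a minimal illustration under assumed binary features (`high`, `front`, `round`) and an assumed ordering supplied by the caller; it is not presented as any author's exact formulation.

```python
def successive_division(inventory, hierarchy):
    """Assign contrastive specifications by dividing the inventory
    feature by feature, in the order given by `hierarchy`.

    A feature is specified on a set of phonemes only if it actually
    splits that set; division stops once a phoneme stands alone.
    """
    specs = {seg: {} for seg in inventory}

    def divide(segments, features):
        if len(segments) <= 1 or not features:
            return
        feat, rest = features[0], features[1:]
        plus = [s for s in segments if inventory[s][feat]]
        minus = [s for s in segments if not inventory[s][feat]]
        if plus and minus:
            for s in plus:
                specs[s][feat] = True
            for s in minus:
                specs[s][feat] = False
            divide(plus, rest)
            divide(minus, rest)
        else:
            # The feature makes no cut here, so it is redundant for this set.
            divide(segments, rest)

    divide(list(inventory), list(hierarchy))
    return specs


# Assumed full specifications for the three-vowel inventory.
VOWELS = {
    "i": {"high": True,  "front": True,  "round": False},
    "a": {"high": False, "front": False, "round": False},
    "u": {"high": True,  "front": False, "round": True},
}

print(successive_division(VOWELS, ["high", "round"]))
print(successive_division(VOWELS, ["high", "front"]))
```

With height ordered first, /a/ receives only a height specification, and the second feature is specified on /i/ and /u/ alone. Swapping `round` for `front` gives a different but equally complete set of specifications, which is the room for cross-linguistic variation the approach allows.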
Corpus Phonology is an approach to phonology that places corpora at the center of phonological research. Some practitioners of corpus phonology see corpora as the only object of investigation; others use corpora alongside other available techniques (for instance, intuitions, psycholinguistic and neurolinguistic experimentation, laboratory phonology, the study of the acquisition of phonology or of language pathology, etc.). Whatever version of corpus phonology one advocates, corpora have become part and parcel of the modern research environment, and their construction and exploitation have been modified by the multidisciplinary advances made within various fields. Indeed, for the study of spoken usage, the term ‘corpus’ should nowadays only be applied to bodies of data meeting certain technical requirements, even if corpora of spoken usage are by no means new and coincide with the birth of recording techniques. It is therefore essential to understand what criteria must be met by a modern corpus (quality of recordings, diversity of speech situations, ethical guidelines, time-alignment with transcriptions and annotations, etc.) and what tools are available to researchers. Once these requirements are met, the way is open to varying and possibly conflicting uses of spoken corpora by phonological practitioners. A traditional stance in theoretical phonology sees the data as a degenerate version of a more abstract underlying system, but more and more researchers within various frameworks (e.g., usage-based approaches, exemplar models, stochastic Optimality Theory, sociophonetics) are constructing models that tightly bind phonological competence to language use, rely heavily on quantitative information, and attempt to account for intra-speaker and inter-speaker variation. This renders corpora essential to phonological research and not a mere adjunct to the phonological description of the languages of the world.
Morphological defectiveness refers to situations where one or more paradigmatic forms of a lexeme are not realized, without plausible syntactic, semantic, or phonological causes. The phenomenon tends to be associated with low-frequency lexemes and loanwords. Typically, defectiveness is gradient, lexeme-specific, and sensitive to the internal structure of paradigms.
The existence of defectiveness is a challenge to acquisition models and to morphological theories that assume elsewhere operations to materialize items. For this reason, defectiveness has become a rich field of research in recent years, with distinct approaches that view it as an item-specific idiosyncrasy, as an epiphenomenal result of rule competition, or as a normal morphological alternation within a paradigmatic space.
Derivational morphology is a type of word formation that creates new lexemes, either by changing syntactic category or by adding substantial new meaning (or both) to a free or bound base. Derivation may be contrasted with inflection on the one hand or with compounding on the other. The distinctions between derivation and inflection and between derivation and compounding, however, are not always clear-cut. New words may be derived by a variety of formal means including affixation, reduplication, internal modification of various sorts, subtraction, and conversion. Affixation is best attested cross-linguistically, especially prefixation and suffixation. Reduplication is also widely found, with various internal changes like ablaut and root-and-pattern derivation being less common. Derived words may fit into a number of semantic categories. For nouns, event and result, personal and participant, and collective and abstract categories are frequent. For verbs, causative and applicative categories are well-attested, as are relational and qualitative derivations for adjectives. Languages frequently also have ways of deriving negatives, relational words, and evaluatives. Most languages have derivation of some sort, although there are languages that rely more heavily on compounding than on derivation to build their lexical stock. A number of topics have dominated the theoretical literature on derivation, including productivity (the extent to which new words can be created with a given affix or morphological process), the principles that determine the ordering of affixes, and the place of derivational morphology with respect to other components of the grammar. The study of derivation has also been important in a number of psycholinguistic debates concerning the perception and production of language.
Carol A. Fowler
The theory of speech perception as direct derives from a general direct-realist account of perception. A realist stance on perception is that perceiving enables occupants of an ecological niche to know its component layouts, objects, animals, and events. “Direct” perception means that perceivers are in unmediated contact with their niche (mediated neither by internally generated representations of the environment nor by inferences made on the basis of fragmentary input to the perceptual systems). Direct perception is possible because energy arrays that have been causally structured by niche components and that are available to perceivers specify (i.e., stand in 1:1 relation to) components of the niche. Typically, perception is multi-modal; that is, perception of the environment depends on specifying information present in, or even spanning, multiple energy arrays.
Applied to speech perception, the theory begins with the observation that speech perception involves the same perceptual systems that, in a direct-realist theory, enable direct perception of the environment. Most notably, the auditory system supports speech perception, but so do the visual system and, sometimes, other perceptual systems. Perception of language forms (consonants, vowels, word forms) can be direct if the forms lawfully cause specifying patterning in the energy arrays available to perceivers. In Articulatory Phonology, the primitive language forms (constituting consonants and vowels) are linguistically significant gestures of the vocal tract, which cause patterning in air and on the face. Descriptions are provided of informational patterning in acoustic and other energy arrays. Evidence is next reviewed that speech perceivers make use of acoustic and cross-modal information about the phonetic gestures constituting consonants and vowels to perceive the gestures.
Significant problems arise for the viability of a theory of direct perception of speech. One is the “inverse problem,” the difficulty of recovering vocal tract shapes or actions from acoustic input. Two other problems arise because speakers coarticulate when they speak. That is, they temporally overlap production of serially nearby consonants and vowels so that there are no discrete segments in the acoustic signal corresponding to the discrete consonants and vowels that talkers intend to convey (the “segmentation problem”), and there is massive context-sensitivity in the patterning of the acoustic signal, as well as in optical and other modalities (the “invariance problem”). The present article suggests solutions to these problems.
The article also reviews signatures of a direct mode of speech perception, including that perceivers use cross-modal speech information when it is available and exhibit various indications of perception-production linkages, such as rapid imitation and a disposition to converge in dialect with interlocutors.
An underdeveloped domain within the theory concerns the very important role of longer- and shorter-term learning in speech perception. Infants develop language-specific modes of attention to acoustic speech signals (and optical information for speech), and adult listeners attune to novel dialects or foreign accents. Moreover, listeners make use of lexical knowledge and statistical properties of the language in speech perception. Some progress has been made in incorporating infant learning into a theory of direct perception of speech, but much less progress has been made in the other areas.
Dispersion Theory concerns the constraints that govern contrasts, the phonetic differences that can distinguish words in a language. Specifically, it posits that there are distinctiveness constraints that favor contrasts that are more perceptually distinct over less distinct contrasts. The preference for distinct contrasts is hypothesized to follow from a preference to minimize perceptual confusion: In order to recover what a speaker is saying, a listener must identify the words in the utterance. The more confusable words are, the more likely a listener is to make errors. Because contrasts are the minimal permissible differences between words in a language, banning indistinct contrasts reduces the likelihood of misperception.
The term ‘dispersion’ refers to the separation of sounds in perceptual space that results from maximizing the perceptual distinctiveness of the contrasts between those sounds, and is adopted from Lindblom’s Theory of Adaptive Dispersion, a theory of phoneme inventories according to which inventories are selected so as to maximize the perceptual differences between phonemes. These proposals follow a long tradition of explaining cross-linguistic tendencies in the phonetic and phonological form of languages in terms of a preference for perceptually distinct contrasts.
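The idea of selecting an inventory so as to maximize perceptual differences can be illustrated with a toy search. The sketch below is only a minimal illustration under stated assumptions: the candidate vowels and their (F1, F2) coordinates are hypothetical round numbers in Bark-like units, Euclidean distance stands in for perceptual distance, and the "maximize the smallest pairwise distance" criterion is one simple stand-in for dispersion, not necessarily the objective used in the published models.

```python
import math
from itertools import combinations

# Hypothetical candidate vowels as (F1, F2) points in Bark-like units.
# These values are illustrative assumptions, not measurements.
CANDIDATES = {
    "i": (2.5, 14.0),
    "e": (4.5, 13.0),
    "a": (7.5, 10.5),
    "o": (4.5, 7.5),
    "u": (2.5, 6.0),
    "barred-i": (2.5, 10.5),
    "schwa": (5.0, 10.5),
}


def min_pairwise_distance(inventory):
    """Smallest Euclidean distance between any two vowels in the inventory."""
    return min(
        math.dist(CANDIDATES[a], CANDIDATES[b])
        for a, b in combinations(inventory, 2)
    )


def most_dispersed(size):
    """Exhaustively pick the inventory whose closest pair is farthest apart."""
    return max(combinations(CANDIDATES, size), key=min_pairwise_distance)


print(most_dispersed(3))
```

With these assumed coordinates, the three-vowel search selects the corner vowels i, a, and u, because every other triple contains a closer pair. An exhaustive search is workable only for small candidate sets; it is used here to keep the criterion transparent.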
Flemming proposes that distinctiveness constraints constitute one class of constraints in an Optimality Theoretic model of phonology. In this context, distinctiveness constraints predict several basic phenomena, the first of which is the preference for maximal dispersion in inventories of contrasting sounds that first motivated the development of the Theory of Adaptive Dispersion. But distinctiveness constraints are formulated as constraints on the surface forms of possible words that interact with other phonological constraints, so they evaluate the distinctiveness of contrasts in context. As a result, Dispersion Theory predicts that contrasts can be neutralized or enhanced in particular phonological contexts. This prediction arises because the phonetic realization of sounds depends on their context, so the perceptual differences between contrasting sounds also depend on context. If the realization of a contrast in a particular context would be insufficiently distinct (i.e., it would violate a high-ranked distinctiveness constraint), there are two options: the offending contrast can be neutralized, or it can be modified (‘enhanced’) to make it more distinct.
A basic open question regarding Dispersion Theory concerns the proper formulation of distinctiveness constraints and the extent of variation in their rankings across languages, issues that are tied up with questions about the nature of perceptual distinctiveness. Another concerns the size and nature of the comparison set of contrasting word-forms required to be able to evaluate whether a candidate output satisfies distinctiveness constraints.
In the Early Modern English period (1500–1700), steps were taken toward Standard English, and this was also the time when Shakespeare wrote, but these perspectives are only part of the bigger picture. This chapter looks at Early Modern English as a variable and changing language not unlike English today. Standardization is found particularly in spelling, and new vocabulary was created as a result of the spread of English into various professional and occupational specializations. New research using digital corpora, dictionaries, and databases reveals the gradual nature of these processes. Ongoing developments were no less gradual in pronunciation, with processes such as the Great Vowel Shift, or in grammar, where many changes resulted in new means of expression and greater transparency. Word order was also subject to gradual change, becoming more fixed over time.
Daniel Aalto, Jarmo Malinen, and Martti Vainio
Formant frequencies are the positions of the local maxima of the power spectral envelope of a sound signal. They arise from acoustic resonances of the vocal tract air column, and they provide substantial information about both consonants and vowels. In running speech, formants are crucial in signaling the movements with respect to place of articulation. Formants are normally defined as accumulations of acoustic energy estimated from the spectral envelope of a signal. However, not all such peaks can be related to resonances in the vocal tract, as they can be caused by the acoustic properties of the environment outside the vocal tract, and sometimes resonances are not seen in the spectrum. Such formants are called spurious and latent, respectively. By analogy, spectral maxima of synthesized speech are called formants, although they arise from a digital filter. Conversely, speech processing algorithms can detect formants in natural or synthetic speech by modeling its power spectral envelope using a digital filter. Such detection is most successful for male speech with a low fundamental frequency where many harmonic overtones excite each of the vocal tract resonances that lie at higher frequencies. For the same reason, reliable formant detection from high-pitched female speech or children’s speech is inherently difficult, and many algorithms fail to faithfully detect the formants corresponding to the lowest vocal tract resonant frequencies.
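Detecting formants by modeling the spectral envelope with a digital filter can be sketched with linear prediction, one common technique of this kind. The example below is a minimal sketch under assumptions, not a production formant tracker: the resonance frequencies (700 and 1200 Hz), bandwidths, sampling rate, and all function names are hypothetical choices made for illustration. It uses white-noise excitation, which sidesteps the sparse-harmonic problem described above for high fundamental frequencies.

```python
import numpy as np


def resonator_coeffs(freqs_hz, bandwidths_hz, fs):
    """Denominator of an all-pole filter with one pole pair per resonance."""
    a = np.array([1.0])
    for f, bw in zip(freqs_hz, bandwidths_hz):
        radius = np.exp(-np.pi * bw / fs)
        theta = 2 * np.pi * f / fs
        a = np.convolve(a, [1.0, -2 * radius * np.cos(theta), radius**2])
    return a


def all_pole_filter(excitation, a):
    """Direct-form recursion: y[n] = x[n] - sum_k a[k] * y[n - k]."""
    order = len(a) - 1
    y = np.zeros(len(excitation) + order)  # leading zeros are the initial state
    for n, x in enumerate(excitation):
        past = y[n:n + order][::-1]  # y[n-1], ..., y[n-order]
        y[n + order] = x - np.dot(a[1:], past)
    return y[order:]


def lpc(signal, order):
    """Autocorrelation-method linear prediction (Yule-Walker equations)."""
    n = len(signal)
    r = np.array([np.dot(signal[:n - k], signal[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    predictor = np.linalg.solve(R, r[1:])
    return np.concatenate(([1.0], -predictor))


def formants_from_lpc(a, fs):
    """Convert the pole angles of the fitted filter to frequencies in Hz."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]  # keep one pole of each conjugate pair
    return np.sort(np.angle(poles) * fs / (2 * np.pi))


fs = 8000  # assumed sampling rate in Hz
rng = np.random.default_rng(0)
excitation = rng.standard_normal(fs)  # one second of white noise
true_a = resonator_coeffs([700.0, 1200.0], [80.0, 100.0], fs)
speechlike = all_pole_filter(excitation, true_a)

estimated = formants_from_lpc(lpc(speechlike, order=4), fs)
print(estimated)
```

Because the synthetic signal is itself produced by a fourth-order all-pole filter, a fourth-order model can recover resonances close to the assumed 700 and 1200 Hz. Real speech would require framing, windowing, pre-emphasis, a higher model order, and some way of rejecting spurious peaks.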