Article

Daniel Aalto, Jarmo Malinen, and Martti Vainio

Formant frequencies are the positions of the local maxima of the power spectral envelope of a sound signal. They arise from acoustic resonances of the vocal tract air column, and they provide substantial information about both consonants and vowels. In running speech, formants are crucial in signaling the movements with respect to place of articulation. Formants are normally defined as accumulations of acoustic energy estimated from the spectral envelope of a signal. However, not all such peaks can be related to resonances in the vocal tract, as they can be caused by the acoustic properties of the environment outside the vocal tract, and sometimes resonances are not seen in the spectrum. Such formants are called spurious and latent, respectively. By analogy, spectral maxima of synthesized speech are called formants, although they arise from a digital filter. Conversely, speech processing algorithms can detect formants in natural or synthetic speech by modeling its power spectral envelope using a digital filter. Such detection is most successful for male speech with a low fundamental frequency, where many harmonic overtones excite each of the vocal tract resonances that lie at higher frequencies. For the same reason, reliable formant detection from high-pitched female speech or children’s speech is inherently difficult, and many algorithms fail to faithfully detect the formants corresponding to the lowest vocal tract resonant frequencies.
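
As an illustration of formant detection by fitting a digital filter to the spectral envelope, the following Python sketch uses linear prediction (LPC) with the autocorrelation method. LPC is one common choice and is not named in the abstract. The function names, the filter order, and the synthetic test signal are all assumptions made for this example. The sketch omits the pre-emphasis, windowing, and bandwidth-based pruning that practical formant trackers usually apply.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation-method LPC; returns [1, a1, ..., a_order] (hypothetical helper)."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Solve the Toeplitz normal equations R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def estimate_formants(frame, sample_rate, order):
    """Return the resonance frequencies (Hz) of the fitted all-pole filter, ascending."""
    roots = np.roots(lpc_coefficients(frame, order))
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    return np.sort(np.angle(roots) * sample_rate / (2 * np.pi))

def synth_resonator_response(freqs_hz, bandwidth_hz, sample_rate, n_samples):
    """Impulse response of an all-pole filter with the given resonances (toy signal)."""
    poly = np.array([1.0])
    radius = np.exp(-np.pi * bandwidth_hz / sample_rate)
    for f in freqs_hz:
        theta = 2 * np.pi * f / sample_rate
        poly = np.convolve(poly, [1.0, -2 * radius * np.cos(theta), radius ** 2])
    out = np.zeros(n_samples)
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0
        for k in range(1, min(n, len(poly) - 1) + 1):
            acc -= poly[k] * out[n - k]
        out[n] = acc
    return out

# Toy usage: two assumed resonances at 700 Hz and 1200 Hz, sampled at 8 kHz.
signal = synth_resonator_response([700.0, 1200.0], 80.0, 8000, 2000)
print(estimate_formants(signal, 8000, order=4))
```

On this idealized signal the fitted filter has the same order as the generating filter, so the estimates should land very close to the assumed resonances. Real speech is excited by harmonics of the fundamental frequency, which is where the difficulty with high-pitched voices described above comes in.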

Article

The Motor Theory of Speech Perception is a proposed explanation of the fundamental relationship between the way speech is produced and the way it is perceived. Associated primarily with the work of Liberman and colleagues, it posited the active participation of the motor system in the perception of speech. Early versions of the theory contained elements that later proved untenable, such as the expectation that the neural commands to the muscles (as seen in electromyography) would be more invariant than the acoustics. Support drawn from categorical perception (in which discrimination is quite poor within linguistic categories but excellent across boundaries) was called into question by studies showing means of improving within-category discrimination and finding similar results for nonspeech sounds and for animals perceiving speech. Evidence for motor involvement in perceptual processes nonetheless continued to accrue, and related motor theories have been proposed. Neurological and neuroimaging results have yielded a great deal of evidence consistent with variants of the theory, but they highlight the issue that there is no single “motor system,” and so different components appear in different contexts. Assigning the appropriate amount of effort to the various systems that interact to result in the perception of speech is an ongoing process, but it is clear that some of the systems will reflect the motor control of speech.

Article

The tongue is composed entirely of soft tissue: muscle, fat, and connective tissue. This unusual composition and the tongue’s 3D muscle fiber orientation result in many degrees of freedom. The lack of bones and cartilage means that muscle shortening creates deformations, particularly local deformations, as the tongue moves into and out of speech gestures. The tongue is also surrounded by the hard structures of the oral cavity, which both constrain its motion and support the rapid small deformations that create speech sounds. Anatomical descriptors and categories of tongue muscles do not correlate with tongue function as speech movements use finely controlled co-contractions of antagonist muscles to move the oral structures during speech. Tongue muscle volume indicates that four muscles, the genioglossus, verticalis, transversus, and superior longitudinal, occupy the bulk of the tongue. They also comprise a functional muscle grouping that can shorten the tongue in the x, y, and z directions. Various 3D muscle shortening patterns produce large- or small-scale deformations in all directions of motion. The interdigitation of the tongue’s muscles is advantageous in allowing co-contraction of antagonist muscles and providing nimble deformational changes to move the tongue toward and away from any position.

Article

Edward Flemming

Dispersion Theory concerns the constraints that govern contrasts, the phonetic differences that can distinguish words in a language. Specifically it posits that there are distinctiveness constraints that favor contrasts that are more perceptually distinct over less distinct contrasts. The preference for distinct contrasts is hypothesized to follow from a preference to minimize perceptual confusion: In order to recover what a speaker is saying, a listener must identify the words in the utterance. The more confusable words are, the more likely a listener is to make errors. Because contrasts are the minimal permissible differences between words in a language, banning indistinct contrasts reduces the likelihood of misperception. The term ‘dispersion’ refers to the separation of sounds in perceptual space that results from maximizing the perceptual distinctiveness of the contrasts between those sounds, and is adopted from Lindblom’s Theory of Adaptive Dispersion, a theory of phoneme inventories according to which inventories are selected so as to maximize the perceptual differences between phonemes. These proposals follow a long tradition of explaining cross-linguistic tendencies in the phonetic and phonological form of languages in terms of a preference for perceptually distinct contrasts. Flemming proposes that distinctiveness constraints constitute one class of constraints in an Optimality Theoretic model of phonology. In this context, distinctiveness constraints predict several basic phenomena, the first of which is the preference for maximal dispersion in inventories of contrasting sounds that first motivated the development of the Theory of Adaptive Dispersion. But distinctiveness constraints are formulated as constraints on the surface forms of possible words that interact with other phonological constraints, so they evaluate the distinctiveness of contrasts in context. 
As a result, Dispersion Theory predicts that contrasts can be neutralized or enhanced in particular phonological contexts. This prediction arises because the phonetic realization of sounds depends on their context, so the perceptual differences between contrasting sounds also depend on context. If the realization of a contrast in a particular context would be insufficiently distinct (i.e., it would violate a high-ranked distinctiveness constraint), there are two options: the offending contrast can be neutralized, or it can be modified (‘enhanced’) to make it more distinct. A basic open question regarding Dispersion Theory concerns the proper formulation of distinctiveness constraints and the extent of variation in their rankings across languages, issues that are tied up with the questions about the nature of perceptual distinctiveness. Another concerns the size and nature of the comparison set of contrasting word-forms required to be able to evaluate whether a candidate output satisfies distinctiveness constraints.
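
The idea of selecting an inventory so as to maximize perceptual distinctiveness can be made concrete with a toy Python sketch. It rests on several loud assumptions: sounds are points in a two-dimensional perceptual space, distinctiveness is plain Euclidean distance, and the best inventory is the one whose closest pair is farthest apart. None of this is the formulation used by Lindblom or Flemming, and the function names and candidate points are hypothetical.

```python
from itertools import combinations
from math import dist

def min_pairwise_distance(points):
    """Distance between the two closest points (the least distinct contrast)."""
    return min(dist(a, b) for a, b in combinations(points, 2))

def most_dispersed_inventory(candidates, size):
    """Brute-force search for the subset whose least distinct contrast is largest.

    Assumes size >= 2; exhaustive search is only practical for small candidate sets.
    """
    return max(combinations(candidates, size), key=min_pairwise_distance)

# Hypothetical candidate sounds spaced evenly along one perceptual dimension.
candidates = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
print(most_dispersed_inventory(candidates, 3))  # ((0, 0), (2, 0), (4, 0))
```

The three-member inventory spreads to the two edges and the center of the space, which is the kind of maximal dispersion the theory predicts. The sketch leaves out the interaction with other constraints and with phonological context that Dispersion Theory adds.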

Article

Speech production is an important aspect of linguistic competence. An attempt to understand linguistic morphology without speech production would be incomplete. A central research question develops from this perspective: what is the role of morphology in speech production? Speech production researchers collect many different types of data, and much of that data has informed how linguists and psycholinguists characterize the role of linguistic morphology in speech production. Models of speech production play an important role in the investigation of linguistic morphology. These models provide a framework that allows researchers to explore the role of morphology in speech production. However, models of speech production generally focus on different aspects of the production process. These models are split between phonetic models (which attempt to understand how the brain creates motor commands for uttering and articulating speech) and psycholinguistic models (which attempt to understand the cognitive processes and representation of the production process). Models that merge these two model types have the potential to let researchers make specific predictions about the effects of morphology on speech production. Many studies have explored models of speech production, but the investigation of the role of morphology and how morphological properties may be represented in merged speech production models is limited.

Article

Ocke-Schwen Bohn

The study of second language phonetics is concerned with three broad and overlapping research areas: the characteristics of second language speech production and perception, the consequences of perceiving and producing nonnative speech sounds with a foreign accent, and the causes and factors that shape second language phonetics. Second language learners and bilinguals typically produce and perceive the sounds of a nonnative language in ways that are different from native speakers. These deviations from native norms can be attributed largely, but not exclusively, to the phonetic system of the native language. Non-nativelike speech perception and production may have both social consequences (e.g., stereotyping) and linguistic–communicative consequences (e.g., reduced intelligibility). Research on second language phonetics over the past ca. 30 years has resulted in a fairly good understanding of causes of nonnative speech production and perception, and these insights have to a large extent been driven by tests of the predictions of models of second language speech learning and of cross-language speech perception. It is generally accepted that the characteristics of second language speech are predominantly due to how second language learners map the sounds of the nonnative to the native language. This mapping cannot be entirely predicted from theoretical or acoustic comparisons of the sound systems of the languages involved, but has to be determined empirically through tests of perceptual assimilation. The most influential learner factors which shape how a second language is perceived and produced are the age of learning and the amount and quality of exposure to the second language. 
A very important and far-reaching finding from research on second language phonetics is that age effects are not due to neurological maturation which could result in the attrition of phonetic learning ability, but to the way phonetic categories develop as a function of experience with surrounding sound systems.

Article

Jack Sidnell

Conversation analysis is an approach to the study of social interaction and talk-in-interaction that, although rooted in the sociological study of everyday life, has exerted significant influence across the humanities and social sciences including linguistics. Drawing on recordings (both audio and video) of naturalistic interaction (unscripted, non-elicited, etc.), conversation analysts attempt to describe the stable practices and underlying normative organizations of interaction by moving back and forth between the close study of singular instances and the analysis of patterns exhibited across collections of cases. Four important domains of research within conversation analysis are turn-taking, repair, action formation and ascription, and action sequencing.

Article

Paul de Lacy

Phonology has both a taxonomic/descriptive and cognitive meaning. In the taxonomic/descriptive context, it refers to speech sound systems. As a cognitive term, it refers to a part of the brain’s ability to produce and perceive speech sounds. This article focuses on research in the cognitive domain. The brain does not simply record speech sounds and “play them back.” It abstracts over speech sounds, and transforms the abstractions in nontrivial ways. Phonological cognition is about what those abstractions are, and how they are transformed in perception and production. There are many theories about phonological cognition. Some theories see it as the result of domain-general mechanisms, such as analogy over a Lexicon. Other theories locate it in an encapsulated module that is genetically specified, and has innate propositional content. In production, this module takes as its input phonological material from a Lexicon, and refers to syntactic and morphological structure in producing an output, which involves nontrivial transformation. In some theories, the output is instructions for articulator movement, which result in speech sounds; in other theories, the output goes to the Phonetic module. In perception, a continuous acoustic signal is mapped onto a phonetic representation, which is then mapped onto underlying forms via the Phonological module, which are then matched to lexical entries. Exactly which empirical phenomena phonological cognition is responsible for depends on the theory. At one extreme, it accounts for all human speech sound patterns and realization. At the other extreme, it is little more than a way of abstracting over speech sounds. In the most popular Generative conception, it explains some sound patterns, with other modules (e.g., the Lexicon and Phonetic module) accounting for others. 
There are many types of patterns, with names such as “assimilation,” “deletion,” and “neutralization”—a great deal of phonological research focuses on determining which patterns there are, which aspects are universal and which are language-particular, and whether/how phonological cognition is responsible for them. Phonological computation connects with other cognitive structures. In the Generative T-model, the phonological module’s input includes morphs of Lexical items along with at least some morphological and syntactic structure; the output is sent to either a Phonetic module, or directly to the neuro-motor interface, resulting in articulator movement. However, other theories propose that these modules’ computation proceeds in parallel, and that there is bidirectional communication between them. The study of phonological cognition is a young science, so many fundamental questions remain to be answered. There are currently many different theories, and theoretical diversity over the past few decades has increased rather than consolidated. In addition, new research methods have been developed and older ones have been refined, providing novel sources of evidence. Consequently, phonological research is both lively and challenging, and is likely to remain that way for some time to come.

Article

Marie K. Huffman

Articulatory phonetics is concerned with the physical mechanisms involved in producing spoken language. A fundamental goal of articulatory phonetics is to relate linguistic representations to articulator movements in real time and the consequent acoustic output that makes speech a medium for information transfer. Understanding the overall process requires an appreciation of the aerodynamic conditions necessary for sound production and the way that the various parts of the chest, neck, and head are used to produce speech. One descriptive goal of articulatory phonetics is the efficient and consistent description of the key articulatory properties that distinguish sounds used contrastively in language. There is fairly strong consensus in the field about the inventory of terms needed to achieve this goal. Despite this common, segmental, perspective, speech production is essentially dynamic in nature. Much remains to be learned about how the articulators are coordinated for production of individual sounds and how they are coordinated to produce sounds in sequence. Cutting across all of these issues is the broader question of which aspects of speech production are due to properties of the physical mechanism and which are the result of the nature of linguistic representations. A diversity of approaches is used to try to tease apart the physical and the linguistic contributions to the articulatory fabric of speech sounds in the world’s languages. A variety of instrumental techniques are currently available, and improvement in safe methods of tracking articulators in real time promises to soon bring major advances in our understanding of how speech is produced.

Article

This chapter deals with the discussion that has concerned, and still concerns, the very concept of ‘word’. It considers different definitions which have been advanced according to different theoretical positions. Thereafter, it examines various phenomena which are strictly bound to ‘word’: word compounds and multi-word expressions, word formation rules, word classes (or Parts-of-Speech), splinters, univerbation and, finally, word blendings.

Article

Research on visual and audiovisual speech information has profoundly influenced the fields of psycholinguistics, perception psychology, and cognitive neuroscience. Visual speech findings have provided some of the most important human demonstrations of our new conception of the perceptual brain as being supremely multimodal. This “multisensory revolution” has seen a tremendous growth in research on how the senses integrate, cross-facilitate, and share their experience with one another. The ubiquity and apparent automaticity of multisensory speech has led many theorists to propose that the speech brain is agnostic with regard to sense modality: it might not know or care from which modality speech information comes. Instead, the speech function may act to extract supramodal informational patterns that are common in form across energy streams. Alternatively, other theorists have argued that any common information existent across the modalities is minimal and rudimentary, so that multisensory perception largely depends on the observer’s associative experience between the streams. From this perspective, the auditory stream is typically considered primary for the speech brain, with visual speech simply appended to its processing. If the utility of multisensory speech is a consequence of a supramodal informational coherence, then cross-sensory “integration” may be primarily a consequence of the informational input itself. If true, then one would expect to see evidence for integration occurring early in the perceptual process, as well as in a largely complete and automatic/impenetrable manner. Alternatively, if multisensory speech perception is based on associative experience between the modal streams, then no constraints on how completely or automatically the senses integrate are dictated. There is behavioral and neurophysiological research supporting both perspectives. 
Much of this research is based on testing the well-known McGurk effect, in which audiovisual speech information is thought to integrate to the extent that visual information can affect what listeners report hearing. However, there is now good reason to believe that the McGurk effect is not a valid test of multisensory integration. For example, there are clear cases in which responses indicate that the effect fails, while other measures suggest that integration is actually occurring. By mistakenly conflating the McGurk effect with speech integration itself, interpretations of the completeness and automaticity of multisensory integration may be incorrect. Future research should use more sensitive behavioral and neurophysiological measures of cross-modal influence to examine these issues.

Article

Louise Cummings

Clinical linguistics is the branch of linguistics that applies linguistic concepts and theories to the study of language disorders. As the name suggests, clinical linguistics is a dual-facing discipline. Although the conceptual roots of this field are in linguistics, its domain of application is the vast array of clinical disorders that may compromise the use and understanding of language. Both dimensions of clinical linguistics can be addressed through an examination of specific linguistic deficits in individuals with neurodevelopmental disorders, craniofacial anomalies, adult-onset neurological impairments, psychiatric disorders, and neurodegenerative disorders. Clinical linguists are interested in the full range of linguistic deficits in these conditions, including phonetic deficits of children with cleft lip and palate, morphosyntactic errors in children with specific language impairment, and pragmatic language impairments in adults with schizophrenia. Like many applied disciplines in linguistics, clinical linguistics sits at the intersection of a number of areas. The relationships of clinical linguistics to the study of communication disorders and to speech-language pathology (speech and language therapy in the United Kingdom) are two particularly important points of intersection. Speech-language pathology is the area of clinical practice that assesses and treats children and adults with communication disorders. All language disorders restrict an individual’s ability to communicate freely with others in a range of contexts and settings. So language disorders are first and foremost communication disorders. To understand language disorders, it is useful to think of them in terms of points of breakdown on a communication cycle that tracks the progress of a linguistic utterance from its conception in the mind of a speaker to its comprehension by a hearer. 
This cycle permits the introduction of a number of important distinctions in language pathology, such as the distinction between a receptive and an expressive language disorder, and between a developmental and an acquired language disorder. The cycle is also a useful model with which to conceptualize a range of communication disorders other than language disorders. These other disorders, which include hearing, voice, and fluency disorders, are also relevant to clinical linguistics. Clinical linguistics draws on the conceptual resources of the full range of linguistic disciplines to describe and explain language disorders. These disciplines include phonetics, phonology, morphology, syntax, semantics, pragmatics, and discourse. Each of these linguistic disciplines contributes concepts and theories that can shed light on the nature of language disorder. A wide range of tools and approaches are used by clinical linguists and speech-language pathologists to assess, diagnose, and treat language disorders. They include the use of standardized and norm-referenced tests, communication checklists and profiles (some administered by clinicians, others by parents, teachers, and caregivers), and qualitative methods such as conversation analysis and discourse analysis. Finally, clinical linguists can contribute to debates about the nosology of language disorders. In order to do so, however, they must have an understanding of the place of language disorders in internationally recognized classification systems such as the 2013 Diagnostic and Statistical Manual of Mental Disorders (DSM-5) of the American Psychiatric Association.

Article

In the indigenous sociolinguistic systems of West Africa, an important way of expressing—and creating—social hierarchy in interaction is through intermediaries: third parties, through whom messages are relayed. The forms of mediation vary by region, by the scale of the social hierarchy, and by the ways hierarchy is locally understood. In larger-scale systems where hierarchy is elaborate, the interacting parties include a high-status person, a mediator who ranks lower, and a third person or group—perhaps another dignitary, but potentially anyone. In smaller-scale, more egalitarian societies, the (putative) interactants could include an authoritative spirit represented by a mask, the mask’s bearer, a “translator,” and an audience. In all these systems, mediated interactions may also involve distinctive registers or vocalizations. Meanwhile, the interactional structure and its characteristic ways of speaking offer tropes and resources for expressing politeness in everyday talk. In the traditions connected with precolonial kingdoms and empires, professional praise orators deliver eulogistic performances for their higher-status patrons. This role is understood as transmission—transmitting a message from the past, or from a group, or from another dignitary—more than as creating a composition from whole cloth. The transmitter amplifies and embellishes the message; he or she does not originate it. In addition to their formal public performances, these orators serve as interpreters and intermediaries between their patrons and their patrons’ visitors. Speech to the patron is relayed through the interpreter, even if the original speaker and the patron are in the same room. Social hierarchy is thus expressed as interactional distance. In the Sahel, these social hierarchies involve a division of labor, including communicative labor, in a complex system of ranked castes and orders. 
The praise orators, as professional experts in the arts of language and communication, are a separate, low-ranking category (known by the French term griot). Some features of griot performance style, and the contrasting, sometimes even disfluent, verbal conduct of high-ranking aristocrats, carry over into speech registers used by persons of any social category in situations evoking hierarchy (petitioning, for example). In indigenous state systems further south, professional orators are not a separate caste, and chiefs are also supposed to have verbal skills, although still using intermediaries. Special honorific registers, such as the esoteric Akan “palace speech,” are used in the chief’s court. Some politeness forms in everyday Akan usage today echo these practices. An example of a small-scale society is the Bedik (Senegal-Guinea border), among whom masked dancers serve as the visible and auditory representation of spirit beings. The mask spirits, whose speech and conduct contrast with their bearers’ ordinary behavior, require “translators” to relay their messages to addressees. This too is mediated communication, involving a multi-party interactional structure as well as distinctive vocalizations. Linguistic repertoires in the Sahel have long included Arabic, and Islamic learning is another source of high status, coexisting with other traditional sources and sharing some interactional patterns. The European conquest brought European languages to the top of West African linguistic hierarchies, which have remained largely in place since independence.

Article

D. H. Whalen

Phonetics is the branch of linguistics that deals with the physical realization of meaningful distinctions in spoken language. Phoneticians study the anatomy and physics of sound generation, acoustic properties of the sounds of the world’s languages, the features of the signal that listeners use to perceive the message, and the brain mechanisms involved in both production and perception. Therefore, phonetics connects most directly to phonology and psycholinguistics, but it also engages a range of disciplines that are not unique to linguistics, including acoustics, physiology, biomechanics, hearing, evolution, and many others. Early theorists assumed that phonetic implementation of phonological features was universal, but it has become clear that languages differ in their phonetic spaces for phonological elements, with systematic differences in acoustics and articulation. Such language-specific details place phonetics solidly in the domain of linguistics; any complete description of a language must include its specific phonetic realization patterns. The description of what phonetic realizations are possible in human language continues to expand as more languages are described; many of the under-documented languages are endangered, lending urgency to the phonetic study of the world’s languages. Phonetic analysis can consist of transcription, acoustic analysis, measurement of speech articulators, and perceptual tests, with recent advances in brain imaging adding detail at the level of neural control and processing. Because of its dual nature as a component of a linguistic system and a set of actions in the physical world, phonetics has connections to many other branches of linguistics, including not only phonology but syntax, semantics, sociolinguistics, and clinical linguistics as well. Speech perception has been shown to integrate information from both vision and tactile sensation, indicating an embodied system. 
Sign language, though primarily visual, has adopted the term “phonetics” to represent the realization component, highlighting the linguistic nature both of phonetics and of sign language. Such diversity offers many avenues for studying phonetics, but it presents challenges to forming a comprehensive account of any language’s phonetic system.

Article

Kodi Weatherholtz and T. Florian Jaeger

The seeming ease with which we usually understand each other belies the complexity of the processes that underlie speech perception. One of the biggest computational challenges is that different talkers realize the same speech categories (e.g., /p/) in physically different ways. We review the mixture of processes that enable robust speech understanding across talkers despite this lack of invariance. These processes range from automatic pre-speech adjustments of the distribution of energy over acoustic frequencies (normalization) to implicit statistical learning of talker-specific properties (adaptation, perceptual recalibration) to the generalization of these patterns across groups of talkers (e.g., gender differences).

Article

Carol A. Fowler

The theory of speech perception as direct derives from a general direct-realist account of perception. A realist stance on perception is that perceiving enables occupants of an ecological niche to know its component layouts, objects, animals, and events. “Direct” perception means that perceivers are in unmediated contact with their niche (mediated neither by internally generated representations of the environment nor by inferences made on the basis of fragmentary input to the perceptual systems). Direct perception is possible because energy arrays that have been causally structured by niche components and that are available to perceivers specify (i.e., stand in 1:1 relation to) components of the niche. Typically, perception is multi-modal; that is, perception of the environment depends on specifying information present in, or even spanning, multiple energy arrays. Applied to speech perception, the theory begins with the observation that speech perception involves the same perceptual systems that, in a direct-realist theory, enable direct perception of the environment. Most notably, the auditory system supports speech perception, but also the visual system, and sometimes other perceptual systems. Perception of language forms (consonants, vowels, word forms) can be direct if the forms lawfully cause specifying patterning in the energy arrays available to perceivers. In Articulatory Phonology, the primitive language forms (constituting consonants and vowels) are linguistically significant gestures of the vocal tract, which cause patterning in air and on the face. Descriptions are provided of informational patterning in acoustic and other energy arrays. Evidence is next reviewed that speech perceivers make use of acoustic and cross-modal information about the phonetic gestures constituting consonants and vowels to perceive the gestures. Significant problems arise for the viability of a theory of direct perception of speech. 
One is the “inverse problem,” the difficulty of recovering vocal tract shapes or actions from acoustic input. Two other problems arise because speakers coarticulate when they speak. That is, they temporally overlap production of serially nearby consonants and vowels so that there are no discrete segments in the acoustic signal corresponding to the discrete consonants and vowels that talkers intend to convey (the “segmentation problem”), and there is massive context-sensitivity in acoustic (and optical and other modalities) patterning (the “invariance problem”). The present article suggests solutions to these problems. The article also reviews signatures of a direct mode of speech perception, including that perceivers use cross-modal speech information when it is available and exhibit various indications of perception-production linkages, such as rapid imitation and a disposition to converge in dialect with interlocutors. An underdeveloped domain within the theory concerns the very important role of longer- and shorter-term learning in speech perception. Infants develop language-specific modes of attention to acoustic speech signals (and optical information for speech), and adult listeners attune to novel dialects or foreign accents. Moreover, listeners make use of lexical knowledge and statistical properties of the language in speech perception. Some progress has been made in incorporating infant learning into a theory of direct perception of speech, but much less progress has been made in the other areas.

Article

Mitchell Green

Speech acts are acts that can, but need not, be carried out by saying and meaning that one is doing so. Many view speech acts as the central units of communication, with phonological, morphological, syntactic, and semantic properties of an utterance serving as ways of identifying whether the speaker is making a promise, a prediction, a statement, or a threat. Some speech acts are momentous, since an appropriate authority can, for instance, declare war or sentence a defendant to prison, by saying that he or she is doing so. Speech acts are typically analyzed into two distinct components: a content dimension (corresponding to what is being said), and a force dimension (corresponding to how what is being said is being expressed). The grammatical mood of the sentence used in a speech act signals, but does not uniquely determine, the force of the speech act being performed. A special type of speech act is the performative, which makes explicit the force of the utterance. Although it has been famously claimed that performatives such as “I promise to be there on time” are neither true nor false, current scholarly consensus rejects this view. The study of so-called infelicities concerns the ways in which speech acts might either be defective (say by being insincere) or fail completely. Recent theorizing about speech acts tends to fall either into conventionalist or intentionalist traditions: the former sees speech acts as analogous to moves in a game, with such acts being governed by rules of the form “doing A counts as doing B”; the latter eschews game-like rules and instead sees speech acts as governed by communicative intentions only. Debate also arises over the extent to which speakers can perform one speech act indirectly by performing another. Skeptics about the frequency of such events contend that many alleged indirect speech acts should be seen instead as expressions of attitudes. 
New developments in speech act theory also situate speech acts in larger conversational frameworks, such as inquiries, debates, or deliberations made in the course of planning. In addition, recent scholarship has identified a type of oppression against under-represented groups as occurring through “silencing”: a speaker attempts to use a speech act to protect her autonomy, but the putative act fails due to her unjust milieu.

Article

Marianne Pouplier

One of the most fundamental problems in research on spoken language is to understand how the categorical, systemic knowledge that speakers have in the form of a phonological grammar maps onto the continuous, high-dimensional physical speech act that transmits the linguistic message. The invariant units of phonological analysis have no invariant analogue in the signal: any given phoneme can manifest itself in many possible variants, depending on context, speech rate, utterance position, and the like, and the acoustic cues for a given phoneme are spread out over time across multiple linguistic units. Speakers and listeners are highly knowledgeable about the lawfully structured variation in the signal, and they skillfully exploit articulatory and acoustic trading relations when speaking and perceiving. For the scientific description of spoken language, understanding this association between abstract, discrete categories and continuous speech dynamics remains a formidable challenge. Articulatory Phonology and the associated Task Dynamic model present one particular proposal on how to step up to this challenge using the mathematics of dynamical systems, with the central insight being that spoken language is fundamentally based on the production and perception of linguistically defined patterns of motion. In Articulatory Phonology, primitive units of phonological representation are called gestures. Gestures are defined based on linear second-order differential equations, giving them inherent spatial and temporal specifications. Gestures control the vocal tract at a macroscopic level, harnessing the many degrees of freedom in the vocal tract into low-dimensional control units. Phonology, in this model, thus directly governs the spatial and temporal orchestration of vocal tract actions.
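
As a rough illustration of what a gesture defined by a linear second-order differential equation could look like, the sketch below assumes the critically damped mass-spring form commonly associated with the Task Dynamic model, m·x'' + b·x' + k·(x − x0) = 0 with b = 2·√(m·k). The abstract itself does not give the equation, so this form, the function name `gesture_position`, and all parameter values and units are assumptions made only for illustration.

```python
import math


def gesture_position(t, start, target, stiffness=100.0, mass=1.0):
    """Position of a tract variable at time t under a critically damped
    mass-spring gesture (assumed form, not taken from the abstract).

    Closed-form solution of m*x'' + b*x' + k*(x - target) = 0 with
    b = 2*sqrt(m*k), x(0) = start, and x'(0) = 0.
    """
    omega = math.sqrt(stiffness / mass)   # natural frequency (rad per time unit)
    offset = start - target               # initial distance from the target
    # With zero initial velocity the solution is
    # x(t) = target + offset * (1 + omega*t) * exp(-omega*t).
    return target + offset * (1.0 + omega * t) * math.exp(-omega * t)


# Hypothetical example: a constriction degree moving from 10 toward a target of 2.
trajectory = [gesture_position(step * 0.05, start=10.0, target=2.0)
              for step in range(21)]
```

Under these assumptions the stiffness sets how quickly the gesture unfolds and the target sets where it ends, which is one way to read the claim that gestures carry inherent temporal and spatial specifications. Critical damping means the trajectory approaches the target without overshooting it.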

Article

Adrian P. Simpson and Melanie Weirich

Speech carries a wealth of information about the speaker aside from any verbal message, ranging from emotional state (sad, happy, bored, etc.) to illness (e.g., a cold). Central features are a speaker’s gender and their sexual orientation. In part this is an inevitable product of differences in speakers’ anatomical dimensions; for example, males on average have lower-pitched voices than females due to longer, thicker vocal folds that vibrate more slowly. Arguably, much more information has been learned by a speaker as they construct their gender or identify with a particular sexual orientation. Differences in speech already begin in young children, before any marked gender-related anatomical differences develop, emphasizing the importance of behavioral patterns. Gender, gender identity, and sexual orientation are encoded in speech in a range of different phonetic parameters relating to both phonation (activity of the vocal folds) and articulation (dimensions and configuration of the supraglottal cavities), as well as the use of pitch patterns and differences in voice quality (the way in which the vocal folds vibrate). Differences in the size and configuration of the supraglottal cavities give rise to differences in the size of the acoustic vowel space as well as subtle differences in the production of individual sounds, such as the sibilant [s]. Furthermore, significant and systematic gender-specific differences have been found in the average duration of utterances and individual sounds, which in turn have been found to have a complex relationship to the perception of tempo.

Article

The term “part of speech” is a traditional one that has been in use since grammars of Classical Greek (e.g., Dionysius Thrax) and Latin were compiled; for all practical purposes, it is synonymous with the term “word class.” The term refers to a system of word classes, whereby class membership depends on similar syntactic distribution and morphological similarity (as well as, in a limited fashion, on similarity in meaning—a point to which we shall return). By “morphological similarity,” reference is made to functional morphemes that are part of words belonging to the same word class. Some examples for both criteria follow: The fact that in English, nouns can be preceded by a determiner such as an article (e.g., a book, the apple) illustrates syntactic distribution. Morphological similarity among members of a given word class can be illustrated by the many adverbs in English that are derived by attaching the suffix –ly, that is, a functional morpheme, to an adjective (quick, quick-ly). A morphological test for nouns in English and many other languages is whether they can bear plural morphemes. Verbs can bear morphology for tense, aspect, and mood, as well as voice morphemes such as passive, causative, or reflexive, that is, morphemes that alter the argument structure of the verbal root. Adjectives typically co-occur with either bound or free morphemes that function as comparative and superlative markers. Syntactically, they modify nouns, while adverbs modify word classes that are not nouns—for example, verbs and adjectives. Most traditional and descriptive approaches to parts of speech draw a distinction between major and minor word classes. The four parts of speech just mentioned—nouns, verbs, adjectives, and adverbs—constitute the major word classes, while a number of others, for example, adpositions, pronouns, conjunctions, determiners, and interjections, make up the minor word classes. 
Under some approaches, pronouns are included in the class of nouns, as a subclass. While the minor classes are probably not universal, (most of) the major classes are. It is largely assumed that nouns, verbs, and probably also adjectives are universal parts of speech. Adverbs might not constitute a universal word class. There are technical terms that are equivalents to the terms of major versus minor word class, such as content versus function words, lexical versus functional categories, and open versus closed classes, respectively. However, these correspondences might not always be one-to-one. More recent approaches to word classes don’t recognize adverbs as belonging to the major classes; instead, adpositions are candidates for this status under some of these accounts, for example, as in Jackendoff (1977). Under some other theoretical accounts, such as Chomsky (1981) and Baker (2003), only the three word classes noun, verb, and adjective are major or lexical categories. All of the accounts just mentioned are based on binary distinctive features; however, the features used differ from each other. While Chomsky uses the two category features [N] and [V], Jackendoff uses the features [Subj] and [Obj], among others, focusing on the ability of nouns, verbs, adjectives, and adpositions to take (directly, without the help of other elements) subjects (thus characterizing verbs and nouns) or objects (thus characterizing verbs and adpositions). Baker (2003), too, uses the property of taking subjects, but attributes it only to verbs. In his approach, the distinctive feature of bearing a referential index characterizes nouns, and only those. Adjectives are characterized by the absence of both of these distinctive features. Another important issue addressed by theoretical studies on lexical categories is whether those categories are formed pre-syntactically, in a morphological component of the lexicon, or whether they are constructed in the syntax or post-syntactically. 
Jackendoff (1977) is an example of a lexicalist approach to lexical categories, while Marantz (1997) and Borer (2003, 2005a, 2005b, 2013) represent an account where the roots of words are category-neutral, and where their membership in a particular lexical category is determined by their local syntactic context. Baker (2003) offers an account that combines properties of both approaches: words are built in the syntax and not pre-syntactically; however, roots do have category features that are inherent to them. There are empirical phenomena, such as phrasal affixation, phrasal compounding, and suspended affixation, that strongly suggest that a post-syntactic morphological component should be allowed, whereby “syntax feeds morphology.”