Allomorphy and syncretism are both deviations from the one-to-one relationship between form and meaning inside the linguistic sign as postulated by Saussure as well as from the ideal of inflectional morphology as stipulated in the canonical approach by Corbett. Instances of both phenomena are well documented in all Romance languages. In inflection, allomorphy refers to the use of more than one root/stem in the paradigm of a single lexeme or to the existence of more than one inflectional affix for the same function. Syncretism describes the existence of identical forms with different functions in one and the same paradigm.
Verbs exhibiting stem allomorphy are traditionally called irregular, a label that describes the existence of unexpected and, sometimes, unpredictable forms from a learner’s perspective. Extreme forms of allomorphy are called suppletion, for which traditional accounts require two or more etymologically unrelated roots/stems to coexist within the paradigm of a single lexeme. Allomorphy often originates in sound change affecting only stems in a certain phonological environment. When the phonological conditioning of the stem allomorph disappears, which is frequently the case, its distribution within the paradigm may become purely morphological, thus constituting a morphome in the sense of Aronoff.
Recurrent patterns of syncretism may also be considered morphomes. Whereas syncretism was quite rare in Latin verb morphology, Romance languages feature it to a much greater, though varying, degree. In extreme cases, which include many Gallo-Romance varieties, syncretism patterns become paradigm-structuring, as in the verb morphology of standard French, where almost all forms are syncretic with at least one other.
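The definition of syncretism given above (identical forms with different functions within one paradigm) can be made concrete with a minimal sketch. The paradigm below is an illustrative assumption: the present indicative of French parler 'speak' in a broad phonemic transcription. The function name `syncretic_sets` is hypothetical and not taken from the article.

```python
from collections import defaultdict

# Illustrative data (an assumption of this sketch): present indicative of
# French parler 'speak', in a broad phonemic transcription.
PARLER_PRESENT = {
    "1sg": "paʁl",
    "2sg": "paʁl",
    "3sg": "paʁl",
    "1pl": "paʁlɔ̃",
    "2pl": "paʁle",
    "3pl": "paʁl",
}


def syncretic_sets(paradigm):
    """Group paradigm cells by form and keep forms shared by two or more cells."""
    cells_by_form = defaultdict(list)
    for cell, form in paradigm.items():
        cells_by_form[form].append(cell)
    return {form: cells for form, cells in cells_by_form.items() if len(cells) > 1}


if __name__ == "__main__":
    for form, cells in syncretic_sets(PARLER_PRESENT).items():
        print(form, "is shared by", ", ".join(cells))
```

In this toy paradigm, four of the six cells share a single form, which gives a sense of how syncretism can come to structure a paradigm.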
Article
Allomorphy and Syncretism in the Romance Languages
Marc-Olivier Hinzelin
Article
Case Markers in Indo-Aryan
Miriam Butt
Indo-Aryan languages have the longest documented historical record, with the earliest attested texts going back to 1900 BCE. Old Indo-Aryan (Vedic, Sanskrit) had an inflectional case-marking system where nominatives functioned as subjects. Objects could be realized via several different case markers (depending on semantic and structural factors), but not the nominative. This inflectional system was lost over the course of several centuries during Middle Indo-Aryan, resulting in just a nominative–oblique inflectional distinction. The New Indo-Aryan languages innovated case markers and developed new case-marking systems. As in Old Indo-Aryan, case is systematically used to express semantic differences via differential object marking constructions. However, unlike in Old Indo-Aryan, many of the New Indo-Aryan languages are ergative and all allow for non-nominative subjects, most prominently for experiencer subjects. Objects, on the other hand, can now also be unmarked (nominative), usually participating in differential object marking. The case-marking patterns within New Indo-Aryan and across time have given rise to a number of debates and analyses. The most prominent of these include issues of case alignment and language change, the distribution of ergative vs. accusative vs. nominative case, and discussions of markedness and differential case marking.
Article
Valency in the Romance Languages
Steffen Heidinger
The notion of valency describes the property of verbs to open argument positions in a sentence (e.g., the verb eat opens two argument positions, filled in the sentence John ate the cake by the subject John and the direct object the cake). Depending on the number of arguments, a verb is avalent (no argument), monovalent (one argument), bivalent (two arguments), or trivalent (three arguments).
In Romance languages, verbs are often labile (i.e., they occur in more than one valency pattern without any formal change on the verb). For example, the (European and Brazilian) Portuguese verb adoecer ‘get sick’/‘make sick’ can be used both as a monovalent and a bivalent verb (O bebê adoeceu ‘The baby got sick’ vs. O tempo frio adoeceu o bebê ‘The cold weather made the baby sick’). However, labile verbs are not equally important in all Romance languages. Taking the causative–anticausative alternation as an example, labile verbs are used more frequently in the encoding of the alternation in Portuguese and Italian than in Catalan and Spanish; the latter languages frequently resort to an encoding with a reflexively marked anticausative verb (e.g., Spanish romperse ‘break’).
Romance languages possess various formal means to signal that a given constituent is an argument: word order, flagging the argument (by means of morphological case and, more importantly, prepositional marking), and indexing the argument on the verb (by means of morphological agreement or clitic pronouns). Again, Romance languages show variation with respect to the use of these formal means. For example, prepositional marking is much more frequent than morphological case marking on nouns (the latter being only found in Romanian).
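The valency classes and the notion of a labile verb described above can be restated as a minimal sketch. The function names `valency_label` and `is_labile` are hypothetical, and the argument lists simply repeat the examples from the text (English eat, Portuguese adoecer).

```python
# Labels for the four valency classes named in the text, keyed by argument count.
VALENCY_LABELS = {0: "avalent", 1: "monovalent", 2: "bivalent", 3: "trivalent"}


def valency_label(arguments):
    """Return the valency class for one use of a verb, given its arguments."""
    count = len(arguments)
    if count not in VALENCY_LABELS:
        raise ValueError(f"no valency label defined for {count} arguments")
    return VALENCY_LABELS[count]


def is_labile(verb_form_by_pattern):
    """A verb is labile if it occurs in more than one valency pattern
    without any formal change on the verb.

    `verb_form_by_pattern` maps a valency label to the verb form used in it.
    """
    several_patterns = len(verb_form_by_pattern) > 1
    same_form = len(set(verb_form_by_pattern.values())) == 1
    return several_patterns and same_form


if __name__ == "__main__":
    # 'John ate the cake': subject plus direct object.
    print(valency_label(["John", "the cake"]))
    # Portuguese adoecer: same verb form in both patterns.
    print(is_labile({"monovalent": "adoecer", "bivalent": "adoecer"}))
    # Spanish marks the anticausative reflexively (romperse), so the form changes.
    print(is_labile({"monovalent": "romperse", "bivalent": "romper"}))
```

The sketch treats lability as a property of the verb form alone; it deliberately ignores flagging and indexing of the arguments.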
Article
Computational Models of Morphological Learning
Jordan Kodner
A computational learner needs three things: Data to learn from, a class of representations to acquire, and a way to get from one to the other. Language acquisition is a very particular learning setting that can be defined in terms of the input (the child’s early linguistic experience) and the output (a grammar capable of generating a language very similar to the input). The input is infamously impoverished. As it relates to morphology, the vast majority of potential forms are never attested in the input, and those that are attested follow an extremely skewed frequency distribution. Learners nevertheless manage to acquire most details of their native morphologies after only a few years of input. That said, acquisition is not instantaneous nor is it error-free. Children do make mistakes, and they do so in predictable ways which provide insights into their grammars and learning processes.
The most elucidating computational model of morphology learning from the perspective of a linguist is one that learns morphology like a child does, that is, on child-like input and along a child-like developmental path. This article focuses on clarifying those aspects of morphology acquisition that should go into such an elucidating computational model. Section 1 describes the input with a focus on child-directed speech corpora and input sparsity. Section 2 discusses representations with a focus on productivity, developmental paths, and formal learnability. Section 3 surveys the range of learning tasks that guide research in computational linguistics and NLP with special focus on how they relate to the acquisition setting. The conclusion in Section 4 presents a summary of morphology acquisition as a learning problem with Table 4 highlighting the key takeaways of this article.
Article
Morphologically ‘Autonomous’ Structures in the Romance Languages
Paul O'Neill
This contribution analyzes morphologically autonomous structures within the context of the Romance languages, the family of languages which, along with Latin, have most served as an evidence base for these structures. Autonomous morphological structures are defined as an abstract representation of paradigmatic cells which form a cohesive group and reliably share exponents with each other; the forms which realize them are thus to a large extent interpredictable. In this contribution, I restrict my discussion to the most canonical type of these structures and those which have sparked the most controversy in the linguistic literature. I analyze this controversy and suggest that it is due to (a) their overlapping meaning with the term morphome, a concept which embodies an empirical claim about all morphology and (b) the controversy surrounding what morphology actually is and the basic units of morphological analysis and storage. I make a distinction between abstractive and constructive models of morphology and suggest that historical tendencies within the latter encourage scholars to view morphologically autonomous structures either as not synchronically relevant or as phonologically or semantically derivable due to their theoretical assumptions about the nature of language and the mental storage of words. These assumptions constitute the horizons of intelligibility of such models regarding the functioning of language and its governing principles, including outdated ideas of the capacity of mental storage. Unfortunately, however, the different theories furnish scholars with an expansive array of devices through which they can seemingly explain away the synchronic generalizations of the data while relegating the most recalcitrant data to the domain of memorized forms which are not relevant to the grammar.
I present evidence in favor of the psychological reality of morphologically autonomous structures in diachrony and I argue that synchronically, these structures are necessary to explain the distribution of the data and capture the fact that speakers do not memorize every inflectional form of a paradigm but rely on patterns of predictability and implicational relationships between forms. It is my suggestion that morphologically autonomous structures encourage a revaluation of the basic units of memorization and the structure of the lexicon in accordance with abstractive theories of morphology.
Article
Catalan
Francisco Ordóñez
Catalan is a “medium-sized” Romance language spoken by over 10 million speakers, spread over four nation states: Spain (in the northeast), Andorra, France (in the south), and Italy (in the city of l’Alguer (Alghero) in Sardinia). Catalan is divided into two primary dialectal divisions, each with further subvarieties: Western Catalan (Western Catalonia, Eastern Aragon, and Valencian Community) and Eastern Catalan (center and east of Catalonia, Balearic Islands, Rosselló, and l’Alguer).
Catalan descends from Vulgar Latin. Catalan expanded during medieval times as one of the primary vernacular languages of the Kingdom of Aragon. It largely retained its role in government and society until the end of the War of Spanish Succession in 1714, and it has been minoritized ever since. Catalan was finally standardized at the beginning of the 20th century, although later, during the Franco dictatorship, it was banned in public spaces. The situation changed with the new Spanish Constitution promulgated in 1978, when Catalan was declared co-official with Spanish in Catalonia, the Valencian Community, and the Balearic Islands.
The Latin vowel system evolved in Catalan into a system of seven stressed vowels. As in most other Iberian Romance languages, there is a general process of spirantization or lenition of voiced stops. Catalan has a two-gender grammatical system and, as in other Western Romance languages, plurals end in -s; Catalan has a personal article and Balearic Catalan has a two-determiner system for common nouns. Finally, past perfective actions are indicated by a compound tense consisting of the auxiliary verb anar ‘to go’ in present tense plus the infinitive.
Catalan is a minoritized language everywhere it is spoken, except in the microstate of Andorra, and it is endangered in France and l’Alguer. The revival of Catalan in the post-dictatorship era is connected with a movement called linguistic normalization. The idea of normalization refers to the aim of returning Catalan to “normal” use at both the official and the everyday level, like any official language.
Article
Lexical Representations in Language Processing
Gary Libben
Words are the backbone of language activity. An average 20-year-old native speaker of English will have a vocabulary of about 42,000 words. These words are connected with one another within the larger network of lexical knowledge that is termed the mental lexicon. The metaphor of a mental lexicon has played a central role in the development of theories of language and mind and has provided an intellectual meeting ground for psychologists, neurolinguists, and psycholinguists. Research on the mental lexicon has shown that lexical knowledge is not static. New words are acquired throughout the life span, creating very large increases in the richness of connectivity within the lexical system and changing the system as a whole. Because most people in the world speak more than one language, the default mental lexicon may be a multilingual one. Such a mental lexicon differs substantially from a lexicon of an individual language and would lead to the creation of new integrated lexical systems due to the pressure on the system to organize and access lexical knowledge in a homogenous manner. The mental lexicon contains both word knowledge and morphological knowledge. There is also evidence that it contains multiword strings such as idioms and lexical bundles. This speaks in support of a nonrestrictive “big tent” view of units of representation within the mental lexicon. Changes in research on lexical representations in language processing have emphasized lexical action and the role of learning. Although the metaphor of words as distinct representations within a lexical store has served to advance knowledge, it is more likely that words are best seen as networks of activity that are formed and affected by experience and learning throughout the life span.
Article
Nominalizations in the Romance Languages
Antonio Fábregas and Rafael Marín
The term nominalization refers to a specific type of category-changing morphological operation that produces nouns from other lexical categories, most productively verbs and adjectives. By extension, it is also used to refer to the resulting derived nouns. In Romance languages, nominalization generally involves addition of a suffix to the base (cf. Italian generoso ‘generous’ > generos-ità ‘generosity’), and such suffixes are called nominalizers. However, there are also cases of nouns built from other categories without any overt nominalizer (cf. Spanish inútil ‘useless’ > inútil ‘useless person’); descriptively, this process is called conversion, and it is debatable whether it should also be treated as a nominalization or whether a different kind of morphological operation is involved here.
Nominalizations can be divided into several classes depending on a variety of semantic and syntactic factors, such as the type of entities that they denote or the ability to introduce arguments. The main nominalization classes are (a) complex event nominalizations, which come from verbs, can combine with some temporal and aspectual modifiers, and have the ability to introduce at least an internal argument; (b) state nominalizations, which denote states associated with the verbs that serve as their bases; (c) participant nominalizations, which denote different types of arguments of the base, such as agents, resulting objects, locations or recipients; and (d) quality nominalizations, coming from adjectives and more restrictively from verbs, which denote a set of properties related to their base. Different classes of predicates select for different nominalization types, and there is a debate surrounding which tests capture in a more complete way the nuances of this taxonomy.
Nominalizers impose different types of restrictions on their bases: aspectual restrictions (individual-level vs. stage-level, (a)telicity, dynamicity, etc.), argument structure restrictions (agent vs. nonagent, different types of internal arguments), morphological restrictions (for instance, selecting only verbs that belong to a particular conjugation class), and finally conceptual restrictions (for instance, showing a strong preference for bases belonging to a particular conceptual domain).
In Romance languages, nominalizations sometimes alternate with other word classes, most significantly infinitives (see article on “Infinitival Clauses in the Romance Languages” in this encyclopedia). Infinitival constructions in Romance can display a mixture of verbal and nominal properties, or be totally recategorized as nouns, and in both cases they can compete with prototypical nominalizations. Less generally, participles (see article on “Participial Relative Clauses” in this encyclopedia), gerunds and supines can also display nominalization properties in some Romance varieties.
Article
Peculiarities of Portuguese Word-Formation
Graça Rio-Torto
Portuguese shares major word-formation mechanisms—affixation, composition, conversion, blending, clipping—with other Romance languages, but also displays some peculiarities related to different Latin, Celtiberian, Germanic, and Mozarabic lexical heritages and to the internal dynamics of the language from the 12th to the 21st century. Portuguese has preserved the core of the medieval word-formation framework, but new patterns were of course introduced from time to time, especially during the 20th century. Portuguese word-formation peculiarities are partly conservative, partly innovative; some comply with international trends of word-formation, others depart from them. The proliferation of Neo-Latin compounding and the increase of blending, as well as the introduction of phenomena such as clipping, reanalysis, and grammaticalization illustrate the convergence of modern Portuguese with international word-formation tendencies. In Portuguese, as in other languages, learned suffixes tend to be less productive than the corresponding nonlearned ones coexisting with them. However, in specific cases such as gentilic adjectives/nouns, a learned suffix like -ense can prevail over its nonlearned rival (in this case, Pt. -ês/-esa), while in Italian the nonlearned suffix -ese prevails.
Apart from peculiar phonological outcomes of some Latin suffixes and the greater weight of interfixation due to phonological and prosodic conditions, the major distinctive traits of Portuguese word-formation include: (a) the unique distribution of the major evaluative suffixes, grounded in subjective/attitudinal values; (b) the subjective meanings associated with several suffixes that are not found in the corresponding suffixes of other Romance languages; (c) the specific set of suffixal resources for forming agentive and instrumental deverbal nouns; and (d) the expansion of the categorial bases selected by some suffixes.
Article
Secondary Predication in the Romance Languages
Steffen Heidinger
A secondary predicate is a nonverbal predicate which is typically optional and which shares its argument with the sentence’s main verb (e.g., cansada ‘tired’ in Portuguese Ela chega cansada ‘She arrives tired’). A basic distinction within the class of adjunct secondary predicates is that between depictives and resultatives. Depictives, such as cansada in the Portuguese example, describe the state of an argument during the event denoted by the verb. Typically, Romance depictives morphologically agree with their argument in gender and number (as in the case of cansada). Resultatives, such as flat in John hammered the metal flat, describe the state of an argument which results from the event denoted by the verb. Resultatives come in different types, and the strong resultatives, such as flat in the English example, are missing in Romance languages. Although strong resultatives are missing, Romance languages possess other constructions which express a sense of resultativity: spurious resultatives, where the verb and the resultative predicate are linked because the manner of carrying out the action denoted by the verb leads to a particular resultant state (e.g., Italian Mia figlia ha cucito la gonna troppo stretta ‘My daughter sewed the skirt too tight’), and to a much lesser extent weak resultatives, where the meaning of the verb and the meaning of the resultative predicate are related (the resultative predicate specifies a state that is already contained in the verb’s meaning, e.g., French Marie s’est teint les cheveux noirs ‘Marie dyed her hair black’). In Romance languages the distinction between participant-oriented secondary predicates and event-oriented adjectival adverbs is not always clear. On the formal side, the distinction is blurred when (a) adjectival adverbs exhibit morphological agreement (despite their event orientation) or (b) secondary predicates do not agree with the argument they predicate over. 
On the semantic side, one and the same string may be open to interpretation as a secondary predicate or as an adjectival adverb (e.g., Spanish Pedro gritó colérico ‘Pedro screamed furious/furiously’).