Four types of English exist in Africa, identifiable in terms of history, functions, and linguistic characteristics. West African Pidgin English has a history going back to the 15th century, 400 years before formal colonization. Creole varieties of English have a history going back to repatriation of slaves from the Caribbean and the United States in the 19th century. Second language varieties, which are the most widespread on the continent, are prototypically associated with British colonization and its education systems. L1 (First language) English occurred mostly in Southern and East Africa, and is best represented in South Africa. The latter shows significant similarities with the other major Southern Hemisphere varieties of English in Australia and New Zealand. All four subgroups of English are growing in numbers.
“Altaic” is a common term applied by linguists to a number of language families, spread across Central Asia and the Far East and sharing a large, most likely non-coincidental, number of structural and morphemic similarities. At the onset of Altaic studies, these similarities were ascribed to the one-time existence of an ancestral language—“Proto-Altaic,” from which all these families are descended; circumstantial evidence and glottochronological calculations tentatively date this language to some time around the 6th–7th millennium
The debate over the nature of the relationship between the various units that constitute “Altaic,” sometimes referred to as “the Altaic controversy,” has been one of the most heated in 20th-century historical linguistics and a major focal point of studies dealing with the prehistory of Central and East Eurasia. Supporters of “Proto-Altaic,” commonly known as “(pro-)Altaicists,” claim that only divergence from an original common ancestor can account for the observed regular phonetic correspondences and other structural similarities, whereas “anti-Altaicists,” without denying the existence of such similarities, insist that they do not belong to the “core” layers of the respective languages and are therefore better explained as results of lexical borrowing and other forms of areal linguistic contact.
As a rule, “pro-Altaicists” claim that “Proto-Altaic” is as reconstructible by means of the classic comparative method as any uncontroversial linguistic family; in support of this view, they have produced several attempts to assemble large bodies of etymological evidence for the hypothesis, backed by systems of regular phonetic correspondences between compared languages. All of these, however, have been heavily criticized by “anti-Altaicists” for lack of methodological rigor, implausibility of proposed phonetic and/or semantic changes, and confusion of recent borrowings with items allegedly inherited from a common ancestor. Despite the validity of many of these objections, it remains unclear whether they are sufficient to completely discredit the hypothesis of a genetic connection between the various branches of “Altaic,” which continues to be actively supported by a small, but stable scholarly minority.
Analogy is traditionally regarded as one of the three main factors responsible for language change, along with sound change and borrowing. Whereas sound change is understood to be phonetically motivated and blind to structural patterns and semantic and functional relationships, analogy is licensed precisely by those patterns and relationships. In the Neogrammarian tradition, analogical change is regarded, at least largely, as a by-product of the normal operation (acquisition, representation, and use) of the mental grammar. Historical linguists commonly use proportional equations of the form A : B = C : X to represent analogical innovations, where A, B, and C are (sets of) word forms known to the innovator, who solves for X by discerning a formal relationship between A and B and then deductively arriving at a form that is related to C in the same way that B is related to A.
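The proportional equation A : B = C : X can be illustrated with a minimal sketch. It assumes, purely for illustration, that the formal relationship between A and B is a change of word ending (a suffix swap) and that forms are plain orthographic strings; the function name `solve_proportion` is hypothetical, and real analogical innovations involve phonological and morphological structure that this toy ignores.

```python
from os.path import commonprefix
from typing import Optional


def solve_proportion(a: str, b: str, c: str) -> Optional[str]:
    """Solve A : B = C : X for X, treating the A-to-B relation as a suffix swap.

    Returns None when C does not end in the material that A loses,
    i.e. when this simple pattern cannot be extended to C.
    """
    # The shared beginning of A and B is taken to be the stem.
    stem = commonprefix([a, b])
    a_ending = a[len(stem):]  # what A has after the stem
    b_ending = b[len(stem):]  # what B has after the stem

    # The pattern applies only if C ends the same way A does.
    if not c.endswith(a_ending):
        return None

    # Strip A's ending from C, then attach B's ending.
    base = c[:len(c) - len(a_ending)]
    return base + b_ending


# stone : stones = cow : X  (X replaced the older plural "kine")
print(solve_proportion("stone", "stones", "cow"))
# drive : drove = dive : X
print(solve_proportion("drive", "drove", "dive"))
# sing : sang = walk : X has no solution under this pattern
print(solve_proportion("sing", "sang", "walk"))
```

The third call returns None because "walk" does not share the ending that distinguishes "sing" from "sang", a rough stand-in for the requirement that the innovator must discern a relationship between A and B that C can also enter into.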
Along with the core type of analogical change captured by proportional equations, most historical linguists include a number of other phenomena under the analogy umbrella. Some of these, such as paradigm leveling—the reduction or elimination of stem alternations in paradigms—are arguably largely proportional, but others such as contamination and folk etymology seem to have less to do with the normal operation of the mental grammar and instead involve some kind of interference among the mental representations of phonetically or semantically similar forms.
The Neogrammarian approach to analogical change has been criticized and challenged on a variety of grounds, and a number of important scholars use the term “analogy” in a rather different sense, to refer to the role that phonological and/or semantic similarity play in the influence that forms exert on each other.
Since the start of the Islamic conquest of the Maghreb in the 7th century
Linguistic influence is found on all levels: phonology, morphology, syntax, and lexicon. In those cases where only innovative patterns are shared between the two language groups, it is often difficult to make out where the innovation started; thus the great similarities in syllable structure between Maghrebian Arabic and northern Berber are the result of innovations within both language families, and the direction of influence cannot easily be determined. Morphological influence seems to be mediated exclusively by lexical borrowing. Especially in Berber, this has led to parallel systems in the morphology, where native words always have native morphology, while loans either have nativized morphology or retain Arabic-like patterns. In the lexicon, it is especially Berber that takes over scores of loanwords from Arabic, amounting in one case to over one-third of the basic lexicon as defined by 100-word lists.
Definition of the copula as a discrete grammatical category is problematic. It is the semantically unmarked copulas (simple equivalents of the English verb ‘to be’) which deserve most attention in a comparison of the Romance languages; they have a typically suppletive historical morphology and are often the result of the grammaticalization of full lexical verbs, the point at which true unmarked copular status is achieved being sometimes difficult to identify. The unmarked copulas of the Ibero-Romance languages (Spanish, Portuguese, and Catalan) have the most complex distribution and have proved the most difficult to account for synchronically.
The situation in the early 21st century is the consequence of a progressive encroachment of the reflexes of Latin stare and other verbs on the functions of Latin esse (reference is made to the Classical Latin form esse for convenience; however, the Romance paradigms must be taken to derive from a Vulgar Latin form *essĕre, into which other verbs, notably sedēre ‘to sit’ in the case of the Ibero-Romance languages, were suppletively incorporated). The contrastive study of the development of cognate copular verbs in closely related languages needs closer attention in regard to the identification of the parameters of copula choice with adjectival complements.
In the Early Modern English period (1500–1700), steps were taken toward Standard English, and this was also the time when Shakespeare wrote, but these perspectives are only part of the bigger picture. This chapter looks at Early Modern English as a variable and changing language not unlike English today. Standardization is found particularly in spelling, and new vocabulary was created as a result of the spread of English into various professional and occupational specializations. New research using digital corpora, dictionaries, and databases reveals the gradual nature of these processes. Ongoing developments were no less gradual in pronunciation, with processes such as the Great Vowel Shift, or in grammar, where many changes resulted in new means of expression and greater transparency. Word order was also subject to gradual change, becoming more fixed over time.
Éva Buchi and Steven N. Dworkin
Etymology is the only linguistic subdiscipline that is uniquely historical in its study of the relevant linguistic data and one of the oldest fields in Romance linguistics.
The concept of etymology as practiced by Romanists has changed over the last 100 years. At the outset, Romance etymologists took as their brief the search for and identification of individual word origins. Starting in the early 20th century, various specialists began to view etymology as the preparation of the complete history of all facets of the evolution over time and space of the words or lexical families being studied. Identification of the underlying base was only the first step in the process. From this perspective, etymology constitutes an essential element of diachronic lexicology, which covers all formal, semantic, and syntactic facets of a word’s evolution, including, if appropriate, the circumstances leading to its demise and replacement.
John E. Joseph
Ferdinand de Saussure (1857–1913), the founding figure of modern linguistics, made his mark on the field with a book he published a month after his 21st birthday, in which he proposed a radical rethinking of the original system of vowels in Proto-Indo-European. A year later, he submitted his doctoral thesis on a morpho-syntactic topic, the genitive absolute in Sanskrit, to the University of Leipzig. He went to Paris intending to do a second, French doctorate, but instead he was given responsibility for courses on Gothic and Old High German at the École Pratique des Hautes Études, and for managing the publications of the Société de Linguistique de Paris. He abandoned more than one large publication project of his own during the decade he spent in Paris. In 1891 he returned to his native Geneva, where the University created a chair in Sanskrit and the history and comparison of languages for him. He produced some significant work on Lithuanian during this period, connected to his early book on the Indo-European vowel system, and yielding Saussure’s Law, concerning the placement of stress in Lithuanian. He undertook writing projects about the general nature of language, but again abandoned them. In 1907, 1908–1909, and 1910–1911, he gave three courses in general linguistics at the University of Geneva, in which he developed an approach to languages as systems of signs, each sign consisting of a signifier (sound pattern) and a signified (concept), both of them mental rather than physical in nature, and conjoined arbitrarily and inseparably. The socially shared language system, or langue, makes possible the production and comprehension of parole, utterances, by individual speakers and hearers. Each signifier and signified is a value generated by its difference from all the other signifiers or signifieds with which it coexists on an associative (or paradigmatic) axis, and affected as well by its syntagmatic axis.
Shortly after Saussure’s death at 55, two of his colleagues, Bally and Sechehaye, gathered together students’ notes from the three courses, as well as manuscript notes by Saussure, and from them constructed the Cours de linguistique générale, published in 1916. Over the course of the next several decades, this book became the basis for the structuralist approach, initially within linguistics, and later adapted to other fields. Saussure left behind a large quantity of manuscript material that has gradually been published over the last few decades, and continues to be published, shedding new light on his thought.
This is an advance summary of a forthcoming article in the Oxford Research Encyclopedia of Linguistics.
In a special focus-predicate concord construction (kakari musubi), specific focus particles called kakari joshi correlate with particular predicate conjugational endings, or musubi, other than regular finite forms, creating special illocutionary effects such as emphatic assertion or question. In Old Japanese (OJ), a particle ka, s(/z)ö, ya, or namu triggers an adnominal ending, while kösö calls for a realis ending. In Old Okinawan (OOk), ga or du prompts an adnominal ending, while sɨ associates with realis endings. Kakari musubi existed in proto-Japonic but died out in the Japanese branch; however, it is still preserved in its sister branch, Ryukyuan, in the Okinawan language.
This concord phenomenon, observed in only a few languages of the world, presents diverse issues concerning its evolution from origin to demise, the functional and semantic differences of its kakari particles (e.g., question-forming OJ ka vs. ya), and positional (sentence-medial vs. sentence-final) contrast. Furthermore, kakari musubi bears relevance to syntactic constructions such as clefts and nominalizations. Last, some kakari particles stemming from demonstratives offer worthy data for theory construction in grammaticalization or iconicity. Because of its far-reaching relevance, the construction has garnered attention from both formal and functional schools of linguistics.
Different methods exist for classifying languages, depending on whether the task is to work out the relations among languages already known to be related—internal language classification—or whether the task is to establish that certain languages are related—external language classification.
The comparative method in historical linguistics, developed during the latter part of the 19th century, represents one method for internal language classification; lexicostatistics, developed during the 1950s, represents another. Elements of lexicostatistics have been transformed and carried over into modern computational linguistic phylogenetics, and currently efforts are also being made to automate the comparative method. Recent years have seen rapid progress in the development of methods, tools, and resources for language classification. For instance, computational phylogenetic algorithms and software have made it possible to handle the classification of many languages using explicit models of language change, and data have been gathered for two-thirds of the world’s languages, allowing for rapid, exploratory classifications. There are also many open questions and avenues for future research, for instance: What are the real-world counterparts to the nodes in a family tree structure? How can shortcomings in the traditional method of comparative historical linguistics be overcome? How can the understanding of the results that computational linguistic phylogenetics have to offer be improved?
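As a rough illustration of the lexicostatistical idea mentioned above, the following sketch computes the proportion of basic-vocabulary meanings for which two languages have words judged to be cognate. Everything concrete here is an assumption made for the example: the language labels, the four-item meaning list, and the cognate-class numbers are invented placeholders, not real data, and the function name `shared_cognate_ratio` is hypothetical.

```python
from itertools import combinations
from typing import Dict


def shared_cognate_ratio(codes_a: Dict[str, int], codes_b: Dict[str, int]) -> float:
    """Share of jointly attested meanings whose words fall in the same cognate class."""
    # Compare only meanings that are recorded for both languages.
    shared_meanings = [m for m in codes_a if m in codes_b]
    if not shared_meanings:
        raise ValueError("no meanings in common")
    matches = sum(codes_a[m] == codes_b[m] for m in shared_meanings)
    return matches / len(shared_meanings)


# Invented placeholder data: meaning -> cognate-class label.
# Two languages share a label when their words for that meaning
# have been judged cognate.
wordlists = {
    "lang_A": {"hand": 1, "water": 1, "stone": 1, "two": 1},
    "lang_B": {"hand": 1, "water": 1, "stone": 2, "two": 1},
    "lang_C": {"hand": 2, "water": 1, "stone": 3, "two": 1},
}

# Pairwise ratios; higher values suggest a closer grouping.
ratios = {
    (x, y): shared_cognate_ratio(wordlists[x], wordlists[y])
    for x, y in combinations(sorted(wordlists), 2)
}
for pair, ratio in ratios.items():
    print(pair, ratio)
```

With these toy lists, lang_A and lang_B would be grouped first, since they share the highest proportion of cognates. Computational phylogenetic methods replace this single percentage with explicit models of change, so the sketch should be read as a simplified baseline only.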
External language classification, a notoriously difficult task, has also benefitted from the advent of computational power. While, in the past, the simultaneous comparison of many languages for the purpose of discovering deep genealogical links was carried out in a haphazard fashion, leaving too much room for the effect of chance similarities to kick in, this sort of activity can now be done in a systematic, objective way on an unprecedented scale. The ways of producing final, convincing evidence for a deep genealogical relation, however, have not changed much. There is some room for improvement in this area, but even more room for improvement in the way that proposals for long-distance relations are evaluated.
The German sinologist and general linguist Georg von der Gabelentz (1840–1893) occupies an interesting place at the intersection of several streams of linguistic scholarship at the end of the 19th century. As Professor of East Asian languages at the University of Leipzig from 1878 to 1889 and then Professor for Sinology and General Linguistics at the University of Berlin from 1889 until his death, Gabelentz was present at some of the main centers of linguistics at the time. He was, however, generally critical of mainstream historical-comparative linguistics as propagated by the neogrammarians, and instead emphasized approaches to language inspired by a line of researchers including Wilhelm von Humboldt (1767–1835), H. Steinthal (1823–1899), and his own father, Hans Conon von der Gabelentz (1807–1874).
Today Gabelentz is chiefly remembered for several theoretical and methodological innovations which continue to play a role in linguistics. Most significant among these are his contributions to cross-linguistic syntactic comparison and typology, grammar-writing, and grammaticalization. His earliest linguistic work emphasized the importance of syntax as a core part of grammar and sought to establish a framework for the cross-linguistic description of word order, as had already been attempted for morphology by other scholars. The importance he attached to syntax was motivated by his engagement with Classical Chinese, a language almost devoid of morphology and highly reliant on syntax. In describing this language in his 1881 Chinesische Grammatik, Gabelentz elaborated and implemented the complementary “analytic” and “synthetic” systems of grammar, an approach to grammar-writing that continues to serve as a point of reference up to the present day. In his summary of contemporary thought on the nature of grammatical change in language, he became one of the first linguists to formulate the principles of grammaticalization in essentially the form that this phenomenon is studied today, although he did not use the current term. One key term of modern linguistics that he did employ, however, is “typology,” a term that he in fact coined. Gabelentz’s typology was a development on various contemporary strands of thought, including his own comparative syntax, and is widely acknowledged as a direct precursor of the present-day field.
Gabelentz is a significant transitional figure from the 19th to the 20th century. On the one hand, his work seems very modern. Beyond his contributions to grammaticalization avant la lettre and his christening of typology, his conception of language prefigures the structuralist revolution of the early 20th century in important respects. On the other hand, he continues to entertain several preoccupations of the 19th century—in particular the judgment of the relative value of different languages—which were progressively banished from linguistics in the first decades of the 20th century.
While in phonology Middle Indo-Aryan (MIA) dialects preserved the phonological system of Old Indo-Aryan (OIA) virtually intact, their morphosyntax underwent far-reaching changes, which altered fundamentally the synthetic morphology of earlier Prākrits in the direction of the analytic typology of New Indo-Aryan (NIA). Speaking holistically, the “accusative alignment” of OIA (Vedic Sanskrit) was restructured as an “ergative alignment” in Western IA languages, and it is precisely during the Late MIA period (ca. 5th–12th centuries
(a) We shall start with the restructuring of the nominal case system in terms of the reduction of the number of cases from seven to four. This phonologically motivated process resulted ultimately in the rise of the binary distinction of the “absolutive” versus “oblique” case at the end of the MIA period. (b) The crucial role of animacy in the restructuring of the pronominal system and the rise of the “double-oblique” system in Ardha-Māgadhī and Western Apabhramśa will be explicated. (c) In the verbal system we witness complete remodeling of the aspectual system as a consequence of the loss of earlier synthetic forms expressing the perfective (Aorist) and “retrospective” (Perfect) aspect. Early Prākrits (Pāli) preserved their sigmatic Aorists (and the sigmatic Future) until late MIA centuries, while on the Iranian side the loss of the “sigmatic” aorist was accelerated in Middle Persian by the “weakening” of s > h > Ø. (d) The development and the establishment of “ergative alignment” at the end of the MIA period will be presented as a consequence of the above typological changes: the rise of the “absolutive” vs. “oblique” case system; the loss of the finite morphology of the perfective and retrospective aspect; and the recreation of the aspectual contrast of perfectivity by means of quasinominal (participial) forms. (e) Concurrently with the development toward the analyticity in grammatical aspect, we witness the evolution of lexical aspect (Aktionsart) ushering in the florescence of “serial” verbs in New Indo-Aryan.
On the whole, a contingency view of alignment considers the increase in ergativity as a by-product of the restoration of the OIA aspectual triad: Imperfective–Perfective–Perfect (in morphological terms Present–Aorist–Perfect). The NIA Perfective and Perfect are aligned ergatively, while their finite OIA ancestors (Aorist and Perfect) were aligned accusatively. Detailed linguistic analysis of Middle Indo-Aryan texts offers us a unique opportunity for a deeper comprehension of the formative period of the NIA state of affairs.
The basic vocabulary of Portuguese, the second largest Romance language in terms of speakers (about 210 million as of 2017), comes from (vulgar) Latin, which itself incorporated a certain amount of so-called substratum and superstratum words. Whereas the former were adopted in a situation of language contact between Latin and the languages of the conquered peoples inhabiting the Iberian Peninsula, the latter are Germanic loans brought mainly by the Visigoths. From 711 onward, until the end of the Middle Ages, Arabic played a major role in the Peninsula, contributing about 1,000 words that are common in Modern Portuguese. (Classical) Latin and Greek were other sources for lexical enrichment especially in the 15th and 16th centuries as well as in the 18th and 19th centuries. Contact with other European languages, both Romance and Germanic (especially English, and to a lesser extent German), led to borrowings in several thematic fields reflecting the economic, cultural, and scientific radiance that emanated from the respective language communities. In the course of colonial expansion, Portuguese came into contact with several African, Asian, and Amerindian languages from which it borrowed words for concepts and realia unknown to the Western world.
Ever since the fundamental studies carried out by the great German Romanist Max Leopold Wagner (b. 1880–d. 1962), the acknowledged founder of scientific research on Sardinian, the lexicon has been, and still is, one of the most investigated and best-known areas of the Sardinian language.
Several substrate components stand out in the Sardinian lexicon around a fundamental layer which has a clear Latin lexical background. The so-called Paleo-Sardinian layer is particularly intriguing. This is a conventional label for the linguistic varieties spoken in the prehistoric and protohistoric ages in Sardinia. Indeed, the relatively large number of words (toponyms in particular) which can be traced back to this substrate clearly distinguishes the Sardinian lexicon within the panorama of the Romance languages. As for the other Pre-Latin substrata, the Phoenician-Punic presence mainly (although not exclusively) affected southern and western Sardinia, where we find the highest concentration of Phoenician-Punic loanwords.
On the other hand, recent studies have shown that the Latinization of Sardinia was more complex than once thought. In particular, the alleged archaic nature of some features of Sardinian has been questioned.
Moreover, research carried out in recent decades has underlined the importance of the Greek Byzantine superstrate, which has actually left far more evident lexical traces than previously thought. Finally, from the late Middle Ages onward, the contributions from the early Italian, Catalan, and Spanish superstrates, as well as from modern and contemporary Italian, have substantially reshaped the modern-day profile of the Sardinian lexicon. In these cases too, more recent research has shown a deeper impact of these components on the Sardinian lexicon, especially as regards the influence of Italian.
Kra-Dai, also known as Tai–Kadai, Daic, and Kadai, is a family of diverse languages found in southern China, northeast India, and Southeast Asia. The number of these languages is estimated to be close to a hundred, with approximately 100 million speakers all over the world. As the name itself suggests, Kra-Dai is made up of two major groups, Kra and Dai. The former refers to a number of lesser-known languages, some of which have only a few hundred fluent speakers or even fewer. The latter (also known as Tai, or Kam-Tai) is well established, and comprises the best-known members of the family, Thai and Lao, the national languages of Thailand and Laos respectively, whose speakers account for over half of the Kra-Dai population.
The ultimate genetic affiliation of Kra-Dai remains controversial, although a consensus among western scholars holds that it belongs under Austronesian. The majority of Kra-Dai languages have no writing systems of their own, particularly Kra. Languages with writing systems include Thai, Lao, Sipsongpanna Dai, and Tai Lue. These use Indic-based scripts. Others use Chinese character-based scripts, such as the Zhuang and Kam-Sui in southern China and surrounding regions. The government introduced Romanized scripts in the 1950s for the Zhuang and the Kam-Sui languages. Almost every group within Kra-Dai has a rich oral history tradition.
The languages are typically tonal, isolating, and analytic, lacking in inflectional morphology, with no distinction for number and gender. A significant number of basic vocabulary items are monosyllabic, but bisyllabic and multisyllabic compounds also abound. There are morphological processes in which etymologically related words manifest themselves in groups through tonal, initial, or vowel alternations. Reduplication is a salient word formation mechanism. In syntax, the Kra-Dai languages can be said to have basic SVO word order. They possess a rich system of noun classifiers. Other features include verb serialization without overt marking to indicate grammatical relations. A number of lexical items (mostly verbs) may function as grammatical morphemes in syntactic operations. Temporal and aspectual meanings are expressed through tense-aspect markers typically derived from verbs, while mood and modality are conveyed via a rich array of discourse particles.
Cecilio Garriga Escribano
The language of chemistry has seldom been the object of study by linguists, who tend to prioritize literary works. Nevertheless, in recent years its study has developed at a different pace for each of the Romance languages. It is therefore important to describe the current state of research separately for French, Spanish, Italian, Portuguese, Romanian, and Catalan. The work of historians of science, who have always dedicated particular attention to the language of chemistry, is particularly pertinent to this purpose.
Toward the end of the 18th century, French chemists spearheaded a terminological revolution: traditional terms used in alchemy were replaced by a well-structured, systematic nomenclature that was quickly adopted by the scientific community, mainly through the translation of French chemical texts, many of which were pedagogical in nature. It is important to trace the dissemination process of new chemical nomenclature in each country and in each language, since it was not uniform.
This new nomenclature is firmly based on the classical languages, particularly Greek, and it adopts a broad range of suffixes and prefixes for systematization. During the 19th century, this system steadily consolidated as the field of chemistry developed until a standardized international nomenclature was established.
From a lexicographical standpoint, the treatment of chemical terms in both general and specialized dictionaries deserves attention. Traditional lexicography has mistakenly classified many chemical terms as Hellenisms, while from the early 21st century onward they have been recognized as Gallicisms thanks to research carried out by historians of scientific language.
Finally, the procedures the Romance languages follow to coin chemical terms—both to name elements and chemicals and to express chemical combinations by means of word formation processes—must be taken into account.
The expression language of the economy and business refers to an extremely heterogeneous linguistic reality. For some, it denotes all text and talk produced by economic agents in the pursuit of economic activity, for others the language used to write or talk about the economy or business, that is, the language of the economic sciences and the media. Both the economy and business contain a myriad of subdomains, each with its own linguistic peculiarities. Language use also differs quite substantially between the shop floor and academic articles dealing with it. Last but not least, language is itself a highly articulate entity, composed of sounds, words, concepts, etc., which are taken care of by a considerable number of linguistic disciplines and theories. As a consequence, this research landscape offers a very varied picture.
The state of research is also highly diverse as far as the Romance languages are concerned. The bulk of relevant publications concerns French, followed at a certain distance by Spanish and Italian, while Romanian, Catalan, and Portuguese look like poor relations. As far as the dialects are concerned, only those of some Italian cities that held a central position in medieval trade, like Venice, Florence, or Genoa, have given rise to relevant studies. As far as the metalanguage used in research is concerned, the most striking feature is the overwhelming preponderance of German and the almost complete absence of English. The insignificant role of English must probably be attributed to the fact that the study of foreign business languages in the Anglo-Saxon countries is close to nonexistent. Why study foreign business languages if one’s own language is the lingua franca of today’s business world? Scholars from the Romance countries, of course, generally write in their mother tongue, but linguistic publications concerning the economic and business domain are relatively scarce there. The heterogeneity of the metalanguages used certainly hinders the constitution of a close-knit research community.
Aidan Pine and Mark Turin
The world is home to an extraordinary level of linguistic diversity, with roughly 7,000 languages currently spoken and signed. Yet this diversity is highly unstable and is being rapidly eroded through a series of complex and interrelated processes that result in or lead to language loss. The combination of monolingualism and networks of global trade languages that are increasingly technologized has led to over half of the world’s population speaking one of only 13 languages. Such linguistic homogenization leaves in its wake a linguistic landscape that is increasingly endangered.
A wide range of factors contribute to language loss and attrition. While some—such as natural disasters—are unique to particular language communities and specific geographical regions, many have similar origins and are common across endangered language communities around the globe. The harmful legacy of colonization and the enduring impact of disenfranchising policies relating to Indigenous and minority languages are at the heart of language attrition from New Zealand to Hawai’i, and from Canada to Nepal.
Language loss does not occur in isolation, nor is it inevitable or in any way “natural.” The process also has wide-ranging social and economic repercussions for the language communities in question. Language is so heavily intertwined with cultural knowledge and political identity that speech forms often serve as meaningful indicators of a community’s vitality and social well-being. More than ever before, there are vigorous and collaborative efforts underway to reverse the trend of language loss and to reclaim and revitalize endangered languages. Such approaches vary significantly, from making use of digital technologies in order to engage individual and younger learners to community-oriented language nests and immersion programs. Drawing on diverse techniques and communities, the question of measuring the success of language revitalization programs has driven research forward in the areas of statistical assessments of linguistic diversity, endangerment, and vulnerability. Current efforts are re-evaluating the established triad of documentation-conservation-revitalization in favor of more unified, holistic, and community-led approaches.
Victor A. Friedman
The Balkan languages were the first group of languages whose similarities were explained in modern linguistic terms as a result of language contact rather than as a result of descent from a common ancestor. Nikolai Trubetzkoy coined the term Sprachbund ‘linguistic league’ (as opposed to Sprachfamilie ‘language family’) to describe this relationship. Balkan linguistics, as both a subset of and precursor to contact linguistics, is, at its base, a historical linguistic discipline. It seeks to explain similarities among the relevant languages as the result of diffusion rather than of either transmission or putative universal, typological properties of human language (which latter assumes parallel developments whose causation is ahistorical, i.e., unconnected with either contact or ancestry). The relevant languages are, with the exception of Turkic, all part of the Indo-European language family, but they belong to five distinct groups that are known to have been separated for a significant length of time (presumably millennia). Moreover, for four out of five Indo-European groups as well as for Turkic, there exists documentation that goes back more than a millennium, and in some cases several millennia. The Balkan languages are thus the oldest example of a well-documented and still living Sprachbund.
The primary questions that Balkan linguistics seeks to answer are these: What are the results of language contact in the Balkan languages, and how did they come about? The Balkan languages are traditionally defined as Albanian, Modern Greek, Balkan Romance (Romanian, Aromanian, and Meglenoromanian), and Balkan Slavic (Bulgarian, Macedonian, and the southernmost dialects of the former Serbo-Croatian). In recent decades, it has been recognized that the relevant dialects of Romani, Judezmo, and Turkish and Gagauz also participate in at least some of the convergent processes that are taken as definitive of the Balkan linguistic league. While the language family is defined by regular sound correspondences, which in turn help define shared morphology and a core lexicon, the Balkan linguistic league is defined principally by shared morphosyntactic developments and a shared lexicon of borrowings often called “cultural.” In the Balkan linguistic league, phonological developments are sometimes shared among different languages at the dialectal level, but there are no such features that characterize the Balkan languages as a group. Just as in the language family not every diagnostic item is represented in every branch, so, too, in the Balkan linguistic league not every feature is equally represented in all languages and dialects.
Among the most characteristic morphosyntactic features are the following: (1) replacement of infinitives by analytic subjunctives, (2) the use of a particle derived from etymological ‘want’ to mark the future, (3) replacement of synthetic gradation of adjectives with analytic constructions, (4) replacement of conditionals by anterior futures, (5) resumptive clitic pronouns for certain direct and indirect objects, (6) various simplifications in the declensional system, (7) postposed definite articles (for Balkan Slavic, Balkan Romance, and Albanian), and (8) grammaticalized evidentials (Balkan Slavic, Albanian, Turkic, and to some extent Balkan Romance and Romani). While some of these convergences began in the ancient or medieval periods, the Balkan linguistic league took its definitive modern shape during the centuries of the Ottoman Empire (14th to early 20th centuries).
Linking elements occur in compound nouns and derivatives in the Indo-European languages as well as in many other languages of the world. They can be described as sound material or graphemes with or without a phonetic correspondence appearing between two parts of a word-formation product. Linking elements are meaningless by definition. However, in many cases a clear-cut distinction between them and other, meaningful elements (like inflectional or derivational affixes) is difficult to draw. Here, a thorough examination is necessary.
Simple rules cannot describe the occurrence of linking elements. Instead, their distribution is fully erratic or at least complex, as different factors including the prosodic, morphological, or semantic properties of the word-formation components play a role and compete. The same holds for their productivity: their ability to appear in new word-formation products differs considerably and can range from strongly (prosodically, morphologically, or lexically) restricted to the virtual absence of any constraints.
Linking elements should be distinguished from singular, isolated insertions (cf. Spanish rousseau-n-iano) or extensions of one specific stem or affix (cf. ‑l- in French congo-l-ais, togo-l-ais, English Congo-l-ese, Togo-l-ese). As they link two parts of a word formation, they also differ from word-final elements attached to compounds like ‑(s)I in Turkish as in ana‑dil‑i (mother‑tongue‑i) ‘mother tongue’. Furthermore, they are also distinct from infixes, i.e., derivational affixes that are inserted into a root, as well as from confixes, which are bound but meaningful (lexical) morphemes.
Linking elements are attested in many Indo-European languages (Slavic, Romance, Germanic, Baltic languages, and Greek) as well as in other languages across the world. They seem to be more common in compounds than in derivatives. Additionally, some languages display different sets of linking elements in both compounds and derivatives. The linking inventories differ strongly even between closely related languages. For example, Frisian and Dutch, each of which has five different linking elements, share only two linking forms (‑s- and ‑e-).
In some languages, linking elements are homophonous with other (meaningful) elements, e.g., inflectional or derivational suffixes. This is mostly due to their historical development and to the degree of their dissociation from their sources. This sometimes makes it difficult to distinguish between linking elements and meaningful elements. In such cases (e.g., in German or Icelandic), formal and functional differences should be taken into account. It is also possible that the homophony with the inflectional markers is incidental and not a remnant of a historical development. Generally, linking elements can have different historical sources: primary suffixes (e.g., Lithuanian), case markers (e.g., many Germanic languages), derivational suffixes (e.g., Greek), and prepositions (e.g., Sardinian and English). However, the historical development of many linking elements in many languages still requires further research.
Depending on their distribution, linking elements can have different functions. Accordingly, the functions strongly differ from language to language. They can serve as compound markers (Greek), as “reopeners” of closed stems for further morphological processes (German), as markers of prosodically and/or morphologically complex first parts (many Germanic languages), as plural markers (Dutch and German), and as markers of genre (German).