
PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, AFRICAN HISTORY (© Oxford University Press USA, 2019. All Rights Reserved. Personal use only; commercial use is strictly prohibited; for details see Privacy Policy and Legal Notice).

date: 21 April 2019

Historical Linguistics: Classification

Summary and Keywords

Many societies in pre-1800 Africa depended on orality both for communication and for record keeping. Historians of Africa, among other ways of dealing with this issue, treat languages as archives and apply what is sometimes called the “words and things” approach. Every language is an archive, in the sense that its words and their meanings have histories. The presence and use of particular words in the vocabulary of the language can often be traced back many centuries into the past. They are, in other words, historical artifacts. Their presence in the language in the past and their meanings in those earlier times tell us about the things that people knew, made use of, and talked about in past ages. They give us complex insights into the world in which people of past societies lived and operated.

But in order to reconstruct word histories, historians first need to determine the relationships and evolution of the languages that possessed those words. The techniques of comparative historical linguistics and language classification allow one to establish a linguistic stratigraphy: to identify the periods in which existing words changed their meanings or new words came to be used for particular meanings, to assess what these word histories reveal about changes in a society and its culture, and to identify whether internal innovation or encounters with other societies mediated such changes.

The comparative method on its own cannot establish absolute dates of language divergence. The method does allow scholars, however, to reconstruct the lexicons of material culture used at each earlier period in the language family tree. These data identify the particular cultural features to look for in the archaeology of people who spoke languages of the family in earlier times, and that evidence in turn enables scholars to propose datable archaeological correlations for the nodes of the family tree. A second approach to dating a language family tree has been a lexicostatistical technique, often called glottochronology, which seeks to estimate how long ago sister languages began to diverge out of their common ancestor language by using calculations based on the proportion of words in the most basic parts of the vocabulary that the languages still retain in common. Recent work in computational linguistic phylogenetics makes use of elements of lexicostatistics, and there have been efforts to automate the comparative method as well.

In order to compare languages historically, two important issues first have to be confronted, namely data acquisition and data analysis. Linguistic field collection of vocabularies from native speakers and linguistic archive work, especially with dictionaries, are principal means of data acquisition. The comparative historical linguistic approach and methods provide the tools for analyzing these linguistic data, both diachronically and synchronically.

Nearly all African languages have been classified into four language families, namely: Niger-Congo, Nilo-Saharan, Afroasiatic, and Khoisan. The Malagasy language of Madagascar is an exception, in that it was brought west across the Indian Ocean to that island from the East Indies early in the first millennium ce. Malagasy as well as several languages with an Indo-European origin, such as Afrikaans, Krio, and Nigerian Pidgin English, are not part of this discussion.

Keywords: comparative method, regular sound change, reconstruction, shared innovation, lexicostatistics, glottochronology, phylogenetics, African language families, word histories and social histories

Using Language as an Archive

Every language is an archive, a repository of words telling the story of human history. Although linguistic reconstruction of history does not allow precise dating, the data provide a narrative of culture and human agency from a long-term perspective. In order to place words in time and space and assess changes over time, historians use historical and comparative linguistic data to establish a linguistic stratigraphy that provides a relative chronological framework. A linguistic stratigraphy that can be applied for such a purpose rests on the genetic classification of the languages spoken by the peoples whose history is the subject of historical study and analysis. By reconstructing the ancestral lexicons of culture, one can uncover many features of the social and cultural fabrics of such former societies.

These lexical data inform us about the world inhabited by the people who spoke the languages at different periods of time. Ethnographical evidence and oral traditions can then be used to illuminate the cultural implications of reconstructed vocabulary and to advance historical arguments to assess the time depths of the changes revealed in this fashion.1

The potential correlations of these findings with archaeology should be explored where archaeological data are available. Along with numerous facets of material culture and economy among such peoples, such data can also reveal the richness of the cultural universe of early societies and key elements of their spiritual and ideological understandings.

Equally important are the “copied” or borrowed words that we can reconstruct in the lexicons of those past eras. These data tell us which societies interacted with each other and reveal a great deal about the kinds, content, and consequences of cross-cultural encounters among and between the peoples of long-ago times.

Establishing a Genetic Classification

The first step in reconstructing earlier vocabularies of culture is a genetic classification of the languages to establish a historical stratigraphy, which in turn enables the previous existence of particular words to be situated within a relative time frame. Genetic classifications establish which languages have descended from the same protolanguage and depict their historical relationships.

As noted, change is an ongoing feature of all spoken languages. But when speakers of a language separate from each other and move apart in space and time, the linguistic changes that then take place in each daughter speech community will differ. As each daughter community gradually changes its inherited language, its speakers first come to speak a distinct, evolving dialect. As linguistic change proceeds in each community, the dialects become progressively more different from each other, in time evolving into new and distinct languages. Those languages still retain many core lexical and grammatical features inherited in common from the original protolanguage, while at the same time developing and adopting new words and grammatical features.

The patterns of shared innovation in the development of new features reveal to the linguistic historian which daughter languages are more closely related and thus belong to the same branch, and which are less closely related and belong to different branches of the language group. From a historical linguistic point of view, if two or more languages belong in the same subgroup within the family, it is because they emerged out of a common intermediate protolanguage and thus are more closely related to each other than to languages outside their subgroup. These languages share particular phonological, lexical, and grammatical innovations unique to them, because these shared changes arose within the protolanguage. As the historical linguist Terry Crowley, among others, has suggested, shared innovations in particular linguistic features are unlikely to occur by chance.2

Determining relatedness and determining subgrouping are not the same process, although similar types of evidence can be used in each case. Demonstrating that languages are genetically related primarily involves showing that they share material and that this sharing did not happen by chance. In contrast, demonstrating subgrouping requires showing that languages have undergone the same changes.3

Internal developments as well as cultural contact can initiate language change on the levels of syntax, morphology, and, in particular, lexical semantics. These changes, however, are more visible within the cultural vocabulary than within the core vocabulary, which is less subject to rapid change. The universal core items of basic vocabulary contain such terms as first- and second-person pronouns, the numerals one and two, the most basic body parts (foot, hand, head, eye, ear, nose, heart, etc.), universal environmental features (for example, rain), and primary actions (go, come, sit, stand, lie, die, kill, etc.). Comparisons of core vocabulary rest on two premises: that the words for these core meanings are more resistant to replacement than any other part of the lexicon, and that they are rarely affected by word borrowing.4 To use Crowley’s term, these words, as opposed to culture-specific words, are unlikely to be replaced by words “copied” from other languages.5

Visible changes in the cultural vocabulary people use can shed light on historical developments within a region. This kind of evidence can both supplement and correct other available sources, in particular oral traditions about the past.

Words as well as grammatical and morphological features sometimes spread by borrowing not just from one particular language to another but across a whole set of adjacent languages. This kind of history reveals the past existence of a wide historical sphere of interacting societies along with the particular cross-cultural encounters among them. Grammatical commonalities and culture words can spread in this fashion because of widespread multilingualism that allows the people of several different neighboring societies to communicate with each other. For example, trade relations or incorporation of these peoples into a multilingual empire can lead to this kind of contact history. In addition, individual words for a new material item or a new concept often spread from language to language along with the spread of the item or concept itself from society to society.

In such situations it is sometimes difficult to identify with certainty which item is a borrowing and which is an old shared retention. Faced with such analytical issues, some linguists have resorted to an analytical dead end called “wave theory.” The approach attributes shared features in languages to waves of the spread of the features across languages rather than to common inheritance. It is an ultimately unfruitful way for the investigator to avoid the long, difficult, and painstaking work of comparative historical linguistics and having to deal with the reality that, even if the overall evidence is quite clear, there will always be individual pieces of data that are ambiguous in their indications.

Applying the Historical Linguistic Method to History

While linguists are interested in language change in order to learn about how developments take place diachronically, historians using historical linguistics apply linguistic methods to learn about the people who spoke these languages in the past and the world they lived in. In the past two decades a growing array of historians of Africa have utilized historical linguistic data to build integrated regional stories of the cultural, historical, and social past of those regions and their peoples.

Historians of Africa who apply the comparative method to subclassifying a language family often use it in conjunction with lexicostatistics, a technique that gives a numerical measure of the time depth of the relationship between languages. This relationship is expressed as the percentage of shared cognates in the core vocabulary. The common list for eliciting this kind of data consists of one hundred core meanings and is often called a hundred-word list or Swadesh list, after Morris Swadesh, who first proposed this approach.6 Africanists have adapted the list to situations in Africa and replaced terms such as snow, used by Swadesh, with others more relevant for the continent and its people. If the core vocabularies of two related languages retain a relatively high proportion of the same words with the same meanings, the languages diverged more recently and therefore belong in a lower-level subgroup. By the same token, if their core vocabularies are more dissimilar, the languages diverged at an earlier time and thus belong to a deeper level of subgrouping.7 Table 1, following Crowley’s terminology, shows how levels of subgrouping can be correlated with the quantity of shared core vocabulary in order to determine the level of language relatedness, with dialects of a language being the closest and the microphyla of a mesophylum the most distant relationship between two or more languages.

Table 1. Levels of Subgrouping

| Level of subgrouping8 | Shared cognate percentage in core vocabulary |
| --- | --- |
| Dialects of a language | |
| Languages of a family | |
| Families of a stock | |
| Stocks of a microphylum | |
| Microphyla of a mesophylum | |
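The shared cognate percentages that underlie such a classification are simple to compute once cognate judgments have been made. The Python sketch below assumes that the investigator has already assigned each word in the list to a cognate class (the hard part, which rests on the comparative method); the languages A, B, and C, the meanings, the class labels, and the function names are all hypothetical.

```python
from itertools import combinations


def cognate_percentage(a, b):
    """Percent of meanings for which two languages have words assigned to
    the same cognate class. Meanings missing from either list are left out
    of the denominator (one possible convention, assumed here)."""
    shared = [m for m in a if m in b]
    if not shared:
        raise ValueError("the two lists have no meanings in common")
    same = sum(1 for m in shared if a[m] == b[m])
    return 100.0 * same / len(shared)


def cognation_matrix(word_lists):
    """Pairwise cognate percentages, keyed by (language, language)."""
    return {
        (x, y): round(cognate_percentage(word_lists[x], word_lists[y]), 1)
        for x, y in combinations(sorted(word_lists), 2)
    }


# Hypothetical data: each meaning maps to a cognate-class label that the
# investigator has already assigned using regular sound correspondences.
word_lists = {
    "A": {"eat": 1, "mouth": 1, "water": 1, "two": 1, "head": 1},
    "B": {"eat": 1, "mouth": 1, "water": 1, "two": 2, "head": 1},
    "C": {"eat": 1, "mouth": 2, "water": 3, "two": 2},  # no entry for "head"
}

for pair, pct in cognation_matrix(word_lists).items():
    print(pair, pct)
```

With these invented lists, A and B share 80 percent, B and C 50 percent, and A and C 25 percent. A real hundred-word comparison works the same way, only with far more items per language.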


By using the comparative method, one can not only reconstruct many of the features of a protolanguage but also determine which languages are more closely related than others in a language family tree. A full classification of a group of related languages can be diagrammed as a linguistic stratigraphy, allowing the establishment of a relative chronology of the history of the language group and the people who spoke the languages.

Having established a relative chronology, the second task for the historian is to give absolute dates, where possible, to strata in the stratigraphy. The strongest cases are those where the archaeological and linguistic histories match up point-for-point in their stratigraphies, and where the material cultural features attested in the archaeology match up with the lexically reconstructed features at the comparable periods in the linguistic stratigraphy. The absolute dates derived from the archaeological data can then be extended to the correlative points evident in the linguistic history.9

One feature of the lexicostatistical measuring of core vocabularies is that it offers an additional tool for attaching absolute dates, although of a very rough and probabilistic kind, to linguistic stratigraphies. What has been observed in languages from many language families and from different parts of the world is that the accumulation of change in core vocabularies over particular time spans tends toward similar median outcomes.10 The early developers of the method treated the phenomenon as if a regular rate of attrition characterized this part of the lexicon of all languages, and they proposed a mathematical formula for calculating the date of the split between two languages from the percentage of words they still retained in common in the hundred-word list:

$$t = \frac{\log C}{2 \log r}$$

The value t stands for the number of thousands of years that two languages have been separated, C represents the proportion of cognates (the shared cognate percentage expressed as a decimal) as worked out by comparing basic vocabulary, and r stands for the observed approximate median retention factor of 0.86 in a single language over a period of around one thousand years. The factor of 2 reflects the fact that, after the split, both languages replace core vocabulary independently, so that the expected shared proportion C is r raised to the power 2t.

However, it has been shown that what is involved here is not a regular rate of replacement but rather the accumulation of individually random changes among quanta of like properties. Christopher Ehret has argued that standard retention figures have often continued to be treated as if they expressed a constant rate of change rather than merely the median of a statistical distribution.11 Essentially, what glottochronology describes is the cumulative effect over the long term of innumerable small choices reflecting usages and vocabularies made by people in their everyday language use.
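The point that a retention figure is the median of a distribution rather than a fixed rate can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that every one of 100 core items has the same independent chance (0.86) of surviving a millennium; real vocabularies do not behave this uniformly, and the function name and its parameters are hypothetical.

```python
import random
import statistics


def simulate_retention(n_words=100, r=0.86, trials=2001, seed=1):
    """Each simulated language independently keeps each of n_words core
    items over one millennium with probability r (a simplifying
    assumption). Returns the retained proportion for every trial."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        sum(rng.random() < r for _ in range(n_words)) / n_words
        for _ in range(trials)
    ]


outcomes = simulate_retention()
print("median:", statistics.median(outcomes))
print("range:", min(outcomes), "to", max(outcomes))
```

Even under these idealized conditions, the retained proportions scatter around a median near 0.86 rather than landing on it every time, which is why a single cognate count can support only a probabilistic date range.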

The tendency of the quantities of change in basic vocabulary to cluster around median figures over particular spans of time allows the investigator to use the actually attested ranges of shared cognates in related languages to propose a rough time range in which their common protolanguage was most probably spoken. The particular figures and associated dating scale followed here specifically apply the results of Ehret’s study of over twenty African correlations of archaeology and language groups, the findings of which parallel those from other parts of the world.12 According to these findings, two languages that retain the same core vocabulary item in close to 73 percent of the hundred-word list probably began to diverge out of their common protolanguage around one thousand years ago. Similarly, languages that share around 53 percent in core vocabulary most likely began their divergence from each other around two thousand years ago. With around 39–40 percent cognation, their divergence took place in the range of 1000 bce; with around 29–30 percent, most likely around 2000 bce; with around 21–22 percent, sometime in the range of 3000 bce; with 16 percent, in the rough neighborhood of 4000 bce; and with 12 percent, in a span of time dating back as far as 5000 bce.
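The classic glottochronological formula, t = log C / (2 log r), where C is the shared cognate proportion and r the retention factor per thousand years, can be applied directly to these percentages. The following Python sketch uses r = 0.86 from the text; the function name is hypothetical, and each output is a median estimate in thousands of years, not a precise date.

```python
import math

R = 0.86  # approximate median retention per 1,000 years, as given in the text


def divergence_millennia(c, r=R):
    """Estimate t = log C / (2 log r) in thousands of years.
    c is the shared cognate proportion (a decimal, not a percentage)."""
    if not 0 < c <= 1:
        raise ValueError("c must be a proportion in (0, 1]")
    return math.log(c) / (2 * math.log(r))


# Cognate percentages quoted in the text for Ehret's scale.
for pct in (73, 53, 39.5, 29.5, 21.5, 16, 12):
    t = divergence_millennia(pct / 100)
    print(f"{pct:>5}% shared -> about {t:.1f} thousand years")
```

With r = 0.86, a shared proportion of 0.73 gives t ≈ 1.04 thousand years and 0.12 gives t ≈ 7.03 thousand years (roughly 5000 bce), so the formula reproduces the scale quoted above. The outputs are still medians; individual language pairs can fall well to either side of them.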

Thus, for now the glottochronological dating of linguistic stratigraphies remains the key tool for provisionally estimating the spans of time, especially relating to the period before the past thousand years. The application of the lexicostatistics and glottochronology techniques requires thorough knowledge about the languages as well as training in linguistics, particularly in phonetics, phonology, and morphology. Training in these techniques provides the necessary skills for the historian to uncover and establish regular sound correspondence histories, and these in turn provide the critical framework for determining which items should be counted as cognate and which can be excluded as borrowed vocabulary.

Constraints, Limitations, and Emerging New Methods

The method of glottochronology has been heavily criticized by linguists.13 Historians use it only in conjunction with other methods and treat its results as a rough reference rather than as absolute dates. Critics raise several objections. First, summarizing cognate data into percentage scores discards much of the information in the discrete data set, weakening its power to resolve evolutionary history. Second, the clustering methods used may produce inaccurate trees when lineages evolve at different rates, grouping together languages that change slowly rather than languages that share a recent common ancestor. Third, when borrowing from one language into another is substantial enough to affect basic vocabulary, the borrowed words often replace old cognates that otherwise might have been retained, which lowers the observed cognation rates in the borrowing languages. In general, both glottochronology and aspects of lexicostatistics have stirred the most controversy and criticism for assuming that the rate of lexical change is relatively constant. Historians who use glottochronology, however, treat it as measuring the long-term cumulative effect of countless small choices in everyday language use; they read its dates as relative references within the system of divergences and understand the rate of change as the median of a statistical distribution.
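The objection that clustering can group slowly changing languages together can be illustrated with UPGMA (unweighted pair group method with arithmetic mean), a standard distance-based clustering method chosen here as an example; the text does not specify which clustering methods critics have in mind. The distances are hypothetical, derived from an invented tree in which A and B split recently but B's vocabulary changes eight times faster than that of A or C.

```python
def upgma_topology(dist):
    """Minimal UPGMA sketch returning only the merge order as a
    Newick-style string. dist maps frozenset({x, y}) -> distance."""
    size = {name: 1 for pair in dist for name in pair}
    d = dict(dist)
    while len(size) > 1:
        # Merge the closest pair of clusters (ties broken alphabetically).
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(pair)
        merged = f"({a},{b})"
        for other in size:
            if other not in (a, b):
                # Distance to the new cluster: size-weighted average.
                d[frozenset((merged, other))] = (
                    size[a] * d[frozenset((a, other))]
                    + size[b] * d[frozenset((b, other))]
                ) / (size[a] + size[b])
        # Drop every distance involving the two merged clusters.
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical true history ((A,B),C). Branch lengths: A 1, B 8 (fast),
# (A,B) node to root 2, C 3. So d(A,B)=9, d(A,C)=6, d(B,C)=13.
unequal = {frozenset(("A", "B")): 9, frozenset(("A", "C")): 6,
           frozenset(("B", "C")): 13}
# Same tree with equal rates: A 1, B 1, internal 2, C 3.
equal = {frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
         frozenset(("B", "C")): 6}

print(upgma_topology(unequal))
print(upgma_topology(equal))
```

With unequal rates the fast-changing B ends up outside an (A,C) cluster even though it shares the more recent ancestor with A; with equal rates the same function recovers ((A,B),C).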

The adaptation of phylogenetic methods from biology, which can infer trees from the full information on individual cognacy relations, is opening new opportunities to deal with the shortcomings of lexicostatistics. Because these methods do not reduce the data to an aggregate percentage of similarities, they avoid that loss of information. Phylogenetic-statistical methods have recently been applied to shed new light on aspects of African historical linguistics, such as the Bantu expansion.
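The difference between percentages and full cognacy information can be shown with a toy example. Phylogenetic programs commonly take cognate judgments recoded as binary characters, one column per cognate set, marking whether each language has a word in that set. In the hypothetical data below, language L1 shares exactly 50 percent with both L2 and L3, yet the binary matrix preserves which items they share, information that the percentages discard. The function names and data are illustrative assumptions.

```python
def shared_percentage(a, b):
    """Percent of meanings (present in both lists) with the same cognate class."""
    meanings = [m for m in a if m in b]
    same = sum(1 for m in meanings if a[m] == b[m])
    return 100.0 * same / len(meanings)


def binary_characters(data):
    """Recode cognate classes as presence/absence characters,
    one column per (meaning, cognate class) pair."""
    columns = sorted({(m, c) for forms in data.values() for m, c in forms.items()})
    rows = {
        lang: [int(forms.get(m) == c) for m, c in columns]
        for lang, forms in data.items()
    }
    return columns, rows


# Hypothetical cognate-class labels for four meanings in three languages.
data = {
    "L1": {"eat": "a", "mouth": "a", "water": "a", "two": "a"},
    "L2": {"eat": "a", "mouth": "a", "water": "b", "two": "b"},
    "L3": {"eat": "b", "mouth": "b", "water": "a", "two": "a"},
}

print(shared_percentage(data["L1"], data["L2"]),
      shared_percentage(data["L1"], data["L3"]))
columns, rows = binary_characters(data)
print(columns)
for lang, row in rows.items():
    print(lang, row)
```

Both percentages are 50, but the rows show that L1 agrees with L2 on "eat" and "mouth" and with L3 on "two" and "water", a pattern a tree-building method can use.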

In addition, scholars have recently been testing out a variety of computational methods for determining language subgrouping and for discovering cognate sets. The methods for working out subclassifications of language relationships offer useful shortcuts, but their findings must still then be tested against the fuller findings of the comparative method. As to the computational methods for discovering cognates, historical linguists, and notably Crowley, argue that the correspondence set identification programs do much worse than a human at identifying the correspondences, and the error rate is too high to rely on such programs.14 Computational methods can augment the comparative method in several ways but do not replace it.

The Comparative Method: With Examples from Southern Cushitic

Languages that are considered to be related are classified into language families. Africa is home to four recognized language families: Afroasiatic, Nilo-Saharan, Niger-Congo, and Khoisan. Language families consist of languages that descend from a single common ancestor language, which is called a protolanguage. Even if there are no written records available, it is often possible to reconstruct an approximation of the phonology of the original language from reflexes in daughter languages by using the comparative method.

The reconstruction is an estimation of what the protolanguage looked like. We are in a sense “undoing” the changes that have taken place between the protolanguage and its various descendant languages. To do this we have to examine what we call “reflexes of forms” of the original language in these daughter languages. This means one has to look for forms—words, affixes, and so on—in the various related languages that appear to have derived from a common original form. Two such forms are cognate with each other, and both are reflexes of the same original form in the protolanguage.

We use the comparative method in carrying out linguistic reconstruction in this way. This means we compare cognate forms in two, or preferably more, related languages in order to work out some original form from which these cognates can regularly be derived. In doing this, we have to keep in mind what is already known about the kinds of sound changes that are likely and the kinds of changes that are unlikely. We can do so by looking for synchronic evidence that points to earlier linguistic change.

Diachronic Reconstruction

When we compare the vocabulary items of various languages, we cannot help but notice the strong resemblance certain words bear to each other. By systematically comparing languages, we can establish whether two or more languages descended from a common parent and are therefore genetically related. The comparative method refers to the procedure of reconstructing earlier forms on the basis of a comparison of later forms. By means of such comparative reconstruction, we can reconstruct properties of the parent language. The comparative method is a set of techniques that permits us to recover linguistic constructs of earlier—usually not directly attested—stages in a family of related languages. The recovered ancestral elements may be phonological, morphological, syntactic, semantic, and so on. Systematic comparison yields sets of regularly corresponding forms from which an antecedent form can be deduced and its place in the proto-linguistic system determined. In practice this has nearly always involved beginning with cognate basic vocabulary, the identification of recurring sound correspondences, and the reconstruction of the proto-phonological system and partial lexicon.

The comparative method proceeds in several recognizable stages, which in practice overlap considerably. In addition, internal reconstruction is useful when initially applied to the daughter languages and may also be practiced at various points along the way. A relatively full comparative treatment of a family of languages begins with the discovery of cognates, both lexical and morphological, and the concomitant confirmation of a genetic relationship.

Cognate Searches

For most linguists, the search for cognate vocabulary is a very challenging task. A list of one hundred or two hundred basic words is often used initially in cognate searches, the idea being that basic concepts are the least likely to have been borrowed. Some linguists warn that careful attention should be paid to known areal phenomena in the zone where one is working, because there are cases in which languages have borrowed basic vocabulary. English is a prime example: 10 percent of its basic vocabulary is borrowed, mostly from French. In East and Southeast Asia, for example, the most basic numerals have often been borrowed into local languages from Chinese. Atypical syllable structures, phoneme clusters, and marginal phonemes often stand out and point to borrowing.

One begins with regularly corresponding phonemes in basic vocabulary and in basic grammatical formants (if typology permits, preferably in paradigms). Derivational morphology is borrowed relatively easily, and its examination can wait until basic regularities have been worked out. The bedrock criterion for establishing family relationships is the existence of systematic phonological correspondences in the vocabulary items of different languages. Since the relationship between the phonological form of a word and its meaning is mostly arbitrary, the existence of systematic phonetic correspondences in the forms of two or more languages must point toward a common source. Conversely, where languages are not related, their vocabulary items fail to show systematic similarities. Words that have descended from a common source (as shown by systematic phonetic correspondences and semantic similarity) are called cognates. Once the existence of a relationship between two or more languages has been established, an attempt can be made to reconstruct the common source. This reconstructed language, or protolanguage, is made up of protoforms, which are written with a preceding asterisk (*) to indicate their character as reconstructions of earlier forms that have not been written down or are not directly observable.
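A first pass at finding regular correspondences amounts to tallying which segment in one language matches which segment in the other across many cognate pairs. The sketch below assumes the pairs have already been aligned segment by segment, with one character per segment; the two "languages," their words, and the function name are invented for illustration.

```python
from collections import Counter


def correspondences(pairs):
    """Tally segment-by-segment correspondences in pre-aligned cognate
    pairs. Assumes each pair is aligned to equal length, one character
    per segment."""
    counts = Counter()
    for x, y in pairs:
        if len(x) != len(y):
            raise ValueError(f"unaligned pair: {x!r} / {y!r}")
        counts.update(zip(x, y))  # each (segment_x, segment_y) tuple counted
    return counts


# Hypothetical cognate pairs from two invented languages.
pairs = [("pata", "fata"), ("pilu", "filu"), ("tapa", "tafa"),
         ("kanu", "kanu"), ("paku", "faku"), ("miru", "miku")]

counts = correspondences(pairs)
for (a, b), n in counts.most_common():
    label = "recurrent" if n > 1 else "isolated"
    print(f"{a}:{b}  x{n}  {label}")
```

Here p:f recurs in four of the six pairs and so looks like a regular correspondence, while r:k occurs only once. A single occurrence may reflect a conditioned change, a mistaken cognate judgment, or simply too little data, which is why real comparisons draw on far larger sets.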

Reconstruction of a protoform makes use of two general strategies. The most important one is the phonetic plausibility strategy, which requires that any changes posited to account for differences between the protoforms and later forms must be phonetically plausible. Secondarily, the majority rules strategy stipulates that if no phonetically plausible change can account for the observed differences, then the segment found in the majority of cognates should be assumed. It is important to note that the first strategy always takes precedence over the second; the second strategy is the last resort.

When the forms of two or more languages appear to be related, we can reconstruct the common form from which all the forms can be derived by means of phonetically plausible changes, through consideration of systematic phonetic correspondences among cognates. (Genetically related lexical forms of different languages are called cognates, while the reconstructed forms are protoforms, and a reconstructed language, a protolanguage).

To do historical linguistic reconstruction, one must know not just the phonology but also the morphology of each of the languages being compared. The core component in seeking to identify regular sound correspondences is the root or stem portion of the words being compared.

The first requirement is that one must analytically distinguish the stem from any prefixes or suffixes that particular daughter languages may have added to it. Our examples here come from Southern Cushitic, a branch of the Cushitic subfamily of the Afroasiatic language family, and are extracted from hundred-word-list comparisons of Southern Cushitic. The languages being compared are Iraqw, Burunge, Alagwa, Kw’adza, Asa, Ma’a, and Dahalo.15 For the cognate counting, a modified version of the Swadesh hundred-word list was used to produce a cognation matrix, shown in Figure 1, which gives the shared cognate percentages between the various languages.

Figure 1. Cognation matrix of Southern Cushitic languages. The cognation chart is reproduced with permission of Christopher Ehret. Christopher Ehret, The Historical Reconstruction of Southern Cushitic Phonology and Vocabulary (Berlin: Reimer, 1980), 20.

Once the cognates have been determined, it is important to understand which changes the various languages have undergone in order to understand shared innovations. This is achieved by going line by line through the hundred-word list. The following example demonstrates this process. Let us consider the case of the notable proto–Southern Cushitic (PSC) verb root “to eat,” consisting of two consonants and an intervening vowel, along with its reflexes in the Southern Cushitic languages (the hyphens identify the locations at which the conjugational markers attach to the stem in each particular language).

In this case the morphological additions are easily identified. Three of the languages, Iraqw, Alagwa, and Asa, possess versions of this root with an added *-im- verb suffix, indicative of an ongoing as opposed to single action. With this suffix distinguished and removed from the comparisons, the stems thus identified in the various languages provide examples of a number of regular sound-change rules known to be specific to the languages in question from the wider comparative evidence in Southern Cushitic.

The history of these reflexes in the various Southern Cushitic languages also aptly illustrates the strategies described above for protoform reconstruction and in this case leads to reconstruction of the particular protoform *?ag-. In particular, all of the languages have /a/ as the vowel, a majority of the attestations have /?/ as the first consonant of the stem, and a different majority have /g/ as the second consonant. In addition, the normative direction of consonant change in spoken human languages is (a) from *g to /y/ as in Iraqw, and never the other way around; (b) from a pharyngeal feature to the loss of the pharyngeal feature, as in Kw’adza, Asa, and Ma’a; and (c) from the presence of a consonant to the entire loss of that consonant, as in Ma’a.
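The two reconstruction strategies, with phonetic plausibility taking precedence over majority rule, can be sketched in Python for the "to eat" example. The stem segments below are read off the description in this and the following paragraphs (ʕ and ʔ are the IPA symbols for the voiced pharyngeal and the glottal stop, and an empty string marks a lost consonant). The set of plausible changes is a hypothetical, deliberately tiny stand-in for the wider knowledge of sound change a linguist brings to the task, and the function name is illustrative.

```python
from collections import Counter

# Directional changes treated as plausible, following the text:
# *g > y, pharyngeal > glottal stop, and loss of a consonant ("" = zero).
PLAUSIBLE = {("g", "y"), ("ʕ", "ʔ"), ("g", "")}


def reconstruct_segment(reflexes, plausible=PLAUSIBLE):
    """Choose a proto-segment for one position of a cognate set.
    Phonetic plausibility first: keep candidates from which every attested
    reflex is identical or reachable by one plausible change. Majority rule
    only decides among survivors, or applies if none survive."""
    counts = Counter(reflexes.values())
    viable = [
        c for c in counts
        if c and all(r == c or (c, r) in plausible for r in reflexes.values())
    ]
    pool = viable or [c for c in counts if c]
    return max(pool, key=lambda c: counts[c])


# Stem segments of "to eat" as described in the text.
stems = {
    "Iraqw": ("ʕ", "a", "y"),
    "Burunge": ("ʕ", "a", "g"),
    "Alagwa": ("ʕ", "a", "g"),
    "Dahalo": ("ʕ", "a", "g"),
    "Kw'adza": ("ʔ", "a", "g"),
    "Asa": ("ʔ", "a", "g"),
    "Ma'a": ("ʔ", "a", ""),
}

proto = "".join(
    reconstruct_segment({lang: segs[i] for lang, segs in stems.items()})
    for i in range(3)
)
print("*" + proto + "-")
print(reconstruct_segment({"L1": "y", "L2": "y", "L3": "g"}))
```

For the "to eat" data both strategies happen to agree. The second call shows plausibility overriding the majority: two reflexes with y against one with g still reconstruct as *g, because g > y is a natural change and the reverse is not.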

One normally begins the reconstruction process by comparing the outcomes in each language for the initial phoneme in the stem. From the Iraqw, Burunge, Alagwa, and Dahalo reflexes we see that each of these languages preserved the original root-initial consonant, the pharyngeal voiced stop *? (in many languages this consonant is a continuant, but in Southern Cushitic it is generally a stop). On the other hand, each of the three languages Kw’adza, Asa, and Ma’a had a phonological history in which a regular sound-change rule converted all original voiced pharyngeals to glottal stop consonants (written “/”).

As for the second phoneme in the root, each language preserved the original PSC vowel *a, so no regular sound-change rules need be postulated in this case.

Finally, in the matter of the second consonant, and final phoneme, of this root, two languages, Iraqw and Ma’a, did not maintain the original PSC *g. But quite different regular sound-change rules brought about the loss of *g in those two.

These different sound-change rules alert us to the second consideration of fundamental importance in pursuing historical linguistic reconstruction: one must always keep in mind the phonological environment of a phoneme in working out its reconstruction. Iraqw kept *g at the beginning of words, but elsewhere in the word it regularly converted *g to y, or to w if the preceding vowel was o. The Ma’a rule, in contrast, simply deleted any consonant, not just *g, when that consonant occurred as the final phoneme in a word. Conversely, Ma’a preserved g in its word for the derived noun, ki/agu “food,” in which the consonant occurs not at the end of the word but between vowels. The addition of a noun-deriving suffix, *-u, to the verb root meant that the *g was no longer word-final and so was not subject to the regular sound-change rule deleting word-final consonants.
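The conditioned rules just described can be written as small functions that map a protoform to a daughter form. In this sketch the rule encodings are simplified interpretations of the prose (for example, "after o" is checked only against the immediately preceding segment), the function names are hypothetical, the Ma’a prefix ki- in the noun is left out, and ʕ and ʔ are the IPA symbols for the voiced pharyngeal and the glottal stop.

```python
VOWELS = set("aeiou")


def iraqw_g(form):
    """Iraqw *g rule as described in the text: *g is kept word-initially;
    elsewhere it becomes w after o and y otherwise. (Simplification: only
    the immediately preceding segment is checked.)"""
    out = []
    for i, seg in enumerate(form):
        if seg == "g" and i > 0:
            out.append("w" if form[i - 1] == "o" else "y")
        else:
            out.append(seg)
    return "".join(out)


def maa(form):
    """Ma'a rules as described in the text: the voiced pharyngeal becomes a
    glottal stop, and any word-final consonant is deleted."""
    form = form.replace("ʕ", "ʔ")
    if form and form[-1] not in VOWELS:
        form = form[:-1]
    return form


print("Iraqw, verb stem *ʕag:", iraqw_g("ʕag"))
print("Ma'a, verb stem *ʕag:", maa("ʕag"))
print("Ma'a, noun stem *ʕagu:", maa("ʕagu"))
```

Applied to *ʕag, the Iraqw rule yields ʕay and the Ma’a rules yield ʔa. Applied to *ʕagu, the Ma’a rules leave g intact because it is no longer word-final, which is exactly the environmental effect the paragraph describes.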

In other cases the analysis may be much less complex. The proto–Southern Cushitic root word for “mouth,” which has reflexes in all the Southern Cushitic languages except for Ma’a, is such an instance:

[Table: reflexes of the proto–Southern Cushitic root for “mouth”]

No sound changes affected the consonants and vowels of the stem of this old Southern Cushitic noun. The original glottal stop */ was maintained word-initially in each language, and *a and *f continued to be realized as a and f, respectively, in all of the languages.

Differences did develop, however, in the gender- and number-marking suffixes attached to this root. The original final vowel of the singular form of this noun appears to have been *o. The languages of the West Rift subgroup of Southern Cushitic (Iraqw, Burunge, and Alagwa) replaced that original *o suffix with a masculine singular suffix, *-a. In contrast, both East Rift languages, Kw’adza and Asa, attached their own versions of yet another masculine singular suffix, *-ko. This suffix surfaces in the Asa reflex as a simple -k because of Asa’s regular sound-change rule deleting *o after *k at the end of a word; hence Asa /afok. The same suffix appears in Kw’adza, but in an extended form, -uko, which combines an archaic Cushitic masculine suffix *-u with *-ko; hence Kw’adza /afuko.
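The suffix history of “mouth” can be traced the same way. This is a minimal sketch assuming that the Asa form continues the inherited singular vowel *o followed by *-ko, which is one reading of the description above; the stem is written /af with “/” for the glottal stop, and the function names are hypothetical. The West Rift output /afa is what the description implies rather than a form quoted from the table.

```python
STEM = "/af"  # "/" = glottal stop, as in the text


def asa_drop_final_o_after_k(form: str) -> str:
    """Asa: delete a word-final o that follows k."""
    return form[:-1] if form.endswith("ko") else form


def west_rift(stem: str) -> str:
    # Iraqw, Burunge, Alagwa: original singular *o replaced by masculine *-a
    return stem + "a"


def asa(stem: str) -> str:
    # Assumption: *-ko was added after the inherited singular vowel *o
    return asa_drop_final_o_after_k(stem + "o" + "ko")


def kwadza(stem: str) -> str:
    # Archaic Cushitic masculine *-u combined with *-ko, giving -uko
    return stem + "u" + "ko"


for name, build in [("West Rift", west_rift), ("Asa", asa), ("Kw'adza", kwadza)]:
    print(name, build(STEM))
```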

Language and Society: Establishing a Linguistic Stratigraphy

After collecting the linguistic data, establishing that the languages are related, and constructing the linguistic stratigraphy of the family, the historian can begin to situate the linguistic-historical evidence within that stratigraphy in order to see how language and society connect. The longer the time since daughter languages diverged from a protolanguage, the more diverse the visible linguistic representations of social and cultural continuities become. From a sociolinguistic point of view, a language that is no longer passed down to younger generations becomes endangered and begins to die out.

When we reconstruct the relationships among a group of languages, we simultaneously establish the historical existence of the societies that must have spoken those languages. We also establish that some sort of societal continuity connects the history of the speakers of each language back to the people who spoke the ancestral language (the protolanguage) of the family as a whole. Thus the tree of relationships among the Southern Cushitic daughter languages shown here also represents the succession of Southern Cushitic–speaking societies from the past down to the present (see figure 2). It forms a socio-historical as well as a linguistic stratigraphy.16 The assigned dates are approximations and should be understood as relative rather than absolute, unless they can be correlated with independently dated evidence such as that from archaeology. Even so, assigning these dates helps convey the time depth of the divergences.


Figure 2. Family tree of Southern Cushitic languages.

In addition to the direct historical descent of languages and the societies that spoke them, lateral transmission of languages can also take place. Dahalo, a Southern Cushitic language spoken in Kenya, is an example of such lateral transmission. About two thousand years ago, the ancestors of the later Dahalo speakers were hunter-gatherers who spoke a language belonging to the Khoisan language family. After generations of close contact with neighboring, dominant Southern Cushitic–speaking herding societies, the Dahalo adopted their neighbors’ Southern Cushitic language but continued their food-collecting way of life. As the shift took place, they carried over many words from their original language into their newly adopted Southern Cushitic language. To make matters even more complicated, in about the first millennium CE an Eastern Cushitic herding society, the Garree, expanded into the region, assimilating the former Southern Cushitic herding communities. The Dahalo of this era, however, retained their distinct Southern Cushitic language and their older way of life, even though a new society had moved in all around them.17

Once we have established the language relationships and the linguistic stratigraphy of this history, we can recover elements of social and cultural history from the individual artifacts of that history: the words that make up the vocabularies we are considering in our stratigraphy. To trace a modern-day word back to an earlier protolanguage, however, we must show that the word was transmitted from that protolanguage through the direct line of social and linguistic descent of the language in which it is found in later times.
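The requirement of direct descent has a practical consequence: an inherited word can be reconstructed back to the most recent common ancestor of the languages in which it is attested. The sketch below computes that node. Its tree is hypothetical: the West Rift (Iraqw, Burunge, Alagwa) and East Rift (Kw’adza, Asa) groupings follow the text, but the intermediate “Proto-Rift” node and the placement of Ma’a and Dahalo are assumptions for illustration, not a reproduction of figure 2. The names `PARENT`, `lineage`, and `deepest_reconstruction` are likewise hypothetical.

```python
# Hypothetical tree (child -> parent). West Rift and East Rift membership follows
# the text; "Proto-Rift" and the placement of Ma'a and Dahalo are assumptions.
PARENT = {
    "Iraqw": "Proto-West Rift",
    "Burunge": "Proto-West Rift",
    "Alagwa": "Proto-West Rift",
    "Kw'adza": "Proto-East Rift",
    "Asa": "Proto-East Rift",
    "Proto-West Rift": "Proto-Rift",
    "Proto-East Rift": "Proto-Rift",
    "Proto-Rift": "Proto-Southern Cushitic",
    "Ma'a": "Proto-Southern Cushitic",
    "Dahalo": "Proto-Southern Cushitic",
}


def lineage(language: str) -> list[str]:
    """The language followed by each of its ancestors, up to the root."""
    path = [language]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def deepest_reconstruction(languages: set[str]) -> str:
    """Most recent common ancestor of the languages attesting an inherited word."""
    if not languages:
        raise ValueError("need at least one attesting language")
    paths = [lineage(lang) for lang in sorted(languages)]
    shared = set(paths[0]).intersection(*paths[1:])
    # Paths run from leaf to root, so the first shared node is the lowest one.
    return next(node for node in paths[0] if node in shared)


# The root for "mouth" is attested in every language except Ma'a, per the text.
mouth = {"Iraqw", "Burunge", "Alagwa", "Kw'adza", "Asa", "Dahalo"}
print(deepest_reconstruction(mouth))
```

The calculation is valid only for inherited words. A loanword that spread across branches through contact would produce a spuriously deep node, which is why retentions must first be separated from borrowings.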

As explained earlier, this analysis is carried out by establishing cognates and the regular sound changes in the daughter languages. It lets us discover whether a word is a long-term inheritance in a language or an item adopted (borrowed) from another language. If the word is a retention, we must determine whether it underwent any changes in meaning or grammar. If it was borrowed, we must consider its relationship to other loanwords in the borrowing language and assess whether it belonged to a set of words adopted at the same time. In addition, we may be able to establish whether particular semantic fields (for example, household items or religious terms) tended to be borrowed. The histories of words in the cultural vocabulary can be worked out only after the linguistic stratigraphy has been established and the sound-change histories of all the languages under consideration have been reconstructed. Only then is it possible to understand when, how, and why a word was borrowed from another language, or to assess which changes a retained word has undergone.
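One way to operationalize the test of inheritance versus borrowing is to compare an attested form with the form that a language's regular sound changes predict from the protoform; a mismatch flags the word for closer study. The sketch below is a toy version that models only the two Ma’a rules discussed in this section, writes “ʕ” and “/” as in the text, and checks one hypothetical form; the function names are hypothetical too. An irregular result does not by itself prove borrowing, since analogy or rules not modeled here could also explain it.

```python
VOWELS = set("aeiou")


def maa_regular(protoform: str) -> str:
    """Predict a Ma'a reflex using only the two Ma'a rules discussed in the text."""
    form = protoform.replace("ʕ", "/")  # voiced pharyngeal > glottal stop "/"
    if form and form[-1] not in VOWELS:
        form = form[:-1]  # word-final consonant deleted
    return form


def classify(protoform: str, attested: str) -> str:
    """Compare an attested Ma'a form with the regular prediction."""
    predicted = maa_regular(protoform)
    if attested == predicted:
        return "regular: consistent with inheritance"
    return f"irregular (expected {predicted!r}): possible loan, check further"


print(classify("ʕag", "/a"))   # matches the regular outcome
print(classify("ʕag", "ʕag"))  # hypothetical form that keeps ʕ and final g
```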

In general, some word histories reveal the history of material culture items and economic practices among the people who used the words. Other words attest to long-term continuities of concepts, cultural ideas, and beliefs. Semantic shifts, in contrast, often reveal older, now-lost practices and concepts, or changes in how such practices were carried out or how concepts were interpreted. Ultimately, word histories can provide a narrative of changes in societies triggered either by internal innovation or by cross-cultural encounters. If these word histories can be correlated with ethnographic, oral, and archaeological data, the historian of Africa can use them as pathways into the past, situating them within an approximate framework of time and place.

Discussion of the Literature

African languages are conventionally divided among four language families, or phyla: Niger-Kordofanian, Afroasiatic, Khoisan, and Nilo-Saharan. This classification was established by Joseph Greenberg in 1963 in his monograph The Languages of Africa.18 It did not include sign languages or languages belonging to families that originated outside Africa, such as Afrikaans and Malagasy. When Greenberg established the classification, many language isolates had not yet been identified, so they are not included in it. More recently, African historical linguists have turned their attention to how borrowing, areal diffusion, and other kinds of change have shaped similarities between languages, and they have called for a more critical evaluation of established classifications. The role of language contact in accounting for linguistic patterning within and across families has also become an area of study.19

In parallel with these classification efforts and other new contributions of African linguistics, historians of Africa have begun to apply linguistic methods and documentation to early African history and to shed light on regional histories of the longue durée. While these research developments initially focused on providing insights into past societies for which no written documentation existed, scholars soon recognized that language evidence can also open a window onto histories that written sources do not cover. In some instances, Africanist historians have also contributed new insights to the field of linguistics itself, publishing their findings in linguistics journals and monographs aimed at an audience of linguists rather than historians.

Pioneers such as Christopher Ehret and Jan Vansina were at the forefront of applying linguistics to African history. Other historians of Africa soon adopted the methodology, and language evidence was subsequently used to write integrative macro-regional histories. Among the major works that appeared during this time were Jan Vansina’s monographs Paths in the Rainforest: Toward a History of Political Tradition in Equatorial Africa and How Societies Are Born: Governance in West Central Africa before 1600.20 In 1998 David Schoenbrun’s monograph A Green Place, a Good Place: Agrarian Change, Gender, and Social Identity in the Great Lakes Region to the 15th Century appeared, in which he used linguistic and other nonwritten evidence to write a history of technological and agricultural change as well as of beliefs and ideas over the longue durée.21 More recently, scholars such as Rhiannon Stephens, who explores changing concepts of sex and gender in her monograph A History of African Motherhood: The Case of Uganda, 700–1900, and Kathryn de Luna, who offers new insights into the role of agriculture as a vehicle for political and social change in her monograph Collecting Food, Cultivating People: Subsistence and Society in Central Africa, have extended the use of language evidence to conceptual history.22

Primary Sources

Language classification depends to a large extent on language documentation. Joseph Greenberg’s classification of Africa’s languages into four major phyla has had a considerable impact on research and has been most successful for the best-known and best-documented languages. Linguistic data can be obtained from dictionaries and wordlists, or through fieldwork. If little or no documentation of a language exists, the historian needs to elicit the data in the field, which requires thorough training in field methods and in linguistics generally. Some library collections, such as those of the School of Oriental and African Studies at the University of London, United Kingdom; the Melville J. Herskovits Library of African Studies at Northwestern University in Evanston, Illinois; and the University of California, Los Angeles, have language repositories.

Further Reading

Bender, Marvin Lionel. The Nilo-Saharan Languages: A Comparative Essay. Munich: Lincom Europa, 1996.

Crowley, Terry. An Introduction to Historical Linguistics. Auckland: Oxford University Press, 1998.

De Luna, Kathryn Michelle. Collecting Food, Cultivating People: Subsistence and Society in Central Africa. New Haven: Yale University Press, 2016.

Dimmendaal, Gerrit Jan. Historical Linguistics and the Comparative Study of African Languages. Amsterdam: John Benjamins, 2011.

Ehret, Christopher. The Historical Reconstruction of Southern Cushitic Phonology and Vocabulary. Berlin: Reimer, 1980.

Ehret, Christopher. An African Classical Age: Eastern and Southern Africa in World History, 1000 B.C. to A.D. 400. Charlottesville: University Press of Virginia, 1998.

Ehret, Christopher. “Subclassifying Bantu: The Evidence of Stem Morpheme Innovation.” In Bantu Historical Linguistics: Theoretical and Empirical Perspectives. Edited by Larry M. Hyman and Jean-Marie Hombert, 43–147. Stanford, CA: Center for the Study of Language and Information, 1999.

Ehret, Christopher. A Historical-Comparative Reconstruction of Nilo-Saharan. Cologne: R. Köppe Verlag, 2001.

Ehret, Christopher. History and the Testimony of Language. Berkeley, CA: University of California Press, 2010.

Ehret, Christopher. “A Guide to Cognate Discovery in Nilo-Saharan.” In Language, History and Reconstructions. Edited by Jörg Adelberger and Rudolf Leger, 9–93. Frankfurt am Main: Stadt- und Universitätsbibliothek Frankfurt; Cologne: R. Köppe, 18, 2014.

Fields-Black, Edda L. Deep Roots: Rice Farmers in West Africa and the African Diaspora. Bloomington: Indiana University Press, 2008.

Grollemund, Rebecca, Simon Branford, Koen Bostoen, Andrew Meade, Chris Venditti, and Mark Pagel. “Bantu Expansion Shows that Habitat Alters the Route and Pace of Human Dispersals.” Proceedings of the National Academy of Sciences of the United States of America 112, no. 43 (2015): 13296–13301.

Guthrie, Malcolm. Comparative Bantu: An Introduction to the Comparative Linguistics and Prehistory of the Bantu Languages. 4 vols. Farnborough: Gregg Press, 1967–1971.

Heine, Bernd, and Derek Nurse. African Languages: An Introduction. Cambridge, UK: Cambridge University Press, 2006.

Klieman, Kairn. “The Pygmies Were Our Compass”: Bantu and Batwa in the History of West Central Africa, Early Times to c. 1900 C.E. Portsmouth, NH: Heinemann, 2003.

Nurse, Derek, and Thomas T. Spear. The Origins and Development of Swahili: Reconstructing the History of an African Language and People. Washington, DC: Cliveden Press, 1985.

Schoenbrun, David L. A Green Place, a Good Place: Agrarian Change, Gender, and Social Identity in the Great Lakes Region to the 15th Century. Oxford: James Currey, 1998.

Stephens, Rhiannon. A History of African Motherhood: The Case of Uganda, 700–1900. New York: Cambridge University Press, 2013.

Van de Velde, Mark L. O., Koen A. G. Bostoen, Derek Nurse, and Gérard Philippson. The Bantu Languages. 2nd ed. Routledge Language Family Series. New York: Routledge, 2018.

Vansina, Jan. Paths in the Rainforest: Toward a History of Political Tradition in Equatorial Africa. London: James Currey, 1990.

Vansina, Jan. How Societies Are Born: Governance in West Central Africa Before 1600. Charlottesville: University of Virginia Press, 2004.

Vossen, Rainer. The Khoesan Languages. London: Routledge, 2013.


(1.) See the very good discussion by Rhiannon Stephens of how to correlate comparative ethnography with linguistic data, in Rhiannon Stephens, A History of Motherhood, Food Procurement and Politics in East-Central Uganda to the Nineteenth Century (Ann Arbor: UMI, 2008), 61–67.

(3.) Crowley, An Introduction, 168.

(4.) Christopher Ehret, History and the Testimony of Language, California World History Library 16 (Berkeley, CA: University of California Press, 2011), 171.

(5.) Ehret, History and the Testimony, 172.

(6.) Morris Swadesh, “Linguistics as an Instrument of Prehistory,” Southwestern Journal of Anthropology 15, no. 1 (1959): 20–35.

(7.) Crowley, An Introduction, 172.

(8.) Table of linguistic relationships based on Crowley, An Introduction, 173.

(9.) Ehret, History and the Testimony, chapter 5, presents an extended range of examples of this kind of correlation in Africa.

(10.) For an early survey of work along this line, see Dell H. Hymes, “Lexicostatistics So Far,” Current Anthropology 1 (1960): 3–44; and Dell H. Hymes, “More on Lexicostatistics,” Current Anthropology 1 (1960): 338–345.

(11.) Ehret, History and the Testimony, 106.

(12.) Christopher Ehret, “Testing the Expectations of Glottochronology against the Correlations of Language and Archaeology in Africa,” in Time Depth in Historical Linguistics, ed. Colin Renfrew, April McMahon, and Robert Lawrence Trask (Cambridge, UK: McDonald Institute, 2000), 373–399.

(13.) Knut Bergsland and Hans Vogt, “On the Validity of Glottochronology,” Current Anthropology 3 (1962): 115–153.

(14.) Crowley, An Introduction, 152.

(16.) Ehret, History and the Testimony, 27.

(17.) Christopher Ehret, Ethiopians and East Africans: The Problem of Contacts (Nairobi: East African Publishing House, 1974).

(18.) Joseph H. Greenberg, The Languages of Africa (Bloomington: Indiana University Press, 1963).