

Printed from Oxford Research Encyclopedias, African History. Under the terms of the licence agreement, an individual user may print out a single article for personal use (for details see Privacy Policy and Legal Notice).

Date: 18 May 2025

Genetics and Southern African History

  • Francesco Montinaro, Institute of Genomics, Estonian Biocentre, University of Tartu
  • and Cristian Capelli, Department of Zoology, University of Oxford

Summary

Southern Africa’s past is marked by a series of demographic events tracing back to the dawn of our species, approximately 300,000 years ago. The intricate pattern of population movements over the millennia created an exceptional level of diversity, which is reflected in the high degree of genomic variability of southern African groups. Although a complete characterization of the demographic history of the subcontinent is still lacking, several decades of extensive research have shed light on the main events.

Genetic and archaeological research suggests that modern humans may have emerged as the result of admixture between different African groups, possibly including other Homo populations, challenging the common view of a single origin of our species. Although the details are still unknown, surveys suggest that long-term resident populations of the subcontinent (related to Khoe-San speakers) may have emerged hundreds of thousands of years ago, and have inhabited the area for at least five millennia.

Population movements, and the introduction of new cultural features, characterize the history of southern Africa over the last five millennia and have had a dramatic impact on subcontinental genetic variability. Traces of these migrations can be identified using different genetic systems, revealing a complex history of adaptation to new selective pressures and sex-biased admixture.

The historical events of the European colonization and the slave trade of the last millennium, and the emergence of new cultural groups, further increased the genomic variability of human populations in this region, one of the most genetically diverse in the world.

Subjects

  • Southern Africa

The Genetic Variability of Southern Africa

Southern Africa displays a variety of different environments distributed across several ecological and climatic backgrounds, which encompass the arid Kalahari and Namib deserts, the tropical savanna of Angola and Mozambique, and the humid rainforests of Madagascar. This biodiversity is mirrored by the rich cultural variation which characterizes the populations living in this area. Such diversity is well exemplified by the approximately one hundred different languages spoken across the whole region;1 the inhabitants of this region number over 170 million (approximately 17 percent of all sub-Saharan Africa). While anthropologists and linguists have studied populations living in this region for many years, very little was known about the genetic variability characterizing these groups until recently.

From the late 1990s, several works have focused on the groups from this area and have provided detailed descriptions of how variation is distributed across groups, and of the demographic events that have shaped that distribution. In the worldwide analysis of human populations using protein polymorphisms, Cavalli-Sforza and co-workers identified Khoe-San, Somali, and Mbuti (rainforest hunter-gatherers of the eastern Congo Basin, also known as Eastern Pygmies) as early diverging populations within sub-Saharan Africa. Further analysis of the genetic distances underlying the population tree showed that a subset of the Khoe-San population differed significantly from other sub-Saharan groups. Khoe-San populations varied in their affinity to other groups, with some being closer to Bantu-speaking groups; this result was interpreted as the effect of gene flow, in agreement with the fact that these samples included Colored individuals, who have West African ancestry (in line with other investigations focusing on the genetic variation of southern African populations, in using the term Colored here we note that this term is used by the South African government “to monitor progress in moving away from the apartheid-based discrimination of the past. However, membership of a population group is now based on self-perception and self-classification, not on a legal definition.” We also acknowledge that this term may have a derogatory connotation in certain cultures, but this is certainly not intended here).2

Subsequent analyses based on the Y chromosome and mitochondrial DNA (mtDNA), which are each inherited from only one parent (the Y chromosome from the father and mtDNA from the mother), provided insights into sex-specific population dynamics. Most of these early works focused on populations from southern Africa which are often identified as “Khoe-San.” The term Khoe-San has been used to label a rather heterogeneous group of people on the assumption that they share biological, linguistic, and/or cultural traits.3 Linguistically, this grouping was broadly justified by the presence in the region of several languages rich in click sounds, but recent investigations have suggested the existence of three to five different lineages.4 Taking this into consideration, the term Khoe-San is used here when referring to the linguistic lineages identified by Güldemann, and to those people currently living in southern Africa who speak languages belonging to one of the three largest click-rich linguistic families (Tuu, K’xa, and Khoe-Kwadi) and who show genetic affinity with groups living in the region before the arrival of Bantu-speaking communities.

The early molecular investigations showed that Khoe-San-speaking populations were characterized by high within-population variation and a high degree of differentiation from other African and non-African populations. Such evidence generated the popular view of the Khoe-San-speaking Bushmen as the “earliest people,” and suggested southern Africa as the place where H. sapiens emerged (the term Bushmen is used without any derogatory meaning; we note that people in Namibia and other parts of southern Africa often refer to themselves as Bushmen). Both these views are incorrect. First, all living humans descend from the same most recent common ancestor, and therefore all extant populations have the same evolutionary age. Second, humans did not necessarily originate in a single place (eastern Africa has generally been proposed as the possible origin of our species); rather, the emergence of our species probably involved gene flow between different populations across the African continent, possibly including other Homo populations.5

More recent investigations have been exploring the wealth of information offered by genome-level variation, either by genotyping specific positions along the DNA (using single nucleotide polymorphism (SNP) arrays) or by sequencing the whole genome.

Placed in a continental context, southern African populations are characterized by five different genetic components (Figure 1), which are often shared across groups: a so-called Khoe-San component (green in Figure 1), which is modal in most Khoe-San-speaking populations but can also be present in groups speaking other languages; a West African component (red), which is predominant in Bantu-speaking populations from southern Africa and is related to the components present in Bantu-speaking groups across the continent; an eastern African component (orange), which is mostly present in some Khoe-San-speaking groups; and, finally, two non-African components, which relate to more recent European and Asian contributions (blue and purple, respectively). Interestingly, these five components appear to be associated with well-defined historical and demographic events in this part of the African continent, which are analyzed in detail in the following sections.

In addition, most of the ancestries described here show a high degree of heterogeneity and differentiation at a local geographic scale, highlighting the complexity of long- and short-term patterns of isolation and admixture in the subcontinent.

Despite the large amount of genomic data produced since the early 2000s, the coverage in terms of number of individuals and populations analyzed is far from homogeneous, with populations from Namibia, Botswana, and South Africa representing the vast majority of the data currently available, particularly at genome level. Nevertheless, the information based on the analysis of uniparental markers, for which more extensive datasets are available, and the recent release of data from less investigated areas have allowed scholars to provide a general overview of the regional variation and genetic history of the subcontinent.

Figure 1. The five main genetic components of southern Africa.

Note: This admixture plot illustrates the genetic composition of southern African populations. Each bar represents one individual, and its colored segments show the proportion of that individual’s ancestry assigned to each ancestral component. A total of 138,000 genome-wide markers from different Illumina arrays were used. For clarity, we show only a representative subset of southern African groups,6 and have also included western African (Yoruba from Nigeria),7 eastern African (Anuak and Amhara from Ethiopia),8 European (British), and South and East Asian (Gujarati Indians and Han Chinese) groups.9

Credit: Francesco Montinaro.

Early Genetic Structure in Southern Africa

One of the most differentiated components emerging from the analysis of African populations is found predominantly in groups who speak Khoe-San languages (see Figure 2A); it represents the ancestral genetic component present in the southernmost area of the continent prior to the arrival of pastoralists and agriculturalists.

Nowadays, this ancestry is present in Khoe-San-speaking groups in South Africa, Botswana, and Namibia, but also in Bantu-speaking groups and Colored people as a result of the admixture events that have occurred over the last 2,000 years.

Genetic analyses of modern populations from southern Africa have demonstrated the existence of at least three different ancestral components in the area (“Khoe-San components,” Figure 2B), which diverged at least 20,000 years ago, and possibly much earlier.10

In detail, the first Khoe-San ancestry component is common in extant populations from the area north of the Kalahari Desert, such as the Ju|’hoan and the !Xun, who trace their roots to the region of modern-day Namibia and Angola. Interestingly, some of the Bantu-speaking populations present in this area show traces of this ancestry, suggesting recent episodes of gene flow.

Figure 2. Khoe-San populations characterized by three main ancestries.

Note: Map A shows the geographic distribution of the three Khoe-San linguistic lineages described in the text. Modified from Kalahari Basin Area. Panel B shows the multidimensional scaling of Khoe-San-specific genetic fragments. To remove the confounding effect of recent admixture events, we considered only genetic fragments classified as Khoe-San by means of a “local ancestry approach,” and estimated the genetic distance between populations.11 The resulting distance matrix is shown as a multidimensional scaling scatterplot. Redrawn from Francesco Montinaro et al., “Complex Ancient Genetic Structure and Cultural Transitions in Southern African Populations,” Genetics (2016): 303–316.

Credit: Alessandro Corlianò and Francesco Montinaro.
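The note to Figure 2 describes summarizing a matrix of between-population genetic distances as a two-dimensional scatterplot. To illustrate that general step only, here is a minimal sketch of classical (Torgerson) multidimensional scaling in Python with NumPy. The function name `classical_mds` and the three-population distance matrix are hypothetical. The local-ancestry filtering and the distance calculation are not shown, and the cited study may have used a different MDS variant or software.

```python
import numpy as np


def classical_mds(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed n items in k dimensions from an n x n distance matrix."""
    n = dist.shape[0]
    # Centering matrix J = I - (1/n) * 11^T
    j = np.eye(n) - np.ones((n, n)) / n
    # Double-centred matrix of squared distances: B = -1/2 * J D^2 J
    b = -0.5 * j @ (dist ** 2) @ j
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[top], vecs[:, top]
    # Coordinates are eigenvectors scaled by the square roots of the eigenvalues;
    # negative eigenvalues (non-Euclidean distances) are clipped to zero
    return vecs * np.sqrt(np.clip(vals, 0.0, None))


# Hypothetical distances between three populations (toy values, not study data)
d = np.array([[0.0, 1.0, 3.0],
              [1.0, 0.0, 2.0],
              [3.0, 2.0, 0.0]])
coords = classical_mds(d, k=2)
print(coords.shape)  # (3, 2)
```

The toy distances correspond to points on a line, so the two-dimensional coordinates reproduce them exactly, up to rotation and reflection. Real genetic distances are rarely perfectly Euclidean, which is why negative eigenvalues are clipped to zero in this sketch.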

The second Khoe-San ancestry component is common to all the populations from the central Kalahari area. Notably, these populations belong to all three of the Khoe-San linguistic lineages mentioned, and the component is also found in some of the Bantu-speaking individuals previously analyzed, as a result of contact with populations carrying it.12

The last Khoe-San ancestry component is mainly present in populations living at the southernmost tip of the continent, such as the Khomani, the Nama, and the Karretjie.13 The Khomani are a hunter-gatherer population speaking a language belonging to the Tuu linguistic lineage (N|uu, of the !Ui subgroup), whereas the Nama, a Khoe-Kwadi-speaking group from Namibia, practice a pastoralist subsistence. The Bantu-speaking and Colored populations from South Africa and Lesotho draw most of the Khoe-San ancestry in their genomes from this southern component. However, the two groups show some non-negligible differences in their Khoe-San-specific ancestry, possibly reflecting different combinations of ancient and recent admixture.

Interestingly, multiple regression and allele frequency analyses showed a high and significant correlation between genetic and geographic distances for the Khoe-San ancestry components in the area, suggesting isolation-by-distance dynamics, in which individuals are more likely to mate with members of neighboring populations, generating clines in allele frequencies.
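Correlations between genetic and geographic distances are often tested with a Mantel test. This test compares two distance matrices and assesses significance by permuting population labels, because the entries of a distance matrix are not independent of one another. The sketch below is a minimal, standard-library Python illustration of that technique. It is not necessarily the procedure used in the studies cited, since the text refers to multiple regression and allele frequency analyses. The transect positions and toy genetic distances are hypothetical.

```python
import math
import random


def upper_triangle(m):
    """Flatten the entries above the diagonal (i < j) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, permutations=999, seed=1):
    """Mantel test: return (r, p) for the correlation between two distance
    matrices, with a one-sided p-value for positive correlation."""
    n = len(gen)
    geo_flat = upper_triangle(geo)
    r_obs = pearson(upper_triangle(gen), geo_flat)
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permute rows and columns together so the matrix stays a valid distance matrix
        shuffled = [[gen[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(shuffled), geo_flat) >= r_obs:
            at_least_as_large += 1
    # The observed arrangement counts as one of the possible permutations
    return r_obs, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical transect: six populations, positions in km
positions = [0, 100, 250, 400, 600, 900]
geo = [[abs(a - b) for b in positions] for a in positions]
# Toy genetic distances that increase linearly with geographic distance
gen = [[0.001 * abs(a - b) for b in positions] for a in positions]

r, p = mantel(gen, geo)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

Isolation by distance predicts a correlation close to 1 with a small permutation p-value. With only six populations there are just 720 distinct permutations, which also limits how small the p-value can meaningfully be.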

The analysis of genetic material from archaeological remains on the continent has shed new light on the origin of Khoe-San ancestry.14 The analysis of fifteen ancient African individuals from South Africa, Malawi, Tanzania, and Kenya, dated between ~500 and 8,000 years ago, showed that southern African individuals from ~2,000 years ago are genetically close to extant Khoe-San populations, pointing toward long-standing genetic continuity in the area. Surprisingly, the same investigation highlighted that southern and eastern African individuals dated between ~8,100 and ~400 BP (before present) are related to southern African Khoe-San and to the Tanzanian Hadza, to an extent that correlates with their geographic location.15 For example, seven ancient individuals from Malawi, dated between 8,100 and 2,100 BP, showed a relatively low degree of heterogeneity, pointing toward demographic continuity for at least five millennia. However, this ancestry component appears to be absent from the region today, which is inhabited by Bantu-speaking populations with no detectable signature of gene flow from Khoe-San groups.16

Lineages present in Khoe-San populations have previously been described as the earliest diverging extant branches of modern humans. However, it has been shown that ancient southern African populations are more closely related to some eastern (both ancient and modern) than to western African groups, and that western African groups in turn show differential relationships with southern and eastern Africa. These results challenge the “early southern Africa divergence model,” while supporting two models that fit the data equally well: an “out-of-East Africa” scenario and a “differential southeast–west admixture” scenario (Figure 3). The former postulates the existence of a basal West African lineage, which contributed to the modern West African Mende and Yoruba populations, and corroborates the idea of an ancient structure on the continent (Figure 3A). The latter is compatible with a complex pattern of migration between different subregions of the continent (Figure 3B).

A similar analysis based on three ancient individuals has shown that the genomic features of Stone Age individuals from South Africa closely resemble those of modern-day southern Khoe-San populations.17

The comparison between southern African Stone Age individuals and a selection of modern-day and ancient individuals pushed the time to the most recent common ancestor (TMRCA) of our species back to 260,000–350,000 years ago. Such a deep TMRCA fits well with fossils displaying facial features compatible with early anatomically modern humans, recently dated to ~315 kya (thousand years ago), and with the presence in the same period of modern-looking (or transitional) humans in southern Africa, as attested by the Florisbad skull.18 These observations, together with the new date proposed for Homo naledi (236,000 to 335,000 years old),19 suggest the coexistence of multiple hominin populations at the dawn of our species.

Figure 3. Two models explaining the different relationship between southern African and western or eastern groups.

Note: Map A shows the out-of-East Africa model. It postulates that after the expansion of populations from the east, western groups (WA2) admixed with a basal African lineage (WA1). Map B shows that the same pattern can be explained by the presence of complex ancient genetic structures throughout the continent, which limited the contact between southern and western populations. Notably, different western groups show asymmetric relationships with southern Africa. Based on Pontus Skoglund et al., “Reconstructing Prehistoric African Population Structure,” Cell 171, no. 1 (2017): 59–71.e21.

Credit: Alessandro Corlianò and Francesco Montinaro.

Genetic Evidence for a Pastoral-Inspired Migration

The arrival of pastoralism in southern Africa around 2,000 years ago has been the subject of extensive debate among archaeologists for many years.20 From a genetic perspective, a signature possibly related to the arrival of sheep and goat herders was for a long time elusive, suggesting a cultural rather than demic diffusion. The first evidence for such a genetic signature came from the variation of the Y chromosome. A Y-chromosome haplotype (a haplotype is a DNA sequence defined by a specific combination of DNA variants and inherited from the same ancestor) defined by the E-M293 mutation was identified in populations from southern (Namibia and South Africa) and eastern Africa (Tanzania). Y-chromosome haplotypes carrying the M293 mutation were originally found at the highest frequency in the Khwe from the Caprivi Strip in Namibia and the Datog from Tanzania.21 The analysis of microsatellite variation associated with this mutation (microsatellites, or short tandem repeats (STRs), are short DNA motifs repeated in tandem, whose number of repeats varies between alleles) was interpreted as supporting an expansion through Tanzania to south-central Africa dating to around 2,000 years ago, independent of the dispersal of Bantu-speaking people.22
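Dating an expansion from microsatellite variation generally relies on counting how many repeat-number changes have accumulated since a founder lineage. Below is a minimal sketch of one common estimator, the rho statistic. It assumes a stepwise mutation model and treats the most frequent haplotype as the founder. It also uses an assumed mutation rate of 0.0025 per locus per generation and an assumed 25-year generation time. The haplotypes are hypothetical. The cited study may have used a different method, and published Y-STR dates are highly sensitive to the mutation rate chosen.

```python
from collections import Counter


def rho_age(haplotypes, mu_per_locus=0.0025, generation_years=25):
    """Rho-statistic age estimate for a set of STR haplotypes.

    Assumptions (illustrative, not taken from the cited study):
    - stepwise mutation: one mutation changes the repeat count by one;
    - the most frequent haplotype is the founder;
    - mu_per_locus and generation_years are assumed values.
    Returns (rho, generations, years)."""
    founder = Counter(tuple(h) for h in haplotypes).most_common(1)[0][0]
    n_loci = len(founder)
    # Total repeat-count differences from the founder, summed over loci and samples
    steps = sum(abs(a - b) for h in haplotypes for a, b in zip(h, founder))
    rho = steps / (len(haplotypes) * n_loci)  # mean steps per haplotype per locus
    generations = rho / mu_per_locus
    return rho, generations, generations * generation_years


# Hypothetical repeat counts at four Y-STR loci for six chromosomes
sample = [
    (13, 16, 11, 24),
    (13, 16, 11, 24),
    (14, 16, 11, 24),
    (13, 15, 11, 24),
    (13, 16, 12, 25),
    (12, 16, 11, 24),
]
rho, gens, years = rho_age(sample)
print(f"rho = {rho:.3f} steps per locus, ~{gens:.0f} generations, ~{years:.0f} years")
```

Doubling the assumed mutation rate halves the estimated age. This is why such dates are usually reported with wide uncertainty.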

More recent work at the genome level has confirmed this link between southern and eastern Africa. It has also made it possible to quantify this contribution in populations across southern Africa and to estimate when it entered these populations. In detail, several studies harnessed the decay of admixture linkage disequilibrium (LD, the non-random association of alleles at different positions). LD generated by admixture decays with increasing distance along chromosomes, at a rate that depends on the time since admixture. Using this approach, these studies identified western Eurasian ancestry in southern African groups dating to between 900 and 1,800 years ago.23 Similar signatures were also detected in eastern African populations, dating to around 3,000 years ago. Through extensive population comparisons, the authors suggested that eastern African groups are the best proxies for the source of this Eurasian ancestry in the south. The amount of genetic ancestry putatively related to these ancient migrations varies across populations, but it is present in all the Khoe-San populations tested so far. Notably, the strongest signature was detected in the Nama, a Khoe-speaking pastoralist group in Namibia, at 14 percent, whereas it was lower than 6 percent in all the other southern African populations tested.24
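The dating logic can be made concrete. Suppose a single pulse of admixture occurred t generations ago. Admixture LD between two markers d Morgans apart is then expected to decay roughly as A * exp(-t * d), so the decay rate gives t directly. The following minimal Python sketch recovers t from the slope of log(LD) against genetic distance. It assumes a single admixture pulse, no background LD, a noise-free hypothetical LD curve, and a 29-year generation time. Dedicated tools such as ALDER or GLOBETROTTER use weighted LD statistics and more elaborate fits, and the studies cited here should not be assumed to have used this simplified procedure.

```python
import math


def admixture_generations(distances_morgans, ld_values):
    """Estimate generations since a single admixture pulse from LD decay.

    Assumes ld = A * exp(-t * d) with no background LD, so that
    log(ld) = log(A) - t * d and t is minus the least-squares slope."""
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope


GENERATION_YEARS = 29  # assumed generation time in years

# Hypothetical noise-free LD curve for a pulse 50 generations ago
distances = [0.005 * k for k in range(1, 21)]  # 0.5 to 10 cM, in Morgans
ld_curve = [0.02 * math.exp(-50 * d) for d in distances]

t = admixture_generations(distances, ld_curve)
print(f"~{t:.1f} generations, ~{t * GENERATION_YEARS:.0f} years ago")
```

Real data are noisy and include a background level of LD. For that reason, methods typically fit an exponential with an additive constant rather than a straight line on a log scale.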

From a genetic point of view, the demographic impact of this contribution from eastern Africa was probably not as large as that of other migration events (for example, the arrival of Bantu-speaking communities). Nevertheless, it certainly had an important effect on the genetics of southern African populations.

One example is the distribution of a genetic variant linked to the ability to digest lactose, the disaccharide sugar present in milk, after weaning (rs145946881, also known as C14010, located in the MCM6 gene). In most humans lactase production declines after weaning. For adults who lack lactase persistence, drinking milk can have particularly unpleasant results, ranging from flatulence to severe diarrhea. The lactase persistence variant common in eastern Africa (C14010) was also found in southern Africa, at frequencies between 20 and 35 percent in the pastoralist Nama (Figure 4).25 The haplotypes carrying this variant on chromosome 2 were shared across populations in southern Africa. They were characterized by a low degree of variation and relatively extended linkage disequilibrium, genomic features usually associated with signatures of selection. In fact, the Nama showed a higher than expected frequency of the C14010 allele (~35 percent) given their estimated degree of eastern African ancestry (~10 percent). In southern African populations, the mutation occurs on the same genetic background as in eastern Africa. Interestingly, the signal of selection in this region appears weaker in the Nama than in other African pastoralist populations, such as the Maasai. This might be related to the relatively recent introduction of this allele into southern African groups.
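The reasoning behind the “higher than expected” frequency can be spelled out with simple arithmetic. Under neutral admixture, the allele frequency in an admixed population is expected to be the ancestry-weighted average of the frequencies in its source populations. The Python sketch below uses the approximate figures given in the text (~10 percent eastern African ancestry, ~35 percent C14010 in the Nama). For illustration, it also assumes that the allele was absent from the pre-pastoralist foragers. Even if every eastern African source chromosome had carried the allele, the neutral expectation would then not exceed 10 percent. Genetic drift in small populations could also raise frequencies, so haplotype-based signals like those described above remain important for inferring selection.

```python
def expected_admixed_frequency(alpha, p_source, p_recipient=0.0):
    """Neutral expectation for an allele frequency after admixture.

    alpha       -- proportion of ancestry from the source population
    p_source    -- allele frequency in the source population
    p_recipient -- allele frequency in the recipient population (assumed 0 here)
    """
    return alpha * p_source + (1 - alpha) * p_recipient


alpha = 0.10     # approximate eastern African ancestry in the Nama (from the text)
observed = 0.35  # approximate C14010 frequency in the Nama (from the text)

# Upper bound: assume every eastern African source chromosome carried C14010
upper_bound = expected_admixed_frequency(alpha, p_source=1.0)
print(f"neutral upper bound: {upper_bound:.2f}, observed: {observed:.2f}")
print("observed exceeds neutral expectation:", observed > upper_bound)
```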

Another signature of selection related to this ancestry component involves a polymorphism (rs1426654 in the SLC24A5 gene) associated with the lighter skin color common in Europe. This polymorphism is present at considerable frequencies in some Khoe-San populations, ranging from 7 percent (Ju|’hoan) to 49 percent (Nama).26 The average frequency of this allele in modern-day Khoe-San populations (around 10 percent)27 is too high to be explained simply by recent introgression following European migration, even when positive selection is taken into account. Therefore, this mutation, potentially originating in Eurasia, may have arrived with the first pastoralist populations from eastern Africa before 17th-century colonialism, and subsequently been the target of selective pressure.

Figure 4. Early contacts between eastern African pastoralist and southern African foragers revealed by the distribution of the lactase persistence associated allele C14010.

Note: Allele frequency data for the C14010 allele are from Alessia Ranciaro et al., “Genetic Origins of Lactase Persistence and the Spread of Pastoralism in Africa,” American Journal of Human Genetics 94, no. 4 (2014): 496–510; Gwenna Breton et al., “Lactase Persistence Alleles Reveal Partial East African Ancestry of Southern African Khoe Pastoralists,” Current Biology: CB 24, no. 8 (2014): 852–858; and Enrico Macholdt et al., “Tracing Pastoralist Migrations to Southern Africa with Lactase Persistence Alleles,” Current Biology: CB 24, no. 8 (2014): 875–879.

Credit: Francesco Montinaro.

The “Bantu Expansion”

The spread of the Bantu languages, which belong to the Niger-Congo family, across most of sub-Saharan Africa has often been described as one of the most dramatic events in the history of the African continent.28 By the end of the 20th century, approximately five hundred Bantu languages (from “ba-ntu,” meaning “people”) were spoken by ~300 million people in sub-Saharan Africa, across an area larger than Europe (6.8 million km²).29

Linguistic and anthropological research indicates that this “expansion” started between four and five thousand years ago, in an area close to the present-day Cameroon/Nigeria border, and involved the movement of both people (demes) and technology (ideas).30 The strong similarities between Bantu languages suggest that the diffusion occurred in a relatively short period. Archaeological evidence for agro-pastoralist settlements in the southernmost part of the continent, dated as early as 1,000–2,000 years ago, supports this hypothesis.

Geneticists have joined the debate by exploring the variation present in Bantu-speaking groups and their neighboring populations.31 One interesting characteristic emerging when comparing the genetics of Bantu-speaking groups is the overall similarity found in these populations, despite their distribution across thousands of kilometres (Figure 6). This long-distance homogeneity has been interpreted as supporting the demic model for the dispersal of Bantu languages and the associated Iron Age and agro-pastoralist technologies.32 Nevertheless, differences can be found among Bantu-speaking groups. Some of these differences are related to local events of gene flow with neighboring populations, whereas others are caused by demographic processes associated with the dispersal.

Although the broad dynamics of the dispersal process have been uncovered, little is known about the path of this expansion and the local dynamics of the phenomenon.

Two main linguistic hypotheses have been proposed over the years, differing in the timing of the split between branches and in the dispersal patterns they imply. The “early split hypothesis” postulates that the two main linguistic branches (northwest and southeast) diverged at the beginning of the expansion,33 with the Bantu speakers east of the Great Lakes being closely related to those from northern Congo. By contrast, the “late split hypothesis” places the separation of the two branches later in time, approximately 2–3 kya, after a first migration toward the southeast through the Congo rainforest. According to the latest linguistic investigations, which evaluated similarities among more than four hundred Bantu languages, the latter is the best-supported scenario.

In the archaeological literature, the dispersal from the Cameroon homeland has been described as a two-step process: a first step from Cameroon to the surrounding areas, and a second step from the Great Lakes region. The archaeological record points to a steady dispersal of related pottery traditions that started from the area at the time when Bantu languages possibly originated, ~4–5 kya. Comb-stamped pottery appears just south of the Bantu homeland between around 2,600 and 1,500 years ago. It is found from 2,600 years ago in Gabon, 2,400–2,100 years ago at the convergence of the main rivers of the Congo Basin (Imbonga tradition), and 2,300–2,000 years ago in the Republic of Congo (Ngovo group). Iron technology follows the pottery in time, but might not have played a major role in shaping subsistence strategies.34 Interestingly, evidence of pearl millet (a savanna crop) appears around 2,300–2,100 years ago, and the word for this crop has been reconstructed in Narrow West Bantu languages.35 Overall, the order in which the various aspects of Bantu-speaking societies (subsistence, metallurgy, pottery) appeared is not clear, and they may have appeared at different times, but a north-to-south dispersal pattern has been suggested. The second dispersal step is centered on the Great Lakes region and has been associated with the spread of eastern Bantu languages. Data from Uganda, Rwanda, Burundi, and Tanzania associate the beginning of the Iron Age with the Urewe pottery tradition around 1,000 years ago. Urewe pottery dispersed rapidly, in the form of related styles, throughout eastern and southern Africa.
Linguistically, Proto-Great Lakes Bantu has been shown to have words for yams and goats, and later for cattle, millet, and sorghum, which might have come from Sudanic speakers in the region.36 Most of the pottery in south-central and southern Africa (thick wares) is related to Urewe ceramics. Together with the Kalundu tradition, these ceramics have been suggested to form part of the Chifumbaze complex (Figure 5). Both Urewe and Kalundu pottery makers were probably eastern Bantu speakers. Urewe-related pottery traveled south along two routes. One, represented by the Mwabulambo and Nkope traditions, ran inland through Malawi (4th century) and later Zimbabwe; the other, represented by Kwale pottery, moved along the coast. In the west, the Naviundu tradition (a pottery style very different from Urewe and Kalundu) signals the arrival of western Bantu speakers in northern Namibia, Botswana, and western Zambia (Figure 5). The second branch of the Chifumbaze complex, represented by Kalundu pottery, originated in central Angola in the early first millennium CE. It spread south and east across Zambia and Botswana into South Africa, where the earliest ceramic evidence, in Limpopo and southeastern Botswana, dates to the 6th–8th centuries.37

Figure 5. Possible dispersal routes of early farming cultures in sub-equatorial Africa.

Note: Redrawn from Peter Mitchell and Paul J. Lane, eds., The Oxford Handbook of African Archaeology (Oxford: Oxford University Press, 2013).

Credit: Alessandro Corlianò.

DNA-based investigations have suggested some degree of genetic homogeneity among Bantu speakers across the subcontinent, which is compatible with the hypothesis of a demic rather than a purely cultural diffusion. However, none of the genetic investigations conducted so far has found strong support for either of the two linguistically based models. Comparisons of genetic (uniparental and autosomal) and geographic distances computed under different expansion models provided contrasting results, although overall they showed marginal support for the late split hypothesis.38

More recently, stronger support for the late split hypothesis has been found.39 In addition, the analysis of single nucleotide polymorphisms (SNPs) and autosomal microsatellites showed that the eastern and southern branches are more closely related to each other than to the western ones.40

The genomic analysis of four Iron Age individuals from South Africa (Champagne Castle, Eland Cave, Mfongosi, Newcastle), dated to between 300 and 500 years ago, has revealed a strong connection with populations from South Africa speaking southeastern Bantu languages (Zulu, Sotho, and Xhosa). When the early and late split models were tested against all the western Bantu-speaking groups, the Iron Age individuals were closer to populations from modern-day Angola, as expected under the late split hypothesis.41

However, this observation is still compatible with other models and/or the coexistence of multiple migration waves. Furthermore, it is not clear how the admixture dynamics between different Bantu-speaking groups after their settlement affected the present-day pattern of variation.

The analysis of additional genetic material recovered from pre- and post-expansion ages, covering most of the Bantu-speaking sub-Saharan Africa, will be crucial to shed light on the global and local dynamics of this tremendous expansion.

Nevertheless, genomic analyses of modern-day populations have provided several insights into both the genetic structure of Bantu-speaking groups and their recent admixture dynamics.

It is well known that a series of admixture events took place during the expansion, when Bantu-speaking newcomers met resident foraging groups, as shown for rainforest and southern African foragers.42 The degree of interaction and the impact these episodes of gene flow had in shaping the current distribution of genetic variation have been only marginally explored. Similarly, multiple waves, chronologically separated but occurring along similar paths, have also been proposed on the basis of archaeological data. In this context, the emerging picture of the dispersal of Bantu-speaking communities across Africa is highly complex, including overlapping diffusion waves and admixture with local populations along the main migration routes. It suggests that genomic variation among these communities extends well beyond the simplistic consensus view of their genetic homogeneity.

Gene flow has been reported between Bantu-speaking farmers and rainforest hunter-gatherers (also referred to as Pygmies), as well as East African Nilotic and Afro-Asiatic populations.43 In southern Africa, similar admixture events involving farmers and resident hunter-gatherers occurred; however, they did not result in homogeneous assimilation. Gene flow occurred differently in different regions, in line with the “frontier model” of interaction.44 In areas where the dispersal was rapid, facilitated by environmental conditions compatible with the agro-pastoralist package, limited gene flow between foragers and farmers occurred (for example, along the eastern part of southern Africa, in Mozambique and Malawi). By contrast, in regions with climatic and ecological boundaries, the dispersal process stalled for prolonged periods of time, a situation that facilitated cultural and genetic exchange, as in the central part of South Africa.45

The dynamics of admixture between incoming Bantu-speaking populations and resident foraging communities were probably characterized by sex-biased gene flow. Comparisons of the estimated proportions of Khoe-San Y chromosome and mtDNA haplotypes in Bantu-speaking groups consistently showed higher maternal than paternal Khoe-San contributions, in line with ethnographic and historical evidence that women were more commonly accepted into Bantu-speaking communities.46 Overall, these episodes of gene flow introduced genetic variants from resident populations and shaped, at least in part, the differences among Bantu speakers. Beyond admixture with neighboring groups, further variation was generated by the dispersal process itself, when it involved bottlenecks, founder events, and isolation. In fact, the main genetic components shared by all Bantu-speaking groups are not evenly distributed but show clines and differential distributions as a result of genetic drift.47

Besides drift and gene flow, local demographic histories and multiple dispersal events may also have contributed to the current pattern of relationships among Bantu-speaking groups. The Bantu-speaking pastoralist Himba and Herero of Namibia form a genetically related cluster with the Khoe-Kwadi-speaking Damara, but diverge from other southwestern Bantu-speaking groups in Namibia, such as the Owambo, Mbukushu, and Kwangali. Linguistically, all these groups speak closely related languages belonging to the so-called southwestern Bantu (the languages spoken by the Himba, Herero, and Owambo are all part of the R subgroup, while Mbukushu and Kwangali belong to the more distantly related K branch48), yet genetically they clearly cluster separately (Figure 6), even when admixture is controlled for. The differentiation between these two sets of populations may reflect cultural processes, such as a linguistic shift following different waves of Bantu-speaking groups; demographic events, such as a distinct and possibly early dispersal of Bantu-speaking populations; or recent severe isolation.

The analysis of mitochondrial lineages from diverse Angolan populations suggests that the latter hypothesis is the best supported, indicating that the Herero and Himba may have diverged from other Bantu-speaking groups within the last few hundred years, after a severe bottleneck. If this scenario is confirmed, it would imply a demographic event that sharply reduced diversity in groups that today number in the tens of thousands. However, further investigations of autosomal and ancient DNA are needed to confirm and clarify the demographic history of these populations.

Figure 6. The differing genetic ancestries composition of Bantu-speaking groups.

Note: This admixture plot illustrates the heterogeneity of Bantu-speaking populations. Each bar shows the ancestry composition of one individual (each color represents an ancestral component). For clarity, we show only Bantu-speaking populations, together with southern hunter-gatherer (Ju|’hoan and !Xun),49 western African (Yoruba from Nigeria),50 rainforest hunter-gatherer (Mbuti from the Democratic Republic of Congo),51 eastern African (Anuak and Amhara from Ethiopia), European (British), and South Asian (Gujarati Indians) groups.

Credit: Francesco Montinaro.

The Modern Era

Genetic investigation of several groups in southern Africa has provided evidence of genetic components related to extra-continental influences, possibly associated with the Age of Exploration and the subsequent colonial period. Some of the signatures associated with non-African sources have been explained by the introgression of western Eurasian ancestry and by the contribution of eastern African groups associated with the dispersal of pastoralism, but signatures of more recent contributions have also been highlighted.

The colonization of different areas of southern Africa from the 16th century onward, mostly by the Portuguese, Dutch, Germans, and English, together with the Indian Ocean slave trade, which brought enslaved people from South and Southeast Asia as well as Madagascar, generated a melting pot whose signatures can still be detected in present-day populations.

European-related ancestries have been reported in Namibia and South Africa, particularly, but not exclusively, in populations with a known history of admixture with European colonists, as for example the Basters in Namibia and several Colored groups in South Africa.

The Basters in particular have a unique history that traces back to the earliest colonists who settled at the Cape of Good Hope from 1652. This early population comprised Europeans together with people from Asia and Africa who had been imported as slaves. The descendants of this mixed group were the founders of the Colored people, some of whom moved in 1870 to the region that later became Namibia, where they identified themselves as the Baster nation (1872). Signatures of gene flow related to South and Southeast Asia have been found in the Colored communities of the Western Cape, whose ancestry is estimated at around 10 percent European, with West African and Khoe-San ancestries making up the remaining 90 percent in almost equal parts. However, substantial inter-individual variation in ancestry composition was reported among self-identified Colored people, underscoring the complex demographic and cultural processes that have shaped Colored identity.52 Colored communities from other areas show similar variability, as well as differences in average ancestry composition. Coloreds from Wellington have lower Khoe-San ancestry (14 percent)53 than Coloreds from Colesberg (33 percent),54 Cape Town (31 percent),55 or Upington (61 percent),56 and other ancestries also vary. Wellington displayed the largest European (28 percent) and Asian (South and Southeast, both at 17 percent) contributions among the sampled Colored communities, while West African ancestry was estimated at 22–26 percent except in Upington, where it was below 10 percent.57

Notably, the pattern of admixture in Colored populations was strongly sex-biased. Colored individuals from Cape Town show markedly reduced European ancestry at X chromosome markers, which are instead enriched for Indonesian ancestry.58 Y chromosome and mtDNA analyses of the same dataset confirmed sex-imbalanced contributions: predominantly male for Europeans, and predominantly female for Khoe-San and Indian sources.59
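The X chromosome result rests on a simple asymmetry: women carry two X chromosomes and men one, so female founders weigh more heavily in a population's X chromosomes than in its autosomes. The short Python sketch below illustrates this under stated assumptions: a single admixture pulse, equal numbers of founding women and men, and no further gene flow. The function name and the contribution values are hypothetical, and this is not the method used in the studies cited above.

```python
def expected_ancestry(s_female: float, s_male: float) -> tuple[float, float]:
    """Expected ancestry from one source population in an admixed group.

    s_female: hypothetical fraction of founding women from the source.
    s_male:   hypothetical fraction of founding men from the source.
    Assumes one admixture pulse, equal numbers of founding women and men,
    and no later gene flow (a simplified model, for illustration only).
    """
    # Autosomes are inherited equally through both sexes.
    autosomal = (s_female + s_male) / 2
    # With equal sex ratios, two of every three X chromosomes are carried by
    # women, so female founders count twice as much as male founders.
    x_chromosome = (2 * s_female + s_male) / 3
    return autosomal, x_chromosome


# Hypothetical example: a source that supplied 50 percent of founding men
# but only 10 percent of founding women (strongly male-biased gene flow).
auto, x = expected_ancestry(s_female=0.10, s_male=0.50)
print(f"autosomal: {auto:.2f}, X chromosome: {x:.2f}")
# prints "autosomal: 0.30, X chromosome: 0.23"
```

With a male-biased contribution, the expected X chromosome ancestry falls below the autosomal ancestry; swapping the two values reverses the pattern. This is the reasoning behind reading reduced European ancestry on the X chromosome as a sign of predominantly male European gene flow.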

The Basters have been reported to have the largest Khoe-San maternal and the largest European paternal contributions among the Colored communities (both over 90 percent). At the genome-wide level, average European ancestry in the Basters is estimated at almost 50 percent, while Khoe-San ancestry exceeds 28 percent. West African and Asian ancestries are estimated at less than 6 percent and more than 17 percent, respectively.60

One source of variation in the Colored communities that genetic investigations have not yet addressed is Madagascar and the Malagasy people. Historical records of immigration to South Africa are incomplete and possibly biased toward European settlers rather than slaves, which makes it difficult to estimate the demographic impact of the slave trade in general, and the contribution from Madagascar in particular.

Slaves were initially imported from Angola and Dahomey (West Africa) and later from Mozambique.61 Most slaves from the Indian Ocean region were traded by private slave traders,62 from outposts such as the Sunda Islands, the Moluccas, Ceylon, India, and Bengal.63

Records from 1672 to 1682 document five batches of slaves totaling 569 Malagasy,64 while between 1680 and 1731 half of the imported slaves came from Madagascar, and Indians and Indonesians made up a third of the total.65 The contribution from Madagascar was certainly substantial, although it has not been possible to estimate it from isotope analysis.66

As an indication, in 1753, 26 percent of slaves were from the east coast of Africa, 25 percent from Madagascar, 26 percent from South Asia, and 23 percent from the Indonesian archipelago.67 The records indicate that the number of resident Malagasy and Mozambican slaves at the Cape declined over the years, despite the importation of substantial numbers of people. Malagasy/Indonesian and South Asian ancestry in Khoe-San populations is suggested by the presence of the mitochondrial haplogroups M36 (found in Dravidian-speaking populations of India) and M37c3c (found in Austronesian-speaking peoples of Southeast Asia) in the Nama and Khomani people and, likewise, in the Cape Colored community.68 Genome-wide single nucleotide polymorphism (SNP) data have also revealed Indonesian ancestry in the Cape Colored,69 but whether this ancestry arrived through Malagasy ancestors or through direct contributions from Indonesian exiles has not been discussed.70 Overall, the role of slaves imported from Madagascar in shaping the variation present in the Colored communities is still unknown. The genetic profile of the Malagasy population is a combination of West African and Asian ancestries, which came together 1,000–2,000 years ago.71

Because these ancestries also came together more recently, when European colonists imported slaves directly from coastal Africa, India, and Southeast Asia, identifying and dissecting the specific Malagasy contribution is more complicated. Future work is expected to focus on characterizing these additional sources of variation in the populations of southern Africa.

Future Perspectives

The attention that geneticists have given to the populations of southern Africa has provided new insights into the demographic events that have shaped human genetic variation in the subcontinent. Collecting data from areas only partially investigated so far, and increasing the amount of genomic data available for these populations, are expected to reveal further details of these processes. One aspect of particular interest is the possibility that gene flow between Homo sapiens and other Homo populations occurred in Africa, as it did outside Africa with Neanderthals and Denisovans (the latter a previously unknown hominin group from the Altai Mountains in Siberia, represented so far by only a handful of remains but also by a full genome72). Alongside more samples from extant populations, a very promising development is the growing number of human remains from which DNA has been extracted and characterized.73

The molecular investigation of ancient human remains is a promising direction of research that is expected to improve our understanding of the history of southern Africa. Similar work in Europe has revolutionized the consensus picture of that continent's ancient history, and a comparable impact can be envisioned for Africa as more samples are analyzed. Current limitations in retrieving genetic material, due to DNA degradation, may be overcome by further technical breakthroughs, which will make it possible to push the boundaries of what is currently considered feasible.

While more extensive molecular analysis of ancient African samples is sure to come, an exhaustive survey of continental variation based on ancient samples may be difficult to achieve, given the lower density of human archaeological remains available for DNA analysis across the continent compared with Europe. An approach combining data from ancient and modern samples is therefore expected to be very beneficial for understanding the history of southern Africa and of the African continent as a whole. Whole genomes from Africa are still scarce, but international efforts are under way to correct this situation.74

As more genomic data accumulate and novel biostatistical approaches are developed, our ability to recover the fine details of the demographic and evolutionary processes that have shaped the variation of human populations is expected to improve, complementing and enriching the work of archaeologists, linguists, and anthropologists. Over the last few years, DNA analysis has helped sketch the ancient and recent history behind the variation of extant populations in southern Africa. Nevertheless, uncertainties remain. For example, it is not clear whether humans admixed with other Homo populations, and the exact geographical and temporal extent of population structure among early modern humans in Africa is unknown. Furthermore, our picture of the regional pattern of admixture from the rest of the continent is far from complete, with only a fraction of regions and historical periods investigated.

Acknowledgments

The authors would like to thank the funding bodies which supported their work on African populations and the many people who contributed to the realization of these projects over the years. In particular, they would like to thank St Hugh’s College and the Leverhulme Trust (“The Genetic Landscape of Southern Africa Human Populations”; RPG-2013-298) for support. Further support has been received from the Boise Trust Fund and the John Fell Fund in Oxford, and the Wenner-Gren Foundation. They would also like to acknowledge the contributions of Miguel Gonzales-Santos, who provided relevant references for the Bantu dispersal section; Ryan Daniels, who gave suggestions about the contribution of Malagasy to South African genetic variation; and Alessandro Corlianò for help provided in the editing of the figures. Cristian Capelli would like to thank all the students, post-docs, and collaborators who, over the last ten years, have contributed to research into the genetic variation of southern African populations, and particularly the people who contributed to the realization of these projects by donating their DNA. The support of the Ministries of Health of Namibia, Lesotho, and South Africa, and the Lesotho Ministries of Local Government, Tourism, and Environment is also acknowledged.

Further Reading

  • Busby, George B. J., et al. “Admixture into and within Sub-Saharan Africa.” eLife 5 (2016): n.p.
  • The Encyclopedia of Global Human Migration. Wiley Online Library, 2013.
  • Gomez, Felicia, Jibril Hirbo, and Sarah A. Tishkoff. “Genetic Variation and Adaptation in Africa: Implications for Human Evolution and Disease.” Cold Spring Harbor Perspectives in Biology 6, no. 7 (2014): a008524.
  • Gurdasani, Deepti, et al. “The African Genome Variation Project Shapes Medical Genetics in Africa.” Nature 517, no. 7534 (2015): 327–332.
  • Barnard, Alan. Hunters and Herders of Southern Africa: A Comparative Ethnography of the Khoisan Peoples. Cambridge, UK: Cambridge University Press, 1992.
  • Patin, Etienne, et al. “Dispersals and Genetic Adaptation of Bantu-Speaking Populations in Africa and North America.” Science 356, no. 6337 (2017): 543–546.
  • Pickrell, Joseph K., Nick Patterson, Chiara Barbieri, et al. “The Genetic Prehistory of Southern Africa.” Nature Communications 3 (2012): 1143.
  • Pickrell, Joseph K., Nick Patterson, Po-Ru Loh, et al. “Ancient West Eurasian Ancestry in Southern and Eastern Africa.” Proceedings of the National Academy of Sciences of the United States of America 111, no. 7 (2014): 2632–2637.
  • Schlebusch, Carina M., et al. “Southern African Ancient Genomes Estimate Modern Human Divergence to 350,000 to 260,000 Years Ago.” Science 358, no. 6363 (2017): 652–655.
  • Skoglund, Pontus, et al. “Reconstructing Prehistoric African Population Structure.” Cell 171, no. 1 (2017): 59–71.e21.
  • Vansina, J. “New Linguistic Evidence and ‘The Bantu Expansion.’” Journal of African History 36, no. 2 (1995): n.p.

Notes