The spatiotemporal patterns of major human admixture events during the European Holocene

Manjusha Chintalapati (1)
Nick Patterson (2, 3)
Priya Moorjani (1, 4)

1. Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, United States
2. Broad Institute of Harvard and MIT, Cambridge, United States
3. Human Evolutionary Biology, Harvard University, Boston, United States
4. Center for Computational Biology, University of California, Berkeley, Berkeley, United States

2022-05-30

Research Article

Evolutionary Biology
Genetics and Genomics

genomic clocks
ancient DNA
admixture
molecular clock
European Holocene
Neolithic
Human

Publisher ID: 77625
DOI: 10.7554/eLife.77625
eLocation ID: e77625

Abstract

Recent studies have shown that admixture has been pervasive throughout human history. While several methods exist for dating admixture in contemporary populations, they are not suitable for sparse, low coverage ancient genomic data. Thus, we developed DATES (Distribution of Ancestry Tracts of Evolutionary Signals), a method that leverages ancestry covariance patterns across the genome of a single individual to infer the timing of admixture. DATES provides reliable estimates under various demographic scenarios and outperforms available methods for ancient DNA applications. Using DATES on ~1100 ancient genomes from sixteen regions in Europe and west Asia, we reconstruct the chronology of the formation of the ancestral populations and the fine-scale details of the spread of Neolithic farming and Steppe pastoralist-related ancestry across Europe. By studying the genetic formation of Anatolian farmers, we infer that gene flow related to Iranian Neolithic farmers occurred before 9600 BCE, predating the advent of agriculture in Anatolia. Contrary to the archaeological evidence, we estimate that early Steppe pastoralist groups (Yamnaya and Afanasievo) were genetically formed more than a millennium before the start of Steppe pastoralism. Our analyses provide new insights on the origins and spread of farming and Indo-European languages, highlighting the power of genomic dating methods to elucidate the legacy of human migrations.

Introduction

Recent studies have shown that population mixture (or ‘admixture’) is pervasive throughout human history, including mixture between the ancestors of modern humans and archaic hominins (i.e., Neanderthals and Denisovans), as well as in the history of many contemporary human groups such as African Americans, South Asians, and Europeans (Pickrell and Reich, 2014). Understanding the timing and signatures of admixture offers insights into the historical context in which the mixture occurred and enables the characterization of the evolutionary and functional impact of the gene flow. Many admixed groups are formed due to population movements involving ancient migrations that predate historical records. The recent availability of genomic data for a large number of present-day and ancient genomes provides an unprecedented opportunity to reconstruct population events using genetic data, providing evidence complementary to linguistics and archaeology.

To characterize patterns of admixture, genetic methods use the insight that the genome of an admixed individual is a mosaic of chromosomal segments inherited from distinct ancestral populations (Chakraborty and Weiss, 1988). Due to recombination, these ancestral segments get shuffled in each generation and become smaller and smaller over time. The length of the segments is inversely proportional to the time elapsed since the mixture (Chakraborty and Weiss, 1988; Moorjani et al., 2011). Several genetic approaches – ROLLOFF (Moorjani et al., 2011; Patterson et al., 2012), ALDER (Loh et al., 2013), Globetrotter (Hellenthal et al., 2014), and Tracts (Gravel, 2012) – have been developed that use this insight by characterizing patterns of admixture linkage disequilibrium (LD) or haplotype lengths across the genome to infer the timing of mixture. Haplotype-based methods perform chromosome painting or local ancestry inference at each locus in the genome and characterize the distribution of ancestry tract lengths to estimate the time of mixture (Gravel, 2012; Hellenthal et al., 2014). This requires accurate phasing and inference of local ancestry, which is often difficult when the admixture events are old (as ancestry blocks become smaller over time) or when reference data from ancestral populations is unavailable. Admixture LD-based methods, on the other hand, measure the extent of the allelic correlation across markers to infer the time of admixture (Loh et al., 2013; Moorjani et al., 2011). They do not require phased data from the target or reference populations and work reliably for dating older admixture events (>100 generations). However, they tend to be less efficient in characterizing admixture events between closely related ancestral groups.
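As a rough worked illustration of this inverse relationship, the sketch below assumes the common approximation that recombination breakpoints accumulate at about one per Morgan per generation since admixture, so the expected spacing between breakpoints is about 1/t Morgans. The function name is hypothetical, and the approximation ignores details such as the merging of adjacent segments that share the same ancestry.

```python
def expected_breakpoint_spacing_cm(generations_since_admixture):
    """Approximate mean spacing (in centimorgans) between ancestry breakpoints.

    Assumption (illustrative, not taken from the Methods): breakpoints
    accumulate at roughly one per Morgan per generation, so the expected
    spacing after t generations is 1/t Morgans, i.e. 100/t cM.
    """
    if generations_since_admixture <= 0:
        raise ValueError("generations_since_admixture must be positive")
    return 100.0 / generations_since_admixture


# Older events leave shorter segments.
for t in (10, 100, 200):
    print(t, "generations:", expected_breakpoint_spacing_cm(t), "cM")
```

Under this approximation, segments from an event 200 generations old are about twenty times shorter than those from an event 10 generations old, which is why local ancestry inference becomes harder for older events.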

While highly accurate for dating admixture events using data from present-day samples, current methods do not work reliably for dating admixture events using ancient genomes. Ancient DNA samples often have high rates of DNA degradation, contamination (from human and other sources), and low sequencing depth, leading to a large proportion of missing variants and uneven coverage across the genome (Orlando et al., 2021). Additionally, most studies generate pseudo-haploid genotype calls – consisting of a haploid genotype determined by randomly selecting one allele at the variant site – which discard heterozygosity information and can complicate inference. In such sparse datasets, estimating admixture LD can be noisy and biased (see Simulations below). Moreover, haplotype-based methods require phased data from both admixed and reference populations, which remains challenging for ancient DNA specimens (Gravel, 2012; Hellenthal et al., 2014).
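To make the notion of a pseudo-haploid call concrete, here is a minimal sketch that draws one allele at random from a diploid genotype and also masks sites at random to mimic missing data. The function names, the 0/1/2 genotype coding, and the toy genotypes are assumptions for illustration; real ancient DNA pipelines typically sample a sequencing read rather than a called genotype.

```python
import random


def pseudo_haploid_call(diploid_genotype, rng):
    """Return one randomly chosen allele (0 or 1) from a diploid genotype.

    diploid_genotype is the count of alternate alleles (0, 1 or 2), or None
    when the site is missing. Missing sites stay missing.
    """
    if diploid_genotype is None:
        return None
    if diploid_genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, 2 or None")
    if diploid_genotype == 1:
        return rng.choice((0, 1))  # heterozygote: either allele, equal chance
    return diploid_genotype // 2   # homozygote: 0 -> 0, 2 -> 1


def mask_genotypes(genotypes, missing_rate, rng):
    """Set each genotype to None independently with probability missing_rate."""
    return [None if rng.random() < missing_rate else g for g in genotypes]


rng = random.Random(1)
diploid = [0, 1, 2, 1, 0, 2, 1, 1]  # toy genotypes, not real data
sparse = mask_genotypes(diploid, missing_rate=0.3, rng=rng)
print([pseudo_haploid_call(g, rng) for g in sparse])
```

Heterozygous sites are the only ones affected by the random draw, so two pseudo-haploid versions of the same genome can differ at exactly those sites.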

An extension of admixture LD-based methods, recently introduced by Moorjani et al. (2016), leverages ancestry covariance patterns that can be measured in a single sample using low coverage data. This approach measures the allelic correlation across neighboring sites, but instead of measuring admixture LD across multiple samples, it integrates data across markers within a single diploid genome. Using a set of ascertained markers that are informative for Neanderthal ancestry (where sub-Saharan Africans are fixed for the ancestral alleles and Neanderthals have a derived allele), Moorjani et al. (2016) inferred the timing of Neanderthal gene flow in Upper Paleolithic Eurasian samples and showed the approach works accurately in ancient DNA samples. However, this approach is inapplicable for dating admixture events within modern human populations, as there are very few fixed differences across populations (Auton et al., 2015).

Motivated by the single sample statistic in Moorjani et al. (2016), we developed DATES (Distribution of Ancestry Tracts of Evolutionary Signals), which measures the ancestry covariance across the genome in a single admixed individual, weighted by the allele frequency difference between two ancestral populations. This method was first introduced in Narasimhan et al. (2019), where it was used to infer the date of gene flow between groups related to Ancient Ancestral South Indians, Iranian farmers, and Steppe pastoralists in ancient South and Central Asian populations. In this study, we evaluate the performance of DATES by carrying out extensive simulations for a range of demographic scenarios and comparing the approach to other published genomic dating methods. We then apply DATES to infer the chronology of the genetic formation of the ancestral populations of Europeans and the spatiotemporal patterns of admixture during the European Holocene using data from ~1100 ancient DNA specimens spanning ~8000–350 BCE.

Results

Overview of DATES: model and simulations

DATES estimates the time of admixture by measuring the weighted ancestry covariance across the genome using data from a single diploid genome and two reference populations (representing the ancestral source populations). Like haplotype-based methods, DATES is applicable to a single genome, whereas admixture LD-based methods by definition require multiple genomes to be co-analyzed; unlike haplotype-based methods, however, it does not require local ancestry inference, which makes it more flexible. There are three main steps in DATES. First, we learn the genome-wide ancestry proportions by performing a simple regression analysis to model the observed genotypes in an admixed individual as a linear mix of allele frequencies from two reference populations. For each marker, we then compute the likelihood of the observed genotype in the admixed individual using the estimated ancestry proportions and allele frequencies in each reference population (this is similar in spirit to local ancestry inference). This information is, in turn, used to compute the joint likelihood of shared ancestry at two neighboring markers, accounting for the probability of recombination between the two markers. Finally, we compute the covariance across pairs of markers located at a particular genetic distance, weighted by the allele frequency differences in the reference populations (Materials and methods).
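The first step can be sketched as a one-parameter least-squares fit. This is a minimal illustration, assuming the model g_i/2 ≈ α·p1_i + (1 − α)·p2_i at each marker i, where g_i is the alternate-allele count in the admixed individual and p1_i, p2_i are the reference allele frequencies. The function and variable names are hypothetical, and the regression actually used in DATES may differ in detail.

```python
def estimate_ancestry_proportion(genotypes, freq_pop1, freq_pop2):
    """Least-squares estimate of the pop1 ancestry proportion (alpha).

    genotypes: alternate-allele counts (0, 1, 2) or None for missing sites.
    freq_pop1, freq_pop2: alternate-allele frequencies in the two references.
    Rearranging the assumed model gives (g/2 - p2) = alpha * (p1 - p2),
    a regression through the origin with slope alpha.
    """
    numerator = 0.0
    denominator = 0.0
    for g, p1, p2 in zip(genotypes, freq_pop1, freq_pop2):
        if g is None:
            continue  # skip missing sites
        diff = p1 - p2
        numerator += (g / 2.0 - p2) * diff
        denominator += diff * diff
    if denominator == 0.0:
        raise ValueError("reference frequencies are identical at all usable markers")
    return numerator / denominator


# Toy example with made-up frequencies and genotypes.
p1 = [0.9, 0.2, 0.7, 0.1]
p2 = [0.1, 0.8, 0.3, 0.6]
g = [0, 2, None, 1]
print(estimate_ancestry_proportion(g, p1, p2))
```

Markers where the two reference populations have similar frequencies contribute little to the fit, which is one intuition for why highly differentiated markers are the most informative.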

Following Moorjani et al. (2016), we bin the markers that occur at a similar genetic distance across the genome, rather than estimating admixture LD for each pair of markers, and compute the covariance across increasing genetic distance between markers. The estimated covariance is expected to decay exponentially with genetic distance, and the rate of decay is informative of the time of the mixture (Chakraborty and Weiss, 1988; Moorjani et al., 2011). Assuming the gene flow occurred instantaneously, we can then infer the average date of gene flow by fitting an exponential distribution to the decay pattern (Materials and methods). In cases where data for multiple individuals is available, we compute the likelihood by summing over all individuals. To make DATES computationally tractable, we implement the fast Fourier transform (FFT) for calculating ancestry covariance as described in ALDER (Loh et al., 2013). This provides a speedup from $O(n^{2})$ to $O(n \log n)$ that reduces the typical runtimes from hours to seconds with minimal loss in accuracy (Appendix 1—figure 2).
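The link between the decay rate and the admixture date can be illustrated with a small fit. The sketch below assumes the binned covariance follows A·exp(−t·d), with d in Morgans and t in generations, and recovers t by ordinary least squares on the log of the covariance. This log-linear fit is a simplification chosen for illustration: it is not necessarily the fitting procedure used in DATES, it ignores any constant offset, and the function name and synthetic curve are hypothetical.

```python
import math


def fit_admixture_time(distances_morgans, covariances):
    """Estimate t (generations) from a decay curve of the form A * exp(-t * d).

    Fits a straight line to log(covariance) against distance and returns the
    negated slope. Bins with non-positive covariance are skipped because
    their logarithm is undefined.
    """
    xs, ys = [], []
    for d, c in zip(distances_morgans, covariances):
        if c > 0:
            xs.append(d)
            ys.append(math.log(c))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two bins with positive covariance")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic, noise-free curve for an event 50 generations ago.
true_t = 50.0
distances = [0.001 * k for k in range(1, 21)]  # 0.1 cM bins, in Morgans
curve = [0.05 * math.exp(-true_t * d) for d in distances]
print(round(fit_admixture_time(distances, curve), 6))
```

With real, noisy data the log transform would overweight the tail of the curve, so a nonlinear least-squares fit on the untransformed covariance is usually the safer choice.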

To assess the reliability of DATES, we performed simulations where we constructed 10 admixed diploid genomes by randomly sampling haplotypes from two source populations (Materials and methods). Briefly, we simulated individual genomes with 20% European and 80% African ancestry by using phased haplotypes of northern Europeans (Utah European Americans, CEU) and west Africans (Yoruba from Nigeria, YRI) from the 1000 Genomes Project, respectively (Auton et al., 2015). As reference populations in DATES, we used closely related surrogate populations of French and Yoruba respectively, from the Human Genome Diversity Panel (HGDP) (Li et al., 2008). We first investigated the accuracy of DATES by varying the time of admixture between 10 and 300 generations. For comparison, we also applied ALDER (Loh et al., 2013) to these simulations. Both methods reliably recovered the time of admixture up to 200 generations or ~5600 years ago, assuming a generation time of 28 years (Moorjani et al., 2016), though DATES was more precise than ALDER for older admixture events (>100 generations) (Figure 1—figure supplement 1, Appendix 1—table 4). Further, DATES shows accurate results even for single target samples (Figure 1A, Figure 1—figure supplement 2A) and even when few reference individuals are available for dating (Figure 1A, Figure 1—figure supplement 2B). However, the use of large numbers of reference samples, if available, can improve the inference. In DATES, allele frequencies of the reference populations are used for computing the likelihood as well as the weighted pairwise ancestry covariance across the genome (Materials and methods). With large samples, allele frequencies of the reference populations are more reliably computed, which in turn, can improve the precision of inferred dates (Figure 1A, Figure 1—figure supplement 2B).
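The conversion between generations and years used above is simple arithmetic, sketched here for convenience. The function name is hypothetical; the default of 28 years per generation is the value quoted in the text.

```python
def generations_to_years(generations, years_per_generation=28.0):
    """Convert a time in generations to years before the sample lived."""
    if generations < 0 or years_per_generation <= 0:
        raise ValueError("generations must be >= 0 and generation time > 0")
    return generations * years_per_generation


print(generations_to_years(200))  # the 200-generation limit discussed above
```

Note that the result is measured back from the time the sampled individual lived, so for an ancient genome the age of the specimen has to be added to obtain a calendar date.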

Figure 1.

#' @width 10
#' @height 5
library(ggplot2)
library(maps)
library(ggrepel)

library(rnaturalearth)
library(rnaturalearthdata)
library(rgeos)

par(mfrow=c(1,2))

# Panel A
data=read.table(file = "data/Figure1_data_panelA", header = T)
dd=data[data$n==1,]
plot(x=dd$admixture_time,y=dd$d_mean, col="darkorange2",pch=16,xlab = "True admixture time (gen)",
     ylab = "Estimated admixture time (gen)", ylim = c(0,max(dd$d_mean+dd$d_se)),
     main="(A) Variation in sample size",cex.axis=1.2,cex.lab=1.2); grid ();
segments(x0 =dd$admixture_time,x1 = dd$admixture_time,y0 =(dd$d_mean-dd$d_se),y1 = (dd$d_mean+dd$d_se),col="darkorange2");
segments(x0 =dd$admixture_time-2,x1 = dd$admixture_time+2,y0 =(dd$d_mean+dd$d_se),y1 = (dd$d_mean+dd$d_se),col="darkorange2");
segments(x0 =dd$admixture_time-2,x1 = dd$admixture_time+2,y0 =(dd$d_mean-dd$d_se),y1 = (dd$d_mean-dd$d_se),col="darkorange2");

dd=data[data$n==10,]
# points() adds to the existing plot, so axis labels, limits and title are not repeated here
points(x=dd$admixture_time,y=dd$d_mean, col="green3",pch=16); grid ();
segments(x0 =dd$admixture_time,x1 = dd$admixture_time,y0 =(dd$d_mean-dd$d_se),y1 = (dd$d_mean+dd$d_se),col="green3");
segments(x0 =dd$admixture_time-2,x1 = dd$admixture_time+2,y0 =(dd$d_mean+dd$d_se),y1 = (dd$d_mean+dd$d_se),col="green3");
segments(x0 =dd$admixture_time-2,x1 = dd$admixture_time+2,y0 =(dd$d_mean-dd$d_se),y1 = (dd$d_mean-dd$d_se),col="green3");
lines(x=dd$admixture_time,y=dd$admixture_time,col="darkgrey",lty=2)
# The second series is filtered on n==10 above, so the legend is labelled n=10
legend("topleft",legend = c("n=1","n=10"),col=c("darkorange2","green3"),pch = 16,bg = "white");

#panel B
data=read.table(file = "data/Figure1_data_panelB")
kol=c("tomato","darkviolet","forestgreen")
len1=c("missing proportion=10%","missing proportion=30%","missing proportion=60%")
len=c(10,30,60)
c=1;
dd=data[data$V2==10,]
plot(x=dd$V1,y=dd$V3, col="tomato",pch=15,xlab = "True admixture time (gen)",ylab = "Estimated admixture time (gen)",
     ylim = c(min(dd$V3-dd$V4),max(dd$V3+dd$V4)),
     main="(B) Effect of data quality",cex.axis=1.2,cex.lab=1.2); grid ();
lines(x=dd$V1,y=dd$V5, col=dd$V8,pch=6,type="p")
segments(x0 =dd$V1,x1 = dd$V1,y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col="tomato");
segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3+dd$V4),col="tomato");
segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3-dd$V4),y1 = (dd$V3-dd$V4),col="tomato");
c=2;
dd=data[data$V2==30,]
# points() adds to the existing plot; x values are offset by 2 to avoid overplotting
points(x=dd$V1+2,y=dd$V3, col="darkviolet",pch=16); grid ();
segments(x0 =dd$V1+2,x1 = dd$V1+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col="darkviolet");
segments(x0 =dd$V1+2-2,x1 = dd$V1+2+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3+dd$V4),col="darkviolet");
segments(x0 =dd$V1+2-2,x1 = dd$V1+2+2,y0 =(dd$V3-dd$V4),y1 = (dd$V3-dd$V4),col="darkviolet");
dd=data[data$V2==60,]
c=3
points(x=dd$V1+4,y=dd$V3, col="forestgreen",pch=17); grid ();
segments(x0 =dd$V1+4,x1 = dd$V1+4,y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col="forestgreen");
segments(x0 =dd$V1+4-2,x1 = dd$V1+4+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3+dd$V4),col="forestgreen");
segments(x0 =dd$V1+4-2,x1 = dd$V1+4+2,y0 =(dd$V3-dd$V4),y1 = (dd$V3-dd$V4),col="forestgreen");
lines(x=dd$V1,y=dd$V1,col="darkgrey",lty=2)
legend("topleft",legend = len1,col=kol,pch = c(15,16,17),bg = "white");

Simulation results.

We constructed n admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry using ~380,000 genome-wide SNPs for admixture dates ranging between 10 and 200 generations. To minimize any issues with overfitting, we used French and Yoruba from the Human Genome Diversity Panel as reference populations in DATES (Distribution of Ancestry Tracts of Evolutionary Signals). We show the true time of admixture (X-axis, in generations) and the estimated time of admixture (±1 SE) (Y-axis, in generations). Standard errors were calculated using a weighted block jackknife approach by removing one chromosome in each run (Materials and methods). (A) Effect of sample size: We varied the sample size (n) of the target group between 1 and 10 individuals. (B) Effect of data quality: To mimic the features of ancient genomes, we generated n=10 target individuals with pseudo-haploid genotypes and missing genotype rates of 10% (orange), 30% (purple), and 60% (green). See Figure 1—figure supplements 1–9 for additional simulations to test the performance of DATES.
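The standard errors described in this legend can be sketched as a leave-one-chromosome-out jackknife. The block below assumes the usual weighted block jackknife estimator, with each chromosome weighted by, for example, its number of markers. The function name and the toy numbers are hypothetical; the inputs would be the date estimated from all chromosomes plus one re-estimate per left-out chromosome.

```python
import math


def weighted_block_jackknife(full_estimate, leave_one_out, block_sizes):
    """Return (jackknife_estimate, standard_error).

    full_estimate: estimate from all blocks (here, all chromosomes).
    leave_one_out: estimates obtained with block j removed.
    block_sizes:   weight of each block, e.g. its number of markers.
    """
    g = len(leave_one_out)
    if g < 2 or g != len(block_sizes):
        raise ValueError("need matching lists with at least two blocks")
    total = float(sum(block_sizes))
    # Bias-corrected estimate: each leave-one-out value is weighted by the
    # fraction of the data that remains when its block is removed.
    jack = g * full_estimate - sum(
        (1.0 - m / total) * est for est, m in zip(leave_one_out, block_sizes)
    )
    variance = 0.0
    for est, m in zip(leave_one_out, block_sizes):
        h = total / m
        pseudo = h * full_estimate - (h - 1.0) * est  # pseudo-value for block j
        variance += (pseudo - jack) ** 2 / (h - 1.0)
    return jack, math.sqrt(variance / g)


# Toy example: three "chromosomes" of unequal size (made-up values).
estimate, se = weighted_block_jackknife(100.0, [97.0, 101.0, 103.0], [500, 300, 200])
print(estimate, se)
```

When all blocks have the same size, this reduces to the standard delete-one jackknife, which is a quick way to sanity-check an implementation.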

Figure 1—figure supplement 1.

#' @width 8
#' @height 6
data=read.table(file = "data/Figure1_Supplement1")
plot(x=data$V1,y = data$V3,col="deepskyblue",pch=15,
     xlab = "True admixture time (generations)",ylab = "Estimated admixture time (generations)",
     main = "Varying time of admixture up to 300 generations",xlim=c(0,300), ylim = c(0,max(data$V3[1:30]))); grid (nx=5,ny=4)
segments(x0 =data$V1,x1 = data$V1,y0 =(data$V3+data$V6),
         y1 = (data$V3-data$V6),col="deepskyblue")
segments(x0 =data$V1-2,x1 = data$V1+2,y0 =(data$V3+data$V6),
         y1 = (data$V3+data$V6),col="deepskyblue")
segments(x0 =data$V1-2,x1 = data$V1+2,y0 =(data$V3-data$V6),
         y1 = (data$V3-data$V6),col="deepskyblue")
lines(x=data$V1,y=data$V1,col="darkgrey",lty=2)

Varying time of admixture up to 300 generations.

We simulated data for 10 admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry and varied the time of admixture between 10 and 300 generations. The X-axis shows the true time of admixture, and the Y-axis shows the estimated time of admixture (±1 SE) inferred using DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

Figure 1—figure supplement 2.

#' @width 10
#' @height 6
par(mfrow=c(2,4))
data=read.table(file = "data/Figure1_Supplement2A")
kol=c("darkorange2","green3","violetred1","royalblue1","coral","yellow3","cyan2","darkgoldenrod1","darkorchid1","dodgerblue1",
      "paleturquoise4");
len=c("n=1","n=5","n=10","n=20")
arr=c(1,5,10,20)
c=1;
for(i in arr)
{
  k=which(arr==i);
  dd=data[data$V2==i,]
  plot(x=dd$V1,y=dd$V3, col=kol[c],pch=16,xlab = "True admixture time (gen)",
       ylab = "Estimated admixture time (gen)", ylim = c(0,max(dd$V3+dd$V4))); grid ();
  segments(x0 =dd$V1,x1 = dd$V1,y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col=kol[c]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3+dd$V4),col=kol[c]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3-dd$V4),y1 = (dd$V3-dd$V4),col=kol[c]);
  lines(x=dd$V1,y=dd$V1,col="darkgrey",lty=2)
  legend("topleft",legend = len[k],col=kol[c],pch = 16,bg = "white");
  c=c+1
}
# Panel B - reference sample size
data=read.table(file = "data/Figure1_Supplement2B")
kol=c("darkorange","green3","cyan2","violetred1","royalblue1")
len1=c("reference pop size=1","reference pop size=5","reference pop size=10","reference pop size=20")
len=c(1,5,10,20)
#par(mfrow=c(3,2))
c=1;
for(i in len)
{
  #len=1
  k=which(i==len);
  dd=data[data$V1==i,]
  plot(x=dd$V2,y=dd$V3, col=kol[c],pch=17,xlab = "True admixture time (gen)",
       ylab = "Estimated admixture time (gen)", ylim = c(min(dd$V3-dd$V4),max(dd$V3+dd$V4))); grid ();
  segments(x0 =dd$V2,x1 = dd$V2,y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col=kol[c]);
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3+dd$V4),col=kol[c]);
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V3-dd$V4),y1 = (dd$V3-dd$V4),col=kol[c]);
  lines(x=dd$V2,y=dd$V2,col="darkgrey",lty=2)
  legend("topleft",legend = len1[k],col=kol[c],pch = 17,bg = "white");
  c=c+1;
}
title("(A) Effect of sample size of target population", line = -1.5, outer = TRUE,cex=1.3)
title("(B) Effect of sample size of reference populations", line = -25, outer = TRUE,cex=1.3)

Impact of sample size of the target (admixed) and reference populations.

We simulated $n$ admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry and applied DATES with m reference samples of French and Yoruba ancestry. (A) Effect of sample size of target population. Each panel shows the results of simulations with n target individuals shown in the legend and m=28 French and m=21 Yoruba reference samples from each source group. (B) Effect of sample size (m) of reference populations. Each panel shows the results of simulations with n=10 target individuals and m reference samples from each source group shown in the legend. The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 3.

#' @width 10
#' @height 6
par(mfrow=c(2,4))
data=read.table(file = "data/Figure1_Supplement3A")
len=c("α=0.01","α=0.05","α=0.2","α=0.4")
arr=c(0.01,0.05,2,4)
kol=c("green","darkorchid","yellowgreen","blue","deeppink","red2","purple")
#print (arr)
c=1
for(i in arr)
{
  k=which(arr==i);
  dd=data[data$V2==i,]
  plot(x=dd$V1,y=dd$V3, col=kol[c],pch=18,xlab = "True admixture time (gen)",ylab = "Estimated admixture time (gen)", 
       ylim = c(0,max(dd$V3+dd$V4)),cex=1.5); grid ();
  #main="Admixture time for varying θ"
  segments(x0 =dd$V1,x1 = dd$V1,y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col=kol[c]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3+dd$V4),col=kol[c]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3-dd$V4),y1 = (dd$V3-dd$V4),col=kol[c]);
  lines(x=dd$V1,y=dd$V1,col="darkgrey",lty=2)
  legend("topleft",legend = len[k],col=kol[c],pch = 18,bg = "white");
  c=c+1;  
}
title("(A) Impact of admixture proportion on the estimated time of admixture", line = -1.5, outer = TRUE,cex=1.3)
# panel B- proportions of admixture
data=read.table(file = "data/Figure1_Supplement3B")
# Labels must match arr (0.01, 0.05, 2, 4; the last two encode 0.2 and 0.4), otherwise legends are shifted
len=c("α=0.01","α=0.05","α=0.2","α=0.4")
c=1;
for(i in arr)
{
  #i=5
  if(i<1) {rr=i} else {rr=i/10}
  k=which(arr==i);
  dd=data[data$V2==i,]
  lim=max(dd$V6)+rr/4
  plot(x=dd$V1,y=dd$V6, col=kol[c],pch=19,xlab = "True admixture time (gen)",
       ylab = "Estimated ancestry proportion (α)",las=1,ylim = c(0,lim)); grid ();
  #main="Admixture prorportion inference"
  segments(x0 =dd$V1,x1 = dd$V1,y0 =(dd$V6+dd$V10),y1 = (dd$V6-dd$V10),col=kol[c]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V6+dd$V10),y1 = (dd$V6+dd$V10),col=kol[c]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V6-dd$V10),y1 = (dd$V6-dd$V10),col=kol[c]);
  lines(x=dd$V1,y=rep(rr,NROW(dd)),col="red",lty=2)
  legend("bottomright",legend = len[k],col=kol[c],pch = 19,bg = "white");
  c=c+1;
}
title("(B) Impact of admixture proportion on estimated ancestry proportion", line = -25, outer = TRUE,cex=1.3)

Impact of admixture proportion.

We simulated data for 10 admixed individuals with European (CEU) ancestry (α) in the range of 1–40% (the rest derived from Africans). We ran DATES (Distribution of Ancestry Tracts of Evolutionary Signals) to infer the time of admixture and ancestry proportion. (A) Impact on the estimated time of admixture: Each panel shows the estimated date of admixture for a different value of $α$ shown in the legend. The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis. (B) Impact on estimated ancestry proportion: Each panel shows the estimated proportion of admixture for a different value of $α$ shown in the legend. The red dashed horizontal line further indicates the value of $α$ used. The true time of admixture is shown on X-axis with the inferred proportion of admixture on Y-axis.

Figure 1—figure supplement 4.

#' @width 9
#' @height 9
par(mfrow=c(2,2))
data=read.table(file = "data/Figure1_Supplement4")
kol=c("darkorange","green3","cyan","violetred1")
len1=c("Fst(Yoruba-Yoruba)=0.000","Fst(Yoruba-BantuKenya)=0.009",
       "Fst(Yoruba-San)=0.103")
arr=c("Yoruba","BantuKenya","San")
c=1;
for(i in arr)
{
  dd=data[data$V2==i,]
  plot(x=dd$V3,y=dd$V4, col=kol[c],pch=15,xlab = "True admixture time (gen)",ylab = "Estimated admixture time (gen)",
       ylim = c(min(dd$V4-dd$V5),220)); grid ();
  segments(x0 =dd$V3,x1 = dd$V3,y0 =(dd$V4+dd$V5),y1 = (dd$V4-dd$V5),col=kol[c]);
  segments(x0 =dd$V3-2,x1 = dd$V3+2,y0 =(dd$V4+dd$V5),y1 = (dd$V4+dd$V5),col=kol[c]);
  segments(x0 =dd$V3-2,x1 = dd$V3+2,y0 =(dd$V4-dd$V5),y1 = (dd$V4-dd$V5),col=kol[c]);
  lines(x=dd$V3,y=dd$V3,col="darkgrey",lty=2)
  legend("topleft",legend = len1[c],col=kol[c],pch = 15,bg = "white");
  c=c+1;
}
title("Impact of divergence between the ancestral population and reference populations used in DATES", line = -1.5, outer = TRUE,cex=1.3)

Impact of divergence between the ancestral population and reference populations used in DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

We simulated 10 admixed individuals with 80% European (CEU) and 20% African (YRI) ancestry. We applied DATES to infer the timing of admixture using reference populations. In each panel, we show the estimated dates of admixture using French and a group that is increasingly divergent from Yoruba (shown in the legend as the $F_{ST}$ with Yoruba). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 5.

#' @width 12
#' @height 6
par(mfrow=c(2,4))
data=read.table(file = "data/Figure1_Supplement5A")
len1=c("CEU/YRI", "CEU/CHB","CEU/MXL","CEU/TSI")
len2=c("French/Yoruba(0.154)", "French/Tujia(0.110)",
       "French/Maya(0.037)","French/Italian(0.004)")
len=c("YRI","CHB","MXL","TSI")
for(i in len)
{
  # i="ITU"
  dd=data[data$V1==i,]
  k=which(i==len);
  plot(x=dd$V2,y=dd$V3, col="navy",pch=19,xlab = "True admixture time (gen)",
       ylab = "Estimated admixture time (gen)",ylim = c(0,250),
       main=paste("True source populations:",len1[k],sep=" ")); grid ();
  segments(x0 =dd$V2 ,x1 = dd$V2, y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col="navy");
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V3+dd$V4),y1 =(dd$V3+dd$V4),col="navy");
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V3-dd$V4),y1 =(dd$V3-dd$V4),col="navy");
  lines(x=dd$V2,y=dd$V2,col="darkgrey")
  kol=c("white",dd$V7[1])
  legend("topleft",legend = paste(" References",len2[k],sep="="),
         col="navy",pch = c(19),bg = "white");
}
title("(A) Impact of divergence between the two source populations (n=10)", line = -1, outer = TRUE,cex=1.3)
# panel B- for target n=1
data=read.table(file = "data/Figure1_Supplement5B")
for(i in len)
{
  dd=data[data$V1==i,]
  k=which(i==len);
  plot(x=dd$V2,y=dd$V3, col="red",pch=16,xlab = "True admixture time (gen)",
       ylab = "Estimated admixture time (gen)",ylim = c(0,250),
       main=paste("True source populations:",len1[k],sep=" ")); grid ();
  segments(x0 =dd$V2 ,x1 = dd$V2, y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col="red");
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V3+dd$V4),y1 =(dd$V3+dd$V4),col="red");
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V3-dd$V4),y1 =(dd$V3-dd$V4),col="red");
  lines(x=dd$V2,y=dd$V2,col="darkgrey")
  kol=c("white",dd$V8[1])
  legend("topleft",legend = paste(" References",len2[k],sep="="),
         col="red",pch = c(16),bg = "white");
}
title("(B) Impact of divergence between the two source populations (n=1)", line = -23, outer = TRUE,cex=1.3)

Impact of divergence between the two source populations.

We simulated n admixed individuals with 20% European (CEU) and 80% ancestry from a range of populations with increasing relatedness to Europeans (shown in the legend as the $F_{ST}$ to Europeans). Specifically, the other reference population we used was either West Africans (YRI), East Asians (CHB), South Americans (MXL) or Southern Europeans (TSI). We used the following reference populations for the inference: French (for all simulations) with one of the other references as either Yoruba, Tujia, Maya, or Italian, respectively. We show results for varying target sample sizes of (A) n=10 and (B) n=1. The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis. We note the inferred dates for CEU/TSI mixtures were not significant for older timescales and hence not shown.

Figure 1—figure supplement 6.

#' @width 11
#' @height 11
par(mfrow=c(2,2))
data=read.table(file = "data/Figure1_Supplement6")
len=c(2,4,6,8)
a=c("A","B","C","D")
par(mfrow=c(2,2))
for(i in len)
{
  dd=data[data$V1==i,]
  k=which(i==len)
  # Build the panel title, e.g. "(A) Admixed pop (as target and reference): 20% CEU + 80% YRI"
  rr=sprintf("(%s) Admixed pop (as target and reference): %d%% CEU + %d%% YRI", a[k], i*10, (10-i)*10)
  
  plot(x=dd$V2,y=dd$V4, col="deepskyblue",pch=15,xlab = "True admixture time (gen)",
       ylab = "Estimated admixture time (gen)",
       ylim = c(0,max(data$V4+data$V5)),main=rr); grid ()
  lines(x=dd$V2,y=dd$V8, col="orange2",pch=19,type="p")
  segments(x0 =dd$V2,x1 = dd$V2,y0 =(dd$V4+dd$V5),y1 = (dd$V4-dd$V5),col="deepskyblue");
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V4+dd$V5),y1 = (dd$V4+dd$V5),col="deepskyblue");
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V4-dd$V5),y1 = (dd$V4-dd$V5),col="deepskyblue");
  segments(x0 =dd$V2,x1 = dd$V2,y0 =(dd$V8+dd$V9),y1 = (dd$V8-dd$V9),col="orange2");
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V8+dd$V9),y1 = (dd$V8+dd$V9),col="orange2");
  segments(x0 =dd$V2-2,x1 = dd$V2+2,y0 =(dd$V8-dd$V9),y1 = (dd$V8-dd$V9),col="orange2");
  
  lines(x=dd$V2,y=dd$V2,col="darkgrey",lty=2)
  legend("topleft",legend = c("Refpops:French and Admixed","Refpops:Yoruba and Admixed"),text.col = c("deepskyblue","orange2"),pch = c(15,19),
         col = c("deepskyblue","orange2"),bg = "white");
  
}

Impact of using the admixed individuals themselves as one of the reference groups in DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

We simulated data for 10 admixed individuals with European (CEU) ancestry ($α$) in the range of 20–80% (the rest derived from Africans [YRI]) using CEU and YRI as reference populations. Using a non-overlapping set of CEU and YRI individuals, we generated 10 additional individuals that we used as reference samples in DATES. For each simulation, we ran DATES with Europeans (French) and a non-overlapping set of simulated admixed individuals as the reference populations (shown in blue), or Yoruba and simulated admixed individuals (shown in orange). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 7.

#' @width 12
#' @height 9
par(mfrow=c(3,4))
data=read.table(file = "data/Figure1_Supplement7A")
kol=c("darkorange","green3","cyan","violetred1","royalblue1","orangered1","darkorchid1",
      "yellowgreen","slateblue2","tomato","palevioletred2","red","orange")
len1=c("missing prop=10%","missing prop=20%","missing prop=40%","missing prop=60%")
len=c(10,20,40,60)
c=1;
for(i in len)
{
  dd=data[data$V2==i,]
  k=which(i==len);
  plot(x=dd$V1,y=dd$V3, col=kol[k],pch=17,xlab = "True admixture time (gen)",ylab = "Estimated admixture time (gen)",
       ylim = c(0,220)); grid ();
  lines(x=dd$V1,y=dd$V5, col=dd$V8,pch=6,type="p")
  segments(x0 =dd$V1,x1 = dd$V1,y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col=kol[k]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3+dd$V4),col=kol[k]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3-dd$V4),y1 = (dd$V3-dd$V4),col=kol[k]);
  lines(x=dd$V1,y=dd$V1,col="darkgrey",lty=2)
  legend("topleft",legend = len1[k],col=kol[k],pch = c(17,15),bg = "white");
  c=c+1;
}
title("(A) Diploid genotypes with missing data for n=10 admixed individuals", line = -1.5, outer = TRUE,cex=1.3)

data=read.table(file = "data/Figure1_Supplement7B")
data1=read.table(file = "data/Figure1_Supplement7C")
kol=c("darkorange","green3","cyan","violetred1","royalblue1","orangered1","darkorchid1","yellowgreen","slateblue2","tomato","palevioletred2")
c=1;
for(i in len)
{
  dd=data[data$V2==i,]
  k=which(i==len);
  plot(x=dd$V1,y=dd$V3, col=kol[c],pch=16,xlab = "True admixture time (gen)",ylab = "Estimated admixture time (gen)",
       ylim = c(min(dd$V3-dd$V4),max(dd$V3+dd$V4,250))); grid ();
  segments(x0 =dd$V1,x1 = dd$V1,y0 =(dd$V3+dd$V4),y1 = (dd$V3-dd$V4),col=kol[c]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3+dd$V4),y1 = (dd$V3+dd$V4),col=kol[c]);
  segments(x0 =dd$V1-2,x1 = dd$V1+2,y0 =(dd$V3-dd$V4),y1 = (dd$V3-dd$V4),col=kol[c]);
  lines(x=dd$V1,y=dd$V1,col="darkgrey",lty=2)
  legend("topleft",legend = c(len1[k]),col=kol[c],pch = 16,bg = "white");
  c=c+1;
}
title("(B) Pseudo-haploid genotypes with missing data for n=10 admixed individuals", line = -25, outer = TRUE,cex=1.3)

c=1;
for(i in len)
{
  dd1=data1[data1$V2==i,]
  k=which(i==len);
  plot(x=dd1$V1,y=dd1$V3, col=kol[c],pch=22,xlab = "True admixture time (gen)",ylab = "Estimated admixture time (gen)",
       ylim = c(min(dd1$V3-dd1$V4),max(dd1$V3+dd1$V4,250))); grid ();
  segments(x0 =dd1$V1,x1 = dd1$V1,y0 =(dd1$V3+dd1$V4),y1 = (dd1$V3-dd1$V4),col=kol[c],lty=2);
  segments(x0 =dd1$V1-2,x1 = dd1$V1+2,y0 =(dd1$V3+dd1$V4),y1 = (dd1$V3+dd1$V4),col=kol[c],lty=2);
  segments(x0 =dd1$V1-2,x1 = dd1$V1+2,y0 =(dd1$V3-dd1$V4),y1 = (dd1$V3-dd1$V4),col=kol[c],lty=2);
  lines(x=dd1$V1,y=dd1$V1,col="darkgrey",lty=2)
  legend("topleft",legend = c(len1[k]),col=kol[c],pch = 16,bg = "white");
  c=c+1;
}
title("(C) Pseudo-haploid genotypes with missing data for n=1 admixed individuals", line = -47, outer = TRUE,cex=1.3)

Impact of sample size and data quality of target samples.

We simulated data for n admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry. In each panel, we varied three key features of the data from the target population, notably sample size (n=1 or 10), type of genotypes (diploid or pseudo-haploid) and missing genotype rate (between 10% and 60%). (A) Diploid genotypes with missing data for n=10 admixed individuals. Each panel shows the results of x% of missing diploid genotypes (shown in the legend). (B) Pseudo-haploid genotypes with missing data for n=10 admixed individuals. Each panel shows the results of x% of missing pseudo-haploid genotypes (shown in the legend). (C) Pseudo-haploid genotypes with missing data for n=1 admixed individuals. Each panel shows the results of x% of missing pseudo-haploid genotypes (shown in the legend). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 8.

#' @width 10
#' @height 10
par(mfrow=c(3,3))
data=read.table(file = "data/Figure1_Supplement8",header=T)
kol=c("darkorange","darkorchid1","cyan","deeppink")
ref_used=c("FY","FB","FS")
len1=c("Reference missing prop=0%","Reference missing prop=20%","Reference missing prop=40%")
len=c(20,40,60)
rlen=c(20,40)
for(w in ref_used)
{
  sub=data[data$references==w,]
  for(r in len)
  {
    c=1;
    rd=sub[sub$missing_target==r,]
    tgt_idx=which(r==len);  # index of the current target missingness level (kept separate from loop variable w)
    dd=rd[rd$missing_ref==0,]
    plot(x=dd$admixture_time,y=dd$dates_n10, col=kol[c],pch=15,xlab = "True admixture time (gen)",ylab = "Estimated admixture time (gen)",main=paste("Missing proportion in the target: ",paste(len[tgt_idx],"%",sep=""),sep = ""),xlim=c(0,210),ylim = c(0,250)); grid ();
    segments(x0 =dd$admixture_time,x1 = dd$admixture_time,y0 =(dd$dates_n10+dd$dates_n10_se),y1 = (dd$dates_n10-dd$dates_n10_se),col=kol[c]);
    segments(x0 =dd$admixture_time-2,x1 = dd$admixture_time+2,y0 =(dd$dates_n10+dd$dates_n10_se),y1 = (dd$dates_n10+dd$dates_n10_se),col=kol[c]);
    segments(x0 =dd$admixture_time-2,x1 = dd$admixture_time+2,y0 =(dd$dates_n10-dd$dates_n10_se),y1 = (dd$dates_n10-dd$dates_n10_se),col=kol[c]);
    lines(x=dd$admixture_time,y=dd$admixture_time,col="darkgrey",lty=2)
    c=c+1; err=3
    for(i in rlen)
    {
      dd=rd[rd$missing_ref==i,]
      k=which(i==len);
      points(x=dd$admixture_time+err,y=dd$dates_n10, col=kol[c],pch=15); grid ();
      segments(x0 =dd$admixture_time+err,x1 = dd$admixture_time+err,y0 =(dd$dates_n10+dd$dates_n10_se),y1 = (dd$dates_n10-dd$dates_n10_se),col=kol[c]);
      segments(x0 =dd$admixture_time+err-2,x1 = dd$admixture_time+err+2,y0 =(dd$dates_n10+dd$dates_n10_se),y1 = (dd$dates_n10+dd$dates_n10_se),col=kol[c]);
      segments(x0 =dd$admixture_time+err-2,x1 = dd$admixture_time+err+2,y0 =(dd$dates_n10-dd$dates_n10_se),y1 = (dd$dates_n10-dd$dates_n10_se),col=kol[c]);
      lines(x=dd$admixture_time,y=dd$admixture_time,col="darkgrey",lty=2)
      c=c+1; err=err+5
    }
    legend("topleft",legend = c(len1),col=kol[1:4],pch = 15,bg = "white");
  }
}
title("(a) Reference populations of French and Yoruba (FST(true, reference) ~ 0) for target population n=10 ", line = -1, outer = TRUE,cex=1.3)
title("(b) Reference populations of French and Bantu Kenya ( FST(true, reference) ~ 0.009) for target population n=10", line = -26, outer = TRUE,cex=1.3)
title("(c) Reference populations of French and San ( FST(true, reference) ~ 0.103) for target population n=10", line = -51, outer = TRUE,cex=1.3)

Impact of data quality of target and reference populations as a function of divergence between true and reference populations used in DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

We simulated data for n=10 admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry with pseudo-haploid genotypes. The reference populations used also had pseudo-haploid genotypes. We further varied three key features of the data: missing genotype rate in reference populations, missing genotype rate in target populations, and divergence between true source populations and reference population used for the analysis. In each row, we show the admixture dates using reference populations with increasing divergence to true source population (F_ST shown in the row title). In each column, we varied the missing genotype rate in the target population (shown in the column title). Further, each panel shows results of missing data in the reference genomes (shown in the legend). (a) Reference populations of French and Yoruba (F_ST(true, reference)~0). (b) Reference populations of French and Bantu Kenya (F_ST(true, reference)~0.009). (c) Reference populations of French and San (F_ST(true, reference)~0.103). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 9.

#' @width 10
#' @height 10
par(mfrow=c(3,3))
data=read.table(file = "data/Figure1_Supplement9",header=T)
kol=c("darkorange","darkorchid1","cyan","deeppink")
ref_used=c("FY","FB","FS")
len1=c("Reference missing prop=0%","Reference missing prop=20%","Reference missing prop=40%")
len=c(20,40,60)
rlen=c(20,40)
for(w in ref_used)
{
  sub=data[data$references==w,]
  for(r in len)
  {
    c=1;
    rd=sub[sub$missing_target==r,]
    tgt_idx=which(r==len);  # index of the current target missingness level (kept separate from loop variable w)
    dd=rd[rd$missing_ref==0,]
    plot(x=dd$admixture_time,y=dd$dates_n1, col=kol[c],pch=15,xlab = "True admixture time (gen)",ylab = "Estimated admixture time (gen)",
         main=paste("Missing proportion in the target: ",paste(len[tgt_idx],"%",sep=""),sep = ""),xlim=c(0,210),
         ylim = c(0,250)); grid ();
    segments(x0 =dd$admixture_time,x1 = dd$admixture_time,y0 =(dd$dates_n1+dd$dates_n1_se),y1 = (dd$dates_n1-dd$dates_n1_se),col=kol[c]);
    segments(x0 =dd$admixture_time-2,x1 = dd$admixture_time+2,y0 =(dd$dates_n1+dd$dates_n1_se),y1 = (dd$dates_n1+dd$dates_n1_se),col=kol[c]);
    segments(x0 =dd$admixture_time-2,x1 = dd$admixture_time+2,y0 =(dd$dates_n1-dd$dates_n1_se),y1 = (dd$dates_n1-dd$dates_n1_se),col=kol[c]);
    lines(x=dd$admixture_time,y=dd$admixture_time,col="darkgrey",lty=2)
    c=c+1; err=3
    for(i in rlen)
    {
      dd=rd[rd$missing_ref==i,]
      k=which(i==len);
      points(x=dd$admixture_time+err,y=dd$dates_n1, col=kol[c],pch=15); grid ();
      segments(x0 =dd$admixture_time+err,x1 = dd$admixture_time+err,y0 =(dd$dates_n1+dd$dates_n1_se),y1 = (dd$dates_n1-dd$dates_n1_se),col=kol[c]);
      segments(x0 =dd$admixture_time+err-2,x1 = dd$admixture_time+err+2,y0 =(dd$dates_n1+dd$dates_n1_se),y1 = (dd$dates_n1+dd$dates_n1_se),col=kol[c]);
      segments(x0 =dd$admixture_time+err-2,x1 = dd$admixture_time+err+2,y0 =(dd$dates_n1-dd$dates_n1_se),y1 = (dd$dates_n1-dd$dates_n1_se),col=kol[c]);
      lines(x=dd$admixture_time,y=dd$admixture_time,col="darkgrey",lty=2)
      c=c+1; err=err+5
    }
    legend("topleft",legend = c(len1),col=kol[1:4],pch = 15,bg = "white");
  }
}
title("(a) Reference populations of French and Yoruba (FST(true, reference) ~ 0) for target population n=1 ", line = -1, outer = TRUE,cex=1.3)
title("(b) Reference populations of French and Bantu Kenya ( FST(true, reference) ~ 0.009) for target population n=1 ", line = -26, outer = TRUE,cex=1.3)
title("(c) Reference populations of French and San ( FST(true, reference) ~ 0.103) for target population n=1 ", line = -51, outer = TRUE,cex=1.3)

Impact of small sample size and data quality of target and reference populations as a function of divergence between true and reference populations used in DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

We simulated data for n=1 admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry with pseudo-haploid genotypes. The reference populations used also had pseudo-haploid genotypes. We further varied three key features of the data: missing genotype rate in reference populations, missing genotype rate in target populations, and divergence between true source populations and reference population used for the analysis. In each row, we show the admixture dates using reference populations with increasing divergence to true source population (F_ST shown in the row title). In each column, we varied the missing genotype rate in the target population (shown in the column title). Further, each panel shows results of missing data in the reference genomes (shown in the legend). (a) Reference populations of French and Yoruba (F_ST(true, reference)~0). (b) Reference populations of French and Bantu Kenya (F_ST(true, reference)~0.009). (c) Reference populations of French and San (F_ST(true, reference)~0.103). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Next, we tested DATES for features such as varying admixture proportions and use of surrogate populations as reference groups. By varying the European ancestry proportion between ~1% and 50% (the rest derived from west Africans), we observed that DATES accurately estimated the timing in all cases (Figure 1—figure supplement 3A). However, the inferred admixture proportion was overestimated for lower admixture proportions (<10%) (Figure 1—figure supplement 3B). Thus, we caution against using DATES for estimating ancestry proportions. DATES works reliably for dating admixtures between related groups such as Europeans and Mexicans (F_ST~ 0.03), though it was unable to distinguish mixtures of Southern and Northern Europeans (F_ST< 0.005) (Figure 1—figure supplement 5).

We found DATES is robust to the use of highly divergent surrogates as reference populations. For example, the use of Khomani San as the reference population instead of the true ancestral population of Yoruba (F_ST ~ 0.1) provides unbiased dates of admixture (Figure 1—figure supplement 4). In this regard, for ancient DNA where sometimes only sparse data is available, one can also use present-day samples as reference populations to increase the quality and sample size of the ancestral groups. In principle, as long as the allele frequencies in the reference samples are correlated to the ancestral allele frequencies, the inference of admixture dates should remain unbiased (Materials and methods). In practice, however, recent demographic events (e.g., strong founder events or admixture from additional sources, etc.) in the history of the present-day samples could lead to significant deviation from the ancestral allele frequencies. Thus, the reference populations should be carefully chosen.

Another idea is to use the admixed populations themselves as one of the reference populations, as demonstrated by the single reference setup in ALDER 37Loh et al.2013. Admixed individuals have allele frequencies intermediate between those of the ancestral populations, and thus weighted LD or ancestry covariance can be computed with only one reference population (albeit with reduced power). 37Loh et al.2013 showed that the use of admixed populations as one of the references does not bias the rate of decay of the weighted LD (i.e., time of admixture), though the amplitude of the decay curve (not used in DATES) can be biased under some scenarios. To verify that DATES provides reliable results under this setup, we applied DATES with a single reference population and used the admixed population as the other reference. Like ALDER, our inferred dates of admixture were accurate and comparable to using two reference populations (Figure 1—figure supplement 6).
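
The intuition behind the single reference setup can be checked numerically. The sketch below is purely illustrative and is not the DATES or ALDER implementation: the allele frequencies are simulated, and `alpha`, `p_src1`, `p_src2`, and the other names are assumptions introduced here. It shows that, in an idealized setting where the admixed frequencies are an exact weighted average of the source frequencies, the per-SNP weights built from one source and the admixed group are a scaled copy of the two-reference weights.

```r
# Illustrative sketch (not the DATES code): allele frequencies in an admixed
# population are a weighted average of the two ancestral frequencies.
set.seed(1)
n_snps  <- 1000
alpha   <- 0.2                          # assumed ancestry proportion from source 1
p_src1  <- runif(n_snps, 0.05, 0.95)    # hypothetical source 1 allele frequencies
p_src2  <- runif(n_snps, 0.05, 0.95)    # hypothetical source 2 allele frequencies
p_admix <- alpha * p_src1 + (1 - alpha) * p_src2

w_two_ref    <- p_src1 - p_src2         # per-SNP weights with two references
w_single_ref <- p_src1 - p_admix        # weights with source 1 and the admixed group

# Algebraically, w_single_ref = (1 - alpha) * w_two_ref
is_scaled  <- isTRUE(all.equal(w_single_ref, (1 - alpha) * w_two_ref))
weight_cor <- cor(w_two_ref, w_single_ref)
```

Under these assumptions the constant factor rescales the amplitude of a weighted covariance curve but leaves its shape, and hence its decay rate, unchanged. With real data the admixed frequencies are estimated from few samples, so the relationship holds only approximately.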

An important feature of DATES is that it does not require phased data and is applicable to datasets with small sample sizes, making it in principle useful for ancient DNA applications. To test the reliability of DATES for ancient genomes, we simulated data mimicking the relevant features of ancient genomes, namely small sample sizes (n=1–20), large proportions of missing genotypes (between 10% and 60%), and pseudo-haploid genotype calls (instead of diploid genotype calls) in reference and/or target samples. DATES showed reliable results under various setups, even when only a single admixed individual was available (Figure 1B, Figure 1—figure supplements 1–9). In contrast, admixture LD-based methods require more than one sample and do not work reliably with missing data. For example, ALDER estimates were very unstable for simulations with >40% missing data. For older dates (>100 generations), there was a slight bias even with >10% missing genotypes (Appendix 1—figure 5). This is expected as LD calculations leverage shared patterns across samples, so variable missingness of genotypes across individuals causes a substantial loss of data and, in turn, unstable and noisy inference. We also generated data for combinations of features including small sample sizes, pseudo-haploid genotypes with large proportions of missing genotypes in both target and reference samples, and use of highly divergent reference samples. We found DATES yielded reliable results with large amounts (~40–60%) of missing data, either in the target or references, even with highly divergent reference populations (Figure 1—figure supplement 8). This was also true when a single target sample was available, though as expected, the inference becomes noisier for older dates and large fractions of missing data (Figure 1—figure supplement 9). The robust performance of DATES in sparse datasets highlights a major advantage for ancient DNA applications.
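
To make the two data-quality features concrete, the following sketch degrades a simulated diploid genotype matrix into pseudo-haploid calls with missing data. It is a minimal illustration under stated assumptions, not the simulation pipeline used in the paper: genotypes are drawn independently per site, missingness is applied independently per genotype, and the helper names `make_pseudo_haploid` and `add_missing` are hypothetical.

```r
# Illustrative sketch: degrade diploid genotypes to mimic ancient DNA data.
set.seed(42)
n_ind  <- 10
n_snps <- 500
freq   <- runif(n_snps, 0.05, 0.95)     # hypothetical allele frequencies
# Rows are individuals, columns are SNPs; genotypes coded 0/1/2
geno <- matrix(rbinom(n_ind * n_snps, 2, rep(freq, each = n_ind)),
               nrow = n_ind, ncol = n_snps)

make_pseudo_haploid <- function(g) {
  # Sample one allele per site: heterozygotes become 0 or 2 with equal
  # probability, homozygotes are unchanged
  het <- which(g == 1)
  g[het] <- 2 * rbinom(length(het), 1, 0.5)
  g
}

add_missing <- function(g, rate) {
  # Set a random fraction of genotypes to NA (assumed independent per entry)
  g[runif(length(g)) < rate] <- NA
  g
}

ph      <- make_pseudo_haploid(geno)
ph_miss <- add_missing(ph, rate = 0.4)
```

Because missingness falls on different sites in different individuals, the number of sites observed in every individual shrinks quickly as the sample grows, which is the loss of data that affects methods relying on shared patterns across samples.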

DATES assumes a model of instantaneous gene flow with a single pulse of mixture between two source populations. However, many human populations have a history of multiple pulses of gene flow. To test the performance of DATES for multi-way admixture events, we generated admixed individuals with ancestry from three sources (East Asians, Africans, and Europeans) where the gene flow occurred at two distinct time points (Appendix 2—figure 1). By applying DATES with pairs of reference populations, we observed that DATES recovered both admixture times for target populations that had equal contributions from all three ancestral groups (Appendix 2—figure 2). In the case of unequal admixture proportions from three ancestral groups, DATES inferred the timing of the recent admixture event in most cases. In some cases, however, the inferred dates were intermediate to the two pulses when the ancestry proportion of the recent event was low (Appendix 2—figure 3). This confounding could be eliminated if the reference populations were set up to match the model of gene flow. For example, the inferred times of admixture were accurate if the two references used in DATES were: reference 1: the source population for the recent event and reference 2: pooled individuals from both ancestral populations that contributed to the first admixture event, or the intermediate admixed group formed after the first event (Appendix 2—table 1). This highlights how the choice of reference populations can help to tune the method to infer the timing of specific admixture events more reliably.
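
A small numerical sketch can illustrate why a single-pulse model may return an intermediate date when two pulses contribute to the signal. The pulse times, the relative weight `w_recent`, and the distance grid are hypothetical values chosen for illustration, and the log-linear least-squares fit is a simplification used here; the fitting procedure in DATES may differ.

```r
# Illustrative sketch: decay curve from two pulses fitted with one exponential.
d_morgans <- seq(0.005, 0.2, by = 0.001)  # genetic distance in Morgans (assumed grid)
t_old     <- 100                          # hypothetical older pulse (generations)
t_recent  <- 20                           # hypothetical recent pulse (generations)
w_recent  <- 0.3                          # assumed relative weight of the recent pulse

# Noise-free mixture of two exponential decays
decay <- w_recent * exp(-t_recent * d_morgans) +
  (1 - w_recent) * exp(-t_old * d_morgans)

# Single-pulse model: log(decay) = log(A) - t * d, fitted by least squares
fit   <- lm(log(decay) ~ d_morgans)
t_hat <- -unname(coef(fit)["d_morgans"])
```

In this setting `t_hat` lies strictly between the two pulse times, and where it falls depends on the relative weights and on the range of distances fitted.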

Finally, we explored the impact of more complex demographic events, including continuous admixture and founder events using coalescent simulations (Appendix 2). In the case of continuous admixture, DATES inferred an intermediate timing between the start and the end of the gene flow period, similar to other methods like ALDER and Globetrotter (27Hellenthal et al.2014; 37Loh et al.2013; Appendix 2—table 2). In the case of populations with founder events, we inferred unbiased dates of admixture in most cases except when the founder event was extreme (N_e ~ 10) or the population had maintained a low population size (N_e < 100) until the present (i.e., no recovery bottleneck) (Appendix 2—figure 4, Appendix 2—table 3). In humans, few populations have such extreme founder events, and thus, in most other cases, our inferred admixture dates should be robust to founder events 55Tournebize et al.2020. We note that while DATES is not a formal test of admixture, in simulations, we find that in the absence of gene flow, the method does not infer significant dates of admixture even if the target has a complex demographic history (Appendix 2—figure 6, Appendix 2—figure 7).

Comparison to other methods

We assessed the reliability of DATES in real data by comparing our results with published methods: Globetrotter, ALDER, and ROLLOFF. These methods are designed for the analysis of present-day samples that typically have high-quality data with limited missing variants. In addition, Globetrotter uses phased data, which is challenging for ancient DNA samples. Thus, instead of rerunning other methods, we took advantage of the published results for contemporary samples presented in 27Hellenthal et al.2014. Following 27Hellenthal et al.2014, we created a merged dataset including individuals from HGDP 35Li et al.2008; 5Behar et al.2010; 28Henn et al.2012 (Materials and methods). We applied DATES and ALDER to 29 target groups using the reference populations reported in Table S12 in 27Hellenthal et al.2014, excluding one group where the population label was unclear. Interestingly, the majority of these groups (25/29) failed ALDER’s formal test of admixture, either because the results of the single reference and two reference analyses yielded inconsistent estimates or because the target had long-range shared LD with one of the reference populations (Appendix 1—table 5). Using DATES, we inferred significant dates of admixture in 20 groups, and 14 of those were consistent with estimates based on Globetrotter. In the case of the six populations that disagreed across the two methods, most of the populations appear to have a history of multiple pulses of gene flow either involving more than two populations (e.g., Brahui 48Pagani et al.2017) or multiple instances of contact between the same two reference groups (e.g., Mandenka 51Price et al.2009) or the model of admixture differed (e.g., recent ancient DNA studies suggest present-day Bulgarians have ancestry from western hunter-gatherers [HGs], Near Eastern farmers, and Steppe pastoralists from Eurasia 24Haak et al.2015 but were modeled as a mixture of Polish and Cypriots in Globetrotter). In the case of complex admixture scenarios, the inconsistencies across the two methods are hard to interpret as Globetrotter and DATES could be capturing different events or the weighting of both events could differ. Finally, the estimated admixture timing based on DATES, ROLLOFF, and ALDER (assuming two-way admixture regardless of the formal test results) were found to be highly concordant (Appendix 1—table 5).

Fine-scale patterns of population mixtures in ancient Europe

Recent ancient DNA studies have shown that present-day Europeans derive ancestry from three distinct sources: (a) HG-related ancestry that is closely related to Mesolithic HGs from Europe; (b) Anatolian farmer-related ancestry related to Neolithic farmers from the Near East and associated with the spread of farming to Europe; and (c) Steppe pastoralist-related ancestry that is related to the Yamnaya pastoralists from Russia and Ukraine 1Allentoft et al.2015; 24Haak et al.2015; 53Skoglund et al.2012. Many open questions remain about the timing and dynamics of these population interactions, in particular related to the formation of the ancestral groups (which were themselves admixed) and their expansion across Europe. To characterize the spatial and temporal patterns of mixtures in Europe in the past 10,000 years, we used 1096 ancient European samples from 152 groups from the publicly available Allen Ancient DNA Resource (AADR) spanning a time range of ~8000–350 BCE (Materials and methods, Supplementary file 1A). Using DATES, we characterized the timing of the various gene flow events, and below, we describe the key events in chronological order focusing on three main periods.

Holocene to Mesolithic

Pre-Neolithic Europe was inhabited by HGs until the arrival of the first farmers from the Near East 23Haak et al.2010; 29Hofmanová et al.2016. There was large diversity among HGs with four main groups: western hunter-gatherers (WHGs) that were related to the Villabruna cluster in central Europe, eastern hunter-gatherers (EHGs) from Russia and Ukraine related to the Upper Paleolithic group of Ancestral North Eurasians (ANEs), Caucasus hunter-gatherers (CHGs) from Georgia associated with the first farmers from Iran, and the GoyetQ2-cluster associated with the Magdalenian culture in Spain and Portugal 15Fernandes et al.2018; 18Fu et al.2016; 30Jones et al.2015; 52Rivollat et al.2020; 53Skoglund et al.2012. Most Mesolithic HGs fall on two main clines of relatedness: one cline that extends from Scandinavia to central Europe showing variable WHG-EHG ancestry, and the other in southern Europe with WHG-GoyetQ2 ancestry 52Rivollat et al.2020. The latter is already present in the 17,000 BCE El Mirón individual from Spain, suggesting that the GoyetQ2-related gene flow occurred well before the Holocene. However, the WHG-EHG cline was formed more recently during the Mesolithic period, though the precise timing remains less well understood.

To characterize the formation of the WHG-EHG cline, we used genomic data from 16 ancient HG groups (n=101) with estimated ages of ~7500–3600 BCE. We first verified the ancestry of each HG group using qpAdm that compares the allele frequency correlations between the target and a set of source populations to formally test the model of admixture and then infer the ancestry proportions for the best-fitted model 24Haak et al.2015. For each target population, we chose the most parsimonious model, that is, fitting the data with the minimum number of source populations. Consistent with previous studies, our qpAdm analysis showed that most HGs from Scandinavia, the Baltic Sea region, and central Europe could be modeled as a two-way mixture of WHG- and EHG-related ancestry (Supplementary file 2A). To confirm that the target populations do not harbor Anatolian farmer-related ancestry (that could lead to some confounding in estimated admixture dates), we applied D-statistics of the form D(Mbuti, target, WHG, Anatolian farmers) where target = Mesolithic HGs. We observed that none of the target groups had a stronger affinity to Anatolian farmers than WHG (Supplementary file 2B). Together, these results suggest that the mixtures we date below reflect pre-Neolithic contacts between the HGs.
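
For readers unfamiliar with the statistic, the sketch below computes a D-statistic of the form D(W, X; Y, Z) from population allele frequencies, using the commonly used allele-frequency formulation. Everything in it is an assumption for illustration: the frequencies are simulated rather than taken from the AADR, the outgroup is simplified to the ancestral frequencies, the function name `d_statistic` is hypothetical, and standard errors (typically obtained with a block jackknife) are omitted.

```r
# Illustrative sketch: D(outgroup, target; WHG, farmer) from allele frequencies.
set.seed(7)
n_snps <- 5000
p_anc  <- runif(n_snps, 0.2, 0.8)       # hypothetical ancestral frequencies

clip  <- function(p) pmin(pmax(p, 0), 1)
drift <- function(p, sd) clip(p + rnorm(length(p), 0, sd))  # crude drift model

p_outgroup <- p_anc                     # simplification: outgroup = ancestral
p_whg      <- drift(p_anc, 0.05)        # simulated WHG-like population
p_farmer   <- drift(p_anc, 0.05)        # simulated farmer-like population
p_target   <- drift(p_whg, 0.02)        # target shares drift with WHG only

d_statistic <- function(w, x, y, z) {
  # Numerator: correlation of frequency differences; denominator normalizes
  sum((w - x) * (y - z)) / sum((w + x - 2 * w * x) * (y + z - 2 * y * z))
}

d_hat <- d_statistic(p_outgroup, p_target, p_whg, p_farmer)
```

With this ordering of populations, a target that shares drift with the third population (here the simulated WHG) rather than the fourth yields a negative value, so `d_hat` is negative in this simulation. In practice the sign is interpreted together with a Z-score.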

To infer the timing of the mixtures in the history of Mesolithic European HGs, we applied DATES to HGs from Scandinavia, the Baltic regions, and central Europe using WHG- and EHG-related groups as reference populations. DATES infers the time of admixture in generations before the sample lived. Accounting for the average sampling age of the specimens and the mean human generation time of 28 years 41Moorjani et al.2016, we inferred the admixture time in years before present or in BCE (Materials and methods). We report the average dates (or median, where specified) in BCE in the main text and provide additional details in Figure 2 and Supplementary file 1B including the sample sizes, dates in generations, and BCE for each population. Among HGs, we inferred that the earliest admixture occurred in Scandinavian HGs from Norway and Sweden with a range of average dates of ~80–113 generations before the samples lived (Figure 2—figure supplement 1). This translates to admixture dates of ~10,200–8000 BCE, with the most recent dates inferred in Motala HGs from Sweden suggesting substantial substructure in HGs (Figure 2). In the Baltic region, we inferred the range of admixture dates of ~8700–6000 BCE in Latvia and Lithuania HGs, postdating the mixture in Scandinavia. In southeast Europe, the Iron Gates region of the Danube Basin shows widespread evidence of mixtures between HG groups and, in the case of some outliers, the mixture of HGs and Anatolian farmer-related ancestry as early as the Mesolithic period 14Feldman et al.2019. Further, these groups showed a strong affinity to the WHG-related ancestry in Anatolian populations, suggesting ancient interactions with Near Eastern populations 14Feldman et al.2019. We applied qpAdm to test the model of admixture in Iron Gates HG and found that the parsimonious model with WHG- and EHG-related ancestry provides a good fit to the data. 
Further, when we tested the model with Anatolian-related ancestry using Anatolian HG (AHG) as an additional source population, the AHG ancestry proportion was not significant (Supplementary file 2A). Applying DATES to Iron Gates HG with WHG and EHG as reference populations, we inferred this group was genetically formed in ~9200 BCE (95% confidence interval: 10,000–8400 BCE). Our samples of the Iron Gates HGs include a wide range of C14 dates between 8800 and 5700 BCE. We confirmed our dates were robust to the sampling age of the individuals as we obtained statistically consistent dates when all samples were combined as one group or when subsets of samples were grouped in bins of 500 years (Figure 2—figure supplement 2). The most recent dates of ~7500 BCE were inferred in eastern Europe in Ukraine HGs, highlighting how the WHG-EHG cline was formed over a period of ~2000–3000 years (Figure 2, Supplementary file 1B).
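
The conversion from generations to calendar dates described above can be sketched as follows. This is a simplified illustration rather than the procedure in Materials and methods: it uses the 28-year generation time stated in the text, ignores uncertainty in the sampling age, and the function name `gens_to_bce` and the input values are hypothetical.

```r
# Illustrative sketch: convert a date in generations to a calendar date in BCE.
generation_time <- 28  # years per generation, as stated in the text

gens_to_bce <- function(gens, gens_se, sample_age_bce, gen_time = generation_time) {
  # Admixture predates the sample by gens * gen_time years; on the BCE scale
  # older events have larger values, so the offset is added.
  data.frame(
    date_bce = sample_age_bce + gens * gen_time,
    se_years = gens_se * gen_time
  )
}

# Hypothetical inputs: 100 generations (SE 10) for samples averaging 6000 BCE
est  <- gens_to_bce(gens = 100, gens_se = 10, sample_age_bce = 6000)
# Approximate 95% interval (older bound first), assuming a normal sampling error
ci95 <- est$date_bce + c(1.96, -1.96) * est$se_years
```

For these made-up inputs the point estimate is 8800 BCE with a standard error of 280 years. A fuller treatment would also propagate the uncertainty in the radiocarbon ages of the samples.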

Figure 2.

#' @width 10
#' @height 10
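library(ggplot2)        # ggplot(), geom_sf(), coord_sf(), geom_point()
library(rnaturalearth)  # ne_countries()
sf::sf_use_s2(FALSE)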
data=read.table(file = "data/Figure3_data_map",header = T)
world <- ne_countries(scale = "medium", returnclass = "sf")
Europe <- world[which(world$continent == "Europe"),]
# Hunter Gatherer distribution panel
dd=data[data$col=="deepskyblue",]
ggplot(Europe) + geom_sf() + coord_sf(xlim = c(-15,40), ylim = c(35,70), expand = FALSE) + 
  geom_point(data = dd, aes(x=long,y=lat),col=dd$col,inherit.aes = FALSE,pch=dd$pc,cex=4,show.legend = F) +
  scale_fill_manual(values ="deepskyblue") +
  theme(plot.title = element_text(colour = "black"))

#Neolithic Farmer spread
dd=data[data$col=="orange2",]
ggplot(Europe) + geom_sf() + coord_sf(xlim = c(-15,40), ylim = c(35,70), expand = FALSE) + 
  geom_point(data = dd, aes(x=long,y=lat),col=dd$col,inherit.aes = FALSE,pch=dd$pc,cex=4,show.legend = F) + 
  theme(plot.title = element_text(colour = "black"))

# MLBA bronze age samples
dd=rbind(data[data$col=="green3",],data[data$col=="lightpink2",])
ggplot(world) + geom_sf() + coord_sf(xlim = c(-15,90), ylim = c(30,80), expand = FALSE) + 
  geom_point(data = dd, aes(x=long,y=lat),col=dd$col,inherit.aes = FALSE,pch=dd$pc,cex=3,show.legend = F) + 
  theme(plot.title = element_text(colour = "black"))


# Admixture time plots 
par(mai = c(1,1,0.1,3))
data=read.table(file = "data/Figure3_data_admixturetimes")
fossil_range=read.table(file = "data/Figure3_data_fossilages")
# Hunter Gatherer mixture panel
dd=rbind(data[data$V8=="deepskyblue",])
target_fossil=fossil_range[fossil_range$V2%in%as.character(dd$V1),]
group_names=unique(target_fossil$V2)
target_fossil$V2 <- as.character(target_fossil$V2)
target_fossil$V6 <- as.numeric(as.character(target_fossil$V3))-1950
for(i in group_names)
{
  k=which(dd$V1==i)
  dd[k,9]=min(as.numeric(target_fossil[target_fossil$V2==i,]$V6))
  dd[k,10]=max((as.numeric(target_fossil[target_fossil$V2==i,]$V6)))
  dd[k,11]=mean(as.numeric(target_fossil[target_fossil$V2==i,]$V6))
}
plot(x=seq(1,length(dd$V1),1),y=dd$V6,type = "p",
     col=as.character(dd$V8),pch=20,cex=2, ylim = c(4000,12200), xlim = c(0.5,length(dd$V1)+0.5),
     xaxt='n',yaxt='n',xlab = "",ylab = "Estimated admixture time (BCE)", main="");
grid(nx=8,ny=10);
segments(x0 =seq(1,length(dd$V1),1),x1 = seq(1,length(dd$V1),1),y0 =(dd$V6+dd$V7),
         y1 = (dd$V6-dd$V7),col=as.character(dd$V8),lty = 1)
segments(x0 =seq(1,length(dd$V1),1)-0.1,x1 = seq(1,length(dd$V1),1)+0.1,y0 =(dd$V6+dd$V7),
         y1 = (dd$V6+dd$V7),col=as.character(dd$V8))
segments(x0 =seq(1,length(dd$V1),1)-0.1,x1 = seq(1,length(dd$V1),1)+0.1,y0 =(dd$V6-dd$V7),
         y1 = (dd$V6-dd$V7),col=as.character(dd$V8))
axis(1, 1:length(dd$V1), lty = 1,col = "black",tck="y",labels = rep('', length(dd$V1)))
name_lab=gsub(".SG","",dd$V1,fixed=TRUE)  # match the literal ".SG" suffix, not a regex wildcard
points(x=seq(1,length(dd$V1),1),y=dd$V11,type = "p",col="grey20",pch=1,cex=1.1)
text(1:length(dd$V4), rep(3500, length(dd$V4)),
     labels= name_lab, col="black", srt=25, xpd=TRUE, adj=1,cex=1)
axis(2, seq(4000,12000,1000), lty = 1,col = "black",tck="y",labels = rep('',length(seq(4000,12000,1000))))
text(rep(-0.2,length(seq(4000,12000,1000))), seq(4000,12000,1000),labels= seq(4000,12000,1000),
     col="grey30", srt=0, xpd=TRUE, adj=-0.1,cex=0.8)
abline(v=3.5,lty=3,col="grey20")
abline(v=6.5,lty=3,col="grey20")
text(c(1,4,7),rep(12000,3), labels= c("Scandinavian HG","Baltic region HG","Central European HG"),
     col="grey30", srt=0, xpd=TRUE, adj=-0.1,cex=1)
legend(8.9,12000,legend = c("DATES in BCE ± 1SE","Average C14 radiocarbon age"),col=c("grey20","grey20"), 
       pch=c(20,1),lty=c(1,-1),xpd=T,bty='n')

# Neolithic Farmer mixture
par(mai = c(1.4,1,0.1,0.5))
dd=data[data$V8=="orange2",]
target_fossil=fossil_range[fossil_range$V2%in%as.character(dd$V1),]
group_names=unique(target_fossil$V2)
target_fossil$V2 <- as.character(target_fossil$V2)
target_fossil$V6 <- as.numeric(as.character(target_fossil$V3))-1950
for(i in group_names)
{
  k=which(dd$V1==i)
  dd[k,9]=min(as.numeric(target_fossil[target_fossil$V2==i,]$V6))
  dd[k,10]=max((as.numeric(target_fossil[target_fossil$V2==i,]$V6)))
  dd[k,11]=mean(as.numeric(target_fossil[target_fossil$V2==i,]$V6))
}
plot(x=seq(1,length(dd$V1),1),y=dd$V6,type = "p",
     col=as.character(dd$V8),pch=20,cex=2, ylim = c(2000,6800), xlim = c(1,length(dd$V1)+0.5),
     xaxt='n',yaxt='n',xlab = "",ylab = "Estimated admixture time (BCE)", main="");
grid(nx=8,ny=10);
segments(x0 =seq(1,length(dd$V1),1),x1 = seq(1,length(dd$V1),1),y0 =(dd$V6+dd$V7),
         y1 = (dd$V6-dd$V7),col=as.character(dd$V8),lty = 1)
segments(x0 =seq(1,length(dd$V1),1)-0.1,x1 = seq(1,length(dd$V1),1)+0.1,y0 =(dd$V6+dd$V7),
         y1 = (dd$V6+dd$V7),col=as.character(dd$V8))
segments(x0 =seq(1,length(dd$V1),1)-0.1,x1 = seq(1,length(dd$V1),1)+0.1,y0 =(dd$V6-dd$V7),
         y1 = (dd$V6-dd$V7),col=as.character(dd$V8))
axis(1, 1:length(dd$V1), lty = 1,col = "black",tck="y",labels = rep('', length(dd$V1)))
points(x=seq(1,length(dd$V1),1),y=dd$V11,type = "p",col="grey20",pch=1,cex=1)
name_lab=gsub("_published","",gsub("_published.DG","",gsub(".SG","",dd$V1)))
text(1:length(dd$V4), rep(1800, length(dd$V4)),
     labels= name_lab, col="black", srt=35, xpd=TRUE, adj=1,cex=0.7, cex.lab=0.7)
axis(2, seq(2000,6800,500), lty = 1,col = "black",tck="y",labels = rep('',length(seq(2000,6800,500))))
text(rep(-3.5,length(seq(2000,6800,500))), seq(2000,6800,500),labels= seq(2000,6800,500),
     col="grey30", srt=0, xpd=TRUE, adj=-0.1,cex=0.8)
abline(v=2.5,lty=3,col="grey20")
abline(v=20.5,lty=3,col="grey20")
abline(v=23.5,lty=3,col="grey20")
abline(v=28.5,lty=3,col="grey20")
abline(v=36.5,lty=3,col="grey20")
abline(v=42.5,lty=3,col="grey20")
abline(v=47.5,lty=3,col="grey20")
abline(v=56.5,lty=3,col="grey20")
abline(v=63.5,lty=3,col="grey20")
# Region labels: one y value per label (12 on the first row, 2 on the second row)
text(c(-1.3,10,20.6,23.8,28.6,31.8,37,43,48,53,57,63.53),rep(6500,12), 
     labels= c("Balkans","Hungary","Czech","Germany","Poland","Ukraine",
               "France","Italy","Spain","Portugal","Britain countries","Scandinavia"),col="grey30", srt=0, xpd=TRUE, adj=-0.1,cex=0.8)
text(c(20.6,23.6),rep(6200,2), labels= c("Republic","Austria"),
     col="grey30", srt=0, xpd=TRUE, adj=-0.1,cex=0.8)

# Bronze age steppe mixture
par(mai = c(2.6,1,0.1,2.8))
dd=rbind(data[data$V8=="green3",],data[data$V8=="lightpink2",])
dd$V8=gsub("springgreen","green3",dd$V8)
target_fossil=fossil_range[fossil_range$V2%in%as.character(dd$V1),]
group_names=unique(target_fossil$V2)
target_fossil$V2 <- as.character(target_fossil$V2)
target_fossil$V6 <- as.numeric(as.character(target_fossil$V3))-1950
pch_str=as.numeric(dd$V9)
for(i in group_names)
{
  k=which(dd$V1==i)
  dd[k,10]=min(as.numeric(target_fossil[target_fossil$V2==i,]$V6))
  dd[k,11]=max((as.numeric(target_fossil[target_fossil$V2==i,]$V6)))
  dd[k,12]=mean(as.numeric(target_fossil[target_fossil$V2==i,]$V6))
}
plot(x=seq(1,length(dd$V1),1),y=dd$V6,type = "p",
     col=dd$V8,pch=dd$V9,cex=1.5, ylim = c(500,5000), xlim = c(1,length(dd$V1)+0.5),
     xaxt='n',yaxt='n',xlab = "",ylab = "Estimated admixture time (BCE)", main="");
grid(nx=8,ny=10);
segments(x0 =seq(1,length(dd$V1),1),x1 = seq(1,length(dd$V1),1),y0 =(dd$V6+dd$V7),
         y1 = (dd$V6-dd$V7),col=dd$V8,lty = 1)
segments(x0 =seq(1,length(dd$V1),1)-0.1,x1 = seq(1,length(dd$V1),1)+0.1,y0 =(dd$V6+dd$V7),
         y1 = (dd$V6+dd$V7),col=dd$V8)
segments(x0 =seq(1,length(dd$V1),1)-0.1,x1 = seq(1,length(dd$V1),1)+0.1,y0 =(dd$V6-dd$V7),
         y1 = (dd$V6-dd$V7),col=dd$V8)
axis(1, 1:length(dd$V1), lty = 1,col = "black",tck="y",labels = rep('', length(dd$V1)))
points(x=seq(1,length(dd$V1),1),y=dd$V12,type = "p",col="grey20",pch=1,cex=0.8)
name_lab=gsub(".SG","",dd$V1)
text(1:length(dd$V4), rep(350, length(dd$V4)),
     labels= name_lab, col="black", srt=45, xpd=TRUE, adj=1,cex=1)
axis(2, seq(500,5000,500), lty = 1,col = "black",tck="y",labels = rep('',length(seq(500,5000,500))))
text(rep(-2,length(seq(500,5000,500))), seq(500,5000,500),labels = c(seq(500,5000,500)),
     col="grey30", srt=0, xpd=TRUE, adj=-0.1,cex=1)
abline(v=7.5,lty=3,col="grey20")
abline(v=26.5,lty=3,col="grey20")
text(c(1.6,15,29),rep(4900,3),
     labels= c("Late Neolithic","Chalcolithic to Bronze Age","Middle to Late Bronze Age"),col="grey30", srt=0, xpd=TRUE, adj=-0.1,cex=0.8)
legend(35.5,5000,legend = c("Corded Ware complex","Bell Beaker complex"),
       col=c(rep("green3",2)),pch=c(15,17),lty=c(1,1),xpd=T,cex=1)

Timeline of admixture events in ancient Europe.

We applied DATES (Distribution of Ancestry Tracts of Evolutionary Signals) to ancient samples from Europe. In the right panel, we show the sampling locations of the ancient specimens, and in the left panel, we show the admixture dates for each target group listed on the X-axis. The inferred dates in generations were converted to dates in BCE by assuming a mean generation time of 28 years (Moorjani et al., 2011) and accounting for the average sampling age (shown as gray dots) of all ancient individuals in the target group (Materials and methods). The top panel shows the formation of the western hunter-gatherer (WHG)-eastern hunter-gatherer (EHG) cline (in blue) using Mesolithic hunter-gatherers (HGs) as the target and EHG and WHG as reference populations. The middle panel shows admixture dates of local HGs and Anatolian farmers (in orange) using Neolithic European groups as targets and Anatolian farmer-related groups and WHG-related groups as reference populations. The bottom panel shows the spread of Steppe pastoralist-related ancestry (in green) estimated using middle and late Neolithic, Chalcolithic, and Bronze Age samples from Europe as target populations and early Steppe pastoralist-related groups (Afanasievo and Yamnaya Samara) and a set of Anatolian farmers and WHG-related groups as reference populations. For the middle to late Bronze Age (MLBA) samples from Eurasia, we used the early Steppe pastoralist-related groups and the Neolithic European groups as reference populations. The cultural affiliation (Corded Ware complex [CWC], Bell Beaker complex [BBC], or Steppe MLBA cultures) of the individuals is shown in the legend. See Figure 2—figure supplements 1 and 2 for decay curves for all samples and stratified dates for Iron Gates HGs.
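
The conversion from generations to calendar dates described in this legend can be sketched in a few lines of R. This is a minimal illustration rather than the authors' pipeline: the helper `gen_to_bce` and its argument names are hypothetical, and the input values are made up solely to show the arithmetic (generations multiplied by 28 years, added to the average sampling age in BCE), which has the same form as the `*28 + <average sample age>` expressions in the figure code.

```r
# Hypothetical helper (not part of DATES): convert an admixture time given in
# generations before the sampled individuals lived into a calendar date in BCE.
gen_to_bce <- function(gen, gen_se, sample_age_bce, gen_time = 28) {
  data.frame(
    date_bce = gen * gen_time + sample_age_bce,  # older events get larger BCE values
    se_years = gen_se * gen_time                 # SE scales with the generation time
  )
}

# Illustrative inputs only: 40 generations (SE 5) for a target group whose
# individuals have an average age of 3000 BCE.
gen_to_bce(gen = 40, gen_se = 5, sample_age_bce = 3000)
# date_bce = 4120, se_years = 140
```

Note that this sketch propagates only the uncertainty in the number of generations; uncertainty in the sampling age and in the generation time is ignored here.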

Figure 2—figure supplement 1.

#' @width 10
#' @height 10
NRMSD <- function(y, yfit,na.rm = TRUE) {
  # y is the vector of empirical values
  # yfit is the vector of fitted values
  if (length(y) != length(yfit)) stop('y and yfit should have the same length')
  if (na.rm) {
    isna = is.na(y) | is.na(yfit)
    y = y[!isna]
    yfit = yfit[!isna]
  }
  nrmsd = sqrt(mean( (yfit - y)**2, na.rm = na.rm )) * (max(yfit, na.rm = na.rm) - min(yfit, na.rm = na.rm))**(-1)
  return(nrmsd)
}
# Usage: nrmsd=round(NRMSD(data$V2,data$V3),4)
admix_dates=read.table(file = "data/Figure3_supplement1_admixture_dates")
par(mfrow=c(6,4),oma=c(0,0,3.5,0))
layout(matrix(seq(1,24,1), nrow = 6), heights=c(1,1))
# HG decay curves
dates_files=read.table(file = "data/Figure3_supplement1_HG_curves")
for (i in 1:nrow(dates_files)) 
{
  file=dates_files$V1[i]
  path="data/Figure3_supplement1_decay_files/"; var=paste(path,file,sep = "/")
  mm=unlist(strsplit(as.character(file),'/',fixed=TRUE))[1]
  mm1=unlist(strsplit(as.character(mm),'.',fixed=TRUE))[1]
  data=read.table(file = var)
  jout=gsub("fit", "jout", file); time=paste(path,jout,sep = "/")
  estimate=read.table(file = time)
  name=gsub("estimate_","",gsub(".fit", "", file))
  len=paste("Estimate:",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "+/-"),sep = " ")
  title_name=gsub("EHG_WHG_","",mm1)
  bce=admix_dates[admix_dates$Population==title_name,]
  main_t=paste("Target:",paste(title_name,paste(paste("(n=",bce$n,sep=""),")",sep=""),sep=" "),"\n","References:EHG-WHG")
  nmrsd=round(NRMSD(data$V2,data$V3),4)
  if(nmrsd>0.7) {kol="grey20"} else {kol="deepskyblue"};
  plot(x=data$V1,y=data$V2,type="p",col=kol,pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
       ylab = "Ancestry covariance",las=1,main = main_t,cex=1,cex.main=0.8)
  lines(x=data$V1,y=data$V3,type = "l",col=kol,lty=2)
  len=paste(paste(paste(paste("DATES estimate (gen)",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "±"),sep = ":"),
                  paste("DATES estimate (BCE)",paste(bce$DATES_mean.BCE.,bce$DATES_SE.BCE.,sep=" ± "),sep=":"),sep = "\n"),
            paste("NRMSD",nmrsd,sep="="),sep = "\n"))
  legend("topright",legend = len,col = kol,bty='n',cex=0.7)
}
### Farmer formation 
data=read.table(file = "data/Figure3_supplement1_decay_files/AAF.fit")
jout=read.table(file = "data/Figure3_supplement1_decay_files/AAF.jout")

plot(x=data$V1,y=data$V2,type="p",col="darkorchid",pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
     las=1,main = paste("Target: Early Anatolian farmers (n=9)","References: Levant_pooled-Iran_N_pooled",sep="\n")
     ,cex=1,ylab ="",cex.main=0.8)
title(ylab = "Ancestry covariance", mgp = c(4.2, 1, 0)) 
lines(x=data$V1,y=data$V3,type = "l",col="darkorchid",lty=2)
nmrsd=round(NRMSD(data$V2,data$V3),4)
len=paste(paste(paste("DATES estimate (gen)",paste(round(jout$V2,0),round(jout$V5,0),sep = " ± "),sep = ": "),
                paste("DATES estimate (BCE)",paste(round(jout$V2,0)*28+8071,round(jout$V5,0)*28,sep = " ± "),sep = ": "),sep = "\n"),
          paste("NRMSD",nmrsd,sep="="),sep = "\n")
legend("topright",legend = len,col = "darkorchid",lty=c(2,-1,-1),cex=0.7,bty='n')

dates_files=read.table(file = "data/Figure3_supplement1_FF_curves")
mt=c("Iran_N-Anatolian_N","Iran_N-Anatolian_N")
for (i in 1:nrow(dates_files)) 
{
  file=dates_files$V1[i]
  path="data/Figure3_supplement1_decay_files" ; var=paste(path,file,sep = "/")
  mm=unlist(strsplit(as.character(file),'/',fixed=TRUE))[1]
  mm1=unlist(strsplit(as.character(mm),'-',fixed=TRUE))[1]
  data=read.table(file = var)
  jout=gsub("fit", "jout", file); time=paste(path,jout,sep = "/")
  estimate=read.table(file = time)
  name=gsub("estimate_","",gsub(".fit", "", file))
  title_name=gsub("published_merged","Anatolian Farmer",gsub("Anatolia_N_Serbia_Iron_Gates_HG_","",mm1))
  bce=admix_dates[admix_dates$Population==title_name,]
  main_t=paste("Target:",paste(title_name,paste(paste("(n=",bce$n,sep=""),")",sep=""),sep=" "),"\n",mt[i])
  len=paste("Estimate:",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "+/-"),sep = " ")
  nmrsd=round(NRMSD(data$V2,data$V3),4)
  if(nmrsd>0.7) {kol="grey20"} else {kol="firebrick"};
  plot(x=data$V1,y=data$V2,type="p",col=kol,pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
       ylab = "Ancestry covariance",las=1,main =main_t ,cex=1,cex.main=0.8)
  lines(x=data$V1,y=data$V3,type = "l",col=kol,lty=1)
  len=paste(paste(paste("DATES estimate (gen)",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "±"),sep = ":"),
                  paste("DATES estimate (BCE)",paste(bce$V9,bce$V10,sep=" ± "),sep=":"),sep = "\n"),
            paste("NRMSD",nmrsd,sep="="),sep = "\n")
  legend("topright",legend = len,col = kol,bty='n',cex=0.7)
}
### Farmer spread
dates_files=read.table(file = "data/Figure3_supplement1_Neolithic_curves")
for (i in 1:nrow(dates_files)) 
{
  file=dates_files$V1[i]
  path="data/Figure3_supplement1_decay_files" ; var=paste(path,file,sep = "/")
  mm=unlist(strsplit(as.character(file),'/',fixed=TRUE))[1]
  mm1=unlist(strsplit(as.character(mm),'-',fixed=TRUE))[1]
  refm=paste(unlist(strsplit(as.character(mm),'-',fixed=TRUE))[2],"-",unlist(strsplit(as.character(mm),'-',fixed=TRUE))[3])
  data=read.table(file = var)
  jout=gsub("fit", "jout", file); time=paste(path,jout,sep = "/")
  estimate=read.table(file = time)
  name=gsub("estimate_","",gsub(".fit", "", file))
  bce=admix_dates[admix_dates$Population==mm1,]
  main_t=paste("Target:",paste(mm1,paste(paste("(n=",bce$n,sep=""),")",sep=""),sep=" "),"\n","References:",gsub("Turkey_N","AnatolianN",refm))
  len=paste("Estimate:",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "+/-"),sep = " ")
  nmrsd=round(NRMSD(data$V2,data$V3),4)
  if(nmrsd>0.7) {kol="grey20"} else {kol="orange2"};
  plot(x=data$V1,y=data$V2,type="p",col=kol,pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
       ylab = "Ancestry covariance",las=1,main = main_t,cex=1,cex.main=0.8)
  lines(x=data$V1,y=data$V3,type = "l",col=kol,lty=1)
  len=paste(paste(paste("DATES estimate (gen)",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "±"),sep = ":"),
                  paste("DATES estimate (BCE)",paste(bce$V9,bce$V10,sep=" ± "),sep=":"),sep = "\n"),
            paste("NRMSD",nmrsd,sep="="),sep = "\n")
  legend("topright",legend = len,col = kol,bty='n',cex=0.7)
}

### Steppe formation- EBA
dates_files=read.table(file = "data/Figure3_supplement1_Steppe_formation")
for (i in 1:nrow(dates_files)) 
{
  file=dates_files$V1[i]
  path="data/Figure3_supplement1_decay_files" ; var=paste(path,file,sep = "/")
  mm=unlist(strsplit(as.character(file),'/',fixed=TRUE))[1]
  mm1=unlist(strsplit(as.character(mm),'-',fixed=TRUE))[1]
  data=read.table(file = var)
  jout=gsub("fit", "jout", file); time=paste(path,jout,sep = "/")
  estimate=read.table(file = time)
  name=gsub("estimate_","",gsub(".fit", "", file))
  bce=admix_dates[admix_dates$Population==mm1,]
  main_t=gsub("_pub","",paste("Target:",paste(mm1,paste(paste("(n=",bce$n,sep=""),")",sep=""),sep=" "),"\n","References:Iran_N_pooled-EHG_pooled"))
  len=paste("Estimate:",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "+/-"),sep = " ")
  nmrsd=round(NRMSD(data$V2,data$V3),4)
  if(nmrsd>0.7) {kol="grey20"} else {kol="deeppink"};
  plot(x=data$V1,y=data$V2,type="p",col=kol,pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
       ylab = "Ancestry covariance",las=1,main = main_t,cex=1.2,cex.main=0.8)
  lines(x=data$V1,y=data$V3,type = "l",col=kol,lty=1)
  len=paste(paste(paste("DATES estimate (gen)",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "±"),sep = ":"),
                  paste("DATES estimate (BCE)",paste(bce$V9,bce$V10,sep=" ± "),sep=":"),sep = "\n"),
            paste("NRMSD",nmrsd,sep="="),sep = "\n")
  legend("topright",legend = len,col = kol,bty='n',cex=0.7)
}
### Steppe formation - MLBA
dates_files=read.table(file = "data/Figure3_supplement1_Steppe_MLBA")
for (i in 1:nrow(dates_files)) 
{
  file=dates_files$V1[i]
  path="data/Figure3_supplement1_decay_files" ; var=paste(path,file,sep = "/")
  mm=unlist(strsplit(as.character(file),'/',fixed=TRUE))[1]
  mm1=unlist(strsplit(as.character(mm),'-',fixed=TRUE))[1]
  data=read.table(file = var)
  jout=gsub("fit", "jout", file); time=paste(path,jout,sep = "/")
  estimate=read.table(file = time)
  name=gsub("estimate_","",gsub(".fit", "", file))
  bce=admix_dates[admix_dates$Population==mm1,]
  main_t=paste("Target:",paste(mm1,paste(paste("(n=",bce$n,sep=""),")",sep=""),sep=" "),"\n","References:Steppe EBA-Neolithic groups")
  len=paste("Estimate:",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "+/-"),sep = " ")
  nmrsd=round(NRMSD(data$V2,data$V3),4)
  if(nmrsd>0.7) {kol="grey20"} else {kol="lightpink"};
  plot(x=data$V1,y=data$V2,type="p",col=kol,pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
       ylab = "Ancestry covariance",las=1,main = main_t,cex=1,cex.main=0.8)
  lines(x=data$V1,y=data$V3,type = "l",col=kol,lty=1)
  len=paste(paste(paste("DATES estimate (gen)",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "±"),sep = ":"),
                  paste("DATES estimate (BCE)",paste(bce$V9,bce$V10,sep=" ± "),sep=":"),sep = "\n"),
            paste("NRMSD",nmrsd,sep="="),sep = "\n")
  legend("topright",legend = len,col = kol,bty='n',cex=0.7)
}
### Steppe spread
dates_files=read.table(file = "data/Figure3_supplement1_Steppe_spread")
for (i in 1:nrow(dates_files)) 
{
  file=dates_files$V1[i]
  path="data/Figure3_supplement1_decay_files/" ; var=paste(path,file,sep = "/")
  mm=unlist(strsplit(as.character(file),'/',fixed=TRUE))[1]
  mm1=unlist(strsplit(as.character(mm),'-',fixed=TRUE))[1]
  data=read.table(file = var)
  jout=gsub("fit", "jout", file); time=paste(path,jout,sep = "/")
  estimate=read.table(file = time)
  name=gsub("estimate_","",gsub(".fit", "", file))
  title_name=gsub("Afanasievo_Anatolia_N_","",mm1)
  bce=admix_dates[admix_dates$Population==title_name,]
  main_t=paste("Target:",paste(mm1,paste(paste("(n=",bce$n,sep=""),")",sep=""),sep=" "),"\n","References:Steppe groups-(WHG+AnatoliaN)")
  len=paste("Estimate:",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "+/-"),sep = " ")
  nmrsd=round(NRMSD(data$V2,data$V3),4)
  if(nmrsd>0.7) {kol="grey20"} else {kol="green3"};
  plot(x=data$V1,y=data$V2,type="p",col=kol,pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
       ylab = "Ancestry covariance",las=1,main = main_t,cex=1,cex.main=0.7)
  lines(x=data$V1,y=data$V3,type = "l",col=kol,lty=1)
  len=paste(paste(paste("DATES estimate (gen)",paste(round(estimate$V2,0),round(estimate$V5,0),sep = "±"),sep = ":"),
                  paste("DATES estimate (BCE)",paste(bce$V9,bce$V10,sep=" ± "),sep=":"),sep = "\n"),
            paste("NRMSD",nmrsd,sep="="),sep = "\n")
  legend("topright",legend = len,col = kol,bty='n',cex=0.8)
}

DATES (Distribution of Ancestry Tracts of Evolutionary Signals) ancestry covariance decay curves.

We show the weighted ancestry covariance decay curves generated using DATES for all the target groups analyzed in the study. Each subplot shows the decay curve for one target population, with the associated reference groups shown in the title. For each target, the legend shows the inferred average date of admixture (±1 SE) both in generations before the individuals lived and in BCE, where the latter accounts for the average age of all the individuals in the target and the mean generation time of human populations (see Materials and methods). We also show the normalized root-mean-square deviation (NRMSD) values for all fitted curves; plots with NRMSD > 0.7 are shown in gray. For consistency, we use the same colors as in Figure 2.

Figure 2—figure supplement 2.

#' @width 9
#' @height 6
dd=read.table(file = "data/Figure3_supplement2")
plot(seq(1,NROW(dd$V1),1),dd$V8,col=as.character(dd$V10),pch=15, 
     ylim =c(3000,19000),las=1, xaxt='n',xlab="",ylab="Admixture time in years BCE",
     main = "Admixture dates in Iron_Gates samples grouped by c14 age in bins of 500 years"); grid()
segments(x0 =seq(1,NROW(dd$V1),1),x1 = seq(1,NROW(dd$V1),1),y0 =(dd$V8+2*dd$V9),
         y1 = (dd$V8-2*dd$V9),col=as.character(dd$V10),lty = 1)
segments(x0 =seq(1,NROW(dd$V1),1)-0.1,x1 = seq(1,NROW(dd$V1),1)+0.1,y0 =(dd$V8+2*dd$V9),
         y1 = (dd$V8+2*dd$V9),col=as.character(dd$V10))
segments(x0 =seq(1,NROW(dd$V1),1)-0.1,x1 = seq(1,NROW(dd$V1),1)+0.1,y0 =(dd$V8-2*dd$V9),
         y1 = (dd$V8-2*dd$V9),col=as.character(dd$V10))
points(seq(1,NROW(dd$V1),1),dd$V7)
segments(x0 =seq(1,NROW(dd$V1),1),x1 = seq(1,NROW(dd$V1),1),y0 =(dd$V2-1950),
         y1 = (dd$V3-1950))
segments(x0 =seq(1,NROW(dd$V1),1)-0.1,x1 = seq(1,NROW(dd$V1),1)+0.1,y0 =(dd$V2-1950),
         y1 = (dd$V2-1950))
segments(x0 =seq(1,NROW(dd$V1),1)-0.1,x1 = seq(1,NROW(dd$V1),1)+0.1,y0 =(dd$V3-1950),
         y1 = (dd$V3-1950))
axis(1, 1:NROW(dd$V1), lty = 1,col = "black",tck="y",labels = rep('', NROW(dd$V1)))
text(1:NROW(dd$V1), rep(2500, NROW(dd$V4)),
     labels= gsub("IronGates-","",dd$V1), col="black", srt=25, xpd=TRUE, adj=1,cex=1)
legend("topright",legend = c("WHG-EHG admixture per c14 bin","WHG-EHG admixture in all samples","average c14 ages"),
       col = c("blue","cyan2","grey40"),lty=c(1,1,1),pch=c(15,15,1))

Timing of western hunter-gatherer (WHG) and eastern hunter-gatherer (EHG) admixture in Iron Gates hunter-gatherer (HG) samples.

We show the time of admixture in Iron Gates HG samples grouped into 500-year bins of C14 age. The C14 age is shown on the X-axis and the admixture time in BCE for the corresponding samples is shown on the Y-axis.

Early to middle Neolithic

Neolithic farming began in the Near East – the Levant, Anatolia, and Iran – and spread to Europe and other parts of the world (Haak et al., 2010; Kılınç et al., 2016; Skoglund et al., 2012). The first farmers of Europe were related to Anatolian farmers, whose origin remains unclear. The early Neolithic Anatolian farmers (Aceramic Anatolian farmers) had majority ancestry from AHG with some gene flow from the first farmers from Iran (Feldman et al., 2019). AHG, in turn, had ancestry from Levant HG (Natufians) and a still-uncharacterized HG group related to the ancestors of WHG individuals from central Europe – a gene flow event that likely occurred in the late Pleistocene (Feldman et al., 2019). Using qpAdm, we confirmed that early Anatolian farmers could be modeled as a mixture of AHG and Iran Neolithic farmer-related groups (Supplementary file 2C). To learn about the timing of the genetic formation of early Anatolian farmers, we applied DATES using Iran Neolithic farmer-related individuals as one reference and groups with AHG ancestry as the other. Because few samples with AHG ancestry are available, we instead represented the main ancestry components of AHG with a pool of WHG-related and Levant Neolithic farmer-related individuals. We note that the application of DATES to three-way admixed groups such as early Anatolian farmers can lead to intermediate dates between the first and second pulse of gene flow unless the reference populations are chosen carefully (Appendix 2—table 1). Our setup with pooled reference populations should reliably recover the timing of the most recent event (in this case, the gene flow from CHG or Iran Neolithic-related groups). We infer that the Iran Neolithic farmer-related gene flow occurred ~10,900 BCE (12,200–9600 BCE) (Figure 3), predating the origin of farming in Anatolia (Bramanti et al., 2009).
During the subsequent millennia, these early farmers further admixed with Levant Neolithic groups to form Anatolian Neolithic farmers, who spread west into Europe and, in the east, mixed with Iran Neolithic farmers to form the Chalcolithic groups of Seh Gabi and Hajji Firuz (Supplementary file 2C). Using DATES, we inferred that these Chalcolithic groups were genetically formed ~7600–5700 BCE (Supplementary file 1B).

Figure 3.

#' @width 6
#' @height 12
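library(ggplot2)        # ggplot(), geom_sf(), coord_sf(), geom_point()
library(ggrepel)        # geom_text_repel()
library(rnaturalearth)  # ne_countries(); returnclass = "sf" also needs the sf package installed
dd=read.table(file = "data/Figure2_Data_map" ,header = T)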
world <- ne_countries(scale = "medium", returnclass = "sf")
ggplot(world) + geom_sf() + coord_sf(xlim = c(-25,120), ylim = c(35,80), expand = FALSE) + 
  geom_point(data = dd, aes(x=long,y=lat),col=dd$col,inherit.aes = FALSE,pch=18,cex=4,show.legend = F) +
  geom_text_repel(data=dd,aes(x=long+10,y=lat),label=dd$sample,cex=4)+
  theme_bw()
# decay curves
par(mai = c(1,1.1,0.5,0.5), mfrow=c(1,2))
data=read.table(file = "data/Figure2_Data_AnatoliaFarmer.fit", header = F)
jout=read.table(file = "data/Figure2_Data_AnatoliaFarmer.jout",header = F)
plot(x=data$V1,y=data$V2,type="p",col="darkorchid",pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
     las=1,main = "Early Anatolian farmers",cex=2,cex.main=1.5,cex.axis=1.2,cex.lab=1.2,
     ylab ="")
title(ylab = "Ancestry covariance", mgp = c(4.2, 1, 0)) 
lines(x=data$V1,y=data$V3,type = "l",col="darkorchid",lty=2)
nrmsd=round(NRMSD(data$V2,data$V3),4)
len=paste(paste(paste("DATES estimate (gen)",paste(round(jout$V2,0),round(jout$V5,0),sep = " ± "),sep = ": "),
                paste("DATES estimate (BCE)",paste(round(jout$V2,0)*28+8071,round(jout$V5,0)*28,sep = " ± "),sep = ": "),sep = "\n"),
          paste("NRMSD",nrmsd,sep="="),sep = "\n")
legend("topright",legend = len,col = "darkorchid",lty=c(2,-1,-1), cex=1,bty='n')

data=read.table(file = "data/Figure2_Data_SteppeFarmer.fit",header = F)
jout=read.table(file = "data/Figure2_Data_SteppeFarmer.jout",header = F)
plot(x=data$V1,y=data$V2,type="p",col="deeppink",pch="*",xlim = c(0,20),xlab = "Genetic Distance(cM)",
     las=1,main = "Early Steppe Pastoralists",cex=2,cex.main=1.5,cex.axis=1.2,cex.lab=1.2,ylab="")
title(ylab = "Ancestry covariance", mgp = c(4.2, 1, 0)) 
lines(x=data$V1,y=data$V3,type = "l",col="deeppink",lty=2)
nrmsd=round(NRMSD(data$V2,data$V3),4)
len=paste(paste(paste("DATES estimate (gen)",paste(round(jout$V2,0),round(jout$V5,0),sep = " ± "),sep = ": "),
                paste("DATES estimate (BCE)",paste(round(jout$V2,0)*28+2881,round(jout$V5,0)*28,sep = " ± "),sep = ": "),sep = "\n"),
          paste("NRMSD",nrmsd,sep="="),sep = "\n")
legend("topright",legend = len,col = "deeppink",lty=c(2,-1,-1), cex=1,bty='n')

Genetic formation of early Anatolian farmers and early Bronze Age Steppe pastoralists.

The top panel shows a map with sampling locations of the target groups analyzed for admixture dating. The bottom panels show the inferred times of admixture for each target using DATES (Distribution of Ancestry Tracts of Evolutionary Signals) by fitting an exponential function with an affine term, $y = Ae^{-\lambda d} + c$, where $d$ is the genetic distance in Morgans and $\lambda = t + 1$, with $t$ the number of generations since admixture (Materials and methods). We start the fit at a genetic distance ($d$) > 0.5 cM (centiMorgans) to minimize confounding with background LD and estimate a standard error by performing a weighted block jackknife, removing one chromosome in each run. For each target, the legend shows the inferred average date of admixture (±1 SE) in generations before the individuals lived and in BCE (accounting for the average age of all the individuals and the mean human generation time), together with the normalized root-mean-square deviation (NRMSD) value used to assess the fit of the exponential curve (Materials and methods). The bottom left panel shows the ancestry covariance decay curve for early Anatolian farmers, inferred using as one reference group a pool of western hunter-gatherer (WHG)-related and Levant Neolithic farmer-related individuals, serving as a proxy for Anatolian hunter-gatherer (AHG) ancestry, and as the second reference group Iran Neolithic farmer-related individuals. The bottom right panel shows the ancestry covariance decay curve for early Steppe pastoralist groups, including all Yamnaya and Afanasievo individuals as the target group and eastern hunter-gatherer (EHG)-related and Iran Neolithic farmer-related groups as reference populations.
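
The curve fit and standard error described in this legend can be illustrated with a small R sketch on synthetic data. It is not the DATES implementation: `fit_decay` and `jackknife_se` are hypothetical names, the simulated curve and the leave-one-out values are made up, and the jackknife shown is the delete-m_j weighted form, which is assumed here to correspond to the weighting used (with block weights standing in for, e.g., SNP counts per chromosome).

```r
# Sketch only: synthetic decay curve with a known admixture time.
set.seed(1)
d_cM   <- seq(0.1, 20, by = 0.1)      # genetic distance bins in cM
t_true <- 50                          # simulated generations since admixture
y <- 0.02 * exp(-(t_true + 1) * d_cM / 100) + 0.001 +
  rnorm(length(d_cM), sd = 1e-4)      # exponential decay + affine term + noise

# Fit y = A * exp(-lambda * d) + c with d in Morgans, using bins > min_cM only.
fit_decay <- function(d_cM, y, min_cM = 0.5) {
  keep <- d_cM > min_cM
  dat  <- data.frame(d = d_cM[keep] / 100, y = y[keep])
  fit  <- nls(y ~ A * exp(-lambda * d) + c0, data = dat,
              start = list(A = max(dat$y), lambda = 30, c0 = 0),
              control = nls.control(maxiter = 200))
  unname(coef(fit)["lambda"]) - 1     # lambda = t + 1, so t = lambda - 1
}
t_hat <- fit_decay(d_cM, y)           # should be close to t_true

# Weighted (delete-m_j) block jackknife standard error.
# theta_hat: estimate from all blocks; theta_loo: estimates with one block
# (chromosome) removed; m: block weights (assumed here to be SNP counts).
jackknife_se <- function(theta_hat, theta_loo, m) {
  g <- length(theta_loo)
  n <- sum(m)
  h <- n / m
  theta_j <- g * theta_hat - sum((1 - m / n) * theta_loo)
  pseudo  <- h * theta_hat - (h - 1) * theta_loo
  sqrt(sum((pseudo - theta_j)^2 / (h - 1)) / g)
}

# Made-up leave-one-out estimates for four equally weighted blocks.
se_hat <- jackknife_se(theta_hat = 50.5, theta_loo = c(49, 50, 51, 52),
                       m = rep(100, 4))
```

In a real analysis the leave-one-out estimates would come from refitting the decay curve after removing each chromosome in turn; with equal block weights the formula reduces to the ordinary delete-one jackknife.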

In Europe, the Anatolian Neolithic farmers mixed with the local indigenous HGs contributing between ~40% and 98% of ancestry to the Neolithic Europeans. To elucidate the fine-scale patterns and regional dynamics of these mixtures, we applied DATES to time transect samples from 94 groups (n=657) sampled from 16 regions in Europe, ranging from ~6000 to 1900 BCE and encompassing individuals from the early Neolithic to Chalcolithic periods (Supplementary file 1A). Using qpAdm, we first confirmed that the Neolithic Europeans could be modeled as a mixture of European HG-related ancestry and Anatolian farmer-related ancestry and inferred their ancestry proportions (Supplementary file 2D). For most target populations (~80%), we found the model of gene flow between Anatolian farmer-related and WHG-related ancestry provided a good fit to the data (p-value > 0.05). In some populations, we found variation in the source of the HG-related ancestry and including either EHG- or GoyetQ2-related ancestry groups improved the fit of the model. In five groups, none of the models fit, despite excluding outlier individuals whose ancestry profile differed from the majority of the individuals in the group (Supplementary file 2E). To confirm that the target populations do not harbor Steppe pastoralist-related ancestry, we applied D-statistics of the form D(Mbuti, target, Anatolian farmers, Steppe pastoralists) where target = Neolithic European groups. We observed that four groups had a stronger affinity to Steppe pastoralists compared to Anatolian farmers, and hence we excluded these from further analysis (Supplementary file 2F). After filtering, we applied DATES to 86 European Neolithic groups using WHG-related individuals and Anatolian farmers as reference populations.
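
To make the D-statistic test above concrete, the following R sketch computes D(W, X; Y, Z) from per-SNP allele frequencies using the standard ratio form, sum((w − x)(y − z)) / sum((w + x − 2wx)(y + z − 2yz)). It is a toy illustration under stated assumptions, not the software used in the study: `d_stat` is a hypothetical helper, the frequencies are invented, and in practice significance would be assessed with a block jackknife Z-score rather than from the sign alone.

```r
# Hypothetical helper: D(W, X; Y, Z) from per-SNP allele frequencies.
# With W an outgroup, D > 0 suggests X shares more alleles with Z than with Y,
# and D < 0 suggests X shares more alleles with Y.
d_stat <- function(w, x, y, z) {
  num <- (w - x) * (y - z)
  den <- (w + x - 2 * w * x) * (y + z - 2 * y * z)
  sum(num) / sum(den)
}

# Invented frequencies at three SNPs.
outgroup <- c(0.0, 0.0, 0.0)   # stands in for Mbuti
farmer   <- c(0.2, 0.5, 0.8)   # stands in for Anatolian farmers (Y)
steppe   <- c(0.8, 0.5, 0.2)   # stands in for Steppe pastoralists (Z)

# A target identical to the Steppe proxy gives a positive D,
# and a target identical to the farmer proxy gives a negative D.
d_steppe_like <- d_stat(outgroup, steppe, farmer, steppe)
d_farmer_like <- d_stat(outgroup, farmer, farmer, steppe)
```

Under this sign convention, a target group with a significantly positive D(Mbuti, target; Anatolian farmers, Steppe pastoralists) would be flagged as having excess affinity to Steppe pastoralists.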

Earlier analysis has suggested that farming spread along two main routes in Europe, from southeast to central Europe (‘continental route’) and along the Mediterranean coastline to Iberia (‘coastal route’) (Gronenborn et al., 2014; Guilaine et al., 2003; Rivollat et al., 2020). Consistent with this, we inferred that one of the earliest timings of gene flow was in the Balkans around 6400 BCE. Using the most comprehensive time-transect in Hungary with 19 groups (n=63) spanning from middle Neolithic to late Chalcolithic, we inferred that the admixture dates ranged from ~6100 to 4500 BCE. Under a model of a single shared gene flow event in the common ancestors of all individuals, we would expect to obtain similar dates of admixture (before present) after accounting for the age of the ancient specimens. Similar to Lipson et al. (2017), we observed that the estimated dates in middle Neolithic individuals were substantially older than those inferred in late Neolithic or Chalcolithic individuals (Bollongino et al., 2013). This would be expected if the underlying model involved multiple pulses of gene flow, such that the timing in the middle Neolithic samples reflects the initial two-way mixture and the timing in the Chalcolithic samples captures both recent and older events. Interestingly, Lipson et al. (2017) and other recent studies have documented increasing HG ancestry from ~3% to 15% from the Neolithic to Chalcolithic period (Haak et al., 2015; Lipson et al., 2017; Rivollat et al., 2020), suggesting that there was additional HG gene flow after the initial mixture. This highlights that the interactions between local HGs and incoming Anatolian farmers were complex, with multiple gene flow events or continuous admixture between these two groups, which explains the increasing HG ancestry and more recent dates in Chalcolithic individuals (Supplementary file 2D).
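
The effect of multiple pulses on a single-pulse date can be illustrated with a toy calculation in R. The sketch assumes, as a simplification that is not taken from the DATES model itself, that the ancestry covariance curve of a two-pulse history is an equal-weight sum of two exponentials, one per pulse; all values and variable names are illustrative. Fitting a single exponential with an affine term to such a curve yields a date between the two pulses.

```r
# Toy two-pulse curve (assumption: equal-weight sum of two exponential decays).
d        <- seq(0.5, 20, by = 0.1) / 100   # genetic distance in Morgans (0.5-20 cM)
t_old    <- 100                            # older pulse, generations ago
t_recent <- 20                             # more recent pulse, generations ago
y <- 0.5 * exp(-(t_old + 1) * d) + 0.5 * exp(-(t_recent + 1) * d)

# Fit the single-pulse model y = A * exp(-lambda * d) + c to the two-pulse curve.
fit <- nls(y ~ A * exp(-lambda * d) + c0,
           start = list(A = 1, lambda = 50, c0 = 0),
           control = nls.control(maxiter = 200))
t_single <- unname(coef(fit)["lambda"]) - 1  # lambda = t + 1

# The single-pulse estimate falls between the two true admixture times.
t_single > t_recent && t_single < t_old
```

Where the estimate lands between the two pulses depends on their relative contributions and on the range of genetic distances used in the fit, so the specific value from this toy example should not be read as a prediction for real data.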

Mirroring the pattern in Hungary, we documented the resurgence of HG ancestry in the Czech Republic, France, Germany, and southern Europe. In central Europe, we inferred that the Anatolian farmer-related gene flow ranged between ~5600 and 5000 BCE across Germany and the Czech Republic, with some exceptions. For instance, at the Blätterhöhle site in Germany, the inferred dates were more recent (~4000 BCE), consistent with the occupation of this region by both HGs and farmers until the late Neolithic (Lipson et al., 2017). In eastern Europe, using samples related to the Funnel Beaker culture (TRB; from German Trichterbecher) from Poland, we inferred that the Anatolian farmer-related gene flow occurred on average ~4700 BCE (5300–4200 BCE). Following the TRB decline, the Baden culture and the Globular Amphora culture (GAC) appeared in many areas of Poland and Ukraine (Fernandes et al., 2018). These cultures had close contact with the Corded Ware complex (CWC) and Steppe pastoralist societies, though we found that a parsimonious model without Steppe pastoralist-related ancestry provides a good fit to the GAC individuals (Supplementary file 2D). Applying DATES, we inferred that the Anatolian farmer and HG-related mixture in GAC ranged between ~4700 and 3900 BCE, predating the spread of Steppe pastoralists to eastern Europe (Allentoft et al., 2015; Haak et al., 2015).

Along the Mediterranean route, we characterized Anatolian farmer-related gene flow in Italy, Iberia, France, and the British Isles. Using samples from five groups in Italy, we inferred the earliest dates of gene flow of ~6100 BCE, and within the millennium, the Anatolian farmer-related ancestry spread from Sardinia to Sicily (Figure 2). In Iberia, the Anatolian farmer-related mixture ranged from ~5700 to 4300 BCE and showed evidence for an increase in HG ancestry from ~9% to 20% after the initial gene flow. In France, previous studies have shown that Anatolian farmer-related ancestry came from both routes, along the continental route in the north and along the coastal route in the south (Rivollat et al., 2020). This is reflected in the source of the HG ancestry, which is predominantly EHG and WHG-related in the north and includes WHG and Goyet-Q2 ancestry in the south (Rivollat et al., 2020). Consistently, we also observed that the admixture dates in France were structured along these routes, with a median estimate of ~5100 BCE in the east and a much older estimate of ~5500 BCE in the south (Supplementary file 1B). In Scandinavia, we inferred markedly more recent dates of admixture of ~4300 BCE using samples from Sweden associated with the TRB culture and Ansarve Megalithic tombs, consistent with a late introduction of farming to Scandinavia (Mittnik et al., 2018).

Finally, we inferred recent dates of admixture in Neolithic samples from the British Isles (England, Scotland, and Ireland), with a median timing of ~5000 BCE across the three regions. Interestingly, unlike in western and southern Europe, we obtained overlapping dates across eight groups, including early to late Neolithic samples from the British Isles. This is consistent with previous studies that suggest there was no resurgence in HG ancestry during the Neolithic in Britain (Brace et al., 2019). Thus, our dates can be interpreted as the time of the main mixture of HGs and Anatolian farmers in this region, implying that the farmer-related ancestry reached Britain a millennium after its arrival in continental Europe. By 4300 BCE, we find that Anatolian farmer-related ancestry is present in nearly all regions in Europe.

Late Neolithic to Bronze Age

The beginning of the Bronze Age (BA) was a period of major cultural and demographic change in Eurasia, accompanied by the spread of Yamnaya Steppe pastoralist-related ancestry from the Pontic-Caspian steppes across Europe and South Asia (Haak et al., 2015). The archaeological record documents that the early Steppe pastoralist cultures of Yamnaya and Afanasievo, with characteristic burial styles and pottery, appeared ~3300–2600 BCE (Morgunova and Khokhlova, 2016). These groups were formed as a mixture of EHG-related individuals and CHG-related groups associated with the first farmers from Iran (Jones et al., 2015; Narasimhan et al., 2019; Wang et al., 2019). Using qpAdm, we first tested how well this model fits the data from eight early Steppe pastoralist groups, including seven groups associated with the Yamnaya culture and one group related to the Afanasievo culture (Materials and methods). For all but two Yamnaya groups (from Hungary Baden and Russia Kalmykia), we found that this model provides a good fit to the data (Supplementary file 2G). We note that the samples from Kalmykia in our dataset were shotgun sequenced, and in the qpAdm analysis, we are mixing shotgun and capture data, which could potentially lead to technical issues. To understand the timing of the formation of the early Steppe pastoralist-related groups, we applied DATES using pooled EHG-related and pooled Iranian Neolithic farmer-related individuals. Focusing on the groups with the largest sample sizes, Yamnaya Samara (n=10) and Afanasievo (n=19), we inferred that the admixture occurred between 40 and 45 generations before the individuals lived, translating to an admixture timing of ~4100 BCE (Supplementary file 1B). We obtained qualitatively similar dates across four Yamnaya groups and one Afanasievo group, consistent with the findings that these groups descend from a recent common ancestor (we note that for the Ozera samples from Ukraine, the dates were not significant).
This is further supported by the insight that the genetic differentiation across early Steppe pastoralist groups is very low ($F_{ST}$ ~ 0.000–0.006) (Supplementary file 2H). Thus, we combined all early Steppe pastoralist individuals in one group to obtain a more precise estimate for the genetic formation of proto-Yamnaya of ~4400–4000 BCE (Figure 3). These dates are noteworthy as they predate the archaeological evidence by more than a millennium (Anthony, 2007) and have important implications for understanding the origin of proto-Pontic Caspian cultures and their spread to Europe and South Asia.

Over the following millennium, the Yamnaya-derived ancestry spread across Europe through CWC and Bell Beaker complex (BBC) cultures. Present-day Europeans derive between ~10% and 60% Steppe pastoralist-related ancestry, which was not seen in Neolithic samples. To obtain a precise chronology of the spread of Steppe pastoralist-related ancestry across Europe, we analyzed 109 late Neolithic, Chalcolithic, and BA samples dated between 3000 and 750 BCE from 18 regions, including samples associated with the CWC and BBC cultures. We first confirmed that most target samples had Steppe pastoralist-related ancestry, in addition to European HG-related and Anatolian farmer-related ancestry using qpAdm. We excluded 20 groups that could not be parsimoniously modeled as a three-way mixture even after removing individual outliers. After filtering, we retained 79 groups for dating Steppe pastoralist-related gene flow across Europe (Supplementary file 2I and J). As BA Europeans have ancestry from three distinct groups, we applied DATES using the following two reference populations: one group including early Steppe pastoralists (Yamnaya and Afanasievo) and another group that is a proxy for the ancestral Neolithic European population, built from pooled samples of WHG-related and Anatolian farmer-related individuals.

To learn about the spread of CWC culture across Europe, we used seven late Neolithic and Bronze Age groups, including five associated with CWC artifacts. Using DATES, we inferred that the oldest date of Steppe pastoralist-related gene flow in Europe was ~3200 BCE in Scandinavia, in samples associated with the Battle Axe Culture in Sweden and the Single Grave Culture in Denmark, both contemporary with the CWC. The samples from Scandinavia showed large heterogeneity in ancestry, including some individuals with majority Steppe pastoralist-related ancestry (and negligible amounts of Anatolian farmer-related ancestry), consistent with patterns expected from recent gene flow (Malmström et al., 2019). Strikingly, we inferred the timing of admixture in central Europe (Germany and the Czech Republic) and eastern Europe (Estonia and Poland) to be remarkably similar. These dates fall within a narrow range of ~3000–2900 BCE across diverse regions, suggesting that the mixed population associated with the Corded Ware culture formed over a short time and spread across Europe rapidly with very little further mixture (Supplementary file 1B).

Following the Corded Ware culture, from around 2800 to 2300 BCE, Bell Beaker pottery became widespread across Europe (Fokkens and Nicolis, 2012). Using 19 Chalcolithic and BA samples, including 10 associated with Beaker-complex artifacts, we inferred the dynamics of the spread of the Beaker complex across Europe. We inferred the oldest date of Steppe pastoralist-related admixture was ~3200 BCE (3600–2800 BCE) in early Bronze Age (EBA) Mallorca samples from Iberia. We note the EBA Mallorca sample is not directly associated with Beaker culture, but qpAdm modeling suggests that this individual forms a clade with the small subset of Iberian Beaker-complex-associated individuals who carried Steppe pastoralist-related ancestry (Fernandes et al., 2020). Most individuals from Iberia, however, had negligible Steppe pastoralist-related ancestry, suggesting the Beaker culture was not accompanied by major gene flow in Iberia despite the earliest dates (Supplementary file 2I). In central and western Europe, where Steppe pastoralist gene flow was more pervasive, we inferred the median date of the mixture was ~2700 BCE with the oldest dates in the Netherlands, followed by Germany and France (Figure 2). There was, however, large heterogeneity in the dates across Europe and even within the same region. For example, comparing two BA groups from the Netherlands suggests a wide range of dates between ~3000 BCE and 2500 BCE, and four groups from Germany indicate a range of ~2900–2700 BCE. From central Europe, the Steppe pastoralist-related ancestry spread quickly to the British Isles, where people with Steppe pastoralist ancestry replaced 90% of the genetic ancestry of individuals from Britain. Our estimates for the time of gene flow in Bell Beaker samples from England suggest that the gene flow occurred ~2700 BCE (2770–2550 BCE). Our estimated dates of admixture are older than the dates of arrival of this ancestry in Britain (Olalde et al., 2018) and, interestingly, overlap the dates in central Europe. Given that a significant fraction of the Beaker individuals were recent migrants from central Europe, we interpret our dates as reflecting admixture into the ancestors of the British Beaker people that occurred in mainland Europe (Olalde et al., 2018).

The middle to late Bronze Age (MLBA) led to the final integration of Steppe pastoralist-related ancestry in Europe. In southern Europe, EBA samples had limited Steppe pastoralist-related ancestry, though present-day individuals harbor between ~5% and 30% of this ancestry (Haak et al., 2015). Using pooled samples of MLBA from Spain, we inferred major mixture occurred ~2500 BCE in Iberia. We inferred a similar timing in Italy using individuals associated with the Bell Beaker culture and EBA samples from Sicily (Supplementary file 1B). In Sardinia, a majority of the BA samples do not have Steppe pastoralist-related ancestry. In a few individuals, we found evidence for Steppe pastoralist-related ancestry, though in most cases, this ancestry proportion overlapped 0 and the inferred dates of admixture were very noisy (Supplementary file 2I). Using Iron Age samples from Sardinia, we inferred the gene flow occurred ~2600 BCE, though there is a large uncertainty associated with this estimate (3700–1490 BCE). In other parts of continental Europe and the British Isles, the Steppe pastoralist-related ancestry was diluted over time, as evidenced by more recent dates in LBA (late Bronze Age) than EBA or MBA (middle Bronze Age) samples in Germany, England, and Scotland, and an increase in Neolithic farmer ancestry during this period (Olalde et al., 2019; Supplementary file 1B).

Finally, the CWC expanded to the east to form the archaeological complexes of Sintashta, Srubnaya, Andronovo, and the BA cultures of Kazakhstan. Samples associated with these cultures harbor mixed ancestry from the Yamnaya Steppe pastoralist-related groups (CWC, in some cases) and Neolithic individuals from central Europe (Supplementary file 2K; Narasimhan et al., 2019). Applying DATES to eight MLBA Steppe pastoralist groups, we inferred the precise timing for the formation of these groups beginning in the third millennium BCE. These groups were formed chronologically, with the date of genetic formation of ~3200 BCE for Sintashta culture, followed by ~2900 BCE for Srubnaya and Andronovo cultures. In the central Steppe region (present-day Kazakhstan), we obtained median dates of ~2800 BCE for the expansion of Steppe pastoralist-related ancestry in four Kazakh cultures of Maitan Alakul, Aktogai, and Kairan. By ~2700 BCE, most of these cultures had almost 60–70% Yamnaya Steppe pastoralist-related ancestry (Supplementary file 1B). These groups, in turn, expanded eastwards, transforming the genetic composition of populations in South Asia.

Discussion

We developed DATES, a method that measures ancestry covariance patterns in a single diploid individual genome to estimate the time of admixture. Using extensive simulations, we show that DATES provides accurate estimates of the timing of admixture across a range of demographic scenarios. Application of DATES to present-day samples shows that the results are concordant with published methods – ROLLOFF, ALDER, and Globetrotter. For sparse datasets, DATES outperforms published methods as it does not require phased data and works reliably with limited samples, large proportions of missing variants, as well as pseudo-haploid genotypes. This makes DATES ideally suited for the analysis of ancient DNA samples. We illustrate the application of DATES by reconstructing population movements and admixtures during the European Holocene. We confirm and extend signals that were previously identified, such as the resurgence of HG ancestry during the Neolithic, and provide new details about the genetic formation of the ancestral populations of Europeans and the spread of CWC and BBC cultures across Europe. Together, our analysis provides a detailed timeline and insights into the dynamics of the Neolithization of Europe and the spread of Steppe pastoralist-related ancestry across Europe.

First, we document that the Mesolithic HGs formed as a mixture of WHG and EHG ancestry ~10,200–7400 BCE. These dates are consistent with the archaeological evidence for the appearance of lithic technology associated with eastern HGs in Scandinavia and the Baltic regions (Günther et al., 2018; Kashuba et al., 2019). Next, we studied the timing of the genetic formation of Anatolian farmers. The earliest evidence of agriculture comes from the Fertile Crescent, the southern Levant, and the Zagros Mountains of Iran and is dated to around 10,000 BCE. In central Anatolia, farming has been documented c. 8300 BCE (Baird et al., 2018; Bellwood, 2005). It has long been debated whether Neolithic farming groups from Iran and the Levant introduced agriculture to Anatolia or HGs in the region locally adopted agricultural practices. The early Anatolian farmers can be modeled as a mixture of local HGs related to Caucasus HGs or the first farmers from Iran (Feldman et al., 2019). By applying DATES (assuming a single instantaneous admixture), we inferred that the Iran Neolithic gene flow occurred around 10,900 BCE (~12,200–9600 BCE). An alternate possibility is that there was a long period of gradual gene flow between the two groups and our dates reflect intermediate dates between the start and end of the gene flow. An upper bound for such a mixture comes from the lack of Iran Neolithic ancestry in AHGs at 13,000 BCE, and a lower bound comes from the C14 dates of early Anatolian farmers, one of which is directly dated at 8269–8210 BCE (Feldman et al., 2019). In either case (instantaneous admixture or gradual gene flow), the genetic mixture that formed Anatolian farmers predates the advent of agriculture in this region (Baird et al., 2018; Bellwood, 2005). This supports the model that AHGs locally transitioned to agricultural subsistence, and most probably, there was cultural diffusion from other regions in the Near East (Iran and Levant) (Feldman et al., 2019). Future studies with more dense temporal sampling will shed light on the demographic processes that led to the transition from foraging to farming in the Near East, and in turn, elucidate the relative roles of demic and cultural diffusion in the dispersal of technologies like agriculture across populations.

Using data from 16 regions in Europe, we reconstruct a detailed chronology and dynamics of the expansion of Anatolian farmers during the Neolithic period. We infer that starting in ~6400 BCE, gene flow from Anatolian farmers became widespread across Europe, and by ~4300 BCE, it was present in almost all parts of continental Europe and the British Isles. These dates are significantly more recent than the estimates of farming based on archaeological evidence in some parts of Europe, suggesting that the local HGs and farmers coexisted for more than a millennium before the mixture occurred (Haak et al., 2015; Lipson et al., 2017). In many regions, after the initial mixture, there was a resurgence of HG ancestry, highlighting the complexities of these ancient interactions. We note that our results are consistent with two previous genetic studies, Lipson et al., 2017 and Rivollat et al., 2020, that applied genetic dating methods to a subset of samples we used in our analysis. Lipson et al., 2017 used a modified version of ALDER to infer the timing of admixture in three regions (n=151). We obtained statistically consistent results for all overlapping samples (within two standard errors) (Appendix 1—table 6). An advantage of our approach over the modified ALDER approach is that we do not rely on helper samples (higher coverage individuals combined with the target group) for dating; unless these have a similar ancestry profile, they could bias the inferred dates. Our results are concordant with Rivollat et al., 2020, who used a previous version of DATES to infer the timing of Neolithic gene flow in 32 groups (vs. 86 groups in our study). We find the performance of both versions of DATES is similar, though some implementation details have improved (Appendix 1—table 1).

The second major migration occurred when populations associated with the Yamnaya culture in the Pontic-Caspian steppes expanded across Europe. Our analysis reveals the precise timing of the genetic formation of the early Steppe pastoralist groups – Yamnaya and Afanasievo – occurred ~4400–4000 BCE. This estimate predates the archaeological evidence by more than a millennium (Anthony, 2007) and suggests the presence of an ancient ‘ghost’ population of proto-Yamnaya around this time. Understanding the source and location of this ghost population will provide deep insights into the history of Pontic-Caspian cultures and the origin of Indo-European languages, which are thought to have spread with Steppe pastoralist ancestry to Europe and South Asia (Haak et al., 2015; Kassian et al., 2021). Starting in ~3200 BCE, the Yamnaya-derived cultures of CWC and BBC spread westwards, bringing Steppe pastoralist-related ancestry to Europe. Our analysis reveals striking differences in the spread of these two cultures: the CWC formation is similar across diverse regions separated by thousands of kilometers, suggesting a rapid spread after the initial formation of this group, while the spread of BBC culture was more complex and heterogeneous across regions. We find the earliest evidence of Steppe pastoralist-related ancestry in Iberia around 3200 BCE, though this ancestry only becomes widespread after 2500 BCE. In central Europe, the gene flow occurred simultaneously with archaeological evidence and was coexisting with the CWC in some parts (van Willigen, 2001; Olalde et al., 2018). Finally, in the British Isles, the Bell Beaker culture spreads rapidly from central Europe and replaces almost 90% of the ancestry of individuals in this region (Olalde et al., 2018).

Recent analysis has shown remarkable parallels in the history of Europe and South Asia, with both groups deriving ancestry from local indigenous HGs, Near Eastern farmers, and Steppe pastoralist-related groups (Narasimhan et al., 2019). Interestingly, however, the timing of the two major migration events differs across the two subcontinents. Both mixtures occurred in Europe almost a millennium before they occurred in South Asia. In Europe, the Neolithic migrations primarily involved Anatolian farmers, while the source of Neolithic ancestry is closer to Iran Neolithic farmers in South Asia. The Steppe pastoralist-related gene flow occurred in the context of the spread of CWC and BBC cultures in Europe around 3200–2500 BCE; in South Asia, this ancestry arrived with Steppe MLBA cultures that were formed much later in 1800–1500 BCE (Narasimhan et al., 2019). The Steppe MLBA groups have ancestry from Steppe pastoralist-derived groups and European Neolithic farmers following the eastward expansion of CWC groups between ~3200 and 2700 BCE. Understanding the origin and migration paths of the ancestral groups thus helps to illuminate the differences in the timeline of the spread of Steppe pastoralists across the two subcontinents of Eurasia.

Genomic dating methods like DATES provide an independent and complementary approach for reconstructing population history. By focusing on the genetic clock based on recombination rate, we provide an independent estimate of the timing of evolutionary events up to several thousands of years. Our analysis also has advantages over the temporal sampling of ancient DNA, in that we can obtain direct estimates of when a population was formed, rather than inferring putative bounds for the timing based on the absence/presence of a particular ancestry signature (which may be sensitive to sampling choice or density). Genetic approaches provide complementary evidence to archaeology and linguistics as they date the time of admixture and not migration. Both dates are similar in many contemporary populations like African Americans and Latinos, though this may not be generally true (Hellenthal et al., 2014). This is underscored by our dates for the Anatolian farmer-related mixture, which postdates evidence of material culture related to agriculture by almost two millennia in some regions. This suggests that European HGs and farmers resided side by side for several thousand years before mixing (Bollongino et al., 2013; Skoglund et al., 2014). This underscores how genetic dates can provide complementary evidence to archaeology and help to build a comprehensive picture of population origins and movements.

Materials and methods

Dataset

We analyzed 1096 ancient European samples from 152 groups, restricting to data from 1,233,013 autosomal SNP positions that were genotyped using the Affymetrix Human Origins array (the V44.3 release of the AADR; https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data). We filtered this dataset to remove samples that were marked as contaminated, low coverage, outliers, duplicates, or first- or second-degree relatives. We grouped individuals together from a particular culture or region. Details of the sample affiliations and groupings used are described in Supplementary file 1A.

Modeling admixture history

We applied qpAdm from ADMIXTOOLS to identify the best fitting model and estimate the ancestry proportions in a target population modeled as a mixture of n ‘reference’ populations using a set of ‘Outgroup’ populations (Haak et al., 2015). We set the details: YES parameter, which reports a normally distributed Z-score to evaluate the goodness of fit of the model (standard errors were estimated with a Block Jackknife). For each target population, we chose the most parsimonious model, that is, fitting the data with the minimum number of source populations. We excluded models where the p-value < 0.05, indicating a poor fit to the data. Details of the qpAdm analysis for each group are reported in Supplementary file 2. We also applied D-statistics in some cases using qpDstat in ADMIXTOOLS with default parameters.

DATES: model and implementation

DATES leverages the weighted ancestry covariance patterns across the genome of an admixed individual to infer the time of admixture. This method extends the idea introduced in ROLLOFF, ALDER, and Moorjani et al., 2016 to be applicable to dating admixture events between modern human populations using a single genome.

Basic model and notation

Assume we have an admixed individual C with ancestry from source populations A and B, with ancestry proportion of $α$ and $β = (1 - α)$ , respectively. This mixture occurred $t$ generations ago. First, we model the genotypes of C as a linear mix of allele frequencies of populations A and B. For any SNP i, let the genotype of C be $g_{i}$ and allele frequency in A and B be $p_{A} (i)$ and $p_{B} (i) .$ We can then infer the mixing fraction $α$ from population A by solving the simple linear regression by minimizing the residuals.

R = \sum_{i} {(g_{i} - (α p_{A} (i) + (1 - α) p_{B} (i)))}^{2}
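Because R is quadratic in $α$, the minimizer has a closed form. The following is a minimal sketch of that regression, not the DATES implementation: the function name `estimate_alpha` is hypothetical, and we assume genotypes are scaled to the allele-frequency range [0, 1] (allele count divided by two) with missing calls stored as NaN.

```python
import numpy as np

def estimate_alpha(g, p_a, p_b):
    """Least-squares estimate of the mixing fraction alpha from population A.

    g   : genotypes of C scaled to [0, 1] (assumption: allele count / 2)
    p_a : allele frequencies in source population A
    p_b : allele frequencies in source population B
    """
    g, p_a, p_b = (np.asarray(v, dtype=float) for v in (g, p_a, p_b))
    keep = ~np.isnan(g)                 # skip missing genotypes
    x = (p_a - p_b)[keep]               # regressor: frequency difference
    y = (g - p_b)[keep]                 # response: genotype relative to B
    # Setting dR/d(alpha) = 0 gives alpha = sum(x * y) / sum(x * x).
    return float(np.dot(x, y) / np.dot(x, x))

# Toy check with noise-free "expected" genotypes at alpha = 0.3.
rng = np.random.default_rng(0)
p_a = rng.uniform(0.05, 0.95, 5000)
p_b = rng.uniform(0.05, 0.95, 5000)
g = 0.3 * p_a + 0.7 * p_b
alpha_hat = estimate_alpha(g, p_a, p_b)
```

With real genotypes the estimate is noisy at any single SNP, so the regression only becomes informative when it is taken over many markers.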

Let $a_{i}$ be the probability of observing $g_{i}$ in C given the observed genotype in A, and $b_{i}$ be the probability of observing $g_{i}$ in C given the observed genotype in B

a_{i} = P (g_{i} | A)

b_{i} = P (g_{i} | B)

We can then compute the likelihood L_i of observing a genotype $g_{i}$ in the admixed individual

L_{i} = α a_{i} + β b_{i}

For a pair of neighboring markers S₁, S₂ located at a genetic distance of d Morgans, the probability of no recombination between the two markers is given by $θ = e^{- t d} .$ Accounting for recombination, the log-likelihood that the two markers have the same ancestry is then given by:

L = \log [(1 - θ) L_{1} L_{2} + θ (α a_{1} a_{2} + β b_{1} b_{2})]

Let K_i represent the ancestry at marker S_i. Expanding as a power series in θ, the coefficient of $θ$ is QK₁K₂, where

Q = α β

K_{i} = \frac{(a_{i} - b_{i})}{L_{i}}
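The coefficient can be checked with a short expansion, which we reconstruct here from the definitions above as a reading aid. Collecting the terms in $θ$, the argument of the logarithm is

L_{1} L_{2} + θ (α a_{1} a_{2} + β b_{1} b_{2} - L_{1} L_{2})

Substituting $L_{i} = α a_{i} + β b_{i}$ and using $α + β = 1$ gives

α a_{1} a_{2} + β b_{1} b_{2} - L_{1} L_{2} = α β (a_{1} - b_{1}) (a_{2} - b_{2})

so that, dividing by $L_{1} L_{2}$ and expanding the logarithm to first order in $θ$,

L = \log (L_{1} L_{2}) + \log (1 + θ Q K_{1} K_{2}) = \log (L_{1} L_{2}) + θ Q K_{1} K_{2} + O (θ^{2})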

We can compute the ancestry covariance, A(d), across pairs of markers S₁, S₂ separated by distance d as

A (d) = \frac{\sum_{S (d)} (K_{1} - \bar{K}_{1}) (K_{2} - \bar{K}_{2})}{| S (d) |}

where S(d) is a set of markers S₁, S₂ located d Morgans apart.
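The binning of marker pairs by genetic distance can be sketched as follows. This is an illustrative, unweighted version for a single chromosome, not the DATES code: the function name `ancestry_covariance` and its parameters are hypothetical, and we assume that each $K_{i}$ is centred by the mean of K over all markers.

```python
import numpy as np

def ancestry_covariance(K, pos, max_d=0.2, bin_size=0.001):
    """Binned covariance of K between marker pairs on one chromosome.

    K        : per-marker ancestry statistic K_i
    pos      : genetic positions in Morgans, sorted in ascending order
    max_d    : largest pair distance considered (Morgans)
    bin_size : width of each distance bin (Morgans)
    Returns (bin_centres, A), where A[b] averages the centred products
    K_1 * K_2 over all pairs whose distance falls in bin b.
    """
    K = np.asarray(K, dtype=float)
    pos = np.asarray(pos, dtype=float)
    Kc = K - K.mean()                       # centring by the overall mean (assumption)
    n_bins = int(round(max_d / bin_size))
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for i in range(len(pos) - 1):
        d = pos[i + 1:] - pos[i]            # distances to all later markers
        near = d < max_d
        b = np.minimum((d[near] / bin_size).astype(int), n_bins - 1)
        np.add.at(sums, b, Kc[i] * Kc[i + 1:][near])
        np.add.at(counts, b, 1)
    with np.errstate(invalid="ignore"):
        A = sums / counts                   # NaN where a bin holds no pairs
    centres = (np.arange(n_bins) + 0.5) * bin_size
    return centres, A

# Tiny example: four markers, three distance bins of 0.1 cM.
centres, A = ancestry_covariance([1, -1, 1, -1],
                                 [0.0, 0.0004, 0.0016, 0.0028],
                                 max_d=0.003, bin_size=0.001)
```

The pairwise loop is quadratic in the number of markers within `max_d`, which is acceptable for a sketch but would need a faster scheme for genome-wide data.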

The ancestry covariance $A (d)$ is expected to follow an exponential decay with d with the rate of decay depending on the time since admixture ( $t + 1$ ).

A (d) \sim e^{- (t + 1) d}

The factor of (t+1) comes from the insight that in the first generation following admixture, the admixed population derives one chromosome from each ancestral group. The mixing of chromosomes only begins in the following generations as the chromosomes recombine. This means that if we fit t generations, we are likely to underestimate the time of admixture. We note that previous methods like ALDER and ROLLOFF, however, incorrectly fit t generations to infer the time of mixture. In practice, however, this has little effect on the inference except maybe in case of very recent admixture dates. We infer the time of the mixture by fitting an exponential distribution with affine term using least squares. DATES is applicable for dating admixture in a single individual. When multiple individuals from an admixed population are available, DATES computes the log-likelihood by summing over all individuals.

Application to real data

We applied DATES using genome-wide SNP data from the target population and two reference populations. To infer the allele frequency in the ancestral populations more reliably, where specified, we pooled individuals deriving the majority of their ancestry from the population of interest (Supplementary file 1A). We computed the weighted ancestry covariance between 0.45 cM (centiMorgans) (to minimize the impact of background LD) and 100 cM, with a bin size of 0.1 cM. We plotted the weighted covariance with genetic distance and obtained a date by fitting an exponential function with an affine term $y = A e^{- λ d} + c$ , where d is the genetic distance in Morgans and $λ = (t + 1)$ is the rate of decay, with t the number of generations since admixture. We computed standard errors using a weighted block jackknife, where one chromosome was removed in each run (Busing et al., 1999). Following Tournebize et al., 2020, we examined the quality of the exponential fit by computing the normalized root-mean-square deviation (NRMSD) between the empirical ancestry covariance values $z$ and the fitted ones $\hat{z}$ , across all the genetic distance bins (Appendix 1).

\mathrm{NRMSD} = \frac{1}{\max (\hat{z}) - \min (\hat{z})} \sqrt{\frac{\sum^{D} (z - \hat{z})^{2}}{N}}
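The fit and the NRMSD can be illustrated with a small numerical sketch. It is not the DATES fitting routine: the names `fit_decay` and `nrmsd` are hypothetical, the decay rate is found by a simple grid search (for a fixed rate the model is linear in A and c), and we take N to be the number of distance bins.

```python
import numpy as np

def fit_decay(d, y, lam_grid=None):
    """Least-squares fit of y = A * exp(-lam * d) + c.

    For each trial rate lam, A and c are solved exactly by linear least
    squares; the lam with the smallest residual sum of squares is kept.
    d is in Morgans, so lam is in generations.
    """
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    if lam_grid is None:
        lam_grid = np.arange(1.0, 300.0, 0.1)   # illustrative search range
    best = None
    for lam in lam_grid:
        X = np.column_stack([np.exp(-lam * d), np.ones_like(d)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, float(lam), float(coef[0]), float(coef[1]))
    _, lam, A, c = best
    return lam, A, c

def nrmsd(z, z_hat):
    """Root-mean-square deviation normalized by the range of the fitted values."""
    z = np.asarray(z, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    rmsd = np.sqrt(np.mean((z - z_hat) ** 2))
    return float(rmsd / (z_hat.max() - z_hat.min()))

# Synthetic decay curve: 0.45 cM to 100 cM in 0.1 cM bins, true rate 50.
d = np.arange(0.0045, 1.0, 0.001)
rng = np.random.default_rng(1)
y = 0.02 * np.exp(-50.0 * d) + 0.001 + rng.normal(0.0, 1e-4, d.size)
lam, A, c = fit_decay(d, y)
t = lam - 1                                     # generations since admixture
fit_quality = nrmsd(y, A * np.exp(-lam * d) + c)
```

A jackknife standard error would be obtained by repeating the fit with each chromosome left out in turn, which this sketch omits.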

The estimated dates of admixture were considered significant if (a) the Z-score > 2, (b) $λ$ < 200 generations, and (c) NRMSD < 0.7. We converted the inferred dates from generations to years by assuming a mean generation time of 28 years (Moorjani et al., 2016). For ancient samples, we added the sampling age of the ancient specimen (Supplementary file 1A). When multiple individuals were available, we used the average sampling ages to offset the admixture dates. We report dates in BCE by assuming the 1950 convention.
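The significance filter and the conversion to calendar dates amount to simple arithmetic, sketched below. The function names are hypothetical, the sample age in the example is illustrative rather than taken from the dataset, and the conversion ignores the absence of a year zero.

```python
GENERATION_YEARS = 28      # mean generation time assumed in the text
BP_REFERENCE_YEAR = 1950   # "before present" is counted back from 1950 CE

def is_significant(z_score, lam, nrmsd):
    """Apply the three criteria: Z-score > 2, lambda < 200, NRMSD < 0.7."""
    return z_score > 2 and lam < 200 and nrmsd < 0.7

def admixture_date_bce(generations, sample_age_bp):
    """Convert an admixture time in generations to an approximate date in BCE.

    generations   : inferred generations between admixture and the sample
    sample_age_bp : sampling age of the specimen in years before 1950 CE
                    (the average age when several individuals are pooled)
    A negative return value would correspond to a date in CE.
    """
    years_bp = generations * GENERATION_YEARS + sample_age_bp
    return years_bp - BP_REFERENCE_YEAR

# Hypothetical example: 42 generations before an individual dated to 4900 BP.
date = admixture_date_bce(42, 4900)   # 42 * 28 + 4900 - 1950 = 4126 BCE
```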

Comparison of old and new version of DATES

An earlier version of DATES (version v753) was released in Narasimhan et al., 2019. The current method (version v4010) released in this study differs in some key aspects of the implementation as described below.

Use of regression model vs. likelihood approach: In v753, we used a regression model to infer the residuals at each site in the genotype by conditioning on the allele frequency in the reference population and the genome-wide estimate of the admixture proportion (Narasimhan et al., 2019). In contrast, in the current version (v4010), we use a more rigorous likelihood framework where we infer the probability of ancestry from each reference population at each site in the genome (Equation 3).
Rate of decay of exponential fit: In v753, like ALDER and ROLLOFF, we fit an exponential decay with the rate of t generations. This assumes that mosaic chromosomes are formed in the generation when the gene flow occurs. In reality, however, the mixing of ancestry only begins in the following generations as the chromosomes of distinct ancestry recombine. To correctly account for this effect, we fit an exponential with the rate of (t+1) in DATES v4010. In practice, this has a minor effect on the dates reported earlier, as in most cases the uncertainty is much larger than one generation.
Goodness of fit test: In v4010, we implemented the NRMSD to assess the fit of the exponential curve. NRMSD computes the deviation between the empirical estimate and fitted data in order to provide a statistical way to characterize the noisiness of the fitted curve. Lower values of NRMSD suggest a better fit, however, there is no clear interpretation of the absolute value of NRMSD. Based on the empirical distribution of NRMSD values in our study samples (Appendix 1—figure 3), we infer a conservative threshold of 0.7 to define a ‘good’ fit. We caution that users should adjust this threshold based on their application and always visually inspect their exponential fits to ensure reliable results.
Support for arbitrary number of chromosomes: Unlike v753 that was optimized for parameters in humans, the new version supports an arbitrary number of chromosomes (inputted by the user) so DATES can be used in any species.

A comparison of the two versions of DATES using simulated data (Appendix 1—table 1) and empirical data (Appendix 1—table 2, Appendix 1—table 3) yields qualitatively similar results.

Simulations

We constructed admixed genomes following the approach described in Moorjani et al., 2011. This method requires phased haplotypes from two source populations and uses two key parameters to simulate data from admixed individuals, (a) the mixture proportion ( $α$ ) that represents the probability that a particular sampled haplotype comes from one of the reference panels, namely source₁ and source₂, and (b) the time of mixture ( $λ$ ) which is the number of generations since mixture. To simulate an admixed individual, we begin at the start of the chromosome and sample a haplotype from either source₁ with a probability ( $α$ ) or source₂ with a probability ( $1 - α$ ). At each subsequent marker, we check if there was a recombination event between the two neighboring markers. A recombination event occurs with a probability of ( $1 - e^{- λ g}$ ), where g is the genetic distance in Morgans. We use the time of $λ = (t + 1)$ generations to account for the fact that in the first generation following admixture, the offspring inherits one chromosome of each ancestry. In the next generation, the crossovers lead to a mixing of ancestry. Thus, when a recombination event occurs, we resample the ancestry between source₁ and source₂; otherwise, we copy the haplotype from the same source population. (Note, a recombination event can lead to a switch to a haplotype of the same ancestry.) Once the ancestry is chosen, we randomly pick a haplotype from the ancestral pool (without replacement) and copy its sequence to the genome of the admixed individual. This process is continued until we reach the end of the chromosome. Using this approach, we generate the genomes of $n$ admixed individuals. The simulated haploid chromosomes are merged at random to construct diploid admixed individuals. This algorithm requires more than $2 n$ ancestral haplotypes for generating data for $n$ diploid admixed individuals (Moorjani et al., 2011). For more than two reference populations, the same algorithm is repeated iteratively.
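The copying procedure for a single haploid chromosome can be sketched as follows. This is a simplified illustration rather than the simulator used in the study: the function name and arguments are hypothetical, and source haplotypes are drawn with replacement, whereas the procedure described above samples them without replacement.

```python
import numpy as np

def simulate_admixed_haplotype(hap_1, hap_2, gen_pos, alpha, lam, rng):
    """Build one admixed haploid chromosome as a mosaic of source haplotypes.

    hap_1, hap_2 : arrays (n_haplotypes, n_markers) of 0/1 alleles for
                   source_1 and source_2
    gen_pos      : marker positions in Morgans, in ascending order
    alpha        : probability that a segment derives from source_1
    lam          : time of mixture in generations
    Returns (alleles, ancestry); ancestry[i] is 0 for source_1, 1 for source_2.
    """
    panels = (hap_1, hap_2)
    n_markers = len(gen_pos)
    alleles = np.empty(n_markers, dtype=hap_1.dtype)
    ancestry = np.empty(n_markers, dtype=int)

    def draw():
        src = 0 if rng.random() < alpha else 1          # choose the ancestry
        row = rng.integers(panels[src].shape[0])        # choose a haplotype
        return src, row

    src, row = draw()
    for i in range(n_markers):
        if i > 0:
            g = gen_pos[i] - gen_pos[i - 1]
            # Recombination since the previous marker: resample the ancestry
            # (which may return the same ancestry again).
            if rng.random() < 1.0 - np.exp(-lam * g):
                src, row = draw()
        alleles[i] = panels[src][row, i]
        ancestry[i] = src
    return alleles, ancestry

# Toy panels in which the allele directly reveals the source of each marker.
rng = np.random.default_rng(42)
hap_1 = np.zeros((4, 20000), dtype=np.int8)
hap_2 = np.ones((4, 20000), dtype=np.int8)
gen_pos = np.linspace(0.0, 20.0, 20000)
alleles, ancestry = simulate_admixed_haplotype(hap_1, hap_2, gen_pos,
                                               alpha=0.3, lam=50, rng=rng)
```

Two such haploid chromosomes would then be paired at random to form one diploid admixed individual.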

We used phased data for 111 CEU and 112 YRI from the 1000 Genomes phase 3 dataset (Auton et al., 2015) to generate 10 admixed genomes for ~380,000 SNPs (unless otherwise stated). For the inference, we used French and Yoruba from HGDP (Li et al., 2008). We generated data for various demographic scenarios, where we varied the time of admixture ( $λ$ ), the proportion of mixture ( $α$ ), the sample size in the reference and target populations, and the divergence between the ancestral and reference populations used, and studied their impact on the estimated dates. We also characterized the impact of features of ancient DNA such as missing data, pseudo-haploid genotypes, and limited sample size. In order to simulate pseudo-haploid genotypes, we randomly sampled an allele at each heterozygous site and assigned it as the homozygous genotype at that site (Harney et al., 2021). To generate missing data, we set the genotype call at a site as ‘missing’ or ‘unknown’ (coded as 9 in EIGENSTRAT format), where the proportion of missing genotypes ranged between 5% and 60% in our simulations. We also evaluated the impact of the choice of reference populations used in DATES in the case of simple and multiple pulses of admixture.
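The two ancient DNA features described above can be mimicked on a genotype vector as in the following sketch. The function names are hypothetical, genotypes are assumed to be coded as 0, 1, or 2 allele counts, and missing calls are placed uniformly at random.

```python
import numpy as np

def make_pseudo_haploid(genotypes, rng):
    """Replace each heterozygous call (1) by a random homozygous call (0 or 2)."""
    g = np.array(genotypes, dtype=int)
    het = g == 1
    g[het] = 2 * rng.integers(0, 2, size=int(het.sum()))  # sample one allele
    return g

def add_missing(genotypes, fraction, rng, missing_code=9):
    """Set a random fraction of calls to the missing code (9)."""
    g = np.array(genotypes, dtype=int)
    mask = rng.random(g.shape) < fraction
    g[mask] = missing_code
    return g

# Example: 10,000 diploid calls, pseudo-haploidized, then 40% set to missing.
rng = np.random.default_rng(7)
g = rng.integers(0, 3, size=10000)
ph = make_pseudo_haploid(g, rng)
masked = add_missing(ph, 0.4, rng)
```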

To study the impact of complex scenarios of admixture involving founder events and continuous gene flow, we used a coalescent simulator, MaCS (Chen et al., 2009). We simulated 100 Mb of three populations with an effective population size of 12,500, mutation rate of 1.2 × 10^–8 and recombination rate of 1 × 10^–8 per base pair per generation, respectively (Halldorsson et al., 2019; Jónsson et al., 2017). We assumed the admixture occurred continuously over a period of time or was followed by a bottleneck. In the case of the latter, the duration of the bottleneck was 1–10 generations with a reduction in effective population size from 12,500 to 10–1000, and the population either recovered to its original size after the bottleneck or maintained a small size until present (no recovery founder event). For each simulation, we generated data for two haploid chromosomes and combined these to generate one diploid chromosome.

Software availability

The executable and source code for DATES will be available on GitHub: https://github.com/MoorjaniLab/DATES_v4010 (copy archived at swh:1:rev:e034dc0d6fe8d41a828796f07791d50011b6bb04; Chintalapati et al., 2022).

References

Allentoft ME, Sikora M, Sjögren K-G, Rasmussen S, Rasmussen M, Stenderup J, Damgaard PB, Schroeder H, Ahlström T, Vinner L, Malaspinas A-S, Margaryan A, Higham T, Chivall D, Lynnerup N, Harvig L, Baron J, Della Casa P, Dąbrowski P, Duffy PR, Ebel AV, Epimakhov A, Frei K, Furmanek M, Gralak T, Gromov A, Gronkiewicz S, Grupe G, Hajdu T, Jarysz R, Khartanovich V, Khokhlov A, Kiss V, Kolář J, Kriiska A, Lasak I, Longhi C, McGlynn G, Merkevicius A, Merkyte I, Metspalu M, Mkrtchyan R, Moiseyev V, Paja L, Pálfi G, Pokutta D, Pospieszny Ł, Price TD, Saag L, Sablin M, Shishlina N, Smrčka V, Soenov VI, Szeverényi V, Tóth G, Trifanova SV, Varul L, Vicze M, Yepiskoposyan L, Zhitenev V, Orlando L, Sicheritz-Pontén T, Brunak S, Nielsen R, Kristiansen K, Willerslev E. 2015. Population genomics of Bronze Age Eurasia. Nature 522:167–172.
Anthony DW. 2007. The Horse, the Wheel, and Language. Princeton University Press.
Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE, Flicek P, Gabriel SB, Gibbs RA, Green ED, Hurles ME, Knoppers BM, Korbel JO, Lander ES, Lee C, Lehrach H, Mardis ER, Marth GT, McVean GA, Nickerson DA, Schmidt JP, Sherry ST, Wang J, Wilson RK, Gibbs RA, Boerwinkle E, Doddapaneni H, Han Y, Korchina V, Kovar C, Lee S, Muzny D, Reid JG, Zhu Y, Wang J, Chang Y, Feng Q, Fang X, Guo X, Jian M, Jiang H, Jin X, Lan T, Li G, Li J, Li Y, Liu S, Liu X, Lu Y, Ma X, Tang M, Wang B, Wang G, Wu H, Wu R, Xu X, Yin Y, Zhang D, Zhang W, Zhao J, Zhao M, Zheng X, Lander ES, Altshuler DM, Gabriel SB, Gupta N, Gharani N, Toji LH, Gerry NP, Resch AM, Flicek P, Barker J, Clarke L, Gil L, Hunt SE, Kelman G, Kulesha E, Leinonen R, McLaren WM, Radhakrishnan R, Roa A, Smirnov D, Smith RE, Streeter I, Thormann A, Toneva I, Vaughan B, Zheng-Bradley X, Bentley DR, Grocock R, Humphray S, James T, Kingsbury Z, Lehrach H, Sudbrak R, Albrecht MW, Amstislavskiy VS, Borodina TA, Lienhard M, Mertes F, Sultan M, Timmermann B, Yaspo ML, Mardis ER, Wilson RK, Fulton L, Fulton R, Sherry ST, Ananiev V, Belaia Z, Beloslyudtsev D, Bouk N, Chen C, Church D, Cohen R, Cook C, Garner J, Hefferon T, Kimelman M, Liu C, Lopez J, Meric P, O’Sullivan C, Ostapchuk Y, Phan L, Ponomarov S, Schneider V, Shekhtman E, Sirotkin K, Slotta D, Zhang H, McVean GA, Durbin RM, Balasubramaniam S, Burton J, Danecek P, Keane TM, Kolb-Kokocinski A, McCarthy S, Stalker J, Quail M, Schmidt JP, Davies CJ, Gollub J, Webster T, Wong B, Zhan Y, Auton A, Campbell CL, Kong Y, Marcketta A, Gibbs RA, Yu F, Antunes L, Bainbridge M, Muzny D, Sabo A, Huang Z, Wang J, Coin LJM, Fang L, Guo X, Jin X, Li G, Li Q, Li Y, Li Z, Lin H, Liu B, Luo R, Shao H, Xie Y, Ye C, Yu C, Zhang F, Zheng H, Zhu H, Alkan C, Dal E, Kahveci F, Marth GT, Garrison EP, Kural D, Lee WP, Fung Leong W, Stromberg M, Ward AN, Wu J, Zhang M, Daly MJ, DePristo MA, Handsaker RE, Altshuler DM, Banks E, Bhatia G, Angel G, Gabriel SB, Genovese G, Gupta N, Li H, Kashin S, Lander ES, McCarroll SA, Nemesh JC, Poplin RE, Yoon SC, Lihm J, Makarov V, Clark AG, Gottipati S, Keinan A, Rodriguez-Flores JL, Korbel JO, Rausch T, Fritz MH, Stütz AM, Flicek P, Beal K, Clarke L, Datta A, Herrero J, McLaren WM, Ritchie GRS, Smith RE, Zerbino D, Zheng-Bradley X, Sabeti PC, Shlyakhter I, Schaffner SF, Vitti J, Cooper DN, Ball EV, Stenson PD, Bentley DR, Barnes B, Bauer M, Keira Cheetham R, Cox A, Eberle M, Humphray S, Kahn S, Murray L, Peden J, Shaw R, Kenny EE, Batzer MA, Konkel MK, Walker JA, MacArthur DG, Lek M, Sudbrak R, Amstislavskiy VS, Herwig R, Mardis ER, Ding L, Koboldt DC, Larson D, Ye K, Gravel S, The 1000 Genomes Project Consortium. 2015. A global reference for human genetic variation. Nature 526:68–74.
Baird D, Fairbairn A, Jenkins E, Martin L, Middleton C, Pearson J, Asouti E, Edwards Y, Kabukcu C, Mustafaoğlu G, Russell N, Bar-Yosef O, Jacobsen G, Wu X, Baker A, Elliott S. 2018. Agricultural origins on the Anatolian plateau. PNAS 115.
Behar DM, Yunusbayev B, Metspalu M, Metspalu E, Rosset S, Parik J, Rootsi S, Chaubey G, Kutuev I, Yudkovsky G, Khusnutdinova EK, Balanovsky O, Semino O, Pereira L, Comas D, Gurwitz D, Bonne-Tamir B, Parfitt T, Hammer MF, Skorecki K, Villems R. 2010. The genome-wide structure of the Jewish people. Nature 466:238–242.
Bellwood P. 2005. First Farmers: The Origins of Agricultural Societies. Wiley-Blackwell.
Bollongino R, Nehlich O, Richards MP, Orschiedt J, Thomas MG, Sell C, Fajkosová Z, Powell A, Burger J. 2013. 2000 years of parallel societies in Stone Age Central Europe. Science 342:479–481.
Brace S, Diekmann Y, Booth TJ, Dorp L, Faltyskova Z, Rohland N, Mallick S, Olalde I, Ferry M, Michel M, Oppenheimer J, Broomandkhoshbacht N, Stewardson K, Martiniano R, Walsh S, Kayser M, Charlton S, Hellenthal G, Armit I, Schulting R, Craig OE, Sheridan A, Parker Pearson M, Stringer C, Reich D, Thomas MG, Barnes I. 2019. Ancient genomes indicate population replacement in Early Neolithic Britain. Nature Ecology & Evolution 3:765–771.
Bramanti B, Thomas MG, Haak W, Unterlaender M, Jores P, Tambets K, Antanaitis-Jacobs I, Haidle MN, Jankauskas R, Kind CJ, Lueth F, Terberger T, Hiller J, Matsumura S, Forster P, Burger J. 2009. Genetic discontinuity between local hunter-gatherers and central Europe’s first farmers. Science 326:137–140.
Busing F, Meijer E, Leeden RVD. 1999. Delete-m Jackknife for Unequal m. Statistics and Computing 9:3–8.
Chakraborty R, Weiss KM. 1988. Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. PNAS 85:9119–9123.
Chen GK, Marjoram P, Wall JD. 2009. Fast and flexible simulation of DNA sequence data. Genome Research 19:136–142.
Chintalapati M, Patterson N, Moorjani P. 2022. DATES_v4010. https://archive.softwareheritage.org/swh:1:dir:0cd95e97a6e3186515f8812dca928ba30829d132;origin=https://github.com/MoorjaniLab/DATES_v4010;visit=swh:1:snp:bb2c8ef48a7ffe4366f654ee057c257c7780e88c;anchor=swh:1:rev:e034dc0d6fe8d41a828796f07791d50011b6bb04
Feldman M, Fernández-Domínguez E, Reynolds L, Baird D, Pearson J, Hershkovitz I, May H, Goring-Morris N, Benz M, Gresky J, Bianco RA, Fairbairn A, Mustafaoğlu G, Stockhammer PW, Posth C, Haak W, Jeong C, Krause J. 2019. Late Pleistocene human genome suggests a local origin for the first farmers of central Anatolia. Nature Communications 10.
Fernandes DM, Strapagiel D, Borówka P, Marciniak B, Żądzińska E, Sirak K, Siska V, Grygiel R, Carlsson J, Manica A, Lorkiewicz W, Pinhasi R. 2018. A genomic Neolithic time transect of hunter-farmer admixture in central Poland. Scientific Reports 8.
Fernandes DM, Mittnik A, Olalde I, Lazaridis I, Cheronet O, Rohland N, Mallick S, Bernardos R, Broomandkhoshbacht N, Carlsson J, Culleton BJ, Ferry M, Gamarra B, Lari M, Mah M, Michel M, Modi A, Novak M, Oppenheimer J, Sirak KA, Stewardson K, Mandl K, Schattke C, Özdoğan KT, Lucci M, Gasperetti G, Candilio F, Salis G, Vai S, Camarós E, Calò C, Catalano G, Cueto M, Forgia V, Lozano M, Marini E, Micheletti M, Miccichè RM, Palombo MR, Ramis D, Schimmenti V, Sureda P, Teira L, Teschler-Nicola M, Kennett DJ, Lalueza-Fox C, Patterson N, Sineo L, Coppa A, Caramelli D, Pinhasi R, Reich D. 2020. The spread of steppe and Iranian-related ancestry in the islands of the western Mediterranean. Nature Ecology & Evolution 4:334–345.
Fokkens H, Nicolis F. 2012. Background to Beakers. Inquiries into Regional Cultural Backgrounds of the Bell Beaker Complex. Sidestone Press.
Fu Q, Posth C, Hajdinjak M, Petr M, Mallick S, Fernandes D, Furtwängler A, Haak W, Meyer M, Mittnik A, Nickel B, Peltzer A, Rohland N, Slon V, Talamo S, Lazaridis I, Lipson M, Mathieson I, Schiffels S, Skoglund P, Derevianko AP, Drozdov N, Slavinsky V, Tsybankov A, Cremonesi RG, Mallegni F, Gély B, Vacca E, Morales MRG, Straus LG, Neugebauer-Maresch C, Teschler-Nicola M, Constantin S, Moldovan OT, Benazzi S, Peresani M, Coppola D, Lari M, Ricci S, Ronchitelli A, Valentin F, Thevenet C, Wehrberger K, Grigorescu D, Rougier H, Crevecoeur I, Flas D, Semal P, Mannino MA, Cupillard C, Bocherens H, Conard NJ, Harvati K, Moiseyev V, Drucker DG, Svoboda J, Richards MP, Caramelli D, Pinhasi R, Kelso J, Patterson N, Krause J, Pääbo S, Reich D. 2016. The genetic history of Ice Age Europe. Nature 534:200–205.
Gravel S. 2012. Population genetics models of local ancestry. Genetics 191:607–619.
Gronenborn D. 2014. The persistence of hunting and gathering: Neolithic western temperate and Central Europe. In: Cummings V, Jordan P, Zvelebil M (Eds). The Oxford Handbook of the Archaeology and Anthropology of Hunter-Gatherers. Oxford University Press.
Guilaine J. 2003. Aspects de la néolithisation en Méditerranée et en France. In: Ammerman A, Biagi P (Eds). The Widening Harvest. The Neolithic Transition in Europe: Looking Back, Looking Forward. Archaeological Institute of America.
Günther T, Malmström H, Svensson EM, Omrak A, Sánchez-Quinto F, Kılınç GM, Krzewińska M, Eriksson G, Fraser M, Edlund H, Munters AR, Coutinho A, Simões LG, Vicente M, Sjölander A, Jansen Sellevold B, Jørgensen R, Claes P, Shriver MD, Valdiosera C, Netea MG, Apel J, Lidén K, Skar B, Storå J, Götherström A, Jakobsson M. 2018. Population genomics of Mesolithic Scandinavia: Investigating early postglacial migration routes and high-latitude adaptation. PLOS Biology 16.
Haak W, Balanovsky O, Sanchez JJ, Koshel S, Zaporozhchenko V, Adler CJ, Der Sarkissian CSI, Brandt G, Schwarz C, Nicklisch N, Dresely V, Fritsch B, Balanovska E, Villems R, Meller H, Alt KW, Cooper A, Members of the Genographic Consortium. 2010. Ancient DNA from European early Neolithic farmers reveals their Near Eastern affinities. PLOS Biology 8.
Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B, Brandt G, Nordenfelt S, Harney E, Stewardson K, Fu Q, Mittnik A, Bánffy E, Economou C, Francken M, Friederich S, Pena RG, Hallgren F, Khartanovich V, Khokhlov A, Kunst M, Kuznetsov P, Meller H, Mochalov O, Moiseyev V, Nicklisch N, Pichler SL, Risch R, Rojo Guerra MA, Roth C, Szécsényi-Nagy A, Wahl J, Meyer M, Krause J, Brown D, Anthony D, Cooper A, Alt KW, Reich D. 2015. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522:207–211.
Halldorsson BV, Palsson G, Stefansson OA, Jonsson H, Hardarson MT, Eggertsson HP, Gunnarsson B, Oddsson A, Halldorsson GH, Zink F, Gudjonsson SA, Frigge ML, Thorleifsson G, Sigurdsson A, Stacey SN, Sulem P, Masson G, Helgason A, Gudbjartsson DF, Thorsteinsdottir U, Stefansson K. 2019. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363.
Harney É, Patterson N, Reich D, Wakeley J. 2021. Assessing the performance of qpAdm: a statistical tool for studying population admixture. Genetics 217.
Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, Myers S. 2014. A genetic atlas of human admixture history. Science 343:747–751.
Henn BM, Botigué LR, Gravel S, Wang W, Brisbin A, Byrnes JK, Fadhlaoui-Zid K, Zalloua PA, Moreno-Estrada A, Bertranpetit J, Bustamante CD, Comas D. 2012. Genomic ancestry of North Africans supports back-to-Africa migrations. PLOS Genetics 8.
Hofmanová Z, Kreutzer S, Hellenthal G, Sell C, Diekmann Y, Díez-Del-Molino D, Dorp L, López S, Kousathanas A, Link V, Kirsanow K, Cassidy LM, Martiniano R, Strobel M, Scheu A, Kotsakis K, Halstead P, Triantaphyllou S, Kyparissi-Apostolika N, Urem-Kotsou D, Ziota C, Adaktylou F, Gopalan S, Bobo DM, Winkelbach L, Blöcher J, Unterländer M, Leuenberger C, Çilingiroğlu Ç, Horejs B, Gerritsen F, Shennan SJ, Bradley DG, Currat M, Veeramah KR, Wegmann D, Thomas MG, Papageorgopoulou C, Burger J. 2016. Early farmers from across Europe directly descended from Neolithic Aegeans. PNAS 113:6886–6891.
Jones ER, Gonzalez-Fortes G, Connell S, Siska V, Eriksson A, Martiniano R, McLaughlin RL, Gallego Llorente M, Cassidy LM, Gamba C, Meshveliani T, Bar-Yosef O, Müller W, Belfer-Cohen A, Matskevich Z, Jakeli N, Higham TFG, Currat M, Lordkipanidze D, Hofreiter M, Manica A, Pinhasi R, Bradley DG. 2015. Upper Palaeolithic genomes reveal deep roots of modern Eurasians. Nature Communications 6.
Jónsson H, Sulem P, Kehr B, Kristmundsdottir S, Zink F, Hjartarson E, Hardarson MT, Hjorleifsson KE, Eggertsson HP, Gudjonsson SA, Ward LD, Arnadottir GA, Helgason EA, Helgason H, Gylfason A, Jonasdottir A, Jonasdottir A, Rafnar T, Frigge M, Stacey SN, Magnusson OTh, Thorsteinsdottir U, Masson G, Kong A, Halldorsson BV, Helgason A, Gudbjartsson DF, Stefansson K. 2017. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549:519–522.
Kashuba N, Kırdök E, Damlien H, Manninen MA, Nordqvist B, Persson P, Götherström A. 2019. Ancient DNA from mastics solidifies connection between material culture and genetics of Mesolithic hunter-gatherers in Scandinavia. Communications Biology 2.
Kassian AS, Zhivlov M, Starostin G, Trofimov AA, Kocharov PA, Kuritsyna A, Saenko MN. 2021. Rapid radiation of the inner Indo-European languages: an advanced approach to Indo-European lexicostatistics. Linguistics 59:949–979.
Kılınç GM, Omrak A, Özer F, Günther T, Büyükkarakaya AM, Bıçakçı E, Baird D, Dönertaş HM, Ghalichi A, Yaka R, Koptekin D, Açan SC, Parvizi P, Krzewińska M, Daskalaki EA, Yüncü E, Dağtaş ND, Fairbairn A, Pearson J, Mustafaoğlu G, Erdal YS, Çakan YG, Togan İ, Somel M, Storå J, Jakobsson M, Götherström A. 2016. The Demographic Development of the First Farmers in Anatolia. Current Biology 26:2659–2666.
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM. 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319:1100–1104.
Lipson M, Szécsényi-Nagy A, Mallick S, Pósa A, Stégmár B, Keerl V, Rohland N, Stewardson K, Ferry M, Michel M, Oppenheimer J, Broomandkhoshbacht N, Harney E, Nordenfelt S, Llamas B, Gusztáv Mende B, Köhler K, Oross K, Bondár M, Marton T, Osztás A, Jakucs J, Paluch T, Horváth F, Csengeri P, Koós J, Sebők K, Anders A, Raczky P, Regenye J, Barna JP, Fábián S, Serlegi G, Toldi Z, Gyöngyvér Nagy E, Dani J, Molnár E, Pálfi G, Márk L, Melegh B, Bánfai Z, Domboróczki L, Fernández-Eraso J, Antonio Mujika-Alustiza J, Alonso Fernández C, Jiménez Echevarría J, Bollongino R, Orschiedt J, Schierhold K, Meller H, Cooper A, Burger J, Bánffy E, Alt KW, Lalueza-Fox C, Haak W, Reich D. 2017. Parallel palaeogenomic transects reveal complex genetic history of early European farmers. Nature 551:368–372.
Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, Berger B. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193:1233–1254.
Malmström H, Günther T, Svensson EM, Juras A, Fraser M, Munters AR, Pospieszny Ł, Tõrv M, Lindström J, Götherström A, Storå J, Jakobsson M. 2019. The genomic ancestry of the Scandinavian Battle Axe Culture people and their relation to the broader Corded Ware horizon. Proceedings. Biological Sciences 286.
Mittnik A, Wang CC, Pfrengle S, Daubaras M, Zariņa G, Hallgren F, Allmäe R, Khartanovich V, Moiseyev V, Tõrv M, Furtwängler A, Andrades Valtueña A, Feldman M, Economou C, Oinonen M, Vasks A, Balanovska E, Reich D, Jankauskas R, Haak W, Schiffels S, Krause J. 2018. The genetic prehistory of the Baltic Sea region. Nature Communications 9.
Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLOS Genetics 7.
Moorjani P, Sankararaman S, Fu Q, Przeworski M, Patterson N, Reich D. 2016. A genetic method for dating ancient genomes provides a direct estimate of human generation interval in the last 45,000 years. PNAS 113:5652–5657.
Morgunova NL, Khokhlova OS. 2016. Chronology and Periodization of the Pit-Grave Culture in the Region Between the Volga and Ural Rivers Based on Radiocarbon Dating and Paleopedological Research. Radiocarbon 55:1286–1296.
Narasimhan VM, Patterson N, Moorjani P, Rohland N, Bernardos R, Mallick S, Lazaridis I, Nakatsuka N, Olalde I, Lipson M, Kim AM, Olivieri LM, Coppa A, Vidale M, Mallory J, Moiseyev V, Kitov E, Monge J, Adamski N, Alex N, Broomandkhoshbacht N, Candilio F, Callan K, Cheronet O, Culleton BJ, Ferry M, Fernandes D, Freilich S, Gamarra B, Gaudio D, Hajdinjak M, Harney É, Harper TK, Keating D, Lawson AM, Mah M, Mandl K, Michel M, Novak M, Oppenheimer J, Rai N, Sirak K, Slon V, Stewardson K, Zalzala F, Zhang Z, Akhatov G, Bagashev AN, Bagnera A, Baitanayev B, Bendezu-Sarmiento J, Bissembaev AA, Bonora GL, Chargynov TT, Chikisheva T, Dashkovskiy PK, Derevianko A, Dobeš M, Douka K, Dubova N, Duisengali MN, Enshin D, Epimakhov A, Fribus AV, Fuller D, Goryachev A, Gromov A, Grushin SP, Hanks B, Judd M, Kazizov E, Khokhlov A, Krygin AP, Kupriyanova E, Kuznetsov P, Luiselli D, Maksudov F, Mamedov AM, Mamirov TB, Meiklejohn C, Merrett DC, Micheli R, Mochalov O, Mustafokulov S, Nayak A, Pettener D, Potts R, Razhev D, Rykun M, Sarno S, Savenkova TM, Sikhymbaeva K, Slepchenko SM, Soltobaev OA, Stepanova N, Svyatko S, Tabaldiev K, Teschler-Nicola M, Tishkin AA, Tkachev VV, Vasilyev S, Velemínský P, Voyakin D, Yermolayeva A, Zahir M, Zubkov VS, Zubova A, Shinde VS, Lalueza-Fox C, Meyer M, Anthony D, Boivin N, Thangaraj K, Kennett DJ, Frachetti M, Pinhasi R, Reich D. 2019. The formation of human populations in South and Central Asia. Science 365.
Olalde I, Brace S, Allentoft ME, Armit I, Kristiansen K, Booth T, Rohland N, Mallick S, Szécsényi-Nagy A, Mittnik A, Altena E, Lipson M, Lazaridis I, Harper TK, Patterson N, Broomandkhoshbacht N, Diekmann Y, Faltyskova Z, Fernandes D, Ferry M, Harney E, Knijff P, Michel M, Oppenheimer J, Stewardson K, Barclay A, Alt KW, Liesau C, Ríos P, Blasco C, Miguel JV, García RM, Fernández AA, Bánffy E, Bernabò-Brea M, Billoin D, Bonsall C, Bonsall L, Allen T, Büster L, Carver S, Navarro LC, Craig OE, Cook GT, Cunliffe B, Denaire A, Dinwiddy KE, Dodwell N, Ernée M, Evans C, Kuchařík M, Farré JF, Fowler C, Gazenbeek M, Pena RG, Haber-Uriarte M, Haduch E, Hey G, Jowett N, Knowles T, Massy K, Pfrengle S, Lefranc P, Lemercier O, Lefebvre A, Martínez CH, Olmo VG, Ramírez AB, Maurandi JL, Majó T, McKinley JI, McSweeney K, Mende BG, Modi A, Kulcsár G, Kiss V, Czene A, Patay R, Endrődi A, Köhler K, Hajdu T, Szeniczey T, Dani J, Bernert Z, Hoole M, Cheronet O, Keating D, Velemínský P, Dobeš M, Candilio F, Brown F, Fernández RF, Herrero-Corral A-M, Tusa S, Carnieri E, Lentini L, Valenti A, Zanini A, Waddington C, Delibes G, Guerra-Doce E, Neil B, Brittain M, Luke M, Mortimer R, Desideri J, Besse M, Brücken G, Furmanek M, Hałuszko A, Mackiewicz M, Rapiński A, Leach S, Soriano I, Lillios KT, Cardoso JL, Pearson MP, Włodarczak P, Price TD, Prieto P, Rey P-J, Risch R, Rojo Guerra MA, Schmitt A, Serralongue J, Silva AM, Smrčka V, Vergnaud L, Zilhão J, Caramelli D, Higham T, Thomas MG, Kennett DJ, Fokkens H, Heyd V, Sheridan A, Sjögren K-G, Stockhammer PW, Krause J, Pinhasi R, Haak W, Barnes I, Lalueza-Fox C, Reich D. 2018. The Beaker phenomenon and the genomic transformation of northwest Europe. Nature 555:190–196.
Olalde I, Mallick S, Patterson N, Rohland N, Villalba-Mouco V, Silva M, Dulias K, Edwards CJ, Gandini F, Pala M, Soares P, Ferrando-Bernal M, Adamski N, Broomandkhoshbacht N, Cheronet O, Culleton BJ, Fernandes D, Lawson AM, Mah M, Oppenheimer J, Stewardson K, Zhang Z, Jiménez Arenas JM, Toro Moyano IJ, Salazar-García DC, Castanyer P, Santos M, Tremoleda J, Lozano M, García Borja P, Fernández-Eraso J, Mujika-Alustiza JA, Barroso C, Bermúdez FJ, Viguera Mínguez E, Burch J, Coromina N, Vivó D, Cebrià A, Fullola JM, García-Puchol O, Morales JI, Oms FX, Majó T, Vergès JM, Díaz-Carvajal A, Ollich-Castanyer I, López-Cachero FJ, Silva AM, Alonso-Fernández C, Castro G, Jiménez Echevarría J, Moreno-Márquez A, Pascual Berlanga G, Ramos-García P, Ramos-Muñoz J, Vijande Vila E, Aguilella Arzo G, Esparza Arroyo Á, Lillios KT, Mack J, Velasco-Vázquez J, Waterman A, Lugo Enrich L, Benito Sánchez M, Agustí B, Codina F, Prado G, Estalrrich A, Fernández Flores Á, Finlayson C, Finlayson G, Finlayson S, Giles-Guzmán F, Rosas A, Barciela González V, García Atiénzar G, Hernández Pérez MS, Llanos A, Carrión Marco Y, Collado Beneyto I, López-Serrano D, Sanz Tormo M, Valera AC, Blasco C, Liesau C, Ríos P, Daura J, Pedro Michó MJ, Diez-Castillo AA, Flores Fernández R, Francès Farré J, Garrido-Pena R, Gonçalves VS, Guerra-Doce E, Herrero-Corral AM, Juan-Cabanilles J, López-Reyes D, McClure SB, Merino Pérez M, Oliver Foix A, Sanz Borràs M, Sousa AC, Vidal Encinas JM, Kennett DJ, Richards MB, Werner Alt K, Haak W, Pinhasi R, Lalueza-Fox C, Reich D. 2019. The genomic history of the Iberian Peninsula over the past 8000 years. Science 363:1230–1234.
Orlando L, Allaby R, Skoglund P, Der Sarkissian C, Stockhammer PW, Arcos MC, Fu Q, Krause J, Willerslev E, Stone AC, Warinner C. 2021. Ancient DNA analysis. Nature Reviews Methods Primers 1.
Osborne MR, Smyth GK. 2016. An algorithm for exponential fitting revisited. Journal of Applied Probability 23:419–430.
Pagani L, Colonna V, Tyler-Smith C, Ayub Q. 2017. An Ethnolinguistic and Genetic Perspective on the Origins of the Dravidian-Speaking Brahui in Pakistan. Man in India 97:267–278.
Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, Genschoreck T, Webster T, Reich D. 2012. Ancient admixture in human history. Genetics 192:1065–1093.
Pickrell JK, Reich D. 2014. Toward a new history and geography of human genes informed by ancient DNA. Trends in Genetics 30:377–389.
Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, Ruczinski I, Beaty TH, Mathias R, Reich D, Myers S. 2009. Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLOS Genetics 5.
Rivollat M, Jeong C, Schiffels S, Küçükkalıpçı İ, Pemonge M-H, Rohrlach AB, Alt KW, Binder D, Friederich S, Ghesquière E, Gronenborn D, Laporte L, Lefranc P, Meller H, Réveillas H, Rosenstock E, Rottier S, Scarre C, Soler L, Wahl J, Krause J, Deguilloux M-F, Haak W. 2020. Ancient genome-wide DNA from France highlights the complexity of interactions between Mesolithic hunter-gatherers and Neolithic farmers. Science Advances 6.
Skoglund P, Malmström H, Raghavan M, Storå J, Hall P, Willerslev E, Gilbert MTP, Götherström A, Jakobsson M. 2012. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336:466–469.
Skoglund P, Malmström H, Omrak A, Raghavan M, Valdiosera C, Günther T, Hall P, Tambets K, Parik J, Sjögren K-G, Apel J, Willerslev E, Storå J, Götherström A, Jakobsson M. 2014. Genomic diversity and admixture differs for Stone-Age Scandinavian foragers and farmers. Science 344:747–750.
Tournebize R, Chu G, Moorjani P. 2020. Reconstructing the History of Founder Events Using Genome-Wide Patterns of Allele Sharing across Individuals. Evolutionary Biology 1.
Wang C-C, Reinhold S, Kalmykov A, Wissgott A, Brandt G, Jeong C, Cheronet O, Ferry M, Harney E, Keating D, Mallick S, Rohland N, Stewardson K, Kantorovich AR, Maslov VE, Petrenko VG, Erlikh VR, Atabiev BC, Magomedov RG, Kohl PL, Alt KW, Pichler SL, Gerling C, Meller H, Vardanyan B, Yeganyan L, Rezepkin AD, Mariaschk D, Berezina N, Gresky J, Fuchs K, Knipper C, Schiffels S, Balanovska E, Balanovsky O, Mathieson I, Higham T, Berezin YB, Buzhilova A, Trifonov V, Pinhasi R, Belinskij AB, Reich D, Hansen S, Krause J, Haak W. 2019. Ancient human genome-wide data from a 3000-year interval in the Caucasus corresponds with eco-geographic regions. Nature Communications 10.
Müller J, van Willigen S. 2001. New radiocarbon evidence for European Bell Beakers and the consequences for the diffusion of the Bell Beaker Phenomenon. Bell Beakers Today 1.