The date of interbreeding between Neandertals and modern humans

The date of interbreeding between Neandertals and modern humans

Sriram Sankararaman, Nick Patterson, Heng Li, Svante Pääbo, David Reich
(Submitted on 10 Aug 2012)

Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000-86,000 years before the present (BP), and most likely 47,000-65,000 years ago. This supports the recent interbreeding hypothesis, and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.

Genealogies of rapidly adapting populations

Genealogies of rapidly adapting populations
Richard A. Neher, Oskar Hallatschek
(Submitted on 15 Aug 2012)

The genetic diversity of a species is shaped by its recent evolutionary history and can be used to infer demographic events or selective sweeps. Most inference methods are based on the null hypothesis that natural selection is a weak evolutionary force. However, many species, particularly pathogens, are under continuous pressure to adapt in response to changing environments. A statistical framework for inference from diversity data of such populations is currently lacking. Toward this goal, we explore the properties of genealogies that emerge from models of continual adaptation. We show that lineages trace back to a small pool of highly fit ancestors, in which simultaneous coalescence of more than two lineages frequently occurs. While such multiple mergers are unlikely under the neutral coalescent, they create a unique genetic footprint in adapting populations. The site frequency spectrum of derived neutral alleles, for example, is non-monotonic and has a peak at high frequencies, whereas Tajima’s D becomes more and more negative with increasing sample size. Since multiple merger coalescents emerge in various evolutionary scenarios characterized by sustained selection pressures, we argue that they should be considered as null-models for adapting populations.

Genetic Diversity and the Structure of Genealogies in Rapidly Adapting Populations.

Genetic Diversity and the Structure of Genealogies in Rapidly Adapting Populations.

Michael M. Desai, Aleksandra M. Walczak, Daniel S. Fisher
(Submitted on 16 Aug 2012)
Positive selection distorts the structure of genealogies and hence alters patterns of genetic variation within a population. Most analyses of these distortions focus on the signatures of hitchhiking due to hard or soft selective sweeps at a single genetic locus. However, in linked regions of rapidly adapting genomes, multiple beneficial mutations at different loci can segregate simultaneously within the population, an effect known as clonal interference. This leads to a subtle interplay between hitchhiking and interference effects, which leads to a unique signature of rapid adaptation on genetic variation both at the selected sites and at linked neutral loci. Here, we introduce an effective coalescent theory (a “fitness-class coalescent”) that describes how positive selection at many perfectly linked sites alters the structure of genealogies. We use this theory to calculate several simple statistics describing genetic variation within a rapidly adapting population, and to implement efficient backwards-time coalescent simulations which can be used to predict how clonal interference alters the expected patterns of molecular evolution.

Transposable sequence evolution is driven by gene context

Transposable sequence evolution is driven by gene context

Anna-Sophie Fiston-Lavier, Charles E. Vejnar, Hadi Quesneville
(Submitted on 2 Sep 2012)

Transposable elements (TEs) in eukaryote genomes are quantitatively the main components affecting genome size, structure and expression. The dynamics of their insertion and deletion depend on diverse factors varying in strength and nature along the genome. We address here how TE sequence evolution is affected by neighboring genes and the chromatin status (euchromatin or heterochromatin) at their insertion site. We estimated the rates of evolution of TE sequences in Arabidopsis thaliana, and found that they depend on the distance to the nearest genes: TEs located close to genes evolve faster than those that are more distant. Consequently, TE sequences in heterochromatic regions, which are gene-poor regions, are surprisingly younger and longer than those elsewhere. We present a model of TE sequence dynamics in TE-rich genomes, such as maize and wheat, and in TE-poor genomes such as fly and A. thaliana.

Evolutionary genomics of transposable elements in Saccharomyces cerevisiae

Evolutionary genomics of transposable elements in Saccharomyces cerevisiae

Martin Carr, Douda Bensasson, Casey M. Bergman
(Submitted on 1 Sep 2012)

Saccharomyces cerevisiae is one of the premier model systems for studying the genomics and evolution of transposable elements. The availability of the S. cerevisiae genome led to many insights into its five known transposable element families (Ty1-Ty5) in the years shortly after its completion. However, subsequent advances in bioinformatics tools for analysing transposable elements and the recent availability of genome sequences for multiple strains and species of yeast motivates new investigations into Ty evolution in S. cerevisiae. Here we provide a comprehensive phylogenetic and population genetic analysis of Ty families in S. cerevisiae based on a reannotation of Ty elements in the S288c reference genome. We show that previous annotation efforts have underestimated the total copy number of Ty elements for all known families. In addition, we identify a new family of Ty3-like elements related to the S. paradoxus Ty3p which is composed entirely of degenerate solo LTRs. Phylogenetic analyses of LTR sequences identified three families with short-branch, recently active clades nested among long branch, inactive insertions (Ty1, Ty3, Ty4), one family with essentially all recently active elements (Ty2) and two families with only inactive elements (Ty3p and Ty5). Population genomic data from 38 additional strains of S. cerevisiae show that elements present in active clades are predominantly polymorphic, whereas most of the inactive elements are fixed. Finally, we use comparative genomic data to provide evidence that the Ty2 and Ty3p families have arisen in the S. cerevisiae genome by horizontal transfer. Our results demonstrate that the genome of a single individual contains important information about the state of TE population dynamics within a species and suggest that horizontal transfer may play an important role in shaping the diversity of transposable elements in unicellular eukaryotes.

Our paper: Inference of population splits and mixtures from genome-wide allele frequency data

[This author post is by Joe Pickrell (@joe_pickrell) on Inference of population splits and mixtures from genome-wide allele frequency data, available from arXiv here]

Early last year, I began working (with Jonathan Pritchard) on methods for using genetics to understand population history. As we describe in our preprint, our approach was to build a parameterized model to describe the patterns of correlation in allele frequencies across populations. This type of approach dates back to brilliant work on building population trees by Luca Cavalli-Sforza, AWF Edwards, and Joe Felsenstein from around 40 years ago. The key to our work is that instead of representing history as a bifurcating tree, we additionally allow “migration events” to model admixture between populations. The output from our model (called TreeMix, and available here) is something like that shown below.

A graph of human population history, allowing 10 migration events. Populations are colored according to geographic region.

We applied this method to both human and dog history, with a mix of both known and novel historical results. I thought here I’d speculate about a couple of the novel results:

1. In the human data (see the graph above), one of the more surprising things to me was the arrow to the Cambodian population. The Cambodians appear to be an admixed population, with ~85% of their ancestry related to other southeast Asian populations (like the Dai) and ~15% of their ancestry from…it’s not totally clear. As you can see in the graph, the source of this admixture appears to be a population not particularly closely related to any other population in these data. So who was this population? A speculation is that this represents ancestry from a population related to the “Ancestral South Indian” population described by Reich et al. (2009), though other sources (e.g. Oceania) are plausible.

2. In the dog data (see Figures 5 and 6 in the pre-print), the most overwhelming signal in the data is that the Basenji, a central African dog breed, appears to trace ~25% of its ancestry to admixture with wolves since domestication. This signal is made somewhat surprising by the fact that there are no wolf populations currently living in Africa, which would seem to be a formidable barrier to admixture with an African dog breed. A hint for what’s going on here is provided by vonHoldt et al. (2010), who show that the basenji have an unusual amount of shared variation with wolves from the Middle East. One speculation, then, is that as the ancestors of the Basenji moved into Africa, they came into contact with Middle Eastern wolves and admixed with them.

Other suggestions for scenarios to explain these results are of course welcome. Overall, I’m hopeful that approaches like TreeMix will eventually supplant “standard” tree-building algorithms for situations in which gene flow is known to occur, though of course further development is necessary before this becomes reality.

Joe Pickrell

How to infer relative fitness from a sample of genomic sequences

How to infer relative fitness from a sample of genomic sequences
Adel Dayarian, Boris I Shraiman
(Submitted on 29 Aug 2012)

Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman’s coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we shall demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test {\it in silico} using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator which identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with the actual fitness, with top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3 depending on the mutation/selection parameters. The ranking also enables to predict the common genotype of the future population. While the inference accuracy increases monotonically with sample size, sample sizes of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks.

The impact of deleterious passenger mutations on cancer progression.

The impact of deleterious passenger mutations on cancer progression. (arXiv:1208.6068v1 [q-bio.PE])
by Christopher D McFarland, Gregory V Kryukov, Shamil Sunyaev, Leonid Mirny

Cancer progression is driven by a small number of genetic alterations accumulating in a neoplasm. These few driver alterations reside in a cancer genome alongside tens of thousands of other mutations that are widely believed to have no role in cancer and termed passengers. Many passengers, however, fall within protein coding genes and other functional elements and can possibly have deleterious effects on cancer cells. Here we investigate a potential of mildly deleterious passengers to accumulate and alter the course of neoplastic progression. Our approach combines evolutionary simulations of cancer progression with the analysis of cancer sequencing data. In our simulations, individual cells stochastically divide, acquire advantageous driver and deleterious passenger mutations, or die. Surprisingly, despite selection against them, passengers accumulate and largely evade selection during progression. Although individually weak, the collective burden of passengers alters the course of progression leading to several phenomena observed in oncology that cannot be explained by a traditional driver-centric view. We tested predictions of the model using cancer genomic data. We find that many passenger mutations are likely to be damaging and that, in agreement with the model, they have largely evaded purifying selection. Finally, we used our model to explore cancer treatments that exploit the load of passengers by either 1) increasing the mutation rate; or 2) exacerbating their deleterious effects. While both approaches lead to cancer regression, the later leads to less frequent relapse. Our results suggest a new framework for understanding cancer progression as a balance of driver and passenger mutations.

The genetic prehistory of southern Africa

The genetic prehistory of southern Africa

Joseph K. Pickrell, Nick Patterson, Chiara Barbieri, Falko Berthold, Linda Gerlach, Mark Lipson, Po-Ru Loh, Tom Güldemann, Blesswell Kure, Sununguko Wata Mpoloka, Hirosi Nakagawa, Christfried Naumann, Joanna L. Mountain, Carlos D. Bustamante, Bonnie Berger, Brenna M. Henn, Mark Stoneking, David Reich, Brigitte Pakendorf
(Submitted on 23 Jul 2012)

The hunter-gatherer populations of southern and eastern Africa are known to harbor some of the most ancient human lineages, but their historical relationships are poorly understood. We report data from 22 populations analyzed at over half a million single nucleotide polymorphisms (SNPs), using a genome-wide array designed for studies of history. The southern Africans-here called Khoisan-fall into two groups, loosely corresponding to the northwestern and southeastern Kalahari, which we show separated within the last 30,000 years. All individuals derive at least a few percent of their genomes from admixture with non-Khoisan populations that began 1,200 years ago. In addition, the Hadza, an east African hunter-gatherer population that speaks a language with click consonants, derive about a quarter of their ancestry from admixture with a population related to the Khoisan, implying an ancient genetic link between southern and eastern Africa.

The geography of recent genetic ancestry across Europe

The geography of recent genetic ancestry across Europe

Peter Ralph, Graham Coop
(Submitted on 16 Jul 2012 (v1), last revised 19 Jul 2012 (this version, v2))

The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events. Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.