Our paper: Genealogies of rapidly adapting populations

[This author post is by Richard Neher on his paper with Oskar Hallatschek: Genealogies of rapidly adapting populations, arXived here.]


That selection distorts genealogies is a well-known fact, but properties of genealogies shaped by selection are poorly understood. We set out to investigate genealogies in a simple model of rapid adaptation in asexuals: the fitness of individuals is changed by small amounts through frequent mutations, while the overall population size is kept constant by a carrying capacity. We simulated the model and tracked genealogies.

The genealogies we found have two striking features incompatible with the standard neutral coalescent: (i) Many lineages merge almost simultaneously. (ii) Forward in time, the trees often branch very asymmetrically, i.e., almost the entire population descends from one branch while the other branches share the remaining minority. Using branching process approximations and a mapping to range expansion problems (see Brunet et al. (2007)), we show that the genealogies are similar to those expected from the Bolthausen-Sznitman coalescent (BSC), a special case of multiple merger coalescents. Very similar conclusions have been reached in another recent preprint by Desai, Walczak and Fisher. The BSC is well studied and we can build on many results from the mathematical literature.
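To give a feel for how different BSC mergers are from the pairwise mergers of Kingman's coalescent, here is a minimal sketch (not code from the paper). It relies on the standard result that, with b lineages present, each particular set of k lineages merges in the BSC at a rate proportional to (k-2)!(b-k)!/(b-1)!, so that some k-merger happens at a rate proportional to b/(k(k-1)) and the total merger rate is proportional to b-1. The function names are made up for illustration, and the time scale is left unspecified since it drops out of the probabilities.

```python
from fractions import Fraction


def bsc_merger_size_probs(b):
    """Probability that the next merger joins exactly k of b lineages
    under the Bolthausen-Sznitman coalescent, for k = 2..b.

    Uses rate(k-merger) proportional to b / (k (k - 1)), which sums to b - 1.
    """
    if b < 2:
        raise ValueError("need at least two lineages")
    total_rate = b - 1
    return {k: Fraction(b, k * (k - 1)) / total_rate for k in range(2, b + 1)}


def kingman_merger_size_probs(b):
    """Same quantity under Kingman's coalescent: only pairs ever merge."""
    if b < 2:
        raise ValueError("need at least two lineages")
    return {k: Fraction(1 if k == 2 else 0) for k in range(2, b + 1)}


if __name__ == "__main__":
    b = 10
    bsc = bsc_merger_size_probs(b)
    kingman = kingman_merger_size_probs(b)
    for k in range(2, b + 1):
        print(k, float(bsc[k]), float(kingman[k]))
```

Under these assumptions, a sizeable fraction of BSC merger events involve three or more lineages at once, whereas Kingman's coalescent puts all of its weight on k = 2.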

The difference between Kingman and multiple merger coalescence is closely related to the distinct stochastic properties of genetic drift and draft. While drift describes short-term fluctuations in offspring number, which are bounded, draft refers to stochasticity through linked selection. Draft can result in fluctuations of the same order as the population size. Even if very rare, such large fluctuations are important. Lumping drift and draft together and labeling the result as effective population size is rarely helpful and often confusing.

Why should we care? We often want to learn about past dynamics from snapshots of populations (sequence samples). To this end, we compare the diversity patterns in the sample to model predictions and infer model parameters. If we use an inappropriate model, we get meaningless answers. Furthermore, some events that are very unlikely under Kingman’s coalescent are quite common when multiple mergers are allowed. Consider for example a lone haplotype in a large sample that connects to the root of the tree. This is very unlikely in neutral coalescent models and one might take it as evidence for immigration from a diverged population. If multiple mergers dominate coalescence, this does not come as a surprise. Similarly, an excess of singletons is not necessarily evidence for expanding populations or deleterious mutations but might be due to draft. I wonder whether more potential pitfalls of this sort exist.

Richard Neher

The date of interbreeding between Neandertals and modern humans

Sriram Sankararaman, Nick Patterson, Heng Li, Svante Pääbo, David Reich
(Submitted on 10 Aug 2012)

Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000-86,000 years before the present (BP), and most likely 47,000-65,000 years ago. This supports the recent interbreeding hypothesis, and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.

Genealogies of rapidly adapting populations

Richard A. Neher, Oskar Hallatschek
(Submitted on 15 Aug 2012)

The genetic diversity of a species is shaped by its recent evolutionary history and can be used to infer demographic events or selective sweeps. Most inference methods are based on the null hypothesis that natural selection is a weak evolutionary force. However, many species, particularly pathogens, are under continuous pressure to adapt in response to changing environments. A statistical framework for inference from diversity data of such populations is currently lacking. Toward this goal, we explore the properties of genealogies that emerge from models of continual adaptation. We show that lineages trace back to a small pool of highly fit ancestors, in which simultaneous coalescence of more than two lineages frequently occurs. While such multiple mergers are unlikely under the neutral coalescent, they create a unique genetic footprint in adapting populations. The site frequency spectrum of derived neutral alleles, for example, is non-monotonic and has a peak at high frequencies, whereas Tajima’s D becomes more and more negative with increasing sample size. Since multiple merger coalescents emerge in various evolutionary scenarios characterized by sustained selection pressures, we argue that they should be considered as null-models for adapting populations.

Genetic Diversity and the Structure of Genealogies in Rapidly Adapting Populations.

Michael M. Desai, Aleksandra M. Walczak, Daniel S. Fisher
(Submitted on 16 Aug 2012)
Positive selection distorts the structure of genealogies and hence alters patterns of genetic variation within a population. Most analyses of these distortions focus on the signatures of hitchhiking due to hard or soft selective sweeps at a single genetic locus. However, in linked regions of rapidly adapting genomes, multiple beneficial mutations at different loci can segregate simultaneously within the population, an effect known as clonal interference. This leads to a subtle interplay between hitchhiking and interference effects, which leads to a unique signature of rapid adaptation on genetic variation both at the selected sites and at linked neutral loci. Here, we introduce an effective coalescent theory (a “fitness-class coalescent”) that describes how positive selection at many perfectly linked sites alters the structure of genealogies. We use this theory to calculate several simple statistics describing genetic variation within a rapidly adapting population, and to implement efficient backwards-time coalescent simulations which can be used to predict how clonal interference alters the expected patterns of molecular evolution.

Transposable sequence evolution is driven by gene context

Anna-Sophie Fiston-Lavier, Charles E. Vejnar, Hadi Quesneville
(Submitted on 2 Sep 2012)

Transposable elements (TEs) in eukaryote genomes are quantitatively the main components affecting genome size, structure and expression. The dynamics of their insertion and deletion depend on diverse factors varying in strength and nature along the genome. We address here how TE sequence evolution is affected by neighboring genes and the chromatin status (euchromatin or heterochromatin) at their insertion site. We estimated the rates of evolution of TE sequences in Arabidopsis thaliana, and found that they depend on the distance to the nearest genes: TEs located close to genes evolve faster than those that are more distant. Consequently, TE sequences in heterochromatic regions, which are gene-poor regions, are surprisingly younger and longer than those elsewhere. We present a model of TE sequence dynamics in TE-rich genomes, such as maize and wheat, and in TE-poor genomes such as fly and A. thaliana.

Evolutionary genomics of transposable elements in Saccharomyces cerevisiae

Martin Carr, Douda Bensasson, Casey M. Bergman
(Submitted on 1 Sep 2012)

Saccharomyces cerevisiae is one of the premier model systems for studying the genomics and evolution of transposable elements. The availability of the S. cerevisiae genome led to many insights into its five known transposable element families (Ty1-Ty5) in the years shortly after its completion. However, subsequent advances in bioinformatics tools for analysing transposable elements and the recent availability of genome sequences for multiple strains and species of yeast motivate new investigations into Ty evolution in S. cerevisiae. Here we provide a comprehensive phylogenetic and population genetic analysis of Ty families in S. cerevisiae based on a reannotation of Ty elements in the S288c reference genome. We show that previous annotation efforts have underestimated the total copy number of Ty elements for all known families. In addition, we identify a new family of Ty3-like elements related to the S. paradoxus Ty3p which is composed entirely of degenerate solo LTRs. Phylogenetic analyses of LTR sequences identified three families with short-branch, recently active clades nested among long branch, inactive insertions (Ty1, Ty3, Ty4), one family with essentially all recently active elements (Ty2) and two families with only inactive elements (Ty3p and Ty5). Population genomic data from 38 additional strains of S. cerevisiae show that elements present in active clades are predominantly polymorphic, whereas most of the inactive elements are fixed. Finally, we use comparative genomic data to provide evidence that the Ty2 and Ty3p families have arisen in the S. cerevisiae genome by horizontal transfer. Our results demonstrate that the genome of a single individual contains important information about the state of TE population dynamics within a species and suggest that horizontal transfer may play an important role in shaping the diversity of transposable elements in unicellular eukaryotes.

Our paper: Inference of population splits and mixtures from genome-wide allele frequency data

[This author post is by Joe Pickrell (@joe_pickrell) on Inference of population splits and mixtures from genome-wide allele frequency data, available from arXiv here]

Early last year, I began working (with Jonathan Pritchard) on methods for using genetics to understand population history. As we describe in our preprint, our approach was to build a parameterized model to describe the patterns of correlation in allele frequencies across populations. This type of approach dates back to brilliant work on building population trees by Luca Cavalli-Sforza, AWF Edwards, and Joe Felsenstein from around 40 years ago. The key to our work is that instead of representing history as a bifurcating tree, we additionally allow “migration events” to model admixture between populations. The output from our model (called TreeMix, and available here) is something like that shown below.

A graph of human population history, allowing 10 migration events. Populations are colored according to geographic region.
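For readers who want something concrete, the "patterns of correlation in allele frequencies across populations" mentioned above can be summarized as a population-by-population covariance matrix. The sketch below is only that empirical summary, computed in plain Python on made-up frequencies. It is not the TreeMix model or its fitting procedure, and the function name and input layout are assumptions for illustration.

```python
def allele_frequency_covariance(freqs):
    """Empirical covariance of allele frequencies between populations.

    freqs maps a population name to a list of allele frequencies,
    one entry per SNP (same SNP order in every population).
    Each SNP is centred on its mean frequency across populations
    before the products are averaged over SNPs.
    """
    pops = sorted(freqs)
    n_snps = len(freqs[pops[0]])
    if n_snps == 0 or any(len(freqs[p]) != n_snps for p in pops):
        raise ValueError("every population needs the same, non-zero number of SNPs")

    centred = {p: [] for p in pops}
    for s in range(n_snps):
        mean = sum(freqs[p][s] for p in pops) / len(pops)
        for p in pops:
            centred[p].append(freqs[p][s] - mean)

    return {
        (a, b): sum(x * y for x, y in zip(centred[a], centred[b])) / n_snps
        for a in pops
        for b in pops
    }


if __name__ == "__main__":
    # Hypothetical toy data: A and B are similar, C differs from both.
    toy = {"A": [0.1, 0.5, 0.9], "B": [0.2, 0.5, 0.8], "C": [0.6, 0.3, 0.1]}
    cov = allele_frequency_covariance(toy)
    for pair in sorted(cov):
        print(pair, round(cov[pair], 4))
```

In a summary like this, populations with shared history tend to show positive covariance. A tree alone predicts one pattern of such covariances, and pairs of populations that a tree fits poorly are the natural candidates for a migration event.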

We applied this method to both human and dog history, with a mix of both known and novel historical results. I thought here I’d speculate about a couple of the novel results:

1. In the human data (see the graph above), one of the more surprising things to me was the arrow to the Cambodian population. The Cambodians appear to be an admixed population, with ~85% of their ancestry related to other southeast Asian populations (like the Dai) and ~15% of their ancestry from…it’s not totally clear. As you can see in the graph, the source of this admixture appears to be a population not particularly closely related to any other population in these data. So who was this population? A speculation is that this represents ancestry from a population related to the “Ancestral South Indian” population described by Reich et al. (2009), though other sources (e.g. Oceania) are plausible.

2. In the dog data (see Figures 5 and 6 in the preprint), the most overwhelming signal in the data is that the Basenji, a central African dog breed, appears to trace ~25% of its ancestry to admixture with wolves since domestication. This signal is made somewhat surprising by the fact that there are no wolf populations currently living in Africa, which would seem to be a formidable barrier to admixture with an African dog breed. A hint for what’s going on here is provided by vonHoldt et al. (2010), who show that the Basenji has an unusual amount of shared variation with wolves from the Middle East. One speculation, then, is that as the ancestors of the Basenji moved into Africa, they came into contact with Middle Eastern wolves and admixed with them.

Other suggestions for scenarios to explain these results are of course welcome. Overall, I’m hopeful that approaches like TreeMix will eventually supplant “standard” tree-building algorithms for situations in which gene flow is known to occur, though of course further development is necessary before this becomes reality.

Joe Pickrell