Using haplotype differentiation among hierarchically structured populations for the detection of selection signatures

María Inés Fariello, Simon Boitard, Hugo Naya, Magali SanCristobal, Bertrand Servin
(Submitted on 29 Oct 2012)

The detection of molecular signatures of selection is one of the major concerns of modern population genetics. A widely used strategy in this context is to compare samples from several populations and to look for genomic regions with outstanding genetic differentiation between these populations. Genetic differentiation is generally based on allele frequency differences between populations, which are measured by Fst or related statistics. Here we introduce a new statistic, denoted hapFLK, which focuses instead on the differences of haplotype frequencies between populations. In contrast to most existing statistics, hapFLK accounts for the hierarchical structure of the sampled populations. Using computer simulations, we show that each of these two features (the use of haplotype information and of the hierarchical structure of populations) significantly improves the power to detect selected loci, and that combining them in the hapFLK statistic provides even greater power. We also show that hapFLK is robust with respect to bottlenecks and migration and improves over existing approaches in many situations. Finally, we apply hapFLK to a set of six sheep breeds from Northern Europe and identify seven regions under selection, which include already reported regions but also several new ones. We propose a method to help identify the population(s) under selection in a detected region, which reveals that in many of these regions selection most likely occurred in more than one population. Furthermore, several of the detected regions correspond to incomplete sweeps, where the favourable haplotype is only at intermediate frequency in the population(s) under selection.
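To see why haplotype frequencies can reveal differentiation that a single SNP misses, here is a minimal sketch using Nei's G_ST, one of the "related statistics" mentioned above. This is not hapFLK, which in addition models the hierarchical structure of the populations. The frequencies are hypothetical: haplotypes 1 and 2 carry one SNP allele, and haplotypes 3 and 4 carry the other. The function name `gst` is illustrative.

```python
import numpy as np

def gst(freqs):
    """Nei's G_ST for one locus from a (populations x alleles) frequency array.

    Rows are populations, columns are alleles or haplotypes; each row sums to 1.
    Populations are weighted equally (an assumption of this sketch).
    """
    p = np.asarray(freqs, dtype=float)
    h_s = np.mean(1.0 - np.sum(p ** 2, axis=1))  # mean within-population heterozygosity
    p_bar = p.mean(axis=0)                        # pooled, equally weighted frequencies
    h_t = 1.0 - np.sum(p_bar ** 2)                # total heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Same two hypothetical populations, summarised by one SNP or by four haplotypes.
snp = [[0.5, 0.5], [0.5, 0.5]]
haps = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
print(gst(snp), gst(haps))  # 0.0 for the SNP, 1/3 for the haplotypes
```

In this constructed case the populations share identical allele frequencies at the SNP but carry it on different haplotypes. Only the haplotype-based statistic detects the difference.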


Asexual Evolution Waves: Fluctuations and Universality

Daniel S. Fisher
(Submitted on 23 Oct 2012)

In large asexual populations, multiple beneficial mutations arise in the population, compete, interfere with each other, and accumulate on the same genome before any of them fix. The resulting dynamics, although studied by many authors, are still not fully understood, fundamentally because the effects of fluctuations due to the small numbers of the fittest individuals are large even in enormous populations. In this paper, branching processes and various asymptotic methods for analyzing the stochastic dynamics are further developed and used to obtain information on fluctuations, time dependence, and the distributions of sizes of subpopulations, jumps in the mean fitness, and other properties. The focus is on the behavior of a broad class of models: those with a distribution of selective advantages of available beneficial mutations that falls off more rapidly than exponentially. For such distributions, many aspects of the dynamics are universal, quantitatively so for extremely large populations. On the most important time scale, which controls coalescent properties and fluctuations of the speed, the dynamics reduce to a simple stochastic model that couples the peak and the high-fitness “nose” of the fitness distribution. Extensions to other models and distributions of available mutations are discussed briefly.
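The setting described above can be made concrete with a brute-force Wright-Fisher simulation of its simplest member. In this model every beneficial mutation has the same effect s, a distribution that trivially falls off faster than exponentially. This is a minimal sketch and not the paper's branching-process analysis. The population size N, effect s, mutation rate U, the seed and the function name `simulate_wave` are illustrative assumptions.

```python
import numpy as np

def simulate_wave(N=10**6, s=0.02, U=1e-4, generations=1500, seed=1):
    """Wright-Fisher asexual population in fitness classes k (log-fitness k * s).

    Every beneficial mutation has effect s, and each offspring acquires one with
    probability U. Returns the mean log-fitness after each generation.
    """
    rng = np.random.default_rng(seed)
    counts = np.array([N])                    # everyone starts in class 0
    mean_fitness = np.empty(generations)
    for t in range(generations):
        k = np.arange(len(counts))
        weights = counts * np.exp(s * k)      # selection
        parent = weights / weights.sum()
        probs = np.zeros(len(counts) + 1)     # mutation: a fraction U moves up one class
        probs[:-1] += parent * (1 - U)
        probs[1:] += parent * U
        counts = rng.multinomial(N, probs)    # genetic drift (multinomial resampling)
        counts = counts[: np.nonzero(counts)[0][-1] + 1]  # drop empty top classes
        mean_fitness[t] = s * np.dot(np.arange(len(counts)), counts) / N
    return mean_fitness

m = simulate_wave()
half = len(m) // 2
speed = (m[-1] - m[half]) / (len(m) - 1 - half)
print(f"mean log-fitness gain per generation (second half): {speed:.2e}")
```

The fittest classes in the "nose" contain only a handful of individuals at any time. Their stochastic establishment is what makes the speed fluctuate from run to run, even with N in the millions.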

The Baldwin effect under multi-peaked fitness landscapes: Phenotypic fluctuation accelerates evolutionary rate

Nen Saito, Shuji Ishihara, Kunihiko Kaneko
(Submitted on 19 Oct 2012)

Phenotypic fluctuations and plasticity can generally affect the course of evolution, a process known as the Baldwin effect. Several studies have recast this effect and claimed that phenotypic plasticity accelerates evolutionary rate (the Baldwin expediting effect); however, the validity of this claim is still controversial. In this study, we investigate the evolutionary population dynamics of a quantitative genetic model under a multi-peaked fitness landscape, in order to evaluate the validity of the effect. We provide analytical expressions for the evolutionary rate and average population fitness. Our results indicate that under a multi-peaked fitness landscape, phenotypic fluctuation always accelerates evolutionary rate, but it decreases the average fitness. As an extreme case of the trade-off between the rate of evolution and average fitness, phenotypic fluctuation is shown to accelerate the error catastrophe, in which a population fails to sustain a high-fitness peak. In the context of our findings, we discuss the role of phenotypic plasticity in adaptive evolution.
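A toy calculation shows the intuition behind the trade-off described above, where faster evolution comes at the cost of lower average fitness. Suppose each peak is Gaussian and the phenotype equals the genotypic value plus Gaussian noise of size sigma. Then the expected fitness of a genotype is the landscape smoothed by the noise. Valleys fill in, which eases peak shifts, while peaks flatten, which lowers mean fitness. This is an illustrative sketch, not the authors' quantitative genetic model. The two peaks at plus and minus 1, the peak width 0.2 and the name `effective_fitness` are all assumptions.

```python
import math

def effective_fitness(g, sigma, peaks=((-1.0, 1.0), (1.0, 1.0)), width=0.2):
    """Expected fitness of genotypic value g when the phenotype is g + sigma * noise.

    Fitness is a sum of Gaussian peaks (position, height) of the given width.
    Averaging a Gaussian peak over Gaussian noise gives a wider, lower Gaussian
    (exact closed form, no numerical integration).
    """
    var = width ** 2 + sigma ** 2
    scale = width / math.sqrt(var)
    return sum(h * scale * math.exp(-(g - mu) ** 2 / (2 * var)) for mu, h in peaks)

for sigma in (0.0, 0.1, 0.3):
    peak = effective_fitness(1.0, sigma)
    valley = effective_fitness(0.0, sigma)
    print(f"sigma={sigma}: peak={peak:.3f}, valley/peak={valley / peak:.2e}")
```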

Plump Cutthroat Trout and Thin Rainbow Trout in a Lentic Ecosystem

Joshua Courtney, Jessica Abbott, Kerri Schmidt, Michael Courtney
(Submitted on 17 Oct 2012)

Background: Much has been written about introduced rainbow trout (Oncorhynchus mykiss) interbreeding and outcompeting cutthroat trout (Oncorhynchus clarkii). However, the specific mechanisms by which rainbow trout and their hybrids outcompete cutthroat trout have not been thoroughly explored, and the published data are limited to lotic ecosystems. Materials and Methods: Samples of rainbow trout and cutthroat trout were obtained from a lentic ecosystem by angling. The total length and weight of each fish were measured, and the relative weight of each fish was computed (Anderson R.O., Neumann R.M. 1996. Length, Weight, and Associated Structural Indices, Pp. 447-481. In: Murphy B.E. and Willis D.W. (eds.) Fisheries Techniques, second edition. American Fisheries Society.), along with the mean and uncertainty in the mean for each species. Data from an independent source (K.D. Carlander, 1969. Handbook of Freshwater Fishery Biology, Volume One, Iowa University Press, Ames.) were also used to generate mean weight-length curves, as well as 25th and 75th percentile curves for each species, to allow further comparison. Results: The mean relative weight of the rainbow trout was 72.5 (+/- 2.1), whereas the mean relative weight of the cutthroat trout was 101.0 (+/- 4.9). The rainbow trout were thin; 80% weighed below the 25th percentile. The cutthroat trout were plump; 86% weighed above the 75th percentile, and 29% were above the heaviest recorded specimens at a given length in the Carlander (1969) data set. Conclusion: These data cast doubt on the hypothesis that rainbow trout are strong food competitors with cutthroat trout in lentic ecosystems. On the contrary, in the lake under study, the cutthroat trout seem to be outcompeting rainbow trout for the available food.
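The relative weight index used above compares each fish with a species-specific standard weight, Wr = 100 W / Ws, where log10(Ws) = a + b log10(L). Here L is total length in mm and W is weight in g. A minimal sketch of the arithmetic follows. The a and b values below are placeholders, not the published rainbow or cutthroat trout standard-weight parameters. Reporting the uncertainty as the standard error of the mean is an assumption about what "uncertainty in the mean" refers to.

```python
import math
import statistics

def relative_weight(length_mm, weight_g, a, b):
    """Relative weight Wr = 100 * W / Ws, with standard weight log10(Ws) = a + b*log10(L)."""
    ws = 10 ** (a + b * math.log10(length_mm))
    return 100.0 * weight_g / ws

def mean_and_sem(values):
    """Sample mean and its standard error (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Placeholder standard-weight parameters and hypothetical fish (not study data).
A, B = -5.0, 3.0
fish = [(300, 250.0), (350, 400.0), (420, 700.0)]  # (total length in mm, weight in g)
wr = [relative_weight(L, W, A, B) for L, W in fish]
print(mean_and_sem(wr))
```

By construction, a fish exactly at its standard weight has Wr = 100. This is why a mean near 72 reads as "thin" and a mean near 101 as "plump".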

Our paper: Integrative modeling of eQTLs and cis-regulatory elements suggest mechanisms underlying cell type specificity of eQTLs

This guest post is by Christopher Brown, Lara Mangravite, and Barbara Engelhardt on their paper, Integrative modeling of eQTLs and cis-regulatory elements suggest mechanisms underlying cell type specificity of eQTLs, posted on arXiv.

Why do we study eQTLs? Why don’t we count bristles?

The genetic dissection of complex trait models, independent of the particular phenotype, is useful for improving our understanding of the genetic architecture underlying the biochemical function that regulates complex traits in general. In the last ten years, gene expression levels themselves have emerged as useful phenotypes amenable to genetic dissection with several advantages, most notably that it is easy to accurately quantify tens of thousands of traits simultaneously (indeed even more when we address splicing and promoter usage). While the identification of SNPs that are associated with variation in gene expression (eQTLs) is certainly interesting at this basic level, an additional critical use for eQTL data has emerged. Because the majority of common human phenotypic variation appears to be driven by non-coding sequence variants, eQTL analyses are beginning to help with the mechanistic interpretation of GWAS results. In light of these interests and applications, we believe that eQTL analyses are hampered by (at least) three important limitations, which we have attempted to address in our recent preprint:

(1) Methodological (non)uniformity. Most eQTL studies have been performed by different groups, on different genotyping and gene expression platforms, with different association methods, and using different criteria for defining significance. This lack of uniformity complicates even simple cross-study comparisons; for example, what fraction of genes has one or more independently associated eQTLs when analyzed across tissues? We address this issue by testing for eQTL associations across a diverse set of cell types using a uniform pipeline with standardized analysis parameters for all analytical steps, starting from raw data. As a fairly trivial example, our analyses across the eleven studies demonstrated that nearly all of the variation in the proportion of genes with significant eQTL associations identified within each study can be explained by just two factors: study size and replicate gene expression measurements. The proportion of genes with one or more independently associated eQTLs, then, is probably not 5-10%, as has been hypothesized, but most or all genes; we will get a better picture of this when we design studies with sufficient power.

(2) Undercharacterized cell specificity. It is generally agreed that some eQTLs regulate gene expression in a cell type specific manner. When using eQTLs to interpret the genetic contribution to complex clinical traits, it is important to consider the cell type(s) most relevant to the trait of interest. However, if we don’t know which cell type is responsible for a phenotype, or if we don’t have eQTL data for the cell type of interest, we are forced to extrapolate from eQTLs identified in other cell types. By enabling the simultaneous comparison of within- and between-cell-type eQTL replication for multiple cell type combinations, and by integrating these results with cis-regulatory element (CRE) mapping data from ENCODE, we have addressed several unresolved questions concerning the nature of cell type specific and ubiquitous eQTL SNPs. We find that eQTL-CRE overlap is frequently cell type specific and that this information can be used to predict the cell specificity of eQTLs in the absence of additional gene expression data from the cell type of interest. While these results are certainly preliminary (and indeed we see many possible improvements), we hope this will improve the utility of eQTL-GWAS comparisons, particularly in situations where the GWAS cell type of interest lacks eQTL data.

(3) Resolution, causality, and mechanism. Lead tag SNPs are probably causal variants less than 30% of the time. While larger and more diverse genomic sample sets are essential to improve the resolution for identifying causal variants, this is not always possible due to time or budget constraints. However, the application of orthogonal genomic data also has the potential to considerably refine resolution with the added benefit of providing insight into the mechanism through which a causal variant acts. We approach this (as a few other groups have – notably Dan Gaffney et al.) by integrating CRE data into our analyses, because it appears that genetic variants that overlap certain types of CREs are much more likely to be functional than those that do not. We believe that this hypothesis, and the methods used to address it, need to be validated with directed functional assays, but we see no reason to doubt the principle of understanding heritable phenotypes using genotype functional analyses. Furthermore, the analysis of cell specific eQTL data in the context of cell specific CRE data, which is now possible, enables predictions about the regulatory mechanisms that are affected by a specific eQTL, which will allow us to place GWAS hits into pathways or provide other meaningful biological insights.
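Point (1), that study size alone drives much of the variation in how many genes are called, can be illustrated with a small simulation. This is a hypothetical sketch, not the authors' uniform pipeline. Every simulated gene carries one true cis-eQTL of the same modest effect. Each association is tested with a Fisher-z approximation to the correlation test, and discoveries are called at a Benjamini-Hochberg FDR of 5%. The effect size, allele frequency, gene count and function names are all assumptions.

```python
import math
import numpy as np

def eqtl_pvalue(genotype, expression):
    """Two-sided p-value for a genotype-expression correlation (Fisher z, normal approx.)."""
    n = len(genotype)
    r = np.corrcoef(genotype, expression)[0, 1]
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of discoveries at FDR level alpha (step-up procedure)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        reject[order[: np.nonzero(passed)[0][-1] + 1]] = True
    return reject

def fraction_genes_with_eqtl(n_samples, n_genes=200, effect=0.3, maf=0.3, seed=0):
    """Simulate one cis-SNP per gene; every gene has a true eQTL of the same effect."""
    rng = np.random.default_rng(seed)
    pvals = []
    for _ in range(n_genes):
        g = rng.binomial(2, maf, size=n_samples)         # allele dosage 0/1/2
        e = effect * g + rng.normal(size=n_samples)      # expression level
        pvals.append(eqtl_pvalue(g, e))
    return benjamini_hochberg(pvals).mean()

for n in (50, 500):
    print(n, fraction_genes_with_eqtl(n))
```

Although every gene has an eQTL by construction, the small study recovers only a few percent of them and the large one recovers most. Under these assumptions, a low reported proportion mainly reflects power.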
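Points (2) and (3) both rest on asking whether a variant falls inside a cell-type-specific CRE, and on using that overlap as prior information. One common way to express this (an assumption here, not necessarily the authors' model) is to assume a single causal variant per locus. Each SNP in an LD block then receives a prior weight that is larger if it overlaps a CRE, and its posterior probability of being causal is proportional to prior times Bayes factor. The Bayes factors, CRE intervals, enrichment value of 5 and function names below are hypothetical.

```python
import bisect

def overlaps_cre(position, cre_starts, cre_ends):
    """True if position lies in one of the sorted, non-overlapping [start, end) intervals."""
    i = bisect.bisect_right(cre_starts, position) - 1
    return i >= 0 and position < cre_ends[i]

def reweighted_posteriors(snps, bayes_factors, cre_starts, cre_ends, enrichment=5.0):
    """Posterior probability that each SNP is the single causal variant in the block.

    Prior weight is `enrichment` for SNPs inside a CRE of the chosen cell type, else 1.
    """
    priors = [enrichment if overlaps_cre(p, cre_starts, cre_ends) else 1.0 for p in snps]
    scores = [pr * bf for pr, bf in zip(priors, bayes_factors)]
    total = sum(scores)
    return [sc / total for sc in scores]

# Hypothetical block: three SNPs in tight LD with similar association evidence,
# plus CRE intervals mapped in one cell type.
snps = [1_000, 1_250, 1_600]
bfs = [10.0, 12.0, 11.0]
starts, ends = [1_200, 5_000], [1_300, 5_500]
print(reweighted_posteriors(snps, bfs, starts, ends))
```

Swapping in CRE intervals from a different cell type changes which SNP is favoured. That sensitivity is exactly where cell-type-matched CRE maps matter.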

Why did we submit the paper to arXiv and Haldane’s Sieve?

We are big proponents of open access publication, open data, and transparent methods and analysis. At least part of what we’ve done here is to create a resource that we hope will be useful to the broader community. We are open to pre- and post-publication review of, and commentary on, our motivations and methods. Furthermore, we have submitted all of the eQTLs we identify to a database of eQTLs, and we are currently securing funding to develop open access, online tools to help GWAS researchers follow up specific functional variants using our methods.

Christopher Brown, Lara Mangravite, Barbara Engelhardt

Integrated Nested Laplace Approximation for Bayesian Nonparametric Phylodynamics

Julia A. Palacios, Vladimir N. Minin
(Submitted on 16 Oct 2012)

The goal of phylodynamics, an area at the intersection of phylogenetics and population genetics, is to reconstruct population size dynamics from genetic data. Recently, a series of nonparametric Bayesian methods has been proposed for such demographic reconstructions. These methods rely on prior specifications based on Gaussian processes and proceed by approximating the posterior distribution of population size trajectories via Markov chain Monte Carlo (MCMC) methods. In this paper, we adapt the integrated nested Laplace approximation (INLA), a recently proposed approximate Bayesian inference method for latent Gaussian models, to the estimation of population size trajectories. We show that when a genealogy of sampled individuals can be reliably estimated from genetic data, INLA enjoys high accuracy and can replace MCMC entirely. We demonstrate significant gains in computational efficiency over state-of-the-art MCMC methods. We illustrate INLA-based population size inference using simulations and genealogies of hepatitis C and human influenza viruses.
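Both the MCMC and INLA approaches above place a Gaussian-process prior on the population size trajectory and combine it with the coalescent likelihood of a given genealogy. The sketch below computes only that likelihood, not INLA itself. It assumes isochronous (same-time) sampling and an effective population size Ne(t) that is piecewise constant on a user-chosen time grid. With k lineages, coalescence occurs at rate k(k-1)/2 / Ne(t). The coalescent times and the function name `coalescent_loglik` are hypothetical.

```python
import math
import numpy as np

def coalescent_loglik(coal_times, n, edges, ne_values):
    """Log-likelihood of coalescent times for n isochronous samples under a
    piecewise-constant effective population size Ne(t).

    coal_times: increasing times (backwards from the present) of the n - 1 coalescences.
    edges: grid [0, t_1, ..., inf]; ne_values[i] applies on [edges[i], edges[i + 1]).
    """
    edges = np.asarray(edges, dtype=float)
    ne_values = np.asarray(ne_values, dtype=float)

    def cumulative_inverse_ne(t):
        # integral of 1 / Ne(u) du from 0 to t
        widths = np.clip(np.minimum(edges[1:], t) - edges[:-1], 0.0, None)
        return float(np.sum(widths / ne_values))

    def ne_at(t):
        return ne_values[np.searchsorted(edges, t, side="right") - 1]

    loglik, prev, k = 0.0, 0.0, n
    for t in coal_times:
        pairs = k * (k - 1) / 2.0             # number of lineage pairs that can coalesce
        loglik += math.log(pairs / ne_at(t))  # density of a coalescence at time t
        loglik -= pairs * (cumulative_inverse_ne(t) - cumulative_inverse_ne(prev))
        prev, k = t, k - 1
    return loglik

times = [0.1, 0.3, 0.6, 1.2]  # hypothetical coalescent times for n = 5 samples
theta_hat = (10 * 0.1 + 6 * 0.2 + 3 * 0.3 + 1 * 0.6) / 4  # closed-form MLE for constant Ne
print(theta_hat, coalescent_loglik(times, 5, [0.0, np.inf], [theta_hat]))
```

For constant Ne this reduces to -(n-1) log Ne - S/Ne plus a constant, where S is the sum over intervals of the number of lineage pairs times the interval length. That gives the closed-form maximum S/(n-1) used above. The nonparametric methods replace the single value with a smooth, prior-regularized trajectory.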

Quantitative analyses of empirical fitness landscapes

Ivan G. Szendro, Martijn F. Schenk, Jasper Franke, Joachim Krug, J. Arjan G. M. de Visser
(Submitted on 20 Feb 2012 (v1), last revised 17 Oct 2012 (this version, v2))

The concept of a fitness landscape is a powerful metaphor that offers insight into various aspects of evolutionary processes and guidance for the study of evolution. Until recently, empirical evidence on the ruggedness of these landscapes was lacking, but since it became feasible to construct all possible genotypes containing combinations of a limited set of mutations, the number of studies has grown to a point where a classification of landscapes becomes possible. The aim of this review is to identify measures of epistasis that allow a meaningful comparison of fitness landscapes and then to apply them to the empirical landscapes to discern factors that affect ruggedness. The various measures of epistasis that have been proposed in the literature appear to be equivalent. Our comparison shows that the ruggedness of the empirical landscapes is affected by whether the included mutations are beneficial or deleterious and by whether intra- or intergenic epistasis is involved. Finally, the empirical landscapes are compared to landscapes generated with the Rough Mt. Fuji model. Despite the simplicity of this model, it captures the features of the experimental landscapes remarkably well.
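The Rough Mount Fuji model mentioned above adds random noise to a smooth, additive landscape. Ruggedness can be gauged by counting local maxima over all 2^L genotypes built from L binary mutations. A minimal sketch follows, assuming the common form fitness = -c × (Hamming distance to a reference genotype) + Gaussian noise of size sigma, with the reference taken to be the all-zero genotype. The parameter values and function names are illustrative, and this is only one of the ruggedness measures the review compares.

```python
import numpy as np

def rmf_landscape(L, c, sigma, seed=0):
    """Rough Mount Fuji fitness for all 2**L binary genotypes (encoded as integers).

    fitness = -c * (Hamming distance to reference genotype 0) + sigma * N(0, 1) noise.
    """
    rng = np.random.default_rng(seed)
    distance = np.array([bin(g).count("1") for g in range(2 ** L)])
    return -c * distance + sigma * rng.normal(size=2 ** L)

def count_local_maxima(fitness, L):
    """Number of genotypes strictly fitter than all L single-mutant neighbours."""
    count = 0
    for g in range(2 ** L):
        if all(fitness[g] > fitness[g ^ (1 << i)] for i in range(L)):
            count += 1
    return count

L = 8
for c, sigma in [(1.0, 0.0), (1.0, 0.5), (0.0, 1.0)]:
    print(c, sigma, count_local_maxima(rmf_landscape(L, c, sigma), L))
```

With no noise the landscape is smooth and has a single peak. As the noise grows relative to the additive slope c, the number of local maxima rises toward the purely random ("House of Cards") limit.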

The equivalence between weak and strong purifying selection

Benjamin H Good, Michael M Desai
(Submitted on 16 Oct 2012)

Weak purifying selection, acting on many linked mutations, may play a major role in shaping patterns of molecular evolution in natural populations. Yet efforts to infer these effects from DNA sequence data are limited by our incomplete understanding of weak selection on local genomic scales. Here, we demonstrate a natural symmetry between weak and strong selection, in which the effects of many weakly selected mutations on patterns of molecular evolution are equivalent to those of a smaller number of more strongly selected mutations. By introducing a coarse-grained “effective selection coefficient,” we derive an explicit mapping between weakly selected populations and their strongly selected counterparts, which allows us to make accurate and efficient predictions across the full range of selection strengths. This suggests that an effective selection coefficient and an effective mutation rate, rather than an effective population size, are the most accurate summary of the effects of selection over locally linked regions. Moreover, this correspondence places fundamental limits on our ability to resolve the effects of weak selection from contemporary sequence data alone.

Fluctuations in fitness distributions and the effects of weak linked selection on sequence evolution

Benjamin H. Good, Michael M. Desai
(Submitted on 15 Oct 2012)

Evolutionary dynamics and patterns of molecular evolution are strongly influenced by selection on linked regions of the genome, but our quantitative understanding of these effects remains incomplete. Recent work has focused on predicting the distribution of fitness within an evolving population, and this forms the basis for several methods that leverage the fitness distribution to predict the patterns of genetic diversity when selection is strong. However, in weakly selected populations random fluctuations due to genetic drift are more severe, and neither the distribution of fitness nor the sequence diversity within the population is well understood. Here, we briefly review the motivations behind the fitness-distribution picture and summarize the general approaches that have been used to analyze this distribution in the strong-selection regime. We then extend these approaches to the case of weak selection by outlining a perturbative treatment of selection at a large number of linked sites. This allows us to quantify the stochastic behavior of the fitness distribution and yields exact analytical predictions for the sequence diversity and substitution rate in the limit that selection is weak.
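The strong-selection starting point of the fitness-distribution picture can be reproduced with a few lines of deterministic iteration. Take a large asexual population in which each deleterious mutation multiplies fitness by (1 - s) and each offspring gains a Poisson(U) number of new mutations. The number of mutations per individual then settles to a Poisson distribution with mean U/s. The least-loaded class makes up a fraction exp(-U/s) of the population. This minimal sketch covers only that classical deterministic baseline, not the perturbative weak-selection treatment of the paper. The values of U and s, the truncation at 60 classes and the function name are assumptions.

```python
import math
import numpy as np

def mutation_selection_balance(U, s, classes=60, generations=2000):
    """Deterministic frequencies of individuals carrying k deleterious mutations.

    Each generation: selection with multiplicative fitness (1 - s)**k, then each
    offspring gains a Poisson(U) number of new mutations. Classes beyond `classes`
    are truncated.
    """
    k = np.arange(classes)
    new_muts = np.array([math.exp(-U) * U ** j / math.factorial(j) for j in range(classes)])
    x = np.zeros(classes)
    x[0] = 1.0                                  # start mutation-free
    for _ in range(generations):
        x = x * (1 - s) ** k                    # selection
        x = np.convolve(x, new_muts)[:classes]  # mutation (add Poisson(U) new mutations)
        x /= x.sum()
    return x

U, s = 0.1, 0.05
x = mutation_selection_balance(U, s)
print(np.dot(np.arange(len(x)), x), U / s)  # mean number of mutations vs. U/s
print(x[0], math.exp(-U / s))               # least-loaded class vs. exp(-U/s)
```

When selection is weak, drift makes the real distribution fluctuate around this deterministic shape. Those fluctuations are what the weak-selection extension has to quantify.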

A 454 survey of the community composition and core microbiome of the common bed bug, Cimex lectularius, reveals significant microbial community structure across an urban landscape

Matthew Meriweather, Sara Matthews, Rita Rio, Regina S Baucom
(Submitted on 13 Oct 2012)

Elucidating the spatial dynamics and core constituents of the microbial communities found in association with arthropod hosts is of crucial importance for insects that may vector human or agricultural pathogens. The hematophagous Cimex lectularius, known as the common bed bug, has made a recent resurgence in North America, as well as worldwide, potentially owing to increased travel and resistance to insecticides. A comprehensive survey of the bed bug microbiome has not been performed to date, nor has an assessment of the spatial dynamics of its microbiome. Here we present a survey of bed bug microbial communities, obtained by amplifying the V4-V6 hypervariable region of the 16S rDNA gene followed by 454 Titanium sequencing of 31 individuals from eight natural populations collected from residences in Cincinnati, OH. Across all samples, 97% of the microbial community is made up of two dominant OTUs identified as the α-proteobacterium Wolbachia and an unnamed γ-proteobacterium from the Enterobacteriaceae. Microbial communities varied among host populations for measures of community diversity and exhibited significant population structure. We also uncovered a strong negative correlation in the abundance of the two dominant OTUs, suggesting they may fulfill similar roles as nutritional mutualists. This broad survey represents the most comprehensive assessment, to date, of the microbes that associate with bed bugs, and uncovers evidence for potential antagonism between the two dominant members of the bed bug microbiome.
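Two of the quantities mentioned above are the per-sample community diversity and the correlation between the abundances of the two dominant OTUs. Both are straightforward to compute from an OTU count table. The sketch below uses the Shannon index as the diversity measure and a Pearson correlation. Both are assumptions, since the abstract does not name the specific measures. The read counts are invented for illustration.

```python
import math

def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-individual read counts: [dominant OTU A, dominant OTU B, all others].
samples = [[900, 80, 20], [300, 650, 50], [600, 370, 30], [100, 880, 20]]
print([round(shannon(s), 3) for s in samples])
print(pearson([s[0] for s in samples], [s[1] for s in samples]))
```

When two OTUs together make up almost every read, their relative abundances are close to complementary. A strongly negative correlation is therefore partly expected from the compositional nature of the data, which is worth keeping in mind when interpreting it biologically.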