A genomic map of the effects of linked selection in Drosophila

Eyal Elyashiv, Shmuel Sattath, Tina T. Hu, Alon Strustovsky, Graham McVicker, Peter Andolfatto, Graham Coop, Guy Sella
(Submitted on 23 Aug 2014)

Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of ‘linked selection’ on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1 Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of linked selection from non-classic sweeps. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.
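The composite-likelihood idea in this abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the authors' model: the reduction factor exp(-k * density / rec_rate), the single parameter k, the fixed baseline pi0, and the function names are all hypothetical. The sketch only shows the general pattern of computing a per-site probability of polymorphism from local covariates and then summing per-site log-probabilities as if sites were independent.

```python
import numpy as np

def expected_diversity(pi0, k, density, rec_rate):
    """Toy model (an assumption): baseline diversity pi0 is reduced by
    exp(-k * density / rec_rate), so dense annotation and low
    recombination both lower the probability that a site is polymorphic."""
    density = np.asarray(density, dtype=float)
    rec_rate = np.asarray(rec_rate, dtype=float)
    return pi0 * np.exp(-k * density / rec_rate)

def composite_loglik(polymorphic, pi):
    """Sum per-site Bernoulli log-probabilities, treating sites as independent."""
    x = np.asarray(polymorphic, dtype=float)
    pi = np.asarray(pi, dtype=float)
    return float(np.sum(x * np.log(pi) + (1.0 - x) * np.log1p(-pi)))

def fit_k_on_grid(polymorphic, pi0, density, rec_rate, grid):
    """Return the grid value of k with the highest composite log-likelihood."""
    scores = [
        composite_loglik(polymorphic, expected_diversity(pi0, k, density, rec_rate))
        for k in grid
    ]
    return grid[int(np.argmax(scores))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sites = 50_000
    density = rng.uniform(0.0, 1.0, n_sites)   # hypothetical annotation density
    rec_rate = rng.uniform(0.5, 2.0, n_sites)  # hypothetical recombination rate
    pi_true = expected_diversity(0.01, 0.5, density, rec_rate)
    polymorphic = rng.random(n_sites) < pi_true  # simulated 0/1 observations
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]
    print("grid estimate of k:", fit_k_on_grid(polymorphic, 0.01, density, rec_rate, grid))
```

A grid search is used only to keep the sketch short. Because linked sites are correlated, the curvature of a composite likelihood does not by itself give valid confidence intervals.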

Robust Population Structure Inference and Correction in the Presence of Known or Cryptic Relatedness

Matthew P Conomos, Michael B Miller, Timothy A Thornton
doi: http://dx.doi.org/10.1101/008276

Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
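The two-stage idea described above (PCA on an unrelated subset, then prediction for everyone else) can be sketched roughly as follows. This is a minimal illustration and not the PC-AiR implementation. It assumes the unrelated, ancestry-representative subset has already been chosen (the hypothetical `unrelated_idx` argument), and it predicts the remaining individuals by plain projection onto the subset's SNP loadings. The function name and the standardization scheme are also assumptions.

```python
import numpy as np

def pca_with_projection(genotypes, unrelated_idx, n_components=2):
    """PCA on an unrelated reference subset, then projection of all samples.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2).
    unrelated_idx: row indices of the reference subset (assumed given).
    Returns an (n_individuals, n_components) array of PC scores.
    """
    G = np.asarray(genotypes, dtype=float)
    ref = G[unrelated_idx]

    # Standardize every SNP using statistics from the reference subset only,
    # so that relatives cannot distort the centering or the scaling.
    mean = ref.mean(axis=0)
    sd = ref.std(axis=0)
    sd[sd == 0] = 1.0  # monomorphic SNPs in the subset: avoid division by zero
    Z = (G - mean) / sd

    # Principal axes (SNP loadings) come from the reference subset alone.
    _, _, vt = np.linalg.svd(Z[unrelated_idx], full_matrices=False)
    loadings = vt[:n_components].T

    # Every individual, related or not, is scored on the same axes.
    return Z @ loadings

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    G = rng.integers(0, 3, size=(8, 100))
    scores = pca_with_projection(G, unrelated_idx=[0, 1, 2, 3, 4, 5])
    print(scores.shape)
```

Estimating the axes from unrelated individuals only keeps family structure from showing up as a leading component, which is the failure mode the abstract attributes to standard PCA in related samples.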

The impact of macroscopic epistasis on long-term evolutionary dynamics

Benjamin H. Good, Michael M. Desai
(Submitted on 18 Aug 2014)

Genetic interactions can strongly influence the fitness effects of individual mutations, yet the impact of these epistatic interactions on evolutionary dynamics remains poorly understood. Here we investigate the evolutionary role of epistasis over 50,000 generations in a well-studied laboratory evolution experiment in E. coli. The extensive duration of this experiment provides a unique window into the effects of epistasis during long-term adaptation to a constant environment. Guided by analytical results in the weak-mutation limit, we develop a computational framework to assess the compatibility of a given epistatic model with the observed patterns of fitness gain and mutation accumulation through time. We find that the average fitness trajectory alone provides little power to distinguish between competing models, including those that lack any direct epistatic interactions between mutations. However, when combined with the mutation trajectory, these observables place strong constraints on the set of possible models of epistasis, ruling out most existing explanations of the data. Instead, we find the strongest support for a “two-epoch” model of adaptation, in which an initial burst of diminishing returns epistasis is followed by a steady accumulation of mutations under a constant distribution of fitness effects. Our results highlight the need for additional DNA sequencing of these populations, as well as for more sophisticated models of epistasis that are compatible with all of the experimental data.

Understanding Admixture Fractions

Mason Liang, Rasmus Nielsen
doi: http://dx.doi.org/10.1101/008078

Estimation of admixture fractions has become one of the most commonly used computational tools in population genomics. However, there is remarkably little population genetic theory on their statistical properties. We develop theoretical results that can accurately predict means and variances of admixture proportions within a population using models with recombination and genetic drift. Based on established theory on measures of multilocus disequilibrium, we show that there is a set of recurrence relations that can be used to derive expectations for higher moments of the admixture fraction distribution. We obtain closed-form solutions for some special cases. Using these results, we develop a method for estimating admixture parameters from estimated admixture proportions obtained from programs such as Structure or Admixture. We apply this method to HapMap data and find that the population history of African Americans, as expected, is not best explained by a single admixture event between people of European and African ancestry. A model of constant gene flow for the past 11 generations until 2 generations ago gives a better fit.

inPHAP: Interactive visualization of genotype and phased haplotype data

Günter Jäger, Alexander Peltzer, Kay Nieselt
Comments: BioVis 2014 conference
Subjects: Graphics (cs.GR); Genomics (q-bio.GN)

Background: To understand individual genomes it is necessary to look at the variations that lead to changes in phenotype and possibly to disease. However, genotype information alone is often not sufficient, and additional knowledge regarding the phase of the variation is needed to make correct interpretations. Interactive visualizations that allow the user to explore the data in various ways can be of great assistance in the process of making well-informed decisions. Currently, however, there is a lack of visualizations that are able to deal with phased haplotype data. Results: We present inPHAP, an interactive visualization tool for genotype and phased haplotype data. inPHAP features a variety of interaction possibilities such as zooming, sorting, filtering and aggregation of rows in order to explore patterns hidden in large genetic data sets. As a proof of concept, we apply inPHAP to the phased haplotype data set of Phase 1 of the 1000 Genomes Project. Thereby, inPHAP’s ability to show genetic variations at the population level as well as the individual level is demonstrated for several disease-related loci. Conclusions: As of today, inPHAP is the only visual analytical tool that allows the user to explore unphased and phased haplotype data interactively. Due to its highly scalable design, inPHAP can be applied to large datasets with up to 100 GB of data, enabling users to visualize even large-scale input data. inPHAP closes the gap between common visualization tools for unphased genotype data and introduces several new features, such as the visualization of phased data.

Improved genome inference in the MHC using a population reference graph

Alexander Dilthey, Charles J Cox, Zamin Iqbal, Matthew R Nelson, Gil McVean

In humans and many other species, while much is known about the extent and structure of genetic variation, such information is typically not used in assembling novel genomes. Rather, a single reference is used against which to map reads, which can lead to poor characterisation of regions of high sequence or structural diversity. Here, we introduce a population reference graph, which combines multiple reference sequences as well as catalogues of SNPs and short indels. The genomes of novel samples are reconstructed as paths through the graph using an efficient hidden Markov model, allowing for recombination between different haplotypes and variants. By applying the method to the 4.5 Mb extended MHC region on chromosome 6, combining eight assembled haplotypes, sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate, using simulations, SNP genotyping, short-read and long-read data, how the method improves the accuracy of genome inference. Moreover, the analysis reveals regions where the current set of reference sequences is substantially incomplete, particularly within the Class II region, indicating the need for continued development of reference-quality genome sequences.

Bayesian Coalescent Epidemic Inference: Comparison of Stochastic and Deterministic SIR Population Dynamics


Alex Popinga, Tim Vaughan, Tanja Stadler, Alexei Drummond
Comments: Submitted
Subjects: Populations and Evolution (q-bio.PE)

Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingman's coalescent as well as from birth-death branching processes. The development of alternative approaches merits investigation of their characteristics and differences. Here we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent SIR tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the models' performance and contrast results obtained with a recently published birth-death sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. We also compare results of analyses using published HIV-1 sequence data obtained from known UK infection clusters. We show that the coalescent SIR model is effective at estimating epidemiological parameters from data with large fundamental reproductive number R0 and large population size S0. We find that the stochastic variant generally outperforms its deterministic counterpart. However, each of these Bayesian estimators is shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with R0 close to one or with small susceptible populations.
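To make the stochastic versus deterministic distinction concrete, the sketch below simulates the underlying SIR population dynamics both ways. It does not implement the coalescent tree priors or the Bayesian inference described in the abstract. The parameterization is an assumption: infections occur at rate beta * S * I and removals at rate gamma * I, so that R0 is roughly beta * S0 / gamma. The function names, the Euler step size, and the example values are hypothetical.

```python
import random

def sir_deterministic(beta, gamma, s0, i0, t_max, dt=0.01):
    """Forward-Euler integration of dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I."""
    s, i, r = float(s0), float(i0), 0.0
    traj = [(0.0, s, i, r)]
    for n in range(1, int(round(t_max / dt)) + 1):
        new_inf = beta * s * i * dt   # infections in this small time step
        new_rec = gamma * i * dt      # removals in this small time step
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        traj.append((n * dt, s, i, r))
    return traj

def sir_stochastic(beta, gamma, s0, i0, t_max, seed=None):
    """Gillespie simulation of the same model with integer population counts."""
    rng = random.Random(seed)
    t, s, i, r = 0.0, int(s0), int(i0), 0
    traj = [(t, s, i, r)]
    while i > 0:
        rate_inf = beta * s * i
        rate_rec = gamma * i
        total = rate_inf + rate_rec
        t += rng.expovariate(total)   # waiting time to the next event
        if t > t_max:
            break
        if rng.random() < rate_inf / total:
            s, i = s - 1, i + 1       # infection event
        else:
            i, r = i - 1, r + 1       # removal event
        traj.append((t, s, i, r))
    return traj

if __name__ == "__main__":
    # Hypothetical values giving R0 = beta * S0 / gamma of about 4.
    det = sir_deterministic(0.002, 0.5, 999, 1, t_max=10.0)
    sto = sir_stochastic(0.002, 0.5, 999, 1, t_max=10.0, seed=1)
    print("deterministic final (t, S, I, R):", det[-1])
    print("stochastic final (t, S, I, R):", sto[-1])
```

Starting from a single infected individual, a stochastic run can die out early even when R0 exceeds one, whereas the deterministic trajectory always takes off. That difference is most pronounced for R0 near one or for small susceptible populations, the regimes the abstract flags as problematic.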