A tug-of-war between driver and passenger mutations in cancer and other adaptive processes

Christopher McFarland, Leonid Mirny, Kirill S. Korolev
(Submitted on 25 Feb 2014)

Cancer progression is an example of a rapid adaptive process where evolving new traits is essential for survival and requires a high mutation rate. Precancerous cells acquire a few key mutations that drive rapid population growth and carcinogenesis. Cancer genomics demonstrates that these few ‘driver’ mutations occur alongside thousands of random ‘passenger’ mutations, a natural consequence of cancer’s elevated mutation rate. Some passengers can be deleterious to cancer cells, yet have been largely ignored in cancer research. In population genetics, however, the accumulation of mildly deleterious mutations has been shown to cause population meltdown. Here we develop a stochastic population model where beneficial drivers engage in a tug-of-war with frequent mildly deleterious passengers. These passengers present a barrier to cancer progression that is described by a critical population size, below which most lesions fail to progress, and a critical mutation rate, above which cancers melt down. We find support for the model in cancer age-incidence and cancer genomics data that also allow us to estimate the fitness advantage of drivers and fitness costs of passengers. We identify two regimes of adaptive evolutionary dynamics and use these regimes to rationalize successes and failures of different treatment strategies. We find that a tumor’s load of deleterious passengers can explain previously paradoxical treatment outcomes and suggest that it could potentially serve as a biomarker of response to mutagenic therapies. The collective deleterious effect of passengers is currently an unexploited therapeutic target. We discuss how their effects might be exacerbated by both current and future therapies.
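
To make the tug-of-war idea concrete, here is a minimal sketch of a stochastic driver/passenger population model. It is not the authors' model: the multiplicative fitness form, the crowding rule, the function names, and all parameter values (s_d, s_p, mu_d, mu_p, capacity) are assumptions chosen for illustration only.

```python
import random


def fitness(n_drivers, n_passengers, s_d=0.1, s_p=0.001):
    """Multiplicative fitness (assumed form): each driver multiplies it by
    (1 + s_d), each passenger divides it by (1 + s_p)."""
    return (1 + s_d) ** n_drivers / (1 + s_p) ** n_passengers


def step(cells, capacity, mu_d, mu_p, rng, s_d=0.1, s_p=0.001):
    """Advance one generation. Each cell either divides into two daughters
    or dies; crowding (N / capacity) lowers the chance of dividing."""
    crowding = len(cells) / capacity
    next_cells = []
    for n_d, n_p in cells:
        w = fitness(n_d, n_p, s_d, s_p)
        if rng.random() < w / (w + crowding):
            for _ in range(2):
                # Each daughter may gain one new driver and/or one passenger.
                d = n_d + int(rng.random() < mu_d)
                p = n_p + int(rng.random() < mu_p)
                next_cells.append((d, p))
    return next_cells


def simulate(n0, capacity, mu_d, mu_p, generations, seed=0, max_cells=100_000):
    """Return the population size after each generation, stopping early
    on extinction or once the population exceeds max_cells."""
    rng = random.Random(seed)
    cells = [(0, 0)] * n0
    sizes = [n0]
    for _ in range(generations):
        cells = step(cells, capacity, mu_d, mu_p, rng)
        sizes.append(len(cells))
        if not cells or len(cells) > max_cells:
            break
    return sizes
```

Calling `simulate(n0=100, capacity=100, mu_d=1e-4, mu_p=1e-2, generations=200)` with different seeds, starting sizes, and mutation rates gives a rough feel for how small populations and high passenger rates can stall growth in a toy model of this kind.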

Implications of uniformly distributed, empirically informed priors for phylogeographical model selection: A reply to Hickerson et al

Jamie R. Oaks, Charles W. Linkem, Jeet Sukumaran
(Submitted on 26 Feb 2014)

Biogeographers often seek to explain speciation in terms of geographical phenomena. Establishing that a set of population splitting events occurred at the same time can be a persuasive argument that a set of taxa were affected by the same geographic events. Huang et al. (2011) introduced an approximate Bayesian approach (implemented in the software msBayes) to estimate the probabilities of models in which multiple sets of taxa diverge simultaneously. Oaks et al. (2013) used this model-choice framework to study 22 pairs of vertebrates distributed across the Philippines; they also studied the behavior of the approach using simulations. Oaks et al. (2013) found the model was very sensitive to the prior and had low power to detect variation in divergence times. This was not surprising in light of a rich statistical literature showing the marginal likelihood of a model is sensitive to vague priors. Because this sensitivity to prior assumptions affects the crucial insights a researcher who employs msBayes seeks to gain, Oaks et al. (2013) recommended that users of the approach carefully assess the robustness of their conclusions to different priors. According to Hickerson et al. (2014), the lack of robustness was due to broad priors leading to inadequate numbers of simulations. They proposed a model-averaging approach using narrow, empirically informed uniform priors. Here, we demonstrate their approach is dangerous in the sense that the empirically derived priors often exclude the true values of the parameters. We question the value of adopting an empirical-Bayesian stance for this problem, because it can yield misleading model posterior probabilities. The robust approach of conducting analyses under a variety of priors can reveal sensitivity and communicate assumptions underlying inference. Furthermore, simulations provide insight into the temporal resolution of the method and guide interpretation of results.

Approaching allelic probabilities and Genome-Wide Association Studies from beta distributions

José Santiago García-Cremades, Angel del Río, José A. García, Javier Gayán, Antonio González-Pérez, Agustín Ruiz, O. Sotolongo-Grau, Manuel Ruiz-Marín
(Submitted on 25 Feb 2014)

In this paper we propose a model for the distribution of allelic probabilities for generating populations as reliably as possible. Our objective was to develop a model that allows simulating allelic probabilities with different observed truncation and degree of noise. In addition, we introduce a completely new approach to analyzing a genome-wide association study (GWAS) dataset, starting from a new test of association with a statistical distribution and two effect sizes for each genotype. The new methodological approach was applied to a real data set together with a Monte Carlo experiment, which showed the power performance of our new method. Finally, we compared the new method based on the beta distribution with the conventional method (based on the Chi-Squared distribution) using the Kappa index of agreement and a principal component analysis (PCA). Both analyses showed differences between the two approaches in selecting the single nucleotide polymorphisms (SNPs) in association.

Neanderthals had our de novo genes.

John Stewart Taylor

In 2009 Knowles and McLysaght reported the discovery of three human genes derived from non-coding DNA. They provided evidence that these genes, CLUU1, C22orf45, and DNAH10OS, were transcribed and translated, they identified orthologous non-coding DNA in chimpanzee (Pan troglodytes) and macaque (Macaca mulatta), and for each gene they located the critical “enabler” mutations that extended the open reading frames (ORFs) allowing the production of a protein. These genes had no BLASTp hits in any other genome and were considered to be novel human genes, possibly responsible for human-specific traits. Since the discovery of these genes, new high quality Denisovan and Neanderthal genomes have been reported. I used these resources in an effort to determine whether or not CLUU1, C22orf45, and DNAH10OS were truly human-specific.

Genetic drift suppresses bacterial conjugation in spatially structured populations

Peter D. Freese, Kirill S. Korolev, Jose I. Jimenez, Irene A. Chen
(Submitted on 24 Feb 2014)

Conjugation is the primary mechanism of horizontal gene transfer that spreads antibiotic resistance among bacteria. Although conjugation normally occurs in surface-associated growth (e.g., biofilms), it has been traditionally studied in well-mixed liquid cultures lacking spatial structure, which is known to affect many evolutionary and ecological processes. Here we visualize spatial patterns of gene transfer mediated by F plasmid conjugation in a colony of Escherichia coli growing on solid agar, and we develop a quantitative understanding by spatial extension of traditional mass-action models. We found that spatial structure suppresses conjugation in surface-associated growth because strong genetic drift leads to spatial isolation of donor and recipient cells, restricting conjugation to rare boundaries between donor and recipient strains. These results suggest that ecological strategies, such as enforcement of spatial structure and enhancement of genetic drift, could complement molecular strategies in slowing the spread of antibiotic resistance genes.

An Improved Approximate-Bayesian Model-choice Method for Estimating Shared Evolutionary History

Jamie R. Oaks
(Submitted on 25 Feb 2014)

To understand the processes that generate biodiversity, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergence times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. The results demonstrate the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency of the model to incorrectly estimate biogeographically interesting models with strong support.
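
A Dirichlet-process prior over divergence models can be pictured through its Chinese restaurant process representation, in which each taxon either joins an existing divergence event or starts a new one. The sketch below illustrates that general construction, not the implementation described in the abstract; the function names and the concentration parameter `alpha` are assumptions made for the example.

```python
import random


def sample_divergence_model(n_taxa, alpha, rng):
    """Draw one partition of taxa into divergence events from a Chinese
    restaurant process with concentration parameter alpha (> 0).
    Returns a list giving the event index assigned to each taxon."""
    assignments = []
    counts = []  # counts[k] = number of taxa already in event k
    for i in range(n_taxa):
        # Taxon i joins event k with probability counts[k] / (i + alpha)
        # and starts a new event with probability alpha / (i + alpha).
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        chosen = len(counts)  # default: open a new event
        for k, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                chosen = k
                break
        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        assignments.append(chosen)
    return assignments


def expected_num_events(n_taxa, alpha):
    """Prior expectation of the number of distinct divergence events."""
    return sum(alpha / (alpha + i) for i in range(n_taxa))
```

Small values of `alpha` favor a single shared divergence, while large values favor every taxon diverging independently; intermediate values spread prior mass across models with intermediate numbers of events.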

Genetic drift opposes mutualism during spatial population expansion

Melanie JI Muller, Beverly I Neugeboren, David R Nelson, Andrew W Murray
(Submitted on 24 Feb 2014)

Mutualistic interactions benefit both partners, promoting coexistence and genetic diversity. Spatial structure can promote cooperation, but spatial expansions may also make it hard for mutualistic partners to stay together, since genetic drift at the expansion front creates regions of low genetic and species diversity. To explore the antagonism between mutualism and genetic drift, we grew cross-feeding strains of the budding yeast S. cerevisiae on agar surfaces as a model for mutualists undergoing spatial expansions. By supplying varying amounts of the exchanged nutrients, we tuned strength and symmetry of the mutualistic interaction. Strong mutualism suppresses genetic demixing during spatial expansions and thereby maintains diversity, but weak or asymmetric mutualism is overwhelmed by genetic drift even when mutualism is still beneficial, slowing growth and reducing diversity. Theoretical modeling using experimentally measured parameters predicts the size of demixed regions and how strong mutualism must be to survive a spatial expansion.

Strong selective sweeps associated with ampliconic regions in great ape X chromosomes

Kiwoong Nam, Kasper Munch, Asger Hobolth, Julien Y. Dutheil, Krishna Veeramah, August Woerner, Michael F. Hammer, Great Ape Genome Diversity Project, Thomas Mailund, Mikkel H. Schierup
(Submitted on 24 Feb 2014)

The unique inheritance pattern of X chromosomes makes them preferential targets of adaptive evolution. We here investigate natural selection on the X chromosome in all species of great apes. We find that diversity is more strongly reduced around genes on the X compared with autosomes, and that a higher proportion of substitutions results from positive selection. Strikingly, the X exhibits several megabase long regions where diversity is reduced more than five fold. These regions overlap significantly among species, and have a higher singleton proportion, population differentiation, and nonsynonymous to synonymous substitution ratio. We rule out background selection and soft selective sweeps as explanations for these observations, and conclude that several strong selective sweeps have occurred independently in similar regions in several species. Since these regions are strongly associated with ampliconic sequences we propose that intra-genomic conflict between the X and the Y chromosomes is a major driver of X chromosome evolution.

Author post: Genome scans for detecting footprints of local adaptation using a Bayesian factor model

This guest post is by Michael Blum, Eric Bazin, and Nicolas Duforet-Frebourg on their preprint Genome scans for detecting footprints of local adaptation using a Bayesian factor model, available from the arXiv here.

Finding genomic regions subject to local adaptation is a central part of population genomics, which is based on genotyping numerous molecular markers and looking for outlier loci. The most common approaches use measures of genetic differentiation such as Fst. Many software packages implement genome scans based on statistics related to Fst (BayeScan, DetSel, FDist2, Lositan), and they contribute to the popularity of this approach in population genomics.
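
As a rough illustration of the outlier idea, the sketch below computes a simple heterozygosity-based Fst for a biallelic locus and ranks loci by it. It assumes equally weighted populations and known allele frequencies, and the function names are hypothetical; the packages listed above use more elaborate, model-based statistics.

```python
def fst(freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic locus, given the allele
    frequency in each population (populations weighted equally)."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # locus is monomorphic overall, so no differentiation
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


def top_outliers(locus_freqs, n_top):
    """Return the indices of the n_top loci with the largest Fst."""
    ranked = sorted(range(len(locus_freqs)),
                    key=lambda i: fst(locus_freqs[i]),
                    reverse=True)
    return ranked[:n_top]
```

For example, `top_outliers([[0.5, 0.5], [0.1, 0.9], [0.4, 0.6]], 1)` picks out the second locus, whose frequencies differ most between the two populations.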

However, several statistical and computational problems may arise with approaches based on Fst or related measures. The first problem arises because methods related to Fst assume the so-called F-model, which corresponds to a particular covariance structure for gene frequencies among populations (Bierne et al. 2013). When spatial structure departs from the assumption of the F-model, it can generate many false positives. A second potential problem concerns the computational burden of some Bayesian approaches, which can become an obstacle with a large number of SNPs. The last problem is that individuals must be grouped into populations in advance, whereas working at the scale of individuals is desirable because it avoids defining populations.

Using a Bayesian factor model, we address the three aforementioned problems. Factor models capture population structure by inferring latent variables called factors. Factor models have already been proposed to ascertain population structure (Engelhardt and Stephens 2010). Here we extend the factor model framework to identify outlier loci in addition to ascertaining population structure. Our approach is not the first to account for deviations from the assumptions of the F-model (Bonhomme et al. 2010, Günther and Coop 2013) but, unlike previous approaches, it does not require populations to be defined. Using simulations, we show that the factor model can achieve a 2-fold or greater reduction in false discovery rate compared to the Fst-related approaches. We also analyze the HGDP human dataset to provide an example of how factor models can be used to detect local adaptation with a large number of SNPs. The Bayesian factor model is implemented in the PCAdapt software and we would be happy to answer comments or questions regarding the software.

To explain why the factor model generates fewer false discoveries, we can introduce the notions of mechanistic and phenomenological models. Mechanistic models aim to mimic the biological processes that are thought to have given rise to the data whereas phenomenological models seek only to best describe the data using a statistical model. In the spectrum between mechanistic and phenomenological models, the F-model would stand close to mechanistic models whereas factor models would be closer to the phenomenological ones. Mechanistic models are appealing because they provide quantitative measures that can be related to biologically meaningful parameters. For instance, the parameters of the F-model measure genetic drift, which can be related to migration rates, divergence times or population sizes. By contrast, phenomenological models work with mathematical abstractions such as latent factors that can be difficult to interpret biologically. The downside of mechanistic models is that violation of the modeling assumptions can invalidate the proposed framework and generate many false discoveries in the context of selection scans. The F-model assumes a particular covariance matrix between populations, which is found with star-like population trees for instance. However, more complex models of population structure can arise for various reasons including non-instantaneous divergence or isolation-by-distance, and they will violate the mechanistic assumptions and make phenomenological models preferable.

Michael Blum, Eric Bazin, and Nicolas Duforet-Frebourg

Genetic Analysis of Transformed Phenotypes

Nicolo Fusi, Christoph Lippert, Neil D. Lawrence, Oliver Stegle
(Submitted on 21 Feb 2014)

Linear mixed models (LMMs) are a powerful and established tool for studying the genetics of phenotypic variation. A limiting assumption of LMMs is that the phenotype is Gaussian distributed under the model, a requirement that rarely holds in practice. Since violations of this assumption can lead to false conclusions and losses in power, it’s common practice to pre-process the phenotypic values, for instance by applying logarithmic transformations. Unfortunately, these are not appropriate in every situation, and choosing a “good” transformation is in general challenging and subjective. Here, we present an extension of the LMM that estimates an optimal transformation from the data. We show in extensive simulations and real data from human, mouse and yeast that application of these optimal transformations leads to increased power in genome-wide association studies and higher accuracy in heritability estimates and phenotype predictions.
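
To illustrate what estimating a transformation from the data can look like in the simplest setting, the sketch below picks a Box-Cox exponent by maximizing a Gaussian profile log-likelihood over a grid. This is a swapped-in, textbook technique without any mixed-model component, not the method of the paper; the function names and the grid are assumptions, and the phenotype values must be strictly positive and not all equal.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform; lam = 0 gives the logarithmic transformation."""
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]


def profile_loglik(y, lam):
    """Gaussian profile log-likelihood of lam (up to an additive constant),
    including the Jacobian term of the transformation."""
    z = box_cox(y, lam)
    n = len(y)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in y)


def best_lambda(y, grid):
    """Return the grid value of lam with the highest profile likelihood."""
    return max(grid, key=lambda lam: profile_loglik(y, lam))
```

With phenotype values that are exactly log-symmetric, such as `[math.exp(x) for x in (-2, -1, 0, 1, 2)]`, a grid search over `[-1.0, -0.5, 0.0, 0.5, 1.0]` selects the log transform (`lam = 0`).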