Complexity of evolutionary equilibria in static fitness landscapes

Artem Kaznatcheev
(Submitted on 23 Aug 2013)

A fitness landscape is a genetic space (with two genotypes adjacent if they differ in a single locus) together with a fitness function. Evolutionary dynamics produce a flow on this landscape from lower fitness to higher, reaching equilibrium only if a local fitness peak is found. I use computational complexity to question the common assumption that evolution on static fitness landscapes can quickly reach a local fitness peak. I do this by showing that the popular NK model of rugged fitness landscapes is PLS-complete for K >= 2; the reduction from Weighted 2SAT is a bijection on adaptive walks, so there are NK fitness landscapes where every adaptive path from some vertices is of exponential length. Alternatively, under the standard complexity-theoretic assumption that there are problems in PLS not solvable in polynomial time, this means that there are no evolutionary dynamics (known, or to be discovered, and not necessarily following adaptive paths) that can converge to a local fitness peak on all NK landscapes with K = 2. Applying results from the analysis of simplex algorithms, I show that there exist single-peaked landscapes with no reciprocal sign epistasis where the expected length of an adaptive path following strong selection weak mutation dynamics is $e^{O(n^{1/3})}$ even though an adaptive path to the optimum of length less than n is available from every vertex. The technical results are written to be accessible to mathematical biologists without a computer science background, and the biological literature is summarized for the convenience of non-biologists, with the aim of opening a constructive dialogue between the two disciplines.
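As a minimal sketch of the objects the abstract discusses, the Python below builds a random NK landscape and follows one adaptive walk until it reaches a local fitness peak. The walk chooses uniformly among fitter single mutants. That is one simple adaptive dynamic and only a rough stand-in for strong selection weak mutation. The random interaction partners, the uniform fitness contributions, and all function names are illustrative assumptions, not the paper's constructions. The exponentially long walks in the paper come from specially constructed landscapes, not from random draws like this one.

```python
import random
from typing import Callable, List, Tuple

Genotype = Tuple[int, ...]


def make_nk_landscape(n: int, k: int, seed: int = 0) -> Callable[[Genotype], float]:
    """Return the fitness function of a random NK landscape (illustrative)."""
    rng = random.Random(seed)
    # Each locus i epistatically interacts with k other, randomly chosen loci.
    partners = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # One uniform random contribution per joint state of locus i and its partners.
    tables = [[rng.random() for _ in range(2 ** (k + 1))] for _ in range(n)]

    def fitness(g: Genotype) -> float:
        total = 0.0
        for i in range(n):
            index = g[i]
            for j in partners[i]:
                index = (index << 1) | g[j]  # encode the joint state as a bit string
            total += tables[i][index]
        return total / n

    return fitness


def one_mutant_neighbours(g: Genotype) -> List[Genotype]:
    """Genotypes that differ from g at exactly one locus."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]


def random_adaptive_walk(fitness: Callable[[Genotype], float], start: Genotype,
                         seed: int = 0) -> List[Genotype]:
    """Step to a uniformly chosen fitter single mutant until none remains."""
    rng = random.Random(seed)
    path = [start]
    while True:
        current = path[-1]
        fitter = [h for h in one_mutant_neighbours(current) if fitness(h) > fitness(current)]
        if not fitter:
            return path  # no fitter neighbour: a local fitness peak
        path.append(rng.choice(fitter))


if __name__ == "__main__":
    n, k = 12, 2
    f = make_nk_landscape(n, k, seed=1)
    walk = random_adaptive_walk(f, tuple([0] * n), seed=2)
    print(f"adaptive walk of {len(walk) - 1} steps ends at {walk[-1]}")
```

Because fitness strictly increases at every step and the genotype space is finite, the walk always terminates. The complexity question in the paper is *how many* steps it can take.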

Journal policy change: MBE will consider preprints

Molecular Biology and Evolution (MBE) has updated its policy to allow the submission of papers previously submitted to the arXiv:

All manuscripts published in arXiv are considered unpublished works. Manuscripts that appear on arXiv may be submitted to MBE for consideration for publication.

It is unclear whether these policies extend to other preprint sites, but presumably they do. It is great to see this policy change, and well done to Melissa Wilson Sayres, Antonio Marco [@amarcobio], and others for encouraging MBE to effect this change.

However, one less encouraging feature of this change is that MBE has also implemented a policy where preprints have to be cited as unpublished data in the text rather than as a citation appearing in the reference section. It is not yet clear how citation search engines, such as Google Scholar, will interact with this form of reference. Will they register, and count, them as citations, or will they go unnoticed?

One of the many appealing features of preprints is that they allow papers to be acknowledged and cited earlier. It is unclear why MBE feels that this policy is necessary, but in our view it seems counter-productive. Hopefully this is something that can be changed, given time and encouragement from MBE’s community.

Thoughts on MBE’s preprint citation policy

This guest post is by Graham Coop [@graham_coop] on the journal Molecular Biology and Evolution’s new preprint policy.

We had an interesting discussion on Twitter about the potential reasons for MBE’s policy of not allowing full citation of preprint articles. I thought I’d write up some of the arguments that we discussed, and my thoughts on them, as shaped by that conversation.

We do not know MBE’s reasoning on this, so I may have missed some obvious practical reason for this citation policy (if so, it would be great if it could be explained). I also note that other journals may well have similar policies about preprint citations, so this is not an argument specifically against MBE. It is great that MBE is now allowing preprints, so this is a somewhat minor quibble compared to that step.

One of my main reasons for disliking this policy, other than its singling out preprints for special treatment, is that it may well disrupt how preprints accumulate citations (via tools like Google Scholar). I view one of the key advantages of preprints as being that they allow the early recognition and acknowledgement of good ideas (with bad ones being allowed to sink out of view). This is particularly important for young researchers on the job market, for whom preprints can potentially provide an escape from some of the randomness in how long the publication process takes. Allowing young scholars to have their work critiqued, and cited, early seems to me an important step in giving young researchers a head start in an increasingly difficult job market.

Potential arguments against treating preprint citations like any other citation:
1) Allowing full citation of preprints may lose the journal (or the authors) citations.

It is slightly hard to see the logic of (1). If I cite a preprint that has yet to appear in a journal, then by its very nature the journal couldn’t possibly have benefited from that citation. I’m hardly going to delay my own submission or publication to wait for a paper to appear merely so I can cite it (unless I have some prior commitment to a colleague). The same argument seems to hold for the author: citations of the preprint are citations that you would not have received if you had not distributed the article early. Now, a fair concern is that journals and authors may lose citations of the published article if, after the article appears, people accidentally cite the arXived paper instead of the final article. However, MBE’s system doesn’t avoid this problem, and it seems like it could be addressed simply by asking the authors to do a PubMed search for each arXived paper to avoid this oversight.

2) Another potential concern is that preprints are, by their nature, subject to change.

Preprints can be updated, so that information contained in them could change, or even be removed. However, preprint sites like arXiv (as well as PeerJ and figshare) keep all previous versions of the paper, and these are clearly labeled and can be cited separately. So I can clearly indicate which version I am citing, and this citation is a permanent entry. While this information may have changed in subsequent versions, this is really no different from the fact that subsequent publications can overturn existing results. What is different with versioning of preprints is that we get to see more of this process in the open, which feels like a good thing overall.

3) Authors should acknowledge that arXived preprints have not been through peer review.

At first sight there is more validity to this point, but I think it is also weak. As an author, and as a reviewer (and indeed as a reader), you have a responsibility to question whether a citation really supports a particular point. As an author I invest a lot of time in trying to track down the right citations and to carefully read, and test, the papers I rely heavily on. As a reviewer I regularly question authors’ use of particular citations and point them toward additional work or ask them to change the wording around a citation. Published papers are not immune from problems, any more than preprints are. If I, and the reviewers of my article, think it is appropriate for me to cite a preprint, then I should be allowed to do so as I would any other article.

Also, this argument seems somewhat strange: MBE already allows the normal citation of PhD theses and [potentially non-peer-reviewed] books (as pointed out by Antonio Marco). So it is really quite unclear why preprints have been singled out in this way.

All of my articles have benefited greatly from the comments of colleagues and from peer review. I also have a lot of respect for the work done by editors of various journals, including MBE. However, it is unclear to me whom this policy serves. Journal policies should always be applied with a light hand; they should ideally allow the authors freedom to fully acknowledge their sources. I see no strong argument for this policy other than that it prevents the further blurring of the line between journals and preprints. In my view the only sustainable way forward for journals and scientific societies is to be innovative focal points for collating peer-review and peer-recognition. Only by adapting quickly can journals hope to stay relevant in an age where increasingly (to steal Mike Eisen’s phrase) publishing is pushing a button.

Graham Coop

Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model

Denise Kühnert, Tanja Stadler, Timothy G. Vaughan, Alexei J. Drummond
(Submitted on 23 Aug 2013)

Evolution of RNA viruses such as HIV, Hepatitis C and Influenza virus occurs so rapidly that the viruses’ genomes contain information on past ecological dynamics. The interaction of ecological and evolutionary processes demands their joint analysis. Here we adapt a birth-death-sampling model, which allows for serially sampled data and rate changes over time, to estimate epidemiological parameters of the underlying population dynamics in terms of a compartmental susceptible-infected-removed (SIR) model. Our proposed approach results in a phylodynamic method that enables the joint estimation of epidemiological parameters and phylogenetic history. In contrast to standard coalescent process approaches, this method provides separate information on incidence and prevalence of infections. Detailed information on the interaction of host population dynamics and evolutionary history can inform decisions on how to contain or entirely avoid disease outbreaks.
We apply our Birth-Death SIR method (BDSIR) to five human immunodeficiency virus type 1 clusters sampled in the United Kingdom (UK) between 1999 and 2003. The estimated basic reproduction ratio ranges from 1.9 to 3.2 among the clusters. Our results imply that these local epidemics arose from introduction of infected individuals into populations of between 900 and 3000 effectively susceptible individuals, albeit with wide margins of uncertainty. All clusters show a decline in the growth rate of the local epidemic in the middle or end of the 1990s. The effective reproduction ratio of cluster 1 drops below one around 1994, with the local epidemic having almost run its course by the end of the sampled period. For the other four clusters the effective reproduction ratio also decreases over time, but stays above 1. The method is implemented as a BEAST2 package.
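The link between the basic and effective reproduction ratios can be seen in a deterministic SIR model: with transmission rate beta and removal rate delta, R0 = beta * S0 / delta, and the effective ratio R_e(t) = beta * S(t) / delta falls as susceptibles are depleted. The sketch below is only that textbook ODE, integrated with a simple Euler scheme. It is not the stochastic birth-death-sampling model that BDSIR fits in BEAST2, and the parameter values in the usage lines are hypothetical.

```python
from typing import List, Tuple


def sir_effective_r(beta: float, delta: float, s0: float, i0: float,
                    t_max: float, dt: float = 0.01) -> List[Tuple[float, float]]:
    """Euler-integrate dS/dt = -beta*S*I, dI/dt = beta*S*I - delta*I.

    Returns (t, R_e(t)) pairs, where R_e(t) = beta * S(t) / delta.
    """
    s, i, t = s0, i0, 0.0
    trajectory = [(t, beta * s / delta)]
    for _ in range(int(round(t_max / dt))):
        new_infections = beta * s * i * dt
        removals = delta * i * dt
        s -= new_infections
        i += new_infections - removals
        t += dt
        trajectory.append((t, beta * s / delta))
    return trajectory


if __name__ == "__main__":
    # Hypothetical values: R0 = 2.5 in a population of 1000 susceptibles.
    traj = sir_effective_r(beta=2.5 / 1000, delta=1.0, s0=1000.0, i0=1.0, t_max=30.0)
    crossing = next(t for t, r in traj if r < 1.0)
    print(f"R_e starts at {traj[0][1]:.2f} and falls below 1 at t ~ {crossing:.2f}")
```

In this model R_e crosses 1 exactly when S(t) drops to S0 / R0, which is why an estimate of the effectively susceptible population size goes hand in hand with the trajectory of the effective reproduction ratio.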

Genome wide signals of pervasive positive selection in human evolution

David Enard, Philipp W. Messer, Dmitri Petrov
(Submitted on 22 Aug 2013)

The role of positive selection in human evolution remains controversial. On the one hand, scans for positive selection have identified hundreds of candidate loci and the genome-wide patterns of polymorphism show signatures consistent with frequent positive selection. On the other hand, recent studies have argued that many of the candidate loci are false positives and that most apparent genome-wide signatures of adaptation are in fact due to reduction of neutral diversity by linked recurrent deleterious mutations, known as background selection. Here we analyze human polymorphism data from the 1,000 Genomes project (Abecasis et al. 2012) and detect signatures of pervasive positive selection once we correct for the effects of background selection. We show that levels of neutral polymorphism are lower near amino acid substitutions, with the strongest reduction observed specifically near functionally consequential amino acid substitutions. Furthermore, amino acid substitutions are associated with signatures of recent adaptation that should not be generated by background selection, such as the presence of unusually long and frequent haplotypes and specific distortions in the site frequency spectrum. We use forward simulations to show that the observed signatures require a high rate of strongly adaptive substitutions in the vicinity of the amino acid changes. We further demonstrate that the observed signatures of positive selection correlate more strongly with the presence of regulatory sequences, as predicted by ENCODE (Gerstein et al. 2012), than the positions of amino acid substitutions. Our results establish that adaptation was frequent in human evolution and provide support for the hypothesis of King and Wilson (King and Wilson 1975) that adaptive divergence is primarily driven by regulatory changes.

The standing pool of genomic structural variation in a natural population of Mimulus guttatus

Lex E. Flagel, John H. Willis, Todd J. Vision
(Submitted on 19 Aug 2013)

Major unresolved questions in evolutionary genetics include determining the contributions of different mutational sources to the total pool of genetic variation in a species, and understanding how these different forms of genetic variation interact with natural selection. Recent work has shown that structural variants (insertions, deletions, inversions and transpositions) are a major source of genetic variation, often outnumbering single nucleotide variants in terms of total bases affected. Despite the near ubiquity of structural variants, major questions about their interaction with natural selection remain. For example, how does the allele frequency spectrum of structural variants differ when compared to single nucleotide variants? How often do structural variants affect genes, and what are the consequences? To begin to address these questions, we have systematically identified and characterized a large set of submicroscopic insertion and deletion (indel) variants (between 1 kb and 200 kb in length) among ten individuals from a single natural population of the plant species Mimulus guttatus. After extensive computational filtering, we focused on a set of 4,142 high-confidence indels that showed an experimental validation rate of 73%. All but one of these indels were < 200 kb. While the largest were generally at lower frequencies in the population, a surprising number of large indels are at intermediate frequencies. While indels overlapping with genes were much rarer than expected by chance, nearly 600 genes were affected by an indel. NBS-LRR defense response genes were the most enriched among the gene families affected. Most indels associated with genes were rare and appeared to be under purifying selection, though we do find four high-frequency derived insertion alleles that show signatures of recent positive selection.
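The abstract asks how the allele frequency spectrum of structural variants compares with that of single nucleotide variants. As a minimal illustration (not the authors' pipeline), assume each variant has already been summarized by its count of derived alleles among n sampled chromosomes; the unfolded and folded spectra are then simple tallies. The folded version uses minor-allele counts and is the one to use when the ancestral state is unknown. Function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def unfolded_sfs(derived_counts: Sequence[int], n: int) -> List[int]:
    """sfs[i - 1] = number of variants with i derived copies, for 1 <= i <= n - 1.

    Monomorphic sites (0 or n derived copies) are ignored.
    """
    tally = Counter(derived_counts)
    return [tally.get(i, 0) for i in range(1, n)]


def folded_sfs(derived_counts: Sequence[int], n: int) -> List[int]:
    """Spectrum of minor-allele counts, for 1 <= i <= n // 2."""
    tally = Counter(min(c, n - c) for c in derived_counts)
    return [tally.get(i, 0) for i in range(1, n // 2 + 1)]


if __name__ == "__main__":
    counts = [1, 1, 2, 5, 9]  # hypothetical derived-allele counts, n = 10 chromosomes
    print(unfolded_sfs(counts, 10))
    print(folded_sfs(counts, 10))
```

Comparing such spectra between indels and single nucleotide variants (for example, an excess of rare classes) is one way to look for the purifying selection the abstract reports.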

Gene and Gene-Set Analysis for Genome-Wide Association Studies

Inti Pedroso
(Submitted on 19 Aug 2013)

Genome-wide association studies (GWAS) have identified hundreds of loci at very stringent levels of statistical significance across many different human traits. However, it is now clear that very large samples (n ~ 10^4 to 10^5) are needed to find the majority of genetic variants underlying risk for most human diseases. Therefore, the field has engaged in a race to increase study sample sizes, with some studies yielding very successful results and others providing little or no new insight. This project started early on in this new wave of studies, and I decided to use an alternative approach that uses prior biological knowledge to improve both the interpretation and the power of GWAS. The project aimed to a) implement and develop new gene-based methods to derive gene-level statistics, so that GWAS can be used in well-established systems biology tools; b) use these gene-level statistics in network and gene-set analyses of GWAS data; c) mine GWAS of neuropsychiatric disorders using gene, gene-set and integrative biology analyses with gene-expression studies; and d) explore the ability of these methods to improve the analysis of GWAS on disease sub-phenotypes, which usually suffer from very small sample sizes.
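To make "gene-level statistic" concrete, here is one of the simplest possible ways to collapse SNP p-values within a gene into a single gene p-value: Fisher's combination, whose statistic -2 * sum(ln p) follows a chi-square distribution with 2k degrees of freedom under the null. This is an illustrative swap-in, not the method developed in the thesis. It assumes the SNPs are independent, which linkage disequilibrium usually violates, so practical gene-based tests typically correct for LD (for example via permutation or simulation). The function name is hypothetical.

```python
import math
from typing import Sequence


def fisher_gene_pvalue(snp_pvalues: Sequence[float]) -> float:
    """Combine k SNP p-values (each in (0, 1]) with Fisher's method.

    Assumes independent SNPs. Uses the closed-form chi-square survival
    function for even degrees of freedom 2k:
        P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    """
    if not snp_pvalues:
        raise ValueError("need at least one SNP p-value")
    stat = -2.0 * sum(math.log(p) for p in snp_pvalues)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, len(snp_pvalues)):
        term *= half / j  # (x/2)^j / j!, built incrementally
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical SNP p-values for the SNPs mapped to one gene.
    print(fisher_gene_pvalue([0.01, 0.20, 0.65]))
```

A gene p-value of this kind is what can then be fed into network or gene-set analyses in place of the original SNP-level results.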

Sailfish: Alignment-free Isoform Quantification from RNA-seq Reads using Lightweight Algorithms

Rob Patro (1), Stephen M. Mount (2), Carl Kingsford (1) ((1) Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, (2) Department of Cell Biology and Molecular Genetics and Center for Bioinformatics and Computational Biology, University of Maryland)
(Submitted on 16 Aug 2013)

RNA-seq has rapidly become the de facto technique to measure gene expression. However, the time required for analysis has not kept up with the pace of data generation. Here we introduce Sailfish, a novel computational method for quantifying the abundance of previously annotated RNA isoforms from RNA-seq data. Sailfish entirely avoids mapping reads, which is a time-consuming step in all current methods. Sailfish provides quantification estimates much faster than existing approaches (typically 20-times faster) without loss of accuracy.
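The key idea in the abstract is to skip read mapping altogether. A rough sketch of alignment-free quantification, under loudly stated simplifications: exact k-mer matches only, no sequencing errors or bias correction, and none of Sailfish's actual data structures or length adjustments. It indexes the k-mers of each annotated transcript, counts k-mers in the reads, and runs an expectation-maximization (EM) loop that splits each shared k-mer's count among the transcripts containing it. This illustrates the general technique, not Sailfish's implementation, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def kmers(seq: str, k: int) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_em(transcripts: Dict[str, str], reads: List[str], k: int,
            iterations: int = 100) -> Dict[str, float]:
    """Estimate the fraction of observed k-mers originating from each transcript."""
    index = defaultdict(set)  # k-mer -> names of transcripts containing it
    n_kmers = {}
    for name, seq in transcripts.items():
        ks = kmers(seq, k)
        n_kmers[name] = len(ks)
        for km in ks:
            index[km].add(name)
    # Count read k-mers, keeping only those present in some transcript.
    counts = Counter(km for r in reads for km in kmers(r, k) if km in index)
    if not counts:
        raise ValueError("no read k-mer matches any transcript")
    mu = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(iterations):
        assigned = {t: 0.0 for t in transcripts}
        for km, c in counts.items():
            # E-step: split the count by each transcript's per-k-mer rate.
            weights = {t: mu[t] / n_kmers[t] for t in index[km]}
            z = sum(weights.values())
            for t, w in weights.items():
                assigned[t] += c * w / z
        # M-step: renormalize the expected counts into fractions.
        total = sum(assigned.values())
        mu = {t: a / total for t, a in assigned.items()}
    return mu


if __name__ == "__main__":
    tx = {"A": "ACGTACGGTT", "B": "TTGCAAGTCC"}  # toy transcripts
    print(kmer_em(tx, ["ACGTAC", "CGGTT", "GCAAGT"], k=4))
```

The per-k-mer rate mu[t] / n_kmers[t] is a simplifying assumption that treats every k-mer position in a transcript as equally likely; it does not handle a k-mer repeated within one transcript.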

Realistic simulations reveal extensive sample-specificity of RNA-seq biases

Botond Sipos, Greg Slodkowicz, Tim Massingham, Nick Goldman
(Submitted on 14 Aug 2013)

In line with the importance of RNA-seq, the bioinformatics community has produced numerous data analysis tools incorporating methods to correct sample-specific biases. However, few advanced simulation tools exist to enable benchmarking of competing correction methods. We introduce the first framework to reproduce the properties of individual RNA-seq runs and, by applying it to several datasets, we demonstrate the importance of accounting for sample-specificity in realistic simulations.

On the sympatric evolution of coexistence by relative nonlinearity of competition

Florian Hartig, Tamara Münkemüller, Karin Johst, Ulf Dieckmann
(Submitted on 14 Aug 2013)

If two species show different nonlinear responses to a single shared resource, and if each species modifies resource dynamics such that it favors its competitor, they may stably coexist. While the mechanism behind this phenomenon, known as relative nonlinearity of competition, is well understood, less is known about its evolutionary properties and its prevalence in real communities. We address this challenge by using the adaptive dynamics framework as well as individual-based simulations to compare dynamic and evolutionary stability of communities coexisting through relative nonlinearity. Evolution operates on the species’ density compensation strategies, and a trade-off between growth at high versus low resource availability (population density) is assumed. We confirm previous findings that, irrespective of the particular model of density-dependence, there are usually broad ranges of coexistence between overcompensating and undercompensating density-compensation strategies. We show that most of these strategies, however, are not evolutionarily stable and will be outcompeted by a single compensatory strategy. Only very specific evolutionary trade-offs allow evolutionary stability of strategies that coexist through relative nonlinearity. As we find no reason why these particular trade-offs should be abundant in nature, we conclude that sympatric evolution of relative nonlinearity seems possible, but rather unlikely. We speculate that this may explain why relative nonlinearity has seldom been observed, although we note that a low probability of sympatric evolution does not exclude the possibility that this mechanism of coexistence might still frequently occur when species with different evolutionary histories meet in the same community. Our results highlight the need for combining ecological and evolutionary perspectives for understanding community assembly and biogeographical patterns.
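To make "density compensation strategy" concrete, here is a hypothetical two-species map in which both species respond to the same total density. It assumes a Maynard Smith-Slatkin growth function, N_{t+1} = lambda * N_t / (1 + (a * N)^b), a standard choice in which the exponent b sets undercompensation (b < 1), exact compensation (b = 1) or overcompensation (b > 1). The paper's own models and trade-off functions may differ. Relative nonlinearity arises because the two growth curves differ in curvature, so fluctuations in total density (for example those generated by an overcompensating species) change the species' average growth rates by different amounts.

```python
from typing import List, Sequence, Tuple


def msm_competition(lam: Sequence[float], a: Sequence[float], b: Sequence[float],
                    n0: Tuple[float, float], steps: int) -> List[Tuple[float, float]]:
    """Iterate two species whose growth depends on total density N = n1 + n2.

    Assumed Maynard Smith-Slatkin form for species i:
        n_i(t+1) = n_i(t) * lam_i / (1 + (a_i * N(t)) ** b_i)
    """
    n1, n2 = n0
    traj = [(n1, n2)]
    for _ in range(steps):
        total = n1 + n2
        g1 = lam[0] / (1.0 + (a[0] * total) ** b[0])
        g2 = lam[1] / (1.0 + (a[1] * total) ** b[1])
        n1, n2 = n1 * g1, n2 * g2
        traj.append((n1, n2))
    return traj


if __name__ == "__main__":
    # Hypothetical parameters: an overcompensating species (b = 6)
    # competing with a compensatory one (b = 1).
    traj = msm_competition(lam=(20.0, 2.0), a=(0.01, 0.01), b=(6.0, 1.0),
                           n0=(10.0, 10.0), steps=2000)
    for n1, n2 in traj[-5:]:
        print(f"{n1:10.3f} {n2:10.3f}")
```

With b = 1 and a single species this reduces to the Beverton-Holt map, which settles at the equilibrium (lambda - 1) / a; the evolutionary question in the paper is whether strategies other than this compensatory one can persist once b itself is allowed to evolve.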