Fluctuating selection models and McDonald-Kreitman type analyses

Toni I. Gossmann, David Waxman, Adam Eyre-Walker
(Submitted on 25 Aug 2013)

It is likely that the strength of selection acting upon a mutation varies through time due to changes in the environment. However, most population genetic theory assumes that the strength of selection remains constant. Here we investigate the consequences of fluctuating selection pressures on the quantification of adaptive evolution using McDonald-Kreitman (MK) style approaches. In agreement with previous work, we show that fluctuating selection can generate evidence of adaptive evolution even when the expected strength of selection on a mutation is zero. However, we also find that the mutations that contribute to both polymorphism and divergence tend, on average, to be positively selected during their lifetime under fluctuating selection models. This is because mutations that fluctuate, by chance, to positively selected values tend to reach higher frequencies in the population than those that fluctuate towards negative values. Hence the evidence of positive adaptive evolution detected under a fluctuating selection model by MK-type approaches is genuine, since fixed mutations tend to be advantageous on average during their lifetime. Nevertheless, we show that these methods tend to underestimate the rate of adaptive evolution when selection fluctuates.
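
For readers unfamiliar with MK-style approaches, here is a minimal sketch of one widely used estimator of the proportion of adaptive substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps). The abstract does not say which estimator the authors use, so this is background rather than their method. The function name and the counts are invented for illustration.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate the proportion of adaptive nonsynonymous substitutions.

    dn, ds: nonsynonymous and synonymous divergence counts
    pn, ps: nonsynonymous and synonymous polymorphism counts
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Toy counts, invented for illustration only.
alpha = mk_alpha(dn=20, ds=40, pn=10, ps=40)
print(alpha)  # 0.5
```

A positive alpha is read as an excess of nonsynonymous divergence relative to the neutral expectation set by polymorphism.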

Target capture and massively parallel sequencing of ultraconserved elements (UCEs) for comparative studies at shallow evolutionary time scales

Brian Tilston Smith, Michael G. Harvey, Brant C. Faircloth, Travis C. Glenn, Robb T. Brumfield
(Submitted on 24 Aug 2013)

Comparative genetic studies of non-model organisms are transforming rapidly due to major advances in sequencing technology. A limiting factor in these studies has been the identification and screening of orthologous loci across an evolutionarily distant set of taxa. Here, we evaluate the efficacy of genomic markers targeting ultraconserved DNA elements (UCEs) for analyses at shallow evolutionary timescales. Using sequence capture and massively parallel sequencing to generate UCE data for five co-distributed Neotropical rainforest bird species, we recovered 776-1,516 UCE loci across the five species. Across species, 53-77 percent of the loci were polymorphic, containing between 2.0 and 3.2 variable sites per polymorphic locus, on average. We performed species tree construction, coalescent modeling, and species delimitation, and we found that the five co-distributed species exhibited discordant phylogeographic histories. We also found that species trees and divergence times estimated from UCEs were similar to those obtained from mtDNA. The species that inhabit the understory had older divergence times across barriers, contained a higher number of cryptic species, and exhibited larger effective population sizes relative to species inhabiting the canopy. Because orthologous UCEs can be obtained from a wide array of taxa, are polymorphic at shallow evolutionary time scales, and can be generated rapidly at low cost, they are effective genetic markers for studies investigating evolutionary patterns and processes at shallow time scales.

A network approach to analyzing highly recombinant malaria parasite genes

Daniel B. Larremore, Aaron Clauset, Caroline O. Buckee
(Submitted on 23 Aug 2013)

The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences.
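
The network construction described in the abstract (sequences as nodes, linked when they share an exact match of sufficient length) can be sketched roughly as follows. This is an illustration only, not the authors' implementation. The fixed threshold `k` stands in for their "significant length" criterion, which the abstract does not specify, and the function names and toy sequences are made up.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_block_network(seqs, k):
    """Return edges between sequences sharing an exact match of length >= k.

    Two strings share a common substring of length >= k exactly when they
    share at least one k-mer, so comparing k-mer sets is sufficient.
    """
    kmer_sets = {name: kmers(s, k) for name, s in seqs.items()}
    edges = set()
    for a, b in combinations(sorted(seqs), 2):
        if kmer_sets[a] & kmer_sets[b]:
            edges.add((a, b))
    return edges


# Toy amino-acid strings, invented for illustration.
toy = {
    "s1": "ACDEFGHIK",
    "s2": "QQDEFGHWW",
    "s3": "MNPQRSTVW",
}
edges = shared_block_network(toy, k=5)
print(edges)  # {('s1', 's2')}
```

Community detection (the stochastic block model step in the abstract) would then be run on the resulting edge set. It is omitted here.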

Complexity of evolutionary equilibria in static fitness landscapes

Artem Kaznatcheev
(Submitted on 23 Aug 2013)

A fitness landscape is a genetic space — with two genotypes adjacent if they differ in a single locus — and a fitness function. Evolutionary dynamics produce a flow on this landscape from lower fitness to higher; reaching equilibrium only if a local fitness peak is found. I use computational complexity to question the common assumption that evolution on static fitness landscapes can quickly reach a local fitness peak. I do this by showing that the popular NK model of rugged fitness landscapes is PLS-complete for K >= 2; the reduction from Weighted 2SAT is a bijection on adaptive walks, so there are NK fitness landscapes where every adaptive path from some vertices is of exponential length. Alternatively — under the standard complexity theoretic assumption that there are problems in PLS not solvable in polynomial time — this means that there are no evolutionary dynamics (known, or to be discovered, and not necessarily following adaptive paths) that can converge to a local fitness peak on all NK landscapes with K = 2. Applying results from the analysis of simplex algorithms, I show that there exist single-peaked landscapes with no reciprocal sign epistasis where the expected length of an adaptive path following strong selection weak mutation dynamics is $e^{O(n^{1/3})}$ even though an adaptive path to the optimum of length less than n is available from every vertex. The technical results are written to be accessible to mathematical biologists without a computer science background, and the biological literature is summarized for the convenience of non-biologists with the aim to open a constructive dialogue between the two disciplines.
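
To make the objects in this abstract concrete, here is a small sketch of an NK landscape and an adaptive walk that stops at a local fitness peak. It makes two assumptions the abstract does not state: each locus interacts with its next K neighbours cyclically (one common convention), and the walk moves to a uniformly chosen fitter one-mutant neighbour, which simplifies strong selection weak mutation dynamics. All names are hypothetical. It illustrates the definitions, not the paper's hardness constructions.

```python
import itertools
import random


def make_nk_landscape(n, k, rng):
    """Build a random NK fitness function over binary genotypes of length n.

    Assumption: locus i interacts with the next k loci (cyclically).
    """
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            key = tuple(genotype[(i + j) % n] for j in range(k + 1))
            total += tables[i][key]
        return total / n

    return fitness


def neighbours(genotype):
    """Yield every genotype that differs from the input at exactly one locus."""
    for i in range(len(genotype)):
        yield genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]


def adaptive_walk(genotype, fitness, rng):
    """Step to a random fitter neighbour until none exists (a local peak)."""
    steps = 0
    while True:
        current = fitness(genotype)
        fitter = [g for g in neighbours(genotype) if fitness(g) > current]
        if not fitter:
            return genotype, steps
        genotype = rng.choice(fitter)
        steps += 1


rng = random.Random(0)
fitness = make_nk_landscape(n=8, k=2, rng=rng)
start = tuple(rng.randint(0, 1) for _ in range(8))
peak, steps = adaptive_walk(start, fitness, rng)
```

On small random landscapes like this one the walk ends quickly. The abstract's claim is that specially constructed landscapes exist on which every such walk from some starting points is exponentially long.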

Journal policy change: MBE will consider preprints

Molecular Biology and Evolution (MBE) has updated its policy to allow the submission of papers previously submitted to the arXiv:

All manuscripts published in arXiv are considered unpublished works. Manuscripts that appear on arXiv may be submitted to MBE for consideration for publication.

It is unclear whether this policy extends to other preprint sites, but presumably it does. It is great to see this policy change, and well done to Melissa Wilson Sayres, Antonio Marco [@amarcobio], and others for encouraging MBE to effect this change.

However, one less encouraging feature of this change is that MBE has also implemented a policy under which preprints have to be cited as unpublished data in the text rather than as citations appearing in the reference section. It is as yet unclear how citation search engines, such as Google Scholar, will interact with this form of reference. Will they register and count these as citations, or will they go unnoticed?

One of the many appealing features of preprints is that they allow papers to begin to be acknowledged and cited earlier. It is unclear why MBE feels that this policy is necessary, but in our view it seems counter-productive. Hopefully, this is something that can be changed, given time and encouragement from MBE’s community.

Thoughts on MBE’s preprint citation policy

This guest post is by Graham Coop [@graham_coop] on the journal Molecular Biology and Evolution’s new preprint policy.

We had an interesting discussion via Twitter on the potential reasons for MBE’s policy of not allowing a full citation of preprint articles. I thought I’d write up some of my thoughts as shaped by that conversation.

Below I lay out some of the arguments that we discussed and my thoughts on them. We do not know MBE’s reasoning on this, so I may have missed some obvious practical reason for this citation policy (if so, it would be great if it could be explained). I also note that other journals may well have similar policies about preprint citations, so this is not an argument specifically against MBE. It is great that MBE is now allowing preprints, so this is a somewhat minor quibble compared to that step.

One of my main reasons for disliking this policy, other than that it singles out preprints for special treatment, is that it may well disrupt how preprints accumulate citations (via tools like Google Scholar). I view one of the key advantages of preprints as being that they allow the early recognition and acknowledgement of good ideas (with bad ones being allowed to sink out of view). This is particularly important for young researchers, where preprints can potentially allow people on the job market to escape some of the randomness of how long the publication process takes. Allowing young scholars to have their work critiqued, and cited, early seems to me an important step in allowing young researchers to get a head start in an increasingly difficult job market.

Potential arguments against treating preprint citations like any other citation:
1) Allowing full citation of preprints may lose the journal (or the authors) citations.

It is slightly hard to see the logic of (1). If I cite a preprint that has yet to appear in a journal, then by its very nature the journal couldn’t possibly have benefited from that citation. I’m hardly going to delay my own submission/publication to wait for a paper to appear merely so I can cite it (unless I have some prior commitment to a colleague). The same argument seems to hold for the author: citations of the preprint are citations that you would not have received if you had not distributed the article early. Now, a fair concern is that journals/authors may lose citations of the published article if, after the article appears, people accidentally cite the arXived paper instead of the final article. However, MBE’s system doesn’t avoid this problem, and it seems like it could be addressed simply by asking authors to do a PubMed search for each arXived paper they cite to avoid this oversight.

2) Another potential concern is that preprints are, by their nature, subject to change.

Preprints can be updated, so the information contained in them could change, or even be removed. However, preprint sites like arXiv (as well as PeerJ and figshare) keep all previous versions of the paper, and these are clearly labeled and can be cited separately. So I can clearly indicate which version I am citing, and this citation is a permanent entry. While this information may have changed in subsequent versions, this is really no different from the fact that subsequent publications can overturn existing results. What is different with versioning of preprints is that we get to see more of this process in the open, which feels like a good thing overall.

3) Authors should acknowledge that arXived preprints have not been through peer review.

At first sight there is more validity to this point, but I think it is also weak. As an author, and as a reviewer (and indeed as a reader), you have a responsibility to question whether a citation really supports a particular point. As an author I invest a lot of time in trying to track down the right citations and to carefully read, and test, the papers I rely heavily on. As a reviewer I regularly question authors’ use of particular citations and point them toward additional work or ask them to change the wording around a citation. Published papers are not immune from problems, any more than preprints are. If I, and the reviewers of my article, think it is appropriate for me to cite a preprint then I should be allowed to do so as I would any other article.

This argument also seems somewhat strange: MBE already allows the normal citation of PhD theses and [potentially non-peer-reviewed] books (as pointed out by Antonio Marco). So it is really quite unclear why preprints have been singled out in this way.

All of my articles have benefited greatly from the comments of colleagues and from peer review. I also have a lot of respect for the work done by editors of various journals, including MBE. However, it is unclear to me whom this policy serves. Journal policies should always be applied with a light hand; they should ideally allow the authors freedom to fully acknowledge their sources. I see no strong argument for this policy other than that it prevents the further blurring of the line between journals and preprints. In my view the only sustainable way forward for journals and scientific societies is to be innovative focal points for collating peer-review and peer-recognition. Only by adapting quickly can journals hope to stay relevant in an age where increasingly (to steal Mike Eisen’s phrase) publishing is pushing a button.

Graham Coop

Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model

Denise Kühnert, Tanja Stadler, Timothy G. Vaughan, Alexei J. Drummond
(Submitted on 23 Aug 2013)

Evolution of RNA viruses such as HIV, Hepatitis C and Influenza virus occurs so rapidly that the viruses’ genomes contain information on past ecological dynamics. The interaction of ecological and evolutionary processes demands their joint analysis. Here we adapt a birth-death-sampling model, which allows for serially sampled data and rate changes over time, to estimate epidemiological parameters of the underlying population dynamics in terms of a compartmental susceptible-infected-removed (SIR) model. Our proposed approach results in a phylodynamic method that enables the joint estimation of epidemiological parameters and phylogenetic history. In contrast to standard coalescent process approaches, this method provides separate information on incidence and prevalence of infections. Detailed information on the interaction of host population dynamics and evolutionary history can inform decisions on how to contain or entirely avoid disease outbreaks.
We apply our Birth-Death SIR method (BDSIR) to five human immunodeficiency virus type 1 clusters sampled in the United Kingdom (UK) between 1999 and 2003. The estimated basic reproduction ratio ranges from 1.9 to 3.2 among the clusters. Our results imply that these local epidemics arose from introduction of infected individuals into populations of between 900 and 3000 effectively susceptible individuals, albeit with wide margins of uncertainty. All clusters show a decline in the growth rate of the local epidemic in the middle or end of the 1990s. The effective reproduction ratio of cluster 1 drops below one around 1994, with the local epidemic having almost run its course by the end of the sampled period. For the other four clusters the effective reproduction ratio also decreases over time, but stays above one. The method is implemented as a BEAST2 package.