Modes of migration and multilevel selection in evolutionary multiplayer games

Yuriy Pichugin, Chaitanya S. Gokhale, Julián Garcia, Arne Traulsen, Paul B. Rainey

The site frequency spectrum for general coalescents

Jeffrey P. Spence, John A. Kamm, Yun S. Song

We present an efficient method for computing the expected site frequency spectrum (SFS) for general Λ- and Ξ-coalescents. For time-homogeneous coalescents, the runtime of our algorithm is O(n^2), where n is the sample size. This is a factor of n^2 faster than the state-of-the-art method. Furthermore, in contrast to existing methods, our method generalizes to time-inhomogeneous Λ- and Ξ-coalescents with measures that factorize as Λ(dx)/ζ(t) and Ξ(dx)/ζ(t), respectively, where ζ denotes a strictly positive function of time. The runtime of our algorithm in this setting is O(n^3). We also obtain general theoretical results for the identifiability of the Λ measure when ζ is a constant function, as well as for the identifiability of the function ζ under a fixed Ξ measure.

Author post: Trees, Population Structure, F-statistics!

This guest post is by Benjamin Peter on his preprint Trees, Population Structure, F-statistics!

I began thinking about this paper more than a year ago, when Joe Pickrell and David Reich posted their perspective paper on human genetic history on biorxiv. In that paper, they presented a very critical perspective of the serial founder model, the model I happened to be working on at the time. Needless to say, my perspective on the use (and usefulness) of the model was, and still is, quite different.

Part of their argument was based on the usage of the F3-statistic, and the fact that it is negative for many human populations, indicating admixture. Now, at that time, I was familiar with the basic idea of the statistic and had convinced myself – following the algebraic argument in Patterson et al. (2012) – that it should be positive under models of no admixture. However, I still had many open questions that this paper did not answer. Why should we use F2 as a measure of genetic drift to begin with? Why does F3 have this positivity property? How robust is this to other structure models? The ‘path’ diagrams that Patterson et al. (2012) used did not help me personally, because I am not familiar with Feynman diagrams, and I did not understand how drift could have ‘opposite’ directions.

The other primary sources did not help me either, partly because they are buried in supplements and are repetitive. Unfortunately, I initially missed what I now find the most comprehensive resource, the Supplementary Material of Reich et al. (2009), and overlooking it did not help my understanding. However, at that time (early summer last year) I had a thesis to finish, and so the F-statistics left my mind.

I finished my Ph.D. in July, moved to Chicago in October 2014 and forgot about F-statistics in the meantime. When I started my postdoc, John Novembre proposed that I have a look at EEMS, a program one of Matthew’s former students, Desi Petkova, had developed to visualize migration patterns. Strikingly, Desi also used a matrix of squared differences in allele frequency, but she did so in a coalescent framework and for diploid samples, as opposed to the diffusion framework and population samples used for the F-statistics. Even so, the connection was immediately obvious, and it took only a few pages of algebra to figure out what is now Equation 5 in the paper, namely that F2 has a very easy interpretation under the coalescent.

This was a very useful result, and it was what eventually made me decide to start writing a paper and to research the other issues I did not understand about F-statistics. It takes very little algebra (or some digging through supplementary materials) to figure out that F3 and F4 can be written in terms of F2. The interesting bit, however, is the form of these expressions: they immediately reminded me of quantities that are used in distance-based phylogenetics, the Gromov product and tree splits. This made it obvious that the statistics should be interpreted in that context as tests of treeness, with admixture as the alternative model. It also showed that F3 and F4 are just lengths of external and internal branches on a tree, and that the workings of the tests can be neatly explained using that phylogenetic theory.
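The algebra relating F3 and F4 to F2 can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the standard definitions of the statistics as averages over loci of products of allele-frequency differences, uses naive estimators without any sample-size correction, and runs on made-up random frequencies. All function and variable names are hypothetical.

```python
import random

def f2(a, b):
    """F2(A, B): mean squared allele-frequency difference."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def f3(x, a, b):
    """F3(X; A, B): mean of (x - a) * (x - b) over loci."""
    return sum((xi - ai) * (xi - bi) for xi, ai, bi in zip(x, a, b)) / len(x)

def f4(a, b, c, d):
    """F4(A, B; C, D): mean of (a - b) * (c - d) over loci."""
    return sum((ai - bi) * (ci - di)
               for ai, bi, ci, di in zip(a, b, c, d)) / len(a)

def f3_from_f2(x, a, b):
    """F3 as a Gromov-product-like combination of F2 distances."""
    return 0.5 * (f2(x, a) + f2(x, b) - f2(a, b))

def f4_from_f2(a, b, c, d):
    """F4 as a combination of four F2 distances."""
    return 0.5 * (f2(a, d) + f2(b, c) - f2(a, c) - f2(b, d))

# Made-up allele frequencies for four populations (illustration only).
rng = random.Random(0)
n_loci = 1000
A, B, C, D = ([rng.random() for _ in range(n_loci)] for _ in range(4))

# Toy "admixed" population: an exact 50/50 mix of A and B, no further drift.
mix = [0.5 * (a + b) for a, b in zip(A, B)]

print(f3(C, A, B), f3_from_f2(C, A, B))        # the two values agree
print(f4(A, B, C, D), f4_from_f2(A, B, C, D))  # the two values agree
print(f3(mix, A, B), -0.25 * f2(A, B))         # negative F3 for the mix
```

The last line illustrates why a negative F3 points to admixture: for an exact 50/50 mix of A and B, every per-locus term equals minus a quarter of the squared difference between A and B, so F3 cannot be positive.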

Now, essentially a year later, I have finished a version of my paper that I am comfortable sharing. Because of my initial difficulties with the subject, and my suspicion that I might not be the only one with only a vague understanding of the statistics, I kept the first part as basic as possible. I start with how drift is measured as decay in heterozygosity and as an increase in uncertainty or relatedness, then explore in depth the phylogenetic theory underlying the null model of the admixture tests, and briefly discuss the path interpretation of the admixture model. Only then do I present my main result, the interpretation in terms of coalescent times and internal branch lengths, followed by some small simulations as sanity checks and some applications and population structure models.

A big challenge has been to attribute ideas correctly, sometimes because sources were difficult to find, and sometimes because key ideas were only implicitly stated. So if I misattributed anything, please let me know, and I will be happy to fix it. Similarly, if there are parts of the manuscript that are unclear or hard to understand, please contact me. This paper is meant both to serve as a useful introduction to the topic and to present some interesting results.

Path Weights, Networked Partial Correlations and their Application to the Analysis of Genetic Interactions

Alberto Roverato, Robert Castelo

Gene coexpression is a common feature employed in predicting buffering relationships that explain genetic interactions, which constitute an important mechanism behind the robustness of cells to genetic perturbations. The complete removal of such buffering connections impacts the entire molecular circuitry, ultimately leading to cellular death. Coexpression is commonly measured through Pearson correlation coefficients. However, Pearson correlation values are sensitive to indirect effects and often partial correlations are used instead. Yet, partial correlation values convey no information on the (linear) influence of the association within the entire multivariate system or, in other words, of the represented edge within the entire network. Jones and West (2005) showed that covariance can be decomposed into the weights of the paths that connect two variables within the corresponding undirected network. Here we provide a precise interpretation of path weights and show that, in the particular case of single-edge paths, this interpretation leads to a quantity we call networked partial correlation whose value depends on both the partial correlation between the intervening variables and their association with the rest of the multivariate system. We show that this new quantity correlates better with quantitative genetic interactions in yeast than classical coexpression measures.
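To illustrate the point about indirect effects, here is a minimal sketch of a first-order partial correlation, using the standard textbook formula for conditioning on a single third variable. It is not the networked partial correlation introduced in the paper, and it conditions on one variable rather than on the whole multivariate system. The function name and the correlation values are hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y given z, from pairwise Pearson values."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical case: x and y each correlate 0.8 with z, and their own
# correlation of 0.64 arises entirely through z (0.8 * 0.8).
indirect = partial_corr(0.64, 0.8, 0.8)

# Hypothetical case: x and y are more strongly correlated than z explains.
direct = partial_corr(0.9, 0.8, 0.8)

print(indirect)  # approximately zero: no association left once z is fixed
print(direct)    # clearly positive: association remains after adjusting for z
```

In the first case the marginal Pearson correlation of 0.64 would suggest a link between x and y, while the partial correlation shows that the link is fully mediated by z.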

Disruption of endosperm development is a major cause of hybrid seed inviability between Mimulus guttatus and M. nudatus

Elen Oneal, John H. Willis, Robert Franks

A practical guide to de novo genome assembly using long reads

Mahul Chakraborty, James G. Baldwin-Brown, Anthony D. Long, J.J. Emerson

Viruses are a dominant driver of protein adaptation in mammals

David Enard, Le Cai, Carina Gwenapp, Dmitri A Petrov

Gene tree discordance causes apparent substitution rate variation

Fabio K. Mendes, Matthew W. Hahn

Reconstructing Genetic History of Siberian and Northeastern European Populations

Anton Valouev, Emily HM Wong, Andrey Khrunin, Larissa Nichols, Dmitry Pushkarev, Denis Khokhrin, Dmitry Verbenko, Oleg Evgrafov, James Knowles, John Novembre, Svetlana Limborska

Quantification of the effect of mutations using a global probability model of natural sequence variation

Thomas A. Hopf, John B. Ingraham, Frank J. Poelwijk, Michael Springer, Chris Sander, Debora S. Marks

Modern biomedicine is challenged to predict the effects of genetic variation. Systematic functional assays of point mutants of proteins have provided valuable empirical information, but vast regions of sequence space remain unexplored. Fortunately, the mutation-selection process of natural evolution has recorded rich information in the diversity of natural protein sequences. Here, building on probabilistic models for correlated amino-acid substitutions that have been successfully applied to determine the three-dimensional structures of proteins, we present a statistical approach for quantifying the contribution of residues and their interactions to protein function, using a statistical energy, the evolutionary Hamiltonian. We find that these probability models predict the experimental effects of mutations with reasonable accuracy for a number of proteins, especially where the selective pressure is similar to the evolutionary pressure on the protein, such as antibiotics.