There are no caterpillars in a wicked forest

James H. Degnan, John A. Rhodes

Species trees represent the historical divergences of populations or species, while gene trees trace the ancestry of individual gene copies sampled within those populations. In cases involving rapid speciation, gene trees whose topologies differ from that of the species tree can be the most probable under the standard multispecies coalescent model, which makes species tree inference more difficult. Such anomalous gene trees are not well understood except in a few small cases. In this work, we establish one constraint that applies to trees of any size: gene trees with “caterpillar” topologies cannot be anomalous. The proof involves a new combinatorial object, called a population history, which keeps track of the number of coalescent events in each ancestral population.
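A caterpillar is a rooted binary tree in which every internal node has at least one leaf as a child. The tree is therefore a single spine with leaves hanging off it. The check below is a minimal sketch that assumes a hypothetical nested-tuple encoding: a leaf is a string label, and an internal node is a pair of subtrees. It is not code from the paper. For four taxa, the balanced shape ((A,B),(C,D)) is the only non-caterpillar shape.

```python
from typing import Union

# Hypothetical encoding: a leaf is a string label; an internal node is a
# 2-tuple (left, right) of subtrees. Only rooted binary trees are handled.
Tree = Union[str, tuple]


def is_leaf(t: Tree) -> bool:
    return isinstance(t, str)


def is_caterpillar(t: Tree) -> bool:
    """True if every internal node of t has at least one leaf child."""
    if is_leaf(t):
        return True
    left, right = t
    if not (is_leaf(left) or is_leaf(right)):
        # Both children are internal nodes, so the tree branches off the spine.
        return False
    return is_caterpillar(left) and is_caterpillar(right)


print(is_caterpillar(((("A", "B"), "C"), "D")))  # True
print(is_caterpillar((("A", "B"), ("C", "D"))))  # False
```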

Estimating Reproducibility in Genome-Wide Association Studies

Wei Jiang, Jing-Hao Xue, Weichuan Yu

Genome-wide association studies (GWAS) are widely used to discover genetic variants associated with diseases. To control false positives, all findings from GWAS need to be verified with additional evidence, even for associations discovered in a high-power study. A replication study, which uses independent samples, is a common verification method. An association is regarded as a true positive with high confidence when it is identified in both the primary study and the replication study. Currently, there is no systematic study of the behavior of positives in the replication study when the positive results of the primary study are treated as prior information.
In this paper, two probabilistic measures named Reproducibility Rate (RR) and False Irreproducibility Rate (FIR) are proposed to quantitatively describe the behavior of primary positive associations (i.e. positive associations identified in the primary study) in the replication study. RR is a conditional probability measuring how likely a primary positive association is to also be positive in the replication study. It can be used to guide the design of the replication study and to check the consistency between the results of the primary study and those of the replication study. FIR, by contrast, measures how likely a primary positive association is to still be a true positive even when it is negative in the replication study. It can be used to generate a list of potentially true associations among the irreproducible findings for further scrutiny. Methods for estimating these two measures are given. Simulation results and real experiments show that our estimation methods have high accuracy and good prediction performance.
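The sketch below makes the two definitions concrete by evaluating RR and FIR directly from them under a simple, assumed two-group model. In this model, a known fraction `pi1` of tested variants is truly associated, and each study applies a one-sided z-test. The two studies are independent given a variant's true status. All function and parameter names and values are hypothetical. Because the parameters are treated as known, this is not the paper's estimation method, which must infer these quantities from data.

```python
from statistics import NormalDist


def rr_fir(pi1, mu_primary, mu_rep, alpha_primary, alpha_rep):
    """RR = P(replication positive | primary positive) and
    FIR = P(true association | primary positive, replication negative).

    Hypothetical model: a fraction pi1 of variants is truly associated.
    Under the null z ~ N(0, 1); for a true association z ~ N(mu, 1) in each
    (one-sided) test. Studies are independent given the true status.
    """
    std = NormalDist()
    pi0 = 1.0 - pi1
    # Rejection thresholds for the primary and replication tests.
    c1 = std.inv_cdf(1.0 - alpha_primary)
    c2 = std.inv_cdf(1.0 - alpha_rep)
    # Power of each test for a true association: P(Z + mu > c).
    pow1 = 1.0 - std.cdf(c1 - mu_primary)
    pow2 = 1.0 - std.cdf(c2 - mu_rep)
    # For null variants, the positive rate equals the significance level.
    primary_pos = pi1 * pow1 + pi0 * alpha_primary
    rr = (pi1 * pow1 * pow2 + pi0 * alpha_primary * alpha_rep) / primary_pos
    true_pos_rep_neg = pi1 * pow1 * (1.0 - pow2)
    null_pos_rep_neg = pi0 * alpha_primary * (1.0 - alpha_rep)
    fir = true_pos_rep_neg / (true_pos_rep_neg + null_pos_rep_neg)
    return rr, fir


rr, fir = rr_fir(pi1=0.001, mu_primary=5.0, mu_rep=3.0,
                 alpha_primary=5e-8, alpha_rep=0.05)
print(f"RR = {rr:.3f}, FIR = {fir:.3f}")
```

In this toy setting, raising `mu_rep` (for example, by enlarging the replication sample) increases RR and lowers FIR. This matches the intended use of RR in replication design.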

Vaccine escape in 2013-4 and the hydropathic evolution of glycoproteins of A/H3N2 viruses

J. C. Phillips

More virulent strains of influenza virus subtype H1N1 appeared widely in 2007. For subtype H3N2 they appeared in 2011 and especially in 2013-4, when the effectiveness of the H3N2 vaccine fell nearly to zero. The amino acid differences between the neuraminidase of these strains and that of earlier, less virulent strains appear small (<1%) when tabulated from sequence alignments by counting site identities and similarities. Here we show how analyzing the fractal hydropathic forces responsible for neuraminidase globular compaction and modularity quantifies the mutational origins of increased virulence. The analysis also predicts vaccine escape and specifies optimized targets for the 2015 H3N2 vaccine that differ from the WHO target. Some earlier methods are based on hemagglutinin antigenic drift measured with ferret sera; they take several years, cover only a few candidate strains, and give ambiguous results. The new methods, by contrast, are timely: using only NCBI and GISAID amino acid sequences, they can be completed in a few days.
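The contrast between counting site identities and profiling hydropathy can be sketched as follows. The paper's analysis relies on a fractal hydropathy scale. As a stand-in, this sketch uses the classic Kyte-Doolittle scale, and the window length and function names are hypothetical. It computes percent identity between two aligned sequences and a sliding-window hydropathy profile for each. A few substitutions that barely change identity can then be inspected for their effect on local hydropathy.

```python
# Kyte-Doolittle hydropathy values (a stand-in; not the scale used in the paper).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical sites in two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, non-empty, and equal length")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each length-`window` sliding window of an ungapped sequence."""
    values = [KD[aa] for aa in seq.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Comparing `hydropathy_profile` outputs window by window for two variants shows where a substitution shifts local hydropathy. `percent_identity` reports only how many sites differ.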

Speciation, ecological opportunity, and latitude

Dolph Schluter

Evolutionary hypotheses to explain the greater number of species in the tropics than in the temperate zone include greater age and area, higher temperature and metabolic rates, and greater ecological opportunity. These ideas make contrasting predictions about the relationship between speciation processes and latitude, which I elaborate and evaluate. Available data suggest that per capita speciation rates are currently highest in the temperate zone, and that diversification rates (speciation minus extinction) are similar between latitudes. In contrast, clades whose oldest analyzed dates precede the Eocene thermal maximum, when the extent of the tropics was much greater than today, tend to show the highest speciation and diversification rates in the tropics. These findings are consistent with the age-and-area hypothesis, which alone among the hypotheses predicts a time trend. Higher recent speciation rates in the temperate zone than in the tropics suggest an additional response to the high ecological opportunity associated with low species diversity. These broad patterns are compelling but provide limited insight into underlying mechanisms, suggesting that studies of speciation processes along the latitudinal gradient will be vital. Using threespine stickleback in depauperate northern lakes as an example, I show how high ecological opportunity can lead to rapid speciation. The results support a role for ecological opportunity in speciation, but its importance in the evolution of the latitudinal gradient remains uncertain. I conclude that per capita evolutionary rates are no longer higher in the tropics than in the temperate zone. Nevertheless, the vast numbers of species that have already accumulated in the tropics ensure that the total rate of species production remains highest there. Thus, tropical evolutionary momentum helps to perpetuate the steep latitudinal biodiversity gradient.
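A worked example shows how a lower per capita speciation rate can still yield a higher total rate of species production. Its numbers are purely hypothetical and are not data from the paper. Write $\lambda$ for the per capita speciation rate, $\mu$ for the per capita extinction rate, $N$ for the number of species, $r$ for the diversification rate, and $\Lambda$ for the total rate of species production:

```latex
\begin{aligned}
r &= \lambda - \mu, \qquad \Lambda = \lambda N,\\
\Lambda_{\mathrm{trop}} &= 0.1~\text{Myr}^{-1} \times 10\,000 = 1000~\text{new species per Myr},\\
\Lambda_{\mathrm{temp}} &= 0.2~\text{Myr}^{-1} \times 2\,000 = 400~\text{new species per Myr}.
\end{aligned}
```

Here the temperate per capita rate is twice the tropical one. Even so, the larger standing diversity of the tropics produces more new species per unit time.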

Combining exome and gene expression datasets in one graphical model of disease to empower the discovery of disease mechanisms

Aziz M. Mezlini, Fabio Fuligni, Adam Shlien, Anna Goldenberg

Identifying genes associated with complex human diseases is one of the main challenges of human genetics and computational medicine. To address it, millions of genetic variants are screened to identify the few that matter. We propose a novel factor-graph-based model in which much of the biological knowledge is incorporated through factors and priors. The model aims to increase the power to identify disease-associated genes and to account for other potential sources of aberrant protein function. Our extensive simulations show that our method has higher sensitivity and precision than variant-aggregating and differential expression methods. Applied to breast cancer, our integrative approach identified important genes that had coding aberrations in some patients and regulatory abnormalities in others. This emphasizes the importance of data integration for explaining the disease in a larger number of patients.
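A toy factor graph for a single gene can illustrate the idea of pooling coding and regulatory evidence through factors and priors. Everything here is a hypothetical sketch, not the authors' model. That includes the variables, the factor forms, the parameter names such as `rho` and `q_expr`, and their default values. A latent variable D marks the gene as disease-associated, and a latent A_p marks it as aberrant in patient p. Each patient contributes an observed coding-variant flag and an expression-outlier flag. Because this graph is a tree, summing out each A_p gives the exact posterior of D, which is the marginal that sum-product message passing would return.

```python
from math import prod


def gene_posterior(patients, prior=0.05, rho=0.3, eps=0.01,
                   q_coding=0.5, q_expr=0.5, f_coding=0.02, f_expr=0.05):
    """Posterior P(D=1 | data) for a hypothetical per-gene factor graph.

    patients : list of (c, e) pairs, c = damaging coding variant seen (0/1),
               e = expression outlier seen (0/1).
    Factors  : P(D=1) = prior; P(A_p=1 | D=1) = rho, P(A_p=1 | D=0) = eps;
               if A_p=1, c and e occur with probs q_coding and q_expr,
               otherwise with background probs f_coding and f_expr.
    """
    def bern(p, x):
        return p if x else 1.0 - p

    def patient_likelihood(d, c, e):
        # Sum out the latent aberration variable A_p for this patient.
        p_a = rho if d else eps
        aberrant = p_a * bern(q_coding, c) * bern(q_expr, e)
        normal = (1.0 - p_a) * bern(f_coding, c) * bern(f_expr, e)
        return aberrant + normal

    like1 = prod(patient_likelihood(1, c, e) for c, e in patients)
    like0 = prod(patient_likelihood(0, c, e) for c, e in patients)
    return prior * like1 / (prior * like1 + (1.0 - prior) * like0)


mixed = [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 6
print(gene_posterior(mixed))
```

In this toy model, a patient who shows only an expression outlier still raises the gene's posterior. A coding-only analysis would treat that patient as uninformative.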

An in-host model of HIV incorporating latent infection and viral mutation

Stephen Pankavich, Deborah Shutt

We construct a seven-component model of the in-host dynamics of Human Immunodeficiency Virus Type 1 (HIV) that accounts for latent infection and the propensity for viral mutation. A dynamical analysis is conducted, and a theorem is presented that characterizes the long-time behavior of the model. Finally, we study the effects of an antiretroviral drug and their implications for treatment.
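The kind of compartmental model described above can be illustrated with a much smaller, hypothetical system. It has four compartments: uninfected target cells T, latently infected cells L, productively infected cells I, and free virus V. It includes no viral mutation, and a drug of efficacy `eff` blocks new infections. This is not the paper's seven-component model. The equations and every parameter value are assumptions chosen for illustration, and the system is integrated with a simple forward Euler step.

```python
def simulate(eff=0.0, days=100.0, dt=0.01):
    """Forward-Euler integration of a hypothetical 4-compartment model:

        dT/dt = s - d_T*T - (1-eff)*k*V*T
        dL/dt = f*(1-eff)*k*V*T - (d_L + a)*L
        dI/dt = (1-f)*(1-eff)*k*V*T + a*L - delta*I
        dV/dt = p*I - c*V

    Returns (T, L, I, V) at time `days`. All parameter values are illustrative.
    """
    s, d_T, k = 10.0, 0.01, 2.4e-5    # target-cell supply, death, infection rate
    f, a, d_L = 0.01, 0.05, 0.01      # latent fraction, activation, latent death
    delta, p, c = 1.0, 2000.0, 23.0   # infected-cell death, virion production, clearance
    T, L, I, V = s / d_T, 0.0, 0.0, 1.0  # infection-free T plus a small inoculum
    for _ in range(int(round(days / dt))):
        infection = (1.0 - eff) * k * V * T  # new infections per unit time
        dT = s - d_T * T - infection
        dL = f * infection - (d_L + a) * L
        dI = (1.0 - f) * infection + a * L - delta * I
        dV = p * I - c * V
        T, L, I, V = T + dt * dT, L + dt * dL, I + dt * dI, V + dt * dV
    return T, L, I, V


untreated = simulate(eff=0.0)
treated = simulate(eff=0.9)
print("untreated V:", untreated[3], "treated V:", treated[3])
```

With these assumed values, the infection spreads when untreated, whereas at 90% drug efficacy the free virus declines. A model that tracks mutation would add compartments for drug-resistant strains.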

Fundamental Properties of the Evolution of Mutational Robustness

Lee Altenberg

Evolution on neutral networks of genotypes has been found in models to concentrate on genotypes with high mutational robustness, to a degree determined by the topology of the network. Here the analysis is generalized beyond neutral networks to arbitrary selection and parent-offspring transmission. In this larger realm, geometric features determine mutational robustness: the alignment of fitness with the orthogonalized eigenvectors of the mutation matrix weighted by their eigenvalues. “House of cards” mutation is found to preclude the evolution of mutational robustness. Genetic load is shown to increase with increasing mutation in arbitrary single- and multiple-locus fitness landscapes. The rate of decrease in population fitness can never grow as mutation rates get higher, showing that “error catastrophes” for genotype frequencies never cause precipitous losses of population fitness. The “inclusive inheritance” approach taken here naturally extends these results to a new concept of dispersal robustness.
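The neutral-network result that the abstract generalizes can be checked numerically on a small example. In this hypothetical setup, all genotypes on the network have equal fitness. Each offspring mutates with probability `mu` to one of `n_mutants` equally likely one-step neighbors, and mutants that leave the network are inviable. The population then converges to the principal eigenvector x of the network's adjacency matrix A. Because A is symmetric, the population's mean number of neutral neighbors is Σ_i x_i deg_i = 1ᵀAx = λ_max, which is never below the unweighted average degree. This sketch covers only the neutral case, not the paper's general selection and transmission setting.

```python
def population_robustness(adj, n_mutants, mu=0.5, iters=500):
    """Return (population-weighted mean degree, unweighted mean degree).

    adj       : symmetric 0/1 adjacency matrix (list of lists) of a
                hypothetical neutral network of equally fit genotypes.
    n_mutants : number of one-step mutants per genotype (>= max degree);
                mutants outside the network are assumed inviable.
    mu        : per-offspring mutation probability (0 < mu < 1).
    """
    n = len(adj)
    degrees = [sum(row) for row in adj]
    if n_mutants < max(degrees):
        raise ValueError("n_mutants must be at least the maximum degree")
    x = [1.0 / n] * n  # start from a uniform population
    for _ in range(iters):
        # Survivors: unmutated offspring plus mutants that land on the network.
        new = [(1.0 - mu) * x[i]
               + (mu / n_mutants) * sum(adj[i][j] * x[j] for j in range(n))
               for i in range(n)]
        total = sum(new)
        x = [v / total for v in new]  # renormalize to genotype frequencies
    weighted = sum(xi * d for xi, d in zip(x, degrees))
    return weighted, sum(degrees) / n


path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # genotypes A - B - C
print(population_robustness(path3, n_mutants=2))
```

For the three-genotype path, the population concentrates on the central genotype. Its mean degree approaches √2 ≈ 1.414, above the unweighted average of 4/3.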