Variation of nonsynonymous/synonymous rate ratios at HLA genes over time and phylogenetic context

Bárbara D Bitarello, Rodrigo dos Santos Francisco, Diogo Meyer
doi: http://dx.doi.org/10.1101/008342

Many HLA loci show an excess of nonsynonymous (dN) with respect to synonymous (dS) substitutions at codons of the antigen recognition site (ARS), a hallmark of adaptive evolution. However, it remains unclear how these changes are distributed over time and across branches of the HLA phylogeny. In particular, although HLA alleles can be assigned to functionally and phylogenetically defined groups (“lineages”), a test for differences in ω (ω = dN/dS) within and between lineages is lacking. We analysed variation of ω across divergence times and phylogenetic contexts (placement of branches in the phylogeny). We found a significant positive correlation between ω at ARS codons and divergence time, and that branches between lineages have higher ω than those within lineages. The excess of nonsynonymous changes between lineages attained significance when we used non-ARS codons to account for the fact that, even under purifying selection, ω is inflated for recently diverged alleles. Although less intensely selected, within-lineage variation at ARS codons bears evidence of selection, in the form of higher ω than that of non-ARS codons. Our results show that ω ratios of class I HLA genes vary over time, and are higher in branches connecting alleles from distinct lineages. These results suggest that although within-lineage variation bears evidence of balancing selection, the between-lineage changes have been more intensely selected. Our findings indicate the importance of considering the effect of timescale when analysing ω values over a wide spectrum of divergences, and the value of using additional markers (in our case the tightly linked non-ARS codons) to account for the temporal dynamics of ω.
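
For readers unfamiliar with ω, the sketch below shows one common way such a ratio can be computed from pairwise counts: proportions of nonsynonymous and synonymous differences are each corrected for multiple hits with the Jukes-Cantor formula and then divided. The abstract does not state which estimator the authors used, so this counting approach, the function names and the input counts are assumptions made purely for illustration.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return dN/dS from counts of differences and of sites (hypothetical helper)."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        # The ratio is undefined without synonymous change.
        return math.inf if dn > 0 else math.nan
    return dn / ds
```

With equal proportions of nonsynonymous and synonymous differences the function returns 1. Values above 1 correspond to the excess of nonsynonymous change that the abstract treats as a signature of adaptive evolution.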

Dead or just asleep? Variance of microsatellite allele distributions in the human Y-chromosome.

Joe Flood
doi: http://dx.doi.org/10.1101/008227

Several different methods confirm that a number of microsatellites on the human Y-chromosome have allele distributions with different variances in different haplogroups, after adjusting for coalescent times. This can be demonstrated both through heteroscedasticity tests and through the poor correlation of the variance vectors in different subclades. The most convincing demonstration, however, is the complete inactivity of some markers in certain subclades – “microsatellite death” – while they are still active in companion subclades. Many microsatellites have declined in activity as they proceed down through descendant subclades. This appears to confirm the theory of microsatellite life cycles, in which point mutations cause a steady decay in activity. However, the changes are too fast to be caused by point mutations alone, and slippage events may be implicated. The rich microsatellite terrain exposed in our large single-haplotype samples provides new opportunities for genotyping and analysis.

Theoretical Foundations of Equitability and the Maximal Information Coefficient

Yakir A. Reshef, David N. Reshef, Pardis C. Sabeti, Michael Mitzenmacher
(Submitted on 21 Aug 2014)

The maximal information coefficient (MIC) is a tool for finding the strongest pairwise relationships in a data set with many variables (Reshef et al., 2011). MIC is useful because it gives similar scores to equally noisy relationships of different types. This property, called equitability, is important for analyzing high-dimensional data sets.
Here we formalize the theory behind both equitability and MIC in the language of estimation theory. This formalization has a number of advantages. First, it allows us to show that equitability is a generalization of power against statistical independence. Second, it allows us to compute and discuss the population value of MIC, which we call MIC_*. In doing so we generalize and strengthen the mathematical results proven in Reshef et al. (2011) and clarify the relationship between MIC and mutual information. Introducing MIC_* also enables us to reason about the properties of MIC more abstractly: for instance, we show that MIC_* is continuous and that there is a sense in which it is a canonical “smoothing” of mutual information. We also prove an alternate, equivalent characterization of MIC_* that we use to state new estimators of it as well as an algorithm for explicitly computing it when the joint probability density function of a pair of random variables is known. Our hope is that this paper provides a richer theoretical foundation for MIC and equitability going forward.
This paper will be accompanied by a forthcoming companion paper that performs extensive empirical analysis and comparison to other methods and discusses the practical aspects of both equitability and the use of MIC and its related statistics.
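
To make the object of study concrete, here is a rough sketch of a MIC-like score. It scans grid resolutions with at most n^0.6 cells and reports the largest normalized mutual information. Unlike the statistic of Reshef et al. (2011), which optimizes over all grid placements at each resolution, this sketch only tries equal-frequency grids, so it is a simplification and not the authors' algorithm or any of the estimators introduced in the paper. The function names are hypothetical.

```python
import numpy as np


def grid_mutual_information(x, y, nx, ny):
    """Mutual information (in nats) of x and y on an nx-by-ny equal-frequency grid."""
    n = len(x)
    # Ranks 0..n-1 mapped to bin indices so each bin holds about n/k points.
    xb = (np.argsort(np.argsort(x)) * nx) // n
    yb = (np.argsort(np.argsort(y)) * ny) // n
    joint = np.zeros((nx, ny))
    np.add.at(joint, (xb, yb), 1.0)
    joint /= n
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())


def naive_mic_like_score(x, y, exponent=0.6):
    """Largest normalized grid mutual information over grids with nx*ny <= n**exponent."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    budget = len(x) ** exponent
    best = 0.0
    for nx in range(2, int(budget // 2) + 1):
        for ny in range(2, int(budget // nx) + 1):
            mi = grid_mutual_information(x, y, nx, ny)
            best = max(best, mi / np.log(min(nx, ny)))
    return best
```

A noiseless monotone relationship scores close to 1 under this sketch, while independent samples score close to 0, which is the behaviour that the notion of equitability refines.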

Robust Population Structure Inference and Correction in the Presence of Known or Cryptic Relatedness

Matthew P Conomos, Michael B Miller, Timothy A Thornton
doi: http://dx.doi.org/10.1101/008276

Population structure inference with genetic data has been motivated by a variety of applications in population genetics and genetic association studies. Several approaches have been proposed for the identification of genetic ancestry differences in samples where study participants are assumed to be unrelated, including principal components analysis (PCA), multi-dimensional scaling (MDS), and model-based methods for proportional ancestry estimation. Many genetic studies, however, include individuals with some degree of relatedness, and existing methods for inferring genetic ancestry fail in related samples. We present a method, PC-AiR, for robust population structure inference in the presence of known or cryptic relatedness. PC-AiR utilizes genome-screen data and an efficient algorithm to identify a diverse subset of unrelated individuals that is representative of all ancestries in the sample. The PC-AiR method directly performs PCA on the identified ancestry representative subset and then predicts components of variation for all remaining individuals based on genetic similarities. In simulation studies and in applications to real data from Phase III of the HapMap Project, we demonstrate that PC-AiR provides a substantial improvement over existing approaches for population structure inference in related samples. We also demonstrate significant efficiency gains, where a single axis of variation from PC-AiR provides better prediction of ancestry in a variety of structure settings than using ten (or more) components of variation from widely used PCA and MDS approaches. Finally, we illustrate that PC-AiR can provide improved population stratification correction over existing methods in genetic association studies with population structure and relatedness.
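
The two-step idea described above (PCA on an unrelated subset, then prediction for everyone else) can be sketched roughly as follows. This is not the PC-AiR implementation: the selection of the ancestry-representative subset is assumed to have been done already and is passed in as `unrelated_idx`, and the remaining individuals are placed by projecting their standardized genotypes onto the SNP loadings, which is a simplification chosen here for illustration. The function name and parameters are hypothetical.

```python
import numpy as np


def pca_on_subset_then_project(genotypes, unrelated_idx, n_components=2):
    """PCA on an unrelated reference subset, then projection of all individuals.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts.
    unrelated_idx: indices of the reference subset (assumed given).
    Returns an (n_individuals, n_components) array of scores.
    """
    G = np.asarray(genotypes, dtype=float)
    unrelated_idx = np.asarray(unrelated_idx)

    # Allele frequencies are estimated from the unrelated subset only.
    p = G[unrelated_idx].mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)  # drop SNPs that are monomorphic in the subset
    p = p[keep]

    # Standardize every individual with the subset-derived frequencies.
    Z = (G[:, keep] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

    # Principal axes (SNP loadings) come from the unrelated subset alone.
    _, _, vt = np.linalg.svd(Z[unrelated_idx], full_matrices=False)
    loadings = vt[:n_components].T

    # Related individuals never influence the axes; they are only projected.
    return Z @ loadings
```

Because relatives are excluded when the axes are estimated, family structure cannot be picked up as a leading component, which is the motivation the abstract gives for working from an unrelated subset.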

A Consistent Estimator of the Evolutionary Rate

Krzysztof Bartoszek, Serik Sagitov
(Submitted on 21 Aug 2014)

We consider a branching particle system where particles reproduce according to the pure birth Yule process with birth rate L, conditioned on the observed number of particles being equal to n. Particles are assumed to move independently on the real line according to Brownian motion with local variance s^2. In this paper we treat the n particles as a sample of related species. The spatial Brownian motion of a particle describes the development of a trait value of interest (e.g. log-body-size). We propose an unbiased estimator R_n^2 of the evolutionary rate r^2 = s^2/L. The estimator R_n^2 is proportional to the sample variance S_n^2 computed from the n trait values. We find an approximate formula for the standard error of R_n^2 based on a neat asymptotic relation for the variance of S_n^2.

Emergent speciation by multiple Dobzhansky-Muller incompatibilities

Tiago , Kevin E. Bassler, Ricardo B. R. Azevedo
doi: http://dx.doi.org/10.1101/008268

The Dobzhansky-Muller model posits that incompatibilities between alleles at different loci cause speciation. However, it is known that if the alleles involved in a Dobzhansky-Muller incompatibility (DMI) between two loci are neutral, the resulting reproductive isolation cannot be maintained in the presence of either mutation or gene flow. Here we propose that speciation can emerge through the collective effects of multiple neutral DMIs that cannot, individually, cause speciation, a mechanism we call emergent speciation. We investigate emergent speciation using a haploid neutral network model with recombination. We find that certain combinations of multiple neutral DMIs can lead to speciation. Complex DMIs and a high recombination rate between the DMI loci facilitate emergent speciation. These conditions are likely to occur in nature. We conclude that the interaction between DMIs may be a root cause of the origin of species.

Coordinated Evolution of Influenza A Surface Proteins

Alexey D. Neverov, Sergey Kryazhimskiy, Joshua B. Plotkin, Georgii A. Bazykin
doi: http://dx.doi.org/10.1101/008235

Surface proteins hemagglutinin (HA) and neuraminidase (NA) of the human influenza A virus evolve under selection pressure to escape the human adaptive immune response and antiviral drug treatments. In addition to these external selection pressures, some mutations in HA are known to affect the adaptive landscape of NA, and vice versa, because these two proteins are physiologically interlinked. However, the extent to which evolution of one protein affects the evolution of the other is unknown. Here we develop a novel phylogenetic method for detecting the signatures of such genetic interactions between mutations in different genes, that is, inter-gene epistasis. Using this method, we show that influenza surface proteins evolve in a coordinated way, with substitutions in HA affecting substitutions in NA and vice versa, at many sites. Of particular interest is our finding that the oseltamivir-resistance mutations in NA in subtype H1N1 were likely facilitated by prior mutations in HA. Our results illustrate that the adaptive landscape of a viral protein is remarkably sensitive to its genomic context and, more generally, imply that the evolution of any single protein must be understood within the context of the entire evolving genome.