High Genetic Diversity and Adaptive Potential of Two Simian Hemorrhagic Fever Viruses in a Wild Primate Population

Adam L. Bailey, Michael Lauck, Andrea Weiler, Samuel D. Sibley, Jorge M. Dinis, Zachary Bergman, Chase W. Nelson, Michael Correll, Michael Gleicher, David Hyeroba, Alex Tumukunde, Geoffrey Weny, Colin Chapman, Jens Kuhn, Austin Hughes, Thomas C. Friedrich, Tony L. Goldberg, David H. O’Connor

Key biological properties such as high genetic diversity and high evolutionary rate enhance the potential of certain RNA viruses to adapt and emerge. Identifying viruses with these properties in their natural hosts could dramatically improve disease forecasting and surveillance. Recently, we discovered two novel members of the viral family Arteriviridae: simian hemorrhagic fever virus (SHFV)-krc1 and SHFV-krc2, infecting a single wild red colobus (Procolobus rufomitratus tephrosceles) in Kibale National Park, Uganda. Nearly nothing is known about the biological properties of SHFVs in nature, although the SHFV type strain, SHFV-LVR, has caused devastating outbreaks of viral hemorrhagic fever in captive macaques. Here we detected SHFV-krc1 and SHFV-krc2 in 40% and 47% of 60 wild red colobus tested, respectively. We found viral loads in excess of 1×10^6-1×10^7 RNA copies per milliliter of blood plasma for each of these viruses. SHFV-krc1 and SHFV-krc2 also showed high genetic diversity at both the inter- and intra-host levels. Analyses of synonymous and non-synonymous nucleotide diversity across viral genomes revealed patterns suggestive of positive selection in SHFV open reading frames (ORF) 5 (SHFV-krc2 only) and 7 (SHFV-krc1 and SHFV-krc2). Thus, these viruses share several important properties with some of the most rapidly evolving, emergent RNA viruses.
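The diversity measure at the core of this abstract can be illustrated with a minimal sketch of nucleotide diversity (π), the mean number of pairwise differences per site among aligned sequences. This is not the authors' pipeline: their analysis further separates synonymous from non-synonymous sites, which this sketch does not attempt, and the function name and toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site among aligned sequences.

    Illustrative only: every site is treated alike, so this does not
    distinguish synonymous from non-synonymous changes.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions for every unordered pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the study.
    print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

Comparing π computed over non-synonymous sites with π computed over synonymous sites is what allows an excess of amino-acid-changing diversity to be read as a possible signal of positive selection.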

Variational Inference of Population Structure in Large SNP Datasets

Anil Raj, Matthew Stephens, Jonathan K Pritchard

Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a dataset and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data, and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias towards detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.

A Robust Model-free Approach for Rare Variants Association Studies Incorporating Gene-Gene and Gene-Environmental Interactions

Ruixue Fan, Shaw-Hwa Lo
(Submitted on 2 Dec 2013)

A growing body of evidence suggests that rare variants, with much lower minor allele frequencies, play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare variants association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and the aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants but do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method that is designed specifically for detecting both marginal effects and effects due to gene-gene (G-G) and gene-environmental (G-E) interactions for rare variants association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G-G or G-E interactions. Second, it does not sacrifice the marginal detection power; when rare variants only have marginal effects, it is comparable with the most competitive method in the current literature. Third, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can readily tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G-G and G-E interactions.

Ploidy and the Predictability of Evolution in Fisher’s Geometric Model

Sandeep Venkataram, Diamantis Sellis, Dmitri A Petrov

Predicting adaptive evolutionary trajectories is a primary goal of evolutionary biology. One can differentiate between forward and backward predictability, where forward predictability measures the likelihood of the same adaptive trajectory occurring in independent evolutions and backward predictability measures the likelihood of a particular adaptive path given the knowledge of starting and final states. Recent studies have attempted to measure both forward and backward predictability using experimental evolution in asexual haploid microorganisms. Similar experiments in diploid organisms have not been conducted. Here we simulate adaptive walks using Fisher’s Geometric Model in haploids and diploids and find that adaptive walks in diploids are less forward- and more backward-predictable than adaptive walks in haploids. We argue that the difference is due to the ability of diploids in our simulations to generate transiently stable polymorphisms and to allow adaptive mutations of larger phenotypic effect. As stable polymorphisms can be generated in both haploid and diploid natural populations through a number of mechanisms, we argue that inferences based on experiments in which adaptive walks proceed through succession of monomorphic states might miss many of the key features of adaptation.
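For readers unfamiliar with Fisher's Geometric Model, the haploid case can be sketched as a simple adaptive walk. The sketch below is not the authors' simulation code. It assumes a Gaussian fitness function with the optimum at the origin, isotropic Gaussian mutations, and a strong-selection, weak-mutation regime in which a beneficial mutation fixes with probability of roughly 2s. All names and parameter values are hypothetical, and the diploid case with transient polymorphism is not modelled.

```python
import math
import random


def fitness(z):
    """Gaussian fitness with the optimum at the origin (an assumed form)."""
    return math.exp(-sum(x * x for x in z))


def adaptive_walk(n_traits=2, steps=1000, mut_sd=0.1, seed=0):
    """Haploid adaptive walk; returns the fitness after each fixed mutation."""
    rng = random.Random(seed)
    # Start one unit away from the optimum along the first trait axis.
    z = [1.0] + [0.0] * (n_traits - 1)
    trajectory = [fitness(z)]
    for _ in range(steps):
        # Propose a mutation: an isotropic Gaussian step in phenotype space.
        candidate = [x + rng.gauss(0.0, mut_sd) for x in z]
        s = fitness(candidate) / fitness(z) - 1.0
        # Only beneficial mutations fix, with probability about 2s (capped at 1).
        if s > 0 and rng.random() < min(1.0, 2.0 * s):
            z = candidate
            trajectory.append(fitness(z))
    return trajectory


if __name__ == "__main__":
    walk = adaptive_walk()
    print(len(walk) - 1, "mutations fixed; final fitness", walk[-1])
```

Running the walk with different seeds and comparing the resulting sequences of fixed mutations is one simple way to get a feel for forward predictability in this setting.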

The effect of linkage on establishment and survival of locally beneficial mutations

Simon Aeschbacher, Reinhard Buerger
(Submitted on 25 Nov 2013)

When organisms adapt to spatially heterogeneous environments, selection may drive divergence at multiple genes. If populations under divergent selection also exchange migrants, we expect genetic differentiation to be high at selected loci, relative to the baseline caused by migration and genetic drift. Indeed, empirical studies have found peaks of putatively adaptive differentiation. These are highly variable in length, some of them extending over several hundreds of thousands of base pairs. How can such ‘islands of differentiation’ be explained? Physical linkage produces elevated levels of differentiation at loci close to genes under selection. However, whether this is enough to account for the observed patterns of divergence is not well understood. Here, we investigate the fate of a locally beneficial mutation that arises in linkage to an existing migration-selection polymorphism and derive two important quantities: the probability that the mutation becomes established, and the expected time to its extinction. We find that intermediate levels of recombination are sometimes favourable, and that physical linkage can lead to strongly elevated invasion probabilities and extinction times. We provide a rule of thumb for when this is the case. Moreover, we quantify the long-term effect of polygenic local adaptation on linked neutral variation.
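As background for the notion of an establishment probability, the sketch below computes it for the textbook single-locus case, which is a deliberately simpler stand-in for the paper's two-locus model with migration and linkage. It assumes a branching process with Poisson-distributed offspring of mean 1 + s, so the extinction probability q is the smallest root of q = exp((1 + s)(q - 1)). The function name and parameters are hypothetical.

```python
import math


def establishment_probability(s, tol=1e-13, max_iter=100000):
    """Survival probability of a branching process with Poisson(1 + s) offspring.

    Single-locus, no migration, no linkage: an assumed simplification, not
    the model analysed in the paper.
    """
    if s <= 0:
        # A neutral or deleterious lineage goes extinct with certainty here.
        return 0.0
    q = 0.0  # extinction probability; iterating from 0 converges to the smallest root
    for _ in range(max_iter):
        q_next = math.exp((1.0 + s) * (q - 1.0))
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q


if __name__ == "__main__":
    for s in (0.01, 0.05, 0.1):
        print(s, establishment_probability(s))
```

For small s the result stays just below Haldane's classical approximation of 2s. The paper asks how this kind of baseline changes when the new mutation arises linked to a locus already under migration-selection balance.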

Interspecific Introgressive Origin of Genomic Diversity in the House Mouse

Kevin J. Liu, Ying Song, Michael H. Kohn, Luay Nakhleh
(Submitted on 22 Nov 2013)

We report on a genome-wide scan for introgression in a eukaryote. The scan identified kilobase-to-megabase-long regions of introgressive origin involving Mus spretus in six Mus musculus domesticus chromosomes, based on genomes sampled from and near the European range of sympatry. Our analyses point to the introgression of both adaptive driver and linked passenger loci. Introgression could transfer traits, such as the discovered warfarin resistance in European M. m. domesticus, and could create new traits, as we infer using a functional network analysis. Our study sheds new light on the extent of adaptive introgression and calls for new analyses of eukaryotic genomes that explicitly account for the possibility of introgression.

Generation of high-resolution a priori Y-chromosome phylogenies using “next-generation” sequencing data

Gregory R Magoon, Raymond H Banks, Christian Rottensteiner, Bonnie E Schrack, Vincent O Tilroe, Andrew J Grierson

An approach for generating high-resolution a priori maximum parsimony Y-chromosome (“chrY”) phylogenies based on SNP and small INDEL variant data from massively-parallel short-read (“next-generation”) sequencing data is described; the tree-generation methodology produces annotations localizing mutations to individual branches of the tree, along with indications of mutation placement uncertainty in cases for which “no-calls” (through lack of mapped reads or otherwise) at a particular site preclude a precise placement of the mutation. The approach leverages careful variant site filtering and a novel iterative reweighting procedure to generate high-accuracy trees while considering variants in regions of chrY that had previously been excluded from analyses based on short-read sequencing data. It is argued that the proposed approach is also superior to previous region-based filtering approaches in that it adapts to the quality of the underlying data and will automatically allow the scope of sites considered to expand as the underlying data quality improves (e.g. through longer read lengths). Key related issues, including calling of genotypes for the hemizygous chrY, reliability of variant results, read mismappings and “heterozygous” genotype calls, and the mutational stability of different variants are discussed and taken into account. The methodology is demonstrated through application to a dataset consisting of 1292 male samples from diverse populations and haplogroups, with the majority coming from low-coverage sequencing by the 1000 Genomes Project. Application of the tree-generation approach to these data produces a tree involving over 120,000 chrY variant sites (about 45,000 sites if “singletons” are excluded). The utility of this approach in refining the Y-chromosome phylogenetic tree is demonstrated by examining results for several haplogroups.
The results indicate a number of new branches on the Y-chromosome phylogenetic tree, many of them subdividing known branches, but also including some that inform the presence of additional levels along the “trunk” of the tree. Finally, opportunities for extensions of this phylogenetic analysis approach to other types of genetic data are examined.

Computational inference beyond Kingman’s coalescent

Jere Koskela, Paul Jenkins, Dario Spano
(Submitted on 22 Nov 2013)

Full likelihood inference under Kingman’s coalescent is a computationally challenging problem to which importance sampling (IS) and the product of approximate conditionals (PAC) method have been applied successfully. Both methods can be expressed in terms of families of intractable conditional sampling distributions (CSDs), and rely on principled approximations for accurate inference. Recently, more general Λ- and Ξ-coalescents have been observed to provide better modelling fits to some genetic data sets. We derive families of approximate CSDs for finite sites Λ- and Ξ-coalescents, and use them to obtain “approximately optimal” IS and PAC algorithms for Λ-coalescents, yielding substantial gains in efficiency over existing methods.

Comment on “TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions” by Kim et al.

Alexander Dobin, Thomas R Gingeras

In the recent paper by Kim et al. (Genome Biology, 2013. 14(4): p. R36) the accuracy of TopHat2 was compared to other RNA-seq aligners. In this comment we re-examine the most important analyses from this paper and identify several deficiencies that significantly diminished the performance of some of the aligners, including incorrect choice of mapping parameters, unfair comparison metrics, and unrealistic simulated data. Using STAR (Dobin et al., Bioinformatics, 2013. 29(1): p. 15-21) as an exemplar, we demonstrate that correcting these deficiencies makes its accuracy equal to or better than that of TopHat2. Furthermore, this exercise highlighted some serious issues with the TopHat2 algorithms, such as poor recall of alignments with a moderate (>3) number of mismatches, low sensitivity and high false discovery rate for splice junction detection, loss of precision for the realignment algorithm, and a large number of false chimeric alignments.

Natural Allelic Variations of Xenobiotic Enzymes Pleiotropically Affect Sexual Dimorphism in Oryzias latipes

Takafumi Katsumura, Shoji Oda, Shigeki Nakagome, Tsunehiko Hanihara, Hiroshi Kataoka, Hiroshi Mitani, Shoji Kawamura, Hiroki Oota

Sexual dimorphisms, which are phenotypic differences between males and females, are driven by sexual selection [1, 2]. Interestingly, sexually selected traits show geographic variations within species despite strong directional selective pressures [3, 4]. However, genetic factors that regulate varied sexual differences remain unknown. In this study, we show that polymorphisms in cytochrome P450 (CYP) 1B1, which encodes a xenobiotic-metabolising enzyme, are associated with local differences of sexual dimorphisms in the anal fin morphology of medaka fish (Oryzias latipes). High and low activity CYP1B1 alleles increased and decreased differences in anal fin sizes, respectively. Behavioural and phylogenetic analyses suggest maintenance of the high activity allele by sexual selection, whereas the low activity allele may have evolved by positive selection due to by-product effects of CYP1B1. The present data can elucidate evolutionary mechanisms behind genetic variations in sexual dimorphism and indicate pleiotropic effects of xenobiotic enzymes.