Identifying the genetic basis of antigenic change in influenza A(H1N1)

William T. Harvey, Victoria Gregory, Donald J. Benton, James P. J. Hall, Rodney S. Daniels, Trevor Bedford, Daniel T. Haydon, Alan J. Hay, John W. McCauley, Richard Reeve
(Submitted on 16 Apr 2014)

Determining phenotype from genetic data is a fundamental challenge for virus research. Identification of emerging antigenic variants among circulating influenza viruses is critical to the vaccine virus selection process, with effectiveness maximized when vaccine constituents are antigenically matched to circulating viruses. Generally, antigenic similarity of viruses is assessed by the haemagglutination inhibition (HI) assay. We present models that define key antigenic determinants by identifying substitutions that significantly affect antigenic phenotype assessed using HI assay. Sequences of 506 haemagglutinin (HA) proteins from seasonal influenza A(H1N1) isolates and reference viruses, spanning over a decade, with complementary HI data and a crystallographic structure were analysed. We identified substitutions at fifteen surface-exposed positions as causing changes in antigenic phenotype of HA. At four positions the antigenic impact of substitutions was apparent at multiple points in the phylogeny, while eleven further sites were resolved by identifying branches containing antigenicity-changing events and determining the substitutions responsible by ancestral state reconstruction. Reverse genetics was used to demonstrate the causal effect on antigenicity of a subset of substitutions including one instance where multiple contemporaneous substitutions made a definitive identification impossible in silico. This technique quantifies the impact of specific amino acid substitutions allowing us to make predictions of antigenic distance, increasing the value of new genetic sequence data for monitoring antigenic drift and phenotypic evolution. It demonstrates the generality of an approach originally developed for foot-and-mouth disease virus that could be extended to other established and emerging influenza virus subtypes as well as other antigenically variable pathogens.

READemption – A tool for the computational analysis of deep-sequencing-based transcriptome data

Konrad Ulrich Förstner, Jörg Vogel, Cynthia Mira Sharma

Summary: RNA-Seq has become a potent and widely used method to qualitatively and quantitatively study transcriptomes. In order to draw biological conclusions based on RNA-Seq data, several steps, some of which are computationally intensive, have to be taken. Our READemption pipeline takes care of these individual tasks and integrates them into an easy-to-use tool with a command line interface. To leverage the full power of modern computers, most subcommands of READemption offer parallel data processing. While READemption was mainly developed for the analysis of bacterial primary transcriptomes, we have successfully applied it to analyze RNA-Seq reads from other sample types, including whole transcriptomes and RNA immunoprecipitated with proteins, not only from bacteria, but also from eukaryotes and archaea. Availability and Implementation: READemption is implemented in Python and is published under the ISC open source license. The tool and documentation are hosted at http://pythonhosted.org/READemption (DOI:10.6084/m9.figshare.977849).

Asymptotic expression for the fixation probability of a mutant in star graphs

Fabio A. C. C. Chalub
(Submitted on 15 Apr 2014)

We consider the Moran process on a graph called a “star” and obtain the asymptotic expression for the fixation probability of a single mutant when the size of the graph is large. The expression obtained corrects the previously known expression announced in reference [E Lieberman, C Hauert, and MA Nowak. Evolutionary dynamics on graphs. Nature, 433(7023):312-316, 2005] and further studied in [M. Broom and J. Rychtar. An analysis of the fixation probability of a mutant on special classes of non-directed graphs. Proc. R. Soc. A-Math. Phys. Eng. Sci., 464(2098):2609-2627, 2008]. We also show that the star graph is an accelerator of evolution if the graph is large enough.
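
For readers who want a feel for the quantity discussed in this abstract, the following is a minimal Monte Carlo sketch of a Moran process on a star graph. It is not the paper's asymptotic derivation. It assumes a birth-death update rule (the reproducing individual is chosen in proportion to fitness, and its offspring replaces a uniformly chosen neighbour) and a single mutant placed on a uniformly chosen vertex. The function name and parameters are hypothetical.

```python
import random


def simulate_star_fixation(n_leaves, r, trials, rng):
    """Estimate the fixation probability of one mutant of relative fitness r
    on a star graph with one centre and n_leaves leaves (assumed birth-death
    updating; the initial mutant sits on a uniformly chosen vertex)."""
    total = n_leaves + 1
    fixed = 0
    for _ in range(trials):
        # State: whether the centre is a mutant, and how many leaves are mutants.
        if rng.randrange(total) == 0:
            centre, k = True, 0
        else:
            centre, k = False, 1
        while True:
            mutants = k + (1 if centre else 0)
            if mutants == 0:
                break  # mutant lineage went extinct
            if mutants == total:
                fixed += 1  # mutant lineage took over the graph
                break
            # Choose the reproducing individual in proportion to fitness.
            w_centre = r if centre else 1.0
            w_mutant_leaves = r * k
            w_resident_leaves = float(n_leaves - k)
            x = rng.random() * (w_centre + w_mutant_leaves + w_resident_leaves)
            if x < w_centre:
                # The centre reproduces onto a uniformly chosen leaf.
                leaf_is_mutant = rng.randrange(n_leaves) < k
                if centre and not leaf_is_mutant:
                    k += 1
                elif not centre and leaf_is_mutant:
                    k -= 1
            elif x < w_centre + w_mutant_leaves:
                centre = True  # a mutant leaf replaces the centre
            else:
                centre = False  # a resident leaf replaces the centre
    return fixed / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for fitness in (1.0, 1.5, 2.0):
        estimate = simulate_star_fixation(10, fitness, 5000, rng)
        print(f"r = {fitness}: estimated fixation probability {estimate:.3f}")
```

In the neutral case (r = 1) the estimate should sit near 1/N, where N is the number of vertices, which gives a quick sanity check before comparing advantageous mutants against a well-mixed population.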

Historical contingency and entrenchment in protein evolution under purifying selection

Premal Shah, Joshua B. Plotkin
(Submitted on 15 Apr 2014)

The fitness contribution of an allele at one genetic site may depend on the states of other sites, a phenomenon known as epistasis. Epistasis can profoundly influence the process of evolution in populations under selection, and shape the course of protein evolution across divergent species. Whereas epistasis among adaptive substitutions has been the subject of extensive study, relatively little is known about epistasis under purifying selection. Here we use mechanistic models of thermodynamic stability in a ligand-binding protein to explore computationally the structure of epistatic interactions among substitutions that fix in protein sequences under purifying selection. We find that the selection coefficients of mutations that are nearly neutral when they fix are highly conditional on the presence of preceding mutations. In addition, substitutions which are initially neutral become increasingly entrenched over time due to antagonistic epistasis with subsequent substitutions. Our evolutionary model includes insertions and deletions, as well as point mutations, which allows us to quantify epistasis between these classes of mutations, and also to study the evolution of protein length. We find that protein length remains largely constant over time, because indels are more deleterious than point mutations. Our results imply that, even under purifying selection, protein sequence evolution is highly contingent on history and it cannot be predicted by the phenotypic effects of mutations introduced into the wildtype sequence alone.

Bayesian Neural Networks for Genetic Association Studies of Complex Disease

Andrew L. Beam, Alison Motsinger-Reif, Jon Doyle
(Submitted on 15 Apr 2014)

Discovering causal genetic variants from large genetic association studies poses many difficult challenges. Assessing which genetic markers are involved in determining trait status is a computationally demanding task, especially in the presence of gene-gene interactions. A non-parametric Bayesian approach in the form of a Bayesian neural network is proposed for use in analyzing genetic association studies. Demonstrations on synthetic and real data reveal that such networks are able to efficiently and accurately determine which variants are involved in determining case-control status. Using graphics processing units (GPUs), the time needed to build these models is decreased by several orders of magnitude. In comparison with commonly used approaches for detecting interactions, Bayesian neural networks perform very well across a broad spectrum of possible genetic relationships. The proposed framework is shown to be powerful at detecting causal SNPs while having the computational efficiency needed to handle large datasets.

Modeling DNA methylation dynamics with approaches from phylogenetics

John A. Capra, Dennis Kostka
(Submitted on 11 Apr 2014)

Methylation of CpG dinucleotides is a prevalent epigenetic modification that is required for proper development in vertebrates, and changes in CpG methylation are essential to cellular differentiation. Genome-wide DNA methylation assays have become increasingly common, and recently distinct stages across differentiating cellular lineages have been assayed. However, current methods for modeling methylation dynamics do not account for the dependency structure between precursor and dependent cell types. We developed a continuous-time Markov chain approach, based on the observation that changes in methylation state over tissue differentiation can be modeled similarly to DNA nucleotide changes over evolutionary time. This model explicitly takes precursor to descendant relationships into account and enables inference of CpG methylation dynamics. To illustrate our method, we analyzed a high-resolution methylation map of the differentiation of mouse stem cells into several blood cell types. Our model can successfully infer unobserved CpG methylation states from observations at the same sites in related cell types (90% correct), and this approach more accurately reconstructs missing data than imputation based on neighboring CpGs (84% correct). Additionally, the single CpG resolution of our methylation dynamics estimates enabled us to show that DNA sequence context of CpG sites is informative about methylation dynamics across tissue differentiation. Finally, we identified genomic regions with clusters of highly dynamic CpGs and present a likely functional example. Our work establishes a framework for inference and modeling that is well-suited to DNA methylation data, and our success suggests that other methods for analyzing DNA nucleotide substitutions will also translate to the modeling of epigenetic phenomena.
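
To make the continuous-time Markov chain idea concrete, here is a minimal sketch of the simplest case one could imagine: a single CpG site with two states (unmethylated, methylated) and constant gain and loss rates along one precursor-to-descendant branch. This two-state parameterization, the rate names, and the function name are assumptions for illustration only; the abstract does not specify the authors' state space or rate model.

```python
import math


def two_state_transition(gain, loss, t):
    """Transition probabilities P[i][j] over branch length t for a two-state
    continuous-time Markov chain (state 0 = unmethylated, 1 = methylated).
    `gain` is the 0 -> 1 rate and `loss` is the 1 -> 0 rate (both assumed
    constant along the branch)."""
    total = gain + loss
    decay = math.exp(-total * t)
    pi_methylated = gain / total    # stationary probability of state 1
    pi_unmethylated = loss / total  # stationary probability of state 0
    return [
        [pi_unmethylated + pi_methylated * decay, pi_methylated * (1.0 - decay)],
        [pi_unmethylated * (1.0 - decay), pi_methylated + pi_unmethylated * decay],
    ]


if __name__ == "__main__":
    # Probability that a site methylated in the precursor is still methylated
    # in the descendant cell type, for a few hypothetical branch lengths.
    for branch_length in (0.1, 1.0, 10.0):
        p = two_state_transition(gain=0.2, loss=0.5, t=branch_length)
        print(branch_length, p[1][1])
```

Chaining such matrices along a differentiation tree is what lets a phylogenetics-style method infer an unobserved state in one cell type from observations in related cell types.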

Flexible methods for estimating genetic distances from nucleotide data

Simon Joly, David J Bryant, Peter J Lockhart

With the increasing use of massively parallel sequencing approaches in evolutionary biology, the need for fast and accurate methods suitable to investigate genetic structure and evolutionary history is more important than ever. We propose new distance measures for estimating genetic distances between individuals when allelic variation, gene dosage and recombination could compromise standard approaches. We present four distance measures based on single nucleotide polymorphisms (SNPs) and evaluate them against previously published measures using coalescent-based simulations. Simulations were used to test (i) whether the measures give unbiased and accurate distance estimates, (ii) whether they can accurately identify the genomic mixture of hybrid individuals and (iii) whether they give precise (low variance) estimates. The results showed that the SNP-based genpofad distance we propose appears to work well in the widest range of circumstances. It was the most accurate method for estimating genetic distances and is also relatively good at estimating the genomic mixture of hybrid individuals. Our simulations provide benchmarks to compare the performance of different distance measures in specific situations.

Genetic Influences on Brain Gene Expression in Rats Selected for Tameness and Aggression

Henrike O. Heyne, Susann Lautenschläger, Ronald Nelson, François Besnier, Maxime Rotival, Alexander Cagan, Rimma Kozhemyakina, Irina Z. Plyusnina, Lyudmila Trut, Örjan Carlborg, Enrico Petretto, Leonid Kruglyak, Svante Pääbo, Torsten Schöneberg, Frank W. Albert
(Submitted on 14 Apr 2014)

Inter-individual differences in many behaviors are partly due to genetic differences, but the identification of the genes and variants that influence behavior remains challenging. Here, we studied an F2 intercross of two outbred lines of rats selected for tame and aggressive behavior towards humans for more than 64 generations. By using a mapping approach that is able to identify genetic loci segregating within the lines, we identified four times more loci influencing tameness and aggression than by an approach that assumes fixation of causative alleles, suggesting that many causative loci were not driven to fixation by the selection. We used RNA sequencing in 150 F2 animals to identify hundreds of loci that influence brain gene expression. Several of these loci colocalize with tameness loci and may reflect the same genetic variants. Through analyses of correlations between allele effects on behavior and gene expression, differential expression between the tame and aggressive rat selection lines, and correlations between gene expression and tameness in F2 animals, we identify the genes Gltscr2, Lgi4, Zfp40 and Slc17a7 as candidate contributors to the strikingly different behavior of the tame and aggressive animals.

Comparing Evolutionary Rates Using An Exact Test for 2×2 Tables with Continuous Cell Entries

A. Morgan Thompson, M. Cyrus Maher, Lawrence H. Uricchio, Zachary A. Szpiech, Ryan D. Hernandez
(Submitted on 11 Apr 2014)

Assessing the statistical significance of an observed 2×2 contingency table can easily be accomplished using Fisher’s exact test (FET). However, if the cell entries are continuous or represent values inferred from a continuous parametric model, then FET cannot be applied. Such tables arise frequently in areas of biostatistical research including population genetics and evolutionary genomics, where cell entries are estimated by computational methods and result in cell entries drawn from the non-negative real line R+. Simply rounding cell entries to conform to the assumptions of FET is an ill-suited approach that we show creates problems related to both type-I and type-II errors. Pearson’s chi^2 test for independence, while technically applicable, is not often effective for these tables, as the test has several limiting assumptions that make application of this method inadvisable in many common instances (particularly with small cell entries). Here we develop a novel method for tables with continuous entries, which we term continuous Fisher’s Exact Test (cFET). Through simulations, we show that cFET has a close-to-uniform distribution of p-values under the null hypothesis of independence, and more power when applied to tables where the null hypothesis is false (compared to FET applied to rounded cell entries). We apply cFET to an example from comparative genomics to confirm an overall increased evolutionary rate among primates compared to rodents, and identify several genes that show particularly elevated evolutionary rates in primates. Some of these genes exhibit signatures of continued positive selection along the human lineage since our divergence with chimpanzee 5-7 million years ago, as well as ongoing selection in modern humans.
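
The rounding problem mentioned in this abstract can be illustrated with the standard Fisher's exact test on integer tables. The sketch below is not cFET, whose construction the abstract does not describe. It is an ordinary two-sided FET written with the standard library, applied to two hypothetical continuous tables that differ only slightly but round to different integer tables. The function name and the example tables are assumptions for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the integer table
    [[a, b], [c, d]], summing the probabilities of all tables with the same
    margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalized hypergeometric weights, kept as exact integers.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / sum(weights.values())


if __name__ == "__main__":
    # Two hypothetical continuous tables whose entries differ by only 0.2.
    table_1 = [[3.4, 0.6], [0.6, 3.4]]
    table_2 = [[3.6, 0.4], [0.4, 3.6]]
    for table in (table_1, table_2):
        a, b, c, d = (round(v) for row in table for v in row)
        print((a, b, c, d), round(fisher_exact_two_sided(a, b, c, d), 4))
    # prints (3, 1, 1, 3) 0.4857
    # prints (4, 0, 0, 4) 0.0286
```

A shift of 0.2 in each cell moves the p-value from about 0.49 to about 0.03, which gives some intuition for why rounding continuous entries before applying FET can distort both type-I and type-II error behaviour.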

Selection signatures in worldwide Sheep populations

Maria-Ines Fariello, Bertrand Servin, Gwenola Tosser-Klopp, Rachelle Rupp, Carole Moreno, International Sheep Genomics Consortium, Magali San Cristobal, Simon Boitard

The diversity of populations in domestic species offers great opportunities to study genome response to selection. The recently published Sheep HapMap dataset is a great example of characterization of the worldwide genetic diversity in sheep. In this study, we re-analyzed the Sheep HapMap dataset to identify selection signatures in worldwide sheep populations. Compared to previous analyses, we made use of statistical methods that (i) take account of the hierarchical structure of sheep populations, (ii) make use of linkage disequilibrium information and (iii) focus specifically on either recent or older selection signatures. We show that this allows pinpointing several new selection signatures in the sheep genome and distinguishing those related to modern breeding objectives from those related to earlier post-domestication constraints. The newly identified regions, together with the ones previously identified, reveal the extensive genome response to selection on morphology, color and adaptation to new environments.