VSEAMS: A pipeline for variant set enrichment analysis using summary GWAS data identifies IKZF3, BATF and ESRRA as key transcription factors in type 1 diabetes

Oliver S Burren, Hui Guo, Chris Wallace
(Submitted on 17 Apr 2014)

Motivation: Genome-wide association studies (GWAS) have identified many loci implicated in disease susceptibility. Integration of GWAS summary statistics (p values) and functional genomic datasets should help to elucidate mechanisms. Results: We describe an extension of a previously described non-parametric method, which tests whether GWAS signals are enriched in functionally defined loci, to the situation where only GWAS p values are available. The approach is implemented in VSEAMS, a freely available software pipeline. We use VSEAMS to integrate functional gene sets defined via transcription factor knockdown experiments with GWAS results for type 1 diabetes (T1D) and find variant set enrichment in gene sets associated with IKZF3, BATF and ESRRA. IKZF3 lies in a known T1D susceptibility region, whilst BATF and ESRRA overlap other immune disease susceptibility regions, validating our approach and suggesting novel avenues of research for type 1 diabetes. Availability and implementation: VSEAMS is available for download.
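The core question, whether SNPs in a functionally defined set carry smaller GWAS p values than comparable control SNPs, can be illustrated with a simple permutation test. This is a minimal sketch and not the VSEAMS method. It treats SNPs as exchangeable and ignores linkage disequilibrium, which a real analysis has to account for. The function name permutation_enrichment and the mean -log10 p statistic are illustrative assumptions.

```python
import math
import random


def neg_log10(p):
    """Transform a p value so that stronger signals give larger scores."""
    return -math.log10(p)


def permutation_enrichment(test_p, control_p, n_perm=999, seed=0):
    """Hypothetical permutation test for enrichment of GWAS signal in a SNP set.

    Statistic: mean(-log10 p) in the test set minus the same mean in the
    control set. SNPs are treated as independent (LD is ignored here).
    Returns a one-sided empirical p value with the usual +1 correction.
    """
    scores = [neg_log10(p) for p in list(test_p) + list(control_p)]
    n_test = len(test_p)

    def statistic(values):
        test, control = values[:n_test], values[n_test:]
        return sum(test) / len(test) - sum(control) / len(control)

    observed = statistic(scores)
    rng = random.Random(seed)
    shuffled = scores[:]
    exceed = 0
    for _ in range(n_perm):
        # Randomly reassign which SNPs count as "test" and recompute.
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

For example, ten test SNPs with p = 1e-5 each against 200 control p values drawn uniformly from (0, 1] should give a result at or near the smallest attainable value, 1/(n_perm + 1).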

Gradual divergence and diversification of mammalian duplicate gene functions

Raquel Assis, Doris Bachtrog

Gene duplication provides raw material for the evolution of functional innovation. We recently developed a phylogenetic method to classify the evolutionary processes underlying the retention and functional evolution of duplicate genes by quantifying divergence of their gene expression profiles. Here, we apply our method to pairs of duplicate genes in eight mammalian genomes, using data from 11 distinct tissues to construct spatial gene expression profiles. We find that young mammalian duplicates are often functionally conserved, and that functional divergence gradually increases with evolutionary distance between species. Examination of expression patterns in genes with conserved and new functions supports the "out-of-testes" hypothesis, in which new genes arise with testis-specific functions and acquire functions in other tissues over time. While new functions tend to be tissue-specific, there is no bias toward expression in any particular tissue. Thus, duplicate genes acquire a diversity of functions outside of the testes, possibly contributing to the origin of a multitude of complex phenotypes during mammalian evolution.
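Divergence between duplicate copies is quantified here from spatial (per-tissue) expression profiles. One common way to do this, assumed for this sketch rather than taken from the paper, is the Euclidean distance between relative expression profiles, where each tissue's value is divided by the gene's total expression. The tau index is included as an assumed measure of tissue specificity. All function names are hypothetical, and the phylogenetic classification itself, which also compares against ancestral profiles, is not shown.

```python
import math


def relative_profile(expression):
    """Scale per-tissue expression levels so that they sum to 1."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("profile must have positive total expression")
    return [x / total for x in expression]


def profile_distance(a, b):
    """Euclidean distance between two relative expression profiles."""
    pa, pb = relative_profile(a), relative_profile(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


def tissue_specificity_tau(expression):
    """Tau index: 0 for uniform expression, 1 for expression in a single tissue."""
    peak = max(expression)
    if peak <= 0:
        raise ValueError("profile must have positive expression")
    n = len(expression)
    return sum(1 - x / peak for x in expression) / (n - 1)
```

As a toy case with 11 tissues, a broadly expressed parent copy ([5.0] * 11) and a child copy expressed in one tissue only ([20.0] + [0.0] * 10) are far apart in profile space, and the child has tau = 1.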

Identifying the genetic basis of antigenic change in influenza A(H1N1)

William T. Harvey, Victoria Gregory, Donald J. Benton, James P. J. Hall, Rodney S. Daniels, Trevor Bedford, Daniel T. Haydon, Alan J. Hay, John W. McCauley, Richard Reeve
(Submitted on 16 Apr 2014)

Determining phenotype from genetic data is a fundamental challenge for virus research. Identification of emerging antigenic variants among circulating influenza viruses is critical to the vaccine virus selection process, with effectiveness maximized when vaccine constituents are antigenically matched to circulating viruses. Generally, antigenic similarity of viruses is assessed by the haemagglutination inhibition (HI) assay. We present models that define key antigenic determinants by identifying substitutions that significantly affect antigenic phenotype as assessed by HI assay. Sequences of 506 haemagglutinin (HA) proteins from seasonal influenza A(H1N1) isolates and reference viruses, spanning over a decade, were analysed together with complementary HI data and a crystallographic structure. We identified substitutions at fifteen surface-exposed positions as causing changes in the antigenic phenotype of HA. At four positions the antigenic impact of substitutions was apparent at multiple points in the phylogeny, while eleven further sites were resolved by identifying branches containing antigenicity-changing events and determining the substitutions responsible by ancestral state reconstruction. Reverse genetics was used to demonstrate the causal effect on antigenicity of a subset of substitutions, including one instance where multiple contemporaneous substitutions made a definitive identification impossible in silico. This technique quantifies the impact of specific amino acid substitutions, allowing us to make predictions of antigenic distance and increasing the value of new genetic sequence data for monitoring antigenic drift and phenotypic evolution. It demonstrates the generality of an approach originally developed for foot-and-mouth disease virus that could be extended to other established and emerging influenza virus subtypes as well as other antigenically variable pathogens.
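HI-based antigenic distances are conventionally expressed as log2 reductions in titre relative to the homologous titre. A model that attributes antigenic change to individual substitutions can then be sketched as an additive sum of per-substitution effects. This is a simplified illustration, not the models fitted in the study. The site labels and effect sizes in EFFECTS are hypothetical placeholders.

```python
import math


def log2_titre_drop(homologous_titre, heterologous_titre):
    """Antigenic distance in log2 units from an HI titre pair.

    For example, a fourfold drop (1280 -> 320) corresponds to 2 log2 units.
    """
    return math.log2(homologous_titre) - math.log2(heterologous_titre)


def predicted_distance(substitutions, effects):
    """Additive prediction: sum the effects of substitutions separating two HAs.

    `effects` maps a substitution label to its contribution in log2 HI units.
    Substitutions without an estimate are assumed to contribute nothing.
    """
    return sum(effects.get(sub, 0.0) for sub in substitutions)


# Hypothetical effect sizes for illustration only; NOT estimates from the study.
EFFECTS = {"site_A": 1.5, "site_B": 0.75, "site_C": 0.25}
```

With these made-up values, two viruses differing at site_A and site_C would be predicted to differ by 1.75 log2 units. Fitting the effects to observed titre drops would require a regression step that this sketch leaves out.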

READemption – A tool for the computational analysis of deep-sequencing-based transcriptome data

Konrad Ulrich Förstner, Jörg Vogel, Cynthia Mira Sharma

Summary: RNA-Seq has become a potent and widely used method to qualitatively and quantitatively study transcriptomes. In order to draw biological conclusions from RNA-Seq data, several steps, some of which are computationally intensive, have to be taken. Our READemption pipeline takes care of these individual tasks and integrates them into an easy-to-use tool with a command line interface. To leverage the full power of modern computers, most subcommands of READemption offer parallel data processing. While READemption was mainly developed for the analysis of bacterial primary transcriptomes, we have successfully applied it to analyze RNA-Seq reads from other sample types, including whole transcriptomes and RNA immunoprecipitated with proteins, not only from bacteria but also from eukaryotes and archaea. Availability and Implementation: READemption is implemented in Python and is published under the ISC open source license. The tool and documentation are hosted at http://pythonhosted.org/READemption (DOI:10.6084/m9.figshare.977849).

Asymptotic expression for the fixation probability of a mutant in star graphs

Fabio A. C. C. Chalub
(Submitted on 15 Apr 2014)

We consider the Moran process on a graph called the “star” and obtain the asymptotic expression for the fixation probability of a single mutant when the size of the graph is large. The expression obtained corrects the previously known expression announced in [E Lieberman, C Hauert, and MA Nowak. Evolutionary dynamics on graphs. Nature, 433(7023):312-316, 2005] and further studied in [M. Broom and J. Rychtar. An analysis of the fixation probability of a mutant on special classes of non-directed graphs. Proc. R. Soc. A-Math. Phys. Eng. Sci., 464(2098):2609-2627, 2008]. We also show that the star graph is an accelerator of evolution if the graph is large enough.
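A Monte Carlo estimate makes the quantity being approximated concrete. The sketch simulates a birth-death Moran process on a star with one centre and n_leaves leaves. An individual reproduces with probability proportional to fitness (mutant r, resident 1), and its offspring replaces a uniformly chosen neighbour. The first mutant is placed on a uniformly chosen vertex. These update and initialisation rules are assumptions for the sketch. The simulation only estimates finite-size fixation probabilities and does not derive the paper's asymptotic expression. moran_well_mixed is the standard well-mixed result, included for comparison. Both function names are hypothetical.

```python
import random


def star_fixation_probability(n_leaves, r, trials=10000, seed=0):
    """Monte Carlo estimate of a single mutant's fixation probability on a star."""
    rng = random.Random(seed)
    n = n_leaves + 1
    fixed = 0
    for _ in range(trials):
        centre = rng.randrange(n) == 0  # True if the centre holds the mutant
        leaves = 0 if centre else 1     # number of mutant leaves
        while 0 < centre + leaves < n:
            mutants = centre + leaves
            total = r * mutants + (n - mutants)
            centre_weight = r if centre else 1.0
            u = rng.random() * total
            if u < centre_weight:
                # The centre reproduces; its offspring replaces a random leaf.
                target_is_mutant = rng.randrange(n_leaves) < leaves
                if centre and not target_is_mutant:
                    leaves += 1
                elif not centre and target_is_mutant:
                    leaves -= 1
            else:
                # A leaf reproduces; its only neighbour is the centre.
                u -= centre_weight
                centre = u < r * leaves  # mutant leaf chosen with weight r * leaves
        if centre + leaves == n:
            fixed += 1
    return fixed / trials


def moran_well_mixed(n, r):
    """Exact fixation probability of one mutant in a well-mixed population of size n."""
    if r == 1:
        return 1 / n
    return (1 - 1 / r) / (1 - 1 / r ** n)
```

In the neutral case (r = 1) the estimate should be close to 1/(n_leaves + 1), as for any graph with uniform initial placement. Comparing star_fixation_probability(20, 2.0) with moran_well_mixed(21, 2.0) is one way to probe the accelerator claim numerically. Larger stars need more trials and longer runs.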

Historical contingency and entrenchment in protein evolution under purifying selection

Premal Shah, Joshua B. Plotkin
(Submitted on 15 Apr 2014)

The fitness contribution of an allele at one genetic site may depend on the states of other sites, a phenomenon known as epistasis. Epistasis can profoundly influence the process of evolution in populations under selection, and shape the course of protein evolution across divergent species. Whereas epistasis among adaptive substitutions has been the subject of extensive study, relatively little is known about epistasis under purifying selection. Here we use mechanistic models of thermodynamic stability in a ligand-binding protein to explore computationally the structure of epistatic interactions among substitutions that fix in protein sequences under purifying selection. We find that the selection coefficients of mutations that are nearly neutral when they fix are highly conditional on the presence of preceding mutations. In addition, substitutions that are initially neutral become increasingly entrenched over time due to antagonistic epistasis with subsequent substitutions. Our evolutionary model includes insertions and deletions (indels) as well as point mutations, which allows us to quantify epistasis between these classes of mutations and also to study the evolution of protein length. We find that protein length remains largely constant over time, because indels are more deleterious than point mutations. Our results imply that, even under purifying selection, protein sequence evolution is highly contingent on history and cannot be predicted from the phenotypic effects of mutations introduced into the wild-type sequence alone.
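Stability-mediated epistasis can be illustrated with a two-state folding model in which fitness is taken to equal the fraction of folded protein. This is a simplifying assumption. It omits the ligand binding, indels and parameter values of the paper's model. The point is that the same destabilising mutation (same ΔΔG) is nearly neutral on a very stable background but strongly deleterious on a marginally stable one. The fixation probability uses Kimura's diffusion approximation in its haploid form, which is also assumed here. The constant KT and all function names are illustrative.

```python
import math

KT = 0.593  # kcal/mol, roughly RT near 298 K (assumed value)


def fraction_folded(delta_g):
    """Two-state folding; delta_g is the folding free energy (negative = stable)."""
    return 1.0 / (1.0 + math.exp(delta_g / KT))


def selection_coefficient(background_dg, ddg):
    """Relative fitness change of a mutation with effect ddg on a given background.

    Fitness is assumed to be the fraction of folded protein, so the same ddg
    has a background-dependent effect (epistasis via stability).
    """
    return fraction_folded(background_dg + ddg) / fraction_folded(background_dg) - 1.0


def fixation_probability(s, n):
    """Kimura's approximation for a single new mutant in a haploid population of size n."""
    if s == 0:
        return 1.0 / n
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-2.0 * n * s))
```

For a mutation with ddg = +1.0 kcal/mol, selection_coefficient(-6.0, 1.0) is a tiny negative number, whereas selection_coefficient(-1.5, 1.0) is much more negative. Passing each value to fixation_probability shows how preceding substitutions, by setting the background stability, change the chance that the same mutation fixes.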

Bayesian Neural Networks for Genetic Association Studies of Complex Disease

Andrew L. Beam, Alison Motsinger-Reif, Jon Doyle
(Submitted on 15 Apr 2014)

Discovering causal genetic variants from large genetic association studies poses many difficult challenges. Assessing which genetic markers are involved in determining trait status is a computationally demanding task, especially in the presence of gene-gene interactions. A non-parametric Bayesian approach in the form of a Bayesian neural network is proposed for use in analyzing genetic association studies. Demonstrations on synthetic and real data reveal that Bayesian neural networks are able to efficiently and accurately identify which variants are involved in determining case-control status. Using graphics processing units (GPUs) reduces the time needed to build these models by several orders of magnitude. In comparison with commonly used approaches for detecting interactions, Bayesian neural networks perform very well across a broad spectrum of possible genetic relationships. The proposed framework is shown to be powerful at detecting causal SNPs while having the computational efficiency needed to handle large datasets.
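A little arithmetic shows why gene-gene interactions make the search computationally demanding. An exhaustive scan must test every combination of SNPs of a given order, and that number grows combinatorially. The sketch below only counts those tests. It is not the Bayesian neural network. The 500,000-SNP panel size is an illustrative assumption.

```python
from math import comb


def n_interaction_tests(n_snps, order):
    """Number of distinct SNP combinations of a given order in an exhaustive scan."""
    return comb(n_snps, order)
```

For a hypothetical panel of 500,000 SNPs, n_interaction_tests(500_000, 2) gives 124,999,750,000 pairwise tests. The number of three-way combinations is larger still by several orders of magnitude, which is why approaches that avoid explicit enumeration are attractive.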