Exploring the genetic patterns of complex diseases via the integrative genome-wide approach

Ben Teng, Can Yang, Jiming Liu, Zhipeng Cai, Xiang Wan
(Submitted on 26 Jan 2015)

Motivation: Genome-wide association studies (GWASs), which assay more than a million single nucleotide polymorphisms (SNPs) in thousands of individuals, have been widely used to identify genetic risk variants for complex diseases. However, most of the variants that have been identified contribute relatively small increments of risk and explain only a small portion of the genetic variation in complex diseases. This is the so-called missing heritability problem. Evidence has indicated that many complex diseases are genetically related, meaning these diseases share common genetic risk variants. Therefore, exploring the genetic correlations across multiple related studies could be a promising strategy for removing spurious associations and identifying underlying genetic risk variants, and thereby uncovering the mystery of missing heritability in complex diseases. Results: We present a general and robust method to identify genetic patterns from multiple large-scale genomic datasets. We treat the summary statistics as a matrix and demonstrate that genetic patterns will form a low-rank matrix plus a sparse component. Hence, we formulate the problem as a matrix recovery problem, where we aim to discover risk variants shared by multiple diseases/traits and those for each individual disease/trait. We propose a convex formulation for matrix recovery and an efficient algorithm to solve the problem. We demonstrate the advantages of our method using both synthesized datasets and real datasets. The experimental results show that our method can successfully reconstruct both the shared and the individual genetic patterns from summary statistics and achieve better performance compared with alternative methods under a wide range of scenarios.
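
The abstract does not give the exact convex program or solver, so the following is only a minimal sketch of the general idea, assuming the textbook "low-rank plus sparse" formulation (robust PCA, also called principal component pursuit): minimize the nuclear norm of L plus lam times the l1 norm of S subject to L + S = M, solved with an inexact augmented Lagrange multiplier loop. Think of rows as SNPs and columns as traits, L as the shared pattern and S as trait-specific signals. The function names, the default lam = 1/sqrt(max(m, n)), and the step-size schedule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def low_rank_plus_sparse(M, lam=None, tol=1e-7, max_iter=500):
    """Hypothetical helper: split M into L (low rank) + S (sparse) by solving
    min ||L||_* + lam * ||S||_1  subject to  L + S = M
    with an inexact augmented Lagrange multiplier (ALM) iteration."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))  # common robust-PCA default (assumption)
    spectral = np.linalg.norm(M, 2)  # largest singular value of M
    fro = np.linalg.norm(M, "fro")
    Y = M / max(spectral, np.abs(M).max() / lam)  # initial dual variable
    mu = 1.25 / spectral  # initial penalty weight
    mu_max = mu * 1e7
    rho = 1.5  # penalty growth factor per iteration
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(max_iter):
        # Minimize the augmented Lagrangian over S, then over L.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        residual = M - L - S
        # Dual ascent on the constraint L + S = M.
        Y = Y + mu * residual
        mu = min(mu * rho, mu_max)
        if np.linalg.norm(residual, "fro") / fro < tol:
            break
    return L, S
```

Applied to a SNP-by-trait matrix of summary statistics (for example z-scores, another assumption), the call `L, S = low_rank_plus_sparse(Z)` would return a shared component `L` and a sparse component `S`; in practice lam acts as a tuning parameter that trades off how much signal is attributed to each.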

Origins of cattle on Chirikof Island, Alaska

Jared E. Decker, Jeremy F. Taylor, Matthew A. Cronin, Leeson J. Alexander, Juha Kantanen, Ann Millbrooke, Robert D. Schnabel, Michael D. MacNeil
doi: http://dx.doi.org/10.1101/014415

Feral livestock may harbor genetic variation of commercial, scientific, historical or esthetic value. The origins and uniqueness of feral cattle on Chirikof Island, Alaska, are uncertain. The island is now part of the Alaska Maritime Wildlife Refuge, and federal wildlife managers want grazing to cease, presumably leading to the demise of the cattle. Here we characterize the Chirikof Island cattle relative to extant breeds and discern their origins. Our analyses support the inference that Russian cattle arrived first on Chirikof Island, then approximately 95 years ago the first European taurine cattle were introduced to the island, and finally Hereford cattle were introduced about 40 years ago. While clearly Bos taurus taurus, the Chirikof Island cattle appear at least as distinct as other recognized breeds. Further, this mixture of European and East-Asian cattle is unique compared to other North American breeds, and we find evidence that natural selection in the relatively harsh environment of Chirikof Island has further impacted their genetic architecture. These results provide an objective basis for decisions regarding conservation of the Chirikof Island cattle.

Evolution of Conditional Cooperativity Between HOXA11 and FOXO1 Through Allosteric Regulation

Mauris C. Nnamani, Soumya Ganguly, Vincent J. Lynch, Laura S. Mizoue, Yingchun Tong, Heather Darling, Monika Fuxreiter, Jens Meiler, Gunter P. Wagner
doi: http://dx.doi.org/10.1101/014381

Transcription factors (TFs) play multiple roles in different cells and stages of development. Given this multitude of functional roles, it has been assumed that TFs are evolutionarily highly constrained. Here we investigate the molecular mechanisms for the origin of a derived functional interaction between two TFs that play a key role in mammalian pregnancy, HOXA11 and FOXO1. We have previously shown that the regulatory role of HOXA11 in mammalian endometrial stromal cells requires an interaction with FOXO1, and that the physical interaction between these proteins evolved long before their functional cooperativity. Through a combination of functional, biochemical, and structural approaches, we demonstrate that the derived functional cooperativity between HOXA11 and FOXO1 is due to derived allosteric regulation of HOXA11 by FOXO1. This study shows that TF function can evolve through changes affecting the functional output of a pre-existing protein complex.

Diversity of Mycobacterium tuberculosis across evolutionary scales

Mary B O’Neill, Tatum D Mortimer, Caitlin S Pepperell
doi: http://dx.doi.org/10.1101/014217

Tuberculosis (TB) is a global public health emergency. Increasingly drug-resistant strains of Mycobacterium tuberculosis (M.tb) continue to emerge and spread, highlighting the adaptability of this pathogen. Most studies of M.tb evolution have relied on ‘between-host’ samples, in which each person with TB is represented by a single M.tb isolate. However, individuals with TB commonly harbor populations of M.tb numbering in the billions. Here, we use analyses of M.tb diversity found within and between hosts to gain insight into the adaptation of this pathogen. We find that the amount of M.tb genetic diversity harbored by individuals with TB is similar to that of global between-host surveys of TB patients. This suggests that M.tb genetic diversity is generated within hosts and then lost as the infection is transmitted. In examining genomic data from M.tb samples within and between hosts with TB, we find that genes involved in the regulation, synthesis, and transportation of immunomodulatory cell envelope lipids appear repeatedly in the extremes of various statistical measures of diversity. Polyketide synthase and Mycobacterial membrane protein Large (mmpL) genes are particularly notable in this regard. In addition, we observe identical mutations emerging across samples from different TB patients. Taken together, our observations suggest that M.tb cell envelope lipids are targets of selection within hosts. These lipids are specific to pathogenic mycobacteria and, in some cases, human-pathogenic mycobacteria. We speculate that rapid adaptation of cell envelope lipids is facilitated by functional redundancy, flexibility in their metabolism, and their roles mediating interactions with the host.

Bayesian priors for tree calibration: Evaluating two new approaches based on fossil intervals

Ryan W Norris, Cory L Strope, David M McCandlish, Arlin Stoltzfus
doi: http://dx.doi.org/10.1101/014340

Background: Studies of diversification and trait evolution increasingly rely on combining molecular sequences and fossil dates to infer time-calibrated phylogenetic trees. Available calibration software provides many options for the shape of the prior probability distribution of ages at a node to be calibrated, but the question of how to assign a Bayesian prior from limited fossil data remains open. Results: We introduce two new methods for generating priors based upon (1) the interval between the two oldest fossils in a clade, i.e., the penultimate gap (PenG), and (2) the ghost lineage length (GLin), defined as the difference between the oldest fossils for each of two sister lineages. We show that PenG and GLin/2 are point estimates of the interval between the oldest fossil and the true age for the node. Furthermore, given either of these quantities, we derive a principled prior distribution for the true age. This prior is log-logistic, and can be implemented approximately in existing software. Using simulated data, we test these new methods against some other approaches. Conclusions: When implemented as approaches for assigning Bayesian priors, the PenG and GLin methods increase the accuracy of inferred divergence times, showing considerably more precision than the other methods tested, without significantly greater bias. When implemented as approaches to post-hoc scaling of a tree by linear regression, the PenG and GLin methods exhibit less bias than other methods tested. The new methods are simple to use and can be applied to a variety of studies that call for calibrated trees.
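
The two point estimates named in the abstract reduce to simple arithmetic on fossil ages. The sketch below assumes ages are given in millions of years before present (larger means older) and that the node age estimate is the oldest fossil plus the estimated unobserved interval (PenG, or GLin/2). The function names are hypothetical, and the log-logistic prior the authors derive from these quantities is not reproduced because the abstract does not give its parameterization.

```python
def penultimate_gap(fossil_ages):
    """PenG: interval between the two oldest fossil ages in a clade (Ma, larger = older)."""
    if len(fossil_ages) < 2:
        raise ValueError("PenG needs at least two fossils in the clade")
    oldest, second_oldest = sorted(fossil_ages, reverse=True)[:2]
    return oldest - second_oldest


def ghost_lineage_length(oldest_a, oldest_b):
    """GLin: difference between the oldest fossils of two sister lineages."""
    return abs(oldest_a - oldest_b)


def node_age_from_peng(fossil_ages):
    # Oldest fossil plus PenG, the point estimate of the unobserved interval.
    return max(fossil_ages) + penultimate_gap(fossil_ages)


def node_age_from_glin(oldest_a, oldest_b):
    # Oldest fossil of either sister lineage plus GLin / 2.
    return max(oldest_a, oldest_b) + ghost_lineage_length(oldest_a, oldest_b) / 2
```

For instance, a clade with fossils dated 70, 66 and 60 Ma gives PenG = 4 Myr and a point estimate of 74 Ma; sister lineages whose oldest fossils are 70 and 62 Ma give GLin = 8 Myr and, through GLin/2, the same 74 Ma estimate.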

Genetics of intra-species variation in avoidance behavior induced by a thermal stimulus in C. elegans

doi: http://dx.doi.org/10.1101/014290

Individuals within a species vary in their responses to a wide range of stimuli, partly as a result of differences in their genetic makeup. Relatively little is known about the genetic and neuronal mechanisms contributing to diversity of behavior in natural populations. By studying animal-to-animal variation in innate avoidance behavior to thermal stimuli in the nematode Caenorhabditis elegans, we uncovered genetic principles of how different components of a behavioral response can be altered in nature to generate behavioral diversity. Using a thermal pulse assay, we identified heritable variation in responses to a transient temperature increase. Quantitative trait locus mapping revealed that separate components of this response were controlled by distinct genomic loci. The loci we identified contributed to variation in components of thermal pulse avoidance behavior in an additive fashion. Our results show that the escape behavior induced by thermal stimuli is composed of simpler behavioral components that are influenced by at least six distinct genetic loci. The loci that decouple components of the escape behavior reveal a genetic system that allows independent modification of behavioral parameters. Our work sets the foundation for future studies of evolution of innate behaviors at the molecular and neuronal level.

Partitioning heritability by functional category using GWAS summary statistics

Hilary Kiyo Finucane, Brendan Bulik-Sullivan, Alexander Gusev, Gosia Trynka, Yakir Reshef, Po-Ru Loh, Verneri Anttilla, Han Xu, Chongzhi Zang, Kyle Farh, Stephan Ripke, Felix Day, ReproGen Consortium, Schizophrenia Working Group of the Psychiatric Genetics Consortium, RACI Consortium, Shaun Purcell, Eli Stahl, Sara Lindstrom, John R.B. Perry, Yukinori Okada, Soumya Raychaudhuri, Mark Daly, Nick Patterson, Benjamin M. Neale, Alkes L. Price
doi: http://dx.doi.org/10.1101/014241

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.
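
The abstract does not state the model behind the new method. Assuming it is stratified LD score regression, in which the expected association statistic of SNP j is E[chi^2_j] = N * sum over categories C of tau_C * l(j, C) + N*a + 1 (l(j, C) being the LD score of SNP j with category C and tau_C the per-SNP heritability contribution of C), a toy version simulates statistics under that model and recovers tau by least squares. All names and parameter values below are hypothetical, and the sketch omits the regression weighting and standard-error estimation a real analysis relies on.

```python
import numpy as np


def simulate_chisq(ld_scores, tau, n_samples, rng, intercept=1.0):
    """Draw toy GWAS chi^2 statistics under the assumed stratified LD score model.

    ld_scores: (n_snps, n_categories) array of l(j, C)
    tau:       (n_categories,) per-SNP heritability contribution of each category
    """
    expected = n_samples * (ld_scores @ tau) + intercept
    # chi^2_j = z_j^2 with z_j ~ N(0, expected_j), i.e. expected_j times a chi^2_1 draw.
    return expected * rng.chisquare(1, size=expected.shape)


def estimate_tau(chisq, ld_scores, n_samples):
    """Unweighted least squares of chi^2 on N * l(j, C) plus a free intercept.

    Returns (tau_hat, intercept_hat). No regression weights or
    standard errors are computed in this illustration.
    """
    X = np.column_stack([n_samples * ld_scores, np.ones(len(chisq))])
    coef, *_ = np.linalg.lstsq(X, chisq, rcond=None)
    return coef[:-1], coef[-1]
```

With, say, 200,000 simulated SNPs, two categories with LD scores drawn uniformly between 0 and 50, N = 50,000 and tau = (2e-7, 5e-7), `estimate_tau` returns values close to the simulated tau and an intercept near 1; the free intercept is what lets the regression absorb inflation that is not explained by LD.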