Haplotypes of common SNPs can explain missing heritability of complex diseases

Gaurav Bhatia, Alexander Gusev, Po-Ru Loh, Bjarni J Vilhjálmsson, Stephan Ripke, Shaun Purcell, Eli Stahl, Mark Daly, Teresa R de Candia, Kenneth S Kendler, Michael C O’Donovan, Sang Hong Lee, Naomi R Wray, Benjamin M Neale, Matthew C Keller, Noah A Zaitlen, Bogdan Pasaniuc, Jian Yang, Alkes L Price, Schizophrenia Working Group Psychiatric Genomics C
doi: http://dx.doi.org/10.1101/022418

While genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease (h2), recent work has shown that more heritability is explained by all genotyped SNPs (hg2). However, much of the heritability is still missing (hg2

0.1% explained substantially more phenotypic variance (hhap2 = 0.64 (S.E. 0.084)) than genotyped SNPs alone (hg2 = 0.32 (S.E. 0.029)). These estimates were based on cross-cohort comparisons, ensuring that cohort-specific assay artifacts did not contribute to our estimates. In a large multiple sclerosis data set (WTCCC2-MS), we observed an even larger difference between hhap2 and hg2, though data from other cohorts will be required to validate this result. Overall, our results suggest that haplotypes of common SNPs can explain a large fraction of missing heritability of complex disease, shedding light on genetic architecture and informing disease mapping strategies.

Joint estimation of contamination, error and demography for nuclear DNA from ancient humans

Fernando Racimo, Gabriel Renaud, Montgomery Slatkin
doi: http://dx.doi.org/10.1101/022285

When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population to which the contaminating individuals belong. Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters – including drift times and admixture rates – for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminating population (e.g. European, East Asian or African). The method is implemented in a C++ program called ‘Demographic Inference with Contamination and Error’ (DICE). The program can also be used to determine the most likely population to which the contaminant DNA belongs. We applied it to simulations and Neanderthal genome data, and we recover accurate estimates of all parameters, even when the average sequencing coverage is low (0.5X) and the per-read contamination rate is high (25%).

Automated and accurate estimation of gene family abundance from shotgun metagenomes

Stephen Nayfach, Patrick H. Bradley, Stacia K. Wyman, Timothy J. Laurent, Alex Williams, Jonathan A. Eisen, Katherine S. Pollard, Thomas J. Sharpton
doi: http://dx.doi.org/10.1101/022335

Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several bioinformatic tools can be used to functionally annotate metagenomes, allowing researchers to draw inferences about the functional potential of the community and to identify putative functional biomarkers. However, little is known about how decisions made during annotation affect the reliability of the results. Here, we use statistical simulations to rigorously assess how to optimize annotation accuracy and speed, given parameters of the input data like read length and library size. We identify best practices in metagenome annotation and use them to guide the development of the Shotgun Metagenome Annotation Pipeline (ShotMAP). ShotMAP is an analytically flexible, end-to-end annotation pipeline that can be implemented either on a local computer or a cloud compute cluster. We use ShotMAP to assess how different annotation databases impact the interpretation of how marine metagenome and metatranscriptome functional capacity changes across seasons. We also apply ShotMAP to data obtained from a clinical microbiome investigation of inflammatory bowel disease. This analysis finds that gut microbiota collected from Crohn’s disease patients are functionally distinct from gut microbiota collected from either ulcerative colitis patients or healthy controls, with differential abundance of metabolic pathways related to host-microbiome interactions that may serve as putative biomarkers of disease.

Whole genome sequence analyses of Western Central African Pygmy hunter-gatherers reveal a complex demographic history and identify candidate genes under positive natural selection

PingHsun Hsieh, Krishna R Veeramah, Joseph Lachance, Sarah A Tishkoff, Jeffrey D Wall, Michael F Hammer, Ryan N Gutenkunst
doi: http://dx.doi.org/10.1101/022194

African Pygmies practicing a mobile hunter-gatherer lifestyle are phenotypically and genetically diverged from other anatomically modern humans, and they likely experienced strong selective pressures due to their unique lifestyle in the Central African rainforest. To identify genomic targets of adaptation, we sequenced the genomes of four Biaka Pygmies from the Central African Republic and jointly analyzed these data with the genome sequences of three Baka Pygmies from Cameroon and nine Yoruba farmers. To account for the complex demographic history of these populations that includes both isolation and gene flow, we fit models using the joint allele frequency spectrum and validated them using independent approaches. Our two best-fit models both suggest ancient divergence between the ancestors of the farmers and Pygmies, 90,000 or 150,000 years ago. We also find that bi-directional asymmetric gene-flow is statistically better supported than a single pulse of unidirectional gene flow from farmers to Pygmies, as previously suggested. We then applied complementary statistics to scan the genome for evidence of selective sweeps and polygenic selection. We found that conventional statistical outlier approaches were biased toward identifying candidates in regions of high mutation or low recombination rate. To avoid this bias, we assigned P-values for candidates using whole-genome simulations incorporating demography and variation in both recombination and mutation rates. We found that genes and gene sets involved in muscle development, bone synthesis, immunity, reproduction, cell signaling and development, and energy metabolism are likely to be targets of positive natural selection in Western African Pygmies or their recent ancestors.

Simple multi-trait analysis identifies novel loci associated with growth and obesity measures

Xia Shen, Xiao Wang, Zheng Ning, Yakov Tsepilov, Masoud Shirali, Blair H. Smith, Lynne J. Hocking, Sandosh Padmanabhan, Caroline Hayward, David J. Porteous, Yudi Pawitan, Chris S. Haley, Yurii S. Aulchenko, Generation Scotland
doi: http://dx.doi.org/10.1101/022269

Anthropometric traits are of global clinical relevance as risk factors for a wide range of diseases, including obesity. Yet despite many hundreds of genetic variants having been associated with anthropometric measurements, these variants still explain little of the variation in the traits. Joint modeling of multiple anthropometric traits has the potential to boost discovery power, but has not been applied to global-scale meta-analyses of genome-wide association studies (meta-GWAS). Here, we develop a simple method to perform multi-trait meta-GWAS using summary statistics reported in standard single-trait meta-GWAS and replicate the findings in an independent cohort. Using the summary statistics reported by the GIANT consortium meta-GWAS of 270,000 individuals, we discovered 359 novel loci significantly associated with six anthropometric traits. The “overeating gene” GRM5 (P = 4.38E-54) was the strongest novel locus, and was independently replicated in the Generation Scotland cohort (n = 9,603, P = 4.42E-03). The novel variants had an enriched rediscovery rate in the replication cohort. Our results provide important new insights into the biological mechanisms underlying anthropometric traits and emphasize the value of combining multiple correlated phenotypes in genomic studies. Our method has general applicability and can be applied as a secondary analysis of any standard GWAS or meta-GWAS with multiple traits.
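
To illustrate why combining correlated traits from summary statistics can add power, here is a minimal sketch of a generic two-trait omnibus test. It is not necessarily the procedure used in the preprint: it assumes that, under the null, the two single-trait z-scores for a variant are bivariate normal with a known correlation `r` (for example, estimated from genome-wide z-scores). The function name and inputs are hypothetical.

```python
import math


def two_trait_chi2(z1, z2, r):
    """Generic omnibus test for one variant and two correlated traits.

    Assumption (not from the preprint): under the null, (z1, z2) is
    bivariate normal with unit variances and correlation r.
    Returns the chi-square statistic (2 degrees of freedom) and its p-value.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Quadratic form z' R^{-1} z written out for the 2x2 correlation matrix R.
    stat = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    # Survival function of a chi-square with 2 degrees of freedom.
    pvalue = math.exp(-stat / 2.0)
    return stat, pvalue
```

With positively correlated traits (`r = 0.5`), z-scores of `(3, -3)` give a larger statistic than `(3, 3)`, because effects that run against the trait correlation are less expected under the null.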

TreeQTL: hierarchical error control for eQTL findings

Christine Peterson, Marina Bogomolov, Yoav Benjamini, Chiara Sabatti
doi: http://dx.doi.org/10.1101/021170

Commonly used multiplicity adjustments fail to control the error rate for reported findings in many expression quantitative trait loci (eQTL) studies. TreeQTL implements a stage-wise multiple testing procedure which allows control of appropriate error rates defined relative to a hierarchical grouping of the eQTL hypotheses. The R package TreeQTL is available for download at http://bioinformatics.org/treeqtl.
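
The stage-wise idea can be sketched in a few lines. The following is a simplified two-stage illustration, not the TreeQTL R package or its API. It assumes that hypotheses are grouped into families (for example, all gene tests for one SNP), that each family is summarized by a Simes p-value, that families are selected with Benjamini-Hochberg, and that hypotheses within selected families are then tested at a level scaled by the fraction of families selected. All function names are hypothetical.

```python
def simes_pvalue(pvalues):
    """Simes combination: min over ranks i of n * p_(i) / i, capped at 1."""
    ordered = sorted(pvalues)
    n = len(ordered)
    return min(1.0, min(n * p / rank for rank, p in enumerate(ordered, start=1)))


def bh_rejections(pvalues, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank  # keep the largest rank that passes its threshold
    return sorted(order[:cutoff])


def two_stage_selection(families, q=0.05):
    """Stage 1 selects families; stage 2 tests within the selected families.

    `families` maps a family name to its list of p-values.
    Returns {family name: indices of rejected hypotheses} for selected families.
    """
    names = list(families)
    family_p = [simes_pvalue(families[name]) for name in names]
    selected = [names[i] for i in bh_rejections(family_p, q)]
    # Within-family level shrinks with the proportion of families selected.
    q_within = q * len(selected) / len(names) if names else 0.0
    return {name: bh_rejections(families[name], q_within) for name in selected}
```

For example, `two_stage_selection({"snpA": [0.0001, 0.5, 0.9], "snpB": [0.4, 0.6, 0.8], "snpC": [0.001, 0.002, 0.7]})` selects `snpA` and `snpC` in the first stage and then tests their individual hypotheses at level 0.05 × 2/3.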

Flawed evidence for convergent evolution of the circadian CLOCK gene in mole-rats

Frédéric Delsuc
doi: http://dx.doi.org/10.1101/022004

Convergently evolved mole-rats (Mammalia, Rodentia) provide a fascinating model for studying convergent molecular evolution. Three genome sequences have recently been made available for the blind mole-rat (Nannospalax galili; Spalacidae; Muroidea)1, and the convergently evolved naked mole-rat (Heterocephalus glaber; Heterocephalidae; Ctenohystrica)2 and its close relative the Damaraland mole-rat (Fukomys damarensis; Bathyergidae; Ctenohystrica)3. In their genome paper1, Fang et al. evaluated convergent molecular evolution related to the subterranean life-style between the naked mole-rat and the blind mole-rat. One particularly striking result was the strong signal for amino acid convergence detected in the circadian rhythm CLOCK gene. Here I show that this unexpected result is erroneous because it is based on the use of the wrong sequence for the naked mole-rat, which has been mistakenly replaced by a sequence from a blind mole-rat. When the correct sequence is used, the evidence for convergent molecular evolution in this gene appears very limited.

Computing the Internode Certainty and related measures from partial gene trees.

Kassian Kobert, Leonidas Salichos, Antonis Rokas, Alexandros Stamatakis
doi: http://dx.doi.org/10.1101/022053

We present, implement, and evaluate an approach to calculate the internode certainty and tree certainty on a given reference tree from a collection of partial gene trees. Previously, the calculation of these values was only possible from a collection of gene trees with exactly the same taxon set as the reference tree. An application to sets of partial gene trees requires mathematical corrections in the internode certainty and tree certainty calculations. We implement our methods in RAxML and test them on empirical data sets. These tests imply that the inclusion of partial trees does matter. However, in order to provide meaningful measurements, any data set should also contain comprehensive trees.
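
For orientation, the sketch below computes the basic internode certainty for the full-taxon-set case, as commonly defined from the two most prevalent conflicting bipartitions of a branch. It does not include the partial-tree corrections described in the abstract, and it is not RAxML code. The function names and the count-based inputs are assumptions made for illustration.

```python
import math


def internode_certainty(freq_main, freq_conflict):
    """Basic IC for one internal branch (full-taxon-set case only).

    freq_main:     number of gene trees supporting the most frequent bipartition
    freq_conflict: number supporting its most frequent conflicting bipartition
    IC = 1 + p1*log2(p1) + p2*log2(p2), with p_i the relative frequencies.
    """
    total = freq_main + freq_conflict
    if total <= 0:
        raise ValueError("at least one bipartition must be observed")
    ic = 1.0  # log2(2): the maximum entropy for two competing bipartitions
    for freq in (freq_main, freq_conflict):
        p = freq / total
        if p > 0:  # treat 0 * log2(0) as 0
            ic += p * math.log2(p)
    return ic


def tree_certainty(branch_counts):
    """Tree certainty: the sum of IC values over all internal branches."""
    return sum(internode_certainty(a, b) for a, b in branch_counts)
```

An IC of 1 means no conflict was observed for the branch, while an IC of 0 means the two bipartitions are equally supported.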

The role of recombination in evolutionary rescue

Hildegard Uecker, Joachim Hermisson
doi: http://dx.doi.org/10.1101/022020

How likely is it that a population escapes extinction through adaptive evolution? The answer to this question is of great relevance in conservation biology, where we aim at species’ rescue and the maintenance of biodiversity, and in agriculture and epidemiology, where we seek to hamper the emergence of pesticide or drug resistance. By reshuffling the genome, recombination has two antagonistic effects on the probability of evolutionary rescue: it generates and it breaks up favorable gene combinations. Which of the two effects prevails depends on the fitness effects of mutations and on the impact of stochasticity on the allele frequencies. In this paper, we analyze a mathematical model for rescue after a sudden environmental change when adaptation is contingent on mutations at two loci. The analysis reveals a complex nonlinear dependence of population survival on recombination. We moreover find that, counterintuitively, a fast eradication of the wildtype can promote rescue in the presence of recombination. The model also shows that two-step rescue is not unlikely to happen and can even be more likely than single-step rescue (where adaptation relies on a single mutation), depending on the circumstances.

Heterozygous gene truncation delineates the human haploinsufficient genome

István Bartha, Antonio Rausell, Paul McLaren, Manuel Tardaguila, Pejman Mohammadi, Nimisha Chaturvedi, Jacques Fellay, Amalio Telenti
doi: http://dx.doi.org/10.1101/010611

Sequencing projects have identified large numbers of rare stop-gain and frameshift variants in the human genome. As most of these are observed in the heterozygous state, they test a gene’s tolerance to haploinsufficiency and dominant loss of function. We analyzed the distribution of truncating variants across 16,260 protein coding autosomal genes in 11,546 individuals. We observed 39,893 truncating variants affecting 12,062 genes, which significantly differed from an expectation of 12,916 genes under a model of neutral de novo mutation (p<1E-4). Extrapolating this to increasing numbers of sequenced individuals, we estimate that 10.8% of human genes do not tolerate heterozygous truncating variants. An additional 10 to 15% of truncated genes may be rescued by incomplete penetrance or compensatory mutations, or because the truncating variants are of limited functional impact. The study of protein truncating variants delineates the essential genome and, more generally, identifies rare heterozygous variants as an unexplored source of diversity of phenotypic traits and diseases.