Simple multi-trait analysis identifies novel loci associated with growth and obesity measures

Xia Shen, Xiao Wang, Zheng Ning, Yakov Tsepilov, Masoud Shirali, Blair H. Smith, Lynne J. Hocking, Sandosh Padmanabhan, Caroline Hayward, David J. Porteous, Yudi Pawitan, Chris S. Haley, Yurii S. Aulchenko, Generation Scotland
doi: http://dx.doi.org/10.1101/022269

Anthropometric traits are of global clinical relevance as risk factors for a wide range of diseases, including obesity. Yet although many hundreds of genetic variants have been associated with anthropometric measurements, these variants still explain little of the variation in these traits. Joint modeling of multiple anthropometric traits has the potential to boost discovery power, but has not been applied to global-scale meta-analyses of genome-wide association studies (meta-GWAS). Here, we develop a simple method to perform multi-trait meta-GWAS using the summary statistics reported in standard single-trait meta-GWAS, and we replicate the findings in an independent cohort. Using the summary statistics reported by the GIANT consortium meta-GWAS of 270,000 individuals, we discovered 359 novel loci significantly associated with six anthropometric traits. The “overeating gene” GRM5 (P = 4.38E-54) was the strongest novel locus and was independently replicated in the Generation Scotland cohort (n = 9,603, P = 4.42E-03). The novel variants had an enriched rediscovery rate in the replication cohort. Our results provide important new insights into the biological mechanisms underlying anthropometric traits and emphasize the value of combining multiple correlated phenotypes in genomic studies. Our method is generally applicable and can be used as a secondary analysis of any standard GWAS or meta-GWAS with multiple traits.
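One common way to combine per-trait summary statistics at a single SNP is a multivariate Wald-type test: given the vector z of per-trait Z-scores and the trait correlation matrix R (which could, for example, be estimated from genome-wide summary statistics), T = z' R^-1 z is approximately chi-square distributed with k degrees of freedom under the null. The sketch below assumes this statistic purely for illustration; it is not necessarily the exact test used in the paper, and the function names are hypothetical. It uses only the Python standard library.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def chi2_sf(x, k):
    """Chi-square upper tail for integer df k, built from the recurrence
    Q(a+1, y) = Q(a, y) + y^a e^-y / Gamma(a+1), with y = x / 2."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < k / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def multi_trait_test(z, r):
    """T = z' R^-1 z for one SNP; approximately chi-square with len(z) df under H0."""
    w = solve(r, z)
    t = sum(zi * wi for zi, wi in zip(z, w))
    return t, chi2_sf(t, len(z))
```

For two traits with correlation 0.5, Z-scores of (2, 2) give T = 16/3, whereas (2, -2) give T = 16: an effect that pushes positively correlated traits in opposite directions is much less likely under the null, which is one intuition for why joint modeling can gain power over single-trait scans.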

TreeQTL: hierarchical error control for eQTL findings

Christine Peterson, Marina Bogomolov, Yoav Benjamini, Chiara Sabatti
doi: http://dx.doi.org/10.1101/021170

Commonly used multiplicity adjustments fail to control the error rate for reported findings in many expression quantitative trait loci (eQTL) studies. TreeQTL implements a stage-wise multiple testing procedure which allows control of appropriate error rates defined relative to a hierarchical grouping of the eQTL hypotheses. The R package TreeQTL is available for download at http://bioinformatics.org/treeqtl.
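A stage-wise procedure of this kind can be sketched with two levels, genes as families and SNP-gene tests within each family: first combine each family's p-values (here with Simes' method) and apply Benjamini-Hochberg (BH) across families; then, within each of the R selected families out of M, apply BH at the reduced level q·R/M. This is a minimal illustration assuming that specific two-level scheme; it is not TreeQTL's R implementation, and the function names are hypothetical rather than part of the package's API.

```python
def bh_reject(pvals, q):
    """Benjamini-Hochberg step-up: return the indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    return set(order[:k])

def simes(pvals):
    """Simes combination p-value for one family (e.g., all SNPs tested for a gene)."""
    m = len(pvals)
    return min(1.0, min(m * p / rank for rank, p in enumerate(sorted(pvals), start=1)))

def two_level(families, q):
    """Stage 1: BH on family-level Simes p-values.
    Stage 2: BH within each selected family at level q * R / M."""
    names = list(families)
    selected = bh_reject([simes(families[n]) for n in names], q)
    level = q * len(selected) / len(names)
    return {names[i]: sorted(bh_reject(families[names[i]], level)) for i in selected}
```

With families {"geneA": [0.001, 0.02, 0.5], "geneB": [0.3, 0.6], "geneC": [0.004, 0.01]} and q = 0.05, two genes are selected at the first stage, so the within-gene tests are run at 0.05 × 2/3 rather than at 0.05; discoveries are then reported per gene and per SNP-gene pair.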

Flawed evidence for convergent evolution of the circadian CLOCK gene in mole-rats

Frédéric Delsuc
doi: http://dx.doi.org/10.1101/022004

Convergently evolved mole-rats (Mammalia, Rodentia) provide a fascinating model for studying convergent molecular evolution. Three genome sequences have recently been made available for the blind mole-rat (Nannospalax galili; Spalacidae; Muroidea)[1], and the convergently evolved naked mole-rat (Heterocephalus glaber; Heterocephalidae; Ctenohystrica)[2] and its close relative the Damaraland mole-rat (Fukomys damarensis; Bathyergidae; Ctenohystrica)[3]. In their genome paper[1], Fang et al. evaluated convergent molecular evolution related to the subterranean lifestyle between the naked mole-rat and the blind mole-rat. One particularly striking result was the strong signal for amino acid convergence detected in the circadian rhythm CLOCK gene. Here I show that this unexpected result is erroneous because it is based on the use of the wrong sequence for the naked mole-rat, which has been mistakenly replaced by a sequence from a blind mole-rat. When the correct sequence is used, the evidence for convergent molecular evolution in this gene appears very limited.

Computing the Internode Certainty and related measures from partial gene trees.

Kassian Kobert, Leonidas Salichos, Antonis Rokas, Alexandros Stamatakis
doi: http://dx.doi.org/10.1101/022053

We present, implement, and evaluate an approach to calculate the internode certainty and tree certainty on a given reference tree from a collection of partial gene trees. Previously, these values could only be calculated from a collection of gene trees with exactly the same taxon set as the reference tree. Applying them to sets of partial gene trees requires mathematical corrections to the internode certainty and tree certainty calculations. We implement our methods in RAxML and test them on empirical data sets. These tests indicate that the inclusion of partial trees does matter. However, to provide meaningful measurements, a data set should also contain comprehensive trees.
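For reference, the complete-taxon case that the paper generalizes can be sketched as follows: for an internode of the reference tree, take the frequency of its bipartition and of the most frequent conflicting bipartition, normalize them to p1 and p2, and compute IC = log2(2) + p1·log2(p1) + p2·log2(p2), so IC is 1 when there is no conflict and 0 when the two bipartitions are equally frequent; tree certainty sums IC over all internodes. This sketch assumes the reference bipartition is the more frequent one, reports only the magnitude, and does not reproduce the partial-tree corrections introduced in the paper or RAxML's code.

```python
import math

def internode_certainty(f_ref, f_conflict):
    """IC for one internode from the frequency of the reference bipartition and
    of its most frequent conflicting bipartition (complete gene trees only)."""
    total = f_ref + f_conflict
    ic = math.log2(2)  # maximum entropy for two alternatives: 1 bit
    for f in (f_ref, f_conflict):
        p = f / total
        if p > 0:  # 0 * log2(0) is taken as 0
            ic += p * math.log2(p)
    return ic

def tree_certainty(internodes):
    """TC as the sum of IC over all internodes, given (f_ref, f_conflict) pairs."""
    return sum(internode_certainty(a, b) for a, b in internodes)
```

For example, a bipartition found in 75 gene trees against a best competitor found in 25 gets an IC of about 0.19, far below the 1.0 of an unopposed bipartition.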

The role of recombination in evolutionary rescue

Hildegard Uecker, Joachim Hermisson
doi: http://dx.doi.org/10.1101/022020

How likely is it that a population escapes extinction through adaptive evolution? The answer to this question is of great relevance in conservation biology, where we aim to rescue species and maintain biodiversity, and in agriculture and epidemiology, where we seek to hamper the emergence of pesticide or drug resistance. By reshuffling the genome, recombination has two antagonistic effects on the probability of evolutionary rescue: it generates favorable gene combinations, and it breaks them up. Which of the two effects prevails depends on the fitness effects of mutations and on the impact of stochasticity on the allele frequencies. In this paper, we analyze a mathematical model for rescue after a sudden environmental change when adaptation is contingent on mutations at two loci. The analysis reveals a complex, nonlinear dependence of population survival on recombination. Moreover, we find that, counterintuitively, fast eradication of the wildtype can promote rescue in the presence of recombination. The model also shows that two-step rescue is not unlikely and can, depending on the circumstances, be even more likely than single-step rescue (where adaptation relies on a single mutation).
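The two antagonistic effects of recombination can be seen in the standard deterministic two-locus recursion. Writing the haplotype frequencies as x_AB, x_Ab, x_aB, x_ab and the linkage disequilibrium as D = x_AB·x_ab − x_Ab·x_aB, one generation of recombination at rate r (random mating, no selection) lowers x_AB and x_ab by r·D and raises x_Ab and x_aB by r·D. The sketch assumes an infinite population without selection or mutation, so it only illustrates the mechanism; the paper's rescue model is stochastic and includes selection.

```python
def linkage_disequilibrium(x):
    """D = x_AB * x_ab - x_Ab * x_aB for haplotype frequencies in dict x."""
    return x["AB"] * x["ab"] - x["Ab"] * x["aB"]

def recombine(x, r):
    """One generation of recombination at rate r (random mating, no selection)."""
    d = linkage_disequilibrium(x)
    return {
        "AB": x["AB"] - r * d,
        "ab": x["ab"] - r * d,
        "Ab": x["Ab"] + r * d,
        "aB": x["aB"] + r * d,
    }

# Only single mutants present: D < 0, so recombination creates the double mutant AB.
created = recombine({"ab": 0.98, "Ab": 0.01, "aB": 0.01, "AB": 0.0}, 0.5)
# Double mutant AB in excess: D > 0, so recombination breaks AB up.
broken = recombine({"ab": 0.5, "Ab": 0.0, "aB": 0.0, "AB": 0.5}, 0.1)
```

Which case dominates in a real rescue scenario depends on whether selection has built up positive or negative D, which is why the net effect of recombination on survival need not be monotonic.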

Heterozygous gene truncation delineates the human haploinsufficient genome

István Bartha, Antonio Rausell, Paul McLaren, Manuel Tardaguila, Pejman Mohammadi, Nimisha Chaturvedi, Jacques Fellay, Amalio Telenti
doi: http://dx.doi.org/10.1101/010611

Sequencing projects have identified large numbers of rare stop-gain and frameshift variants in the human genome. As most of these are observed in the heterozygous state, they test a gene’s tolerance to haploinsufficiency and dominant loss of function. We analyzed the distribution of truncating variants across 16,260 protein-coding autosomal genes in 11,546 individuals. We observed 39,893 truncating variants affecting 12,062 genes, significantly fewer than the 12,916 genes expected under a model of neutral de novo mutation (p<1E-4). Extrapolating this to increasing numbers of sequenced individuals, we estimate that 10.8% of human genes do not tolerate heterozygous truncating variants. An additional 10 to 15% of truncated genes may be rescued by incomplete penetrance or compensatory mutations, or because the truncating variants are of limited functional impact. The study of protein truncating variants delineates the essential genome and, more generally, identifies rare heterozygous variants as an unexplored source of diversity of phenotypic traits and diseases.
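The comparison between observed and expected numbers of truncated genes can be sketched as a balls-into-bins calculation: if each of n neutral variants lands in gene g with probability p_g, the expected number of distinct genes hit is the sum over genes of 1 − (1 − p_g)^n, and a Monte Carlo null distribution gives an empirical one-sided p-value. The per-gene weights (for example, proportional to each gene's mutational target) are a hypothetical input here; the authors' neutral de novo mutation model may be parameterized differently.

```python
import random

def expected_genes_hit(weights, n_variants):
    """Expected number of distinct genes receiving >= 1 of n_variants neutral
    variants, each landing in gene g with probability w_g / sum(w)."""
    total = sum(weights)
    return sum(1.0 - (1.0 - w / total) ** n_variants for w in weights)

def simulate_genes_hit(weights, n_variants, reps, seed=0):
    """Monte Carlo null distribution of the number of distinct genes hit."""
    rng = random.Random(seed)
    genes = range(len(weights))
    return [len(set(rng.choices(genes, weights=weights, k=n_variants)))
            for _ in range(reps)]

def empirical_p(observed, weights, n_variants, reps=10000, seed=0):
    """One-sided p-value: how often the neutral model hits as few genes as
    observed (with a +1 pseudo-count so the estimate is never exactly 0)."""
    sims = simulate_genes_hit(weights, n_variants, reps, seed)
    return (1 + sum(s <= observed for s in sims)) / (reps + 1)
```

With real inputs, weights would have one entry per gene (16,260 here) and n_variants would be the observed number of truncating variants; an observed count of genes hit well below the simulated distribution points to genes that are depleted of truncating variants.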

High Frequency Haplotypes are Expected Events, not Historical Figures

Elsa G Guillot, Murray P Cox
doi: http://dx.doi.org/10.1101/022160

The hypothesis of cultural transmission of reproductive success holds that successful men have more children and pass this greater fecundity on to their offspring. Balaresque and colleagues found high frequency haplotypes in a Central Asian Y chromosome dataset, which they attribute to cultural transmission of reproductive success by prominent historical men, including Genghis Khan. Using coalescent simulation, we show that these high frequency haplotypes are expected simply by chance. Hence, an explanation invoking cultural transmission of reproductive success is statistically unnecessary.

Low but significant genetic differentiation underlies biologically meaningful phenotypic divergence in a large Atlantic salmon population

Tutku Aykanat, Susan E Johnston, Panu Orell, Eero Niemelä, Jaakko Erkinaro, Craig Primmer
doi: http://dx.doi.org/10.1101/022178

Despite decades of research assessing the genetic structure of natural populations, the biological meaning of low yet significant genetic divergence often remains unclear due to a lack of associated phenotypic and ecological information. At the same time, structured populations with low genetic divergence and overlapping boundaries can potentially provide excellent models to study the eco-evolutionary dynamics in cases where high resolution genetic markers and relevant phenotypic and life history information are available. Here, we combined SNP-based population inference with extensive phenotypic and life history data to identify potential biological mechanisms driving fine scale sub-population differentiation in Atlantic salmon (Salmo salar) from the Teno River, a major salmon river in Europe. Two sympatrically occurring sub-populations had low but significant genetic differentiation (FST = 0.018) and displayed marked differences in the distribution of life history strategies, including variation in juvenile growth rate, age at maturity and size within age classes. Large, late-maturing individuals were virtually absent from one of the two sub-populations and there were significant differences in juvenile growth rates and size-at-age after oceanic migration between individuals in the respective sub-populations. Our findings suggest that different eco-evolutionary processes affect each sub-population and that hybridization and subsequent selection may maintain low genetic differentiation without hindering adaptive divergence.

Worldwide patterns of human epigenetic variation

Oana Carja, Julia L MacIsaac, Sarah M Mah, Brenna M Henn, Michael S Kobor, Marcus W Feldman, Hunter B Fraser
doi: http://dx.doi.org/10.1101/021931

DNA methylation is an epigenetic modification, influenced by both genetic and environmental variation, that can affect transcription and many organismal phenotypes. Although patterns of DNA methylation have been shown to differ between human populations, it remains to be determined whether epigenetic diversity mirrors the patterns observed for DNA polymorphisms or gene expression levels. We measured DNA methylation at 480,000 sites in 34 individuals from five diverse human populations in the Human Genome Diversity Panel, and analyzed these together with single nucleotide polymorphisms (SNPs) and gene expression data. We found greater population-specificity of DNA methylation than of mRNA levels, which may be driven by the greater genetic control of methylation. This study provides insights into gene expression and its epigenetic regulation across populations and offers a deeper understanding of worldwide patterns of epigenetic diversity in humans.

Predicting genome sizes and restriction enzyme recognition-sequence probabilities across the eukaryotic tree of life

Santiago Herrera, Paula H. Reyes-Herrera, Timothy M. Shank
doi: http://dx.doi.org/10.1101/007781

High-throughput sequencing of reduced-representation libraries obtained through digestion with restriction enzymes, generically known as restriction-site associated DNA sequencing (RAD-seq), is a common strategy to generate genome-wide genotypic and sequence data from eukaryotes. A critical design element of any RAD-seq study is knowledge of the approximate number of genetic markers that can be obtained for a taxon using different restriction enzymes, as this number determines the scope of a project and ultimately defines its success. This number can only be directly determined if a reference genome sequence is available; otherwise, it can be estimated if the genome size and the restriction recognition-sequence probabilities are known. However, both scenarios are uncommon for non-model species. Here, we performed systematic in silico surveys of recognition sequences for diverse and commonly used type II restriction enzymes across the eukaryotic tree of life. Our observations reveal that recognition-sequence frequencies for a given restriction enzyme are strikingly variable among broad eukaryotic taxonomic groups, being largely determined by phylogenetic relatedness. We demonstrate that genome sizes can be predicted from cleavage frequency data obtained with restriction enzymes targeting “neutral” elements. Models based on genomic compositions are also effective tools for accurately calculating the probabilities of recognition sequences across taxa, and can be applied to species for which reduced-representation data are available (including transcriptomes and “neutral” RAD-seq datasets). The analytical pipeline developed in this study, PredRAD (https://github.com/phrh/PredRAD), and the resulting databases constitute valuable resources that will help guide the design of any study using RAD-seq or related methods.
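The relationship between genome size, recognition-sequence probability and the expected number of cut sites can be sketched with the simplest composition model: each base is drawn independently, with G and C each at GC/2 and A and T each at (1 − GC)/2. The expected number of sites is then roughly genome size × P(site), and, in reverse, an observed cleavage count divided by P(site) gives a genome size estimate. The independence assumption and function names are illustrative only; the composition models in the paper and the PredRAD pipeline may be richer (for example, accounting for dinucleotide frequencies). The example sites, EcoRI (GAATTC) and SbfI (CCTGCAGG), are palindromic, so counting one strand suffices.

```python
def site_probability(site, gc):
    """P(recognition sequence at a given position) under an independent-base
    model parameterized only by GC content (a simplifying assumption)."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in site:
        prob *= p[base]
    return prob

def expected_sites(genome_size, site, gc):
    """Approximate expected number of recognition sites in a genome (one strand)."""
    return genome_size * site_probability(site, gc)

def predict_genome_size(observed_sites, site, gc):
    """Invert expected_sites: genome size implied by an observed cleavage count."""
    return observed_sites / site_probability(site, gc)
```

At 40% GC, this model gives EcoRI a site probability of about 3.2E-4 but the GC-rich 8-cutter SbfI only about 5.8E-6, roughly a 56-fold difference, which illustrates why enzyme choice and genomic composition together set the number of RAD-seq markers.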