Exploring the genetic patterns of complex diseases via the integrative genome-wide approach

Ben Teng, Can Yang, Jiming Liu, Zhipeng Cai, Xiang Wan
(Submitted on 26 Jan 2015)

Motivation: Genome-wide association studies (GWASs), which assay more than a million single nucleotide polymorphisms (SNPs) in thousands of individuals, have been widely used to identify genetic risk variants for complex diseases. However, most of the variants that have been identified contribute relatively small increments of risk and explain only a small portion of the genetic variation in complex diseases. This is the so-called missing heritability problem. Evidence has indicated that many complex diseases are genetically related, meaning that these diseases share common genetic risk variants. Therefore, exploring the genetic correlations across multiple related studies could be a promising strategy for removing spurious associations and identifying underlying genetic risk variants, thereby helping to explain the missing heritability in complex diseases.

Results: We present a general and robust method to identify genetic patterns from multiple large-scale genomic datasets. We treat the summary statistics as a matrix and demonstrate that genetic patterns will form a low-rank matrix plus a sparse component. Hence, we formulate the problem as a matrix recovery problem, where we aim to discover risk variants shared by multiple diseases/traits and those specific to each individual disease/trait. We propose a convex formulation for matrix recovery and an efficient algorithm to solve the problem. We demonstrate the advantages of our method using both synthesized datasets and real datasets. The experimental results show that our method can successfully reconstruct both the shared and the individual genetic patterns from summary statistics and achieves better performance than alternative methods under a wide range of scenarios.
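To make the "low-rank matrix plus a sparse component" idea concrete, here is a minimal sketch of a generic convex decomposition (principal component pursuit solved with a simple ADMM loop). It is an illustration only, not the authors' formulation or algorithm, which the abstract does not spell out. The function names, the default weight `lam`, and the step size `mu` are assumptions chosen for the example, and the input is assumed to be a non-zero SNP-by-trait matrix of summary statistics.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink every entry of x toward zero by tau (proximal step for the L1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Shrink the singular values of x by tau (proximal step for the nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def low_rank_plus_sparse(m, lam=None, mu=None, n_iter=500, tol=1e-7):
    """Split m into low + sparse by minimising ||low||_* + lam * ||sparse||_1.

    The defaults for lam and mu are common heuristics, assumed here
    for illustration; they are not taken from the paper.
    """
    m = np.asarray(m, dtype=float)
    if lam is None:
        lam = 1.0 / np.sqrt(max(m.shape))
    if mu is None:
        mu = m.size / (4.0 * np.abs(m).sum())  # assumes m is not all zeros
    low = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    for _ in range(n_iter):
        # Update the shared (low-rank) pattern, then the trait-specific (sparse) one.
        low = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low + dual / mu, lam / mu)
        residual = m - low - sparse
        dual += mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(m):
            break
    return low, sparse
```

In this reading, rows of `low` that stand out would correspond to variants shared across traits, while non-zero entries of `sparse` would flag variant-trait pairs that deviate from the shared pattern.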

Genetics of intra-species variation in avoidance behavior induced by a thermal stimulus in C. elegans

Rajarshi Ghosh, Joshua S Bloom, Aylia Mohammadi, Molly E Schumer, Peter Andolfatto, William S Ryu, Leonid Kruglyak
doi: http://dx.doi.org/10.1101/014290

Individuals within a species vary in their responses to a wide range of stimuli, partly as a result of differences in their genetic makeup. Relatively little is known about the genetic and neuronal mechanisms contributing to diversity of behavior in natural populations. By studying animal-to-animal variation in innate avoidance behavior to thermal stimuli in the nematode Caenorhabditis elegans, we uncovered genetic principles of how different components of a behavioral response can be altered in nature to generate behavioral diversity. Using a thermal pulse assay, we uncovered heritable variation in responses to a transient temperature increase. Quantitative trait locus mapping revealed that separate components of this response were controlled by distinct genomic loci. The loci we identified contributed to variation in components of thermal pulse avoidance behavior in an additive fashion. Our results show that the escape behavior induced by thermal stimuli is composed of simpler behavioral components that are influenced by at least six distinct genetic loci. The loci that decouple components of the escape behavior reveal a genetic system that allows independent modification of behavioral parameters. Our work sets the foundation for future studies of evolution of innate behaviors at the molecular and neuronal level.

Partitioning heritability by functional category using GWAS summary statistics

Hilary Kiyo Finucane, Brendan Bulik-Sullivan, Alexander Gusev, Gosia Trynka, Yakir Reshef, Po-Ru Loh, Verneri Anttilla, Han Xu, Chongzhi Zang, Kyle Farh, Stephan Ripke, Felix Day, ReproGen Consortium, Schizophrenia Working Group of the Psychiatric Genetics Consortium, RACI Consortium, Shaun Purcell, Eli Stahl, Sara Lindstrom, John R.B. Perry, Yukinori Okada, Soumya Raychaudhuri, Mark Daly, Nick Patterson, Benjamin M. Neale, Alkes L. Price
doi: http://dx.doi.org/10.1101/014241

Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here, we analyze a broad set of functional elements, including cell-type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits spanning a total of 1.3 million phenotype measurements. To enable this analysis, we introduce a new method for partitioning heritability from GWAS summary statistics while controlling for linked markers. This new method is computationally tractable at very large sample sizes, and leverages genome-wide information. Our results include a large enrichment of heritability in conserved regions across many traits; a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers; and many cell-type-specific enrichments including significant enrichment of central nervous system cell types in body mass index, age at menarche, educational attainment, and smoking behavior. These results demonstrate that GWAS can aid in understanding the biological basis of disease and provide direction for functional follow-up.

Ancestry specific association mapping in admixed populations

Line Skotte, Thorfinn Sand S Korneliussen, Ida Moltke, Anders Albrechtsen
doi: http://dx.doi.org/10.1101/014001

As recently demonstrated in several genetic association studies, historically small and isolated populations can offer increased statistical power due to extended linkage disequilibrium and increased genetic drift over many generations. However, many such populations, like the Greenlandic Inuit population, have recently experienced substantial admixture with other populations, which can complicate association studies. One important complication is that most current methods for association testing assume that the effect of the tested genetic marker is the same regardless of ancestry. This is a reasonable assumption for a causal variant, but may not hold for the genetic markers that are tested in association studies, which are usually not causal. The effects of non-causal genetic markers depend on how strongly their presence correlates with the presence of the causal marker, and this may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect sizes are allowed to depend on the ancestry of the allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a dramatic increase in statistical power to detect associations. In addition, the method allows for testing for a difference in effect size between ancestral populations, which can be used to determine if a SNP is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.

Integrating crop growth models with whole genome prediction through approximate Bayesian computation

Frank Technow, Carlos D. Messina, L. Radu Totir, Mark Cooper
doi: http://dx.doi.org/10.1101/014100

Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across-environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (GxE) continue to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of GxE and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this approach and demonstrate its use with a simulated data set. We show that this approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising approach to improving prediction accuracy for some of the most challenging scenarios of interest to applied geneticists.
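For readers unfamiliar with ABC, the core idea can be sketched as a rejection sampler: draw parameters from a prior, push them through a simulator, and keep only the draws whose simulated output lands close to the observed data. The sketch below is generic and is not the procedure used in the paper. The function name `abc_rejection` and all of its arguments are hypothetical, and `simulate` stands in for whatever model (for example a CGM) maps parameters to phenotypes.

```python
import numpy as np


def abc_rejection(observed, simulate, sample_prior, distance, n_draws, tolerance, rng):
    """Generic ABC rejection sampler (illustrative sketch).

    simulate(theta, rng)   -> simulated data for parameter theta
    sample_prior(rng)      -> one draw of theta from the prior
    distance(sim, observed)-> non-negative discrepancy between data sets
    Returns the accepted parameter draws, which approximate the posterior.
    """
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        simulated = simulate(theta, rng)
        # Keep the draw only if the simulated data resemble the observation.
        if distance(simulated, observed) <= tolerance:
            accepted.append(theta)
    return np.array(accepted)
```

A smaller `tolerance` gives a closer approximation to the posterior at the cost of accepting fewer draws, which is the usual trade-off when the simulator is expensive.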

Mutation detection in candidate genes for paratuberculosis resistance in sheep

Bianca Moioli, Luigi De Grossi, Roberto Steri, Silvia D’Andrea, Fabio Pilla
doi: http://dx.doi.org/10.1101/014035

Marker-assisted selection exploits anonymous genetic markers that have been associated with measurable differences in complex traits; because it is based on linkage disequilibrium between the polymorphic markers and the polymorphisms that code for the trait, its success is limited to the population in which the association has been assessed. Identifying the gene that affects the target trait and detecting the functional mutations will allow selection in independent populations, while encouraging studies on gene expression. A genome-wide scan performed with the Illumina Ovine SNP50K Beadchip on 100 sheep, 50 of which tested positive in the paratuberculosis serological assessment, identified two candidate immune-response genes, PCP4 and CD109, located near the markers whose allele frequencies differed between positive and negative sheep. The coding regions of the two genes were directly sequenced, and three missense mutations were detected: two in the PCP4 gene and one in the second exon of the CD109 gene. The PCP4 mutations had very low frequencies (0.12 and 0.07), making it hazardous to hypothesize a direct effect on immune response. In contrast, the mutation detected in the CD109 gene showed strong linkage disequilibrium with the anonymous marker. Direct sequencing of DNA from sheep of different populations showed that this disequilibrium was maintained. Allele frequencies at the hypothesized marker associated with immune response, calculated for other breeds of sheep, showed that the marker allele potentially associated with disease resistance is more frequent in local breeds and in breeds that have not been subjected to selection programs.

The genetics of resistance to Morinda fruit toxin during the postembryonic stages in Drosophila sechellia

Yan Huang, Deniz Erezyilmaz
doi: http://dx.doi.org/10.1101/014027

Many phytophagous insect species are ecological specialists that have adapted to utilize a single host plant. Drosophila sechellia is a specialist that utilizes the ripe fruit of Morinda citrifolia, which is toxic to its sibling species, D. simulans. Here we apply multiplexed shotgun genotyping and QTL analysis to examine the genetic basis of resistance to M. citrifolia fruit toxin in interspecific hybrids. We find that at least four dominant and four recessive loci interact additively to confer resistance to the M. citrifolia fruit toxin. These QTL include a dominant locus of large effect on the third chromosome (QTL-IIIsima) that was not detected in previous analyses. The small-effect loci that we identify overlap with regions that were identified in selection experiments with D. simulans on octanoic acid and in QTL analyses of adult resistance to octanoic acid. Our high-resolution analysis sheds new light on the complexity of M. citrifolia resistance, and suggests that partial resistance to lower levels of M. citrifolia toxin could be passed through introgression from D. sechellia to D. simulans in nature. The identification of a locus of major effect, QTL-IIIsima, is an important step towards identifying the molecular basis of host plant specialization by D. sechellia.

MultiMeta: an R package for meta-analysing multi-phenotype genome-wide association studies

Dragana Vuckovic, Paolo Gasparini, Nicole Soranzo, Valentina Iotchkova
doi: http://dx.doi.org/10.1101/013920

Summary: As new methods for multivariate analysis of Genome Wide Association Studies (GWAS) become available, it is important to be able to combine results from different cohorts in a meta-analysis. The R package MultiMeta provides an implementation of the inverse-variance based method for meta-analysis, generalized to an n-dimensional setting. Availability: The R package MultiMeta can be downloaded from CRAN. Contact: dragana.vuckovic@burlo.trieste.it
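As a rough illustration of what an inverse-variance method looks like when generalized to several phenotypes, each cohort's effect vector can be weighted by the inverse of its covariance matrix. The sketch below is written in Python with NumPy; it is not the MultiMeta API, and the function name `inverse_variance_meta` and its arguments are hypothetical. It assumes every cohort reports an effect vector of the same length together with an invertible covariance matrix.

```python
import numpy as np


def inverse_variance_meta(betas, covs):
    """Pool per-cohort effect vectors using inverse-variance (precision) weights.

    betas: list of length-p arrays, one effect estimate per phenotype.
    covs:  list of p x p covariance matrices for those estimates.
    Returns the pooled effect vector and its covariance matrix.
    """
    # Precision matrix of each cohort acts as its weight.
    precisions = [np.linalg.inv(np.asarray(c, dtype=float)) for c in covs]
    # Pooled covariance is the inverse of the summed precisions.
    pooled_cov = np.linalg.inv(sum(precisions))
    weighted = sum(w @ np.asarray(b, dtype=float) for w, b in zip(precisions, betas))
    return pooled_cov @ weighted, pooled_cov
```

With a single phenotype (p = 1) this reduces to the familiar fixed-effect formula, in which each estimate is weighted by one over its variance.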

Genome-engineering with CRISPR-Cas9 in the mosquito Aedes aegypti

Kathryn E Kistler, Leslie B Vosshall, Benjamin J Matthews
doi: http://dx.doi.org/10.1101/013276

The mosquito Aedes aegypti is a potent vector of the Chikungunya, yellow fever, and Dengue viruses, which result in hundreds of millions of infections and over 50,000 human deaths per year. Loss-of-function mutagenesis in Ae. aegypti has been established with TALENs, ZFNs, and homing endonucleases, which require the engineering of DNA-binding protein domains to generate target specificity for a particular stretch of genomic DNA. Here, we describe the first use of the CRISPR-Cas9 system to generate targeted, site-specific mutations in Ae. aegypti. CRISPR-Cas9 relies on RNA-DNA base-pairing to generate targeting specificity, resulting in cheaper, faster, and more flexible genome-editing reagents. We investigate the efficiency of reagent concentrations and compositions, demonstrate the ability of CRISPR-Cas9 to generate several different types of mutations via disparate repair mechanisms, and show that stable germ-line mutations can be readily generated at the vast majority of genomic loci tested. This work offers a detailed exploration into the optimal use of CRISPR-Cas9 in Ae. aegypti that should be applicable to non-model organisms previously out of reach of genetic modification.

Testing for genetic associations in arbitrarily structured populations

Minsun Song, Wei Hao, John D. Storey
doi: http://dx.doi.org/10.1101/012682

We present a new statistical test of association between a trait (either quantitative or binary) and genetic markers, which we theoretically and practically prove to be robust to arbitrarily complex population structure. The statistical test involves a set of parameters that can be directly estimated from large-scale genotyping data, such as that measured in genome-wide association studies (GWAS). We also derive a new set of methodologies, called a genotype-conditional association test (GCAT), shown to provide accurate association tests in populations with complex structures, manifested in both the genetic and environmental contributions to the trait. We demonstrate the proposed method on a large simulation study and on the Northern Finland Birth Cohort study. In the Finland study, we identify several new significant loci that other methods do not detect. Our proposed framework provides a substantially different approach to the problem from existing methods. We provide some discussion of its similarities to and differences from the linear mixed model and principal component approaches.