Explicit modeling of ancestry improves polygenic risk scores and BLUP prediction

Chia-Yen Chen, Jiali Han, David J. Hunter, Peter Kraft, Alkes L. Price
doi: http://dx.doi.org/10.1101/012005

Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate the question of how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color, tanning ability and basal cell carcinoma (BCC) in European Americans (sample size from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRS) and Best Linear Unbiased Prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R^2 for hair color increased by 66% (0.0456 to 0.0755; p < 10^-16), the R^2 for tanning ability increased by 123% (0.0154 to 0.0344; p < 10^-16) and the liability-scale R^2 for BCC increased by 68% (0.0138 to 0.0232; p < 10^-16) when explicitly modeling ancestry, which prevents ancestry effects from entering into each SNP effect and being over-weighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be under-weighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
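
The idea of keeping ancestry out of the per-SNP weights can be illustrated with a toy sketch. This is not the authors' pipeline: it assumes a single ancestry covariate (for example one principal component), simple marginal least-squares weights, and hypothetical helper names (`slope_intercept`, `residualize`, `ancestry_adjusted_weights`, `polygenic_score`) chosen only for illustration.

```python
from statistics import mean


def slope_intercept(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (b, a)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0, my  # a constant predictor carries no information
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return b, my - b * mx


def residualize(values, pc):
    """Remove the linear effect of the ancestry covariate from `values`."""
    b, a = slope_intercept(pc, values)
    return [v - (a + b * p) for v, p in zip(values, pc)]


def ancestry_adjusted_weights(genotypes, phenotype, pc):
    """Marginal per-SNP effects estimated after ancestry is regressed out
    of both the phenotype and each SNP (assumed setup, one covariate)."""
    y_res = residualize(phenotype, pc)
    weights = []
    for j in range(len(genotypes[0])):
        g_res = residualize([row[j] for row in genotypes], pc)
        b, _ = slope_intercept(g_res, y_res)
        weights.append(b)
    return weights


def polygenic_score(genotype_row, weights):
    """Weighted sum of allele counts for one individual."""
    return sum(g * w for g, w in zip(genotype_row, weights))
```

On a four-person toy data set with `pc = [0, 0, 1, 1]`, one SNP with allele counts `[0, 1, 1, 2]`, and a phenotype built as `3*pc + 0.5*g`, the unadjusted marginal slope is 2.0 while the ancestry-adjusted weight is 0.5, which shows how an unmodeled ancestry effect can leak into a SNP weight. A full predictor would then add the fitted ancestry term and the score as separate components.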

A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians

Heejung Shim, Daniel I Chasman, Joshua D Smith, Samia Mora, Paul M Ridker, Deborah A Nickerson, Ronald M Krauss, Matthew Stephens
doi: http://dx.doi.org/10.1101/011270

We conducted a genome-wide association analysis of 7 subfractions of low density lipoproteins (LDLs) and 3 subfractions of intermediate density lipoproteins (IDLs) measured by gradient gel electrophoresis, and their response to statin treatment, in 1868 individuals of European ancestry from the Pharmacogenomics and Risk of Cardiovascular Disease study. Our analyses identified four previously implicated loci (SORT1, APOE, LPA, and CETP) as containing variants that are very strongly associated with lipoprotein subfractions (log10 Bayes Factor > 15). Subsequent conditional analyses suggest that three of these (APOE, LPA and CETP) likely harbor multiple independently associated SNPs. Further, while different variants typically showed different characteristic patterns of association with combinations of subfractions, the two SNPs in CETP show strikingly similar patterns – both in our original data and in a replication cohort – consistent with a common underlying molecular mechanism. Notably, the CETP variants are very strongly associated with LDL subfractions, despite showing no association with total LDLs in our study, illustrating the potential value of the more detailed phenotypic measurements. In contrast with these strong subfraction associations, genetic association analysis of subfraction response to statins showed much weaker signals (none exceeding log10 Bayes Factor of 6). However, two SNPs (in APOE and LPA) previously reported to be associated with LDL statin response do show some modest evidence for association in our data, and the subfraction response profiles at the LPA SNP are consistent with the LPA association, with response likely being due primarily to resistance of Lp(a) particles to statin therapy. An additional important feature of our analysis is that, unlike most previous analyses of multiple related phenotypes, we analyzed the subfractions jointly, rather than one at a time. Comparisons of our multivariate analyses with standard univariate analyses demonstrate that multivariate analyses can substantially increase power to detect associations. Software implementing our multivariate analysis methods is available at http://stephenslab.uchicago.edu/software.html.

Exploring the phenotypic space and the evolutionary history of a natural mutation in Drosophila melanogaster

Anna Ullastres, Natalia Petit, Josefa González
doi: http://dx.doi.org/10.1101/010918

A major challenge of modern biology is elucidating the functional consequences of natural mutations. While we have a good understanding of the effects of lab-induced mutations on molecular- and organismal-level phenotypes, the study of natural mutations has lagged behind. In this work, we explore the phenotypic space and the evolutionary history of a previously identified adaptive transposable element insertion. We first combined several tests that capture different signatures of selection to show that there is evidence of positive selection in the regions flanking the FBti0019386 insertion. We then explored several phenotypes related to known phenotypic effects of nearby genes and having plausible connections to fitness variation in nature. We found that flies with the FBti0019386 insertion had a shorter developmental time and were more sensitive to stress, which are likely to be the adaptive effect and the cost of selection of this mutation, respectively. Interestingly, these phenotypic effects are not consistent with a role of FBti0019386 in temperate adaptation, as has been previously suggested. Indeed, a global analysis of the population frequency of FBti0019386 showed that clinal frequency patterns are found in North America and Australia but not in Europe. Finally, we showed that FBti0019386 is associated with down-regulation of sra, most likely because it induces the formation of heterochromatin by recruiting HP1a protein. Overall, our integrative approach allowed us to shed light on the evolutionary history, the relevant fitness effects and the likely molecular mechanisms of an adaptive mutation, and highlights the complexity of natural genetic variants.

Genome-wide association study of carbon and nitrogen metabolism in the maize nested association mapping population

Nengyi Zhang, Yves Gibon, Nicholas Lepak, Pinghua Li, Lauren Dedow, Charles Chen, Yoon-Sup So, Jason Wallace, Karl Kremling, Peter Bradbury, Thomas Brutnell, Mark Stitt, Edward Buckler
doi: http://dx.doi.org/10.1101/010785

Carbon (C) and nitrogen (N) metabolism are critical to plant growth and development and form the basis of yield and adaptation. We have applied high-throughput metabolite analyses to over 12,000 diverse field-grown samples from the maize nested association mapping population. This allowed us to identify natural variation controlling the levels of twelve key C and N metabolites, often with single-gene resolution. In addition to expected genes like invertases, critical natural variation was identified in key C4 metabolism genes like carbonic anhydrases and a malate transporter. Unlike prior maize studies, extensive pleiotropy was found for C and N metabolites. This integration of field-derived metabolite data with powerful mapping and genomics resources allows dissection of key metabolic pathways, providing avenues for future genetic improvement.

A 22,403 marker composite genetic linkage map for cassava (Manihot esculenta Crantz) derived from ten populations

International Cassava Genetic Map Consortium
doi: http://dx.doi.org/10.1101/010637

Cassava (Manihot esculenta Crantz) is a major staple crop in Africa, Asia, and South America, and its starchy roots provide nourishment for 800 million people worldwide. Although native to South America, cassava was brought to Africa approximately 400 years ago and is now widely cultivated across sub-Saharan Africa. The widespread use of clonal planting material, however, aids the spread of disease. Breeding for disease resistance and improved yield began in the 1920s and has accelerated in the last 45 years. To assist in the rapid identification of markers for pathogen resistance and crop traits, and to accelerate breeding programs, we generated a framework map for M. esculenta Crantz derived from reduced representation sequencing (genotyping-by-sequencing [GBS]). The composite 2,412 cM map integrates ten biparental maps (comprising 3,480 meioses) and organizes 22,403 genetic markers on 18 chromosomes, in agreement with the observed karyotype. The map anchors 71.9% of the draft genome assembly and 90.7% of the predicted protein-coding genes. The resulting chromosome-anchored genome sequence provides an essential framework for identification of trait markers and causal genes as well as genomics-enhanced breeding of this important crop.

DNA methylation variation in Arabidopsis has a genetic basis and shows evidence of local adaptation

Manu J. Dubin, Pei Zhang, Dazhe Meng, Marie-Stanislas Remigereau, Edward J. Osborne, Francesco Paolo Casale, Phillip Drewe, André Kahles, Bjarni Vilhjálmsson, Joanna Jagoda, Selen Irez, Viktor Voronin, Qiang Song, Quan Long, Gunnar Rätsch, Oliver Stegle, Richard M. Clark, Magnus Nordborg
(Submitted on 21 Oct 2014)

Epigenome modulation in response to the environment potentially provides a mechanism for organisms to adapt, both within and between generations. However, neither the extent to which this occurs, nor the molecular mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects on DNA methylation were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association mapping revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) on the coding region of genes was not affected by growth temperature, but was instead strongly correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was correlated with elevated transcription levels for the genes affected. Genome-wide association mapping revealed that this effect was largely due to trans-acting loci, a significant fraction of which showed evidence of local adaptation. These findings constitute the first direct link between DNA methylation and adaptation to the environment, and provide a basis for further dissecting how environmentally driven and genetically determined epigenetic variation interact and influence organismal fitness.

Transcriptome Sequencing Reveals Widespread Gene-Gene and Gene-Environment Interactions

Alfonso Buil, Andrew A Brown, Tuuli Lappalainen, Ana Viñuela, Matthew N Davies, Houfeng F Zheng, Brent J Richards, Daniel Glass, Kerrin S Small, Richard Durbin, Timothy D Spector, Emmanouil T Dermitzakis
doi: http://dx.doi.org/10.1101/010546

Understanding the genetic architecture of gene expression is an intermediate step toward understanding the genetic architecture of complex diseases. RNA-seq technologies have improved the quantification of gene expression and allow the measurement of allele-specific expression (ASE) [1-3]. ASE is hypothesized to result from the direct effect of cis regulatory variants, but a proper estimation of the causes of ASE has not been performed to date. In this study we take advantage of a sample of twins to measure the relative contribution of genetic and environmental effects on ASE, and we find substantial effects of gene x gene (GxG) and gene x environment (GxE) interactions. We propose a model where ASE requires genetic variability in cis, a difference in the sequence of both alleles, but the magnitude of the ASE effect depends on trans genetic and environmental factors that interact with the cis genetic variants. We uncover large GxG and GxE effects on gene expression and likely complex phenotypes that currently remain elusive.
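
For readers unfamiliar with ASE, the following minimal sketch shows one common way to quantify it at a single heterozygous site: the fraction of RNA-seq reads carrying the reference allele, with an exact binomial test against balanced expression. The abstract does not describe the authors' quantification, so the function names and the choice of test are assumptions made purely for illustration.

```python
from math import comb


def allelic_ratio(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null reference fraction `p_null`.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    probs = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))
```

A site with 30 reference and 10 alternative reads has an allelic ratio of 0.75. In practice `p_null` may be set slightly away from 0.5 to account for reference mapping bias, which is a design choice this sketch leaves to the caller.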

Association Mapping across Numerous Traits Reveals Patterns of Functional Variation in Maize

Jason G Wallace, Peter Bradbury, Nengyi Zhang, Yves Gibon, Mark Stitt, Edward Buckler
doi: http://dx.doi.org/10.1101/010207

Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. Despite the vast amount of genomic data becoming available, many pressing questions remain about the nature of genetic mutations that underlie functional variation. We present the results of combining genome-wide association analysis of 41 different phenotypes in ~5,000 inbred maize lines to analyze patterns of high-resolution genetic association among 28.9 million single-nucleotide polymorphisms (SNPs) and ~800,000 copy-number variants (CNVs). We show that genic and intergenic regions have opposite patterns of enrichment, minor allele frequencies, and effect sizes, implying tradeoffs among the probability that a given polymorphism will have an effect, the detectable size of that effect, and its frequency in the population. We also find that genes tagged by GWAS are enriched for regulatory functions and are ~50% more likely to have a paralog than expected by chance, indicating that gene regulation and neofunctionalization are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.

Molecular phenotypes that are causal to complex traits can have low heritability and are expected to have small influence.

Leopold Parts
doi: http://dx.doi.org/10.1101/009506

Work on genetic makeup of complex traits has led to some unexpected findings. Molecular trait heritability estimates have consistently been lower than those of common diseases, even though it is intuitively expected that the genotype signal weakens as it becomes more dissociated from DNA. Further, results from very large studies have not been sufficient to explain most of the heritable signal, and suggest hundreds if not thousands of responsible alleles. Here, I demonstrate how trait heritability depends crucially on the definition of the phenotype, and is influenced by the variability of the assay, measurement strategy, and the quantification approach used. For a phenotype downstream of many molecular traits, it is possible that its heritability is larger than for any of its upstream determinants. I also rearticulate via models and data that if a phenotype has many dependencies, a large number of small effect alleles are expected. However, even if these alleles do drive highly heritable causal intermediates that can be modulated, it does not imply that large changes in phenotype can be obtained.
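
The dependence of heritability on assay variability can be made concrete with a small variance-components sketch. It is not taken from the preprint: it assumes independent, additive variance components and a downstream phenotype that is the sum of `k` independent upstream traits, with the same assay noise added to every measurement. All function and parameter names are hypothetical.

```python
def heritability(v_genetic, v_environment, v_assay=0.0):
    """Fraction of measured variance that is genetic, assuming the
    genetic, environmental and assay components are independent."""
    total = v_genetic + v_environment + v_assay
    if total <= 0:
        raise ValueError("total variance must be positive")
    return v_genetic / total


def downstream_heritability(k, v_genetic, v_environment, v_assay):
    """Heritability of a phenotype that sums k independent upstream traits.

    Genetic and environmental variances add across the k traits, while
    assay noise (assumed) enters only once, when the downstream
    phenotype itself is measured.
    """
    return heritability(k * v_genetic, k * v_environment, v_assay)
```

With unit genetic and environmental variance and an assay variance of 2, each measured upstream trait has heritability 1/4, whereas a downstream phenotype built from ten such traits has heritability 10/22, roughly 0.45. Under these assumptions the downstream phenotype is more heritable than any of its measured upstream determinants simply because measurement noise is a smaller share of its total variance.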

Accounting for eXentricities: Analysis of the X chromosome in GWAS reveals X-linked genes implicated in autoimmune diseases

Diana Chang, Feng Gao, Li Ma, Aaron Sams, Andrea Slavney, Yedael Waldman, Paul Billing-Ross, Aviv Madar, Richard Spritz, Alon Keinan
doi: http://dx.doi.org/10.1101/009464

Many complex human diseases are highly sexually dimorphic, which suggests a potential contribution of the X chromosome. However, the X chromosome has been neglected in most genome-wide association studies (GWAS). We present tailored analytical methods and software that facilitate X-wide association studies (XWAS), which we further applied to reanalyze data from 16 GWAS of different autoimmune diseases (AID). We associated several X-linked genes with disease risk, among which ARHGEF6 is associated with Crohn’s disease and replicated in a study of ulcerative colitis, another inflammatory bowel disease (IBD). Indeed, ARHGEF6 interacts with a gastric bacterium that has been implicated in IBD. Additionally, we found that the centromere protein CENPI is associated with three different AID; replicated a previously investigated association of FOXP3, which regulates genes involved in T-cell function, in vitiligo; and discovered that C1GALT1C1 exhibits sex-specific effect on disease risk in both IBDs. These and other X-linked genes that we associated with AID tend to be highly expressed in tissues related to immune response, display differential gene expression between males and females, and participate in major immune pathways. Combined, the results demonstrate the importance of the X chromosome in autoimmunity, reveal the potential of XWAS, even based on existing data, and provide the tools and incentive to appropriately include the X chromosome in future studies.