A Robust Model-free Approach for Rare Variants Association Studies Incorporating Gene-Gene and Gene-Environmental Interactions
Ruixue Fan, Shaw-Hwa Lo
(Submitted on 2 Dec 2013)
A growing body of evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare variants association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants but do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method that is designed specifically for detecting both marginal effects and effects due to gene-gene (G-G) and gene-environmental (G-E) interactions for rare variants association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G-G or G-E interactions. Second, it does not sacrifice the marginal detection power; when rare variants only have marginal effects, it is comparable with the most competitive method in the current literature. Third, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can readily tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G-G and G-E interactions.
Natural Allelic Variations of Xenobiotic Enzymes Pleiotropically Affect Sexual Dimorphism in Oryzias latipes
Takafumi Katsumura, Shoji Oda, Shigeki Nakagome, Tsunehiko Hanihara, Hiroshi Kataoka, Hiroshi Mitani, Shoji Kawamura, Hiroki Oota
Sexual dimorphisms, which are phenotypic differences between males and females, are driven by sexual selection [1, 2]. Interestingly, sexually selected traits show geographic variations within species despite strong directional selective pressures [3, 4]. However, genetic factors that regulate varied sexual differences remain unknown. In this study, we show that polymorphisms in cytochrome P450 (CYP) 1B1, which encodes a xenobiotic-metabolising enzyme, are associated with local differences of sexual dimorphisms in the anal fin morphology of medaka fish (Oryzias latipes). High and low activity CYP1B1 alleles increased and decreased differences in anal fin sizes, respectively. Behavioural and phylogenetic analyses suggest maintenance of the high activity allele by sexual selection, whereas the low activity allele may have evolved by positive selection due to by-product effects of CYP1B1. The present data can elucidate evolutionary mechanisms behind genetic variations in sexual dimorphism and indicate pleiotropic effects of xenobiotic enzymes.
Joint analysis of functional genomic data and genome-wide association studies of 18 human traits
Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWAS). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. We describe a statistical model that uses association statistics computed across the genome to identify classes of genomic element that are enriched or depleted for loci that influence a trait. The model naturally incorporates multiple types of annotations. We applied the model to GWAS of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, BMI, and Crohn’s disease. For each trait, we evaluated the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over a hundred tissues and cell lines. We show that the fraction of phenotype-associated SNPs that influence protein sequence ranges from around 2% (for platelet volume) up to around 20% (for LDL cholesterol); that repressed chromatin is significantly depleted for SNPs associated with several traits; and that cell type-specific DNase-I hypersensitive sites are enriched for SNPs associated with several traits (for example, fibroblasts in Crohn’s disease and muscle tissue in bone density). Finally, by re-weighting each GWAS using information from functional genomics, we increase the number of loci with high-confidence associations by around 5%.
Data Mining of Online Genealogy Datasets for Revealing Lifespan Patterns in Human Population
Michael Fire, Yuval Elovici
(Submitted on 18 Nov 2013)
Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can assist in identifying various patterns in human population. In this study, we present methods and algorithms which can assist in identifying variations in lifespan distributions of human population in the past centuries, in detecting social and genetic features which correlate with human lifespan, and in constructing predictive models of human lifespan based on various features which can easily be extracted from genealogy datasets.
We have evaluated the presented methods and algorithms on a large online genealogy dataset with over a million profiles and over 8.8 million connections, all of which were collected from the WikiTree website. Our findings indicate that significant but small positive correlations exist between the parents’ lifespan and their children’s lifespan. Additionally, we found slightly higher and significant correlations between the lifespans of spouses. We also discovered a very small positive and significant correlation between longevity and reproductive success in males, and a small and significant negative correlation between longevity and reproductive success in females. Moreover, our machine learning algorithms achieved better-than-random classification results in predicting which people who live past the age of 50 will also live past the age of 80.
We believe that this study will be the first of many studies which utilize the wealth of data on human populations, existing in online genealogy datasets, to better understand factors which influence human lifespan. Understanding these factors can assist scientists in providing solutions for successful aging.
The evolution of sex differences in disease genetics
William P Gilks, Jessica K Abbott, Edward H Morrow
There are significant differences in the biology of males and females, ranging from biochemical pathways to behavioural responses, which are relevant to modern medicine. Broad-sense heritability estimates differ between the sexes for many common medical disorders, indicating that genetic architecture can be sex-dependent. Recent genome-wide association studies (GWAS) have successfully identified sex-specific and sex-biased effects, where in addition to sex-specific effects on gene expression, twenty-two medical traits have sex-specific or sex-biased loci. Sex-specific genetic architecture of complex traits is also extensively documented in model organisms using genome-wide linkage or association mapping, and in gene disruption studies. The evolutionary origins of sex-specific genetic architecture and sexual dimorphism lie in the fact that males and females share most of their genetic variation yet experience different selection pressures. At the extreme is sexual antagonism, where selection on an allele acts in opposite directions between the sexes. Sexual antagonism has been repeatedly identified via a number of experimental methods in a range of different taxa. Although the molecular basis remains to be identified, mathematical models predict the maintenance of deleterious variants that experience selection in a sex-dependent manner. There are multiple mechanisms by which sexual antagonism and alleles under sex-differential selection could contribute toward the genetics of common, complex disorders. The evidence we review clearly indicates that further research into sex-dependent selection and the sex-specific genetic architecture of diseases would be rewarding. This would be aided by studies of laboratory and wild animal populations, and by modelling sex-specific effects in genome-wide association data with joint, gene-by-sex interaction tests. We predict that even sexually monomorphic diseases may harbour cryptic sex-specific genetic architecture. 
Furthermore, empirical evidence suggests that investigating sex-dependent epistasis may be especially rewarding. Finally, the prevalent nature of sex-specific genetic architecture in disease offers scope for the development of more effective, sex-specific therapies.
Mutant epigenetic machinery mediates climate adaptation in Arabidopsis thaliana
Xia Shen, Simon Forsberg, Mats Pettersson, Zheya Sheng, Orjan Carlborg
(Submitted on 16 Oct 2013)
The genetic basis of adaptation to climate is largely unknown. We explored the genetic regulation of climate plasticity and its contribution to adaptation using publicly available data from two collections of natural Arabidopsis thaliana accessions from a wide range of habitats. Sixteen loci with plastic alleles were mapped and many of these contained candidate genes with amino acid changes. The Chromomethylase 2 (CMT2) genotype influenced adaptation to seasonal temperature variability and accessions carrying a mutant CMT2 allele disrupting the genome-wide CHH-methylation pattern displayed a more plastic response to climate. We conclude that genetic regulation of plasticity appears to be important for climate adaptation and that genetic variation in the epigenetic machinery, leading to altered genome-wide epigenetic modifications, is one of the underlying molecular mechanisms.
forqs: Forward-in-time Simulation of Recombination, Quantitative Traits, and Selection
Darren Kessner, John Novembre
(Submitted on 11 Oct 2013)
forqs is a forward-in-time simulation of recombination, quantitative traits, and selection. It was designed to investigate haplotype patterns resulting from scenarios where substantial evolutionary change has taken place in a small number of generations due to recombination and/or selection on polygenic quantitative traits. forqs is implemented as a command-line C++ program. Source code and binary executables for Linux, OSX, and Windows are freely available under a permissive BSD license.
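To make the idea of forward-in-time simulation concrete, the following is a minimal sketch in Python. It is not forqs and does not use its code or configuration format. Every modelling choice here is an assumption made for illustration: diploid individuals with ten biallelic loci, an additive trait equal to the count of "1" alleles, truncation selection on the top half of the population, and a single crossover per gamete. The names `make_gamete` and `next_generation` are hypothetical.

```python
import numpy as np

def make_gamete(parent, rng):
    """Build one gamete from a diploid parent (array of shape 2 x n_loci)
    using a single crossover at a uniformly chosen position (an assumption)."""
    n_loci = parent.shape[1]
    first = rng.integers(2)          # which haplotype the gamete starts on
    cut = rng.integers(n_loci + 1)   # crossover position, 0..n_loci
    return np.concatenate([parent[first, :cut], parent[1 - first, cut:]])

def next_generation(pop, rng, keep_fraction=0.5):
    """Advance the population by one generation.
    pop has shape (n_individuals, 2, n_loci) with 0/1 alleles."""
    n = pop.shape[0]
    trait = pop.sum(axis=(1, 2))                # additive trait: count of 1-alleles
    n_keep = max(2, int(n * keep_fraction))
    parents = pop[np.argsort(trait)[-n_keep:]]  # truncation selection
    children = np.empty_like(pop)
    for i in range(n):
        mother, father = rng.choice(n_keep, size=2, replace=False)
        children[i, 0] = make_gamete(parents[mother], rng)
        children[i, 1] = make_gamete(parents[father], rng)
    return children

rng = np.random.default_rng(1)
pop = rng.integers(0, 2, size=(200, 2, 10))     # random starting haplotypes
mean_trait = [pop.sum(axis=(1, 2)).mean()]
for _ in range(10):
    pop = next_generation(pop, rng)
    mean_trait.append(pop.sum(axis=(1, 2)).mean())

print("mean trait per generation:", np.round(mean_trait, 2))
```

Because each generation is built explicitly from the previous one, the haplotypes in `pop` can be inspected at any point, which is the property that makes forward-in-time simulation suited to studying haplotype patterns under selection.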
Application of compressed sensing to genome wide association studies and genomic selection
Shashaank Vattikuti, James J. Lee, Stephen D. H. Hsu, Carson C. Chow
(Submitted on 8 Oct 2013)
We show that the signal-processing paradigm known as compressed sensing (CS) is applicable to genome-wide association studies (GWAS) and genomic selection (GS). The aim of GWAS is to isolate trait-associated loci, whereas GS attempts to predict the phenotypic values of new individuals on the basis of training data. CS addresses a problem common to both endeavors, namely that the number of genotyped markers often greatly exceeds the sample size. We show using CS methods and theory that all loci of nonzero effect can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability h^2 = 1, there is a sharp phase transition to complete selection as the sample size is increased. For heritability values less than one, complete selection can still occur although the transition is smoothed. The transition boundary is only weakly dependent on the total number of genotyped markers. The crossing of a transition boundary provides an objective means to determine when true effects are being recovered. For h^2 = 0.5, we find that a sample size that is thirty times the number of nonzero loci is sufficient for good recovery.
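The sparse-recovery setting described above can be illustrated with a small sketch. It uses L1-penalized regression (the lasso) solved by iterative soft thresholding, which is one standard compressed sensing technique; the abstract does not say which algorithm the authors used, so this is a swapped-in choice. The data are simulated under stated assumptions: standardized Gaussian "genotypes", five causal loci of equal effect, no noise (heritability 1), and a sample size thirty times the number of causal loci. The function names and the values of `lam` and the 0.1 selection threshold are illustrative only.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t (proximal step for the L1 penalty)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by iterative soft thresholding."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(0)
n_samples, n_markers, n_causal = 150, 500, 5   # more markers than samples

# Assumption: independent standardized markers stand in for real genotypes.
X = rng.normal(size=(n_samples, n_markers))
true_idx = rng.choice(n_markers, size=n_causal, replace=False)
beta_true = np.zeros(n_markers)
beta_true[true_idx] = 1.0
y = X @ beta_true                              # noise-free phenotype

beta_hat = lasso_ista(X, y, lam=0.05)
selected = np.flatnonzero(np.abs(beta_hat) > 0.1)

print("true loci:    ", sorted(true_idx.tolist()))
print("selected loci:", sorted(selected.tolist()))
```

Reducing `n_samples` while holding the other settings fixed is a simple way to explore where selection starts to fail in this toy setting; real genotype data have correlated markers, which this sketch ignores.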
Integrating diverse datasets improves developmental enhancer prediction
Genevieve D. Erwin, Rebecca M. Truty, Dennis Kostka, Katherine S. Pollard, John A. Capra
(Submitted on 27 Sep 2013)
Gene-regulatory enhancers have been identified by many lines of evidence, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a novel method for predicting developmental enhancers and their tissue specificity. EnhancerFinder uses a two-step multiple-kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and thousands of diverse functional genomics datasets from a variety of cell types and developmental stages. We trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser, in contrast to histone mark or sequence-based enhancer definitions commonly used. We comprehensively evaluated EnhancerFinder, and found that our integrative approach improves enhancer prediction accuracy over previous approaches that consider a single type of data. Our evaluation highlights the importance of considering information from many tissues when predicting specific types of enhancers. We find that VISTA enhancers active in embryonic heart are easier to predict than enhancers active in several other tissues due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and hits from genome-wide association studies. We demonstrate the utility of our enhancer predictions by identifying and validating a novel cranial nerve enhancer in the ZEB2 locus. Our genome-wide developmental enhancer predictions will be freely available as a UCSC Genome Browser track.
The effect of paternal age on offspring intelligence and personality when controlling for paternal trait level
Ruben C. Arslan, Lars Penke, Wendy Johnson, William G. Iacono, Matt McGue
(Submitted on 18 Sep 2013)
Paternal age at conception has been found to predict the number of new genetic mutations. We examined the effect of father’s age at birth on offspring intelligence, head circumference and personality traits. Using the Minnesota Twin Family Study sample we tested paternal age effects while controlling for parents’ trait levels measured with the same precision as offspring’s. From evolutionary genetic considerations we predicted a negative effect of paternal age on offspring intelligence, but not on other traits. Controlling for parental IQ had the effect of turning a positive zero-order association negative. We found paternal age effects on offspring IQ and MPQ Absorption, but they were not robustly significant, nor replicable with additional covariates. No other noteworthy effects were found. Parents’ intelligence and personality correlated with their ages at twin birth, which may have obscured a small negative effect of advanced paternal age (< 1% of variance explained) on intelligence. We discuss future avenues for studies of paternal age effects and suggest that stronger research designs are needed to rule out confounding factors involving birth order and the Flynn effect.
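The sign change described above, where an association is positive on its own but negative once a parental trait is controlled for, can be reproduced in a toy simulation. The sketch below is purely illustrative: the data are simulated, and the coefficients (0.5, 0.6, -0.1) are invented to show the mechanism, not estimated from the Minnesota Twin Family Study. The helper `ols` is a hypothetical name for a plain least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed data-generating model (all variables in standardized units):
# parents with higher IQ tend to have children later, parental IQ raises
# offspring IQ, and paternal age has a small negative direct effect.
parent_iq = rng.normal(size=n)
paternal_age = 0.5 * parent_iq + np.sqrt(0.75) * rng.normal(size=n)
offspring_iq = 0.6 * parent_iq - 0.1 * paternal_age + 0.8 * rng.normal(size=n)

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Zero-order association: paternal age alone.
zero_order_slope = ols([paternal_age], offspring_iq)[1]
# Adjusted association: paternal age with parental IQ as a covariate.
adjusted_slope = ols([paternal_age, parent_iq], offspring_iq)[1]

print(f"zero-order slope: {zero_order_slope:+.3f}")
print(f"adjusted slope:   {adjusted_slope:+.3f}")
```

In this simulated setting the unadjusted slope is positive because paternal age acts as a proxy for parental IQ, and the negative direct effect only appears after adjustment.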