Accounting for eXentricities: Analysis of the X chromosome in GWAS reveals X-linked genes implicated in autoimmune diseases
Diana Chang, Feng Gao, Li Ma, Aaron Sams, Andrea Slavney, Yedael Waldman, Paul Billing-Ross, Aviv Madar, Richard Spritz, Alon Keinan
Many complex human diseases are highly sexually dimorphic, which suggests a potential contribution of the X chromosome. However, the X chromosome has been neglected in most genome-wide association studies (GWAS). We present tailored analytical methods and software that facilitate X-wide association studies (XWAS), which we then applied to reanalyze data from 16 GWAS of different autoimmune diseases (AID). We associated several X-linked genes with disease risk, among which ARHGEF6 is associated with Crohn’s disease and replicated in a study of ulcerative colitis, another inflammatory bowel disease (IBD). Indeed, ARHGEF6 interacts with a gastric bacterium that has been implicated in IBD. Additionally, we found that the centromere protein CENPI is associated with three different AID; replicated a previously investigated association of FOXP3, which regulates genes involved in T-cell function, in vitiligo; and discovered that C1GALT1C1 exhibits a sex-specific effect on disease risk in both IBDs. These and other X-linked genes that we associated with AID tend to be highly expressed in tissues related to immune response, display differential gene expression between males and females, and participate in major immune pathways. Combined, the results demonstrate the importance of the X chromosome in autoimmunity, reveal the potential of XWAS, even based on existing data, and provide the tools and incentive to appropriately include the X chromosome in future studies.
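A sex-stratified analysis of the kind behind a sex-specific finding such as C1GALT1C1 can be sketched with two standard tests on per-sex summary statistics: Stouffer's method to combine male and female z-scores, and a z-test for a difference between the male and female effect sizes. This is a minimal illustration under stated assumptions, not the XWAS software's interface: the function names are hypothetical, the square-root-of-sample-size weights are one common choice, and the z-scores and effect sizes are assumed to be already computed for the same allele at a single X-linked SNP.

```python
import math
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def stouffer_combined_p(z_male, z_female, n_male, n_female):
    """Combine sex-stratified z-scores (Stouffer), weighting by sqrt(sample size).

    Assumes both z-scores are signed with respect to the same allele, so
    effects in opposite directions partially cancel.
    """
    w_m, w_f = math.sqrt(n_male), math.sqrt(n_female)
    z = (w_m * z_male + w_f * z_female) / math.sqrt(w_m**2 + w_f**2)
    return two_sided_p(z)


def sex_difference_p(beta_male, se_male, beta_female, se_female):
    """Test whether per-sex effect sizes (e.g. log odds ratios) differ."""
    z = (beta_male - beta_female) / math.sqrt(se_male**2 + se_female**2)
    return two_sided_p(z)


# Hypothetical summary statistics for one X-linked SNP.
print(stouffer_combined_p(z_male=2.0, z_female=2.0, n_male=5000, n_female=5000))
print(sex_difference_p(beta_male=0.5, se_male=0.1, beta_female=0.0, se_female=0.1))
```

With equal weights, two concordant z-scores of 2.0 combine to about 2.83, so the combined p-value is smaller than either sex alone would give; the second call flags an effect present in males but absent in females.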
Accounting for experimental noise reveals that transcription dominates control of steady-state protein levels in yeast
Gábor Csárdi, Alexander Franks, David S. Choi, Eduardo M. Airoldi, D. Allan Drummond
Cells respond to their environment by modulating protein levels through mRNA transcription and post-transcriptional control. Modest correlations between global steady-state mRNA and protein measurements have been interpreted as evidence that transcript levels determine roughly 40% of the variation in protein levels, indicating dominant post-transcriptional effects. However, the techniques underlying these conclusions, such as correlation and regression, yield biased results when data are noisy, missing systematically, and collinear—properties of mRNA and protein measurements—which motivated us to revisit this subject. Noise-robust analyses of 25 studies of budding yeast reveal that mRNA levels explain roughly 80% of the variation in steady-state protein levels. Post-transcriptional regulation amplifies rather than competes with the transcriptional signal. Measurements are highly reproducible within but not between studies, and are distorted in part by between-study differences in gene expression. These results substantially revise current models of protein-level regulation and introduce multiple noise-aware approaches essential for proper analysis of many biological phenomena.
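How measurement noise deflates a correlation can be illustrated with Spearman's classic correction for attenuation, which divides the observed correlation by the square root of the product of the two variables' reliabilities, here estimated from replicate measurements. This is a simplified stand-in, not the authors' noise-robust method: it assumes independent, non-systematic noise and complete data (it does not handle the systematic missingness or collinearity the abstract mentions), and the simulated mRNA and protein values are hypothetical.

```python
import numpy as np


def disattenuated_correlation(x1, x2, y1, y2):
    """Spearman's correction for attenuation using replicate measurements."""
    # Reliability of each variable: correlation between its two replicates.
    rel_x = np.corrcoef(x1, x2)[0, 1]
    rel_y = np.corrcoef(y1, y2)[0, 1]
    # Observed cross-correlation, averaged over the four replicate pairings.
    r_obs = np.mean([np.corrcoef(a, b)[0, 1] for a in (x1, x2) for b in (y1, y2)])
    return r_obs / np.sqrt(rel_x * rel_y)


# Hypothetical simulation: true (log) mRNA and protein levels correlate at 0.9.
rng = np.random.default_rng(0)
n = 20_000
mrna = rng.normal(size=n)
protein = 0.9 * mrna + np.sqrt(1 - 0.9**2) * rng.normal(size=n)


def measure(true_values):
    # Each replicate adds independent noise with the same variance as the
    # signal, so a single measurement has a reliability of about 0.5.
    return true_values + rng.normal(size=n)


m1, m2 = measure(mrna), measure(mrna)
p1, p2 = measure(protein), measure(protein)

naive = np.corrcoef(m1, p1)[0, 1]
corrected = disattenuated_correlation(m1, m2, p1, p2)
print(f"naive r^2 = {naive**2:.2f}, corrected r^2 = {corrected**2:.2f}")
```

In this setup the naive correlation lands near 0.45 (about 20% of variance "explained"), while the corrected value recovers roughly the true 0.9 (about 81%), mirroring the kind of gap between naive and noise-aware estimates that the abstract reports.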
The genetic architecture of neurodevelopmental disorders
Kevin J Mitchell
Neurodevelopmental disorders include rare conditions caused by identified single mutations, such as Fragile X, Down and Angelman syndromes, and much more common clinical categories such as autism, epilepsy and schizophrenia. These common conditions are all highly heritable but their genetics is considered to be “complex”. In fact, this sharp dichotomy in genetic architecture between rare and common disorders may be largely artificial. On the one hand, much of the apparent complexity in the genetics of common disorders may derive from underlying genetic heterogeneity, which has remained obscure until recently. On the other hand, even for supposedly Mendelian conditions, the relationship between single mutations and clinical phenotypes is rarely simple. The categories of monogenic and complex disorders may therefore merge across a continuum, with some mutations being strongly associated with specific syndromes and others having a more variable outcome, modified by the presence of additional genetic variants.
MUSiCC: Towards an accurate estimation of average genomic copy-numbers in the human microbiome
Ohad Manor, Elhanan Borenstein
Functional metagenomic analyses commonly involve a normalization step, where measured levels of genes or pathways are converted into relative abundances. Here, we demonstrate that this normalization scheme introduces marked biases both across and within human microbiome samples and systematically identify various sample- and gene-specific properties that contribute to these biases. We introduce an alternative normalization paradigm, MUSiCC, which combines universal single-copy genes with machine learning methods to correct these biases and to obtain a more accurate and biologically meaningful measure of gene abundances. Finally, we demonstrate that MUSiCC significantly improves downstream discovery of functional shifts in the microbiome. MUSiCC is available at http://elbo.gs.washington.edu/software.html.
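The contrast between relative-abundance normalization and normalization against universal single-copy genes (USCGs) can be sketched in a few lines. This assumes a simplified version of the idea: each gene's read count is divided by the median count of USCGs in the same sample, giving an estimated average copy number per genome. The gene names and counts are hypothetical, and MUSiCC's machine-learning correction of USCG-specific biases is not modeled here.

```python
from statistics import median


def relative_abundance(counts):
    """Standard scheme: each gene's share of all counts in the sample."""
    total = sum(counts.values())
    return {gene: c / total for gene, c in counts.items()}


def uscg_normalize(counts, uscgs):
    """Estimated copies per genome: counts divided by the median USCG count."""
    reference = median(counts[g] for g in uscgs if g in counts)
    return {gene: c / reference for gene, c in counts.items()}


USCGS = ["uscg1", "uscg2", "uscg3"]  # hypothetical single-copy marker genes

# Two hypothetical samples with the same copy number of geneX per genome;
# the genomes in sample B also carry additional content (geneY).
sample_a = {"uscg1": 100, "uscg2": 100, "uscg3": 100, "geneX": 100}
sample_b = {**sample_a, "geneY": 400}

for name, sample in [("A", sample_a), ("B", sample_b)]:
    rel = relative_abundance(sample)["geneX"]
    copies = uscg_normalize(sample, USCGS)["geneX"]
    print(f"sample {name}: relative abundance {rel:.3f}, copies per genome {copies:.2f}")
```

The relative abundance of geneX halves in sample B (0.250 to 0.125) even though its per-genome copy number is unchanged (1.00 in both), which is the kind of compositional bias that a copy-number-based measure avoids.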
Disentangling effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex trait loci
Gosia Trynka, Harm-Jan Westra, Kamil Slowikowski, Xinli Hu, Han Xu, Barbara E Stranger, Buhm Han, Soumya Raychaudhuri
Identifying genomic annotations that differentiate causal from associated variants is critical to fine-map disease loci. While many studies have identified non-coding annotations overlapping disease variants, these annotations colocalize, complicating fine-mapping efforts. We demonstrate that conventional enrichment tests are inflated and cannot distinguish causal effects from colocalizing annotations. We developed a sensitive and specific statistical approach that is able to identify independent effects from colocalizing annotations. We first confirm that gene regulatory variants map to DNase-I hypersensitive sites (DHS) near transcription start sites. We then show that (1) 15-35% of causal variants within disease loci map to DHS independent of other annotations; (2) breast cancer and rheumatoid arthritis loci harbor potentially causal variants near the summits of histone marks rather than full peak bodies; and (3) variants associated with height are highly enriched for embryonic stem cell DHS sites. We highlight specific loci where we can most effectively prioritize causal variation.
Origins and impacts of new exons
Jason Merkin*, Ping Chen*, Sampsa Hautaniemi, Christopher Burge
Mammalian genes are typically broken into several protein-coding and non-coding exons, but the evolutionary origins and functions of new exons are not well understood. Here, we analyzed patterns of exon gain using deep cDNA sequencing data from several mammals and one bird, identifying thousands of species- and lineage-specific exons. While exons conserved across mammals are mostly protein-coding and constitutively spliced, species-specific exons were mostly located in 5′ untranslated regions and alternatively spliced. New exons most often derived from unique intronic sequence rather than repetitive elements, and were associated with upstream intronic deletions, increased nucleosome occupancy and RNA polymerase II pausing. Surprisingly, exon gain was associated with increased gene expression, but only in tissues where the exon was included, suggesting that splicing enhances steady-state mRNA levels and that changes in splicing represent a major contributor to the evolution of gene expression.
Different tastes for different individuals
Individual taste differences were first reported in the first half of the 20th century, but the primary reasons for these differences have remained uncertain. Much of the taste variation among different mammalian species can be explained by pseudogenization of taste receptors. In this study, by analyzing 14 ethnically diverse populations, we investigated whether the most recent disruptions of taste receptor genes segregate with their intact forms. Our results revealed an unprecedented prevalence of segregating loss-of-function (LoF) taste receptor variants, identifying one of the most pronounced cases of functional population diversity in the human genome. LoF variant frequency was considerably higher than the overall mutation rate, and many humans harbored varying numbers of critical mutations. In particular, compared with other carnivorous mammals, molecular evolutionary rates of sour and bitter receptors in humans were far higher than those of sweet, salty, and umami receptors, although not all taste receptor genes have been identified. Many LoF variants are population-specific, some of which arose even after population differentiation, though not before the divergence of modern and archaic (Neanderthal and Denisovan) humans. Based on these findings, we conclude that modern humans might have been losing their taste receptor genes because of high-frequency LoF taste receptor variants. Finally, we demonstrate genetic testing of taste receptors using personal exome sequence data.
The genetic ancestry of African, Latino, and European Americans across the United States
Katarzyna Bryc, Eric Durand, J Michael Macpherson, David Reich, Joanna Mountain
Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans brought largely by the Trans-Atlantic slave trade, shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States, and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.
Genome-Wide Mapping In A House Mouse Hybrid Zone Reveals Hybrid Sterility Loci And Dobzhansky-Muller Interactions
Leslie Turner, Bettina Harr
Mapping hybrid defects in contact zones between incipient species can identify genomic regions contributing to reproductive isolation and reveal genetic mechanisms of speciation. The house mouse features a rare combination of sophisticated genetic tools and natural hybrid zones between subspecies. Male hybrids often show reduced fertility, a common reproductive barrier between incipient species. Laboratory crosses have identified sterility loci, but each encompasses hundreds of genes. We map genetic determinants of testis weight and testis gene expression using offspring of mice captured in a hybrid zone between M. musculus musculus and M. m. domesticus. Many generations of admixture enable high-resolution mapping of loci contributing to these sterility-related phenotypes. We identify complex interactions among sterility loci, suggesting that multiple, non-independent genetic incompatibilities contribute to barriers to gene flow in the hybrid zone.
Inference of Gorilla demographic and selective history from whole genome sequence data
Kimberly F. McManus, Joanna L. Kelley, Shiya Song, Krishna Veeramah, August E. Woerner, Laurie S. Stevison, Oliver A. Ryder, , Jeffrey M. Kidd, Jeffrey D. Wall, Carlos D. Bustamante, Michael F. Hammer
While population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor ~261 thousand years ago (kya), and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage ~68 kya. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of ~1.4-fold around ~970 kya and a recent ~5.6-fold contraction in population size ~23 kya. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.
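The site frequency spectrum (SFS) that the diffusion-approximation model is fitted to can be tabulated directly from per-site derived-allele counts. The sketch below assumes biallelic sites with known ancestral states for the unfolded spectrum, and also shows the folded spectrum used when the ancestral allele is uncertain. The function names and input counts are hypothetical, and the demographic fitting step itself is not shown. For the 14 diploid western lowland gorillas analyzed here, n would be 28 chromosomes.

```python
def unfolded_sfs(derived_counts, n):
    """Number of sites with k derived alleles, for k = 1..n-1 (polymorphic sites only)."""
    sfs = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"derived count {k} outside 0..{n}")
        sfs[k] += 1
    return sfs[1:n]  # drop monomorphic classes k = 0 and k = n


def folded_sfs(derived_counts, n):
    """Number of sites with minor-allele count j, for j = 1..n//2."""
    sfs = [0] * (n // 2 + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"derived count {k} outside 0..{n}")
        sfs[min(k, n - k)] += 1
    return sfs[1:]  # drop the monomorphic class j = 0


counts = [1, 1, 2, 3, 0, 4]  # hypothetical derived-allele counts in n = 4 chromosomes
print(unfolded_sfs(counts, 4))  # [2, 1, 1]
print(folded_sfs(counts, 4))    # [3, 1]
```

Under a neutral model with constant population size, the expected unfolded SFS is proportional to 1/k; departures from that shape (for example, a relative deficit of rare variants after a recent contraction) are what demographic inference exploits.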