Disentangling effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex trait loci

Disentangling effects of colocalizing genomic annotations to functionally prioritize non-coding variants within complex trait loci

Gosia Trynka, Harm-Jan Westra, Kamil Slowikowski, Xinli Hu, Han Xu, Barbara E Stranger, Buhm Han, Soumya Raychaudhuri
doi: http://dx.doi.org/10.1101/009258

Identifying genomic annotations that differentiate causal from associated variants is critical to fine-map disease loci. While many studies have identified non-coding annotations overlapping disease variants, these annotations colocalize, complicating fine-mapping efforts. We demonstrate that conventional enrichment tests are inflated and cannot distinguish causal effects from colocalizing annotations. We developed a sensitive and specific statistical approach that is able to identify independent effects from colocalizing annotations. We first confirm that gene regulatory variants map to DNase-I hypersensitive sites (DHS) near transcription start sites. We then show that (1) 15-35% of causal variants within disease loci map to DHS independent of other annotations; (2) breast cancer and rheumatoid arthritis loci harbor potentially causal variants near the summits of histone marks rather than full peak bodies; and (3) variants associated with height are highly enriched for embryonic stem cell DHS sites. We highlight specific loci where we can most effectively prioritize causal variation.

Origins and impacts of new exons

Origins and impacts of new exons
Jason Merkin*, Ping Chen*, Sampsa Hautaniemi, Christopher Burge
doi: http://dx.doi.org/10.1101/009282

Mammalian genes are typically broken into several protein-coding and non-coding exons, but the evolutionary origins and functions of new exons are not well understood. Here, we analyzed patterns of exon gain using deep cDNA sequencing data from several mammals and one bird, identifying thousands of species- and lineage-specific exons. While exons conserved across mammals are mostly protein-coding and constitutively spliced, species-specific exons were mostly located in 5′ untranslated regions and alternatively spliced. New exons most often derived from unique intronic sequence rather than repetitive elements, and were associated with upstream intronic deletions, increased nucleosome occupancy and RNA polymerase II pausing. Surprisingly, exon gain was associated with increased gene expression, but only in tissues where the exon was included, suggesting that splicing enhances steady-state mRNA levels and that changes in splicing represent a major contributor to the evolution of gene expression.

Different tastes for different individuals

Different tastes for different individuals
Kohei Fujikura
doi: http://dx.doi.org/10.1101/009357

Individual taste differences were first reported in the first half of the 20th century, but the primary reasons for these differences have remained uncertain. Much of the taste variation among different mammalian species can be explained by pseudogenization of taste receptors. In this study, by analyzing 14 ethnically diverse populations, we investigated whether the most recent disruptions of taste receptor genes segregate with their intact forms. Our results revealed an unprecedented prevalence of segregating loss-of-function (LoF) taste receptor variants, identifying one of the most pronounced cases of functional population diversity in the human genome. LoF variant frequency was considerably higher than the overall mutation rate, and many humans harbored varying numbers of critical mutations. In particular, molecular evolutionary rates of sour and bitter receptors were far higher in humans than those of sweet, salty, and umami receptors compared with other carnivorous mammals although not all of the taste receptors genes were identified. Many LoF variants are population-specific, some of which arose even after the population differentiation, but not before divergence of the modern and archaic (Neanderthal and Denisovan) human. Based on these findings, we conclude that modern humans might have been losing their taste receptor genes because of high-frequency LoF taste receptor variants. Finally I actually demonstrated the genetic testing of taste receptors from personal exome sequence.

The genetic ancestry of African, Latino, and European Americans across the United States.

The genetic ancestry of African, Latino, and European Americans across the United States.
Katarzyna Bryc, Eric Durand, J Michael Macpherson, David Reich, Joanna Mountain
doi: http://dx.doi.org/10.1101/009340

Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans brought largely by the Trans-Atlantic slave trade, shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States, and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.

Genome-Wide Mapping In A House Mouse Hybrid Zone Reveals Hybrid Sterility Loci And Dobzhansky-Muller Interactions

Genome-Wide Mapping In A House Mouse Hybrid Zone Reveals Hybrid Sterility Loci And Dobzhansky-Muller Interactions
Leslie Turner, Bettina Harr
doi: http://dx.doi.org/10.1101/009373

Mapping hybrid defects in contact zones between incipient species can identify genomic regions contributing to reproductive isolation and reveal genetic mechanisms of speciation. The house mouse features a rare combination of sophisticated genetic tools and natural hybrid zones between subspecies. Male hybrids often show reduced fertility, a common reproductive barrier between incipient species. Laboratory crosses have identified sterility loci, but each encompasses hundreds of genes. We map genetic determinants of testis weight and testis gene expression using offspring of mice captured in a hybrid zone between M. musculus musculus and M. m. domesticus. Many generations of admixture enables high-resolution mapping of loci contributing to these sterility-related phenotypes. We identify complex interactions among sterility loci, suggesting multiple, non-independent genetic incompatibilities contribute to barriers to gene flow in the hybrid zone. 

Inference of Gorilla demographic and selective history from whole genome sequence data

Inference of Gorilla demographic and selective history from whole genome sequence data

Kimberly F. McManus, Joanna L. Kelley, Shiya Song, Krishna Veeramah, August E. Woerner, Laurie S. Stevison, Oliver A. Ryder, , Jeffrey M. Kidd, Jeffrey D. Wall, Carlos D. Bustamante, Michael F. Hammer
doi: http://dx.doi.org/10.1101/009191

While population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor ~261 thousand years ago (kya), and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage ~68 kya. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of ~1.4-fold around ~970 kya and a recent ~5.6-fold contraction in population size ~23 kya. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.

Author post: Century-scale methylome stability in a recently diverged Arabidopsis thaliana lineage

This guest post is by Claude Becker, Jörg Hagmann and Detlef Weigel on their preprint Century-scale methylome stability in a recently diverged Arabidopsis thaliana lineage, bioRxived here.

This paper is the result of a collaboration between experts in machine learning and statistical analysis (from the group of Karsten Borgwardt at the Max Planck Institute of Intelligent Systems), a lab that has spearheaded the assembly and SNP genotyping of a world-wide collection of Arabidopsis thaliana specimen (Joy Bergelson’s lab at the University of Chicago), a group specialized in large-scale phenotyping (the lab of Thomas Altmann at the Leibniz Institute of Plant Genetics and Crop Plant Research in Gatersleben) and our epigenomics group at the Max Planck Institute for Developmental Biology in Tübingen.

The epigenome of an organism, in a restricted definition, consists of the entirety of post-translational histone modifications (e.g. methylation, acetylation, etc.) and chemical modifications to the DNA, such as methylation of cytosines. Epigenetic marks can influence the transcriptional activity of genes and transposable elements by locally modulating the accessibility of the DNA. The local configuration of the epigenome can change (i) spontaneously, (ii) in dependence of genetic rearrangements, or (iii) as a consequence of external signals. That the epigenome reacts to external signals such as stress and nutrient supply and that it can influence physiological processes – even behavior – has caused much recent excitement. Academic and popular scientific articles have raised the question whether the epigenome has the potential to maintain environmental footprints across generations. The epigenome is thus presented as an entity that fuels acclimation to rapidly changing environmental conditions and that enables adaptation in subsequent generations. Studies investigating the epigenetic basis of the inheritance of acquired traits, however, often either lack the depth of analysis necessary for the identification of locus-specific epigenetic changes or investigate inheritance over a rather short time period of only one or two generations. Moreover, many study designs do not allow for easy distinction between genetic variation causing the observed epigenetic change and epigenetic differences independent of DNA sequence variation.

In our new study we aim to tackle the question to what extent long exposure to varying and diverse environmental conditions can change the heritable DNA methylation landscape. We overcome several of the above-mentioned problems and limitations by studying variation of DNA methylation in a quasi-isogenic lineage of the model plant Arabidopsis thaliana. North America (NA) was only recently colonized by A. thaliana, and approximately half of the current population is made of a single lineage that underwent a recent population bottleneck, having diverged from a common ancestor more a century or two ago, resulting in minimal genetic diversity in the current population [1].

We sequenced the genome and DNA methylome of thirteen closely related NA accessions originating from different geographical locations in order to determine the spectrum, frequency and effect of epigenetic variants. We then compared the epigenetic variation in the NA lineage to that of a previously analyzed set of isogenic A. thaliana lines that had been propagated for 30 generations in the greenhouse [2,3].

Pairwise comparison of the NA accessions revealed that only 3% of the genome-wide methylation showed variable methylation. By using the genetic mutations as a molecular clock, we found that – contrary to our expectation – epimutations did not accumulate at a higher rate under varying natural conditions compared to growth in a stable greenhouse environment. Even more surprisingly, changes in DNA methylation of single cytosines and of larger contiguous regions were often seen in both NA and greenhouse-grown accessions. In both datasets, accumulation of epimutations over time was non-linear, likely reflecting frequent reversions of methylation changes back to the initial configuration. Population structure inferred from methylation data reflected the genetic relatedness of the accessions and showed no signal of a genome-wide environmental footprint. This, together with the fact that most epigenetic variants were neutral and did not correlate with changes in gene expression, indicated that epigenetic variants accumulate to a large extent as a function of time and genetic diversification rather than as a consequence of local adaptation to environmental changes.

In summary, we have shown that long-term methylome variation of plants grown in varying and diverse natural sites is largely stable at the whole-genome level and in several aspects is intriguingly similar to that of lines raised in uniform conditions. This does not rule out a limited number of subtle adaptive DNA methylation changes that are linked to specific growth conditions, but it is in stark contrast to the published claims of broad, genome-wide epigenetic variation reflecting local adaptation. Heritable polymorphisms that arise in response to specific growth conditions certainly appear to be much less frequent than those that arise spontaneously or due to genetic variation.

In addition to the biological findings discussed above, an important part of our paper is an improved method for the detection of differentially methylated regions. Past studies have relied on clustering of differentially methylated positions or on fixed sliding windows, with the caveat of high rates of false negatives and false positives, respectively. We have adapted a Hidden Markov Model, initially developed for animal methylation data, to the more complex DNA methylation patterns in plants. Upon identification of methylated regions in each strain, these are then tested for differential methylation between strains. Our method results in increased specificity and higher accuracy and we believe it will be of broad interest to the epigenomics community.


1. Platt A, Horton M, Huang YS, Li Y, Anastasio AE, et al. (2010) The scale of population structure in Arabidopsis thaliana. PLoS Genet 6: e1000843.

2. Becker C, Hagmann J, Müller J, Koenig D, Stegle O, et al. (2011) Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 480: 245-249.

3. Schmitz RJ, Schultz MD, Lewsey MG, O’Malley RC, Urich MA, et al. (2011) Transgenerational epigenetic instability is a source of novel methylation variants. Science 334: 369-373.