Origins and impacts of new exons

Jason Merkin*, Ping Chen*, Sampsa Hautaniemi, Christopher Burge
doi: http://dx.doi.org/10.1101/009282

Mammalian genes are typically broken into several protein-coding and non-coding exons, but the evolutionary origins and functions of new exons are not well understood. Here, we analyzed patterns of exon gain using deep cDNA sequencing data from several mammals and one bird, identifying thousands of species- and lineage-specific exons. While exons conserved across mammals are mostly protein-coding and constitutively spliced, species-specific exons were mostly located in 5′ untranslated regions and alternatively spliced. New exons most often derived from unique intronic sequence rather than repetitive elements, and were associated with upstream intronic deletions, increased nucleosome occupancy and RNA polymerase II pausing. Surprisingly, exon gain was associated with increased gene expression, but only in tissues where the exon was included, suggesting that splicing enhances steady-state mRNA levels and that changes in splicing represent a major contributor to the evolution of gene expression.

Different tastes for different individuals

Kohei Fujikura
doi: http://dx.doi.org/10.1101/009357

Individual taste differences were first reported in the first half of the 20th century, but the primary reasons for these differences have remained uncertain. Much of the taste variation among different mammalian species can be explained by pseudogenization of taste receptors. In this study, by analyzing 14 ethnically diverse populations, we investigated whether the most recent disruptions of taste receptor genes segregate with their intact forms. Our results revealed an unprecedented prevalence of segregating loss-of-function (LoF) taste receptor variants, identifying one of the most pronounced cases of functional population diversity in the human genome. LoF variant frequency was considerably higher than the overall mutation rate, and many humans harbored varying numbers of critical mutations. In particular, molecular evolutionary rates of sour and bitter receptors were far higher in humans than those of sweet, salty, and umami receptors compared with other carnivorous mammals, although not all of the taste receptor genes were identified. Many LoF variants are population-specific, some of which arose even after the population differentiation, but not before the divergence of modern and archaic (Neanderthal and Denisovan) humans. Based on these findings, we conclude that modern humans might have been losing their taste receptor genes because of high-frequency LoF taste receptor variants. Finally, I demonstrate genetic testing of taste receptors using a personal exome sequence.

The genetic ancestry of African, Latino, and European Americans across the United States.

Katarzyna Bryc, Eric Durand, J Michael Macpherson, David Reich, Joanna Mountain
doi: http://dx.doi.org/10.1101/009340

Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans brought largely by the Trans-Atlantic slave trade, shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States, and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.

Genome-Wide Mapping In A House Mouse Hybrid Zone Reveals Hybrid Sterility Loci And Dobzhansky-Muller Interactions

Leslie Turner, Bettina Harr
doi: http://dx.doi.org/10.1101/009373

Mapping hybrid defects in contact zones between incipient species can identify genomic regions contributing to reproductive isolation and reveal genetic mechanisms of speciation. The house mouse features a rare combination of sophisticated genetic tools and natural hybrid zones between subspecies. Male hybrids often show reduced fertility, a common reproductive barrier between incipient species. Laboratory crosses have identified sterility loci, but each encompasses hundreds of genes. We map genetic determinants of testis weight and testis gene expression using offspring of mice captured in a hybrid zone between M. musculus musculus and M. m. domesticus. Many generations of admixture enable high-resolution mapping of loci contributing to these sterility-related phenotypes. We identify complex interactions among sterility loci, suggesting that multiple, non-independent genetic incompatibilities contribute to barriers to gene flow in the hybrid zone.

Inference of Gorilla demographic and selective history from whole genome sequence data

Kimberly F. McManus, Joanna L. Kelley, Shiya Song, Krishna Veeramah, August E. Woerner, Laurie S. Stevison, Oliver A. Ryder, Jeffrey M. Kidd, Jeffrey D. Wall, Carlos D. Bustamante, Michael F. Hammer
doi: http://dx.doi.org/10.1101/009191

While population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor ~261 thousand years ago (kya), and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage ~68 kya. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of ~1.4-fold around ~970 kya and a recent ~5.6-fold contraction in population size ~23 kya. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.

Author post: Century-scale methylome stability in a recently diverged Arabidopsis thaliana lineage

This guest post is by Claude Becker, Jörg Hagmann and Detlef Weigel on their preprint Century-scale methylome stability in a recently diverged Arabidopsis thaliana lineage, bioRxived here.

This paper is the result of a collaboration between experts in machine learning and statistical analysis (from the group of Karsten Borgwardt at the Max Planck Institute for Intelligent Systems), a lab that has spearheaded the assembly and SNP genotyping of a world-wide collection of Arabidopsis thaliana specimens (Joy Bergelson’s lab at the University of Chicago), a group specialized in large-scale phenotyping (the lab of Thomas Altmann at the Leibniz Institute of Plant Genetics and Crop Plant Research in Gatersleben) and our epigenomics group at the Max Planck Institute for Developmental Biology in Tübingen.

The epigenome of an organism, in a restricted definition, consists of the entirety of post-translational histone modifications (e.g. methylation, acetylation, etc.) and chemical modifications to the DNA, such as methylation of cytosines. Epigenetic marks can influence the transcriptional activity of genes and transposable elements by locally modulating the accessibility of the DNA. The local configuration of the epigenome can change (i) spontaneously, (ii) as a result of genetic rearrangements, or (iii) as a consequence of external signals. That the epigenome reacts to external signals such as stress and nutrient supply and that it can influence physiological processes – even behavior – has caused much recent excitement. Academic and popular scientific articles have raised the question of whether the epigenome has the potential to maintain environmental footprints across generations. The epigenome is thus presented as an entity that fuels acclimation to rapidly changing environmental conditions and that enables adaptation in subsequent generations. Studies investigating the epigenetic basis of the inheritance of acquired traits, however, often either lack the depth of analysis necessary for the identification of locus-specific epigenetic changes or investigate inheritance over a rather short time period of only one or two generations. Moreover, many study designs do not allow for easy distinction between genetic variation causing the observed epigenetic change and epigenetic differences independent of DNA sequence variation.

In our new study we aim to tackle the question of how far long exposure to varying and diverse environmental conditions can change the heritable DNA methylation landscape. We overcome several of the above-mentioned problems and limitations by studying variation of DNA methylation in a quasi-isogenic lineage of the model plant Arabidopsis thaliana. North America (NA) was only recently colonized by A. thaliana, and approximately half of the current population is made up of a single lineage that underwent a recent population bottleneck, having diverged from a common ancestor a century or two ago, resulting in minimal genetic diversity in the current population [1].

We sequenced the genome and DNA methylome of thirteen closely related NA accessions originating from different geographical locations in order to determine the spectrum, frequency and effect of epigenetic variants. We then compared the epigenetic variation in the NA lineage to that of a previously analyzed set of isogenic A. thaliana lines that had been propagated for 30 generations in the greenhouse [2,3].

Pairwise comparison of the NA accessions revealed that only 3% of genome-wide methylation was variable. By using the genetic mutations as a molecular clock, we found that – contrary to our expectation – epimutations did not accumulate at a higher rate under varying natural conditions compared to growth in a stable greenhouse environment. Even more surprisingly, changes in DNA methylation of single cytosines and of larger contiguous regions were often seen in both NA and greenhouse-grown accessions. In both datasets, accumulation of epimutations over time was non-linear, likely reflecting frequent reversions of methylation changes back to the initial configuration. Population structure inferred from methylation data reflected the genetic relatedness of the accessions and showed no signal of a genome-wide environmental footprint. This, together with the fact that most epigenetic variants were neutral and did not correlate with changes in gene expression, indicated that epigenetic variants accumulate to a large extent as a function of time and genetic diversification rather than as a consequence of local adaptation to environmental changes.

In summary, we have shown that long-term methylome variation of plants grown in varying and diverse natural sites is largely stable at the whole-genome level and in several aspects is intriguingly similar to that of lines raised in uniform conditions. This does not rule out a limited number of subtle adaptive DNA methylation changes that are linked to specific growth conditions, but it is in stark contrast to the published claims of broad, genome-wide epigenetic variation reflecting local adaptation. Heritable polymorphisms that arise in response to specific growth conditions certainly appear to be much less frequent than those that arise spontaneously or due to genetic variation.

In addition to the biological findings discussed above, an important part of our paper is an improved method for the detection of differentially methylated regions. Past studies have relied on clustering of differentially methylated positions or on fixed sliding windows, with the caveat of high rates of false negatives and false positives, respectively. We have adapted a Hidden Markov Model, initially developed for animal methylation data, to the more complex DNA methylation patterns in plants. Upon identification of methylated regions in each strain, these are then tested for differential methylation between strains. Our method results in increased specificity and higher accuracy and we believe it will be of broad interest to the epigenomics community.
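
To make the idea of HMM-based region calling concrete, here is a minimal sketch and not the method from the paper. It assumes a two-state model (unmethylated, methylated), binomial emissions on per-cytosine read counts, and fixed, hand-picked parameters; the function names and the values of p_meth, p_unmeth and p_stay are hypothetical choices for illustration only.

```python
import math


def log_binom(k, n, p):
    """Log-probability of k methylated reads out of n under rate p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def viterbi_methylation(counts, p_meth=0.8, p_unmeth=0.05, p_stay=0.95):
    """Most likely state per cytosine (0 = unmethylated, 1 = methylated).

    counts: list of (methylated_reads, total_reads), one per cytosine.
    All parameter values are illustrative assumptions, not fitted.
    """
    if not counts:
        return []
    emit_p = (p_unmeth, p_meth)
    log_stay, log_switch = math.log(p_stay), math.log(1 - p_stay)

    k, n = counts[0]
    score = [math.log(0.5) + log_binom(k, n, emit_p[s]) for s in (0, 1)]
    back = []  # back[i][s] = best previous state for state s at site i + 1

    for k, n in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[prev] + (log_stay if prev == s else log_switch)
                     for prev in (0, 1)]
            best_prev = 0 if cands[0] >= cands[1] else 1
            pointers.append(best_prev)
            new_score.append(cands[best_prev] + log_binom(k, n, emit_p[s]))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def methylated_regions(path):
    """Collapse a state path into half-open (start, end) index ranges."""
    regions, start = [], None
    for i, state in enumerate(path):
        if state == 1 and start is None:
            start = i
        elif state == 0 and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(path)))
    return regions
```

Regions called this way in each strain could then be compared between strains with a separate differential test, which this sketch does not attempt.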

References

1. Platt A, Horton M, Huang YS, Li Y, Anastasio AE, et al. (2010) The scale of population structure in Arabidopsis thaliana. PLoS Genet 6: e1000843.

2. Becker C, Hagmann J, Müller J, Koenig D, Stegle O, et al. (2011) Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 480: 245-249.

3. Schmitz RJ, Schultz MD, Lewsey MG, O’Malley RC, Urich MA, et al. (2011) Transgenerational epigenetic instability is a source of novel methylation variants. Science 334: 369-373.

Non-crossover gene conversions show strong GC bias and unexpected clustering in humans

Amy Williams, Giulio Genovese, Thomas Dyer, Katherine Truax, Goo Jun, Nick Patterson, Joanne E. Curran, Ravi Duggirala, John Blangero, David Reich, Molly Przeworski
doi: http://dx.doi.org/10.1101/009175

Although the past decade has seen tremendous progress in our understanding of fine-scale recombination, little is known about non-crossover (or “gene conversion”) resolutions. We report the first genome-wide study of non-crossover gene conversion events in humans. Using SNP array data from 94 meioses, we identified 107 sites affected by non-crossover events, of which 51/53 were confirmed in sequence data. Our results suggest that a site is involved in a non-crossover event at a rate of 6.7 × 10⁻⁶/bp/generation, consistent with results from sperm-typing studies. Observed non-crossover events show strong allelic bias, with 70% (61–79%) of events transmitting GC alleles (P = 7.9 × 10⁻⁵), and have tract lengths that vary over more than an order of magnitude. Strikingly, in 4 of 15 regions with available resequencing data, multiple (~2–4) distinct non-crossover events cluster within ~20–30 kb. This pattern has not been reported previously in mammals and is inconsistent with canonical models of double strand break repair.

Century-scale methylome stability in a recently diverged Arabidopsis thaliana lineage

Joerg Hagmann, Claude Becker, Jonas Müller, Oliver Stegle, Rhonda C Meyer, Korbinian Schneeberger, Joffrey Fitz, Thomas Altmann, Joy Bergelson, Karsten Borgwardt, Detlef Weigel
doi: http://dx.doi.org/10.1101/009225

There has been much excitement about the possibility that exposure to specific environments can induce an ecological memory in the form of wholesale, genome-wide epigenetic changes that are maintained over many generations. In the model plant Arabidopsis thaliana, numerous heritable DNA methylation differences have been identified in greenhouse-grown isogenic lines, but it remains unknown how natural, highly variable environments affect the rate and spectrum of such changes. Here we present detailed methylome analyses in a geographically dispersed A. thaliana population that constitutes a collection of near-isogenic lines, diverged for at least a century from a common ancestor. We observed little DNA methylation divergence genome-wide. Nonetheless, methylome variation largely reflected genetic distance, and was in many aspects similar to that of lines raised in uniform conditions. Thus, even when plants are grown in varying and diverse natural sites, genome-wide epigenetic variation accumulates in a clock-like manner, and epigenetic divergence thus parallels the pattern of genome-wide DNA sequence divergence.

Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell type-specific expression

Maxwell W Libbrecht, Ferhat Ay, Michael M Hoffman, David M Gilbert, Jeffrey A Bilmes, William Stafford Noble
doi: http://dx.doi.org/10.1101/009209

The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation, in which regions of hundreds or thousands of kilobases known as domains are regulated as a unit. Previous studies using genomics assays such as chromatin immunoprecipitation (ChIP)-seq and chromatin conformation capture (3C)-based assays have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods can incorporate only data sets that can be expressed as a one-dimensional vector over the genome and therefore cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain. Using this approach, we produced a comprehensive model of chromatin domains in eight human cell types, thereby revealing the relationships among known domain types. Through this model, we identified clusters of tightly-regulated genes expressed in only a small number of cell types, which we term “specific expression domains.” We additionally found that a subset of domain boundaries marked by promoters and CTCF motifs are consistent between cell types even when domain activity changes. 
Finally, we showed that GBR can be used for the seemingly unrelated task of transferring information from well-studied cell types to less well characterized cell types during genome annotation, making it possible to produce high-quality annotations of the hundreds of cell types with limited available data.

Author post: Segregation distorters are not a primary source of Dobzhansky-Muller incompatibilities in house mouse hybrids

This guest post is by Russ Corbett-Detig, Emily Jacobs-Palmer, and Hopi Hoekstra (@hopihoekstra) on their paper, Segregation distorters are not a primary source of Dobzhansky-Muller incompatibilities in house mouse hybrids, bioRxived here.

What are segregation distorters and how can they contribute to reproductive isolation?

Within an individual, somatic cells are typically genetic clones of one another; in contrast, haploid gametes are related to their compatriots at only half of all loci on average, opening doors to intra-individual competition and conflict. Eggs and sperm may express selfish genetic elements called segregation distorters (SDs) that disable or destroy competitor gametes carrying unrelated alleles. The resulting transmission advantage attained by SDs allows them to invade populations without improving the fitness of individuals that harbor them. Indeed, SDs often negatively impact carriers’ fitness because such hosts transmit fewer fit (or viable) gametes. Hence natural selection favors the evolution of alleles that suppress distortion and thereby restore fertility.

Coevolution of SDs and their suppressors can in turn contribute to the evolution of reproductive isolation between diverging lineages. How? If two populations become temporarily isolated from one another, SDs and later their accompanying suppressors may arise and eventually fix in one isolated population, possibly multiple times over. Should the two populations then encounter each other again, the sperm of hybrid males, for example, will contain one or more distorters without the appropriate suppressors, and these males will suffer decreased fertility. Over time, gene flow may be substantially and perhaps permanently hindered, leading to the formation of two reproductively isolated species.

In some Drosophila species pairs, and in many crop plants, it is clear that the coevolution of SDs and their suppressors is a major, even primary, contributor to the evolution of reproductive isolation between diverging lineages. At present, however, the relative importance of SD-suppressor systems to reproductive isolation in broader taxonomic swathes of sexually reproducing organisms (e.g. mammals) is largely unexplored.

Our solution to the practical challenges of studying SDs

The primary impediment to addressing this important question in evolutionary biology is practical, not conceptual. Conventionally, researchers detect SD-suppressor systems by crossing two strains to produce a large second-generation hybrid population; they then genotype these hybrids at a set of markers across the genome to identify loci that show substantive deviations from 50:50 Mendelian ratios, which are putative SDs. This traditional approach suffers from two major pitfalls. First, for many organisms it is not feasible to raise and genotype enough hybrids (hundreds to thousands) to have sufficient statistical power to detect SDs, especially those with weaker effects. Second, by genotyping these second-generation hybrids, rather than the gametes of their parents, one conflates SD with hybrid inviability, and it can be very difficult to disentangle these two factors.
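
To give a feel for why hundreds to thousands of hybrids are needed, here is a rough back-of-the-envelope sketch, not a calculation from the paper. It uses the standard normal-approximation sample-size formula for a two-sided test of a proportion against 0.5; the function name and default values of alpha and power are illustrative assumptions, and it treats each transmitted allele as an independent observation.

```python
import math
from statistics import NormalDist


def alleles_needed(true_ratio, alpha=0.05, power=0.8):
    """Approximate number of transmitted alleles needed to detect a
    deviation from 50:50 transmission at a single marker.

    true_ratio: assumed true transmission frequency of the distorter.
    alpha, power: two-sided significance level and desired power
    (illustrative defaults; a genome-wide scan would need a much
    smaller alpha to correct for multiple markers).
    """
    if not 0.0 < true_ratio < 1.0 or true_ratio == 0.5:
        raise ValueError("true_ratio must be in (0, 1) and differ from 0.5")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    null_sd = 0.5  # sqrt(0.5 * 0.5) under Mendelian transmission
    alt_sd = math.sqrt(true_ratio * (1 - true_ratio))
    effect = abs(true_ratio - 0.5)
    return math.ceil(((z_alpha * null_sd + z_power * alt_sd) / effect) ** 2)
```

Under these assumptions, a weak distorter transmitted 55% of the time already requires several hundred observed alleles at a single marker, and tightening alpha to account for many markers pushes the number higher still.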

How to circumvent these challenges? In this work, we develop an alternative approach. We first obtain high-quality, motile sperm from first-generation hybrid males (generated from two strains with available genome sequences), and then sequence these sperm in bulk as well as a somatic ‘control’ tissue. We then contrast the relative representation of the parental chromosomes in windows across the genome in both samples, searching for regions where the sperm allele ratios show more DNA copies of one parental haplotype, but the somatic alleles do not. Importantly, this approach is very general, and it can easily be applied to any number of interspecific or intraspecific crosses where it is possible to obtain large quantities of viable gametes.
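
The sperm-versus-soma contrast can be sketched in a few lines. This is an illustration under simplifying assumptions, not the statistic used in the paper: it takes per-window counts of reads assigned to each parental haplotype, applies a two-proportion z-test with a normal approximation, and uses a Bonferroni correction across windows. The function names and the input layout are hypothetical, and real read counts are typically overdispersed, so a test this simple would tend to be too liberal.

```python
import math


def window_distortion(sperm_a, sperm_b, soma_a, soma_b):
    """Compare the share of parental haplotype A in sperm versus soma.

    Arguments are read counts assigned to haplotype A or B in one window.
    Returns (z, two_sided_p) from a two-proportion z-test.
    """
    n_sperm = sperm_a + sperm_b
    n_soma = soma_a + soma_b
    pooled = (sperm_a + soma_a) / (n_sperm + n_soma)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_sperm + 1 / n_soma))
    if se == 0:
        return 0.0, 1.0  # all reads from one haplotype in both samples
    z = (sperm_a / n_sperm - soma_a / n_soma) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def scan_windows(windows, alpha=0.05):
    """Return names of windows whose sperm ratio departs from the soma ratio.

    windows: list of (name, sperm_a, sperm_b, soma_a, soma_b) tuples.
    Uses a Bonferroni-corrected threshold of alpha / number of windows.
    """
    threshold = alpha / len(windows)
    flagged = []
    for name, sperm_a, sperm_b, soma_a, soma_b in windows:
        _, p_value = window_distortion(sperm_a, sperm_b, soma_a, soma_b)
        if p_value < threshold:
            flagged.append(name)
    return flagged
```

Using the somatic sample as the baseline, rather than a fixed 50:50 expectation, means that mapping or copy-number biases shared by both tissues cancel out instead of masquerading as distortion.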

Little evidence for SDs in house mouse hybrids

We apply this method to a nascent pair of Mus musculus subspecies, M. m. castaneus and M. m. domesticus. We chose these subspecies because hybrid males formed in this cross are known to be partially reproductively dysfunctional. Nonetheless, using our novel method we find no evidence supporting the presence of SDs (no genomic regions showing a statistical deviation from 50:50 compared to control tissue), despite strong statistical power to detect them. We conclude that SDs do not contribute appreciably to the evolution of reproductive isolation in this nascent species pair. Instead, reproductive isolation in these mammalian subspecies likely stems from other incompatibilities in spermatogenesis or ejaculate production unrelated to SD-suppressor coevolution.

So what’s next? Because this approach—bulk sequencing of sperm from hybrid males—can be used on almost any pair of interfertile taxa, we can begin to better understand the prevalence of SD and its role in speciation in a wide diversity of species.