An age-of-allele test of neutrality for transposable element insertions not at equilibrium

Justin P. Blumenstiel, Miaomiao He, Casey M. Bergman
(Submitted on 16 Sep 2012)

How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, many previous studies have relied on the assumption of equilibrium between transposition and selection. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable. Here we derive a test of neutrality for TE insertions that does not rely on the assumption of transpositional equilibrium. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence and have had their allele frequency estimated in a population sample. By conditioning on age information provided within the sequence of a TE insertion in the form of the number of substitutions that have occurred within the fragment since insertion into a reference genome, we derive the probability distribution for the TE allele frequency in a population sample under neutrality. Taking models of population fluctuation into account, we then test the fit of predictions of our model to allele frequency data from 190 retrotransposon insertion loci in North American and African populations of Drosophila melanogaster. Using this non-equilibrium model, we are able to explain about 80% of the variance in TE insertion allele frequencies. Controlling for nonequilibrium dynamics of transposition and host demography, we demonstrate how one may detect negative selection acting against most TEs as well as evidence for a small subset of TEs being driven to high frequency by positive selection. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs or gene duplications.
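
The core idea of the abstract above, that an insertion's age constrains its expected frequency under neutrality once you condition on its presence in the reference genome, can be illustrated with a small forward simulation. This is only a sketch and not the authors' analytical derivation: it assumes a haploid Wright-Fisher population of constant size, and the function names, population size and ages are all hypothetical choices made for illustration.

```python
import random


def simulate_insertion_frequency(pop_size, age, rng):
    """Follow one new neutral insertion for `age` generations.

    The insertion starts as a single copy in a haploid Wright-Fisher
    population of `pop_size` chromosomes. Returns its final frequency.
    """
    count = 1
    for _ in range(age):
        if count == 0 or count == pop_size:
            break  # lost or fixed, nothing more can change
        p = count / pop_size
        # Binomial resampling of the next generation
        count = sum(1 for _ in range(pop_size) if rng.random() < p)
    return count / pop_size


def frequencies_given_reference(pop_size, age, n_reps, seed=0):
    """Frequencies of insertions of a given age, conditional on ascertainment.

    An insertion is kept only if one randomly chosen chromosome (standing in
    for the reference genome) carries it, which happens with probability
    equal to its current frequency.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_reps:
        freq = simulate_insertion_frequency(pop_size, age, rng)
        if rng.random() < freq:
            kept.append(freq)
    return kept


if __name__ == "__main__":
    for age in (2, 20):
        freqs = frequencies_given_reference(pop_size=50, age=age, n_reps=100)
        print(age, sum(freqs) / len(freqs))
```

In this toy setting, older ascertained insertions sit at higher frequencies on average than young ones, which is the kind of age dependence a neutral expectation has to capture before deviations can be read as selection. Population size changes and the substitution-based age estimate described in the abstract are left out.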

An excess of gene expression divergence on the X chromosome in Drosophila embryos: implications for the faster-X hypothesis

Melek A. Kayserili, Dave T. Gerrard, Pavel Tomancak, Alex T. Kalinka
(Submitted on 5 Sep 2012)

The X chromosome is present as a single copy in the heterogametic sex, and this hemizygosity is expected to drive unusual patterns of evolution on the X relative to the autosomes. For example, the hemizygosity of the X may lead to a lower chromosomal effective population size compared to the autosomes, suggesting that the X might be more strongly affected by genetic drift. However, the X may also experience stronger positive selection than the autosomes because recessive beneficial mutations will be more visible to selection on the X where they will spend less time being masked by the dominant, less beneficial allele – a proposal known as the faster-X hypothesis. Thus, empirical studies demonstrating increased genetic divergence on the X chromosome could be indicative of either adaptive or non-adaptive evolution. We measured gene expression in Drosophila species and in D. melanogaster inbred strains for both embryos and adults. In the embryos we found that expression divergence is on average more than 20% higher for genes on the X chromosome relative to the autosomes, but in contrast, in the inbred strains gene expression variation is significantly lower on the X chromosome. Furthermore, expression divergence of genes on Muller’s D element is significantly greater along the branch leading to the obscura sub-group, in which this element segregates as a neo-X chromosome. In the adults, divergence is greatest on the X chromosome for males, but not for females, yet in both sexes inbred strains harbour the lowest level of gene expression variation on the X chromosome. We consider different explanations for our results and conclude that they are most consistent within the framework of the faster-X hypothesis.
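
To make the "lower chromosomal effective population size" point concrete, here is a short sketch using the standard textbook formulas for autosomal and X-linked effective size with separate sexes. These formulas are not taken from the abstract itself. They assume male heterogamety (as in Drosophila) and ignore extra variance in reproductive success, and the function names are hypothetical.

```python
def effective_sizes(n_males, n_females):
    """Return (autosomal, X-linked) effective population sizes.

    Standard two-sex formulas, assuming males are XY and females are XX.
    """
    autosomal = 4 * n_males * n_females / (n_males + n_females)
    x_linked = 9 * n_males * n_females / (4 * n_males + 2 * n_females)
    return autosomal, x_linked


def x_to_autosome_ratio(n_males, n_females):
    """Ratio of X-linked to autosomal effective population size."""
    autosomal, x_linked = effective_sizes(n_males, n_females)
    return x_linked / autosomal


if __name__ == "__main__":
    print(x_to_autosome_ratio(500, 500))  # equal numbers of breeding males and females
    print(x_to_autosome_ratio(100, 900))  # far fewer breeding males than females
```

With equal numbers of breeding males and females the ratio is 3/4, the usual basis for expecting more drift on the X. When few males breed the ratio rises, which is one reason the drift expectation depends on the mating system.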

Transposable sequence evolution is driven by gene context

Anna-Sophie Fiston-Lavier, Charles E. Vejnar, Hadi Quesneville
(Submitted on 2 Sep 2012)

Transposable elements (TEs) in eukaryote genomes are quantitatively the main components affecting genome size, structure and expression. The dynamics of their insertion and deletion depend on diverse factors varying in strength and nature along the genome. We address here how TE sequence evolution is affected by neighboring genes and the chromatin status (euchromatin or heterochromatin) at their insertion site. We estimated the rates of evolution of TE sequences in Arabidopsis thaliana, and found that they depend on the distance to the nearest genes: TEs located close to genes evolve faster than those that are more distant. Consequently, TE sequences in heterochromatic regions, which are gene-poor regions, are surprisingly younger and longer than those elsewhere. We present a model of TE sequence dynamics in TE-rich genomes, such as maize and wheat, and in TE-poor genomes such as fly and A. thaliana.

Evolutionary genomics of transposable elements in Saccharomyces cerevisiae

Martin Carr, Douda Bensasson, Casey M. Bergman
(Submitted on 1 Sep 2012)

Saccharomyces cerevisiae is one of the premier model systems for studying the genomics and evolution of transposable elements. The availability of the S. cerevisiae genome led to many insights into its five known transposable element families (Ty1-Ty5) in the years shortly after its completion. However, subsequent advances in bioinformatics tools for analysing transposable elements and the recent availability of genome sequences for multiple strains and species of yeast motivate new investigations into Ty evolution in S. cerevisiae. Here we provide a comprehensive phylogenetic and population genetic analysis of Ty families in S. cerevisiae based on a reannotation of Ty elements in the S288c reference genome. We show that previous annotation efforts have underestimated the total copy number of Ty elements for all known families. In addition, we identify a new family of Ty3-like elements related to the S. paradoxus Ty3p which is composed entirely of degenerate solo LTRs. Phylogenetic analyses of LTR sequences identified three families with short-branch, recently active clades nested among long branch, inactive insertions (Ty1, Ty3, Ty4), one family with essentially all recently active elements (Ty2) and two families with only inactive elements (Ty3p and Ty5). Population genomic data from 38 additional strains of S. cerevisiae show that elements present in active clades are predominantly polymorphic, whereas most of the inactive elements are fixed. Finally, we use comparative genomic data to provide evidence that the Ty2 and Ty3p families have arisen in the S. cerevisiae genome by horizontal transfer. Our results demonstrate that the genome of a single individual contains important information about the state of TE population dynamics within a species and suggest that horizontal transfer may play an important role in shaping the diversity of transposable elements in unicellular eukaryotes.

Our paper: Lineage-specific transposons drove massive gene expression recruitments during the evolution of pregnancy in mammals

Our next “our paper” guest post is by Vincent Lynch [@VinJLynch] who’s just joined the UChicago faculty from a postdoc at Yale. He’s posting about his recently arXived paper:

Lineage-specific transposons drove massive gene expression recruitments during the evolution of pregnancy in mammals. ArXived here.
_________________________________________________________________________________
Explaining how morphology evolves is a major challenge in biology. While it’s clear that changes in gene regulation are ultimately responsible for the development and evolution of complex characters, we are only just beginning to understand the molecular mechanisms of gene regulatory evolution. This is largely due to the emergence of new technologies, such as mRNA-Seq and ChIP-Seq, which give biologists the tools to explore evolution across the genome and in non-model species.

We took advantage of these methods to explore the evolution of gene expression in the uterus during the origin of pregnancy in mammals. Using mRNA-Seq, we show that gene expression evolved extremely rapidly during major stages in the evolution of pregnancy, for example during the origin of maternal resource provisioning in the stem-lineage of Mammalia, placentation in the stem-lineage of Theria, and implantation in the stem-lineage of Eutheria. Using ChIP-Seq to identify the cis-regulatory elements of genes recruited into uterine expression in mammals, we found evidence that the majority of these enhancers and promoters are derived from mammalian lineage-specific transposons.

While recent technological advances are changing the way we do biology (see Wagner 2013), as these emerging methods come into the mainstream we must collectively define our new standards of evidence. What experiments and methods build a convincing case for X? Is it sufficient, for example, to conclude that a transposon donated a novel promoter to a gene if a ChIP-Seq peak for a histone mark associated with promoters lies within the transposon? If we then expand that observation across the genome, can we reasonably conclude that transposons are causally responsible for gene regulatory change? For these reasons we chose to post our manuscript as a work-in-progress to arXiv, both as our contribution to the larger discussion of what constitutes the standards of evidence in this emerging field of biology and as an opportunity to receive feedback from our colleagues to complement formal peer-review.

Vincent Lynch

Our paper: Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster

Casey Bergman [@caseybergman and @bergmanlab] kindly wrote a post about his recently arXived paper:
Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster
ArXived here.
__________________________________________________________________
As part of the Drosophila 12 Genome Project, Steve Salzberg and colleagues published a pioneering paper in 2005 showing that complete genomes of the bacterial endosymbiont Wolbachia pipientis can be extracted from the whole-genome shotgun sequence assemblies of Drosophila species. This paper always left an impression on me as a very clever way of extracting new biology from existing genomic data, and when the era of resequencing multiple strains of D. melanogaster kicked off a few years ago, it seemed like a natural extension to ask if this approach could be adapted to next-generation sequencing data to study the co-evolution of Wolbachia and Drosophila using whole genome data.

In the current work, we used short-read next generation sequencing data from two major resequencing efforts in D. melanogaster — the Drosophila Genetic Reference Panel (DGRP) and Drosophila Population Genomics Project (DPGP) — together with the reference Wolbachia genome published by Wu et al. (2005) and extracted over 175 complete Wolbachia genomes and nearly 300 complete mitochondrial genomes. Readers can find the main results in the paper, which is currently in review. I’d like to discuss here the social context of the project and some of the reasons we submitted to arXiv.

This project started out in 2010 as a summer project for a masters student, Mark Richardson, who did an amazing job developing the initial pipeline and made most of the initial discoveries in the paper. Shortly after, Mark and I started a collaboration with Frank Jiggins and Mike Magwire to verify that our in silico genotyping results made sense, and they suggested bringing in Lucy Weinert and John Welch to help with the more sophisticated Bayesian phylogenetic analysis. Another PhD student in my lab, Raquel Linheiro, adapted her transposable element detection pipeline to identify particular Wolbachia sublineages, which was crucial to linking our data with previous results. This was a great collaboration, where everyone made significant contributions, and I would collaborate with everyone again (and I hope to!).

At the time (summer 2010), we only had access to the North American strains from the DGRP sample; knowing that North American D. melanogaster are derived populations, we were cautious about the impact that population structure had on our results. We planned in early 2011 to publish on only the DGRP dataset since Mark was going off to do a PhD in Australia and I didn’t have anyone else in the group working on this project. In the summer of 2011, the African DPGP data came online and I decided to take a peek and run the pipeline on the African strains as well. This led to a major overhaul of the project and set us back a year, since all the data had to be reanalyzed together and the interpretation of the biogeography results was substantially altered. This was in some ways lucky because our initial interpretation of evidence for a selective sweep on one of the cytoplasmic lineages was probably wrong, and it saved us from having to backpedal on this misinterpretation in a later publication.

As we plugged away at trying to finish this project, we had inquiries about the status of the project from several other groups working in the Wolbachia field. Honestly this stressed me out quite a bit, since some of the inquiries were coming from post-docs in big labs. But instead of just sitting on the data, after we finalized the dataset we decided to release these data openly on our lab blog in April 2012. We decided on an open release as a way to help these teams (and others we didn’t know about), but also to get some priority in this area by providing the “gold standard” that other groups could use (and cite!). For the record, I will note that we asked two teams who contacted us about our project if they would reciprocate by sharing unpublished genomic data or in one case published genomic data that was not submitted to GenBank; both declined.

After making the decision to release the data pre-publication, it was a natural step to submit the manuscript to arXiv. I’m an open science advocate and used the Nature Preprint server occasionally in the past. I never really liked the Nature Preprint server, though, since I thought people posted there to give their manuscript the stink of being “Nature (in prep)” on their CV. And I never posted to arXiv in the past, since I always thought it was for more hardcore computational or mathematical biology. But recently, I was convinced by Rosie Redfield, Leonid Kruglyak and colleagues putting their Arsenic Life paper on arXiv that more empirical work in quantitative biology was arXiv-able. And just as with releasing our data early, it seemed like the best way to prevent being scooped was to get our results out as quickly as possible and let people know about them.

So we went for it. And I have to say the experience has been thoroughly rewarding. Submitting was a piece of cake, easier than any journal I’ve ever submitted to. Having a URL to point to allowed me to tweet about it, which brought some exposure to the paper and some new colleagues on twitter. It also allowed me to send a submitted manuscript around to colleagues for informal review, without cluttering up their inboxes with big attachments or creating a moral dilemma about who they can share the manuscript with. And somehow submitting to arXiv pushed the “it’s submitted” button in my brain, which made me a whole lot less stressed about the possibility of being scooped, and I’ve been more relaxed throughout the formal submission process. Finally, I know that the pre-publication release of the data and posting of the manuscript has led to a group in Russia incorporating these sequences into their work, and I’ve just gotten a manuscript to review from this group citing our arXiv manuscript and extending our results before our paper is even published! This is what research is all about, right: doing science, getting it out, and letting others build on it. I’ll definitely submit all the papers from my lab to arXiv, and look forward to the Haldane’s Sieve readership giving us a hard time about our manuscripts while they evolve into formal publications.

Casey Bergman

Lineage-specific transposons drove massive gene expression recruitments during the evolution of pregnancy in mammals

Vincent J. Lynch, Mauris Nnamani, Kathryn J. Brayer, Deena Emera, Joel O. Wertheim, Sergei L. Kosakovsky Pond, Frank Grützner, Stefan Bauersachs, Alexander Graf, Aurélie Kapusta, Cédric Feschotte, Günter P. Wagner
(Submitted on 22 Aug 2012)

A major challenge in biology is explaining how novel characters originate; however, the molecular mechanisms that underlie the emergence of evolutionary innovations are unclear. Here we show that while gene expression in the uterus evolves at a slow and relatively constant rate, it has been punctuated by periods of rapid change associated with the recruitment of thousands of genes into uterine expression during the evolution of pregnancy in mammals. We found that numerous genes and signaling pathways essential for the establishment of pregnancy and maternal-fetal communication evolved uterine expression in mammals. Remarkably, the majority of genes recruited into endometrial expression have cis-regulatory elements derived from lineage-specific transposons, suggesting that bursts of transposition facilitate adaptation and speciation through genomic and regulatory reorganization.