A comparative analysis of transcription factor expression during metazoan embryonic development

Alicia Schep, Boris Adryan
(Submitted on 8 Jan 2013)

During embryonic development, a complex organism is formed from a single starting cell. These processes of growth and differentiation are driven by large transcriptional changes, which follow the expression and activity of transcription factors (TFs). This study sought to compare TF expression during embryonic development in a diverse group of metazoan animals: representatives of vertebrates (Danio rerio, Xenopus tropicalis), a chordate (Ciona intestinalis) and invertebrate phyla such as insects (Drosophila melanogaster, Anopheles gambiae) and nematodes (Caenorhabditis elegans) were sampled. The different species showed overall very similar TF expression patterns, with TF expression increasing during the initial stages of development. C2H2 zinc finger TFs were over-represented and Homeobox TFs were under-represented in the early stages in all species. We further clustered TFs for each species based on their quantitative temporal expression profiles. This showed very similar TF expression trends in development in vertebrate and insect species. However, analysis of the expression of orthologous pairs between more closely related species showed that expression of most individual TFs is not conserved, following the general model of duplication and diversification. The degree of similarity in TF expression between Xenopus tropicalis and Danio rerio followed the hourglass model, with the greatest similarity occurring during the early tailbud stage in Xenopus tropicalis and the late segmentation stage in Danio rerio. However, for Drosophila melanogaster and Anopheles gambiae there were two periods of high TF transcriptome similarity, one during the Arthropod phylotypic stage at 8-10 hours into Drosophila development and the other later at 16-18 hours into Drosophila development.

Our paper: A statistical framework for joint eQTL analysis in multiple tissues

This guest post is by Timothée Flutre and William Wen on their paper “A statistical framework for joint eQTL analysis in multiple tissues” with Matthew Stephens and Jonathan Pritchard arXived here.

As large eQTL data sets are being produced for multiple tissues, it is important to leverage all the information in the data to detect eQTLs as well as to provide ways to interpret them. Motivated by this, we developed a statistical framework for eQTL discovery that allows for joint analysis of multiple tissues. Though the details are in the paper, in this blog post we take the opportunity to highlight what we think are the main statistical features.

Looking for eQTLs in multiple tissues immediately raises the question of tissue specificity. In this paper, we define an eQTL as “active” in a particular tissue if it has a non-zero genetic effect on the expression of the target gene in this tissue. Most published works implicitly use this definition to refer to tissue-specific eQTLs. One could take issue with this definition: for example, if an eQTL is very strong in one tissue and very weak in another then one might think of this as “tissue-specific”, or at least “tissue-inconsistent”, but in our paper we stick with the binary representation of activity as a useful first step. We represent the activity pattern of a potential eQTL by a binary vector called a configuration (see Han & Eskin, PLoS Genetics 2012, and Wen & Stephens, arXiv 1111.1210). As an example, the following configuration, (110), corresponds to the case where three tissues are analyzed and the eQTL is active only in the first two tissues.

In brief, we can highlight three important features of our model. First, by mapping eQTLs jointly rather than in each tissue separately, our model borrows information between the tissues in which an eQTL is active, and thereby greatly increases power. This is somewhat equivalent to relaxing the threshold of significance in the second tissue when one has already detected the eQTL in the first tissue. Second, by comparing evidence in the data for each configuration, our model provides an interpretation of how an eQTL acts in multiple tissues. In statistical terms, as more than two hypotheses are being tested (for three tissues there are 7 non-null configurations), one usually speaks of model comparison. Third, our model also estimates the proportion of each configuration in the data set. This is achieved by pooling all genes together, and thus borrowing information between them.
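To make the notion of configurations concrete, here is a minimal Python sketch. It enumerates the non-null configurations for a given number of tissues and shows how evidence for each configuration could be turned into posterior probabilities once a Bayes factor and a prior weight are available for each one. The function names and the numbers in the usage example are made up for illustration; this is generic Bayesian model comparison, not the implementation from the paper.

```python
from itertools import product


def non_null_configurations(n_tissues):
    """All binary activity patterns with at least one active tissue."""
    return [c for c in product((0, 1), repeat=n_tissues) if any(c)]


def format_configuration(config):
    """Render a configuration tuple such as (1, 1, 0) as the string '(110)'."""
    return "(" + "".join(str(x) for x in config) + ")"


def posterior_over_configurations(bayes_factors, prior_weights):
    """Posterior probability of each configuration, given that the eQTL
    is active in at least one tissue: proportional to prior weight times
    Bayes factor, normalized to sum to one."""
    unnormalized = {c: prior_weights[c] * bayes_factors[c] for c in bayes_factors}
    total = sum(unnormalized.values())
    return {c: value / total for c, value in unnormalized.items()}


if __name__ == "__main__":
    configs = non_null_configurations(3)
    print(len(configs), [format_configuration(c) for c in configs])

    # Made-up numbers: equal prior weights, stronger evidence for (111).
    priors = {c: 1.0 / len(configs) for c in configs}
    bfs = {c: 1.0 for c in configs}
    bfs[(1, 1, 1)] = 8.0
    for c, p in posterior_over_configurations(bfs, priors).items():
        print(format_configuration(c), round(p, 3))
```

In the paper the prior weights are themselves estimated by pooling all genes; in this sketch they are simply supplied by the caller.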

Besides simulations, we re-analyzed the largest data set available so far, 3 tissues from 75 individuals, from Dimas et al (Science 2009). Our joint analysis model has more power and detects substantially more eQTLs than a tissue-by-tissue analysis (63% more genes with eQTLs at FDR=0.05). Moreover, we show how a tissue-by-tissue analysis can largely overestimate the fraction of tissue-specific eQTLs, because it does not account for incomplete power when testing in each tissue separately. Qualitatively, the discrepancy between the two methods is very large on this data set. Indeed, according to the tissue-by-tissue analysis, only 19% of eQTLs are consistent across tissues, i.e. configuration (111), whereas our model estimates >80% of eQTLs to be consistent. After checking several of our assumptions, we are fairly confident in our estimate. Moreover, such a high proportion of consistent eQTLs is also obtained with the pairwise approach originally used by Nica et al (2011).

The analysis of this specific data set therefore indicates that most eQTLs are consistent across tissues. Yet we find examples of strong tissue-specific eQTLs, such as between gene ENSG00000166839 (ANKDD1A) and SNP rs1628955:

[Figure: box-forest plot of the strong tissue-specific eQTL between ENSG00000166839 (ANKDD1A) and rs1628955]

Haldane’s Sieve sifts through 2012

We started Haldane’s Sieve back in August 2012 to promote a preprint culture in evolutionary genetics (see here for more details). Since starting we’ve had ~150 posts, the vast majority of which have been preprint abstracts. We’ve had over 30,000 views from all over the world. During this time we’ve also seen more journals adopting favorable policies towards preprints, in particular Genetics and Genome Research, reflecting a growing recognition that preprint archives are a natural stage in the publication process. Overall it has been great to see the support for Haldane’s Sieve from so many people; we hope that it, and preprints more generally, will go from strength to strength in 2013.

Below are our top 10 most viewed pages of 2012. Each one of these has received hundreds of views. One noticeable trend is that many of them are the “Our paper” posts, which suggests that writing a blurb about your paper for Haldane’s Sieve is a great way to bring it more attention. Let us know if you want to write a post on your preprint article, or a quick post on a preprint you’ve enjoyed.

  1. Horizontal gene transfer may explain variation in θs. Maddamsetti et al. respond to a recent paper by Martincorena et al. The attention garnered by this post is undoubtedly due to its lively comment section. Martincorena et al. themselves responded with a pre-print here.
  2. Our paper: The genetic prehistory of southern Africa. Pickrell et al. write about their preprint. Their published paper is out at Nature Communications.
  3. Thoughts on: Finding the sources of missing heritability in a yeast cross. Joe Pickrell’s post about Bloom et al.
  4. Our paper: The geography of recent genetic ancestry across Europe. Peter Ralph and Graham Coop write about their arXived paper.
  5. Thoughts on: The date of interbreeding between Neandertals and modern humans. Graham Coop’s post on Sankararaman et al.’s paper. The authors’ post on their paper (Our paper: The date of interbreeding between Neandertals and modern humans) also made our top 10. The paper was published in PLoS Genetics.
  6. Our paper: Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster. Casey Bergman’s post on his group’s paper by Richardson et al. The paper was published in PLoS Genetics.
  7. Our paper: A genetic variant near olfactory receptor genes influences cilantro preference. Nick Eriksson’s post about 23andMe’s preprint. The paper appeared in Flavour.
  8. Species Identification and Unbiased Profiling of Complex Microbial Communities Using Shotgun Illumina Sequencing of 16S rRNA Amplicon Sequences. Ong et al.
  9. Our paper: Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture. John Pool’s post on Pool et al. The paper appeared in PLoS Genetics.
  10. Blood ties: ABO is a trans-species polymorphism in primates. Ségurel et al.’s paper which Laure Ségurel posted about here. The paper came out in PNAS.

Optimal Assembly for High Throughput Shotgun Sequencing

Guy Bresler, Ma’ayan Bresler, David Tse
(Submitted on 1 Jan 2013)

We present a framework for the design of optimal assembly algorithms for shotgun sequencing under the criterion of complete reconstruction. We derive a lower bound on the read length and the coverage depth required for reconstruction in terms of the repeat statistics of the genome. We design a de Bruijn graph based assembly algorithm which can achieve very close to the lower bound for repeat statistics of a wide range of sequenced genomes.
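For readers unfamiliar with de Bruijn graphs, the following Python sketch builds one from a set of reads: each k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases. It only illustrates the data structure; it is not the assembly algorithm from the paper, and the function name and toy reads are assumptions made for this example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the sorted list of (k-1)-mers that follow it.

    Every k-mer found in a read contributes one edge: prefix -> suffix.
    Reads shorter than k contribute nothing.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    # Two overlapping toy reads (hypothetical data).
    for node, successors in de_bruijn_graph(["ACGT", "CGTA"], k=3).items():
        print(node, "->", successors)
```

A node with more than one successor marks a branch, which is where repeats longer than k-1 bases make reconstruction ambiguous.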

Most viewed on Haldane’s Sieve: December 2012

The most viewed preprints on Haldane’s Sieve in December 2012 were:

Our paper: Epistasis not needed to explain low dN/dS

This guest post is by Joshua Plotkin on his group’s paper McCandlish et al. Epistasis not needed to explain low dN/dS arXived here.

Our lab has recently begun to post research pre-prints on arXiv. All members of the group enthusiastically support this trend, both within our own group and within the broader scientific community. The merits of sharing pre-prints have been described elsewhere. The benefits of pre-prints are so immediately apparent, I feel, that there is no need to add further verses to the praises that have already been sung.

Recently, however, my research group and I faced an unusual and difficult question: whether we should post a pre-print that does not describe primary research, but rather is a critique of a recent paper published by another group – a paper on the role of epistasis in molecular evolution from the group led by Fyodor Kondrashov. My group and I have never before written such a commentary; and so I faced this choice with some uncertainty. Here are some thoughts on our group’s decision to write the commentary and to post it to arXiv.

Kondrashov’s group is at the vanguard of contemporary research in molecular evolution. In this particular paper from his group, Breen et al. contend that epistasis is “pervasive throughout protein evolution”; a view that I mostly support and indeed have expressed, in a more limited scope, in several publications and commentaries (e.g. here, here, and here). However, in discussing the paper by Breen et al. over lunch, our research group came to the consensus that their argument is logically flawed. Breen et al. reached their conclusion because the dN/dS values observed in some genes are much lower than their expectation in the absence of epistasis. But when calculating the expected dN/dS ratio in the absence of epistasis, Breen et al. assumed that all amino acids observed in a protein alignment at any particular position have equal fitness. This assumption is unrealistic because, simply, some amino acids may be more fit than others. When we relaxed this unrealistic assumption, we found that the observed dN/dS values and the observed patterns of amino acid diversity at each site are perfectly consistent with a non-epistatic model of protein evolution, for all the nuclear and chloroplast genes in the Breen et al. dataset (but, interestingly, not for their mitochondrial genes).

In an ideal world, scientific disagreements would be resolved by straightforward transactions based solely on logic and data. But in reality, such disagreements inevitably involve intellectual biases, not to mention personalities, politics, reputations, et cetera. In fact, we (my research group and I) are colleagues and admirers of Kondrashov and his comrades (these two papers of his are among our favorites). Why risk our collegiality by publishing a critique on arXiv?

The answer is two-fold. First, we are passionate about understanding molecular evolution, both as individuals and within the context of a scientific community – and we believe this exchange will advance that understanding. Second, we have had extensive email correspondence with Fedya about the scientific issues at hand. This correspondence has been completely open and straightforward: we have shared our computer code so that Fedya can reproduce our analyses; and Fedya has agreed with our critique, in principle, although he has some reservations and may appreciate subtleties of his data that we do not. In any case, I feel that the scientific exchange has been honest, and it will hopefully avoid the snark that sometimes accompanies such disagreements, and focus instead on the scientific issues at stake.

I wish to thank Graham Coop for inviting me to contribute to Haldane’s Sieve. And thanks of course to my co-authors, including our own fearless leader, David McCandlish.

—Joshua B. Plotkin

N.B.: This blog post is meant as an exchange among scientific colleagues, and not as an advertisement to the media.

Epistasis not needed to explain low dN/dS

In Response to “Epistasis as the primary factor in molecular evolution” by Breen et al. Nature 490, 535-538 (2012)
David M. McCandlish, Etienne Rajon, Premal Shah, Yang Ding, Joshua B. Plotkin
(Submitted on 20 Dec 2012)

An important question in molecular evolution is whether an amino acid that occurs at a given position makes an independent contribution to fitness, or whether its effect depends on the state of other loci in the organism’s genome, a phenomenon known as epistasis. In a recent letter to Nature, Breen et al. (2012) argued that epistasis must be “pervasive throughout protein evolution” because the observed ratio between the per-site rates of non-synonymous and synonymous substitutions (dN/dS) is much lower than would be expected in the absence of epistasis. However, when calculating the expected dN/dS ratio in the absence of epistasis, Breen et al. assumed that all amino acids observed in a protein alignment at any particular position have equal fitness. Here, we relax this unrealistic assumption and show that any dN/dS value can in principle be achieved at a site, without epistasis. Furthermore, for all nuclear and chloroplast genes in the Breen et al. dataset, we show that the observed dN/dS values and the observed patterns of amino acid diversity at each site are jointly consistent with a non-epistatic model of protein evolution.
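The core of the argument can be illustrated with a toy calculation. The Python sketch below considers a single site with only two amino acids, a and b, whose fitness difference is summarized by a population-scaled selection coefficient S. It assumes symmetric mutation, the weak-mutation limit, and the standard diffusion approximation for the fixation rate relative to neutrality, S / (1 - exp(-S)). These modelling choices and all names are assumptions made for this illustration, not the model fitted in the paper. With no epistasis at all, the expected dN/dS at the site equals 1 when the two amino acids are equally fit and falls toward 0 as the fitness difference grows.

```python
import math


def relative_fixation_rate(S):
    """Fixation rate of a mutation with scaled selection coefficient S,
    relative to a neutral mutation (diffusion approximation)."""
    if abs(S) < 1e-9:
        return 1.0  # neutral limit of S / (1 - exp(-S))
    return S / (1.0 - math.exp(-S))


def two_allele_omega(S):
    """Expected dN/dS at a site with two amino acids, a and b, where b has
    scaled selective advantage S over a. No epistasis is involved: the
    fitness of each amino acid does not depend on any other site."""
    # Stationary frequencies under symmetric mutation and selection.
    pi_a = 1.0 / (1.0 + math.exp(S))
    pi_b = 1.0 - pi_a
    # Average the relative substitution rate over the stationary state.
    return pi_a * relative_fixation_rate(S) + pi_b * relative_fixation_rate(-S)


if __name__ == "__main__":
    for S in (0.0, 1.0, 2.0, 5.0, 10.0):
        print(S, two_allele_omega(S))
```

Setting S to 0 recovers the equal-fitness assumption, under which the expected ratio is 1 in this toy model; any inequality in fitness lowers it.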

A statistical framework for joint eQTL analysis in multiple tissues

Timothée Flutre, Xiaoquan Wen, Jonathan Pritchard, Matthew Stephens
(Submitted on 19 Dec 2012)

Mapping expression Quantitative Trait Loci (eQTLs) represents a powerful and widely-adopted approach to identifying putative regulatory variants and linking them to specific genes. Up to now eQTL studies have been conducted in a relatively narrow range of tissues or cell types. However, understanding the biology of organismal phenotypes will involve understanding regulation in multiple tissues, and ongoing studies are collecting eQTL data in dozens of cell types. Here we present a statistical framework for powerfully detecting eQTLs in multiple tissues or cell types (or, more generally, multiple subgroups). The framework explicitly models the potential for each eQTL to be active in some tissues and inactive in others. By modeling the sharing of active eQTLs among tissues this framework increases power to detect eQTLs that are present in more than one tissue compared with “tissue-by-tissue” analyses that examine each tissue separately. Conversely, by modeling the inactivity of eQTLs in some tissues, the framework allows the proportion of eQTLs shared across different tissues to be formally estimated as parameters of a model, addressing the difficulties of accounting for incomplete power when comparing overlaps of eQTLs identified by tissue-by-tissue analyses. Applying our framework to re-analyze data from transformed B cells, T cells and fibroblasts we find that it substantially increases power compared with tissue-by-tissue analysis, identifying 63% more genes with eQTLs (at FDR=0.05). Further the results suggest that, in contrast to previous analyses of the same data, the majority of eQTLs detectable in these data are shared among all three tissues.

easyGWAS: An integrated interspecies platform for performing genome-wide association studies

Dominik Grimm, Bastian Greshake, Stefan Kleeberger, Christoph Lippert, Oliver Stegle, Bernhard Schölkopf, Detlef Weigel, Karsten Borgwardt
(Submitted on 19 Dec 2012)

Motivation: The rapid growth in genome-wide association studies (GWAS) in plants and animals has brought about the need for a central resource that facilitates i) performing GWAS, ii) accessing data and results of other GWAS, and iii) enabling all users regardless of their background to exploit the latest statistical techniques without having to manage complex software and computing resources.
Results: We present easyGWAS, a web platform that provides methods, tools and dynamic visualizations to perform and analyze GWAS. In addition, easyGWAS makes it simple to reproduce results of others, validate findings, and access larger sample sizes through merging of public datasets.
Availability: Detailed method and data descriptions as well as tutorials are available in the supplementary materials. easyGWAS is available at this http URL
Contact: dominik.grimm@tuebingen.mpg.de

Selection biases the prevalence and type of epistasis along adaptive trajectories

Jeremy A. Draghi, Joshua B. Plotkin
(Submitted on 17 Dec 2012)

The contribution to an organism’s phenotype from one genetic locus may depend upon the status of other loci. Such epistatic interactions among loci are now recognized as fundamental to shaping the process of adaptation in evolving populations. Although little is known about the structure of epistasis in most organisms, recent experiments with bacterial populations have concluded that antagonistic interactions abound and tend to decelerate the pace of adaptation over time. Here, we use a broad class of mathematical fitness landscapes to examine how natural selection biases the mutations that substitute during evolution based on their epistatic interactions. We find that, even when beneficial mutations are rare, these biases are strong and change substantially throughout the course of adaptation. In particular, epistasis is less prevalent than the neutral expectation early in adaptation and much more prevalent later, with a concomitant shift from predominantly antagonistic interactions early in adaptation to synergistic and sign epistasis later in adaptation. We observe the same patterns when re-analyzing data from a recent microbial evolution experiment. Since these biases depend on the population size and other parameters, they must be quantified before we can hope to use experimental data to infer an organism’s underlying fitness landscape or to understand the role of epistasis in shaping its adaptation. In particular, we show that when the order of substitutions is not known to an experimentalist, then standard methods of analysis may suggest that epistasis retards adaptation when in fact it accelerates it.
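The three kinds of interaction named in the abstract can be told apart from the log-fitness of the four genotypes at a pair of loci. The Python sketch below uses one common convention: the epistasis term is the deviation of the double mutant from additivity; sign epistasis means a mutation is beneficial on one background and deleterious on the other; otherwise a positive deviation is called synergistic and a negative one antagonistic. These labels assume both mutations are beneficial on the wild-type background. The function name and example values are hypothetical, and the paper's own definitions may differ in detail.

```python
def classify_epistasis(w_ab, w_Ab, w_aB, w_AB, tol=1e-12):
    """Classify the interaction between mutations a->A and b->B.

    Arguments are log-fitness values of the wild type (ab), the two
    single mutants (Ab, aB) and the double mutant (AB).
    """
    # Effect of each mutation on each genetic background.
    effect_A_on_b = w_Ab - w_ab
    effect_A_on_B = w_AB - w_aB
    effect_B_on_a = w_aB - w_ab
    effect_B_on_A = w_AB - w_Ab

    # Deviation of the double mutant from the additive expectation.
    epsilon = w_AB - w_Ab - w_aB + w_ab
    if abs(epsilon) <= tol:
        return "none"

    # Sign epistasis: a mutation's effect changes sign between backgrounds.
    if effect_A_on_b * effect_A_on_B < 0 or effect_B_on_a * effect_B_on_A < 0:
        return "sign"

    return "synergistic" if epsilon > 0 else "antagonistic"


if __name__ == "__main__":
    print(classify_epistasis(0.0, 1.0, 1.0, 2.0))  # additive double mutant
    print(classify_epistasis(0.0, 1.0, 1.0, 1.5))  # less than additive
    print(classify_epistasis(0.0, 1.0, 1.0, 3.0))  # more than additive
    print(classify_epistasis(0.0, 1.0, 1.0, 0.5))  # A is harmful once B is present
```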