Loss of amyloid disaggregases during the evolution of Metazoa

Albert Erives, Jan Fassler
(Submitted on 15 Jan 2013)

In yeast, phenotypic adaptations can evolve by natural selection of conformational variant prions and their variant amyloid fibers. This system requires the Hsp104 disaggregase, which fragments amyloid fibers into smaller seed prions that are passed on to mitotic descendants and meiotic spores. Interestingly, Hsp104 is found in diverse eukaryotes except metazoans. To investigate whether a prion-based transmission “genetics” was incompatible with the evolution of Metazoa, we identify genes conserved in fungi and choanoflagellates but lost in animals. We show that both eukaryotic clpB amyloid disaggregases, HSP104 and its nuclear-encoded mitochondrial endo-ortholog HSP78, were lost in the stem-metazoan lineage along with only a small number of other relevant genes. We show that these gene losses are not unrelated historical accidents because these loci comprise a very small regulon devoted to prion transmission in yeast. We propose that evolution of developmental asymmetric cell-specifications necessitated the evolutionary deprecation of the ancient clpB system.

Strong Purifying Selection at Synonymous Sites in D. melanogaster

David S. Lawrie, Philipp W. Messer, Ruth Hershberg, Dmitri A. Petrov
(Submitted on 15 Jan 2013)

Synonymous sites are generally assumed to be subject to weak selective constraint. For this reason, they are often neglected as a possible source of important functional variation. We use site frequency spectra from deep population sequencing data to show that, contrary to this expectation, 22% of four-fold synonymous (4D) sites in D. melanogaster evolve under very strong selective constraint while few, if any, appear to be under weak constraint. Linking polymorphism with divergence data, we further find that the fraction of synonymous sites exposed to strong purifying selection is higher for those positions that show slower evolution on the Drosophila phylogeny. The function underlying the inferred strong constraint appears to be separate from splicing enhancers, nucleosome positioning, and the translational optimization generating canonical codon bias. The fraction of synonymous sites under strong constraint within a gene correlates well with gene expression, particularly in the mid-late embryo, pupae, and adult developmental stages. Genes enriched in strongly constrained synonymous sites tend to be particularly functionally important and are often involved in key developmental pathways. Given that the observed widespread constraint acting on synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.

Does your gene need a background check? How genetic background impacts the analysis of mutations, genes, and evolution

Chris H. Chandler, Sudarshan Chari, Ian Dworkin
(Submitted on 12 Jan 2013)

The premise of genetic analysis is that a causal link exists between phenotypic and allelic variation. Yet it has long been documented that mutant phenotypes are not a simple result of a single DNA lesion, but rather are due to interactions of the focal allele with other genes and the environment. Although an experimentally rigorous approach, focusing on individual mutations and isogenic control strains, has facilitated amazing progress within genetics and related fields, a glimpse back suggests that a vast complexity has been omitted from our current understanding of allelic effects. Armed with traditional genetic analyses and the foundational knowledge they have provided, we argue that the time and tools are ripe to return to the under-explored aspects of gene function and embrace the context-dependent nature of genetic effects. We assert that a broad understanding of genetic effects and the evolutionary dynamics of alleles requires identifying how mutational outcomes depend upon the wild-type genetic background. Furthermore, we discuss how best to exploit genetic background effects to broaden genetic research programs.

SLiM: Simulating Evolution with Selection and Linkage

Philipp W. Messer
(Submitted on 14 Jan 2013)

SLiM is an efficient forward population genetic simulation designed for studying the effects of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance of new mutations, arbitrary gene and chromosomal structure, and user-defined recombination maps.

Dynamics of adaptation: extreme value domains, distance to fitness optimum and fitness correlations

Sarada Seetharaman, Kavita Jain
(Submitted on 8 Jan 2013)

We study the properties of the adaptive walk performed by a maladapted asexual population in which beneficial mutations fix sequentially until a local fitness peak is reached. Here we consider three factors that govern the adaptation dynamics: the extreme value domain of beneficial mutations, the initial distance to the local fitness optimum and the correlations amongst the fitnesses. We show that there is a transition in the behaviour of the walk length and average fitness fixed during adaptation when the mean and variance of the fitness distribution respectively become infinite. When the mean is finite, walk length decreases logarithmically with initial fitness but is a constant otherwise. We also find that the walks are longer for faster decaying fitness distributions and correlated fitnesses. For fitness distributions with finite variance, the fitness fixed during initial steps does not depend on the fitness of the local optimum but increases with the local peak fitness otherwise. Interestingly, the fitness difference between successive steps shows a pattern of diminishing returns for bounded distributions and accelerating returns for fat-tailed distributions. These trends are found to be robust with respect to fitness correlations.

A comparative analysis of transcription factor expression during metazoan embryonic development

Alicia Schep, Boris Adryan
(Submitted on 8 Jan 2013)

During embryonic development, a complex organism is formed from a single starting cell. These processes of growth and differentiation are driven by large transcriptional changes, which follow the expression and activity of transcription factors (TFs). This study sought to compare TF expression during embryonic development in a diverse group of metazoan animals: representatives of vertebrates (Danio rerio, Xenopus tropicalis), a chordate (Ciona intestinalis) and invertebrate phyla such as insects (Drosophila melanogaster, Anopheles gambiae) and nematodes (Caenorhabditis elegans) were sampled. The different species showed overall very similar TF expression patterns, with TF expression increasing during the initial stages of development. C2H2 zinc finger TFs were over-represented and Homeobox TFs were under-represented in the early stages in all species. We further clustered TFs for each species based on their quantitative temporal expression profiles. This showed very similar TF expression trends in development in vertebrate and insect species. However, analysis of the expression of orthologous pairs between more closely related species showed that expression of most individual TFs is not conserved, following the general model of duplication and diversification. The degree of similarity in TF expression between Xenopus tropicalis and Danio rerio followed the hourglass model, with the greatest similarity occurring during the early tailbud stage in Xenopus tropicalis and the late segmentation stage in Danio rerio. However, for Drosophila melanogaster and Anopheles gambiae there were two periods of high TF transcriptome similarity, one during the Arthropod phylotypic stage at 8-10 hours into Drosophila development and the other later at 16-18 hours into Drosophila development.

Our paper: A statistical framework for joint eQTL analysis in multiple tissues

This guest post is by Timothée Flutre and William Wen on their paper “A statistical framework for joint eQTL analysis in multiple tissues” with Matthew Stephens and Jonathan Pritchard arXived here.

As large eQTL data sets are being produced for multiple tissues, it is important to leverage all the information in the data to detect eQTLs as well as to provide ways to interpret them. Motivated by this, we developed a statistical framework for eQTL discovery that allows for joint analysis of multiple tissues. Though the details are in the paper, in this blog post we take the opportunity to highlight what we think are the main statistical features.

Looking for eQTLs in multiple tissues immediately raises the question of tissue specificity. In this paper, we define an eQTL as “active” in a particular tissue if it has a non-zero genetic effect on the expression of the target gene in this tissue. Most published works implicitly use this definition to refer to tissue-specific eQTLs. One could take issue with this definition: for example, if an eQTL is very strong in one tissue and very weak in another then one might think of this as “tissue-specific”, or at least “tissue-inconsistent”, but in our paper we stick with the binary representation of activity as a useful first step. We represent the activity pattern of a potential eQTL by a binary vector called a configuration (see Han & Eskin, PLoS Genetics 2012, and Wen & Stephens, arXiv 1111.1210). As an example, the following configuration, (110), corresponds to the case where three tissues are analyzed and the eQTL is active only in the first two tissues.

In brief, we can highlight three important features of our model. First, by mapping eQTLs jointly rather than in each tissue separately, our model borrows information between the tissues in which an eQTL is active, and thereby greatly increases power. This is somewhat equivalent to relaxing the threshold of significance in the second tissue when one has already detected the eQTL in the first tissue. Second, by comparing evidence in the data for each configuration, our model provides an interpretation of how an eQTL acts in multiple tissues. In statistical terms, as more than two hypotheses are being tested (for three tissues there are 7 non-null configurations), one usually speaks of model comparison. Third, our model also estimates the proportion of each configuration in the data set. This is achieved by pooling all genes together, and thus borrowing information between them.
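To make the configuration notation concrete, here is a minimal Python sketch that enumerates the binary activity vectors for a given number of tissues. The function name `non_null_configurations` is hypothetical and is not taken from the paper's software; it only illustrates why three tissues give 2^3 - 1 = 7 non-null configurations.

```python
from itertools import product

def non_null_configurations(n_tissues):
    """Return every binary activity vector with at least one active tissue.

    Hypothetical helper for illustration only: a 1 in position i means
    the eQTL is active in tissue i, a 0 means it is inactive.
    """
    return [c for c in product((0, 1), repeat=n_tissues) if any(c)]

configs = non_null_configurations(3)
labels = ["".join(map(str, c)) for c in configs]
print(len(configs))  # number of non-null configurations for three tissues
print(labels)        # includes "110": active in the first two tissues only
```

The all-zero vector is excluded because it corresponds to the null hypothesis of no eQTL in any tissue.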

Besides simulations, we re-analyzed the largest available data set so far, 3 tissues from 75 individuals, from Dimas et al (Science 2009). Our joint analysis model has more power and detects substantially more eQTLs than a tissue-by-tissue analysis (63% more genes with eQTLs at FDR=0.05). Moreover, we show how a tissue-by-tissue analysis can largely overestimate the fraction of tissue-specific eQTLs, because it does not account for incomplete power when testing in each tissue separately. Qualitatively, the discrepancy between both methods is very large on this data set. Indeed, according to the tissue-by-tissue analysis, only 19% of eQTLs are consistent across tissues, i.e. configuration (111), whereas our model estimates >80% of eQTLs to be consistent. After checking several of our assumptions, we are fairly confident in our estimate. Moreover such a high proportion of consistent eQTLs is also obtained with the pairwise approach originally used by Nica et al (2011).
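The effect of incomplete power can be illustrated with a toy calculation. The sketch below is not the model from the paper: it assumes, purely for illustration, that each tissue is tested independently with the same power and no false positives, and the function name `observed_config_probs` and the power value 0.6 are hypothetical. Under these assumptions, an eQTL that is truly active in all three tissues is called in all three only a minority of the time, so a tissue-by-tissue analysis would label many consistent eQTLs as tissue-specific.

```python
from itertools import product

def observed_config_probs(true_config, power, false_positive=0.0):
    """Probability of each observed call pattern under independent tests.

    Illustrative assumption: every tissue is tested separately, with
    detection probability `power` where the eQTL is truly active and
    `false_positive` where it is not.
    """
    probs = {}
    for obs in product((0, 1), repeat=len(true_config)):
        p = 1.0
        for active, called in zip(true_config, obs):
            rate = power if active else false_positive
            p *= rate if called else 1.0 - rate
        probs["".join(map(str, obs))] = p
    return probs

# An eQTL truly active in all three tissues, tested with 60% power per tissue.
probs = observed_config_probs((1, 1, 1), power=0.6)
detected = 1.0 - probs["000"]  # called in at least one tissue
print(round(probs["111"], 3))             # chance of being called in all three
print(round(probs["111"] / detected, 3))  # share of detected eQTLs that look consistent
```

With these made-up numbers, fewer than a quarter of the detected eQTLs appear consistent even though all of them truly are, which is the direction of bias described above.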

The analysis of this specific data set therefore indicates that most eQTLs are consistent across tissues. Yet we find examples of strong tissue-specific eQTLs, such as between gene ENSG00000166839 (ANKDD1A) and SNP rs1628955:

[Figure: box and forest plots for the strong tissue-specific eQTL between gene ENSG00000166839 and SNP rs1628955]

Most viewed on Haldane’s Sieve: December 2012

The most viewed preprints on Haldane’s Sieve in December 2012 were:

A statistical framework for joint eQTL analysis in multiple tissues

Timothée Flutre, Xiaoquan Wen, Jonathan Pritchard, Matthew Stephens
(Submitted on 19 Dec 2012)

Mapping expression Quantitative Trait Loci (eQTLs) represents a powerful and widely-adopted approach to identifying putative regulatory variants and linking them to specific genes. Up to now eQTL studies have been conducted in a relatively narrow range of tissues or cell types. However, understanding the biology of organismal phenotypes will involve understanding regulation in multiple tissues, and ongoing studies are collecting eQTL data in dozens of cell types. Here we present a statistical framework for powerfully detecting eQTLs in multiple tissues or cell types (or, more generally, multiple subgroups). The framework explicitly models the potential for each eQTL to be active in some tissues and inactive in others. By modeling the sharing of active eQTLs among tissues this framework increases power to detect eQTLs that are present in more than one tissue compared with “tissue-by-tissue” analyses that examine each tissue separately. Conversely, by modeling the inactivity of eQTLs in some tissues, the framework allows the proportion of eQTLs shared across different tissues to be formally estimated as parameters of a model, addressing the difficulties of accounting for incomplete power when comparing overlaps of eQTLs identified by tissue-by-tissue analyses. Applying our framework to re-analyze data from transformed B cells, T cells and fibroblasts we find that it substantially increases power compared with tissue-by-tissue analysis, identifying 63% more genes with eQTLs (at FDR=0.05). Further the results suggest that, in contrast to previous analyses of the same data, the majority of eQTLs detectable in these data are shared among all three tissues.

Comment on “Evidence of Abundant and Purifying Selection in Humans for Recently Acquired Regulatory Functions”

Nicolas Bray, Lior Pachter
(Submitted on 13 Dec 2012)

Ward and Kellis (Reports, September 5 2012) identify regulatory regions in the human genome exhibiting lineage-specific constraint and estimate the extent of purifying selection. There is no statistical rationale for the examples they highlight, and their estimates of the fraction of the genome under constraint are biased by arbitrary designations of completely constrained regions.