Transcript length mediates developmental timing of gene expression across Drosophila

Carlo G. Artieri, Hunter B. Fraser
(Submitted on 18 Jan 2013)

The time required to transcribe genes with long primary transcripts may limit their ability to be expressed in cells with short mitotic cycles, a phenomenon termed intron delay. As such short cycles are a hallmark of the earliest stages of insect development, we used Drosophila developmental timecourse expression data to test whether intron delay affects gene expression genome-wide, and to determine its consequences for the evolution of gene structure. We find that long zygotically expressed, but not maternally deposited, genes show substantial delay in expression relative to their shorter counterparts and that this delay persists over a substantial portion of the ~24 hours of embryogenesis. Patterns of RNA-seq coverage from the 5′ and 3′ ends of transcripts show that this delay is consistent with their inability to terminate transcription, but not with transcriptional initiation-based regulatory control. Highly expressed zygotic genes are subject to purifying selection to maintain compact transcribed regions, allowing conservation of embryonic expression patterns across the Drosophila phylogeny. We propose that intron delay is an underappreciated physical mechanism affecting both the expression patterns and the gene structure of many genes across Drosophila.

Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations

Katarzyna Bryc, Wlodek Bryc, Jack W. Silverstein
(Submitted on 18 Jan 2013)

We present a mathematical model, and the corresponding mathematical analysis, that justifies and quantifies the use of principal component analysis of biallelic genetic marker data for a set of individuals to detect the number of subpopulations represented in the data. We indicate that the power of the technique relies more on the number of individuals genotyped than on the number of markers.
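As a rough illustration of the idea in this abstract (not the authors' model or analysis), the sketch below simulates biallelic genotypes for two discrete subpopulations, centers each marker, and approximates the two largest eigenvalues of the individual-by-individual covariance matrix by power iteration with deflation. The function names, the allele frequencies, and the use of power iteration are all assumptions made for the example. With two well-differentiated subpopulations, a single eigenvalue would be expected to separate clearly from the rest.

```python
import random


def simulate_genotypes(freqs_by_pop, n_per_pop, rng):
    """Simulate 0/1/2 genotypes; freqs_by_pop[k][j] is the (assumed)
    allele frequency of marker j in subpopulation k."""
    rows = []
    for freqs in freqs_by_pop:
        for _ in range(n_per_pop):
            # Two independent allele draws per marker give a 0/1/2 count.
            rows.append([int(rng.random() < p) + int(rng.random() < p)
                         for p in freqs])
    return rows


def individual_covariance(rows):
    """Center each marker and return the n x n matrix X X^T / m."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)]
            for i in range(n)]


def top_eigenvalues(mat, k, iters=200, seed=0):
    """Approximate the k largest eigenvalues of a symmetric positive
    semidefinite matrix by power iteration with deflation."""
    rng = random.Random(seed)
    n = len(mat)
    mat = [row[:] for row in mat]  # work on a copy
    values = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(c * c for c in w) ** 0.5
            if norm == 0:
                break
            v = [c / norm for c in w]
        # Rayleigh quotient of the final unit vector.
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, w))
        values.append(lam)
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                mat[i][j] -= lam * v[i] * v[j]
    return values
```

For example, calling `simulate_genotypes` with two frequency vectors (say 0.1 and 0.9 at every marker), then `individual_covariance` and `top_eigenvalues(cov, 2)`, returns a leading eigenvalue and a second one to compare. Varying the number of individuals versus the number of markers in such a simulation is one informal way to explore the abstract's claim about which matters more.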

Reproductive isolation between phylogeographic lineages scales with divergence

Sonal Singhal, Craig Moritz
(Submitted on 17 Jan 2013)

Phylogeographic studies frequently reveal multiple morphologically cryptic lineages within species. What is yet unclear is whether such lineages represent nascent species or evolutionary ephemera. To address this question, we compare five contact zones, each of which occurs between eco-morphologically cryptic lineages of skinks from the rainforests of the Australian Wet Tropics. Although the contacts likely formed concurrently in response to Holocene expansion from glacial refugia, we estimate that the divergence times (t) of the lineage-pairs range from 3.1 to 11.5 Myr. Multilocus analyses of the contact zones yielded estimates of reproductive isolation that are tightly correlated with divergence time and, for longer-diverged lineages (t > 5 Myr), substantial. These results show that phylogeographic splits of increasing depth can represent stages along the speciation continuum, even in the absence of overt change in ecologically relevant morphology.

Gene set bagging for estimating replicability of gene set analyses

Andrew E. Jaffe, John D. Storey, Hongkai Ji, Jeffrey T. Leek
(Submitted on 16 Jan 2013)

Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features for association with disease. We propose a new approach, called gene set bagging, for measuring the stability of ranking procedures using predefined gene sets. Gene set bagging involves resampling the original high-throughput data, performing gene-set analysis on the resampled data, and confirming that biological categories replicate. This procedure can be thought of as bootstrapping gene-set analysis and can be used to determine which are the most reproducible gene sets. Results: Here we apply this approach to two common genomics applications: gene expression and DNA methylation. Even with state-of-the-art statistical ranking procedures, significant categories in a gene set enrichment analysis may be unstable when subjected to resampling. Conclusions: We demonstrate that gene lists are not necessarily stable, and therefore additional steps like gene set bagging can improve biological inference of gene set analysis.
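The resampling procedure described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `significant_sets` is a hypothetical toy gene-set analysis (rank genes by absolute difference in group means and count hits among the top-ranked genes), and resampling within each group is a choice made here so that both groups stay non-empty. The output is, for each gene set, the fraction of bootstrap resamples in which it is called significant.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def significant_sets(expr, labels, gene_sets, top_k=2, min_hits=2):
    """Toy gene-set analysis (hypothetical stand-in for a real method):
    a set is called significant if at least min_hits of its genes are
    among the top_k genes ranked by absolute difference in group means."""
    scores = {}
    for gene, values in expr.items():
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(mean(g1) - mean(g0))
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    return {name for name, genes in gene_sets.items()
            if len(top & set(genes)) >= min_hits}


def gene_set_bagging(expr, labels, gene_sets, n_boot=200, seed=0, **kwargs):
    """Return, for each gene set, the proportion of bootstrap resamples
    in which the set is called significant."""
    rng = random.Random(seed)
    idx0 = [i for i, lab in enumerate(labels) if lab == 0]
    idx1 = [i for i, lab in enumerate(labels) if lab == 1]
    counts = {name: 0 for name in gene_sets}
    for _ in range(n_boot):
        # Resample samples with replacement within each group
        # (an assumption of this sketch, so neither group is empty).
        draw = ([rng.choice(idx0) for _ in idx0]
                + [rng.choice(idx1) for _ in idx1])
        boot_expr = {g: [vals[i] for i in draw] for g, vals in expr.items()}
        boot_labels = [labels[i] for i in draw]
        # Rerun the gene-set analysis on the resampled data.
        for name in significant_sets(boot_expr, boot_labels,
                                     gene_sets, **kwargs):
            counts[name] += 1
    return {name: counts[name] / n_boot for name in gene_sets}
```

Here `expr` maps each gene name to a list of per-sample values, `labels` holds a 0/1 group label per sample, and `gene_sets` maps set names to lists of gene names. A replication proportion near 1 suggests a set that is stable under resampling, while a low proportion flags a set whose significance may not replicate.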

Evolution of molecular phenotypes under stabilizing selection

Armita Nourmohammad, Stephan Schiffels, Michael Laessig
(Submitted on 17 Jan 2013)

Molecular phenotypes are important links between genomic information and organismic functions, fitness, and evolution. Complex phenotypes, which are also called quantitative traits, often depend on multiple genomic loci. Their evolution builds on genome evolution in a complicated way, which involves selection, genetic drift, mutations and recombination. Here we develop coarse-grained evolutionary statistics for phenotypes, which decouple from details of the underlying genotypes. We derive approximate evolution equations for the distribution of phenotype values within and across populations. These dynamics cover evolutionary processes at high and low recombination rates, that is, they apply to sexual and asexual populations. In a fitness landscape with a single optimal phenotype value, the phenotypic diversity within populations and the divergence between populations reach evolutionary equilibria, which describe stabilizing selection. We compute the equilibrium distributions of both quantities analytically and we show that the ratio of mean divergence and diversity depends on the strength of selection in a universal way: it is largely independent of the phenotype’s genomic encoding and of the recombination rate. This establishes a new method for the inference of selection on molecular phenotypes beyond the genome level. We discuss the implications of our findings for the predictability of evolutionary processes.

Efficient Identification of Equivalences in Dynamic Graphs and Pedigree Structures

Hoyt Koepke, Elizabeth Thompson
(Submitted on 16 Jan 2013)

We propose a new framework for designing test and query functions for complex structures that vary across a given parameter such as genetic marker position. The operations we are interested in include equality testing, set operations, isolating unique states, duplication counting, or finding equivalence classes under identifiability constraints. A motivating application is locating equivalence classes in identity-by-descent (IBD) graphs, graph structures in pedigree analysis that change over genetic marker location. The nodes of these graphs are unlabeled and identified only by their connecting edges, a constraint easily handled by our approach. The general framework introduced is powerful enough to build a range of testing functions for IBD graphs, dynamic populations, and other structures using a minimal set of operations. The theoretical and algorithmic properties of our approach are analyzed and proved. Computational results on several simulations demonstrate the effectiveness of our approach.

Mandated data archiving greatly improves access to research data

Timothy H. Vines, Rose L. Andrew, Dan G. Bock, Michelle T. Franklin, Kimberly J. Gilbert, Nolan C. Kane, Jean-Sébastien Moore, Brook T. Moyers, Sébastien Renaut, Diana J. Rennison, Thor Veen, Sam Yeaman
(Submitted on 16 Jan 2013)

The data underlying scientific papers should be accessible to researchers both now and in the future, but how best can we ensure that these data are available? Here we examine the effectiveness of four approaches to data archiving: no stated archiving policy, recommending (but not requiring) archiving, and two versions of mandating data deposition at acceptance. We control for differences between data types by trying to obtain data from papers that use a single, widespread population genetic analysis, STRUCTURE. At one extreme, we found that mandated data archiving policies that require the inclusion of a data availability statement in the manuscript improve the odds of finding the data online almost a thousand-fold compared to having no policy. However, archiving rates at journals with less stringent policies were only very slightly higher than those with no policy at all. We also assessed the effectiveness of asking for data directly from authors and obtained over half of the requested datasets, albeit with a delay of about 8 days and some disagreement with authors. Given the long-term benefits of data accessibility to the academic community, we believe that journal-based mandatory data archiving policies and mandatory data availability statements should be more widely adopted.

Loss of amyloid disaggregases during the evolution of Metazoa

Albert Erives, Jan Fassler
(Submitted on 15 Jan 2013)

In yeast, phenotypic adaptations can evolve by natural selection of conformational variant prions and their variant amyloid fibers. This system requires the Hsp104 disaggregase, which fragments amyloid fibers into smaller seed prions that are passed on to mitotic descendants and meiotic spores. Interestingly, Hsp104 is found in diverse eukaryotes except metazoans. To investigate whether a prion-based transmission “genetics” was incompatible with the evolution of Metazoa, we identify genes conserved in fungi and choanoflagellates but lost in animals. We show that both eukaryotic clpB amyloid disaggregases, HSP104 and its nuclear-encoded mitochondrial endo-ortholog HSP78, were lost in the stem-metazoan lineage along with only a small number of other relevant genes. We show that these gene losses are not unrelated historical accidents because these loci comprise a very small regulon devoted to prion transmission in yeast. We propose that evolution of developmental asymmetric cell-specifications necessitated the evolutionary deprecation of the ancient clpB system.

Strong Purifying Selection at Synonymous Sites in D. melanogaster

David S. Lawrie, Philipp W. Messer, Ruth Hershberg, Dmitri A. Petrov
(Submitted on 15 Jan 2013)

Synonymous sites are generally assumed to be subject to weak selective constraint. For this reason, they are often neglected as a possible source of important functional variation. We use site frequency spectra from deep population sequencing data to show that, contrary to this expectation, 22% of four-fold synonymous (4D) sites in D. melanogaster evolve under very strong selective constraint while few, if any, appear to be under weak constraint. Linking polymorphism with divergence data, we further find that the fraction of synonymous sites exposed to strong purifying selection is higher for those positions that show slower evolution on the Drosophila phylogeny. The function underlying the inferred strong constraint appears to be separate from splicing enhancers, nucleosome positioning, and the translational optimization generating canonical codon bias. The fraction of synonymous sites under strong constraint within a gene correlates well with gene expression, particularly in the mid-late embryo, pupae, and adult developmental stages. Genes enriched in strongly constrained synonymous sites tend to be particularly functionally important and are often involved in key developmental pathways. Given that the observed widespread constraint acting on synonymous sites is likely not limited to Drosophila, the role of synonymous sites in genetic disease and adaptation should be reevaluated.

Does your gene need a background check? How genetic background impacts the analysis of mutations, genes, and evolution

Chris H. Chandler, Sudarshan Chari, Ian Dworkin
(Submitted on 12 Jan 2013)

The premise of genetic analysis is that a causal link exists between phenotypic and allelic variation. Yet it has long been documented that mutant phenotypes are not a simple result of a single DNA lesion, but rather are due to interactions of the focal allele with other genes and the environment. Although an experimentally rigorous approach, focusing on individual mutations and isogenic control strains, has facilitated amazing progress within genetics and related fields, a glimpse back suggests that a vast complexity has been omitted from our current understanding of allelic effects. Armed with traditional genetic analyses and the foundational knowledge they have provided, we argue that the time and tools are ripe to return to the under-explored aspects of gene function and embrace the context-dependent nature of genetic effects. We assert that a broad understanding of genetic effects and the evolutionary dynamics of alleles requires identifying how mutational outcomes depend upon the wild-type genetic background. Furthermore, we discuss how best to exploit genetic background effects to broaden genetic research programs.