Enhanced Transcriptome Maps from Multiple Mouse Tissues Reveal Evolutionary Constraint in Gene Expression for Thousands of Genes
Dmitri Pervouchine, Sarah Djebali, Alessandra Breschi, Carrie A Davis, Pablo Prieto Barja, Alex Dobin, Andrea Tanzer, Julien Lagarde, Chris Zaleski, Lei-Hoon See, Meagan Fastuca, Jorg Drenkow, Huaien Wang, Giovanni Bussotti, Baikang Pei, Suganthi Balasubramanian, Jean Monlong, Arif Harmanci, Mark Gerstein, Michael A Beer, Cedric Notredame, Roderic Guigo, Thomas R Gingeras
We characterized by RNA-seq the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles obtained in human cell lines reveals substantial conservation of transcriptional programs and uncovers a distinct class of genes whose expression levels across cell types and species were constrained early in vertebrate evolution. This core set of genes captures a substantial and constant fraction of the transcriptional output of mammalian cells and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes, including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but it is associated with strong and conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory program in which sub-cellular localization and alternative splicing play comparatively large roles.
Comparative genomics reveals the origins and diversity of arthropod immune systems
William J Palmer, Francis M Jiggins
While the innate immune system of insects is well studied, comparatively little is known about how other arthropods defend themselves against infection. We have characterised key immune components in the genomes of five chelicerates, a myriapod and a crustacean. We found clear traces of an ancient origin of innate immunity, with some arthropods having Toll-like receptors and C3-complement factors that are more closely related in sequence or structure to those of vertebrates than to those of other arthropods. Across the arthropods some components of the immune system, like the Toll signalling pathway, are highly conserved. However, there is also remarkable diversity. The chelicerates apparently lack the Imd signalling pathway and BGRPs, a key class of pathogen recognition receptors. Many genes have large copy number variation across species, and this may sometimes be accompanied by changes in function. For example, peptidoglycan recognition proteins (PGRPs) have frequently lost their catalytic activity and switched between secreted and intracellular forms. There has been extensive duplication of the cellular immune receptor Dscam in several species, which may be an alternative way to generate the high diversity that is produced by alternative splicing in insects. Our results provide a detailed analysis of the immune systems of several important groups of animals and lay the foundations for functional work on these groups.
Counterinsurgency Doctrine Applied to Infectious Disease
Benjamin C Kirkup
Recent scientific discoveries lead inexorably to the conclusion that the ‘total human’ incorporates a necessary body of numerous microbes, including bacteria. These bacteria play a very important role in immunity by actively resisting infections by outside bacteria; however, under certain conditions they can degrade their community. They can arrogate to themselves resources that normally flow through other metabolic pathways and form persistent biological structures. In this situation, these bacteria constitute an insurgency, with strategic ramifications.
Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution
Olgert Denas, Richard Sandstrom, Yong Cheng, Kathryn Beal, Javier Herrero, Ross Hardison, James Taylor
Background: Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, the relationships among sequence, conservation, and function are still poorly understood. Results: We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence in the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity in the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of TFos not showing conservation of occupancy, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest that a substantial amount of functional regulatory sequence is exapted from other biochemically active genomic material. Despite substantial repurposing of TFos, we did not find substantial changes in their predicted target genes, suggesting that cis-regulatory modules (CRMs) buffer evolutionary events, allowing little or no change in the TF-target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions.
Conclusion: We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across the TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed in the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.
When is selection effective?
Deleterious alleles are more likely to reach high frequency in small populations because of chance fluctuations in allele frequency. This may lead, over time, to reduced average fitness in the population. In that sense, selection is more 'effective' in larger populations. Many recent studies have considered whether the different demographic histories across human populations have resulted in differences in the number, distribution, and severity of deleterious variants, leading to an animated debate. This article seeks to clarify some terms of the debate by identifying differences in definitions and assumptions used in these studies and providing an intuitive explanation for the observed similarity in genetic load among populations. The intuition is verified through analytical and numerical calculations. First, even though rare variants contribute to load, they contribute little to load differences across populations. Second, the accumulation of non-recessive load after a bottleneck is slow for the weakly deleterious variants that contribute much of the long-term variation among populations. Whereas a bottleneck increases drift instantly, it affects selection only indirectly, so that fitness differences can keep accumulating long after a bottleneck is over. Third, drift and selection tend to have opposite effects on load differentiation under dominance models. Because of this competition, load differences across populations depend sensitively and intricately on past demographic events and on the distribution of fitness effects. A given bottleneck can lead to increased or decreased load for variants with identical fitness effects, depending on the subsequent population history. Because of this sensitivity, both classical population genetic intuition and detailed simulations are required to understand differences in load across populations.
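The opening claim, that deleterious alleles reach high frequency more easily in small populations, can be illustrated with a minimal sketch. It uses Kimura's classical diffusion approximation for the fixation probability of a new mutation, which is a textbook result and not necessarily the calculation used in the article. The function names, and the assumptions of genic selection, a single new copy, and an effective size equal to the census size N, are illustrative choices made here.

```python
import math


def fixation_probability(s, N):
    """Kimura's diffusion approximation for a new mutation.

    Assumptions (illustrative, not taken from the article): diploid
    population of constant size N, effective size equal to N, genic
    selection with coefficient s (negative = deleterious), and one
    initial copy, i.e. starting frequency 1 / (2N).
    """
    if s == 0:
        # Neutral case: fixation probability equals the initial frequency.
        return 1.0 / (2 * N)
    x = -4.0 * N * s
    if x > 700:
        # exp(x) would overflow; fixation is effectively impossible here.
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for accuracy.
    return math.expm1(-2.0 * s) / math.expm1(x)


def relative_to_neutral(s, N):
    """Fixation probability divided by the neutral expectation 1 / (2N)."""
    return fixation_probability(s, N) * 2 * N


if __name__ == "__main__":
    s = -0.001  # weakly deleterious
    for N in (100, 1_000, 10_000):
        print(N, relative_to_neutral(s, N))
```

Under these assumptions, the same weakly deleterious allele fixes at close to the neutral rate when N is 100 and essentially never when N is 10,000, which is the sense in which selection is more effective in larger populations.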
Landscape and evolutionary dynamics of terminal-repeat retrotransposons in miniature (TRIMs) in 48 whole plant genomes
Dongying Gao, Yupeng Li, Brian Abernathy, Scott Jackson
Terminal-repeat retrotransposons in miniature (TRIMs) are structurally similar to long terminal repeat (LTR) retrotransposons except that they are extremely small and difficult to identify. Thus far, only a few TRIMs have been characterized in the euphyllophytes, and the evolutionary and biological impacts and transposition mechanism of TRIMs are poorly understood. In this study, we combined de novo and homology-based methods to annotate TRIMs in 48 plant genome sequences, spanning land plants to algae. We found 156 TRIM families, 146 of them previously undescribed. Notably, we identified the first TRIMs in a lycophyte and in non-vascular plants. The majority of the TRIM families were highly conserved and shared within and between plant families. Even though TRIMs contribute only a small fraction of any plant genome, they are enriched in or near genes and may play important roles in gene evolution. TRIMs were frequently organized into tandem arrays, which we call TA-TRIMs, another unique feature distinguishing them from LTR retrotransposons. Importantly, we identified putative autonomous retrotransposons that may mobilize specific TRIM elements and detected very recent transpositions of a TRIM in O. sativa. Overall, this comprehensive analysis of TRIMs across the entire plant kingdom provides insight into the evolution and conservation of TRIMs and the functional roles they may play in gene evolution.
Sharing and specificity of co-expression networks across 35 human tissues
Emma Pierson, GTEx Consortium, Daphne Koller, Alexis Battle, Sara Mostafavi
To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. These data provide an opportunity for deriving shared and tissue-specific gene regulatory networks on the basis of co-expression between genes. However, only a small number of samples are available for most of the tissues, so statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue-specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool which allows exploration of gene function and regulation in a tissue-specific manner.
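The general idea of sharing data between related tissues can be sketched with a simple partial-pooling rule. This is not the GNAT algorithm, whose details are not given in the abstract. It is a hypothetical illustration in which a tissue's co-expression estimate for a gene pair is pulled toward the estimate of its parent in the tissue hierarchy, more strongly when the tissue has few samples. The function names and the pseudo-count `k` are assumptions introduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def shrink_to_parent(r_tissue, n_tissue, r_parent, k=10.0):
    """Blend a tissue-level correlation with its parent's estimate.

    Hypothetical rule: the tissue estimate gets weight n / (n + k), so
    tissues with few samples borrow more from the parent. `k` is an
    illustrative pseudo-count, not a parameter from the paper.
    """
    weight = n_tissue / (n_tissue + k)
    return weight * r_tissue + (1.0 - weight) * r_parent


if __name__ == "__main__":
    # Toy expression values for two genes in one small tissue sample.
    gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    gene_b = [2.1, 3.9, 6.2, 8.1, 9.8]
    r_tissue = pearson(gene_a, gene_b)
    # Assumed correlation for the same gene pair pooled over related tissues.
    r_parent = 0.5
    print(shrink_to_parent(r_tissue, len(gene_a), r_parent))
```

With many samples the weight approaches 1 and the tissue's own estimate dominates; with few samples the estimate stays close to the parent's value, which is the intuition behind borrowing strength across a hierarchy.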