The developmental transcriptome of contrasting Arctic charr (Salvelinus alpinus) morphs
Jóhannes Gudbrandsson, Ehsan P Ahi, Kalina H Kapralova, Sigrídur R Franzdottir, Bjarni K Kristjánsson, Sophie S Steinhaeuser, Ísak M Jóhannesson, Valerie H Maier, Sigurdur S Snorrason, Zophonías O Jónsson, Arnar Pálsson
Species showing repeated evolution of similar traits can help illuminate the molecular and developmental basis of diverging traits and specific adaptations. Following the last glacial period, dwarfism and specialized bottom-feeding morphology evolved rapidly in several landlocked Arctic charr (Salvelinus alpinus) populations in Iceland. To study the genetic divergence between small benthic morphs and larger morphs with limnetic morphotype, we conducted an RNA-seq transcriptome analysis of developing charr. We sequenced mRNA from whole embryos at four stages in early development of two stocks with very different morphologies, the small benthic (SB) charr from Lake Thingvallavatn and Holar aquaculture (AC) charr. The data reveal significant differences in expression of several biological pathways during charr development. There is also a difference between SB- and AC-charr in mitochondrial genes involved in energy metabolism and in blood coagulation genes. We confirmed the expression differences of five genes in whole embryos with qPCR, including lysozyme and natterin, which was previously identified as a fish toxin of a lectin family and may be a putative immunopeptide. We verified differential expression of 7 genes in developing heads, and the expression associated consistently with benthic vs. limnetic charr (studied in 4 morphs total). Comparison of single nucleotide polymorphism (SNP) frequencies reveals extensive genetic differentiation between the SB- and AC-charr (60 fixed SNPs and around 1300 differing by more than 50% in frequency). In SB-charr the high-frequency derived SNPs are in genes related to translation and oxidative processes. Curiously, several derived SNPs reside in the 12S and 16S mitochondrial ribosomal RNA genes, including a base highly conserved among fishes. The data implicate multiple genes and molecular pathways in divergence of small benthic charr and/or the response of aquaculture charr to domestication.
Functional, genetic and population genetic studies on more freshwater and anadromous populations are needed to confirm the specific loci and mutations relating to specific ecological or domestication traits in Arctic charr.
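The SNP comparison described in the abstract above (fixed differences versus SNPs differing by more than 50% in frequency) can be illustrated with a minimal sketch. The function name, the input layout, and the example frequencies are hypothetical, and the sketch assumes the two categories are counted separately; the abstract does not describe the authors' actual pipeline.

```python
import math
from typing import Dict, List, Tuple


def classify_snps(
    freqs: Dict[str, Tuple[float, float]], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Split SNPs into fixed differences and strongly differentiated SNPs.

    `freqs` maps a (hypothetical) SNP id to the frequency of the same allele
    in the two groups, here labelled SB and AC.
    """
    fixed, divergent = [], []
    for snp_id, (freq_sb, freq_ac) in freqs.items():
        delta = abs(freq_sb - freq_ac)
        if math.isclose(delta, 1.0):
            # Allele present in every read of one group and absent in the other.
            fixed.append(snp_id)
        elif delta > threshold:
            # Not fixed, but frequency differs by more than the threshold.
            divergent.append(snp_id)
    return fixed, divergent


# Made-up frequencies for illustration only.
example = {"snp1": (1.0, 0.0), "snp2": (0.9, 0.2), "snp3": (0.5, 0.4)}
fixed_snps, divergent_snps = classify_snps(example)
print(fixed_snps, divergent_snps)
```

In pooled RNA-seq data the frequencies themselves would be estimated from read counts, so a real analysis would also need coverage filters, which this sketch omits.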
Enhanced Transcriptome Maps from Multiple Mouse Tissues Reveal Evolutionary Constraint in Gene Expression for Thousands of Genes
Dmitri Pervouchine, Sarah Djebali, Alessandra Breschi, Carrie A Davis, Pablo Prieto Barja, Alex Dobin, Andrea Tanzer, Julien Lagarde, Chris Zaleski, Lei-Hoon See, Meagan Fastuca, Jorg Drenkow, Huaien Wang, Giovanni Bussotti, Baikang Pei, Suganthi Balasubramanian, Jean Monlong, Arif Harmanci, Mark Gerstein, Michael A Beer, Cedric Notredame, Roderic Guigo, Thomas R Gingeras
We characterized by RNA-seq the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles obtained in human cell lines reveals substantial conservation of transcriptional programs, and uncovers a distinct class of genes whose levels of expression across cell types and species have been constrained early in vertebrate evolution. This core set of genes captures a substantial and constant fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but it is associated with strong and conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory program in which sub-cellular localization and alternative splicing play comparatively large roles.
Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution
Olgert Denas, Richard Sandstrom, Yong Cheng, Kathryn Beal, Javier Herrero, Ross Hardison, James Taylor
Background: Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, the relationships among sequence, conservation, and function are still poorly understood. Results: We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence in the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity in the other species for human and mouse, respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of TFos not showing conservation of occupancy, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest that a substantial amount of functional regulatory sequence is exapted from other biochemically active genomic material. Despite substantial repurposing of TFos, we did not find substantial changes in their predicted target genes, suggesting that cis-regulatory modules (CRMs) buffer evolutionary events, allowing little or no change in the TF-target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions.
Conclusion: We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across the TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed in the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.
Sharing and specificity of co-expression networks across 35 human tissues
Emma Pierson, GTEx Consortium, Daphne Koller, Alexis Battle, Sara Mostafavi
To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. These data provide an opportunity for deriving shared and tissue-specific gene regulatory networks on the basis of co-expression between genes. However, only a small number of samples is available for a majority of the tissues, and therefore statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue-specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool which allows exploration of gene function and regulation in a tissue-specific manner.
Genome-wide characterization of RNA editing in chicken: lack of evidence for non-A-to-I events
Laure Frésard, Sophie Leroux, Pierre-François Roux, C Klopp, Stéphane Fabre, Diane Esquerré, Patrice Dehais, Anis Djari, David Gourichon, Sandrine Lagarrigue, Frédérique Pitel
RNA editing is a post-transcriptional nucleotide change in the RNA sequence that creates an alternative nucleotide not present in the DNA sequence. This leads to a diversification of transcription products with potential functional consequences. Two nucleotide substitutions are mainly described in animals, from adenosine to inosine (A-to-I) and from cytidine to uridine (C-to-U). This phenomenon is increasingly described in mammals, notably since next-generation sequencing technologies have made whole-genome screening of RNA-DNA differences possible. The number of studies recording RNA editing in other vertebrates such as chicken is still limited. We chose to use high-throughput sequencing technologies to search for RNA editing in chicken, to understand to what extent this phenomenon is conserved in vertebrates. We performed RNA and DNA sequencing from 8 embryos. Being aware of common pitfalls inherent to sequence analyses that lead to false-positive discoveries, we stringently filtered our datasets and found fewer than 40 reliable candidates. Conservation of particular sites of RNA editing was attested by the presence of 3 edited sites previously detected in mammals. We then characterized editing levels for selected candidates in several tissues and at different time points, from 4.5 days of embryonic development to adults, and observed a clear tissue-specificity and a gradual increase in editing level with time. By characterizing the RNA editing landscape in chicken, our results highlight the extent of evolutionary conservation of this phenomenon within vertebrates, and support the absence of non-A-to-I events from the chicken transcriptome.
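As an illustration of how RNA-DNA differences can be sorted by substitution type, here is a minimal sketch. It relies on the fact that inosine is read as guanosine by sequencers, so A-to-I editing appears as an A>G mismatch on the transcribed strand (T>C when reported on the opposite strand). The function names and the candidate tuples are hypothetical, and this is not the authors' filtering pipeline.

```python
from collections import Counter
from typing import Iterable, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_type(dna_base: str, rna_base: str, strand: str = "+") -> str:
    """Return the substitution relative to the transcribed strand, e.g. 'A>G'.

    Bases are assumed to be reported on the reference (+) strand; for genes on
    the minus strand both bases are complemented first.
    """
    if strand == "-":
        dna_base, rna_base = COMPLEMENT[dna_base], COMPLEMENT[rna_base]
    return f"{dna_base}>{rna_base}"


def summarize(candidates: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count candidate RNA-DNA differences per substitution type."""
    return Counter(substitution_type(d, r, s) for d, r, s in candidates)


# Made-up candidates: (DNA base, RNA base, strand of the gene).
candidates = [("A", "G", "+"), ("T", "C", "-"), ("C", "T", "+")]
counts = summarize(candidates)
print(counts)  # A>G entries are the ones compatible with A-to-I editing
```

A tally like this only classifies mismatches; distinguishing genuine editing from sequencing or mapping artifacts requires the stringent filtering the abstract refers to.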
Functional analysis and co-evolutionary model of chromatin and DNA methylation networks in embryonic stem cells
Enrique Carrillo de Santa Pau, Juliane Perner, David Juan, Simone Marsili, David Ochoa, Ho-Ryun Chung, Daniel Rico, Martin Vingron, Alfonso Valencia
We have analyzed publicly available epigenomic data of mouse embryonic stem cells (ESCs), combining diverse next-generation sequencing (NGS) studies (139 experiments from 30 datasets with a total of 77 epigenomic features) into a homogeneous dataset comprising various cytosine modifications (5mC, 5hmC and 5fC), histone marks and chromatin-related proteins (CrPs). We applied a set of newly developed statistical analysis methods with the goal of understanding the associations between chromatin states, detecting co-occurrence of DNA-protein binding and epigenetic modification events, as well as detecting coevolution of core CrPs. The resulting networks reveal the complex relations between cytosine modifications and protein complexes and their dependence on defined ESC chromatin contexts. A detailed analysis allows us to detect proteins associated with particular chromatin states whose functions are related to the different cytosine modifications, i.e. RYBP with 5fC and 5hmC, NIPBL with 5hmC and OGT with 5hmC. Moreover, a co-evolutionary analysis suggests a central role of the Cohesin complex in the evolution of the epigenomic network, as well as strong co-evolutionary links between proteins that co-locate in the ESC epigenome with DNA methylation (MBD2 and CBX3) and hydroxymethylation (TET1 and KDM2A). In summary, the new application of computational methodologies reveals the complex network of relations between cytosine modifications and epigenomic players that is essential in shaping the molecular state of ESCs.
Cross-population Meta-analysis of eQTLs: Fine Mapping and Functional Study
Xiaoquan Wen, Francesca Luca, Roger Pique-Regi
Mapping expression quantitative trait loci (eQTLs) has proven to be a powerful tool for uncovering the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistently present across populations, accounting for population heterogeneity in allele frequencies and linkage disequilibrium patterns. Furthermore, by integrating genomic annotations, our analysis framework enables high-resolution functional analysis of eQTLs. We applied our statistical approach to analyze the GEUVADIS data consisting of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; ii) many genes harbor multiple independent eQTLs in their cis regions; and iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10^-22).
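To give a rough intuition for why pooling population groups can increase power, the sketch below combines per-population effect estimates for one SNP-gene pair with a classical fixed-effect, inverse-variance-weighted meta-analysis. This is a deliberately simpler, generic technique and not the approach presented in the paper, which additionally accounts for heterogeneity across populations; the function name and the input numbers are hypothetical.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance-weighted fixed-effect meta-analysis.

    `betas` are per-population effect estimates and `ses` their standard
    errors. Returns the pooled estimate, its standard error, and the z-score.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se ** 2) for se in ses]  # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)  # pooled SE shrinks as populations are added
    return beta, se, beta / se


# Made-up estimates from two population groups.
beta, se, z = fixed_effect_meta([0.5, 0.3], [0.1, 0.1])
print(f"pooled beta={beta:.3f}, se={se:.3f}, z={z:.2f}")
```

Because the pooled standard error is smaller than any single-population standard error, a consistent signal yields a larger z-score after pooling; the fixed-effect assumption of a shared effect size is exactly what a heterogeneity-aware method relaxes.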