Differential meta-analysis of RNA-seq data from multiple studies
Andrea Rau (GABI), Guillemette Marot (INRIA Lille – Nord Europe, CERIM), Florence Jaffrézic (GABI)
(Submitted on 16 Jun 2013)
High-throughput sequencing is now regularly used for studies of the transcriptome (RNA-seq), particularly for comparisons among experimental conditions. For the time being, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential expression. As sequencing costs continue to decrease, it is likely that additional follow-up studies will be conducted to re-address the same biological question. We demonstrate how p-value combination techniques previously used for microarray meta-analyses can be used for the differential analysis of RNA-seq data from multiple related studies. These techniques are compared to a negative binomial generalized linear model (GLM) including a fixed study effect, on simulated data and on real data from human melanoma cell lines. The GLM with fixed study effect performed well for low inter-study variation and small numbers of studies, but was outperformed by the meta-analysis methods for moderate to large inter-study variability and larger numbers of studies. To conclude, the p-value combination techniques illustrated here are a valuable tool for performing differential meta-analyses of RNA-seq data, because they appropriately account for biological and technical variability within studies as well as additional study-specific effects. An R package, metaRNASeq, is available on R-Forge.
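As a rough illustration of what a p-value combination looks like, here is a minimal Python sketch of Fisher's method, one classical combination technique. It is not the metaRNASeq implementation: the function name `fisher_combine` and the toy per-study p-values are assumptions made for illustration, and the p-values are assumed to be independent across studies.

```python
import math

def fisher_combine(pvalues):
    """Combine independent per-study p-values for one gene (Fisher's method).

    Hypothetical helper for illustration; returns (statistic, combined p-value).
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Fisher statistic: X = -2 * sum(log p_i), chi-square with 2k df under the null.
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # For even degrees of freedom the chi-square tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return stat, math.exp(-half) * total

# Toy p-values for two genes in three hypothetical studies (made-up numbers).
per_study = {"geneA": [0.04, 0.03, 0.20], "geneB": [0.60, 0.45, 0.80]}
combined = {gene: fisher_combine(p)[1] for gene, p in per_study.items()}
```

In a real analysis the combined p-values would still need a multiple-testing correction across genes.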
This is a guest post by Yarden Katz [@yardenkatz] on his paper (with coauthors): Katz et al., Sashimi plots: Quantitative visualization of RNA sequencing read alignments, arXived here.
A first draft of our paper Sashimi plots: Quantitative visualization of RNA sequencing read alignments is now available. Sashimi plots are a simple visualization of RNA sequencing data, intended to make it easier to detect differentially spliced exons across multiple RNA-Seq samples. In a Sashimi plot, RNA-Seq reads are summarized as read densities, and junction reads are collapsed into arcs whose width is proportional to the number of reads spanning the exons connected by the arc. See the paper for examples.
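The summary described above (per-base read densities plus junction counts that set arc widths) can be sketched in a few lines of Python. This is a toy sketch, not the Sashimi plot code: the function name `summarize_reads`, the representation of a read as a list of half-open `(start, end)` aligned blocks, and the example coordinates are all assumptions for illustration.

```python
from collections import Counter

def summarize_reads(reads, region_start, region_end):
    """Summarize toy alignments as a per-base density and junction counts.

    reads: list of alignments; each alignment is a list of (start, end)
    aligned blocks in half-open coordinates. Consecutive blocks of the
    same read are treated as one spliced junction.
    """
    density = [0] * (region_end - region_start)
    junctions = Counter()
    for blocks in reads:
        # Count coverage for every aligned base inside the region.
        for start, end in blocks:
            for pos in range(max(start, region_start), min(end, region_end)):
                density[pos - region_start] += 1
        # Each gap between consecutive blocks is one junction (donor, acceptor).
        for (_, donor), (acceptor, _) in zip(blocks, blocks[1:]):
            junctions[(donor, acceptor)] += 1
    return density, junctions

# Made-up reads: one unspliced read and three junction-spanning reads.
toy_reads = [
    [(0, 5)],
    [(3, 5), (10, 13)],
    [(4, 5), (10, 14)],
    [(2, 5), (20, 22)],
]
density, junctions = summarize_reads(toy_reads, 0, 25)
# An arc's width would then be drawn proportional to junctions[(donor, acceptor)].
```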
We call it a Sashimi plot in part because of the impeccable resemblance of bumpy RNA-Seq read densities in exons to small pieces of Sashimi, and also because we tried to keep the plots as close to the “raw” data as possible. While Sashimi plots can display estimates of isoform abundance levels from programs like MISO, the goal here was to summarize the read alignments as they are, without further processing or inference, so that conclusions from probabilistic models can be visually verified.
The original Sashimi plot program is a command line utility, written in Python with the matplotlib library, that makes customizable Sashimi plots. Recently, the IGV genome browser team implemented a version of Sashimi plots in their browser (see installation instructions). This allows Sashimi plots to be made dynamically for any genomic region of interest, at a resolution set by the zoom in/out features of the browser. The plot can be made for all or a subset of the tracks loaded, and the scales can be adjusted by the user as in the main IGV window. Both the static, Python-based version of Sashimi plots and the dynamic version within IGV are actively maintained, and the code bases for both are available on GitHub.
Sashimi plots still have important limitations. First, the junction arcs can get messy for genes with many alternative isoforms. This can be partially addressed by looking at simplified event annotations (e.g. ones containing only two isoforms, or a handful of isoforms, as in these annotations) rather than making plots for the full set of isoforms of a gene. The second limitation is that sometimes subtle differences are not readily seen from junction arc widths. We’re considering alternative representations (such as circle area or diameter) for quantitatively representing junction read counts.
The paper is meant primarily as an advertisement for the software. We hope that other members of the RNA processing/sequencing community will find it useful and come up with their own variants of these plots.
Analysis and rejection sampling of Wright-Fisher diffusion bridges
Joshua G. Schraiber, Robert C. Griffiths, Steven N. Evans
(Submitted on 14 Jun 2013)
We investigate the properties of a Wright-Fisher diffusion process started from frequency x at time 0 and conditioned to be at frequency y at time T. Such a process is called a bridge. Bridges arise naturally in the analysis of selection acting on standing variation and in the inference of selection from allele frequency time series. We establish a number of results about the distribution of neutral Wright-Fisher bridges and develop a novel rejection sampling scheme for bridges under selection that we use to study their behavior.
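To make the notion of a bridge concrete, here is a naive Python sketch that is explicitly not the rejection sampling scheme developed in the paper. It assumes a discrete, neutral Wright-Fisher chain (binomial resampling of allele counts) rather than the diffusion, simulates unconditioned paths, and keeps the first one that ends at the target count. All names and parameter values are hypothetical.

```python
import random

def wf_step(count, N, rng):
    """One neutral Wright-Fisher generation: binomial resampling of N copies."""
    freq = count / N
    return sum(1 for _ in range(N) if rng.random() < freq)

def sample_bridge(start, end, N, T, rng, max_tries=100000):
    """Naive rejection sampler: simulate free paths, accept those ending at `end`."""
    for _ in range(max_tries):
        path = [start]
        for _ in range(T):
            path.append(wf_step(path[-1], N, rng))
        if path[-1] == end:
            return path
    raise RuntimeError("no path accepted; the acceptance probability may be too low")

rng = random.Random(1)
# Allele counts over 5 generations, from 10 of 20 copies to 12 of 20 copies.
bridge = sample_bridge(start=10, end=12, N=20, T=5, rng=rng)
```

The acceptance rate of this brute-force approach drops quickly as the population size or the time horizon grows, which is one reason a dedicated sampling scheme is useful.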
Phylogenetic analysis accounting for age-dependent death and sampling with applications to epidemics
Amaury Lambert, Helen K. Alexander, Tanja Stadler
(Submitted on 14 Jun 2013)
The reconstruction of phylogenetic trees based on viral genetic sequence data sequentially sampled from an epidemic provides estimates of the past transmission dynamics, by fitting epidemiological models to these trees. To our knowledge, none of the epidemiological models currently used in phylogenetics can account for recovery rates and sampling rates dependent on the time elapsed since transmission.
Here we introduce an epidemiological model where infectives leave the epidemic, either by recovery or sampling, after some random time which may follow an arbitrary distribution.
We derive an expression for the likelihood of the phylogenetic tree of sampled infectives under our general epidemiological model. The analytic concept developed in this paper will facilitate inference of past epidemiological dynamics and provide an analytical framework for performing very efficient simulations of phylogenetic trees under our model. The main idea of our analytic study is that the non-Markovian epidemiological model, which gives rise to phylogenetic trees growing vertically as time goes by, can be represented by a Markovian “coalescent point process” growing horizontally by the sequential addition of pairs of coalescence and sampling times.
As examples, we discuss two special cases of our general model, namely an application to influenza and an application to HIV. Though phrased in epidemiological terms, our framework can also be used for instance to fit macroevolutionary models to phylogenies of extant and extinct species, accounting for general species lifetime distributions.
Sashimi plots: Quantitative visualization of RNA sequencing read alignments
Yarden Katz, Eric T. Wang, Jacob Silterra, Schraga Schwartz, Bang Wong, Jill P. Mesirov, Edoardo M. Airoldi, Christopher B. Burge
(Submitted on 14 Jun 2013)
We introduce Sashimi plots, a quantitative multi-sample visualization of mRNA sequencing reads aligned to gene annotations. Sashimi plots are made using alignments (stored in the SAM/BAM format) and gene model annotations (in GFF format), which can be custom-made by the user or obtained from databases such as Ensembl or UCSC. We describe two implementations of Sashimi plots: (1) a stand-alone command line implementation aimed at making customizable publication quality figures, and (2) an implementation built into the Integrative Genomics Viewer (IGV) browser, which enables rapid and dynamic creation of Sashimi plots for any genomic region of interest, suitable for exploratory analysis of alternatively spliced regions of the transcriptome. Isoform expression estimates output by the MISO program can optionally be plotted along with Sashimi plots. Sashimi plots can be used to quickly screen differentially spliced exons along genomic regions of interest and can be used in publication quality figures. The Sashimi plot software and documentation are available from: this http URL
Dynamic Transcript Profiling of Candida Albicans Infection in Zebrafish: a Pathogen-Host Interaction Study
Yan Yu Chen, Chun-Cheih Chao, Fu-Chen Liu, Po-Chen Hsu, Hsueh-Fen Chen, Shih-Chi Peng, Yung-Jen Chuang, Chung-Yu Lan, Wen-Ping Hsieh, David Shan Hill Wong
(Submitted on 14 Jun 2013)
Candida albicans is responsible for a number of life-threatening infections and causes considerable morbidity and mortality in immunocompromised patients. Previous studies of C. albicans pathogenesis have suggested that several steps must occur before virulent infection, including early adhesion, invasion, and late tissue damage. However, the mechanism that triggers the transformation of C. albicans from the yeast to the hyphal form during infection has yet to be fully elucidated. This study used a systems biology approach to investigate C. albicans infection in zebrafish. The surviving fish were sampled at different post-infection time points to obtain time-lapsed, genome-wide transcriptomic data from both organisms, accompanied by synchronized histological analyses. Principal component analysis (PCA) was used to analyze the dynamic gene expression profiles of significant variations in both C. albicans and zebrafish. The results categorized C. albicans infection into three progressive phases: adhesion, invasion, and damage. These findings were strongly supported by the corresponding histological analysis. Furthermore, the dynamic interspecies transcript profiling revealed that C. albicans activated its filamentous formation during invasion and its iron scavenging functions during the damage phase, whereas zebrafish ceased its iron homeostasis function following massive hemorrhage during the later stages of infection. This was followed by massive hemorrhaging toward the end stage of infection. Most of the immune-related genes were expressed as the infection progressed from the invasion phase to the damage phase. Such global, inter-species evidence of virulence-immune and iron competition dynamics during C. albicans infection could be crucial in understanding and controlling fungal pathogenesis.
Predicting the loss of phylogenetic diversity under non-stationary diversification models
Amaury Lambert, Mike Steel
(Submitted on 12 Jun 2013)
For many taxa, the current high rates of extinction are likely to result in a significant loss of biodiversity. The evolutionary heritage of biodiversity is frequently quantified by a measure called phylogenetic diversity (PD). We predict the loss of PD under a wide class of phylogenetic tree models, where speciation rates and extinction rates may be time-dependent, and assuming independent random species extinctions at the present. We study the loss of PD when $K$ contemporary species are selected uniformly at random from the $N$ extant species as the surviving taxa, while the remaining $N-K$ become extinct. We consider two models of species sampling, the so-called field of bullets model, where each species independently survives the extinction event at the present with probability $p$, and a model for which the number of surviving species is fixed.
We provide explicit formulae for the expected remaining PD in both models, conditional on $N=n$, conditional on $K=k$, or conditional on both events. When $N=n$ is fixed, we show the convergence to an explicit deterministic limit of the ratio of new to initial PD, as $n\to\infty$, both under the field of bullets model, and when $K=k_n$ is fixed and depends on $n$ in such a way that $k_n/n$ converges to $p$. We also prove the convergence of this ratio as $T\to\infty$ in the supercritical, time-homogeneous case, where $N$ simultaneously goes to $\infty$, thereby strengthening previous results of Mooers et al. (2012).
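For a single fixed tree, the expected remaining PD under the field of bullets model follows from a standard observation: an edge is retained exactly when at least one of the leaves below it survives, which happens with probability $1-(1-p)^m$ for an edge with $m$ descendant leaves. The Python sketch below applies this to a toy tree. It is only an illustration on a fixed tree, not the paper's formulae for random tree models; the nested-tuple tree encoding, the function name, and the convention that PD is measured from the root are assumptions.

```python
def expected_pd(tree, p):
    """Expected surviving (rooted) PD of a fixed tree under the field of bullets.

    tree: (branch_length, [children]); a leaf has an empty child list.
    Returns (expected PD of this subtree including its stem edge, leaf count).
    """
    length, children = tree
    if not children:
        below, n_leaves = 0.0, 1
    else:
        results = [expected_pd(child, p) for child in children]
        below = sum(r[0] for r in results)
        n_leaves = sum(r[1] for r in results)
    # The stem edge survives iff at least one of the n_leaves below it survives.
    return below + length * (1.0 - (1.0 - p) ** n_leaves), n_leaves

# Toy tree: a cherry (two leaves, edges of length 1) on a stem of length 1,
# plus a single leaf on an edge of length 2; the root edge has length 0.
toy_tree = (0.0, [(1.0, [(1.0, []), (1.0, [])]), (2.0, [])])
pd_half, n_leaves = expected_pd(toy_tree, 0.5)
```

With `p = 1.0` the function returns the total PD of the tree, so the ratio of the two values gives the expected fraction of PD retained.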
The Moran model with selection: Fixation probabilities, ancestral lines, and an alternative particle representation
Sandra Kluth, Ellen Baake
(Submitted on 12 Jun 2013)
We reconsider the Moran model in continuous time with population size $N$, two types, and selection. We introduce a new particle representation, which we call the labelled Moran model, and which has the same empirical type distribution as the original Moran model, provided the initial values are chosen appropriately. In the new model, individuals are labelled $1,2, \dots, N$; neutral resampling events may take place between arbitrary labels, whereas selective events only occur in the direction of increasing labels. Using only elementary methods, we not only recover fixation probabilities but also obtain detailed insight into the number and nature of the selective events that play a role in the fixation process forward in time.
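For reference, the classical fixation probability of the two-type Moran model can be computed from the standard birth-death chain formula. The Python sketch below does this under the assumption that the focal type has constant relative fitness `r` (for example `r = 1 + s` for a selection coefficient `s`), so that the ratio of death to birth rates is `1/r` in every state. The function name and parameterization are illustrative and are not taken from the paper, which obtains these probabilities via the labelled Moran model.

```python
def fixation_probability(i, N, r):
    """Probability that i copies of a type with relative fitness r reach N copies.

    Standard birth-death result: h_i = sum_{j<i} gamma^j / sum_{j<N} gamma^j,
    where gamma = 1/r is the death-to-birth rate ratio of the focal type.
    """
    if not 0 <= i <= N or N < 1:
        raise ValueError("need 0 <= i <= N and N >= 1")
    if r <= 0:
        raise ValueError("relative fitness must be positive")
    gamma = 1.0 / r
    weights = [gamma ** j for j in range(N)]
    return sum(weights[:i]) / sum(weights)

# A single mutant with a 10% advantage in a population of 100 (toy values).
p_fix = fixation_probability(1, 100, 1.1)
```

For `r = 1` this reduces to the neutral value `i / N`, and for `r != 1` it agrees with the closed form `(1 - r**-i) / (1 - r**-N)`.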
Incentive Processes in Finite Populations
Marc Harper, Dashiell Fryer
(Submitted on 11 Jun 2013)
We define the incentive process, a natural generalization of the Moran process incorporating evolutionary updating mechanisms corresponding to well-known evolutionary dynamics, such as the logit, projection, and best-reply dynamics. Fixation probabilities and internal stable states are given for a variety of incentives, including new closed-forms, as well as results relating fixation probabilities for members of two one-parameter families of incentive processes. We show that the behaviors of the incentive process can deviate significantly from the analogous properties of deterministic evolutionary dynamics in some ways but are similar in others. For example, while the fixation probabilities change, their ratio remains constant.
Interfertile oaks in an island environment: I. High nuclear genetic differentiation and high degree of chloroplast DNA sharing between Q. alnifolia and Q. coccifera in Cyprus. A multipopulation study
Charalambos Neophytou, Aikaterini Dounavi, Siegfried Fink, Filippos A. Aravanopoulos
(Submitted on 11 Jun 2013)
The evergreen Quercus alnifolia and Q. coccifera form the only interfertile pair of oak species growing in Cyprus. Hybridization between the two species has already been observed and studied morphologically. However, little evidence exists about the extent of genetic introgression. In the present study, we aimed to study the effects of introgressive hybridization on both the chloroplast and nuclear genomes. We sampled both pure and mixed populations of Q. alnifolia and Q. coccifera from several locations across their distribution area in Cyprus. We analyzed the genetic variation within and between species by conducting Analysis of Molecular Variance (AMOVA) based on nuclear microsatellites. Population genetic structure and levels of admixture were studied by means of a Bayesian analysis (STRUCTURE simulation analysis). Chloroplast DNA microsatellites were used for a spatial analysis of genetic barriers. Most of the nuclear genetic variation was explained by partition into species groups. High interspecific differentiation and low admixture of nuclear genomes, both in pure and mixed populations, support limited genetic introgression between Q. alnifolia and Q. coccifera in Cyprus. In contrast, chloroplast DNA haplotypes were shared between the species and were locally structured, suggesting cytoplasmic introgression. Occasional hybridization events followed by backcrossing with both parental species might lead to this pattern of genetic differentiation.