# Our paper: The effects of transcription factor competition on gene regulation

This guest post is by Radu Zabet on his papers “The effects of transcription factor competition on gene regulation” and “The influence of transcription factor competition on the relationship between occupancy and affinity”.

Transcription factors (TFs) find their genomic target sites by a combination of three-dimensional diffusion and one-dimensional translocation on the DNA. We previously developed the stochastic simulation framework GRiP (http://logic.sysbiol.cam.ac.uk/grip/) that allows the realistic representation of the target finding process. The following two papers show our application of GRiP to address a few interesting phenomena:

The effects of transcription factor competition on gene regulation
arXiv:1303.6793

The binding of site-specific TFs to their genomic target sites controls the transcription rate of the target genes. In this manuscript, we discuss the influence of TF abundance on the arrival time of TFs at their target sites as well as the time they stay bound to the DNA. We investigated the TF search process using stochastic simulations and found that molecular crowding on the DNA always leads to longer times required by TF molecules to locate their target sites as well as to lower occupancy. There is also an “emergent property” in cases where many molecules compete in a kind of molecular traffic jam on the DNA. This newly identified noise component may contribute to transcriptional noise by affecting both the size of the fluctuations and the distribution of the arrival times (unimodal or bimodal).

The influence of transcription factor competition on the relationship between occupancy and affinity
arXiv:1303.6869

This manuscript deals with the discrepancy between the “predicted occupancy” of a TF at a binding site (on the basis of, say, a PWM) and the “measured occupancy” obtained when we simulate the system with our GRiP framework. Again, we show that absolute TF abundances play an important role in gene expression, and we also provide a compelling case where selecting “the highest peaks” from a ChIP experiment may not necessarily identify the highest-affinity binding sites. Our results showed that for medium and high affinity sites, TF competition does not play a significant role in genomic occupancy except when the abundance of the TF is significantly increased or when the PWM displays relatively low information content. Nevertheless, for medium and low affinity sites, an increase in TF abundance (for both cognate and non-cognate molecules) leads to an increase in occupancy at several sites.
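To make the “predicted occupancy” side of that comparison concrete, here is a minimal sketch of how an occupancy might be predicted from a PWM alone. It is not GRiP's model: the three-position matrix, the uniform background, and the simple saturating form `w / (1 + w)` with abundance as a multiplicative factor are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical 3-bp position weight matrix: per-position base probabilities.
# These numbers are made up for illustration only.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_score(site):
    """Log-odds score of a site against the uniform background."""
    return sum(math.log(PWM[k][base] / BACKGROUND) for k, base in enumerate(site))


def predicted_occupancy(site, abundance):
    """Assumed saturating occupancy: the weight grows with affinity and abundance."""
    weight = abundance * math.exp(pwm_score(site))
    return weight / (1.0 + weight)


if __name__ == "__main__":
    for site in ("ACG", "ACT", "TTT"):
        print(site, round(predicted_occupancy(site, abundance=1.0), 3))
```

A prediction of this kind treats each site in isolation, so by construction it cannot capture crowding or competition between molecules on the DNA, which is the effect the simulations are meant to expose.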

# Our paper: Improving transcriptome assembly through error correction of high-throughput sequence reads

This guest post is by Matt MacManes on his preprint with Michael Eisen, “Improving transcriptome assembly through error correction of high-throughput sequence reads”, arXived here. This is cross-posted from his blog.

I am writing this blog post in support of a paper that I have just submitted to arXiv: Improving transcriptome assembly through error correction of high-throughput sequence reads. My goal is not to talk about the nuts and bolts of the paper so much as it is to ramble about its motivation and the writing process.

First, a little bit about me, as this is my first paper with my postdoctoral advisor, Mike Eisen. In short, I am an evolutionary biologist by training, having done my PhD on the relationship between mating systems and immunogenes in wild rodents. My postdoc work focuses on adaptation to desert life in rodents: I work on Peromyscus rodents in the Southern California deserts, combining field work and genomics. My overarching goals include the ability to operate in multiple domains (genomics, field biology, evolutionary biology) to better understand basic questions such as the links between genotype and phenotype, adaptation, etc. OK, enough. On to the paper.

Abstract:

The study of functional genomics–particularly in non-model organisms–has been dramatically improved over the last few years by use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure–the de novo construction of a reference transcriptome–must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance, are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on, and while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated dataset that applying error correction to sequencing reads has significant positive effects on assembly accuracy, reducing assembly error by nearly 50%, and therefore should be applied to all datasets.

For the past couple of years, I have had an interest in better understanding the dynamics of de novo transcriptome assembly.. I had mostly selfish/practical reasons for wanting to understand–a large amount of my work depends on getting these assemblies ‘right’.. It was quickly evident that much of the computational research is directed at assembly itself, and very little on the pre- and post-assembly processes.. We know these things are important, but often an understanding of their effects is lacking…

How error correction of sequencing reads affects assembly accuracy has been one of the specific ideas I’ve been interested in thinking about for the past several months. The idea of simulating RNAseq reads, applying various error corrections, then understanding their effects is logical– so much so that I was really surprised that this has not been done before. So off I went..

I wrote this paper over the course of a couple of weeks. It is a short and simple paper, and was quite easy to write. Of note, about 75% of the paper was written on the playground in the UC Berkeley University Village, while (loosely) providing supervision for my 2 youngest daughters. How is that for work-life balance!

The read data will be available on Figshare, and I owe thanks to those guys for lifting the upload limit- the read file is 2.6Gb with .bz2 compression, so not huge, but not small either. The winning (AllPathsLG corrected) assembly is there as well.

This type of work is inspired, in a very real sense, by C. Titus Brown, who is quickly becoming the go-to guy for understanding the nuts and bolts of genome assembly (and also got tenure based on his klout score HA!). His post and paper on The challenges of mRNAseq analysis are the type of stuff that I aspire to…

Anyway, I’d be really interested in hearing what you all think of the paper, so read, enjoy, comment, and get to error correcting those reads!

# Improving transcriptome assembly through error correction of high-throughput sequence reads

Improving transcriptome assembly through error correction of high-throughput sequence reads
Matthew D MacManes, Michael B Eisen
(Submitted on 3 Apr 2013)

The study of functional genomics–particularly in non-model organisms–has been dramatically improved over the last few years by use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure–the de novo construction of a reference transcriptome–must be completed as a prerequisite to further analyses. The accurate reference is critically important as all downstream steps, including estimating transcript abundance, are critically dependent on the construction of an accurate reference. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on, and while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated dataset that applying error correction to sequencing reads has significant positive effects on assembly accuracy, reducing assembly error by nearly 50%, and therefore should be applied to all datasets.

# Concurrent and Accurate RNA Sequencing on Multicore Platforms

Concurrent and Accurate RNA Sequencing on Multicore Platforms
Héctor Martínez (1), Joaquín Tárraga (2), Ignacio Medina (2), Sergio Barrachina (1), Maribel Castillo (1), Joaquín Dopazo (2), Enrique S. Quintana-Ortí (1) ((1) Dpto. de Ingeniería y Ciencia de los Computadores, Universidad Jaume I, Castellón, Spain, (2) Computational Genomics Institute, Centro de Investigación Príncipe Felipe, Valencia, Spain)
(Submitted on 2 Apr 2013)

In this paper we introduce a novel parallel pipeline for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, named HPG-Aligner, leverages the speed of the Burrows-Wheeler Transform to map a large number of RNA fragments (reads) rapidly, as well as the accuracy of the Smith-Waterman algorithm, that is employed to deal with conflictive reads. The aligner is complemented with a careful strategy to detect splice junctions based on the division of RNA reads into short segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing useful information for the successful alignment of the complete reads.
Experimental results on platforms with AMD and Intel multicore processors report the remarkable parallel performance of HPG-Aligner, on short and long RNA reads, which excels in both execution time and sensitivity compared with a state-of-the-art aligner such as TopHat 2 built on top of Bowtie and Bowtie 2.
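For readers unfamiliar with the Smith-Waterman step mentioned in the abstract, the following is a minimal sketch of the textbook local-alignment recurrence. It is not HPG-Aligner's code: the scoring values and the linear gap penalty are assumptions chosen for brevity, and the function name is hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Textbook dynamic programme with a linear gap penalty. The scoring
    parameters are illustrative assumptions, not values from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap      # gap in b
            left = H[i][j - 1] + gap    # gap in a
            # Clamping at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "ACG"))
```

The quadratic cost of filling this table is the usual reason to reserve such an algorithm for a small subset of reads, which fits the abstract's description of using a fast index-based mapper first and Smith-Waterman only for the difficult cases.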

# Low-virulence Strains of Toxoplasma gondii Result in Permanent Loss of Innate Fear of Cats in Mice, Even after Parasite Clearance

Low-virulence Strains of Toxoplasma gondii Result in Permanent Loss of Innate Fear of Cats in Mice, Even after Parasite Clearance
Wendy Marie Ingram, Leeanne M Goodrich, Ellen A Robey, Michael B Eisen
(Submitted on 1 Apr 2013)

Toxoplasma gondii chronic infection in rodent secondary hosts has been reported to lead to a loss of innate, hard-wired fear toward cats, its primary host. However, the generality of this response across T. gondii strains and the underlying mechanism for this pathogen-mediated behavioral change remain unknown. To begin exploring these questions, we evaluated the effects of infection with isolates from the three major North American clonal lineages of T. gondii. Using an hour-long open field activity assay optimized for this purpose, we measured mouse aversion toward predator and non-predator urines. We show that loss of innate aversion of cat urine is a general trait caused by infection with all three major clonal lineages of parasite. Surprisingly, we found that infection with an attenuated Type I parasite results in sustained loss of fear at times post infection when neither parasite nor ongoing brain inflammation were detectable. This suggests that T. gondii-mediated interruption of mouse innate aversion of cats may occur during early acute infection in a permanent manner, not requiring persistence of parasite cysts or continuing brain inflammation.

# A new approach to estimate directional genetic differentiation and asymmetric migration patterns

A new approach to estimate directional genetic differentiation and asymmetric migration patterns
Lisa Sundqvist, Martin Zackrisson, David Kleinhans
(Submitted on 30 Mar 2013)

In the field of population genetics, measures of genetic differentiation are widely used to gather information on the structure and the amount of gene flow between populations. These indirect measures are based on a number of simplifying assumptions, for instance equal population size and symmetric migration. Structured populations with asymmetric migration patterns frequently occur in nature, and information about directional gene flow would here be of great interest. Nevertheless, current measures of genetic differentiation cannot be used in such systems without violating the assumptions. To get information on asymmetric migration patterns from genetic data, rather complex models using maximum likelihood or Bayesian approaches generally need to be applied. In such models a large number of parameters are estimated simultaneously and this involves complex optimization algorithms. We here introduce a new approach that intends to fill the gap between the complex approaches and the symmetric measures of genetic differentiation. Our approach makes it possible to calculate a directional component of genetic differentiation at low computational effort using any of the classical measures of genetic differentiation. The approach is based on defining a pool of migrants for any pair of populations and calculating measures for genetic differentiation between the populations and the respective pools. The directional measures of genetic differentiation can further be used to calculate asymmetric migration. The procedure is demonstrated with a simulated data set with known migration pattern. A comparison of the estimation results with the migration pattern used for simulation suggests that our method captures relevant properties of migration patterns even at low migration frequencies and with few marker loci.
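The pool-of-migrants idea in the abstract can be sketched in a few lines. The abstract does not say how the pool is constructed, so this example assumes it is the normalized geometric mean of the two populations' allele frequencies at a single locus, and it uses Jost's D as the “classical” measure. Both choices and all function names are assumptions for illustration, not the authors' stated procedure.

```python
import math


def migrant_pool(p, q):
    """Hypothetical migrant pool: normalized geometric mean of allele frequencies."""
    geo = [math.sqrt(x * y) for x, y in zip(p, q)]
    total = sum(geo)
    if total == 0:
        raise ValueError("the two populations share no alleles")
    return [g / total for g in geo]


def jost_d(p, q):
    """Jost's D at one locus for two equally weighted populations."""
    # Mean within-population heterozygosity.
    h_s = 1 - (sum(x * x for x in p) + sum(y * y for y in q)) / 2
    # Heterozygosity of the pooled (averaged) allele frequencies.
    h_t = 1 - sum(((x + y) / 2) ** 2 for x, y in zip(p, q))
    return 2 * (h_t - h_s) / (1 - h_s)  # factor n / (n - 1) with n = 2


def directional_differentiation(p, q):
    """Differentiation of each population from the shared (assumed) migrant pool."""
    pool = migrant_pool(p, q)
    return jost_d(p, pool), jost_d(q, pool)


if __name__ == "__main__":
    d_p, d_q = directional_differentiation([0.5, 0.5], [0.9, 0.1])
    print(round(d_p, 4), round(d_q, 4))
```

The two returned values generally differ, and that asymmetry is the directional component. How it is converted into relative migration rates is defined in the paper and is not reproduced here.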

# Detecting range expansions from genetic data

Detecting range expansions from genetic data

Benjamin M Peter, Montgomery Slatkin
(Submitted on 29 Mar 2013)

We propose a method that uses genetic data to test for the occurrence of a recent range expansion and to infer the location of the origin of the expansion. We introduce a statistic for pairs of populations $\psi$ (the directionality index) that detects asymmetries in the two-dimensional allele frequency spectrum caused by the series of founder events that happen during an expansion. Such asymmetry arises because low frequency alleles tend to be lost during founder events, thus creating clines in the frequencies of surviving low-frequency alleles. Using simulations, we further show that $\psi$ is more powerful for detecting range expansions than both $F_{ST}$ and clines in heterozygosity. We illustrate the utility of $\psi$ by applying it to a data set from modern humans and show how we can include more complicated scenarios such as multiple expansion origins or barriers to migration in the model.
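As a rough illustration of how an asymmetry statistic of this kind can be computed from data, here is a minimal sketch. The abstract does not give the formula for $\psi$, so the form used here is an assumption: the mean difference in derived-allele frequency between the two samples, taken over sites where the derived allele is present in both samples and not fixed in both. The exact conditioning and normalization in the paper may differ, and the function name is hypothetical.

```python
def directionality_index(counts_1, counts_2, n):
    """Assumed form of a pairwise directionality statistic.

    counts_1[s] and counts_2[s] are derived-allele counts at site s in
    samples of n chromosomes from populations 1 and 2. Sites where the
    derived allele is absent from either sample, or fixed in both, are
    skipped (an assumption about the conditioning).
    """
    total = 0.0
    used = 0
    for i, j in zip(counts_1, counts_2):
        if i == 0 or j == 0 or (i == n and j == n):
            continue
        total += (i - j) / n
        used += 1
    if used == 0:
        raise ValueError("no informative shared sites")
    return total / used


if __name__ == "__main__":
    # Toy counts, n = 2 chromosomes per population (made-up data).
    print(directionality_index([2, 2, 1, 0], [1, 1, 1, 2], n=2))
```

Under this assumed definition the statistic is antisymmetric: swapping the two populations flips its sign, and a value near zero indicates no detectable asymmetry between the pair.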

# Most viewed on Haldane’s Sieve, March 2013

The most viewed preprints on Haldane’s Sieve in March 2013 were: