Nonspecific transcription factor binding reduces variability in transcription factor and target protein expression

Mohammad Soltani, Pavol Bokes, Zachary Fox, Abhyudai Singh
(Submitted on 11 May 2014)

Transcription factors (TFs) interact with a multitude of binding sites on DNA and partner proteins inside cells. We investigate how nonspecific binding/unbinding to such decoy binding sites affects the magnitude and time-scale of random fluctuations in TF copy numbers arising from stochastic gene expression. A stochastic model of TF gene expression, together with decoy site interactions, is formulated. Distributions for the total (bound and unbound) and free (unbound) TF levels are derived by analytically solving the chemical master equation under physiologically relevant assumptions. Our results show that increasing the number of decoy binding sites considerably reduces stochasticity in free TF copy numbers. The TF autocorrelation function reveals that decoy sites can either enhance or shorten the time-scale of TF fluctuations depending on model parameters. To understand how noise in TF abundances propagates downstream, a TF target gene is included in the model. Intriguingly, we find that noise in the expression of the target gene decreases with increasing decoy sites for linear TF-target protein dose-responses, even in regimes where decoy sites enhance TF autocorrelation times. Moreover, counterintuitive noise transmissions arise for nonlinear dose-responses. In summary, our study highlights the critical role of molecular sequestration by decoy binding sites in regulating the stochastic dynamics of TFs and target proteins at the single-cell level.
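The kind of model described in this abstract can be explored numerically with a stochastic simulation. The following is a minimal sketch, not the authors' model: it assumes geometric production bursts, degradation of free TFs only (bound TFs are treated as protected), and mass-action binding to a fixed pool of identical decoy sites. The function name `simulate_free_tf` and all rate values are hypothetical choices made for illustration.

```python
import math
import random

def simulate_free_tf(n_decoys, k_burst=2.0, mean_burst=5.0, k_deg=1.0,
                     k_on=1.0, k_off=1.0, t_end=500.0, seed=0):
    """Gillespie simulation of bursty TF production with decoy binding.

    Returns the time-averaged mean and Fano factor (variance / mean) of the
    free TF copy number. All rates are illustrative assumptions, and
    mean_burst must be positive.
    """
    rng = random.Random(seed)
    p = 1.0 / (1.0 + mean_burst)       # geometric burst-size parameter
    free, bound, t = 0, 0, 0.0
    s0 = s1 = s2 = 0.0                 # time-weighted moment accumulators
    while True:
        rates = [
            k_burst,                           # burst of new free TFs
            k_deg * free,                      # degradation of a free TF
            k_on * free * (n_decoys - bound),  # binding to an empty decoy
            k_off * bound,                     # release from a decoy
        ]
        dt = rng.expovariate(sum(rates))       # waiting time to next event
        stop = t + dt >= t_end
        if stop:
            dt = t_end - t                     # truncate the final interval
        s0 += dt
        s1 += dt * free
        s2 += dt * free * free
        if stop:
            break
        t += dt
        event = rng.choices(range(4), weights=rates)[0]
        if event == 0:
            # Geometric burst size on {0, 1, 2, ...} with mean mean_burst
            free += int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
        elif event == 1:
            free -= 1
        elif event == 2:
            free, bound = free - 1, bound + 1
        else:
            free, bound = free + 1, bound - 1
    mean = s1 / s0
    var = s2 / s0 - mean * mean
    return mean, (var / mean if mean > 0 else float("nan"))
```

Calling `simulate_free_tf(n_decoys=0)` and `simulate_free_tf(n_decoys=50)` with the same seed gives a rough way to compare free-TF noise with and without decoys under these assumptions; the analytical results in the paper rest on the authors' own model and parameter regimes.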

Author post: Tandem duplications and the limits of natural selection in Drosophila yakuba and Drosophila simulans

This guest post is by Rebekah Rogers (@evolscientist) on her paper with coauthors “Tandem duplications and the limits of natural selection in Drosophila yakuba and Drosophila simulans” arXived here.

Tandem duplications are widely recognized as a source of genetic novelty. Duplication of gene sequences can result in adaptive evolution through the development of novel functions or specialization in subsets of ancestral functions when ‘spare parts’ are relieved of evolutionary constraints. Additionally, tandem duplications have the potential to create entirely novel gene structures through chimeric gene formation and recruitment of formerly non-coding sequence. Here, we survey the limits of standing variation for tandem duplications in natural populations of D. yakuba and D. simulans, estimate the upper bound of mutation rates, and explore their role in rapid evolution.

Tandem duplicates on the X chromosome in D. simulans show an excess of high frequency variants consistent with adaptive evolution through tandem duplication. Furthermore, we identify an overrepresentation of genes involved in rapidly evolving phenotypes such as chorion development and oogenesis, drug and toxin metabolism, chitin cuticle formation, chemosensory processes, lipases and endopeptidases expressed in male reproduction, as well as immune response to pathogens in both D. yakuba and D. simulans. The enrichment of such rapidly evolving functional classes points to a role for tandem duplicates in Red Queen dynamics and responses to strong selective pressures.
In spite of the observed concordance across functional classes, we observe few duplicated genes that are shared across species, indicating that parallel recruitment of tandem duplications is rare. The span of duplicates in the population is quite limited, and we estimate that less than 15% of the genome is represented among the tandem duplications segregating in the entire population for the species. Moreover, many duplicates are present at low frequency and will have difficulty escaping the forces of drift during selective sweeps. This very limited standing variation combined with low mutation rates for tandem duplications results in severe limitations in the substrate of genetic novelty that is available for adaptation.

Thus, the limits of standing variation and the rate of new mutations are expected to play a vital role in defining evolutionary trajectories and the ability of organisms to adapt in the event of gross environmental change. Given the limited substrate of genetic novelty, we expect that if adaptation is dependent upon gene duplications, suboptimal outcomes in adaptive walks will be common, long wait times will occur for new phenotypic changes, and many multicellular eukaryotes will display limited ability to adapt to rapidly changing environments.

Diversity and evolution of centromere repeats in the maize genome

Paul Bilinski, Kevin Distor, Jose Gutierrez-Lopez, Gabriela Mendoza Mendoza, Jinghua Shi, R. Kelly Dawe, Jeffrey Ross-Ibarra

Centromere repeats are found in most eukaryotes and play a critical role in kinetochore formation. Though CentC repeats exhibit considerable diversity both within and among species, little is understood about the mechanisms that drive centromere repeat evolution. Here, we use maize as a model to investigate how a complex history involving polyploidy, fractionation, and recent domestication has impacted the diversity of the maize CentC repeat. We first validate the existence of long tandem arrays of repeats in maize and other taxa in the genus Zea. Although we find considerable sequence diversity among CentC copies genome-wide, genetic similarity among repeats is highest within these arrays, suggesting that tandem duplications are the primary mechanism for the generation of new copies. Genetic clustering analyses identify similar sequences among distant repeats, and simulations suggest that this pattern may be due to homoplasious mutation. Although the two ancestral subgenomes of maize have contributed nearly equal numbers of centromeres, our analysis shows that the vast majority of all CentC repeats derive from one of the parental genomes. Finally, by comparing maize with its wild progenitor teosinte, we find that the abundance of CentC has decreased through domestication while the pericentromeric repeat Cent4 has drastically increased.

Quantifying evolutionary dynamics of the basic genome of E. coli

Purushottam Dixit, Tin Yau Pang, F. William Studier, Sergei Maslov
(Submitted on 11 May 2014)

The ~4-Mbp basic genome shared by 32 independent isolates of E. coli representing considerable population diversity has been approximated by whole-genome multiple-alignment and computational filtering designed to remove mobile elements and highly variable regions. Single nucleotide polymorphisms (SNPs) in the 496 basic-genome pairs are identified and clonally inherited stretches are distinguished from those acquired by horizontal transfer (HT) by sharp discontinuities in SNP density. The six least diverged genome-pairs each have only one or two HT stretches, each occupying 42-115 kbp of basic genome and containing at least one gene cluster known to confer selective advantage. At higher divergences, the typical mosaic pattern of interspersed clonal and HT stretches across the entire basic genome is observed, including likely fragmented integrations across a restriction barrier. A simple model suggests that individual HT events are of the order of 10 kbp and are the chief contributor to genome divergence, bringing in almost 12 times more SNPs than point mutations. As a result of continuing horizontal transfer of such large segments, 400 out of the 496 strain-pairs beyond genomic divergence of share virtually no genomic material with their common ancestor. We conclude that the active and continuing horizontal transfer of moderately large genomic fragments is likely to be mediated primarily by a co-evolving population of phages that distribute random genome fragments throughout the population by generalized transduction, allowing efficient adaptation to environmental changes.
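As a quick check on the counts quoted in this abstract, the 496 genome pairs are simply all unordered pairs of the 32 isolates, and the 400 pairs mentioned near the end make up roughly four fifths of them:

```latex
\binom{32}{2} = \frac{32 \cdot 31}{2} = 496,
\qquad
\frac{400}{496} \approx 0.81
```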

Quantifying MCMC Exploration of Phylogenetic Tree Space

Christopher Whidden, Frederick A. Matsen IV
Comments: 30 pages, 10 figures
Subjects: Populations and Evolution (q-bio.PE)

In order to gain an understanding of the effectiveness of phylogenetic Markov chain Monte Carlo (MCMC), it is important to understand how quickly the empirical distribution of the MCMC converges to the true posterior distribution. In this paper we investigate this problem on phylogenetic tree topologies with a metric that is especially well suited to the task: the subtree-prune-regraft (SPR) metric. This metric directly corresponds to the minimum number of MCMC rearrangements required to move between trees in common phylogenetic MCMC implementations. We use a novel graph-based approach to analyze tree space and find that the SPR metric is much more informative than simpler metrics that are unrelated to MCMC moves. In doing so we show conclusively that topological peaks do occur in real Bayesian phylogenetic posteriors with standard MCMC moves, investigate the efficiency of Metropolis-coupled MCMC (MCMCMC) in traversing the valleys between peaks, and show that conditional clade probability (CCP) can have systematic problems when there are multiple peaks.

Background selection as baseline for nucleotide variation across the Drosophila genome

Josep M Comeron

The constant removal of deleterious mutations by natural selection causes a reduction in neutral diversity and efficacy of selection at genetically linked sites (a process called Background Selection, BGS). Population genetic studies, however, often ignore BGS effects when investigating demographic events or the presence of other types of selection. To obtain a more realistic evolutionary expectation that incorporates the unavoidable consequences of deleterious mutations, we generated high-resolution landscapes of variation across the Drosophila melanogaster genome under a BGS scenario independent of polymorphism data. We find that BGS plays a significant role in shaping levels of variation across the entire genome, including long introns and intergenic regions distant from annotated genes. We also find that a very large percentage of the observed variation in diversity across autosomes can be explained by BGS alone, up to 70% across individual chromosome arms, thus indicating that BGS predictions can be used as baseline to infer additional types of selection and demographic events. This approach allows detecting several outlier regions with signal of recent adaptive events and selective sweeps. The use of a BGS baseline, however, is particularly appropriate to investigate the presence of balancing selection and our study exposes numerous genomic regions with the predicted signature of higher polymorphism than expected when a BGS context is taken into account. Importantly, we show that these conclusions are robust to the mutation and selection parameters of the BGS model. Finally, analyses of protein evolution together with previous comparisons of genetic maps between Drosophila species, suggest temporally variable recombination landscapes and thus, local BGS effects that may differ between extant and past phases. 
Because genome-wide BGS and temporal changes in linkage effects can skew approaches to estimate demographic and selective events, future analyses should incorporate BGS predictions and capture local recombination variation across genomes and along lineages.

Genetic dissection of MAPK-mediated complex traits across S. cerevisiae

Sebastian Treusch, Frank W Albert, Joshua S Bloom, Iulia E Kotenko, Leonid Kruglyak

Signaling pathways enable cells to sense and respond to their environment. Many cellular signaling strategies are conserved from fungi to humans, yet their activity and phenotypic consequences can vary extensively among individuals within a species. A systematic assessment of the impact of naturally occurring genetic variation on signaling pathways remains to be conducted. In S. cerevisiae, both response and resistance to stressors that activate signaling pathways differ between diverse isolates. Here, we present a quantitative trait locus (QTL) mapping approach that enables us to identify genetic variants underlying such phenotypic differences across the genetic and phenotypic diversity of S. cerevisiae. Using a Round-robin cross between twelve diverse strains, we determined the genetic architectures of phenotypes critically dependent on MAPK signaling cascades. Genetic variants identified fell within MAPK signaling networks themselves as well as other interconnected signaling pathways, illustrating how genetic variation can shape the phenotypic output of highly conserved signaling cascades.

A novel method for the estimation of diversity in viral populations from next generation sequencing data

Jean P. Zukurov, Sieberth N. Brito, Luiz M. R. Janini, Fernando Antoneli
Comments: 17 pages, 6 figures
Subjects: Quantitative Methods (q-bio.QM); Genomics (q-bio.GN)

In this paper we describe the structure and use of a computational tool for the analysis of viral genetic diversity on data generated by high-throughput sequencing. The main motivation for this work is to better understand the genetic diversity of viruses with high rates of nucleotide substitution, such as HIV-1 and influenza. This work focuses on two main fronts: the first is a novel alignment strategy that allows the recovery of the highest possible number of short reads; the second is the estimation of the populational genetic diversity through a Bayesian approach based on Dirichlet distributions inspired by word count modeling. The software is available as an integrated platform capable of performing all operations described here; it is written in C# (Microsoft) and runs on Windows platforms. The executable, the documentation and the auxiliary files are freely available and may be obtained from: biocomp.epm.br/tanden.
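To give a flavour of what a Dirichlet-based diversity estimate can look like, here is a minimal sketch. It is not the authors' tool or algorithm: it assumes a symmetric Dirichlet prior over the four nucleotides at a single site, multinomial read counts, and Shannon entropy of the posterior mean frequencies as the diversity summary. The function names, the prior strength `alpha`, and the read counts are all hypothetical.

```python
import math

def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean frequencies for multinomial counts under a
    symmetric Dirichlet(alpha) prior (alpha is an assumed pseudo-count)."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]

def shannon_entropy(freqs):
    """Shannon entropy in nats; zero-frequency terms contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

# Hypothetical read counts for A, C, G, T at one alignment column
counts = {"A": 90, "C": 6, "G": 3, "T": 1}
post = dirichlet_posterior_mean(list(counts.values()))
site_diversity = shannon_entropy(post)
```

The pseudo-count keeps rare variants from being assigned a frequency of exactly zero, which is one reason Dirichlet priors are common in word-count style models.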

A statistical test for lineage-specific natural selection on quantitative traits based on multiple-line crosses

Nico Riedel, Bhavin S. Khatri, Michael Lässig, Johannes Berg
Comments: 21 pages, 11 figures
Subjects: Populations and Evolution (q-bio.PE)

Phenotypic differences between species may be attributable to natural selection. However, it is a difficult task to quantify the strength of evidence for selection acting on a particular trait. Here we develop a population-genetic test for selection acting on a quantitative trait, which is based on multiple-line crosses. We show that using multiple lines increases both the power and the scope of selection inference. First, a test based on three or more lines detects selection on a quantitative trait with strongly increased statistical significance, which is quantified by our analysis. Second, a multiple-line test makes it possible to distinguish selection from neutral evolution as well as lineage-specific selection from selection under uniform selection strength. This is in contrast to tests based on two lines, where only differences in selection coefficients can be inferred. Our analytical results are complemented by extensive numerical simulations. We apply the multiple-line test to QTL data on floral character traits in plant species of the Mimulus genus and on photoperiodic traits in different maize strains. In both cases, we find a signature of lineage-specific selection that is not seen in a two-line test. We also extend the multiple-line test to short divergence times.

Sequence co-evolution gives 3D contacts and structures of protein complexes

Thomas A. Hopf, Charlotta P.I. Schärfe, João P.G.L.M. Rodrigues, Anna G. Green, Chris Sander, Alexandre M.J.J. Bonvin, Debora S. Marks

High-throughput experiments in bacteria and eukaryotic cells have identified tens of thousands of possible interactions between proteins. This genome-wide view of the protein interaction universe is coarse-grained, whilst fine-grained detail of macromolecular interactions critically depends on lower throughput, labor-intensive experiments. Computational approaches using measures of residue co-evolution across proteins show promise, but have been limited to specific interactions. Here we present a new generalized method showing that patterns of evolutionary sequence changes across proteins reflect residues that are close in space, and with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We demonstrate that the inferred evolutionary coupling scores distinguish between interacting and non-interacting proteins and the accurate prediction of residue interactions. To illustrate the utility of the method, we predict unknown 3D interactions between subunits of ATP synthase and find results consistent with detailed experimental data. We expect that the method can be generalized to genome-wide interaction predictions at residue resolution.