The effects of transcription factor competition on gene regulation
Nicolae Radu Zabet, Boris Adryan
(Submitted on 27 Mar 2013)
We performed stochastic simulations of transcription factor (TF) molecules translocating by facilitated diffusion (a combination of 3D diffusion in the cytoplasm and 1D random walk on the DNA), and considered various abundances of cognate and non-cognate TFs to assess the influence of competitor molecules that also move along the DNA. We show that molecular crowding on the DNA always increases the time TF molecules need to locate their target sites and lowers target-site occupancy, which may provide a general mechanism for controlling gene activity levels globally. Finally, we show that crowding on the DNA may increase transcriptional noise by increasing the variability of target-site occupancy time.
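To make the search process concrete, here is a deliberately crude toy model, not the authors' simulator: a single TF alternates 3D excursions (fixed cost `tau_3d`) with short 1D random walks on a lattice of `L` DNA sites, and `n_crowders` competitor molecules, re-placed at random each round as a rough stand-in for molecules that also move, block landing and sliding. All parameter names and values are illustrative assumptions; time is in arbitrary units.

```python
import random


def search_time(L=100, target=50, n_crowders=0, slide_steps=20, tau_3d=10, seed=None):
    """Toy facilitated-diffusion search on a 1D lattice of L DNA sites.

    Each round: one 3D excursion (cost tau_3d), landing at a uniformly random
    site, then up to `slide_steps` unbiased 1D steps (cost 1 each). Crowders are
    re-placed at random every round; they block landing and sliding and may
    cover the target. Returns the total time until the TF first reaches `target`.
    """
    if not 0 <= n_crowders < L:
        raise ValueError("need 0 <= n_crowders < L")
    rng = random.Random(seed)
    t = 0
    while True:
        occupied = set(rng.sample(range(L), n_crowders))
        t += tau_3d                      # 3D diffusion through the cytoplasm
        pos = rng.randrange(L)
        if pos in occupied:
            continue                     # landing site blocked: back to 3D
        if pos == target:
            return t
        for _ in range(slide_steps):     # 1D random walk along the DNA
            t += 1
            nxt = pos + rng.choice((-1, 1))
            if 0 <= nxt < L and nxt not in occupied:
                pos = nxt                # move only onto a free, in-range site
            if pos == target:
                return t
```

Averaging `search_time` over many seeds with and without crowders shows the qualitative effect described above: blocked landings and blocked sliding steps both lengthen the search.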
The influence of transcription factor competition on the relationship between occupancy and affinity
Nicolae Radu Zabet, Robert Foy, Boris Adryan
(Submitted on 27 Mar 2013)
Transcription factors (TFs) are proteins that bind to specific sites on the DNA and regulate gene activity. Identifying where TF molecules bind and how much time they spend on their target sites is key to understanding transcriptional regulation. It is usually assumed that the free energy of binding of a TF to the DNA (the affinity of the site) is highly correlated with the amount of time the TF remains bound (the occupancy of the site). However, knowing the binding energy is not sufficient to infer actual binding site occupancy. This mismatch between the occupancy predicted by the affinity and the observed occupancy may be caused by various factors, such as TF abundance, competition between TFs or the arrangement of the sites on the DNA. We investigated the relationship between the affinity of a TF for a set of binding sites and their occupancy. In particular, we considered the case of the lac repressor (lacI) in E. coli and performed stochastic simulations of the TF dynamics on the DNA for various combinations of lacI abundance in competition with TFs that contribute to macromolecular crowding. Our results showed that for medium and high affinity sites, TF competition does not play a significant role in genomic occupancy, except in cases when the abundance of lacI is significantly increased or when a low-information-content PWM was used. Nevertheless, for medium and low affinity sites, an increase in TF abundance (for both lacI and other molecules) leads to an increase in occupancy at several sites. Keywords: facilitated diffusion, Position Weight Matrix, thermodynamic equilibrium, motif information content, molecular crowding
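The baseline that the abstract argues is insufficient, namely occupancy predicted from affinity alone, can be sketched as follows. This is an illustrative model under stated assumptions, not the authors' method: PWM columns are dicts of base probabilities, the background is uniform, the log2-odds score is treated as a negative binding energy so each site's Boltzmann weight is `2**score`, and TF copies are independent and ignore competitors and steric exclusion. The function names and the pseudocount are hypothetical.

```python
import math


def information_content(pwm):
    """Total information content (bits) of a PWM, assuming a uniform background.

    `pwm` is a list of columns; each column maps base -> probability.
    """
    return sum(2.0 + sum(p * math.log2(p) for p in col.values() if p > 0) for col in pwm)


def log_odds_score(pwm, seq, pseudo=0.01):
    """Log2-odds score of `seq` against the PWM versus a uniform background."""
    score = 0.0
    for col, base in zip(pwm, seq):
        p = (col.get(base, 0.0) + pseudo) / (1.0 + 4 * pseudo)  # smoothed probability
        score += math.log2(p / 0.25)
    return score


def occupancy(scores, i, n_tf=1):
    """Equilibrium probability that site i is bound by at least one of n_tf TFs.

    Assumes Boltzmann weight 2**score per site, each TF copy bound to exactly one
    site, and independent copies (no competitors, no exclusion).
    """
    weights = [2.0 ** s for s in scores]
    p_single = weights[i] / sum(weights)
    return 1.0 - (1.0 - p_single) ** n_tf
```

A fully specific column contributes 2 bits and a uniform column 0 bits, so a low-information-content PWM separates strong and weak sites less sharply. In this baseline, raising `n_tf` always raises occupancy; the simulations above are what capture crowding effects beyond it.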
SICLE: A high-throughput tool for extracting evolutionary relationships from phylogenetic trees
Dan DeBlasio, Jennifer Wiscaver
(Submitted on 22 Mar 2013)
We present the phylogeny analysis software SICLE (Sister Clade Extractor), an easy-to-use, adaptable, and high-throughput tool that reports the nearest neighbors of a node of interest in a phylogenetic tree, together with the support value for the relationship. With SICLE, it is possible to summarize the phylogenetic information produced by automated phylogenetic pipelines to rapidly identify and quantify the possible evolutionary relationships that merit further investigation. The program is a simple command-line utility that is easy to adapt and integrate into any phylogenetic pipeline. As a test case, we applied this new tool to published gene phylogenies to identify potential instances of horizontal gene transfer in Salinibacter ruber.
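The core operation, finding the sister clade of a leaf and the support at their shared parent, can be sketched in a few lines. This is not SICLE's code. It assumes Newick input in which internal-node labels carry support values (e.g. `(A,B)90`), and the parser handles only parentheses, labels and optional `:length` suffixes. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str = ""   # taxon name for leaves, support value for internal nodes
    children: list[Node] = field(default_factory=list)
    parent: Node | None = field(default=None, repr=False, compare=False)


def parse_newick(text: str) -> Node:
    """Minimal Newick parser: nested parentheses, labels, optional ':length'."""
    s = text.strip().rstrip(";").strip()
    pos = 0

    def parse() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":
            pos += 1                      # skip '('
            while True:
                child = parse()
                child.parent = node
                node.children.append(child)
                pos += 1                  # skip ',' or ')'
                if s[pos - 1] == ")":
                    break
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        node.label = s[start:pos].split(":")[0]   # drop branch length
        return node

    return parse()


def leaves(node: Node) -> list[str]:
    if not node.children:
        return [node.label]
    return [name for child in node.children for name in leaves(child)]


def sister_clade(root: Node, name: str) -> tuple[list[str], float | None]:
    """Leaves of the sister clade of leaf `name`, plus the support at their parent."""
    stack = [root]
    while stack:
        leaf = stack.pop()
        if not leaf.children and leaf.label == name:
            break
        stack.extend(leaf.children)
    else:
        raise KeyError(name)
    parent = leaf.parent
    if parent is None:
        raise ValueError("tree has a single leaf")
    sisters = sorted(n for c in parent.children if c is not leaf for n in leaves(c))
    try:
        support = float(parent.label)
    except ValueError:
        support = None                    # unlabeled node: no support value
    return sisters, support
```

For example, in `((A,B)90,(C,D)75)100;` the sister of `A` is `["B"]` with support 90.0.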
The Convergence of eQTL Mapping, Heritability Estimation and Polygenic Modeling: Emerging Spectrum of Risk Variation in Bipolar Disorder
Eric R. Gamazon, Hae Kyung Im, Chunyu Liu, Members of the Bipolar Disorder Genome Study (BiGS) Consortium, Dan L. Nicolae, Nancy J. Cox
(Submitted on 25 Mar 2013)
It is widely held that a substantial genetic component underlies Bipolar Disorder (BD) and other neuropsychiatric disease traits. Recent efforts have aimed at understanding the genetic basis of disease susceptibility, with genome-wide association studies (GWAS) revealing some promising associations. Nevertheless, the genetic etiology of BD remains elusive: a substantial proportion of the heritability (estimated at 80% from twin and family studies) is unaccounted for by the specific genetic variants identified by large-scale GWAS. Furthermore, functional understanding of associated loci generally lags behind discovery. The studies we report here provide considerable support for the claim that substantially more remains to be learned from GWAS about the genetic mechanisms underlying BD susceptibility, and that a large proportion of the variation in disease risk may be uncovered through integrative functional genomic approaches. We combine recent analytic advances in heritability estimation and polygenic modeling and leverage recent technological advances in the generation of -omics data to evaluate the nature and scale of the contribution of functional classes of genetic variation to a relatively intractable disorder. We identified cis eQTLs in cerebellum and parietal cortex that capture more than half of the total heritability attributable to SNPs interrogated through GWAS, and showed that eQTL-based heritability estimation is highly tissue-dependent. Our findings show that much greater resolution may be attained than has been reported thus far on the number of common loci that capture a substantial proportion of the heritability of disease risk, and that the functional nature of contributory loci may be clarified en masse.
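The idea of asking how much SNP heritability a functional class of variants captures can be illustrated with a partitioned Haseman-Elston regression. It is swapped in here as the simplest SNP-heritability estimator and is not necessarily the estimator used in the study. In this sketch on simulated data, the SNPs are independent (no LD), the labels "eQTL" and "other" for the two sets are hypothetical, and one genetic relationship matrix (GRM) is built per set. Products of standardized phenotypes over pairs of individuals are then regressed on the GRM entries.

```python
import numpy as np


def simulate(n=800, m_per_set=(200, 200), h2_per_set=(0.4, 0.0), seed=1):
    """Toy data: independent SNPs split into hypothetical sets (e.g. 'eQTL', 'other')."""
    rng = np.random.default_rng(seed)
    blocks, g = [], np.zeros(n)
    for m, h2 in zip(m_per_set, h2_per_set):
        freqs = rng.uniform(0.05, 0.5, size=m)
        geno = rng.binomial(2, freqs, size=(n, m)).astype(float)
        z = (geno - geno.mean(axis=0)) / geno.std(axis=0)   # standardize each SNP
        g += z @ rng.normal(0.0, np.sqrt(h2 / m), size=m)    # additive effects
        blocks.append(z)
    y = g + rng.normal(0.0, np.sqrt(1.0 - sum(h2_per_set)), size=n)
    return blocks, (y - y.mean()) / y.std()


def he_partitioned(blocks, y):
    """Haseman-Elston regression of y_i*y_j on one GRM per SNP set (pairs i<j)."""
    iu = np.triu_indices(len(y), k=1)
    X = np.column_stack([(z @ z.T / z.shape[1])[iu] for z in blocks])
    target = np.outer(y, y)[iu]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef  # one heritability estimate per SNP set
```

With the defaults, the first set should recover roughly 0.4 and the second roughly 0. That is the kind of partition behind a statement such as "eQTLs capture more than half of the SNP heritability", though real analyses must also handle LD, population structure and case-control ascertainment.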
An algebraic framework to sample the rearrangement histories of a cancer metagenome with double cut and join, duplication and deletion events
Daniel R. Zerbino, Benedict Paten, Glenn Hickey, David Haussler
(Submitted on 22 Mar 2013)
Algorithms to study structural variants (SV) in whole genome sequencing (WGS) cancer datasets are currently unable to sample the entire space of rearrangements while allowing for copy number variations (CNV). In addition, rearrangement theory has up to now focused on fully assembled genomes, not on fragmentary observations of mixed genome populations. This limits the applicability of current methods to actual cancer datasets, which are produced from short read sequencing of a heterogeneous population of cells. We show how basic linear algebra can be used to describe and sample the set of possible sequences of SVs, extending the double cut and join (DCJ) model into the analysis of metagenomes. We also describe a functional pipeline that was run on simulated as well as experimental cancer datasets.
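A single DCJ operation is easy to state on adjacencies: cut two adjacencies `{a,b}` and `{c,d}` and rejoin the four extremities as `{a,c},{b,d}` or `{a,d},{b,c}`. The sketch below stores a genome as a multiset (counts) of adjacencies, which leaves room for copy-number changes. It also shows the "vector" view, in which each DCJ adds a signed change vector to the adjacency counts. This only gestures at the paper's linear-algebra framework and is not its implementation. The extremity naming (`'1h'` for the head of gene 1, `'2t'` for the tail of gene 2) is a hypothetical convention.

```python
from collections import Counter


def adj(x: str, y: str) -> frozenset:
    """An adjacency joins two gene extremities, e.g. '1h' and '2t'."""
    return frozenset((x, y))


def dcj(genome: Counter, ab: tuple[str, str], cd: tuple[str, str], swap: bool = False) -> Counter:
    """Apply one double cut and join to a multiset of adjacencies.

    Cuts {a,b} and {c,d}, then rejoins as {a,c},{b,d} or, if swap, {a,d},{b,c}.
    """
    (a, b), (c, d) = ab, cd
    new = Counter(genome)
    for cut in (adj(a, b), adj(c, d)):
        if new[cut] < 1:
            raise ValueError(f"adjacency {set(cut)} not present")
        new[cut] -= 1
    for join in ((adj(a, d), adj(b, c)) if swap else (adj(a, c), adj(b, d))):
        new[join] += 1
    return +new  # unary + drops adjacencies whose count fell to zero


def change_vector(before: Counter, after: Counter) -> dict:
    """Signed change in adjacency counts produced by an edit."""
    keys = set(before) | set(after)
    return {k: after[k] - before[k] for k in keys if after[k] != before[k]}
```

On a toy linear genome 1→2→3, `dcj(g, ("1h", "2t"), ("2h", "3t"), swap=True)` joins gene 1 directly to gene 3 and excises gene 2 as a circular fragment (adjacency `{2t, 2h}`). The change vector has two −1 and two +1 entries and sums to zero.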
Natural selection reduced diversity on human Y chromosomes
Melissa A. Wilson Sayres, Kirk E. Lohmueller, Rasmus Nielsen
(Submitted on 20 Mar 2013)
The human Y chromosome exhibits surprisingly low levels of genetic diversity. This could result from neutral processes if the effective population size of males is reduced relative to females due to a higher variance in the number of offspring from males than from females. Alternatively, selection acting on new mutations, and affecting linked neutral sites, could reduce variability on the Y chromosome. Here, using genome-wide analyses of X, Y, autosomal and mitochondrial DNA, in combination with extensive population genetic simulations, we show that low observed Y chromosome variability is not consistent with a purely neutral model. Instead, we show that models of purifying selection are consistent with observed Y diversity. Further, the number of sites estimated to be under purifying selection greatly exceeds the number of Y-linked coding sites, suggesting the importance of the highly repetitive ampliconic regions. Because the functional significance of the ampliconic regions is poorly understood, our findings should motivate future research in this area.
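The neutral explanation in the abstract (higher variance in male offspring number lowers the male effective size) can be made quantitative with the textbook Wright sex-ratio expectations. These assume neutrality, constant population size and equal mutation rates across chromosome types, none of which are taken from the paper. With $N_m$ and $N_f$ the effective numbers of males and females, and each effective size on the autosomal scale so that expected diversity is $\pi \approx 4 N_e \mu$:

```latex
\begin{aligned}
N_e^{(A)} &= \frac{4 N_m N_f}{N_m + N_f}, &
N_e^{(X)} &= \frac{9 N_m N_f}{4 N_m + 2 N_f}, &
N_e^{(Y)} &= \frac{N_m}{2}, &
N_e^{(\mathrm{mt})} &= \frac{N_f}{2}, \\[4pt]
\frac{\pi_Y}{\pi_A} &= \frac{N_e^{(Y)}}{N_e^{(A)}} = \frac{N_m + N_f}{8 N_f}, &
\frac{\pi_X}{\pi_A} &= \frac{9 (N_m + N_f)}{4 (4 N_m + 2 N_f)}.
\end{aligned}
```

With $N_m = N_f$ these give $\pi_Y/\pi_A = 1/4$ and $\pi_X/\pi_A = 3/4$. As $N_m/N_f \to 0$, $\pi_Y/\pi_A$ falls only to $1/8$ while $\pi_X/\pi_A$ rises toward $9/8$ and $\pi_{\mathrm{mt}}/\pi_A$ rises as well. A purely demographic reduction in male effective size therefore predicts correlated shifts across X, Y, autosomes and mtDNA. This is one reason analyzing all four together can distinguish neutral and selective explanations.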
Genomic evidence of rapid and stable adaptive oscillations over seasonal time scales in Drosophila
Alan O. Bergland, Emily L. Behrman, Katherine R. O’Brien, Paul S. Schmidt, Dmitri A. Petrov
(Submitted on 20 Mar 2013)
In many species, genomic data have revealed pervasive adaptive evolution indicated by the near fixation of beneficial alleles. However, when selection pressures are highly variable across a species' range or through time, adaptive alleles may persist at intermediate frequencies for long periods. So-called balanced polymorphisms have long been understood to be an important component of standing genetic variation, yet direct evidence of the ubiquity of balancing selection has remained elusive. We hypothesized that environmental fluctuations between seasons in a North American orchard would impose temporally variable selection on Drosophila melanogaster and consequently maintain allelic variation at polymorphisms adaptively evolving in response to climatic variation. We identified hundreds of polymorphisms whose frequency oscillates among seasons and argue that these loci are subject to strong, temporally variable selection. We show that adaptively oscillating polymorphisms are often millions of years old, predating the divergence between D. melanogaster and D. simulans, and that a subset of these polymorphisms respond predictably to an acute frost event. Taken together, our results demonstrate that rapid temporal fluctuations in climate over generational scales are a predominant force that maintains adaptive alleles and promotes genetic diversity.
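Why can fluctuating selection keep an allele at intermediate frequency instead of driving it to fixation? A minimal deterministic sketch helps, assuming haploid selection for simplicity (the study concerns diploid flies, and the parameter values here are hypothetical). An allele has relative fitness `w` in every summer generation and `1/w` in every winter generation. Each generation multiplies the allele's odds by its fitness, so the summer gain is exactly undone in winter.

```python
def haploid_selection(p: float, w: float) -> float:
    """One generation of haploid selection; allele A has relative fitness w (a has 1)."""
    return p * w / (p * w + (1.0 - p))


def seasonal_trajectory(p0=0.5, w_summer=1.2, gens_per_season=5, years=3):
    """Allele-A frequency when its fitness is w in summer and 1/w in winter."""
    p, traj = p0, [p0]
    for _ in range(years):
        for w in (w_summer, 1.0 / w_summer):   # summer, then winter
            for _ in range(gens_per_season):
                p = haploid_selection(p, w)
                traj.append(p)
    return traj
```

With the defaults, the frequency climbs from 0.5 to about 0.71 each summer and returns to 0.5 each winter. Under constant selection (`w` = 1.2 in every generation), the odds would instead grow as 1.2^t and the allele would approach fixation.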
Genomic Sequence Diversity and Population Structure of Saccharomyces cerevisiae Assessed by RAD-seq
Gareth A. Cromie, Katie E. Hyma, Catherine L. Ludlow, Cecilia Garmendia-Torres, Teresa L. Gilbert, Patrick May, Angela A. Huang, Aimée M. Dudley, Justin C. Fay
(Submitted on 20 Mar 2013)
The budding yeast Saccharomyces cerevisiae is important for human food production and as a model organism for biological research. The genetic diversity contained in the global population of yeast strains represents a valuable resource for a number of fields, including genetics, bioengineering, and studies of evolution and population structure. Here, we apply a multiplexed, reduced-representation genome sequencing strategy (known as RAD-seq) to genotype a large collection of S. cerevisiae strains, isolated from a wide range of geographical locations and environmental niches. The method sequences the same 1% of every genome, producing a multiple sequence alignment of 116,880 bases across 262 strains. We find that diversity among these strains is principally organized by geography, with European, North American, Asian and African/S. E. Asian populations defining the major axes of genetic variation. At a finer scale, small groups of strains from cacao, olives and sake are defined by unique variants not present in other strains. One population, containing strains from a variety of fermentations, exhibits high levels of heterozygosity and mixtures of alleles from European and Asian populations, indicating an admixed origin for this group. In the context of this global diversity, we demonstrate that a collection of seven strains commonly used in the laboratory encompasses only one quarter of the genetic diversity present in the full collection of strains, underscoring the relatively limited genetic diversity captured by the current set of lab strains. We propose a model of geographic differentiation followed by human-associated admixture, primarily between European and Asian populations and more recently between European and North American populations. The large collection of genotyped yeast strains characterized here will provide a useful resource for the broad community of yeast researchers.
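RAD-seq samples the same loci in every strain because reads are anchored at restriction-enzyme recognition sites, which sit at (mostly) the same positions in related genomes. The sketch below extracts such anchored tags. The abstract does not name the enzyme, so the EcoRI site `GAATTC` is only an example. Overhangs, adapters and fragment-size selection are ignored, and the site is assumed to be palindromic, so both tags read away from the site begin with it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def rad_tags(genome: str, site: str = "GAATTC", tag_len: int = 8) -> list[tuple[int, str]]:
    """Return (site_start, tag) pairs for reads leaving each restriction site.

    Simplified: one tag is read rightward on the forward strand and one leftward
    (reverse-complemented), both starting at the site; tags that would run off
    the sequence are skipped.
    """
    tags = []
    start = genome.find(site)
    while start != -1:
        end = start + len(site)
        if start + tag_len <= len(genome):
            tags.append((start, genome[start:start + tag_len]))
        if end - tag_len >= 0:
            tags.append((start, revcomp(genome[end - tag_len:end])))
        start = genome.find(site, start + 1)
    return tags
```

Aligning tags that share a `site_start` across strains yields the kind of fixed-locus multiple alignment described above.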
Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM
(Submitted on 16 Mar 2013)
Summary: BWA-MEM is a new alignment algorithm for aligning sequence reads or long query sequences against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads and performs split alignment. The algorithm is robust to sequencing errors and applicable to a wide range of sequence lengths, from 70 bp to a few megabases. For short-read mapping, BWA-MEM shows better performance than several state-of-the-art read aligners to date.
Availability and implementation: BWA-MEM is implemented as a component of BWA, which is available at this http URL
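BWA-MEM's name refers to seeding alignments with maximal exact matches (MEMs): exact matches between query and reference that cannot be extended left or right. The naive quadratic search below only illustrates that definition. BWA itself finds such seeds with an FM-index and then extends them, and none of the function names here come from BWA.

```python
def maximal_exact_matches(read: str, ref: str, min_len: int = 3) -> list[tuple[int, int, int]]:
    """Naive O(|read|*|ref|) search for maximal exact matches (MEMs).

    Returns (read_pos, ref_pos, length) for every exact match that cannot be
    extended further left or right and is at least `min_len` long.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            # only start where the match cannot be extended to the left
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            k = 0
            while i + k < len(read) and j + k < len(ref) and read[i + k] == ref[j + k]:
                k += 1  # extend to the right until a mismatch or an end
            if k >= min_len:
                mems.append((i, j, k))
    return mems
```

Each returned triple is a candidate seed; an aligner would then chain compatible seeds and fill the gaps between them with a local or end-to-end alignment.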
Major changes in the core developmental pathways of nematodes: Romanomermis culicivorax reveals the derived status of the Caenorhabditis elegans model
Philipp H. Schiffer, Michael Kroiher, Christopher Kraus, Georgios D. Koutsovoulos, Sujai Kumar, Julia I. R. Camps, Ndifon A. Nsah, Dominik Stappert, Krystalynne Morris, Peter Heger, Janine Altmüller, Peter Frommolt, Peter Nürnberg, W. Kelley Thomas, Mark L. Blaxter, Einhard Schierenberg
(Submitted on 17 Mar 2013)
Background: Despite its status as a model organism, the development of Caenorhabditis elegans is not necessarily archetypical for nematodes. The phylum Nematoda is divided into the Chromadorea (which includes C. elegans) and the Enoplea. Compared to C. elegans, enoplean nematodes have very different patterns of cell division and determination. Embryogenesis of the enoplean Romanomermis culicivorax has been studied in great detail, but the genetic circuitry underpinning development in this species is unknown. Results: We created a draft genome of R. culicivorax and compared its developmental gene content with that of two other nematodes, C. elegans and Trichinella spiralis (another enoplean), and a representative arthropod, Tribolium castaneum. This genome evidence shows that R. culicivorax retains components of the conserved metazoan developmental toolkit lost in C. elegans. T. spiralis has independently lost even more of the toolkit than has C. elegans. However, the C. elegans toolkit is not simply depauperate, as many genes essential for embryogenesis in C. elegans are unique to this lineage, or have only extremely divergent homologues in R. culicivorax and T. spiralis. These data imply fundamental differences in the genetic programmes for early cell specification, inductive interactions, vulva formation and sex determination. Conclusions: Thus nematodes, despite their apparent phylum-wide morphological conservatism, have evolved major differences in the molecular logic of their development. R. culicivorax serves as a tractable, contrasting model to C. elegans for understanding how divergent genomic and thus regulatory backgrounds can generate a conserved phenotype. The availability of the draft genome will promote use of R. culicivorax as a research model.
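The comparative logic behind "retained in R. culicivorax but lost in C. elegans" and "unique to the C. elegans lineage" reduces to set operations on presence/absence calls. The sketch assumes that orthology has already been assigned and that each species maps to a set of orthogroup IDs. The IDs and the table below are hypothetical, not the paper's data. Absence in a draft genome may also reflect assembly gaps or extreme divergence rather than true loss.

```python
def compare_toolkits(presence: dict[str, set[str]], focal: str, reference: str, outgroup: str):
    """Split genes into candidate losses in `focal` and genes found only in `focal`.

    presence maps species -> set of (hypothetical) orthogroup IDs detected in it.
    A gene counts as a candidate loss if both `reference` and `outgroup` have it
    but `focal` does not; it counts as unique if no other species has it.
    """
    lost = (presence[reference] & presence[outgroup]) - presence[focal]
    others = set().union(*(genes for sp, genes in presence.items() if sp != focal))
    unique = presence[focal] - others
    return sorted(lost), sorted(unique)
```

For instance, with a hypothetical table in which R. culicivorax and T. castaneum share `g2` and `g3` but C. elegans lacks them, those two IDs are reported as candidate losses in the C. elegans lineage.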