Protein binding and methylation on looping chromatin accurately predict distal regulatory interactions

Protein binding and methylation on looping chromatin accurately predict distal regulatory interactions
Sean Whalen, Rebecca M. Truty, Katherine S. Pollard
doi: http://dx.doi.org/10.1101/022293

Identifying the gene targets of distal regulatory sequences is a challenging problem with the potential to illuminate the causal underpinnings of complex diseases. However, current experimental methods to map enhancer-promoter interactions genome-wide are limited by their cost and complexity. We present TargetFinder, a computational method that reconstructs a cell’s three-dimensional regulatory landscape from two-dimensional genomic features. TargetFinder achieves outstanding predictive accuracy across diverse cell lines with a false discovery rate up to fifteen times smaller than common heuristics, and reveals that distal regulatory interactions are characterized by distinct signatures of protein interactions and epigenetic marks on the DNA loop between an active enhancer and targeted promoter. Much of this signature is shared across cell types, shedding light on the role of chromatin organization in gene regulation and establishing TargetFinder as a method to accurately map long-range regulatory interactions using a small number of easily acquired datasets.

Coalescent models for developmental biology and the spatio-temporal dynamics of growing tissues.

Coalescent models for developmental biology and the spatio-temporal dynamics of growing tissues.
Patrick Smadbeck, Michael P.H. Stumpf
doi: http://dx.doi.org/10.1101/022251

Development is a process that needs to be tightly coordinated in both space and time. Cell tracking and lineage tracing have become important experimental techniques in developmental biology and allow us to map the fate of cells and their progeny in both space and time. A generic feature of developing (as well as homeostatic) tissues that these analyses have revealed is that relatively few cells give rise to the bulk of the cells in a tissue; the lineages of most cells come to an end fairly quickly. This has also spurred the interest of computational and theoretical biologists and physicists, who have developed a range of modelling approaches, most notably agent-based modelling (ABM). These can become prohibitively expensive computationally but seem to capture some of the features observed in experiments. Here we develop a complementary perspective that allows us to understand the dynamics leading to the formation of a tissue (or colony of cells). Borrowing from the rich population genetics literature, we develop genealogical models of tissue development that trace the ancestry of cells in a tissue back to their most recent common ancestors. We apply this approach to tissues that grow under confined conditions (as would, for example, be appropriate for the neural crest) and to unbounded growth (illustrative of the behaviour of 2D tumours or bacterial colonies). The classical coalescent model from population genetics is readily adapted to capture tissue genealogies for different models of tissue growth and development. We show that simple but universal scaling relationships allow us to establish relationships between the coalescent and different fractal growth models that have been extensively studied in many different contexts, including developmental biology. Using our genealogical perspective we are able to study the statistical properties of the processes that give rise to tissues of cells, without the need for large-scale simulations.
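To make the idea of tracing lineages back to a most recent common ancestor concrete, here is a minimal sketch of the classical (constant-size, well-mixed) Kingman coalescent from population genetics. It is not the authors' tissue-adapted model: the function names, the sample size and the number of replicates are illustrative assumptions, and time is measured in standard coalescent units.

```python
import random


def simulate_tmrca(n_lineages, rng):
    """Draw one time to the most recent common ancestor (TMRCA) for
    n_lineages sampled lineages under the classical Kingman coalescent.
    Time is in coalescent units; spatial structure is ignored (assumption)."""
    t = 0.0
    k = n_lineages
    while k > 1:
        # With k lineages there are k*(k-1)/2 pairs, each merging at rate 1.
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)
        k -= 1  # one pair of lineages has coalesced
    return t


def expected_tmrca(n_lineages):
    """Closed-form mean TMRCA for the Kingman coalescent: 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n_lineages)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 20                   # hypothetical number of sampled cells
reps = 20000             # hypothetical number of replicate genealogies
mean_sim = sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps
print(f"simulated mean TMRCA: {mean_sim:.3f}")
print(f"theoretical mean TMRCA: {expected_tmrca(n):.3f}")
```

The simulated mean should sit close to the closed-form value, which illustrates why genealogical quantities can be studied without forward simulation of every cell.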

SimPhy: Phylogenomic Simulation of Gene, Locus and Species Trees

SimPhy: Phylogenomic Simulation of Gene, Locus and Species Trees
Diego Mallo, Leonardo de Oliveira Martins, David Posada
doi: http://dx.doi.org/10.1101/021709
We present here a fast and flexible software, SimPhy, for the simulation of multiple gene families evolving under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer (all three potentially leading to species tree/gene tree discordance) and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific) that can be fixed or sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible. We validate SimPhy’s output using theoretical expectations and other programs, and show that it scales extremely well with complex models and/or large trees, being an order of magnitude faster than the most similar program (DLCoal-Sim). In addition, we demonstrate how SimPhy can be useful to understand interactions among different evolutionary processes, conducting a simulation study to characterize the systematic overestimation of the duplication time when using standard reconciliation methods. SimPhy is available at https://github.com/adamallo/SimPhy, where users can find the source code, pre-compiled executables, a detailed manual and example cases.

Mendelian randomization: a premature burial?

Mendelian randomization: a premature burial?
George Davey Smith
doi: http://dx.doi.org/10.1101/021386
Mendelian randomization is a promising approach to help improve causal inference in observational studies, with widespread potential applications, including the prioritization of pharmacotherapeutic targets for evaluation in RCTs. Since its initial proposal, the limitations of Mendelian randomization approaches have been widely recognised and discussed, and recently Pickrell has reiterated these [1]. However, this critique did not acknowledge recent developments in both methodological and empirical research, nor did it recognise many future opportunities for application of the Mendelian randomization approach. These issues are briefly reviewed here.

Evolution in spatial and spatiotemporal variable metapopulations changes a herbivore’s host plant range

Evolution in spatial and spatiotemporal variable metapopulations changes a herbivore’s host plant range
Annelies De Roissart, Nicky Wybouw, David Renault, Thomas Van Leeuwen, Dries Bonte
doi: http://dx.doi.org/10.1101/021683

The persistence and dynamics of populations largely depend on the way they are configured and integrated into space and the ensuing eco-evolutionary dynamics. We manipulated spatial and temporal variation in patch size in replicated experimental metapopulations of the herbivore mite Tetranychus urticae. Evolution over approximately 30 generations in the spatially and spatiotemporally variable metapopulations induced a significant divergence in life history traits, physiological endpoints and gene expression, but also a remarkable convergence relative to the stable reference patchy metapopulation in traits related to size and fecundity and in its transcriptional regulation. The observed evolutionary dynamics are tightly linked to demographic changes, more specifically frequent episodes of resource shortage, and increased the reproductive performance of mites on tomato, a challenging host plant. This points towards a general, adaptive stress response in stable spatial variable and spatiotemporal variable metapopulations that pre-adapts a herbivore arthropod to novel environmental stressors.

Collective Fluctuations in models of adaptation

Collective Fluctuations in models of adaptation
Oskar Hallatschek, Lukas Geyrhofer
Subjects: Populations and Evolution (q-bio.PE); Statistical Mechanics (cond-mat.stat-mech); Biological Physics (physics.bio-ph)

The dynamics of adaptation is difficult to predict because it is highly stochastic even in large populations. The uncertainty emerges from number fluctuations, called genetic drift, arising in the small number of particularly fit individuals of the population. Random genetic drift in this evolutionary vanguard also limits the speed of adaptation, which diverges in deterministic models that ignore these chance effects. Several approaches have been developed to analyze the crucial role of noise on the expected dynamics of adaptation, including the mean fitness of the entire population, or the fate of newly arising beneficial and deleterious mutations. However, very little is known about how genetic drift causes fluctuations to emerge on the population level, including fitness distribution variations and speed variations. Yet, these phenomena control the replicability of experimental evolution experiments and are key to a truly predictive understanding of evolutionary processes. Here, we develop an exact approach to these emergent fluctuations by a combination of computational and analytical methods. We show, analytically, that the infinite hierarchy of moment equations can be closed at any arbitrary order by a suitable choice of a dynamical constraint. This constraint regulates (rather than fixes) the population size, accounting for resource limitations. The resulting linear equations, which can be accurately solved numerically, exhibit fluctuation-induced terms that amplify short-distance correlations and suppress long-distance ones. Importantly, by accounting for the dynamics of sub-populations, we provide a systematic route to key population genetic quantities, such as fixation probabilities and decay rates of the genetic diversity.

Evolution of organismal stoichiometry in a 50,000-generation experiment with Escherichia coli

Evolution of organismal stoichiometry in a 50,000-generation experiment with Escherichia coli
Caroline B. Turner, Brian D. Wade, Justin R. Meyer, Richard E. Lenski
doi: http://dx.doi.org/10.1101/021360

Organismal stoichiometry refers to the relative proportion of chemical elements in the biomass of organisms, and it can have important effects on ecological interactions from population to ecosystem scales. Although stoichiometry has been studied extensively from an ecological perspective, little is known about rates and directions of evolutionary changes in elemental composition in response to nutrient limitation. We measured carbon, nitrogen, and phosphorus content of Escherichia coli evolved under controlled carbon-limited conditions for 50,000 generations. The bacteria evolved higher relative nitrogen and phosphorus content, consistent with selection for increased use of the more abundant elements. Total carbon assimilated also increased, indicating more efficient use of the limiting element. Altogether, our study shows that stoichiometry evolved over a relatively short time-period, and that it did so in a predictable direction given the carbon-limiting environment.

The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome

The power of single molecule real-time sequencing technology in the de novo assembly of a eukaryotic genome
Hiroaki Sakai, Naito Ken, Eri Ogiso-Tanaka, Yu Takahashi, Kohtaro Iseki, Chiaki Muto, Kazuhito Satou, Kuniko Teruya, Akino Shiroma, Makiko Shimoji, Takashi Hirano, Takeshi Itoh, Akito Kaga, Norihiko Tomooka
doi: http://dx.doi.org/10.1101/021634

Second-generation sequencers (SGS) have been game-changing, achieving cost-effective whole genome sequencing in many non-model organisms. However, a large portion of these genomes still remains unassembled. We reconstructed the azuki bean (Vigna angularis) genome using single molecule real-time (SMRT) sequencing technology and achieved the best contiguity and coverage among currently assembled legume crops. The SMRT-based assembly produced contigs 100 times longer, with 100 times fewer gaps, than the SGS-based assemblies. A detailed comparison between the assemblies revealed that the SMRT-based assembly enabled a more comprehensive gene annotation than the SGS-based assemblies, in which thousands of genes were missing or fragmented. A chromosome-scale assembly was generated based on the high-density genetic map, covering 86% of the azuki bean genome. We demonstrated that SMRT technology, though it still needs to be assisted by SGS data, can achieve a near-complete assembly of a eukaryotic genome.

BGT: efficient and flexible genotype query across many samples

BGT: efficient and flexible genotype query across many samples
Heng Li
Subjects: Genomics (q-bio.GN)

Summary: BGT is a compact format, a fast command line tool and a simple web application for efficient and convenient query of whole-genome genotypes and frequencies across tens to hundreds of thousands of samples. On real data, it encodes the haplotypes of 32,488 samples across 39.2 million SNPs into a 7.4 GB database and decodes a couple of hundred million genotypes per CPU second. The high performance enables real-time responses to complex queries.
Availability and implementation: https://github.com/lh3/bgt
Contact: hengli@broadinstitute.org
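The figures quoted in the BGT summary can be turned into a rough storage density with a few lines of arithmetic. This is a back-of-the-envelope sketch, not part of BGT itself: it assumes that "7.4 GB" means 7.4 × 10^9 bytes and that "a couple of hundred million" is about 2 × 10^8 genotypes per CPU second, neither of which is stated in the summary.

```python
# Figures taken from the BGT summary.
n_samples = 32_488
n_snps = 39_200_000

# Assumptions (not stated in the summary): decimal gigabytes, and
# "a couple of hundred million" read as 2e8 genotypes per CPU second.
database_bytes = 7.4e9
decode_rate = 2e8

total_genotypes = n_samples * n_snps
bits_per_genotype = database_bytes * 8 / total_genotypes
full_scan_cpu_seconds = total_genotypes / decode_rate

print(f"total genotypes:        {total_genotypes:.3e}")
print(f"bits per genotype:      {bits_per_genotype:.4f}")
print(f"full-scan CPU seconds:  {full_scan_cpu_seconds:.0f}")
```

Under these assumptions each genotype costs well under one bit of storage, which is only possible because the format compresses shared haplotype structure rather than storing genotypes independently.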

Coalescent inference using serially sampled, high-throughput sequencing data from HIV infected patients

Coalescent inference using serially sampled, high-throughput sequencing data from HIV infected patients
Kevin Dialdestoro, Jonas Andreas Sibbesen, Lasse Maretty, Jayna Raghwani, Astrid Gall, Paul Kellam, Oliver Pybus, Jotun Hein, Paul Jenkins
doi: http://dx.doi.org/10.1101/020552
Human immunodeficiency virus (HIV) is a rapidly evolving pathogen that causes chronic infections, so genetic diversity within a single infection can be very high. High-throughput “deep” sequencing can now measure this diversity in unprecedented detail, particularly since it can be performed at different timepoints during an infection, and this offers a potentially powerful way to infer the evolutionary dynamics of the intra-host viral population. However, population genomic inference from HIV sequence data is challenging because of high rates of mutation and recombination, rapid demographic changes, and ongoing selective pressures. In this paper we develop a new method for inference from HIV deep sequencing data, based on importance sampling of ancestral recombination graphs under a multi-locus coalescent model. The approach further extends recent progress in the approximation of so-called ‘conditional sampling distributions’, a quantity of key interest when approximating coalescent likelihoods. The chief novelties of our method are that it is able to infer rates of recombination and mutation, as well as the effective population size, while handling sampling over different timepoints and missing data without extra computational difficulty. We apply our method to a dataset of HIV-1, in which several hundred sequences were obtained from an infected individual at seven timepoints over two years. We find mutation rate and effective population size estimates to be comparable to those produced by the software BEAST. Additionally, our method is able to produce local recombination rate estimates. The software underlying our method, Coalescenator, is freely available.