No association between plant mating system & geographic range overlap

Dena Grossenbacher , Ryan Briscoe Runquist , Emma Goldberg , Yaniv Brandvain
doi: http://dx.doi.org/10.1101/016261

Both evolutionary theory and numerous case studies suggest that selfing taxa are more likely to co-occur with outcrossing relatives than are outcrossing taxa. Despite suggestions that this pattern may be general, the extent to which mating system influences range overlap in close relatives has not been tested formally across a diverse group of plant species pairs. We test for a difference in range overlap between species pairs where zero, one, or both species are selfers with data from 98 sister species pairs in 20 genera. We also use divergence time estimates from time-calibrated phylogenies to ask how range overlap changes with divergence time and whether this effect depends on mating system. We find no evidence that self-pollination influences range overlap of closely related species. While the extent of range overlap decreased modestly with the divergence time of sister species, this effect did not depend on mating system. The absence of a strong influence of mating system on range overlap suggests that of the many mechanisms potentially influencing the co-occurrence of close relatives, mating system plays a minor and/or inconsistent role.

A Comparison of Methods to Measure Fitness in Escherichia coli

Michael J Wiser , Richard E Lenski
doi: http://dx.doi.org/10.1101/016121

In order to characterize the dynamics of adaptation, it is important to be able to quantify how a population’s mean fitness changes over time. Such measurements are especially important in experimental studies of evolution using microbes. The Long-Term Evolution Experiment (LTEE) with Escherichia coli provides one such system in which mean fitness has been measured by competing derived and ancestral populations. The traditional method used to measure fitness in the LTEE and many similar experiments, though, is subject to a potential limitation. As the relative fitness of the two competitors diverges, the measurement error increases because the less-fit population becomes increasingly small and cannot be enumerated as precisely. Here, we present and employ two alternatives to the traditional method. One is based on reducing the fitness differential between the competitors by using a common reference competitor from an intermediate generation that has intermediate fitness; the other alternative increases the initial population size of the less-fit, ancestral competitor. We performed a total of 480 competitions to compare the statistical properties of estimates obtained using these alternative methods with those obtained using the traditional method for samples taken over 50,000 generations from one of the LTEE populations. On balance, neither alternative method yielded measurements that were more precise than the traditional method.
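Competition assays of this kind commonly express relative fitness as the ratio of the two competitors' realized growth rates over the assay. The abstract does not state the formula, so the sketch below should be read as an assumption about the usual calculation; the function name and the cell counts are hypothetical.

```python
import math

def relative_fitness(a_initial, a_final, b_initial, b_final):
    """Ratio of realized growth rates (natural log of fold change) of A to B."""
    growth_a = math.log(a_final / a_initial)
    growth_b = math.log(b_final / b_initial)
    return growth_a / growth_b

# Hypothetical counts: both competitors start at equal density,
# and the derived strain (A) ends at twice the density of the ancestor (B).
w = relative_fitness(1e5, 2e7, 1e5, 1e7)
print(f"relative fitness of A versus B: {w:.3f}")
```

When the fitness difference is large, the final count of the less-fit competitor becomes small, so its estimated growth rate is noisier. That is the limitation the two alternative methods were designed to address.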

Tools and best practices for allelic expression analysis

Stephane E Castel , Ami Levy-Moonshine , Pejman Mohammadi , Eric Banks , Tuuli Lappalainen
doi: http://dx.doi.org/10.1101/016097

Allelic expression (AE) analysis has become an important tool for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. In this paper, we systematically analyze the properties of AE read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting and filtering for such errors, and show that the resulting AE data has extremely low technical noise. Finally, we introduce novel software for high-throughput production of AE data from RNA-sequencing data, implemented in the GATK framework. These improved tools and best practices for AE analysis yield higher quality AE data by reducing technical bias. This provides a practical framework for wider adoption of AE analysis by the genomics community.
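One common way to summarize AE read counts at a heterozygous site is the reference allele ratio, together with a two-sided binomial test against an expected ratio of 0.5. The sketch below illustrates that idea with the standard library only. The choice of test, the function names, and the read counts are assumptions made for illustration; they are not taken from the abstract or from the GATK tool it introduces.

```python
from math import comb

def reference_ratio(ref_reads, alt_reads):
    """Fraction of reads at the site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no reads")
    return ref_reads / total

def binomial_two_sided_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value under a null ratio of 0.5.

    With p = 0.5 every outcome k has probability comb(n, k) / 2**n, so
    outcomes "as or less likely" than the observed one are those with
    comb(n, k) <= comb(n, observed).
    """
    n = ref_reads + alt_reads
    observed = comb(n, ref_reads)
    extreme = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return extreme / 2 ** n

# Hypothetical site: 8 reference reads, 2 alternate reads.
ratio = reference_ratio(8, 2)
p_value = binomial_two_sided_p(8, 2)
print(ratio, p_value)
```

A plain binomial test ignores the error sources the abstract lists (mapping bias, genotyping errors, double-counted reads), so it is best treated as a baseline applied after such filtering.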

Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads

Hung-I Harry Chen , Yuanhang Liu , Yi Zou , Zhao Lai , Devanand Sarkar , Yufei Huang , Yidong Chen
doi: http://dx.doi.org/10.1101/016196

Background: RNA sequencing (RNA-seq) is a powerful tool for genome-wide expression profiling of biological samples, with the advantages of high throughput and high resolution. Many algorithms exist for quantifying expression levels and detecting differential gene expression, but none of them takes into account the misaligned reads that map to non-exonic regions. We developed a novel algorithm, XBSeq, in which a statistical model is established on the assumption that observed signals are the convolution of true expression signals and sequencing noise. The reads mapped to non-exonic regions are treated as sequencing noise, which follows a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, the true expression signals, assumed to follow a negative binomial distribution, can be delineated, allowing accurate detection of differentially expressed genes. Results: We implemented the XBSeq algorithm and evaluated it on a set of simulated expression datasets under different conditions, using a combination of negative binomial and Poisson distributions with parameters derived from real RNA-seq data. We compared the performance of our method with other commonly used differential expression analysis algorithms. We also evaluated the changes in true and false positive rates with variations in biological replicates, differential fold changes, and expression levels in non-exonic regions. We also tested the algorithm on a set of real RNA-seq data, reporting the detection results that the different algorithms shared and those on which they differed. Conclusions: We propose XBSeq, a differential expression analysis algorithm for RNA-seq data that takes non-exonic mapped reads into consideration. When background noise is at baseline level, the performance of XBSeq and DESeq is mostly equivalent. However, our method surpasses DESeq and other algorithms as the number of non-exonic mapped reads increases. Only under very low read counts did XBSeq have a slightly higher false discovery rate, which may be improved by adjusting the background noise effect in this situation. Taken together, by considering non-exonic mapped reads, XBSeq can provide accurate expression measurement and thus detect differentially expressed genes even in noisy conditions.
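The additive-noise assumption can be illustrated with a simple method-of-moments calculation: if the observed count is the sum of an independent true signal and Poisson noise, then means and variances add, and the Poisson variance equals its mean. The sketch below is not the XBSeq estimator itself; the function name and the replicate counts are hypothetical.

```python
from statistics import mean, variance

def deconvolve_moments(observed_counts, noise_counts):
    """Estimate mean and variance of the true signal for one gene.

    Assumes observed = true + noise, with noise ~ Poisson and independent
    of the true signal, so that:
        mean(true) = mean(observed) - mean(noise)
        var(true)  = var(observed)  - mean(noise)   # Poisson: var == mean
    Estimates are floored at zero because counts cannot be negative.
    """
    noise_mean = mean(noise_counts)
    true_mean = max(mean(observed_counts) - noise_mean, 0)
    true_var = max(variance(observed_counts) - noise_mean, 0)
    return true_mean, true_var

# Hypothetical replicates: exonic (observed) and non-exonic (noise) read counts.
observed = [60, 72, 45, 58, 65]
noise = [9, 11, 10, 8, 12]
est_mean, est_var = deconvolve_moments(observed, noise)
print(est_mean, est_var)
```

The estimated mean and variance could then be used to parameterize a negative binomial model of the true signal, which is the distributional assumption the abstract describes.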

Pathway based factor analysis of gene expression data produces highly heritable phenotypes that associate with age

Andrew Anand Brown , Zhihao Ding , Ana Viñuela , Dan Glass , Leopold Parts , Timothy Spector , John Winn , Richard Durbin
doi: http://dx.doi.org/10.1101/016154

Statistical factor analysis methods have previously been used to remove noise components from high dimensional data prior to genetic association mapping, and in a guided fashion to summarise biologically relevant sources of variation. Here we show how the derived factors summarising pathway expression can be used to analyse the relationships between expression, heritability and ageing. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarise patterns of gene expression, both to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 “pathway phenotypes” which summarised patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P<5.38E-5). These phenotypes are more heritable (h^2=0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolising sugars and fatty acids, others to insulin signalling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors.
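The reported threshold is consistent with a Bonferroni correction over all 930 pathway phenotypes. The family-wise significance level of 0.05 used below is an assumption, since the abstract does not state it, but it reproduces the quoted value.

```python
# Numbers taken from the abstract.
n_pathways = 186
phenotypes_per_pathway = 5

# Assumed family-wise significance level (not stated in the abstract).
alpha = 0.05

n_tests = n_pathways * phenotypes_per_pathway
threshold = alpha / n_tests
print(n_tests, f"{threshold:.2e}")
```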

svviz: a read viewer for validating structural variants

Noah Spies , Justin M Zook , Marc Salit , Arend Sidow
doi: http://dx.doi.org/10.1101/016063

Visualizing read alignments is the most effective way to validate candidate SVs with existing data. We present svviz, a sequencing read visualizer for structural variants (SVs) that sorts and displays only reads relevant to a candidate SV. svviz works by searching the input BAM file(s) for potentially relevant reads, realigning them against the inferred sequence of the putative variant allele as well as the reference allele, and identifying reads that match one allele better than the other. Reads are assigned to the proper allele based on alignment score, read pair orientation and insert size. Separate views of the two alleles are then displayed in a scrollable web browser view, enabling a more intuitive visualization of each allele, compared to the single reference genome-based view common to most current read browsers. The web view facilitates examining the evidence for or against a putative variant, estimating zygosity, visualizing affected genomic annotations, and manual refinement of breakpoints. An optional command-line-only interface allows summary statistics and graphics to be exported directly to standard graphics file formats. svviz requires as input only structural variant coordinates (called using any other software package), reads in BAM format, and a reference genome. Reads from any high-throughput sequencing platform are supported, including Illumina short-read, mate-pair, synthetic long-read (assembled), Pacific Biosciences, and Oxford Nanopore. svviz is open source and freely available from https://github.com/svviz/svviz.
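The allele-assignment step can be pictured as comparing each read's alignment score against the two candidate alleles. The sketch below is a simplified illustration and not svviz's code or API: it uses alignment score only (the abstract also mentions read pair orientation and insert size), and the function names, the `min_margin` parameter, and the scores are hypothetical.

```python
from collections import Counter

def assign_read(ref_score, alt_score, min_margin=5):
    """Assign a read to the allele it matches clearly better.

    A read is called "ambiguous" when neither allele beats the other
    by at least `min_margin` alignment-score units.
    """
    if alt_score - ref_score >= min_margin:
        return "alt"
    if ref_score - alt_score >= min_margin:
        return "ref"
    return "ambiguous"

def summarize(read_scores):
    """Count assignments over (ref_score, alt_score) pairs."""
    return Counter(assign_read(ref, alt) for ref, alt in read_scores)

# Hypothetical realignment scores for five reads near a candidate SV.
scores = [(100, 60), (58, 100), (97, 99), (40, 95), (100, 100)]
counts = summarize(scores)
print(counts)
```

Roughly balanced support for both alleles would be consistent with a heterozygous variant, which is the kind of zygosity estimate the abstract says the tool's views facilitate.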

Bacterial Infection Remodels the DNA Methylation Landscape of Human Dendritic Cells

Alain Pacis , Ludovic Tailleux , John Lambourne , Vania Yotova , Anne Dumaine , Anne Danckaert , Francesca Luca , Jean-Christophe Grenier , Kasper Hansen , Brigitte Gicquel , Miao Yu , Athma Pai , Jenny Tung , Chuan He , Tomi Pastinen , Roger Pique-Regi , Yoav Gilad , Luis Barreiro
doi: http://dx.doi.org/10.1101/016022

DNA methylation is thought to be robust to environmental perturbations on a short time scale. Here, we challenge that view by demonstrating that the infection of human dendritic cells (DCs) with a pathogenic bacterium is associated with rapid changes in methylation at thousands of loci. Infection-induced changes in methylation occur primarily at distal enhancer elements, including those associated with the activation of key immune transcription factors and genes involved in the crosstalk between DCs and adaptive immunity. Active demethylation is associated with extensive epigenetic remodeling and is strongly predictive of changes in the expression levels of nearby genes. Collectively, our observations show that rapid changes in methylation play a previously unappreciated role in regulating the transcriptional response of DCs to infection.

Efficient computation of the joint sample frequency spectra for multiple populations

John A. Kamm, Jonathan Terhorst, Yun S. Song
(Submitted on 3 Mar 2015)

A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences. In particular, recently there has been growing interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. Although much methodological progress has been made, existing SFS-based inference methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable efficient computation of the expected joint SFS for multiple populations related by a complex demographic model with arbitrary population size histories (including piecewise exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study involving tens of populations, we demonstrate our improvements to numerical stability and computational complexity.
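For orientation, the observed joint SFS is simply a tally of sites by the number of derived alleles seen in each population. The sketch below computes that tally from a toy haplotype matrix. It is not the expected-SFS computation that momi performs and it does not use momi's API; the function name and the data are hypothetical.

```python
from collections import Counter

def joint_sfs(haplotypes, populations):
    """Tally sites by derived-allele count in each population.

    haplotypes  : list of equal-length lists of 0/1 (1 = derived allele)
    populations : population label for each haplotype, in the same order
    Returns the sorted population labels and a Counter keyed by tuples of
    per-population derived-allele counts.
    """
    pops = sorted(set(populations))
    n_sites = len(haplotypes[0])
    spectrum = Counter()
    for site in range(n_sites):
        counts = tuple(
            sum(hap[site] for hap, label in zip(haplotypes, populations)
                if label == pop)
            for pop in pops
        )
        spectrum[counts] += 1
    return pops, spectrum

# Toy data: two haplotypes from population A, two from population B, four sites.
haps = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
labels = ["A", "A", "B", "B"]
pops, sfs = joint_sfs(haps, labels)
print(pops, dict(sfs))
```

The entry for zero derived alleles in every population counts monomorphic sites; analyses restricted to polymorphic sites would drop it. The number of possible entries grows as the product of (sample size + 1) across populations, which gives a sense of why computation becomes expensive with many populations and large samples.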

The origins of a novel butterfly wing patterning gene from within a family of conserved cell cycle regulators

Nicola Nadeau , Carolina Pardo-Diaz , Annabel Whibley , Megan Ann Supple , Richard Wallbank , Grace C. Wu , Luana Maroja , Laura Ferguson , Heather Hines , Camilo Salazar , Richard ffrench-Constant , Mathieu Joron , William Owen McMillan , Chris Jiggins
doi: http://dx.doi.org/10.1101/016006

A major challenge in evolutionary biology is to understand the origins of novel structures. The wing patterns of butterflies and moths are derived phenotypes unique to the Lepidoptera. Here we identify a gene that we name poikilomousa (poik), which regulates colour pattern switches in the mimetic Heliconius butterflies. Strong associations between phenotypic variation and DNA sequence variation are seen in three different Heliconius species, in addition to associations between gene expression and colour pattern. Colour pattern variants are also associated with differences in splicing of poik transcripts. poik is a member of the conserved fizzy family of cell cycle regulators. It belongs to a faster evolving subfamily, the closest functionally characterised orthologue being the cortex gene in Drosophila, a female germ-line specific protein involved in meiosis. poik appears to have adopted a novel function in the Lepidoptera and become a major target for natural selection acting on colour and pattern variation in this group.

The interplay between DNA methylation and sequence divergence in recent human evolution

Irene Hernando-Herraez , Holger Heyn , Marcos Fernandez-Callejo , Enrique Vidal , Hugo Fernandez-Bellon , Javier Prado-Martinez , Andrew J Sharp , Manel Esteller , Tomas Marques-Bonet
doi: http://dx.doi.org/10.1101/015966

DNA methylation is a key regulatory mechanism in mammalian genomes. Despite the increasing knowledge about this epigenetic modification, the understanding of human epigenome evolution is in its infancy. We used whole genome bisulfite sequencing to study DNA methylation and nucleotide divergence between humans and great apes. We identified 360 and 210 differentially hypo- and hypermethylated regions (DMRs) in humans compared to non-human primates and estimated that 20% and 36% of these regions, respectively, were detectable throughout several human tissues. Human DMRs were enriched for specific histone modifications and, contrary to expectations, the majority were located distal to transcription start sites, highlighting the importance of regions outside the direct regulatory context. We also found a significant excess of endogenous retrovirus elements in human-specific hypomethylated regions, suggesting their association with local epigenetic changes. We also report for the first time a close interplay between inter-species genetic and epigenetic variation in regions of incomplete lineage sorting, transcription factor binding sites and human differentially hypermethylated regions. Specifically, we observed an excess of human-specific substitutions in transcription factor binding sites located within human DMRs, suggesting that alteration of regulatory motifs underlies some human-specific methylation patterns. We also found that the acquisition of DNA hypermethylation in the human lineage is frequently coupled with rapid evolution at the nucleotide level in the neighborhood of these CpG sites. Taken together, our results reveal new insights into the mechanistic basis of human-specific DNA methylation patterns and the interpretation of inter-species non-coding variation.