Environmental fluctuations do not select for increased variation or population-based resistance in Escherichia coli

Shraddha Madhav Karve, Kanishka Tiwary, S Selveshwari, Sutirth Dey
doi: http://dx.doi.org/10.1101/021030

In nature, organisms often face unpredictably fluctuating environments. However, little is understood about the mechanisms that allow organisms to cope with such unpredictability. To address this issue, we used replicate populations of Escherichia coli selected under complex, randomly changing environments. We assayed growth at the level of single cells under four different novel stresses that had no known correlation with the selection environments. Under such conditions, the individuals of the selected populations had significantly lower lag and greater yield compared to the controls. More importantly, there were no outliers in terms of growth, thus ruling out the evolution of population-based resistance. We also assayed the standing phenotypic variation of the selected populations, in terms of their growth on 94 different substrates. Contrary to extant theoretical predictions, there was no increase in the standing variation of the selected populations, nor was there any significant divergence from the ancestors. This suggested that the greater fitness in novel environments is brought about by selection at the level of the individuals, which restricts the suite of traits that can potentially evolve through this mechanism. Given that day-to-day climatic variability of the world is rising, these results have potential public health implications. Our results also underline the need for a very different kind of theoretical approach to study the effects of fluctuating environments.

Leveraging distant relatedness to quantify human mutation and gene conversion rates

Pier Francesco Palamara, Laurent Francioli, Giulio Genovese, Peter Wilton, Alexander Gusev, Hilary Finucane, Sriram Sankararaman, Shamil Sunyaev, Paul Debakker, John Wakeley, Itsik Pe’er, Alkes L. Price, The Genome of the Netherlands Consortium
doi: http://dx.doi.org/10.1101/020776

The rate at which human genomes mutate is a central biological parameter that has many implications for our ability to understand demographic and evolutionary phenomena. We present a method for inferring mutation and gene conversion rates using the number of sequence differences observed in identical-by-descent (IBD) segments together with a reconstructed model of recent population size history. This approach is robust to, and can quantify, the presence of substantial genotyping error, as validated in coalescent simulations. We applied the method to 498 trio-phased Dutch individuals from the Genome of the Netherlands (GoNL) project, sequenced at an average depth of 13×. We infer a point mutation rate of (1.66 ± 0.04) × 10⁻⁸ per base per generation, and a rate of (1.26 ± 0.06) × 10⁻⁹ for <20 bp indels. Our estimated average genome-wide mutation rate is higher than most pedigree-based estimates reported thus far, but lower than estimates obtained using substitution rates across primates. By quantifying how estimates vary as a function of allele frequency, we infer the probability that a site is involved in non-crossover gene conversion as (5.99 ± 0.69) × 10⁻⁶, consistent with recent reports. We find that recombination does not have observable mutagenic effects after gene conversion is accounted for, and that local gene conversion rates reflect recombination rates. We detect a strong enrichment for recent deleterious variation among mismatching variants found within IBD regions, and observe that summary statistics of local IBD sharing closely match previously proposed metrics of background selection, but find no significant effects of selection on our estimates of mutation rate. We detect no evidence for strong variation of mutation rates in a number of genomic annotations obtained from several recent studies.

Signatures of Dobzhansky-Muller Incompatibilities in the Genomes of Recombinant Inbred Lines

Maria Colomé-Tatché, Frank Johannes
doi: http://dx.doi.org/10.1101/021006

In the construction of Recombinant Inbred Lines (RILs) from two divergent inbred parents, certain genotype (or epigenotype) combinations may be functionally “incompatible” when brought together in the genomes of the progeny, resulting in sterility or lower fertility. Natural selection against these epistatic combinations during inbreeding can change haplotype frequencies and distort linkage disequilibrium (LD) relations between loci within and across chromosomes. These LD distortions have received increased experimental attention, because they point to genomic regions that may drive Dobzhansky-Muller-type reproductive isolation and, ultimately, speciation in the wild. Here we study the selection signatures of two-locus epistatic incompatibility models and quantify their impact on the genetic composition of the genomes of 2-way RILs obtained by selfing. We also consider the biases introduced by breeders when trying to counteract the loss of lines by selectively propagating only viable seeds. Building on our theoretical results, we develop model-based maximum likelihood (ML) tests which can be employed in pairwise genome scans for incompatibility loci using multi-locus genotype data. We illustrate this ML approach in the context of two published A. thaliana RIL panels. Our work lays the theoretical foundation for studying more complex systems such as RILs obtained by sibling mating and/or from multi-parental crosses.

Linkage disequilibrium between single nucleotide polymorphisms and hypermutable loci

Sterling Sawaya, Matt Jones, Matt Keller
doi: http://dx.doi.org/10.1101/020909

Some diseases are caused by genetic loci with a high rate of change, and heritability in complex traits is likely to be partially caused by variation at these loci. These hypermutable elements, such as tandem repeats, change at rates that are orders of magnitude higher than the rates at which most single nucleotides mutate. However, single nucleotide polymorphisms, or SNPs, are currently the primary focus of genetic studies of human disease. Here we quantify the degree to which SNPs are correlated with hypermutable loci, examining a range of mutation rates that correspond to mutation rates at tandem repeat loci. We use established population genetics theory to relate mutation rates to recombination rates and compare the theoretical predictions to simulations. Both simulations and theory agree that, at the highest mutation rates, almost all correlation is lost between a hypermutable locus and surrounding SNPs. The theoretical predictions break down for middle to low mutation rates, differing widely from the simulated results. The simulation results suggest that some correlation remains between SNPs and hypermutable loci when mutation rates are on the lower end of the mutation spectrum. Consequently, in some cases SNPs can tag variation caused by tandem repeat loci. We also examine the linkage between SNPs and other SNPs and uncover ways in which the linkage disequilibrium of rare SNPs differs from that of hypermutable loci.
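As a rough illustration of what "correlation" between two loci means here, the sketch below computes the standard r² measure of linkage disequilibrium from a list of phased haplotypes. It is not the authors' method: it assumes both loci are biallelic (coded 0/1), whereas tandem repeat loci are typically multi-allelic, and the function name `ld_r_squared` and the toy data are hypothetical.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic loci.

    `haplotypes` is a list of (allele_at_locus_A, allele_at_locus_B) pairs,
    with alleles coded 0 or 1. Biallelic coding is a simplifying assumption;
    a real tandem repeat locus would usually have more than two alleles.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n        # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n        # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # 1-1 haplotype frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    d = p_ab - p_a * p_b                            # LD coefficient D
    return d * d / denom


if __name__ == "__main__":
    # Toy data: perfectly associated loci versus fully independent loci.
    linked = [(0, 0), (1, 1), (0, 0), (1, 1)]
    unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(ld_r_squared(linked), ld_r_squared(unlinked))
```

An r² near 1 means the SNP tags the second locus well; an r² near 0 corresponds to the situation the abstract describes at the highest mutation rates, where almost all correlation is lost.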

MERS-CoV recombination: implications about the reservoir and potential for adaptation

Gytis Dudas, Andrew Rambaut
doi: http://dx.doi.org/10.1101/020834

Recombination is a process that unlinks neighbouring loci, allowing for independent evolutionary trajectories within genomes of many organisms. If not properly accounted for, recombination can compromise many evolutionary analyses. In addition, when dealing with organisms that are not obligately sexually reproducing, recombination gives insight into the rate at which distinct genetic lineages come into contact. Since June 2012, Middle East respiratory syndrome coronavirus (MERS-CoV) has caused 1106 laboratory-confirmed infections, with 421 MERS-CoV associated deaths as of April 16, 2015. Although bats are considered the likely ultimate source of zoonotic betacoronaviruses, dromedary camels have been consistently implicated as the source of current human infections in the Middle East. In this paper we use phylogenetic methods and simulations to show that the MERS-CoV genome has likely undergone numerous recombinations recently. Recombination in MERS-CoV implies frequent co-infection with distinct lineages of MERS-CoV, probably in camels given the current understanding of MERS-CoV epidemiology.

Folding and unfolding phylogenetic trees and networks

Katharina T. Huber, Vincent Moulton, Mike Steel, Taoyang Wu
(Submitted on 14 Jun 2015)

Phylogenetic networks are rooted, labelled directed acyclic graphs which are commonly used to represent reticulate evolution. There is a close relationship between phylogenetic networks and multi-labelled trees (MUL-trees). Indeed, any phylogenetic network N can be ‘unfolded’ to obtain a MUL-tree U(N) and, conversely, a MUL-tree T can in certain circumstances be ‘folded’ to obtain a phylogenetic network F(T) that exhibits T. In this paper, we study properties of the operations U and F in more detail. In particular, we introduce the class of stable networks, phylogenetic networks N for which F(U(N)) is isomorphic to N, characterise such networks, and show that they are related to the well-known class of tree-sibling networks. We also explore how the concept of displaying a tree in a network N can be related to displaying the tree in the MUL-tree U(N). To do this, we develop a phylogenetic analogue of graph fibrations. This allows us to view U(N) as the analogue of the universal cover of a digraph, and to establish a close connection between displaying trees in U(N) and reconciling phylogenetic trees with networks.
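The unfolding operation can be pictured with a small sketch. The abstract does not give the formal definition, so the code below assumes one common reading: every directed path from the root becomes its own tree vertex, so anything below a reticulation vertex is copied once per incoming path, and vertices left with a single child are suppressed. The dictionary representation and the names `unfold`, `leaf_labels` and `EXAMPLE_NETWORK` are hypothetical.

```python
def unfold(network, node):
    """Unfold a rooted DAG into a nested-tuple tree, starting at `node`.

    `network` maps each internal vertex to a list of its children; vertices
    that are absent from the mapping are treated as labelled leaves.
    Shared descendants are duplicated, which is what makes the result
    multi-labelled. Vertices with exactly one child are suppressed.
    """
    children = network.get(node, [])
    if not children:
        return node                          # leaf: keep its label
    if len(children) == 1:
        return unfold(network, children[0])  # suppress single-child vertex
    return tuple(unfold(network, child) for child in children)


def leaf_labels(tree):
    """List leaf labels of a nested-tuple tree, left to right, with repeats."""
    if not isinstance(tree, tuple):
        return [tree]
    labels = []
    for subtree in tree:
        labels.extend(leaf_labels(subtree))
    return labels


# Toy network: vertex "h" is a reticulation with two parents ("a" and "b").
EXAMPLE_NETWORK = {
    "r": ["a", "b"],
    "a": ["x", "h"],
    "b": ["h", "y"],
    "h": ["z"],
}

if __name__ == "__main__":
    tree = unfold(EXAMPLE_NETWORK, "r")
    print(tree)
    print(leaf_labels(tree))
```

In the toy network the leaf below the reticulation appears twice in the unfolded tree, once for each path from the root, which is why the result is a MUL-tree rather than an ordinary phylogenetic tree.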

Identification of Slco1a6 as a candidate gene that broadly affects gene expression in mouse pancreatic islets

Jianan Tian, Mark Keller, Angie Oler, Mary Rabagalia, Kathryn Schueler, Donald Stapleton, Aimee Teo Broman, Wen Zhao, Christina Kendziorski, Brian S. Yandell, Bruno Hagenbuch, Karl W Broman, Alan D. Attie
doi: http://dx.doi.org/10.1101/020974

We surveyed gene expression in six tissues in an F2 intercross between mouse strains C57BL/6J (abbreviated B6) and BTBR T+ tf/J (abbreviated BTBR) made genetically obese with the Leptin(ob) mutation. We identified a number of expression quantitative trait loci (eQTL) affecting the expression of numerous genes distal to the locus, called trans-eQTL hotspots. Some of these trans-eQTL hotspots showed effects in multiple tissues, whereas some were specific to a single tissue. An unusually large number of transcripts (7% of genes) mapped in trans to a hotspot on chromosome 6, specifically in pancreatic islets. By considering the first two principal components of the expression of genes mapping to this region, we were able to convert the multivariate phenotype into a simple Mendelian trait. Fine-mapping the locus by traditional methods reduced the QTL interval to a 298 kb region containing only three genes, including Slco1a6, one member of a large family of organic anion transporters. Direct genomic sequencing of all Slco1a6 exons identified a non-synonymous coding SNP that converts a highly conserved proline residue at amino acid position 564 to serine. Molecular modeling suggests that Pro564 faces an aqueous pore within this 12-transmembrane domain-spanning protein. When transiently overexpressed in HEK293 cells, BTBR OATP1A6-mediated cellular uptake of the bile acid taurocholic acid (TCA) was enhanced compared to B6 OATP1A6. Our results suggest that genetic variation in Slco1a6 leads to altered transport of TCA (and potentially other bile acids) by pancreatic islets, resulting in broad gene regulation.

bModelTest: Bayesian site model selection for nucleotide data

Remco Bouckaert
doi: http://dx.doi.org/10.1101/020792

bModelTest allows for a Bayesian approach to inferring a site model for phylogenetic analysis. It is based on transdimensional MCMC proposals that allow switching between substitution models, and between including or excluding gamma rate heterogeneity and a proportion of invariant sites. The model can be used with the set of reversible models on nucleotides, but we also introduce other sets of substitution models, and show how to use these sets of models. With the method, the site model can be inferred during the MCMC analysis and does not need to be pre-determined, as is now often the case in practice, by likelihood-based methods.

Excess False Positive Rates in Methods for Differential Gene Expression Analysis using RNA-Seq Data

David M Rocke, Luyao Ruan, Yilun Zhang, J. Jared Gossett, Blythe Durbin-Johnson, Sharon Aviran
doi: http://dx.doi.org/10.1101/020784

Motivation: An important property of a valid method for testing for differential expression is that the false positive rate should at least roughly correspond to the p-value cutoff, so that if 10,000 genes are tested at a p-value cutoff of 10⁻⁴, and if all the null hypotheses are true, then there should be only about 1 gene declared to be significantly differentially expressed. We tested this by resampling from existing RNA-Seq data sets and also by matched negative binomial simulations. Results: Methods we examined, which rely strongly on a negative binomial model, such as edgeR, DESeq, and DESeq2, show large numbers of false positives in both the resampled real-data case and in the simulated negative binomial case. This also occurs with a negative binomial generalized linear model function in R. Methods that use only the variance function, such as limma-voom, do not show excessive false positives, as is also the case with a variance stabilizing transformation followed by linear model analysis with limma. The excess false positives are likely caused by apparently small biases in estimation of negative binomial dispersion and, perhaps surprisingly, occur mostly when the mean and/or the dispersion is high, rather than for low-count genes.
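The arithmetic behind the "about 1 gene" expectation can be checked with a minimal sketch. It assumes an idealised, well-calibrated test whose p-values are uniform on [0, 1] when every null hypothesis is true; it does not reproduce the resampling or negative binomial simulations described in the abstract, and the function names are hypothetical.

```python
import random


def expected_false_positives(n_tests, alpha):
    """Expected number of rejections when all nulls are true: n_tests * alpha."""
    return n_tests * alpha


def simulate_null_false_positives(n_tests, alpha, seed=0):
    """Count rejections among `n_tests` p-values drawn uniformly from [0, 1).

    Uniform p-values are the idealised behaviour of a calibrated test under
    the null; a method with excess false positives would reject more often.
    """
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)


if __name__ == "__main__":
    n_tests, alpha = 10_000, 1e-4
    print("expected:", expected_false_positives(n_tests, alpha))
    counts = [simulate_null_false_positives(n_tests, alpha, seed=s) for s in range(200)]
    print("mean over 200 simulated experiments:", sum(counts) / len(counts))
```

A method whose average count under an all-null setup sits well above `n_tests * alpha` is showing the kind of excess the abstract reports.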

Independent molecular basis of convergent highland adaptation in maize

Shohei Takuno, Peter Ralph, Kelly Swarts, Rob J Elshire, Jeffrey C Glaubitz, Edward S. Buckler, Matthew B Hufford, Jeffrey Ross-Ibarra
doi: http://dx.doi.org/10.1101/013607

Convergent evolution is the independent evolution of similar traits in different species or lineages of the same species; this often is a result of adaptation to similar environments, a process referred to as convergent adaptation. We investigate here the molecular basis of convergent adaptation in maize to highland climates in Mesoamerica and South America using genome-wide SNP data. Taking advantage of archaeological data on the arrival of maize to the highlands, we infer demographic models for both populations, identifying evidence of a strong bottleneck and rapid expansion in South America. We then use these models to identify loci showing an excess of differentiation as a means of identifying putative targets of natural selection, and compare our results to expectations from recently developed theory on convergent adaptation. Consistent with predictions across a wide parameter space, we see limited evidence for convergent evolution at the nucleotide level in spite of strong similarities in overall phenotypes. Instead, we show that selection appears to have predominantly acted on standing genetic variation, and that introgression from wild teosinte populations appears to have played a role in highland adaptation in Mexican maize.