Estimating the Relative Rate of Recombination to Mutation in Bacteria from Single-Locus Variants using Composite Likelihood Methods

Paul Fearnhead, Shoukai Yu, Patrick Biggs, Barbara Holland, Nigel French
(Submitted on 5 Nov 2014)

A number of studies have suggested using comparisons between DNA sequences of closely related bacterial isolates to estimate the relative rate of recombination to mutation for that bacterial species. We consider such an approach which uses single locus variants: pairs of isolates whose DNA differ at a single gene locus. One way of deriving point estimates for the relative rate of recombination to mutation from such data is to use composite likelihood methods. We extend recent work in this area so as to be able to construct confidence intervals for our estimates, without needing to resort to computationally-intensive bootstrap procedures, and to develop a test for whether the relative rate varies across loci. Both our test and method for constructing confidence intervals are obtained by modelling the dependence structure in the data, and then applying asymptotic theory regarding the distribution of estimators obtained using a composite likelihood. We applied these methods to multi-locus sequence typing (MLST) data from eight bacteria, finding strong evidence for considerable rate variation in three of these: Bacillus cereus, Enterococcus faecium and Klebsiella pneumoniae.

GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands.

Florent Lassalle, Séverine Périan, Thomas Bataillon, Xavier Nesme, Laurent Duret, Vincent Daubin
doi: http://dx.doi.org/10.1101/011023

The characterization of functional elements in genomes relies on the identification of the footprints of natural selection. In this quest, taking into account neutral evolutionary processes such as mutation and genetic drift is crucial because these forces can generate patterns that may obscure or mimic signatures of selection. In mammals, and probably in many eukaryotes, another such confounding factor called GC-Biased Gene Conversion (gBGC) has been documented. This mechanism generates patterns identical to what is expected under selection for higher GC-content, specifically in highly recombining genomic regions. Recent results have suggested that a mysterious selective force favouring higher GC-content exists in Bacteria but the possibility that it could be gBGC has been excluded. Here, we show that gBGC is probably at work in most if not all bacterial species. First we find a consistent positive relationship between the GC-content of a gene and evidence of intra-genic recombination throughout a broad spectrum of bacterial clades. Second, we show that the evolutionary force responsible for this pattern is acting independently from selection on codon usage, and could potentially interfere with selection in favor of optimal AU-ending codons. A comparison with data from human populations shows that the intensity of gBGC in Bacteria is comparable to what has been reported in mammals. We propose that gBGC is not restricted to sexual Eukaryotes but also widespread among Bacteria and could therefore be an ancestral feature of cellular organisms. We argue that if gBGC occurs in bacteria, it can account for previously unexplained observations, such as the apparent non-equilibrium of base substitution patterns and the heterogeneity of gene composition within bacterial genomes. Because gBGC produces patterns similar to positive selection, it is essential to take this process into account when studying the evolutionary forces at work in bacterial genomes.

Ancestries of a Recombining Diploid Population

R Sainudiin, B. Thatte and A. Veber, UCDMS Research Report 2014/3, 42 pages, 2014

We derive the exact one-step transition probabilities of the number of lineages
that are ancestral to a random sample from the current generation of a bi-parental
population that is evolving under the discrete Wright-Fisher model with n diploid
individuals. Our model allows for a per-generation recombination probability of
r. When r = 1, our model is equivalent to Chang’s model [4] for the karyotic
pedigree. When r = 0, our model is equivalent to Kingman’s discrete coalescent
model [16] for the cytoplasmic tree or sub-karyotic tree containing a DNA locus that
is free of intra-locus recombination. When 0 < r < 1 our model can be thought to
track a sub-karyotic ancestral graph containing a DNA sequence from an autosomal
chromosome that has an intra-locus recombination probability r. Thus, our family
of models indexed by r ∈ [0, 1] connects Kingman's discrete coalescent to Chang's
pedigree in a continuous way as r goes from 0 to 1. For large populations, we
also study three properties of the r-specific ancestral process: the time T_n to a
most recent common ancestor (MRCA) of the population, the time U_n at which all
individuals are either common ancestors to all present day individuals or ancestral
to none of them, and the fraction of individuals that are common ancestors at time
U_n. These results generalize the three main results in [4]. When we appropriately
rescale time and recombination probability by the population size, our model leads
to the continuous time Markov chain called the ancestral recombination graph of
Hudson [12] and Griffiths [9].
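
The two endpoints described above can be illustrated with a small simulation. The following is only a rough sketch and not the paper's exact transition probabilities: it assumes that, going one generation back, each ancestral lineage picks one parent uniformly at random among n individuals and, with probability r, a second uniformly chosen parent as well. Under that assumption r = 0 behaves like a discrete coalescent (lineages can only merge) and r = 1 like a two-parent pedigree. The function names `step_lineages` and `lineage_path` are hypothetical.

```python
import random

def step_lineages(k, n, r, rng):
    """Go one generation back from k ancestral lineages in a population of n.

    Assumption (not taken from the paper): each lineage always picks one
    parent uniformly at random, and with probability r picks a second one.
    """
    parents = set()
    for _ in range(k):
        parents.add(rng.randrange(n))
        if rng.random() < r:
            parents.add(rng.randrange(n))
    # Lineages that chose the same parent have merged.
    return len(parents)

def lineage_path(k0, n, r, generations, seed=0):
    """Number of ancestral lineages at each generation, starting from k0."""
    rng = random.Random(seed)
    path = [k0]
    for _ in range(generations):
        path.append(step_lineages(path[-1], n, r, rng))
    return path

if __name__ == "__main__":
    for r in (0.0, 0.5, 1.0):
        print(r, lineage_path(k0=10, n=100, r=r, generations=20))
```

With r = 0 the lineage count can never increase, while for r > 0 it may grow until it is limited by the population size n.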

Tackling drug resistant infection outbreaks of global pandemic Escherichia coli ST131 using evolutionary and epidemiological genomics

Tim Downing
(Submitted on 4 Nov 2014)

High-throughput molecular approaches are required to investigate the origin and diffusion of antimicrobial resistance in rapidly radiating pathogen outbreaks. The most frequent cause of human infection is Escherichia coli, which is dominated by ST131, a single pandemic clone. This epidemic subtype possesses an extensive array of virulence elements and tolerates many drugs. Frequent global sweeps of new dominant ST131 varieties necessitate deep genomic scrutiny of their spread, evolution and lateral transfer of drug resistance genes. Phylogenetic methods that decipher past events can predict future patterns of virulence and transmission based on genetic signatures of adaptation and recombination. Antibiotic tolerance is controlled by natural variation in gene expression levels, which can initiate delayed cell growth. This dormancy allows survival despite drug exposure, and yet may only be present in part of the infecting cell population. Consequently, genomic epidemiology needs to explore the scale of phenotypic regulatory control acting on RNA. A multi-faceted approach can comprehensively assess antimicrobial resistance in E. coli ST131 in terms of within-host genetic heterogeneity, regulation of gene expression, and transmission dynamics between hosts to achieve a goal of pre-empting resistance before it emerges by optimising drug treatment protocols.

The genetic architecture of local adaptation I: The genomic landscape of foxtail pine (Pinus balfouriana Grev. & Balf.) as revealed from a high-density linkage map

Christopher J Friedline, Brandon M Lind, Erin M Hobson, Douglas E Harwood, Annette Delfino-Mix, Patricia E Maloney, Andrew J Eckert
doi: http://dx.doi.org/10.1101/011106

Explaining the origin and evolutionary dynamics of the genetic architecture of adaptation is a major research goal of evolutionary genetics. Despite controversy surrounding the success of attempts to accomplish this goal, a full understanding of adaptive genetic variation necessitates knowledge about the genomic location and patterns of dispersion for the genetic components affecting fitness-related phenotypic traits. Even with advances in next generation sequencing technologies, the production of full genome sequences for non-model species is often cost prohibitive, especially for tree species such as pines where genome size often exceeds 20 to 30 Gbp. We address this need by constructing a dense linkage map for foxtail pine (Pinus balfouriana Grev. & Balf.), with the ultimate goal of uncovering and explaining the origin and evolutionary dynamics of adaptive genetic variation in natural populations of this forest tree species. We utilized megagametophyte arrays (n = 76–95 megagametophytes/tree) from four maternal trees in combination with double-digestion restriction site associated DNA sequencing (ddRADseq) to produce a consensus linkage map covering 98.58% of the foxtail pine genome, which was estimated to be 1276 cM in length (95% CI: 1174 cM to 1378 cM). A novel bioinformatic approach using iterative rounds of marker ordering and imputation was employed to produce single-tree linkage maps (507–17066 contigs/map; lengths: 1037.40–1572.80 cM). These linkage maps were collinear across maternal trees, with highly correlated marker orderings (Spearman’s ρ > 0.95). A consensus linkage map derived from these single-tree linkage maps contained 12 linkage groups along which 20 655 contigs were non-randomly distributed across 901 unique positions (n = 23 contigs/position), with an average spacing of 1.34 cM between adjacent positions.
Of the 20 655 contigs positioned on the consensus linkage map, 5627 had enough sequence similarity to contigs contained within the most recent build of the loblolly pine (P. taeda L.) genome to identify them as putative homologs containing both genic and non-genic loci. Importantly, all 901 unique positions on the consensus linkage map had at least one contig with putative homology to loblolly pine. When combined with the other biological signals that predominate in our data (e.g., correlations of recombination fractions across single trees), we show that dense linkage maps for non-model forest tree species can be efficiently constructed using next generation sequencing technologies. We subsequently discuss the usefulness of these maps as community-wide resources and as tools with which to test hypotheses about the genetic architecture of adaptation.

CauseMap: Fast inference of causality from complex time series

M. Cyrus Maher, Ryan D. Hernandez

Background: Establishing health-related causal relationships is a central pursuit in biomedical research. Yet, the interdependent non-linearity of biological systems renders causal dynamics laborious and at times impractical to disentangle. This pursuit is further impeded by the dearth of time series that are sufficiently long to observe and understand recurrent patterns of flux. However, as data generation costs plummet and technologies like wearable devices democratize data collection, we anticipate a coming surge in the availability of biomedically-relevant time series data. Given the life-saving potential of these burgeoning resources, it is critical to invest in the development of open source software tools that are capable of drawing meaningful insight from vast amounts of time series data.

Results: Here we present CauseMap, the first open source implementation of convergent cross mapping (CCM), a method for establishing causality from long time series data (> ~25 observations). Compared to existing time series methods, CCM has the advantage of being model-free and robust to unmeasured confounding that could otherwise induce spurious associations. CCM builds on Takens’ Theorem, a well-established result from dynamical systems theory that requires only mild assumptions. This theorem allows us to reconstruct high dimensional system dynamics using a time series of only a single variable. These reconstructions can be thought of as shadows of the true causal system. If the reconstructed shadows can predict points from the opposing time series, we can infer that the corresponding variables are providing views of the same causal system, and so are causally related. Unlike traditional metrics, this test can establish the directionality of causation, even in the presence of feedback loops. Furthermore, since CCM can extract causal relationships from time series of, e.g., a single individual, it may be a valuable tool for personalized medicine. We implement CCM in Julia, a high-performance programming language designed for facile technical computing. Our software package, CauseMap, is platform-independent and freely available as an official Julia package.

Conclusions: CauseMap is an efficient implementation of a state-of-the-art algorithm for detecting causality from time series data. We believe this tool will be a valuable resource for biomedical research and personalized medicine.
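
The cross-mapping idea in the Results paragraph can be sketched in a few lines. This is not CauseMap's code or API (CauseMap is written in Julia); it is a minimal Python illustration under stated assumptions: a delay embedding with dimension `E` and lag `tau`, prediction from the `E + 1` nearest neighbours with exponential distance weights, and Pearson correlation as the skill measure. All function names and the test system `coupled_logistic` are hypothetical, and the sketch evaluates skill at a single library length, so it omits the convergence check that gives CCM its name.

```python
import numpy as np

def delay_embed(x, E, tau):
    """Shadow manifold: rows are (x[t], x[t-tau], ..., x[t-(E-1)*tau])."""
    x = np.asarray(x, dtype=float)
    start = (E - 1) * tau
    cols = [x[start - j * tau: len(x) - j * tau] for j in range(E)]
    return np.column_stack(cols)

def cross_map_skill(source, target, E=2, tau=1):
    """Predict `target` from the shadow manifold of `source`.

    Returns the Pearson correlation between predictions and observations.
    """
    M = delay_embed(source, E, tau)
    y = np.asarray(target, dtype=float)[(E - 1) * tau:]
    # Pairwise distances between manifold points; a point is not its own neighbour.
    d = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :E + 1]
    dist = np.take_along_axis(d, nn, axis=1)
    # Exponential weights scaled by the distance to the nearest neighbour.
    w = np.exp(-dist / np.maximum(dist[:, :1], 1e-12))
    w /= w.sum(axis=1, keepdims=True)
    pred = (w * y[nn]).sum(axis=1)
    return np.corrcoef(pred, y)[0, 1]

def coupled_logistic(n, bxy=0.0, byx=0.1, x0=0.4, y0=0.2):
    """Toy system: two logistic maps; with bxy = 0, x drives y but not vice versa."""
    x, y = np.empty(n), np.empty(n)
    x[0], y[0] = x0, y0
    for t in range(n - 1):
        x[t + 1] = x[t] * (3.8 - 3.8 * x[t] - bxy * y[t])
        y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - byx * x[t])
    return x, y

if __name__ == "__main__":
    x, y = coupled_logistic(500)
    print("x recovered from y's manifold:", cross_map_skill(y, x))
    print("y recovered from x's manifold:", cross_map_skill(x, y))
```

Note the direction of inference: if x influences y, then y's history carries information about x, so it is the skill of recovering x from y's manifold that is read as evidence for x causing y.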

Most viewed on Haldane’s Sieve: October 2014

The most viewed preprints this month were:

Analyses of Eurasian wild and domestic pig genomes reveals long-term gene-flow during domestication

Laurent A.F. Frantz, Joshua Schraiber, Ole Madsen, Hendrik-Jan Megens, Alex Cagan, Mirte Bosse, Yogesh Paudel, Richard P.M.A. Crooijmans, Greger Larson, Martien A.M. Groenen
doi: http://dx.doi.org/10.1101/010959

Traditionally, the process of domestication is assumed to be initiated by people, involve few individuals and rely on reproductive isolation between wild and domestic forms. However, an emerging zooarcheological consensus depicts animal domestication as a long-term process without reproductive isolation or strong intentional selection. Here, we ask whether pig domestication followed a traditional linear model, or a complex, reticulate model as predicted by zooarcheologists. To do so, we fit models of domestication to whole genome data from over 100 wild and domestic pigs. We found that the assumptions of traditional models, such as reproductive isolation and strong domestication bottlenecks, are incompatible with the genetic data and provide support for the zooarcheological theory of a complex domestication process. In particular, gene-flow from wild to domestic pigs was a ubiquitous feature of the domestication of pigs. In addition, we show that despite gene-flow, the genomes of domestic pigs show strong signatures of selection at loci that affect behaviour and morphology. Specifically, our results are consistent with independent parallel sweeps in two independent domestication areas (China and Anatolia) at loci linked to morphological traits. We argue that recurrent selection for domestic traits likely counteracted the homogenising effect of gene-flow from wild boars and created “islands of domestication” in the genome. Overall, our results suggest that genomic approaches that allow for more complex models of domestication to be embraced should be employed. The results from these studies will have significant ramifications for studies that attempt to infer the origin of domesticated animals.

E. coli populations in unpredictably fluctuating environments evolve to face novel stresses through enhanced efflux activity

Shraddha Madhav Karve, Sachit Daniel, Yashraj Chavhan, Abhishek Anand, Somendra Singh Kharola, Sutirth Dey
doi: http://dx.doi.org/10.1101/011007

There is considerable understanding about how laboratory populations respond to predictable (constant or deteriorating-environment) selection for single environmental variables like temperature or pH. However, such insights may not apply when selection environments comprise multiple variables that fluctuate unpredictably, as is common in nature. To address this issue, we grew replicate laboratory populations of E. coli in nutrient broth whose pH and concentrations of salt (NaCl) and hydrogen peroxide (H2O2) were randomly changed daily. After ~170 generations, the fitness of the selected populations had not increased in any of the three selection environments. However, these selected populations had significantly greater fitness in four novel environments which have no known fitness-correlation with tolerance to pH, NaCl or H2O2. Interestingly, contrary to expectations, hypermutators did not evolve. Instead, the selected populations evolved an increased ability for energy-dependent efflux activity that might enable them to throw out toxins, including antibiotics, from the cell at a faster rate. This provides an alternate mechanism for how evolvability can evolve in bacteria and potentially lead to broad-spectrum antibiotic resistance, even in the absence of prior antibiotic exposure. Given that environmental variability is increasing in nature, this might have serious consequences for public health.