Slowing evolution is more effective than enhancing drug development for managing resistance
Nathan S. McClure, Troy Day
(Submitted on 29 Apr 2013)
Drug resistance is a serious public health problem that threatens to thwart our ability to treat many infectious diseases. Repeatedly, the introduction of new drugs has been followed by the evolution of resistance. In principle there are two ways to address this problem: (i) enhancing drug development, and (ii) slowing drug resistance. We present data and a modeling approach based on queueing theory that explores how interventions aimed at these two facets affect the ability of the entire drug supply system to provide service. Analytical and simulation-based results show that, all else equal, slowing the evolution of drug resistance is more effective at ensuring an adequate supply of effective drugs than is enhancing the rate at which new drugs are developed. This lends support to the idea that evolution management is not only a significant component of the solution to the problem of drug resistance, but may in fact be the most important component.
Positive selection drives faster-Z evolution in silkmoths
Timothy B. Sackton (1), Russell B. Corbett-Detig (1), Javaregowda Nagaraju (2), R. Lakshmi Vaishna (2), Kallare P. Arunkumar (2), Daniel L. Hartl (1) ((1) Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, USA, (2) Centre of Excellence for Genetics and Genomics of Silkmoths, Laboratory of Molecular Genetics, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India)
(Submitted on 29 Apr 2013)
Genes linked to X or Z chromosomes, which are hemizygous in the heterogametic sex, are predicted to evolve at different rates than those on autosomes. This faster-X effect can arise either as a consequence of hemizygosity, which leads to more efficient selection for recessive beneficial mutations in the heterogametic sex, or as a consequence of reduced effective population size on the hemizygous chromosome, which leads to increased fixation of weakly deleterious mutations due to random genetic drift. Empirical results to date have suggested that, while the overall pattern across taxa is complicated, in general systems with male-heterogamy show a faster-X effect primarily attributable to more efficient selection, whereas systems with female-heterogamy show a faster-Z effect primarily attributable to increased drift. However, to date only a single female-heterogametic taxon has been investigated. In order to test the generality of the faster-Z pattern seen in birds, we sequenced the genome of the Lepidopteran insect Bombyx huttoni, a close outgroup of the domesticated silkmoth Bombyx mori. We show that silkmoths experience faster-Z evolution, but unlike in birds, the faster-Z effect appears to be attributable to more efficient positive selection in females. These results suggest that female-heterogamy alone is unlikely to be sufficient to explain the reduced efficacy of selection on the bird Z chromosome. Instead, it is likely that a combination of patterns of dosage compensation and overall effective population size, among other factors, influences patterns of faster-Z evolution.
Remote Homology Detection in Proteins Using Graphical Models
Noah M. Daniels
(Submitted on 24 Apr 2013)
Given the amino acid sequence of a protein, researchers often infer its structure and function by finding homologous, or evolutionarily-related, proteins of known structure and function. Since structure is typically more conserved than sequence over long evolutionary distances, recognizing remote protein homologs from their sequence poses a challenge.
We first consider all proteins of known three-dimensional structure, and explore how they cluster according to different levels of homology. An automatic computational method reasonably approximates a human-curated hierarchical organization of proteins according to their degree of homology.
Next, we return to homology prediction, based only on the one-dimensional amino acid sequence of a protein. Menke, Berger, and Cowen proposed a Markov random field model to predict remote homology for beta-structural proteins, but their formulation was computationally intractable on many beta-strand topologies.
We show two different approaches to approximate this random field, both of which make it computationally tractable, for the first time, on all protein folds. One method simplifies the random field itself, while the other retains the full random field, but approximates the solution through stochastic search. Both methods achieve improvements over the state of the art in remote homology detection for beta-structural protein folds.
Timing of ancient human Y lineage depends on the mutation rate: A comment on Mendez et al
Melissa A. Wilson Sayres
(Submitted on 22 Apr 2013)
Mendez et al. recently reported the identification of a Y chromosome lineage from an African American that is an outgroup to all other known Y haplotypes, and reported a time to most recent common ancestor, TMRCA, for human Y lineages that is substantially longer than any previous estimate. The identification of a novel Y haplotype is always exciting, and this haplotype, in particular, is unique in its basal position on the Y haplotype tree. However, at 338 (237-581) thousand years ago, kya, the extremely ancient TMRCA reported by Mendez et al. is inconsistent with the known human fossil record (which places the age of anatomically modern humans at 195 ± 5 kya), with estimates from mtDNA (176.6 ± 11.3 kya, and 204.9 (116.8-295.7) kya) and with population genetic theory. The inflated TMRCA can quite easily be attributed to the extremely low Y chromosome mutation rate used by the authors.
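The dependence of the TMRCA on the mutation rate can be illustrated with a short sketch. It assumes a strict molecular clock, under which the estimated TMRCA is inversely proportional to the assumed mutation rate. The function name and the example rates are hypothetical (the rates are relative values chosen for illustration, not the rates used by Mendez et al.).

```python
def rescale_tmrca(tmrca_kya: float, rate_used: float, rate_alternative: float) -> float:
    """Rescale a TMRCA estimate to a different assumed mutation rate.

    Assumption: strict molecular clock, so TMRCA is proportional to
    1 / mutation rate for a fixed amount of observed divergence.
    """
    if rate_used <= 0 or rate_alternative <= 0:
        raise ValueError("mutation rates must be positive")
    return tmrca_kya * rate_used / rate_alternative


# Hypothetical illustration: if the true rate were twice the rate assumed,
# a 338 kya point estimate would be halved.
print(rescale_tmrca(338.0, rate_used=1.0, rate_alternative=2.0))  # 169.0
```

Under this assumption, any confidence interval rescales by the same factor, so a low assumed rate inflates the whole interval as well as the point estimate.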
Methods to study splicing from high-throughput RNA Sequencing data
Gael P. Alamancos, Eneritz Agirre, Eduardo Eyras
(Submitted on 22 Apr 2013)
The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data. We group the methods according to the different questions they address: 1) Assignment of the sequencing reads to their likely gene of origin. This is addressed by methods that map reads to the genome and/or to the available gene annotations. 2) Recovering the sequence of splicing events and isoforms. This is addressed by transcript reconstruction and de novo assembly methods. 3) Quantification of events and isoforms. Either after reconstructing transcripts or using an annotation, many methods estimate the expression level or the relative usage of isoforms and/or events. 4) Providing an isoform or event view of differential splicing or expression. These include methods that compare relative event/isoform abundance or isoform expression across two or more conditions. 5) Visualizing splicing regulation. Various tools facilitate the visualization of the RNA-Seq data in the context of alternative splicing. In this review, we do not describe the specific mathematical models behind each method. Our aim is rather to provide an overview that could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods.
The standard lateral gene transfer model is statistically consistent for pectinate four-taxon trees
Andreas Sand, Mike Steel
(Submitted on 22 Apr 2013)
Evolutionary events such as incomplete lineage sorting and lateral gene transfer constitute major problems for inferring species trees from gene trees, as they can sometimes lead to gene trees which conflict with the underlying species tree. One particularly simple and efficient way to infer species trees from gene trees under such conditions is to combine three-taxon analyses for several genes using a majority vote approach. For incomplete lineage sorting this method is known to be statistically consistent. In the case of lateral gene transfer, however, it is known that a zone of inconsistency exists for a specific four-taxon tree topology. In this paper we analyze all remaining four-taxon topologies and show that no other inconsistencies exist.
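The majority vote over three-taxon analyses can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each gene tree has already been reduced to a rooted triplet written as a string such as "ab|c" (a and b are sisters, c is the outgroup). The function name and the input encoding are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def majority_triplet(gene_triplets: Iterable[str]) -> Optional[str]:
    """Return the rooted triplet topology supported by the most genes.

    Returns None when there is no input or when the top two topologies
    are tied, since a majority vote is then uninformative.
    """
    ranked = Counter(gene_triplets).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Ten hypothetical genes: six support ab|c, three ac|b, one bc|a.
genes = ["ab|c"] * 6 + ["ac|b"] * 3 + ["bc|a"]
print(majority_triplet(genes))  # ab|c
```

A species tree on more taxa would be assembled from the winning triplet for every set of three taxa. The consistency question in the abstract is whether the winning triplet matches the species tree as the number of genes grows.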
Informed and Automated k-Mer Size Selection for Genome Assembly
Rayan Chikhi, Paul Medvedev
(Submitted on 20 Apr 2013)
Genome assembly tools based on the de Bruijn graph framework rely on a parameter k, which represents a trade-off between several competing effects that are difficult to quantify. There is currently a lack of tools that would automatically estimate the best k to use and/or quickly generate histograms of k-mer abundances that would allow the user to make an informed decision.
We develop a fast and accurate sampling method that constructs approximate abundance histograms with a performance improvement of several orders of magnitude over traditional methods. We then present a fast heuristic that uses the generated abundance histograms for putative k values to estimate the best possible value of k. We test the effectiveness of our tool using diverse sequencing datasets and find that its choice of k leads to some of the best assemblies.
Our tool KmerGenie is freely available at: this http URL
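For readers unfamiliar with k-mer abundance histograms, the following minimal sketch computes one exactly by brute force. It is not KmerGenie's sampling method, which the abstract describes as approximate and far faster. The function name is hypothetical, and the sketch simplifies by not merging a k-mer with its reverse complement.

```python
from collections import Counter
from typing import Iterable


def kmer_abundance_histogram(reads: Iterable[str], k: int) -> Counter:
    """Map each abundance value to the number of distinct k-mers seen that often."""
    if k <= 0:
        raise ValueError("k must be positive")
    kmer_counts = Counter()
    for read in reads:
        # Slide a window of width k along the read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # Histogram: abundance -> number of distinct k-mers with that abundance.
    return Counter(kmer_counts.values())


hist = kmer_abundance_histogram(["ACGTACGT"], k=4)
print(dict(hist))  # three 4-mers occur once and one (ACGT) occurs twice
```

Repeating this for several candidate values of k yields the family of histograms that a heuristic for choosing k would take as input.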
Comparing DNA sequence collections by direct comparison of compressed text indexes
Anthony J. Cox, Tobias Jakobi, Giovanna Rosone, Ole B. Schulz-Trieglaff
(Submitted on 19 Apr 2013)
Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However, the utility of also indexing the query sequences themselves remains relatively unexplored.
Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e. on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have high overlap with results from more standard reference-based methods.
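As background, the BWT of a single string can be computed with the textbook sorted-rotations construction sketched below. This is only an illustration of the transform itself. It holds everything in memory and is unrelated to the external-memory algorithms in BEETL. The function name and the choice of "$" as terminator are assumptions.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (quadratic memory, for illustration).

    The terminator must not occur in the text and must sort before every
    other character, which holds for "$" with uppercase DNA letters.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    # Build every cyclic rotation, sort them, and read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("GATTACA"))  # ACTGA$TA
```

Sorting the rotations groups characters that precede similar contexts, which is why the BWT supports fast matching and, as the abstract describes, direct comparison between collections.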
Code to construct and compare the BWT of large genomic data sets is available at this http URL as part of the BEETL library.
This post is by Josh Schraiber on his paper (along with coauthors): Schraiber et al. Inferring non-neutral regulatory change in pathways from transcriptional profiling data arXived here.
We’ve known for a long time now that gene sequence alone does not determine phenotype. From the trivial example of differentiated cell types (which all have the same DNA) to now-common examples where species adapt to their environment by changing something other than protein-coding sequence, it’s clear that the expression level of a gene plays just as important a role in phenotypic development as does its sequence. Despite this fact, we still lack, for gene expression, the kinds of tools that are widely available for detecting non-neutral evolution at the sequence level (in packages like PAML). Part of this problem lies in a fundamental lack of power. A single gene may have hundreds of sites, and the patterns that occur at all of those sites give us plenty of information to learn about accelerated substitution rates and the like. But a gene (in a given environment) has just one expression level, so the sample size is often small and power is reduced.
This same problem occurs, of course, in phylogenetic studies of quantitative characters at the organismal level. The difference is that in those cases, researchers typically have access to tens, if not hundreds, of species with good quality measurements. Unfortunately, transcriptome-wide gene expression data can be difficult and costly to collect, so large-scale studies are few and far between.
Instead of trying to leverage large collections of species, we sought to utilize one of the benefits of transcriptome-wide profiles: data from lots and lots of genes. A common practice in molecular evolution is to run tests for selection on a gene-by-gene basis and then look for functional groups that are overrepresented (e.g. Gene Ontology enrichment). We turned that around and instead started with a priori defined gene groups (in our case, from Gene Ontology), looking to detect signal for a history of lineage-specific gene expression evolution, by jointly analyzing all the genes in a group simultaneously.
Doing this would potentially run into a problem of overfitting: should we try to fit a separate rate of evolution for each gene in the group? Instead, we borrowed a page from Ziheng Yang’s book and assumed that the rate of evolution across genes was inverse-gamma distributed. We chose this distribution mostly for computational convenience, but it is important to note that it can cover a wide range of possibilities, from a model in which every gene evolves at the same rate to a distribution so fat-tailed that there is no average rate of evolution across the group! By fitting a distribution of rates across genes in a group, we are able to look for examples of lineage-specific evolution without being confounded by outlying genes.
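To make the two extremes concrete, here is a minimal sketch that draws per-gene rates from an inverse-gamma distribution using only the Python standard library. It is not the fitting procedure from the paper. The function name and parameter values are hypothetical. It relies on the standard fact that if G is gamma distributed with shape a and rate b, then 1/G is inverse-gamma distributed with shape a and scale b, with mean b / (a - 1) when a > 1 and no finite mean when a <= 1.

```python
import random
from typing import List


def sample_inverse_gamma_rates(n_genes: int, shape: float, scale: float,
                               seed: int = 0) -> List[float]:
    """Draw n_genes evolutionary rates from InvGamma(shape, scale)."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so a gamma with rate `scale`
    # is gammavariate(shape, 1 / scale); its reciprocal is inverse-gamma.
    return [1.0 / rng.gammavariate(shape, 1.0 / scale) for _ in range(n_genes)]


# Very large shape with scale = shape - 1: rates cluster tightly around 1,
# approximating "every gene evolves at the same rate".
near_constant = sample_inverse_gamma_rates(1000, shape=1e4, scale=1e4 - 1)
print(min(near_constant), max(near_constant))

# Shape <= 1: the distribution has no finite mean, so a few genes can have
# enormous rates (the fat-tailed extreme).
fat_tailed = sample_inverse_gamma_rates(1000, shape=0.5, scale=1.0)
print(min(fat_tailed), max(fat_tailed))
```

Fitting two parameters for the whole group, rather than one rate per gene, is what keeps the model from overfitting while still accommodating outlying genes.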
We encourage you to check out our paper and let us know what you think of our approach. In addition, our method will soon be available as an R package (once I get around to doing all the documentation…) and we would love to see people using it. If you are interested in getting an early version of our package, please don’t hesitate to contact me:
Inferring non-neutral regulatory change in pathways from transcriptional profiling data
Joshua G. Schraiber, Yulia Mostovoy, Tiffany Y. Hsu, Rachel B. Brem
(Submitted on 19 Apr 2013)
An outstanding question in comparative genomics is the evolutionary importance of gene expression differences between species. Rigorous molecular-evolution methods to infer evidence for natural selection from transcriptional profiling data are at a premium in the field, and to date, phylogenetic approaches have not been well-suited to address the question in the small sets of taxa profiled in standard surveys of gene expression. To meet this challenge, we have developed a strategy to infer evolutionary histories from expression data by analyzing suites of genes of common function. In a manner conceptually similar to molecular-evolution models in which the evolutionary rates of DNA sequence at multiple loci follow a gamma distribution, we modeled expression of the genes of an a priori-defined pathway with rates drawn from an inverse-gamma distribution. We then developed a fitting strategy to infer the parameters of this distribution from expression measurements, and to identify gene groups whose expression patterns were consistent with evolutionary constraint or rapid evolution in particular species. Simulations confirmed the power and accuracy of our inference method. As an experimental testbed for our approach, we generated and analyzed transcriptional profiles of four Saccharomyces yeasts. The results revealed pathways with signatures of constrained and accelerated regulatory evolution in individual yeasts, and across the phylogeny, highlighting the prevalence of pathway-level expression change during the divergence of yeast species. We anticipate that our pathway-based phylogenetic approach will be of broad utility in the search to understand the evolutionary relevance of regulatory change.