Phase transition on the convergence rate of parameter estimation under an Ornstein-Uhlenbeck diffusion on a tree

Cécile Ané, Lam Si Tung Ho, Sebastien Roch
(Submitted on 6 Jun 2014)

Diffusion processes on trees are commonly used in evolutionary biology to model the joint distribution of continuous traits, such as body mass, across species. Estimating the parameters of such processes from tip values presents challenges because the shared evolutionary history induces correlation between the observations, violating the standard independence assumption of large-sample theory. For instance, Ho and Ané [HoAne13] recently proved that the mean (also known in this context as the selection optimum) of an Ornstein-Uhlenbeck process on a tree cannot be estimated consistently from an increasing number of tip observations if the tree height is bounded. Here, using a fruitful connection to the so-called reconstruction problem in probability theory, we study the convergence rate of parameter estimation in the unbounded height case. For the mean of the process, we provide a necessary and sufficient condition for the consistency of the maximum likelihood estimator (MLE) and establish a phase transition on its convergence rate in terms of the growth of the tree. In particular, we show that a loss of $\sqrt{n}$-consistency (i.e., the variance of the MLE becomes $\Omega(n^{-1})$, where $n$ is the number of tips) occurs when the tree growth is larger than a threshold related to the phase transition of the reconstruction problem. For the covariance parameters, we give a novel, efficient estimation method which achieves $\sqrt{n}$-consistency under natural assumptions on the tree.
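
To make the correlation between tip observations concrete, here is a minimal numerical sketch. It is not the authors' method. It assumes a balanced binary tree with equal branch lengths, a root drawn from the stationary distribution, and known parameters alpha and sigma2; under those assumptions the covariance between two tips is (sigma2 / (2 alpha)) exp(-alpha d), where d is the tree distance between them, and the MLE of the mean is the generalized least squares estimator. The function names and the tree layout are hypothetical choices made for illustration.

```python
import numpy as np

def tip_covariance(depth, branch_length, alpha, sigma2):
    """Tip covariance of a stationary OU process on a balanced binary tree.

    Assumed setup (not from the abstract): 2**depth tips, every branch has
    the same length, and the root is drawn from the stationary distribution.
    """
    n = 2 ** depth
    idx = np.arange(n)
    # Tips are labelled 0..n-1 in left-to-right order. The bit length of
    # i XOR j is the number of levels between the tips and their most
    # recent common ancestor; np.frexp returns that bit length as the
    # binary exponent (and 0 for i == j).
    xor = (idx[:, None] ^ idx[None, :]).astype(float)
    levels = np.frexp(xor)[1]
    dist = 2.0 * branch_length * levels      # tree distance between tips
    gamma = sigma2 / (2.0 * alpha)           # stationary variance
    return gamma * np.exp(-alpha * dist)

def mle_mean_variance(depth, branch_length, alpha, sigma2):
    """Variance of the GLS (maximum likelihood) estimator of the mean,
    treating alpha and sigma2 as known."""
    V = tip_covariance(depth, branch_length, alpha, sigma2)
    ones = np.ones(V.shape[0])
    return 1.0 / (ones @ np.linalg.solve(V, ones))

if __name__ == "__main__":
    for depth in range(2, 9):
        n = 2 ** depth
        var = mle_mean_variance(depth, branch_length=1.0, alpha=0.2, sigma2=1.0)
        print(n, var, n * var)
```

Printing n times the variance for increasing depth shows whether the variance decays like 1/n for the chosen parameter values; trying several values of alpha is one way to explore the kind of rate change the abstract describes.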

Testing the Toxicofera: comparative reptile transcriptomics casts doubt on the single, early evolution of the reptile venom system

Adam D Hargreaves, Martin T Swain, Darren W Logan, John F Mulley

Background: The identification of apparently conserved gene complements in the venom and salivary glands of a diverse set of reptiles led to the development of the Toxicofera hypothesis – the idea that there was a single, early evolution of the venom system in reptiles. However, this hypothesis is based largely on relatively small-scale EST-based studies of only venom or salivary glands, and toxic effects have been assigned to only some of these putative Toxicoferan toxins in some species. We set out to investigate the distribution of these putative venom toxin transcripts in order to determine to what extent conservation of gene complements may reflect a bias in previous sampling efforts. Results: We have carried out the first large-scale test of the Toxicofera hypothesis and found it lacking in a number of regards. Our quantitative transcriptomic analyses of venom and salivary glands and other body tissues in five species of reptile, together with the use of available RNA-Seq datasets for additional species, show that the majority of genes used to support the establishment and expansion of the Toxicofera are in fact expressed in multiple body tissues and most likely represent general maintenance or “housekeeping” genes. The apparent conservation of gene complements across the Toxicofera therefore reflects an artefact of incomplete tissue sampling. In other cases, the identification of a non-toxic paralog of a gene encoding a true venom toxin has led to confusion about the phylogenetic distribution of that venom component. Conclusions: Venom has evolved multiple times in reptiles. In addition, the misunderstanding regarding what constitutes a toxic venom component, together with the misidentification of genes and the classification of identical or near-identical sequences as distinct genes, has led to an overestimation of the complexity of reptile venoms in general, and snake venom in particular, with implications for our understanding of (and development of treatments to counter) the molecules responsible for the physiological consequences of snakebite.

Restriction and recruitment – gene duplication and the origin and evolution of snake venom toxins

Adam D Hargreaves, Martin T Swain, Matthew J Hegarty, Darren W Logan, John F Mulley

The genetic and genomic mechanisms underlying evolutionary innovations are of fundamental importance to our understanding of animal evolution. Snake venom represents one such innovation and has been hypothesised to have originated and diversified via a process that involves duplication of genes encoding body proteins and subsequent recruitment of the copy to the venom gland, where natural selection can act to develop or increase toxicity. However, gene duplication is known to be a rare event in vertebrate genomes, and the recruitment of duplicated genes to a novel expression domain (neofunctionalisation) is an even rarer process that requires the evolution of novel combinations of transcription factor binding sites in upstream regulatory regions. This hypothesis concerning the evolution of snake venom is therefore very unlikely. Nonetheless, it is often assumed to be established fact, and this has hampered research into the true origins of snake venom toxins. We have generated transcriptomic data for a diversity of body tissues and salivary and venom glands from venomous and non-venomous reptiles, which has allowed us to critically evaluate this hypothesis. Our comparative transcriptomic analysis of venom and salivary glands and body tissues in five species of reptile reveals that snake venom does not evolve via the hypothesised process of duplication and recruitment of body proteins. Indeed, our results show that many proposed venom toxins are in fact expressed in a wide variety of body tissues, including the salivary gland of non-venomous reptiles, and have therefore been restricted to the venom gland following duplication, not recruited. Thus snake venom evolves via the duplication and subfunctionalisation of genes encoding existing salivary proteins. These results highlight the danger of the “just-so story” in evolutionary biology, where an elegant and intuitive idea is repeated so often that it assumes the mantle of established fact, to the detriment of the field as a whole.

Simultaneous estimation of transcript abundances and transcript specific fragment distributions of RNA-Seq data with the Mix2 model

Andreas Tuerk, Gregor Wiktorin

Quantification of RNA transcripts with RNA-Seq is inaccurate due to positional fragmentation bias, which is not represented appropriately by current statistical models of RNA-Seq data. Another, less investigated, source of error is the inaccuracy of transcript start and end annotations. This article introduces the Mix2 (read “mixquare”) model, which uses a mixture of probability distributions to model the transcript-specific positional fragment bias. The parameters of the Mix2 model can be efficiently trained with the EM algorithm and are tied between similar transcripts. Transcript-specific shift and scale parameters allow the Mix2 model to automatically correct inaccurate transcript start and end annotations. Experiments are conducted on synthetic data covering 7 genes of different complexity, 4 types of fragment bias, and correct as well as incorrect transcript start and end annotations. A comparison of the abundance estimates obtained by Cufflinks 2.2.0, PennSeq and the Mix2 model shows superior performance of the Mix2 model in the vast majority of test conditions.
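
For readers unfamiliar with EM training of mixtures, the following is a generic sketch and not the Mix2 model itself. It assumes one-dimensional observations (for example, relative fragment positions), Gaussian components with fixed, known means and standard deviations, and estimates only the mixing weights. The function name and all parameter values are hypothetical.

```python
import numpy as np

def em_mixture_weights(x, means, sds, n_iter=200):
    """Estimate mixing weights of a Gaussian mixture with fixed components.

    Generic EM illustration only; the component shapes are assumed known.
    """
    x = np.asarray(x, dtype=float)[:, None]
    means = np.asarray(means, dtype=float)[None, :]
    sds = np.asarray(sds, dtype=float)[None, :]
    # Density of every observation under every component (computed once,
    # because the component parameters are held fixed).
    dens = np.exp(-0.5 * ((x - means) / sds) ** 2) / (sds * np.sqrt(2.0 * np.pi))
    k = dens.shape[1]
    weights = np.full(k, 1.0 / k)            # start from uniform weights
    for _ in range(n_iter):
        # E-step: posterior probability that each observation came from
        # each component, given the current weights.
        resp = dens * weights
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: new weights are the average responsibilities.
        weights = resp.mean(axis=0)
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0.2, 0.05, 1400),
                           rng.normal(0.8, 0.05, 600)])
    print(em_mixture_weights(data, means=[0.2, 0.8], sds=[0.05, 0.05]))
```

A full model of the kind the abstract describes would also update the component parameters in the M-step and tie them across transcripts; that part is omitted here.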

Complete plastid genome assembly of invasive plant, Centaurea diffusa

Kathryn G Turner, Christopher J Grassa

Invasive plants present both problems and possibilities for discovery, which may be addressed utilizing new genomic tools. Here we present the completed plastome assembly for the problematic invasive weed, Centaurea diffusa. This new tool represents a significant contribution to future studies of the ecological genomics of invasive plants, particularly this weedy genus, and studies of the Asteraceae in general.

Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits

Darren Kessner, John Novembre

Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTLs) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTLs produces qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTLs under selection impacts the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50–100%) can be explained by detected QTLs in as few as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates.
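
The idea of selecting on individual trait values, rather than assigning each locus a constant selection coefficient, can be sketched in a few lines. This is a toy illustration and not the authors' simulation framework. It assumes diploid individuals, additive QTL effects, Gaussian environmental noise, truncation selection on the top fraction of the population, and free recombination between loci; in particular it ignores linkage, which the abstract identifies as important. All names and default values are hypothetical.

```python
import numpy as np

def simulate_truncation_selection(n_ind=500, n_loci=20, n_gen=20,
                                  keep_fraction=0.2, seed=0):
    """Toy truncation selection on an additive trait with unlinked loci."""
    rng = np.random.default_rng(seed)
    effects = rng.normal(0.0, 1.0, n_loci)   # hypothetical additive effects
    # Diploid genotypes stored as two haplotypes: (individual, copy, locus).
    geno = rng.integers(0, 2, size=(n_ind, 2, n_loci))
    freqs = [geno.mean(axis=(0, 1))]
    n_keep = max(2, int(keep_fraction * n_ind))
    for _ in range(n_gen):
        # Trait = additive genetic value + environmental noise.
        trait = geno.sum(axis=1) @ effects + rng.normal(0.0, 1.0, n_ind)
        # Keep the individuals with the largest trait values as parents.
        parents = geno[np.argsort(trait)[-n_keep:]]
        gametes = []
        for _parent in range(2):
            # Each offspring draws a random parent (the same individual may
            # be drawn twice), which passes on one randomly chosen allele
            # per locus, i.e. loci segregate independently.
            chosen = parents[rng.integers(0, n_keep, n_ind)]
            pick = rng.integers(0, 2, size=(n_ind, 1, n_loci))
            gametes.append(np.take_along_axis(chosen, pick, axis=1)[:, 0, :])
        geno = np.stack(gametes, axis=1)
        freqs.append(geno.mean(axis=(0, 1)))
    return effects, np.array(freqs)

if __name__ == "__main__":
    effects, freqs = simulate_truncation_selection()
    change = freqs[-1] - freqs[0]
    for e, d in sorted(zip(effects, change)):
        print(f"effect {e:+.2f}  frequency change {d:+.2f}")
```

Because selection acts on the whole trait, the frequency change at each locus depends on the other loci segregating at the same time, which is the contrast with constant per-locus selection coefficients that the abstract draws.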

Genomic, transcriptomic and phenomic variation reveals the complex adaptation of modern maize breeding

Haijun Liu, Xiaqing Wang, Marilyn Warburton, Weiwei Wen, Minliang Jin, Min Deng, Jie Liu, Hao Tong, Qingchun Pan, Xiaohong Yang, Jianbing Yan

The temperate-tropical division of early maize germplasm to different agricultural environments was arguably the greatest adaptation process associated with the success and near-ubiquitous importance of global maize production. Deciphering this history is challenging, but new insight has been gained from the genomic, transcriptomic and phenotypic variation collected from 368 diverse temperate and tropical maize inbred lines in this study. This is the first attempt to systematically explore the mechanisms of the adaptation process. Our results indicate that the divergence between tropical and temperate lines appears to have occurred 3,400-6,700 years ago. A number of genomic selection signals and transcriptomic variants, including differentially expressed individual genes and rewired co-expression networks of genes, were identified. These candidate signals were found to be functionally related to stress response, and most were associated with directionally selected traits, which may have been an advantage under the widely varying environmental conditions faced by maize as it migrated away from its domestication center. It is also clear in our study that such stress adaptation could involve evolution of protein-coding sequences as well as transcriptome-level regulatory changes. This latter process may be a more flexible and dynamic way for maize to adapt to environmental changes over this dramatically short evolutionary time frame.

Natural variation in teosinte at the domestication locus teosinte branched1 (tb1)

Laura Vann, Thomas Kono, Tanja Pyhäjärvi, Matthew B Hufford, Jeffrey Ross-Ibarra

Premise of the study: The teosinte branched1 (tb1) gene is a major QTL controlling branching differences between maize and its wild progenitor, teosinte. The insertion of a transposable element (Hopscotch) upstream of tb1 is known to enhance the gene’s expression, causing reduced tillering in maize. Observations of the maize tb1 allele in teosinte and estimates of an insertion age of the Hopscotch that predates domestication led us to investigate its prevalence and potential role in teosinte. Methods: Prevalence of the Hopscotch element was assessed across an Americas-wide sample of 1110 maize and teosinte individuals using a co-dominant PCR assay. Population genetic summaries were calculated for a subset of individuals from four teosinte populations in central Mexico. Phenotypic data were also collected from a single teosinte population where Hopscotch was found segregating. Key results: Genotyping results suggest the Hopscotch element is at higher than expected frequency in teosinte. Analysis of linkage disequilibrium near tb1 does not support recent introgression of the Hopscotch allele from maize into teosinte. Population genetic signatures are consistent with selection on this locus revealing a potential ecological role for Hopscotch in teosinte. Finally, two greenhouse experiments with teosinte do not suggest tb1 controls tillering in natural populations. Conclusions: Our findings suggest the role of Hopscotch differs between maize and teosinte. Future work should assess tb1 expression levels in teosinte with and without the Hopscotch and more comprehensively phenotype teosinte to assess the ecological significance of the Hopscotch insertion and, more broadly, the tb1 locus in teosinte. Key words: domestication; maize; teosinte; teosinte branched1; transposable element

How the tortoise beats the hare: Slow and steady adaptation in structured populations suggests a rugged fitness landscape in bacteria

Joshua R. Nahum, Peter Godfrey-Smith, Brittany N. Harding, Joseph H. Marcus, Jared Carlson-Stevermer, Benjamin Kerr

In the context of Wright’s adaptive landscape, genetic epistasis can yield a multi-peaked or “rugged” topography. In an unstructured population, a lineage with selective access to multiple peaks is expected to rapidly fix on one, which may not be the highest peak. Conversely, beneficial mutations in a population with spatially restricted migration take longer to fix, allowing distant parts of the population to explore the landscape semi-independently. Such a population can simultaneously discover multiple peaks, and the genotype at the highest discovered peak is expected to fix eventually. Thus, structured populations sacrifice initial speed of adaptation for breadth of search. As in the Tortoise-Hare fable, the structured population (Tortoise) starts relatively slow, but eventually surpasses the unstructured population (Hare) in average fitness. In contrast, on single-peak landscapes (e.g., systems lacking epistasis), all uphill paths converge. Given such “smooth” topography, breadth of search is devalued, and a structured population only lags behind an unstructured population in average fitness (ultimately converging). Thus, the Tortoise-Hare pattern is an indicator of ruggedness. After verifying these predictions in simulated populations where ruggedness is manipulable, we then explore average fitness in metapopulations of Escherichia coli. Consistent with a rugged landscape topography, we find a Tortoise-Hare pattern. Further, we find that structured populations accumulate more mutations, suggesting that distant peaks are higher. This approach can be used to unveil landscape topography in other systems, and we discuss its application for antibiotic resistance, engineering problems, and elements of Wright’s Shifting Balance Process.

Predicting evolution from the shape of genealogical trees

Richard A. Neher, Colin A. Russell, Boris I. Shraiman
(Submitted on 3 Jun 2014)

Given a sample of genome sequences from an asexual population, can one predict its evolutionary future? Here we demonstrate that the branching pattern of reconstructed genealogical trees contains information about the relative fitness of the sampled sequences and that this information can be used to infer the closest extant relative of future populations. Our approach is based on the assumption that evolution proceeds predominantly by accumulation of small-effect mutations and does not require any species-specific input. Hence, the resulting inference algorithm can be applied to any asexual population under persistent selection pressure. We demonstrate its performance using historical data on seasonal influenza A/H3N2 virus. We predict the progenitor lineage of the upcoming influenza season with near-optimal performance in 30% of cases and make informative predictions in 16 out of 18 years. Beyond providing a practical tool for prediction, our results suggest that continuous adaptation by small-effect mutations is a major component of influenza virus evolution.