Natural selection helps explain the small range of genetic variation within species

Russell B. Corbett-Detig, Daniel L. Hartl, Timothy B. Sackton

The range of genetic diversity observed within natural populations is much more narrow than expected based on models of neutral molecular evolution. Although the increased efficacy of natural selection in larger populations has been invoked to explain this paradox, to date no tests of this hypothesis have been conducted. Here, we present an analysis of whole-genome polymorphism data and genetic maps from 39 species to estimate for each species the reduction in genetic variation attributable to the operation of natural selection on the genome. We find that species with larger population sizes do in fact show greater reductions in genetic variation. This finding provides the first experimental support for the hypothesis that natural selection contributes to the restricted range of within-species genetic diversity.

Recombination impacts damaging and disease mutations accumulation in human populations

Julie Hussin, Alan Hodgkinson, Youssef Idaghdour, Jean-Christophe Grenier, Jean-Philippe Goulet, Elias Gbeha, Elodie Hip-Ki, Philip Awadalla

Many decades of theory have demonstrated that in non-recombining systems, slightly deleterious mutations accumulate irreversibly, potentially driving the extinction of many asexual species. Non-recombining chromosomes in sexual organisms are thought to have degenerated in a similar fashion; however, it is not clear to what extent these processes operate along recombining chromosomes with highly variable rates of crossing over. Using high-coverage sequencing data from over 1400 individuals, we show that recombination rate modulates the genomic distribution of putatively deleterious variants across the entire human genome. We find that exons in regions of low recombination are significantly enriched for deleterious and disease variants, a signature that varies in strength across worldwide human populations with different demographic histories. As low-recombining regions are enriched for highly conserved genes with essential cellular functions and show an excess of mutations with demonstrated effects on health, this phenomenon likely affects disease susceptibility in humans.

Transcriptomic analysis of the lesser spotted catshark (Scyliorhinus canicula) pancreas, liver and brain reveals molecular level conservation of vertebrate pancreas function

John F Mulley, Adam D Hargreaves, Matthew J Hegarty, R. Scott Heller, Martin T Swain

Background Understanding the evolution of the vertebrate pancreas is key to understanding its functions. The chondrichthyes (cartilaginous fish such as sharks and rays) have been suggested to possess the most ancient example of a distinct pancreas with both hormonal (endocrine) and digestive (exocrine) roles, although the lack of genetic, genomic and transcriptomic data for cartilaginous fish has hindered a more thorough understanding of the molecular-level functions of the chondrichthyan pancreas, particularly with respect to their “unusual” energy metabolism (where ketone bodies and amino acids are the main oxidative fuel source) and their paradoxical ability to both maintain stable blood glucose levels and tolerate extensive periods of hypoglycemia. In order to shed light on some of these processes we have carried out the first large-scale comparative transcriptomic survey of multiple cartilaginous fish tissues: the pancreas, brain and liver of the lesser spotted catshark, Scyliorhinus canicula. Results We generated a multi-tissue assembly comprising 86,006 contigs, of which 44,794 were assigned to a particular tissue or combination of tissues based on mapping of sequencing reads. We have characterised transcripts encoding genes involved in insulin regulation, glucose sensing, transcriptional regulation, signaling and digestion, as well as many peptide hormone precursors and their receptors for the first time. Comparisons to published mammalian pancreas transcriptomes reveal that mechanisms of glucose sensing and insulin regulation used to establish and maintain a stable internal environment are conserved across jawed vertebrates and likely pre-date the vertebrate radiation. 
Conservation of pancreatic hormones and genes encoding digestive proteins supports the single, early evolution of a distinct pancreatic gland with endocrine and exocrine functions in vertebrates, although the peptide diversity of the early vertebrate pancreas has been overestimated as a result of the use of cross-reacting antisera in earlier studies. A three-hormone islet organ is therefore the basal vertebrate condition, later elaborated upon only in the tetrapod lineage. Conclusions The cartilaginous fish are a great untapped resource for the reconstruction of patterns and processes of vertebrate evolution and new approaches such as those described in this paper will greatly facilitate their incorporation into the rank of “model organism”.

iRAP – an integrated RNA-seq Analysis Pipeline

Nuno A. Fonseca, Robert Petryszak, John Marioni, Alvis Brazma

RNA-sequencing (RNA-Seq) has become the technology of choice for whole-transcriptome profiling. However, processing the millions of sequence reads generated requires considerable bioinformatics skills and computational resources. At each step of the processing pipeline many tools are available, each with specific advantages and disadvantages. While using a specific combination of tools might be desirable, integrating the different tools can be time consuming, often due to specificities in the formats of input/output files required by the different programs. Here we present iRAP, an integrated RNA-seq analysis pipeline that allows the user to select and apply their preferred combination of existing tools for mapping reads, quantifying expression, and testing for differential expression. iRAP also includes multiple tools for gene set enrichment analysis and generates web-browsable reports of the results obtained in the different stages of the pipeline. Depending upon the application, iRAP can be used to quantify expression at the gene, exon or transcript level. iRAP is aimed at a broad group of users with basic bioinformatics training and requires little experience with the command line. Despite this, it also provides more advanced users with the ability to customise the options used by their chosen tools.

Polyester: simulating RNA-seq datasets with differential transcript expression

Alyssa C Frazee, Andrew E Jaffe, Ben Langmead, Jeffrey Leek

Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) requires software tools to assess accuracy and error rate control. Since true differential expression status is often unknown in experimental datasets, artificially-constructed datasets must be utilized, either by generating costly spike-in experiments or by simulating RNA-seq data. Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. The main advantage of Polyester is the ability to simulate isoform-level differential expression across biological replicates for a variety of experimental designs at the read level. Differential expression signal can be simulated with either built-in or user-defined statistical models. Polyester is available on GitHub at https://github.com/alyssafrazee/polyester.
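
The idea of simulating differential expression across replicates can be illustrated with a minimal Python sketch. This is not Polyester's code or API: it assumes a negative binomial model for per-transcript read counts (a common choice for RNA-seq; the abstract does not say which models are built in), it stops at counts rather than generating reads, and the function names, means, fold changes and dispersion value are all hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (fine for modest means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_negbin(mean, size, rng):
    """Negative binomial draw via a gamma-Poisson mixture (mean must be > 0).

    The variance is mean + mean**2 / size, so a smaller `size` gives
    more variability between replicates.
    """
    lam = rng.gammavariate(size, mean / size)  # arguments are shape, scale
    return sample_poisson(lam, rng)


def simulate_counts(base_means, fold_changes, n_reps, size=10.0, seed=0):
    """Return {"control": rows, "treated": rows}, one row of counts per replicate."""
    rng = random.Random(seed)
    treated_means = [m * f for m, f in zip(base_means, fold_changes)]
    design = {"control": base_means, "treated": treated_means}
    return {
        group: [[sample_negbin(m, size, rng) for m in means] for _ in range(n_reps)]
        for group, means in design.items()
    }


if __name__ == "__main__":
    # Three hypothetical transcripts: unchanged, 4x up, 2x down in the treated group.
    counts = simulate_counts([50.0, 50.0, 20.0], [1.0, 4.0, 0.5], n_reps=3)
    for group, rows in counts.items():
        print(group, rows)
```

Because the true fold changes are known by construction, counts like these can be fed to a differential expression method to measure its error rates, which is the use case the abstract describes.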

Phase transition on the convergence rate of parameter estimation under an Ornstein-Uhlenbeck diffusion on a tree

Cécile Ané, Lam Si Tung Ho, Sebastien Roch
(Submitted on 6 Jun 2014)

Diffusion processes on trees are commonly used in evolutionary biology to model the joint distribution of continuous traits, such as body mass, across species. Estimating the parameters of such processes from tip values presents challenges because of the intrinsic correlation between the observations produced by the shared evolutionary history, thus violating the standard independence assumption of large-sample theory. For instance, Ho and Ané [HoAne13] recently proved that the mean (also known in this context as selection optimum) of an Ornstein-Uhlenbeck process on a tree cannot be estimated consistently from an increasing number of tip observations if the tree height is bounded. Here, using a fruitful connection to the so-called reconstruction problem in probability theory, we study the convergence rate of parameter estimation in the unbounded height case. For the mean of the process, we provide a necessary and sufficient condition for the consistency of the maximum likelihood estimator (MLE) and establish a phase transition on its convergence rate in terms of the growth of the tree. In particular we show that a loss of $\sqrt{n}$-consistency (i.e., the variance of the MLE becomes $\Omega(n^{-1})$, where $n$ is the number of tips) occurs when the tree growth is larger than a threshold related to the phase transition of the reconstruction problem. For the covariance parameters, we give a novel, efficient estimation method which achieves $\sqrt{n}$-consistency under natural assumptions on the tree.
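
To make the source of correlation between tips concrete, here is a minimal Python sketch that simulates an Ornstein-Uhlenbeck trait down a small tree. It is an illustration only, not the estimation method of the paper. It relies on the standard OU transition law (over a branch of length t, the trait is Gaussian with mean theta + (x0 - theta) * exp(-alpha * t) and variance sigma^2 / (2 * alpha) * (1 - exp(-2 * alpha * t))); the tree encoding, function names and parameter values are hypothetical.

```python
import math
import random


def ou_step(x0, t, theta, alpha, sigma, rng):
    """Sample the trait after time t along one branch, starting from x0."""
    decay = math.exp(-alpha * t)
    mean = theta + (x0 - theta) * decay
    var = sigma ** 2 / (2.0 * alpha) * (1.0 - decay ** 2)
    return rng.gauss(mean, math.sqrt(max(var, 0.0)))


def simulate_tips(tree, x_parent, theta, alpha, sigma, rng):
    """Simulate tip values on a tree given as (name, branch_length, children).

    Sister tips inherit the same value at their common ancestor, which is
    what makes the tip observations correlated rather than independent.
    """
    name, length, children = tree
    x = ou_step(x_parent, length, theta, alpha, sigma, rng)
    if not children:
        return {name: x}
    tips = {}
    for child in children:
        tips.update(simulate_tips(child, x, theta, alpha, sigma, rng))
    return tips


if __name__ == "__main__":
    # Hypothetical three-tip tree of height 1; B and C share half their history.
    tree = ("root", 0.0, [
        ("A", 1.0, []),
        ("inner", 0.5, [("B", 0.5, []), ("C", 0.5, [])]),
    ])
    rng = random.Random(42)
    print(simulate_tips(tree, 0.0, theta=1.0, alpha=1.0, sigma=0.5, rng=rng))
```

In this sketch every tip is still pulled toward theta, but on a tree of bounded height the shared ancestral noise never averages out, which is the intuition behind the inconsistency result cited in the abstract.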

Testing the Toxicofera: comparative reptile transcriptomics casts doubt on the single, early evolution of the reptile venom system

Adam D Hargreaves, Martin T Swain, Darren W Logan, John F Mulley

Background The identification of apparently conserved gene complements in the venom and salivary glands of a diverse set of reptiles led to the development of the Toxicofera hypothesis – the idea that there was a single, early evolution of the venom system in reptiles. However, this hypothesis is based largely on relatively small scale EST-based studies of only venom or salivary glands and toxic effects have been assigned to only some of these putative Toxicoferan toxins in some species. We set out to investigate the distribution of these putative venom toxin transcripts in order to investigate to what extent conservation of gene complements may reflect a bias in previous sampling efforts. Results We have carried out the first large-scale test of the Toxicofera hypothesis and found it lacking in a number of regards. Our quantitative transcriptomic analyses of venom and salivary glands and other body tissues in five species of reptile, together with the use of available RNA-Seq datasets for additional species, show that the majority of genes used to support the establishment and expansion of the Toxicofera are in fact expressed in multiple body tissues and most likely represent general maintenance or “housekeeping” genes. The apparent conservation of gene complements across the Toxicofera therefore reflects an artefact of incomplete tissue sampling. In other cases, the identification of a non-toxic paralog of a gene encoding a true venom toxin has led to confusion about the phylogenetic distribution of that venom component. Conclusions Venom has evolved multiple times in reptiles. 
In addition, the misunderstanding regarding what constitutes a toxic venom component, together with the misidentification of genes and the classification of identical or near-identical sequences as distinct genes has led to an overestimation of the complexity of reptile venoms in general, and snake venom in particular, with implications for our understanding of (and development of treatments to counter) the molecules responsible for the physiological consequences of snakebite.

Restriction and recruitment – gene duplication and the origin and evolution of snake venom toxins

Adam D Hargreaves, Martin T Swain, Matthew J Hegarty, Darren W Logan, John F Mulley

The genetic and genomic mechanisms underlying evolutionary innovations are of fundamental importance to our understanding of animal evolution. Snake venom represents one such innovation and has been hypothesised to have originated and diversified via a process that involves duplication of genes encoding body proteins and subsequent recruitment of the copy to the venom gland where natural selection can act to develop or increase toxicity. However, gene duplication is known to be a rare event in vertebrate genomes and the recruitment of duplicated genes to a novel expression domain (neofunctionalisation) is an even rarer process that requires the evolution of novel combinations of transcription factor binding sites in upstream regulatory regions. This hypothesis concerning the evolution of snake venom is therefore very unlikely. Nonetheless, it is often assumed to be established fact and this has hampered research into the true origins of snake venom toxins. We have generated transcriptomic data for a diversity of body tissues and salivary and venom glands from venomous and non-venomous reptiles, which has allowed us to critically evaluate this hypothesis. Our comparative transcriptomic analysis of venom and salivary glands and body tissues in five species of reptile reveals that snake venom does not evolve via the hypothesised process of duplication and recruitment of body proteins. Indeed, our results show that many proposed venom toxins are in fact expressed in a wide variety of body tissues, including the salivary gland of non-venomous reptiles and have therefore been restricted to the venom gland following duplication, not recruited. Thus snake venom evolves via the duplication and subfunctionalisation of genes encoding existing salivary proteins. 
These results highlight the danger of the “just-so story” in evolutionary biology, where an elegant and intuitive idea is repeated so often that it assumes the mantle of established fact, to the detriment of the field as a whole.

Simultaneous estimation of transcript abundances and transcript specific fragment distributions of RNA-Seq data with the Mix2 model

Andreas Tuerk, Gregor Wiktorin

Quantification of RNA transcripts with RNA-Seq is inaccurate due to positional fragmentation bias, which is not represented appropriately by current statistical models of RNA-Seq data. Another, less investigated, source of error is the inaccuracy of transcript start and end annotations. This article introduces the Mix2 (read “mixquare”) model, which uses a mixture of probability distributions to model the transcript-specific positional fragment bias. The parameters of the Mix2 model can be efficiently trained with the EM algorithm and are tied between similar transcripts. Transcript-specific shift and scale parameters allow the Mix2 model to automatically correct inaccurate transcript start and end annotations. Experiments are conducted on synthetic data covering 7 genes of different complexity, 4 types of fragment bias and correct as well as incorrect transcript start and end annotations. Abundance estimates obtained by Cufflinks 2.2.0, PennSeq and the Mix2 model show superior performance of the Mix2 model in the vast majority of test conditions.
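
As a rough illustration of fitting a mixture of distributions with the EM algorithm, the Python sketch below fits a one-dimensional Gaussian mixture to relative fragment positions along a single transcript. It is not the Mix2 model: the abstract does not specify the component distributions, and parameter tying across transcripts and the shift and scale parameters are omitted. The Gaussian components, function names and all numeric values are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a Gaussian with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_gaussian_mixture(xs, mus, sds, weights, n_iter=100, min_sd=1e-3):
    """Fit mixture weights, means and standard deviations by EM."""
    mus, sds, weights = list(mus), list(sds), list(weights)
    k = len(mus)
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, mus[j], sds[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0.0:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return weights, mus, sds


if __name__ == "__main__":
    # Hypothetical fragment positions (0 = transcript start, 1 = end),
    # with more fragments near the end than near the start.
    rng = random.Random(0)
    xs = [rng.gauss(0.2, 0.05) for _ in range(300)]
    xs += [rng.gauss(0.8, 0.05) for _ in range(700)]
    print(em_gaussian_mixture(xs, [0.3, 0.7], [0.2, 0.2], [0.5, 0.5]))
```

The fitted weights and means summarise where along the transcript fragments tend to fall, which is the kind of positional bias a mixture model can capture.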

Complete plastid genome assembly of invasive plant, Centaurea diffusa

Kathryn G Turner, Christopher J Grassa

Invasive plants present both problems and possibilities for discovery, which may be addressed utilizing new genomic tools. Here we present the completed plastome assembly for the problematic invasive weed, Centaurea diffusa. This new tool represents a significant contribution to future studies of the ecological genomics of invasive plants, particularly this weedy genus, and studies of the Asteraceae in general.