Determining Exon Connectivity in Complex mRNAs by Nanopore Sequencing

Mohan Bolisetty, Gopinath Rajadinakaran, Brenton Graveley
doi: http://dx.doi.org/10.1101/019752

Though powerful, short-read high throughput RNA sequencing is limited in its ability to directly measure exon connectivity in mRNAs containing multiple alternative exons located farther apart than the maximum read lengths. Here, we use the Oxford Nanopore MinION™ sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be an important method for comprehensive transcriptome characterization.

Genomic epidemiology of the current wave of artemisinin resistant malaria

Roberto Amato, Olivo Miotto, Charles Woodrow, Jacob Almagro-Garcia, Ipsita Sinha, Susana Campino, Daniel Mead, Eleanor Drury, Mihir Kekre, Mandy Sanders, Alfred Amambua-Ngwa, Chanaki Amaratunga, Lucas Amenga-Etego, Tim JC Anderson, Voahangy Andrianaranjaka, Tobias Apinjoh, Elizabeth Ashley, Sarah Auburn, Gordon A Awandare, Vito Baraka, Alyssa Barry, Maciej F Boni, Steffen Borrmann, Teun Bousema, Oralee Branch, Peter C Bull, Kesinee Chotivanich, David J Conway, Alister Craig, Nicholas P Day, Abdoulaye Djimdé, Christiane Dolecek, Arjen M Dondorp, Chris Drakeley, Patrick Duffy, Diego F Echeverri-Garcia, Thomas G Egwang, Rick M Fairhurst, Md. Abul Faiz, Caterina I Fanello, Tran Tinh Hien, Abraham Hodgson, Mallika Imwong, Deus Ishengoma, Pharath Lim, Chanthap Lon, Jutta Marfurt, Kevin Marsh, Mayfong Mayxay, Victor Mobegi, Olugbenga Mokuolu, Jacqui Montgomery, Ivo Mueller, Myat Phone Kyaw, Paul N Newton, Francois Nosten, Rintis Noviyanti, Alexis Nzila, Harold Ocholla, Abraham Oduro, Marie Onyamboko, Jean-Bosco Ouedraogo, Aung Pyae Phyo, Christopher V Plowe, Ric N Price, Sasithon Pukrittayakamee, Milijaona Randrianarivelojosia, Pascal Ringwald, Lastenia Ruiz, David Saunders, Alex Shayo, Peter Siba, Shannon Takala-Harrison, Thuy-Nhien Nguyen Thanh, Vandana Thathy, Federica Verra, Nicholas J White, Ye Htut, Victoria J Cornelius, Rachel Giacomantonio, Dawn Muddyman, Christa Henrichs, Cinzia Malangone, Dushyanth Jyothi, Richard D Pearson, Julian C Rayner, Gilean McVean, Kirk Rockett, Alistair Miles, Paul Vauterin, Ben Jeffery, Magnus Manske, Jim Stalker, Bronwyn MacInnis, Dominic P Kwiatkowski, for the MalariaGEN Plasmodium falciparum Community
doi: http://dx.doi.org/10.1101/019737

Artemisinin resistant Plasmodium falciparum is advancing across Southeast Asia in a soft selective sweep involving at least 20 independent kelch13 mutations. In a large global survey, we find that kelch13 mutations which cause resistance in Southeast Asia are present at low frequency in Africa. We show that African kelch13 mutations have originated locally, and that kelch13 shows a normal variation pattern relative to other genes in Africa, whereas in Southeast Asia there is a great excess of non‐synonymous mutations, many of which cause radical amino‐acid changes. Thus, kelch13 is not currently undergoing strong selection in Africa, despite a deep reservoir of standing variation that could potentially allow resistance to emerge rapidly. The practical implications are that public health surveillance for artemisinin resistance should not rely on kelch13 data alone, and interventions to prevent resistance must account for local evolutionary conditions, shown by genomic epidemiology to differ greatly between geographical regions.

Ancestral chromatin configuration constrains chromatin evolution on differentiating sex chromosomes in Drosophila

Qi Zhou, Doris Bachtrog
doi: http://dx.doi.org/10.1101/019786

Sex chromosomes evolve distinctive types of chromatin from a pair of ancestral autosomes that are usually euchromatic. In Drosophila, the dosage-compensated X becomes enriched for hyperactive chromatin in males (mediated by H4K16ac), while the Y chromosome acquires silencing heterochromatin (enriched for H3K9me2/3). Drosophila autosomes are typically mostly euchromatic but the small dot chromosome has evolved a heterochromatin-like milieu (enriched for H3K9me2/3) that permits the normal expression of dot-linked genes, but which is different from typical pericentric heterochromatin. In Drosophila busckii, the dot chromosomes have fused to the ancestral sex chromosomes, creating a pair of ‘neo-sex’ chromosomes. Here we collect genomic, transcriptomic and epigenomic data from D. busckii, to investigate the evolutionary trajectory of sex chromosomes from a largely heterochromatic ancestor. We show that the neo-sex chromosomes formed <1 million years ago, but nearly 60% of neo-Y linked genes have already become non-functional. Expression levels are generally lower for the neo-Y alleles relative to their neo-X homologs, and the silencing heterochromatin mark H3K9me2, but not H3K9me3, is significantly enriched on silenced neo-Y genes. Despite rampant neo-Y degeneration, we find that the neo-X is deficient for the canonical histone modification mark of dosage compensation (H4K16ac), relative to autosomes or the compensated ancestral X chromosome, possibly reflecting constraints imposed on evolving hyperactive chromatin in an originally heterochromatic environment. Yet, neo-X genes are transcriptionally more active in males, relative to females, suggesting the evolution of incipient dosage compensation on the neo-X. Our data show that Y degeneration proceeds quickly after sex chromosomes become established through genomic and epigenetic changes, and are consistent with the idea that the evolution of sex-linked chromatin is influenced by its ancestral configuration.

A Coalescent Model of a Sweep from a Uniquely Derived Standing Variant

Jeremy J Berg, Graham Coop
doi: http://dx.doi.org/10.1101/019612

The use of genetic polymorphism data to understand the dynamics of adaptation and identify the loci that are involved has become a major pursuit of modern evolutionary genetics. In addition to the classical “hard sweep” hitchhiking model, recent research has drawn attention to the fact that the dynamics of adaptation can play out in a variety of different ways, and that the specific signatures left behind in population genetic data may depend somewhat strongly on these dynamics. One particular model for which a large number of empirical examples are already known is that in which a single derived mutation arises and drifts to some low frequency before an environmental change causes the allele to become beneficial and sweeps to fixation. Here, we pursue an analytical investigation of this model, bolstered and extended via simulation study. We use coalescent theory to develop an analytical approximation for the effect of a sweep from standing variation on the genealogy at the locus of the selected allele and sites tightly linked to it. We show that the distribution of haplotypes that the selected allele is present on at the time of the environmental change can be approximated by considering recombinant haplotypes as alleles in the infinite alleles model. We show that this approximation can be leveraged to make accurate predictions regarding patterns of genetic polymorphism following such a sweep. We then use simulations to highlight which sources of haplotypic information are likely to be most useful in distinguishing this model from neutrality, as well as from other sweep models, such as the classic hard sweep, and multiple mutation soft sweeps. We find that in general, adaptation from a uniquely derived standing variant will be difficult to detect on the basis of genetic polymorphism data alone, and when it can be detected, it will be difficult to distinguish from other varieties of selective sweeps.

An explicit Poisson-Kolmogorov-Smirnov test for the molecular clock in phylogenies

Fernando Marcon, Fernando Antoneli, Marcelo R. S. Briones
(Submitted on 21 May 2015)

Divergence date estimates are central to understanding evolutionary processes and depend, in the case of molecular phylogenies, on tests for the molecular clock. Tests for global and local clocks generally compare a clock-constrained tree with a non-clock tree (e.g. the likelihood ratio test). These tests verify the evolutionary rate homogeneity among taxa and usually employ the chi-square test for rejection/acceptance of the “clock-like” phylogeny. The paradox is that the molecular clock hypothesis, as proposed, is a Poisson process, and therefore, non-homogeneous. Here we propose a method for testing the molecular clock in phylogenies that is built upon the assumption of a Poisson stochastic process that accommodates rate heterogeneity and is based on ensembles of trees inferred by the Bayesian method. The observed distribution of branch lengths (number of substitutions) is obtained from the post burn-in ensemble of the Bayesian search. The parameter λ of the expected Poisson distribution is given by the average branch length of this ensemble. The goodness-of-fit test is performed using a modified Kolmogorov-Smirnov test for Poisson distributions. The method introduced here uses a large number of statistically equivalent phylogenies to obtain the observed distribution. This circumvents problems of small sample size (lack of power and lack of information), because the power of the test is asymptotic to unity. Also, the observed distribution obtained is very robust in the sense that for a sufficient number of trees (700) the empirical distribution stabilizes. Therefore, the estimated parameter λ, used to define the expected distribution, is essentially independent of sample size.
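
To make the goodness-of-fit step in this abstract concrete, here is a minimal sketch of comparing an observed distribution of branch lengths with a Poisson distribution whose λ is the sample mean. It assumes branch lengths are integer substitution counts pooled into one list, and it computes only the plain Kolmogorov-Smirnov distance; it is not the authors' modified test and it returns no p-value. The function names `poisson_cdf` and `poisson_ks_statistic` are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Poisson(lam), by direct summation."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def poisson_ks_statistic(counts):
    """Return (lam, D): the sample mean and the largest gap between
    the empirical CDF of `counts` and the Poisson(lam) CDF.

    Assumption: `counts` holds non-negative integers (substitutions
    per branch). Both CDFs are step functions that jump only at
    integers, so checking the integers 0..max(counts) is enough.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    n = len(counts)
    lam = sum(counts) / n

    # Tally how many branches have each length.
    freq = {}
    for c in counts:
        freq[c] = freq.get(c, 0) + 1

    max_gap = 0.0
    cumulative = 0
    for k in range(max(counts) + 1):
        cumulative += freq.get(k, 0)
        gap = abs(cumulative / n - poisson_cdf(k, lam))
        max_gap = max(max_gap, gap)
    return lam, max_gap
```

Because the data are discrete and λ is estimated from the same sample, the standard Kolmogorov-Smirnov critical values do not apply directly, which is presumably why a modified test is needed.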

A flexible, efficient binomial mixed model for identifying differential DNA methylation in bisulfite sequencing data

Amanda J Lea, Susan C Albert, Jenny Tung, Xiang Zhou
doi: http://dx.doi.org/10.1101/019562

Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for estimating DNA methylation levels at base-pair resolution, and for investigating the major drivers of epigenetic variation. However, modeling bisulfite sequencing data presents several challenges. Methylation levels are estimated from proportional read counts, yet coverage can vary dramatically across sites and samples. Further, methylation levels are influenced by genetic variation, and controlling for genetic covariance (e.g., kinship or population structure) is crucial for avoiding potential false positives. To address these challenges, we combine a binomial mixed model with an efficient sampling-based algorithm (MACAU) for approximate parameter estimation and p-value computation. This framework allows us to account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Furthermore, by leveraging the advantages of an auxiliary variable-based sampling algorithm and recent mixed model innovations, MACAU substantially reduces computational complexity and can thus be applied to large, genome-wide data sets. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that, compared to existing approaches, our method provides better calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. 
Taken together, our results indicate that MACAU is an effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at http://www.xzlab.org/software.html.

An empirical approach to demographic inference

Peter L. Ralph
(Submitted on 21 May 2015)

Inference with population genetic data usually treats the population pedigree as a nuisance parameter, the unobserved product of a past history of random mating. However, the history of genetic relationships in a given population is a fixed, unobserved object, and so an alternative approach is to treat this network of relationships as a complex object we wish to learn about, by observing how genomes have been noisily passed down through it. This paper explores this point of view, showing how to translate questions about population genetic data into calculations with a Poisson process of mutations on all ancestral genomes. This method is applied to give a robust interpretation to the f4 statistic used to identify admixture, and to design a new statistic that measures covariances in mean times to most recent common ancestor between two pairs of sequences. The method more generally interprets population genetic statistics in terms of sums of specific functions over ancestral genomes, thereby providing concrete, broadly interpretable interpretations for these statistics. This provides a method for describing demographic history without simplified demographic models. More generally, it brings into focus the population pedigree, which is averaged over in model-based demographic inference.
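
For readers unfamiliar with the f4 statistic mentioned in this abstract, the sketch below computes it in its usual allele-frequency form: the average over sites of (p_A - p_B)(p_C - p_D) for four populations A, B, C and D. This is the standard definition rather than the pedigree-based interpretation developed in the paper, and the function name `f4` and its list-based inputs are assumptions made for illustration.

```python
def f4(p_a, p_b, p_c, p_d):
    """Average over sites of (p_A - p_B) * (p_C - p_D).

    Each argument is a list of allele frequencies, one entry per
    site, for one of the four populations. All four lists must
    cover the same sites in the same order.
    """
    n = len(p_a)
    if n == 0 or not (len(p_b) == len(p_c) == len(p_d) == n):
        raise ValueError("need four non-empty lists of equal length")
    total = 0.0
    for a, b, c, d in zip(p_a, p_b, p_c, p_d):
        total += (a - b) * (c - d)
    return total / n
```

If the frequency differences between A and B are uncorrelated with those between C and D, as expected when the pairs are separated on an unadmixed tree, the statistic averages to about zero; a consistent departure from zero is what gets read as evidence of admixture.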

Dynamics of Wolbachia pipientis gene expression across the Drosophila melanogaster life cycle

Florence Gutzwiller, Catarina R. Carmo, Danny E. Miller, Danny W. Rice, Irene L. Newton, Luis Teixeira, Casey M. Bergman
(Submitted on 21 May 2015)

Symbiotic interactions between microbes and their multicellular hosts have manifold impacts on molecular, cellular and organismal biology. To identify candidate bacterial genes involved in maintaining endosymbiotic associations with insect hosts, we analyzed genome-wide patterns of gene expression in the alpha-proteobacterium Wolbachia pipientis across the life cycle of Drosophila melanogaster using public data from the modENCODE project that were generated in a Wolbachia-infected version of the ISO1 reference strain. We find that the majority of Wolbachia genes are expressed at detectable levels in D. melanogaster across the entire life cycle, but that only 7.8% of 1195 Wolbachia genes exhibit robust stage- or sex-specific expression differences when studied in the “holo-organism” context. Wolbachia genes that are differentially expressed during development are typically up-regulated after D. melanogaster embryogenesis, and include many bacterial membrane, secretion system and ankyrin-repeat containing proteins. Sex-biased genes are often organised as small operons of uncharacterised genes and are mainly up-regulated in adult male D. melanogaster in an age-dependent manner, suggesting a potential role in cytoplasmic incompatibility. Our results indicate that large changes in Wolbachia gene expression across the Drosophila life-cycle are relatively rare when assayed across all host tissues, but that candidate genes to understand host-microbe interaction in facultative endosymbionts can be successfully identified using holo-organism expression profiling. Our work also shows that mining public gene expression data in D. melanogaster provides a rich set of resources to probe the functional basis of the Wolbachia-Drosophila symbiosis and annotate the transcriptional outputs of the Wolbachia genome.

Inference of Ancestral Recombination Graphs through Topological Data Analysis

Pablo G. Camara, Arnold J. Levine, Raul Rabadan
(Submitted on 21 May 2015)

The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relations within and across species. Recombination, reassortment, horizontal gene transfer, and species hybridization constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from tens or hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs (ARGs) represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, ARGs are computationally costly to reconstruct and usually become infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build on previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. For that aim, we extend the notion of barcodes in persistent homology, largely increasing their sensitivity to recombination, and present a new type of summary graph (topological ARG, or tARG), analogous to ARGs, that captures ensembles of minimal recombination histories. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations and horizontal evolution in finches inhabiting the Galápagos Islands.

Author post: Coalescent times and patterns of genetic diversity in species with facultative sex

This guest post is by Matthew Hartfield (@mathyhartfield) on “Coalescent times and patterns of genetic diversity in species with facultative sex”.

Our paper “Coalescent times and patterns of genetic diversity in species with facultative sex”, in which we investigate the genealogies of facultative sexuals, is now available from bioRxiv.

Most evolutionary biologists are obsessed with sex. Explaining why organisms reproduce sexually by combining genetic material is a tough problem. The main issue lies with the fact that asexuality (reproduction via clonality) should be able to outcompete sexuals due to sheer weight of numbers. Various theories have been put forward to explain why sex is so widespread. The majority of these revolve around the idea that exchanging genetic material enables the fittest possible genotype to be created, while that of asexuals should degrade over time.

While such theories are ubiquitous, data to test them have been scarce. Recent years have seen a boom in exploring the evolution of sex experimentally using facultative sexual organisms: species that can switch between sexual and asexual reproduction. Such experiments have demonstrated how sexual reproduction can evolve when exposed to stressful environments, or when moving between environmentally different areas. Yet major questions remain regarding what the underlying genetic causes of these transitions are. In addition, there are plenty of organisms that undergo ‘cryptic’ sex, which cannot be observed directly but can be detected through genomic sequence analyses.

Coalescent models are important for analysing genomic data. These tools determine the relationship between neutral markers, and hence make predictions on how genetic diversity is affected depending on environmental structuring, localised natural selection, or other effects. However, classic models cannot be applied to systems with partial asexuality, as they assume the population reproduces entirely sexually.

We worked on introducing partial rates of sex into these models. In the simplest case (one population with a fixed rate of sex), we recovered a classic prediction that extensive divergence between alleles at the same site arises. This phenomenon occurs since lack of sex keeps the two alleles distinct over evolutionary time; only a rare bout of sex has any chance of creating the segregation needed for them to be descended from the same allele.

A schematic of Allelic Sequence Divergence (ASD) in asexuals: Xs are distinct mutations at each neutral site.
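
The intuition behind this divergence can be sketched with a toy simulation. The code below is a deliberately simplified two-state chain, not the coalescent model from the paper, and every name and rule in it is an assumption made for illustration. It follows the two alleles carried by one diploid individual backward in time: they stay together until a sexual generation (probability `sex_rate` per generation) places them in different individuals; two lineages in different individuals then land in the same parent with probability 1/N per generation, at which point they are assumed to coalesce with probability 1/2 or otherwise become the two distinct copies within that parent again.

```python
import math
import random


def geometric(p, rng):
    """Generations until the first success, with success probability p each."""
    if p >= 1.0:
        return 1
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1].
    return int(math.log(1.0 - rng.random()) / math.log1p(-p)) + 1


def within_individual_coalescent_time(n_individuals, sex_rate, rng):
    """Simulate one coalescent time (in generations) for the two
    alleles of a single individual under the toy chain above."""
    if n_individuals < 2 or not 0.0 < sex_rate <= 1.0:
        raise ValueError("need n_individuals >= 2 and 0 < sex_rate <= 1")
    t = 0
    while True:
        # Wait for a sexual generation to split the pair apart.
        t += geometric(sex_rate, rng)
        # Wait for the two lineages to share a parent individual.
        t += geometric(1.0 / n_individuals, rng)
        # Toy assumption: same gene copy half the time.
        if rng.random() < 0.5:
            return t
        # Otherwise they are distinct copies in one individual again.


def mean_time(n_individuals, sex_rate, reps=2000, seed=1):
    """Average simulated coalescent time over `reps` replicates."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        total += within_individual_coalescent_time(n_individuals, sex_rate, rng)
    return total / reps
```

In this toy chain the expected time is 2N + 2/sex_rate generations, so calling `mean_time(100, 0.001)` against `mean_time(100, 1.0)` shows how rare sex lets the two alleles of an individual accumulate mutations independently for far longer than they would under obligate sex.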

After recovering this familiar result, we worked to extend coalescent theory in partial asexuals to include various other biological phenomena. Two effects we looked at were gene conversion, and heterogeneity in sex rates that change over time or space.

Gene conversion, where one DNA sequence replaces part of a homologous chromosome, is usually regarded as being of minor evolutionary importance. Yet studies of facultative sexuals often observe it as a common force, especially in species not exhibiting allelic sequence divergence (ASD). Could the two be related? Excitingly, we found that low rates of gene conversion become important in organisms with low rates of sex. That is, once sex becomes so rare as to cause ASD, small rates of gene conversion can then reverse the process, homogenizing alleles again. Rather than having higher diversity than otherwise similar sexual populations, as expected with ASD in the absence of gene conversion, asexual populations will have less diversity than comparable sexual populations if gene conversion is not too low.

It is also known that many organisms change their rates of sex over time or location, which can be triggered by environmental cues or organismal stress. By investigating such variation in the rate of sex, we show that even a short burst of obligate sex (over tens of generations) is enough to jumble genomes in the population, giving the same outcome as long-term obligate sex. If rates of sex also differ between geographical locales, then these differences can be detected if there is little gene flow between regions. Otherwise, both areas display intermediate rates of sex.

Coalescent tools are popular since they can be used to simulate complex evolutionary outcomes, which are then tested against genomic data. We used the mathematical analyses to outline a coalescent algorithm to account for partial rates of sex, and predict genetic diversity, under numerous scenarios. The code is available online (http://github.com/MattHartfield/FacSexCoalescent) for others to use.

These are exciting times for population genetics and evolution, with cheaper sequencing costs making it possible to wade through the genomes of more individuals than before. Yet accurately exploring the genetic landscape requires the creation of mathematical tools that account for organismal life history. These results will provide the first of many building blocks to determine the effects of selection and the environment on the evolution of facultative sexuals. They might eventually reveal why sex is so prevalent in nature.