Distance from Sub-Saharan Africa Predicts Mutational Load in Diverse Human Genomes

Brenna M. Henn, Laura R Botigue, Stephan Peischl, Isabelle Dupanloup, Mikhail Lipatov, Brian K Maples, Alicia R Martin, Shaila Musharoff, Howard Cann, Michael Snyder, Laurent Excoffier, Jeffrey Kidd, Carlos D Bustamante
doi: http://dx.doi.org/10.1101/019711

The Out-of-Africa (OOA) dispersal ~50,000 years ago is characterized by a series of founder events as modern humans expanded into multiple continents. Population genetics theory predicts an increase of mutational load in populations undergoing serial founder effects during range expansions. To test this hypothesis, we have sequenced full genomes and high-coverage exomes from 7 geographically divergent human populations from Namibia, Congo, Algeria, Pakistan, Cambodia, Siberia and Mexico. We find that individual genomes vary modestly in the overall number of predicted deleterious alleles. We show via spatially explicit simulations that the observed distribution of deleterious allele frequencies is consistent with the OOA dispersal, particularly under a model where deleterious mutations are recessive. We conclude that there is a strong signal of purifying selection at conserved genomic positions within Africa, but that many predicted deleterious mutations have evolved as if they were neutral during the expansion out of Africa. Under a model where selection is inversely related to dominance, we show that OOA populations are likely to have a higher mutation load due to increased allele frequencies of nearly neutral variants that are recessive or partially recessive.
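To make the role of dominance concrete, here is a minimal sketch of the textbook single-locus load calculation. It is not the authors' simulation framework. It assumes Hardy-Weinberg genotype proportions and genotype fitnesses 1, 1 - hs and 1 - s, and the function name and parameter values are hypothetical.

```python
def single_locus_load(q, s, h):
    """Reduction in mean fitness at one locus (assumes Hardy-Weinberg proportions).

    q: frequency of the deleterious allele
    s: selection coefficient against the deleterious homozygote
    h: dominance coefficient (0 = fully recessive, 0.5 = additive)
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    heterozygote_cost = 2.0 * q * (1.0 - q) * h * s
    homozygote_cost = q * q * s
    return heterozygote_cost + homozygote_cost


if __name__ == "__main__":
    # Illustrative values only: compare a recessive and an additive variant
    # at two allele frequencies.
    for q in (0.05, 0.10):
        print(q, single_locus_load(q, 0.01, 0.0), single_locus_load(q, 0.01, 0.5))
```

Under these assumptions the load from a fully recessive variant scales with q squared, so doubling its frequency quadruples its contribution, whereas an additive variant's contribution only doubles.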

Determining Exon Connectivity in Complex mRNAs by Nanopore Sequencing

Mohan Bolisetty, Gopinath Rajadinakaran, Brenton Graveley
doi: http://dx.doi.org/10.1101/019752

Though powerful, short-read high throughput RNA sequencing is limited in its ability to directly measure exon connectivity in mRNAs containing multiple alternative exons located farther apart than the maximum read lengths. Here, we use the Oxford Nanopore MinION™ sequencer to identify 7,899 ‘full-length’ isoforms expressed from four Drosophila genes, Dscam1, MRP, Mhc, and Rdl. These results demonstrate that nanopore sequencing can be used to deconvolute individual isoforms and that it has the potential to be an important method for comprehensive transcriptome characterization.

Genomic epidemiology of the current wave of artemisinin resistant malaria

Roberto Amato, Olivo Miotto, Charles Woodrow, Jacob Almagro-Garcia, Ipsita Sinha, Susana Campino, Daniel Mead, Eleanor Drury, Mihir Kekre, Mandy Sanders, Alfred Amambua-Ngwa, Chanaki Amaratunga, Lucas Amenga-Etego, Tim JC Anderson, Voahangy Andrianaranjaka, Tobias Apinjoh, Elizabeth Ashley, Sarah Auburn, Gordon A Awandare, Vito Baraka, Alyssa Barry, Maciej F Boni, Steffen Borrmann, Teun Bousema, Oralee Branch, Peter C Bull, Kesinee Chotivanich, David J Conway, Alister Craig, Nicholas P Day, Abdoulaye Djimdé, Christiane Dolecek, Arjen M Dondorp, Chris Drakeley, Patrick Duffy, Diego F Echeverri-Garcia, Thomas G Egwang, Rick M Fairhurst, Md. Abul Faiz, Caterina I Fanello, Tran Tinh Hien, Abraham Hodgson, Mallika Imwong, Deus Ishengoma, Pharath Lim, Chanthap Lon, Jutta Marfurt, Kevin Marsh, Mayfong Mayxay, Victor Mobegi, Olugbenga Mokuolu, Jacqui Montgomery, Ivo Mueller, Myat Phone Kyaw, Paul N Newton, Francois Nosten, Rintis Noviyanti, Alexis Nzila, Harold Ocholla, Abraham Oduro, Marie Onyamboko, Jean-Bosco Ouedraogo, Aung Pyae Phyo, Christopher V Plowe, Ric N Price, Sasithon Pukrittayakamee, Milijaona Randrianarivelojosia, Pascal Ringwald, Lastenia Ruiz, David Saunders, Alex Shayo, Peter Siba, Shannon Takala-Harrison, Thuy-Nhien Nguyen Thanh, Vandana Thathy, Federica Verra, Nicholas J White, Ye Htut, Victoria J Cornelius, Rachel Giacomantonio, Dawn Muddyman, Christa Henrichs, Cinzia Malangone, Dushyanth Jyothi, Richard D Pearson, Julian C Rayner, Gilean McVean, Kirk Rockett, Alistair Miles, Paul Vauterin, Ben Jeffery, Magnus Manske, Jim Stalker, Bronwyn MacInnis, Dominic P Kwiatkowski, for the MalariaGEN Plasmodium falciparum Community
doi: http://dx.doi.org/10.1101/019737

Artemisinin resistant Plasmodium falciparum is advancing across Southeast Asia in a soft selective sweep involving at least 20 independent kelch13 mutations. In a large global survey, we find that kelch13 mutations which cause resistance in Southeast Asia are present at low frequency in Africa. We show that African kelch13 mutations have originated locally, and that kelch13 shows a normal variation pattern relative to other genes in Africa, whereas in Southeast Asia there is a great excess of non‐synonymous mutations, many of which cause radical amino‐acid changes. Thus, kelch13 is not currently undergoing strong selection in Africa, despite a deep reservoir of standing variation that could potentially allow resistance to emerge rapidly. The practical implications are that public health surveillance for artemisinin resistance should not rely on kelch13 data alone, and interventions to prevent resistance must account for local evolutionary conditions, shown by genomic epidemiology to differ greatly between geographical regions.

Ancestral chromatin configuration constrains chromatin evolution on differentiating sex chromosomes in Drosophila

Qi Zhou, Doris Bachtrog
doi: http://dx.doi.org/10.1101/019786

Sex chromosomes evolve distinctive types of chromatin from a pair of ancestral autosomes that are usually euchromatic. In Drosophila, the dosage-compensated X becomes enriched for hyperactive chromatin in males (mediated by H4K16ac), while the Y chromosome acquires silencing heterochromatin (enriched for H3K9me2/3). Drosophila autosomes are typically mostly euchromatic but the small dot chromosome has evolved a heterochromatin-like milieu (enriched for H3K9me2/3) that permits the normal expression of dot-linked genes, but which is different from typical pericentric heterochromatin. In Drosophila busckii, the dot chromosomes have fused to the ancestral sex chromosomes, creating a pair of ‘neo-sex’ chromosomes. Here we collect genomic, transcriptomic and epigenomic data from D. busckii, to investigate the evolutionary trajectory of sex chromosomes from a largely heterochromatic ancestor. We show that the neo-sex chromosomes formed <1 million years ago, but nearly 60% of neo-Y linked genes have already become non-functional. Expression levels are generally lower for the neo-Y alleles relative to their neo-X homologs, and the silencing heterochromatin mark H3K9me2, but not H3K9me3, is significantly enriched on silenced neo-Y genes. Despite rampant neo-Y degeneration, we find that the neo-X is deficient for the canonical histone modification mark of dosage compensation (H4K16ac), relative to autosomes or the compensated ancestral X chromosome, possibly reflecting constraints imposed on evolving hyperactive chromatin in an originally heterochromatic environment. Yet, neo-X genes are transcriptionally more active in males, relative to females, suggesting the evolution of incipient dosage compensation on the neo-X. Our data show that Y degeneration proceeds quickly after sex chromosomes become established through genomic and epigenetic changes, and are consistent with the idea that the evolution of sex-linked chromatin is influenced by its ancestral configuration.

A Coalescent Model of a Sweep from a Uniquely Derived Standing Variant

Jeremy J Berg, Graham Coop
doi: http://dx.doi.org/10.1101/019612

The use of genetic polymorphism data to understand the dynamics of adaptation and identify the loci that are involved has become a major pursuit of modern evolutionary genetics. In addition to the classical “hard sweep” hitchhiking model, recent research has drawn attention to the fact that the dynamics of adaptation can play out in a variety of different ways, and that the specific signatures left behind in population genetic data may depend somewhat strongly on these dynamics. One particular model for which a large number of empirical examples are already known is that in which a single derived mutation arises and drifts to some low frequency before an environmental change causes the allele to become beneficial and sweep to fixation. Here, we pursue an analytical investigation of this model, bolstered and extended via simulation study. We use coalescent theory to develop an analytical approximation for the effect of a sweep from standing variation on the genealogy at the locus of the selected allele and sites tightly linked to it. We show that the distribution of haplotypes that the selected allele is present on at the time of the environmental change can be approximated by considering recombinant haplotypes as alleles in the infinite alleles model. We show that this approximation can be leveraged to make accurate predictions regarding patterns of genetic polymorphism following such a sweep. We then use simulations to highlight which sources of haplotypic information are likely to be most useful in distinguishing this model from neutrality, as well as from other sweep models, such as the classic hard sweep, and multiple mutation soft sweeps. We find that in general, adaptation from a uniquely derived standing variant will be difficult to detect on the basis of genetic polymorphism data alone, and when it can be detected, it will be difficult to distinguish from other varieties of selective sweeps.
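As a small illustration of what an infinite alleles approximation buys, the sketch below computes the standard Ewens result for the expected number of distinct alleles in a sample. It uses a generic parameter theta. How the authors map recombination onto that parameter is not stated in the abstract, so the function name and inputs here are purely hypothetical.

```python
def expected_distinct_alleles(sample_size, theta):
    """Expected number of distinct alleles in a sample under the infinite
    alleles model (Ewens): the sum over i = 0..n-1 of theta / (theta + i)."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if theta <= 0:
        raise ValueError("theta must be positive")
    return sum(theta / (theta + i) for i in range(sample_size))


if __name__ == "__main__":
    # Illustrative values only: larger theta means more distinct
    # haplotype "alleles" are expected in a sample of 20.
    for theta in (0.1, 1.0, 5.0):
        print(theta, expected_distinct_alleles(20, theta))
```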

An explicit Poisson-Kolmogorov-Smirnov test for the molecular clock in phylogenies

Fernando Marcon, Fernando Antoneli, Marcelo R. S. Briones
(Submitted on 21 May 2015)

Divergence date estimates are central to understanding evolutionary processes and depend, in the case of molecular phylogenies, on tests for the molecular clock. Tests for global and local clocks generally compare a clock-constrained tree versus a non-clock tree (e.g. the likelihood ratio test). These tests verify the evolutionary rate homogeneity among taxa and usually employ the chi-square test for rejection/acceptance of the “clock-like” phylogeny. The paradox is that the molecular clock hypothesis, as proposed, is a Poisson process, and therefore, non-homogeneous. Here we propose a method for testing the molecular clock in phylogenies that is built upon the assumption of a Poisson stochastic process that accommodates rate heterogeneity and is based on ensembles of trees inferred by the Bayesian method. The observed distribution of branch lengths (number of substitutions) is obtained from the ensemble of post burn-in Bayesian search. The parameter λ of the expected Poisson distribution is given by the average branch length of this ensemble. The goodness-of-fit test is performed using a modified Kolmogorov-Smirnov test for Poisson distributions. The method introduced here uses a large number of statistically equivalent phylogenies to obtain the observed distribution. This circumvents problems of small sample size (lack of power and lack of information), because the power of the test is asymptotic to unity. Also, the observed distribution obtained is very robust in the sense that for a sufficient number of trees (700) the empirical distribution stabilizes. Therefore, the estimated parameter λ, used to define the expected distribution, is essentially independent of sample size.
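The general shape of such a goodness-of-fit test can be sketched as follows. This is not the authors' procedure: the abstract does not specify how their Kolmogorov-Smirnov test is modified, so this sketch assumes a parametric bootstrap for the p-value, treats branch lengths as integer substitution counts, and uses hypothetical function names and made-up input data.

```python
import math
import random


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def ks_statistic_poisson(counts, lam):
    """Largest gap between the empirical CDF and the Poisson(lam) CDF.

    Both are step functions that jump only at integers, so it is enough
    to compare them at 0, 1, ..., max(counts).
    """
    n = len(counts)
    d = 0.0
    for k in range(max(counts) + 1):
        ecdf = sum(1 for c in counts if c <= k) / n
        d = max(d, abs(ecdf - poisson_cdf(k, lam)))
    return d


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_ks_test(counts, n_boot=500, seed=0):
    """Return (lambda estimate, KS statistic, bootstrap p-value)."""
    rng = random.Random(seed)
    lam = sum(counts) / len(counts)  # lambda is the mean branch length
    d_obs = ks_statistic_poisson(counts, lam)
    exceed = 0
    for _ in range(n_boot):
        sim = [sample_poisson(lam, rng) for _ in counts]
        lam_sim = sum(sim) / len(sim)  # re-estimate lambda in each replicate
        if ks_statistic_poisson(sim, lam_sim) >= d_obs:
            exceed += 1
    return lam, d_obs, (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    # Made-up substitution counts, for illustration only.
    print(poisson_ks_test([2, 3, 1, 4, 2, 3, 2, 5, 1, 3]))
```

Re-estimating λ inside each bootstrap replicate matters because the observed λ is itself estimated from the data, which changes the null distribution of the statistic.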

A flexible, efficient binomial mixed model for identifying differential DNA methylation in bisulfite sequencing data

Amanda J Lea, Susan C Albert, Jenny Tung, Xiang Zhou
doi: http://dx.doi.org/10.1101/019562

Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for estimating DNA methylation levels at base-pair resolution, and for investigating the major drivers of epigenetic variation. However, modeling bisulfite sequencing data presents several challenges. Methylation levels are estimated from proportional read counts, yet coverage can vary dramatically across sites and samples. Further, methylation levels are influenced by genetic variation, and controlling for genetic covariance (e.g., kinship or population structure) is crucial for avoiding potential false positives. To address these challenges, we combine a binomial mixed model with an efficient sampling-based algorithm (MACAU) for approximate parameter estimation and p-value computation. This framework allows us to account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Furthermore, by leveraging the advantages of an auxiliary variable-based sampling algorithm and recent mixed model innovations, MACAU substantially reduces computational complexity and can thus be applied to large, genome-wide data sets. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that, compared to existing approaches, our method provides better calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. 
Taken together, our results indicate that MACAU is an effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at http://www.xzlab.org/software.html.
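The point about coverage can be illustrated with a minimal sketch. This is not MACAU itself: it assumes a plain binomial sampling model with no over-dispersion and no relatedness term, and the function name and read counts are hypothetical.

```python
import math


def methylation_estimate(methylated_reads, total_reads):
    """Methylation proportion at one site and its binomial standard error."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= methylated_reads <= total_reads:
        raise ValueError("methylated_reads must lie between 0 and total_reads")
    p = methylated_reads / total_reads
    se = math.sqrt(p * (1.0 - p) / total_reads)
    return p, se


if __name__ == "__main__":
    # Same estimated methylation level, very different coverage.
    print(methylation_estimate(5, 10))
    print(methylation_estimate(500, 1000))
```

Two sites with the same proportion but a 100-fold difference in coverage differ 10-fold in standard error, which is why modeling the raw counts retains information that is lost when only the proportions are analyzed.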

An empirical approach to demographic inference

Peter L. Ralph
(Submitted on 21 May 2015)

Inference with population genetic data usually treats the population pedigree as a nuisance parameter, the unobserved product of a past history of random mating. However, the history of genetic relationships in a given population is a fixed, unobserved object, and so an alternative approach is to treat this network of relationships as a complex object we wish to learn about, by observing how genomes have been noisily passed down through it. This paper explores this point of view, showing how to translate questions about population genetic data into calculations with a Poisson process of mutations on all ancestral genomes. This method is applied to give a robust interpretation to the f4 statistic used to identify admixture, and to design a new statistic that measures covariances in mean times to most recent common ancestor between two pairs of sequences. The method more generally interprets population genetic statistics in terms of sums of specific functions over ancestral genomes, thereby providing concrete, broadly interpretable interpretations for these statistics. This provides a method for describing demographic history without simplified demographic models. More generally, it brings into focus the population pedigree, which is averaged over in model-based demographic inference.
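For readers unfamiliar with the f4 statistic, a minimal sketch of its conventional allele-frequency form follows. It does not reproduce the pedigree-based interpretation developed in the paper. It assumes per-site allele frequencies are already available for four populations, and the function name and example frequencies are hypothetical.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """Conventional f4(A, B; C, D): the average over sites of
    (p_A - p_B) * (p_C - p_D), where p_X is the allele frequency in X."""
    sites = list(zip(freq_a, freq_b, freq_c, freq_d))
    if not sites:
        raise ValueError("at least one site is required")
    return sum((a - b) * (c - d) for a, b, c, d in sites) / len(sites)


if __name__ == "__main__":
    # Made-up frequencies at two sites, for illustration only.
    print(f4([0.1, 0.5], [0.3, 0.5], [0.2, 0.9], [0.6, 0.1]))
```

A value near zero is what one expects when the frequency difference between A and B is uncorrelated with the difference between C and D, which is why departures from zero are used as evidence of admixture.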

Dynamics of Wolbachia pipientis gene expression across the Drosophila melanogaster life cycle

Florence Gutzwiller, Catarina R. Carmo, Danny E. Miller, Danny W. Rice, Irene L. Newton, Luis Teixeira, Casey M. Bergman
(Submitted on 21 May 2015)

Symbiotic interactions between microbes and their multicellular hosts have manifold impacts on molecular, cellular and organismal biology. To identify candidate bacterial genes involved in maintaining endosymbiotic associations with insect hosts, we analyzed genome-wide patterns of gene expression in the alpha-proteobacterium Wolbachia pipientis across the life cycle of Drosophila melanogaster using public data from the modENCODE project that were generated in a Wolbachia-infected version of the ISO1 reference strain. We find that the majority of Wolbachia genes are expressed at detectable levels in D. melanogaster across the entire life cycle, but that only 7.8% of 1195 Wolbachia genes exhibit robust stage- or sex-specific expression differences when studied in the “holo-organism” context. Wolbachia genes that are differentially expressed during development are typically up-regulated after D. melanogaster embryogenesis, and include many bacterial membrane, secretion system and ankyrin-repeat containing proteins. Sex-biased genes are often organised as small operons of uncharacterised genes and are mainly up-regulated in adult male D. melanogaster in an age-dependent manner, suggesting a potential role in cytoplasmic incompatibility. Our results indicate that large changes in Wolbachia gene expression across the Drosophila life-cycle are relatively rare when assayed across all host tissues, but that candidate genes to understand host-microbe interaction in facultative endosymbionts can be successfully identified using holo-organism expression profiling. Our work also shows that mining public gene expression data in D. melanogaster provides a rich set of resources to probe the functional basis of the Wolbachia-Drosophila symbiosis and annotate the transcriptional outputs of the Wolbachia genome.

Inference of Ancestral Recombination Graphs through Topological Data Analysis

Pablo G. Camara, Arnold J. Levine, Raul Rabadan
(Submitted on 21 May 2015)

The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relations within and across species. Recombination, reassortment, horizontal gene transfer, and species hybridization constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from tens or hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs (ARGs) represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, ARGs are computationally costly to reconstruct and usually become infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build on previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. For that aim, we extend the notion of barcodes in persistent homology, largely increasing their sensitivity to recombination, and present a new type of summary graph (topological ARG, or tARG), analogous to ARGs, that captures ensembles of minimal recombination histories. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations and horizontal evolution in finches inhabiting the Galápagos Islands.