Dynamics of Wolbachia pipientis gene expression across the Drosophila melanogaster life cycle

Florence Gutzwiller, Catarina R. Carmo, Danny E. Miller, Danny W. Rice, Irene L. Newton, Luis Teixeira, Casey M. Bergman
(Submitted on 21 May 2015)

Symbiotic interactions between microbes and their multicellular hosts have manifold impacts on molecular, cellular and organismal biology. To identify candidate bacterial genes involved in maintaining endosymbiotic associations with insect hosts, we analyzed genome-wide patterns of gene expression in the alpha-proteobacterium Wolbachia pipientis across the life cycle of Drosophila melanogaster using public data from the modENCODE project that were generated in a Wolbachia-infected version of the ISO1 reference strain. We find that the majority of Wolbachia genes are expressed at detectable levels in D. melanogaster across the entire life cycle, but that only 7.8% of 1195 Wolbachia genes exhibit robust stage- or sex-specific expression differences when studied in the “holo-organism” context. Wolbachia genes that are differentially expressed during development are typically up-regulated after D. melanogaster embryogenesis, and include many bacterial membrane, secretion system and ankyrin-repeat containing proteins. Sex-biased genes are often organised as small operons of uncharacterised genes and are mainly up-regulated in adult male D. melanogaster in an age-dependent manner, suggesting a potential role in cytoplasmic incompatibility. Our results indicate that large changes in Wolbachia gene expression across the Drosophila life cycle are relatively rare when assayed across all host tissues, but that candidate genes for understanding host-microbe interactions in facultative endosymbionts can be successfully identified using holo-organism expression profiling. Our work also shows that mining public gene expression data in D. melanogaster provides a rich set of resources to probe the functional basis of the Wolbachia-Drosophila symbiosis and annotate the transcriptional outputs of the Wolbachia genome.

Inference of Ancestral Recombination Graphs through Topological Data Analysis

Pablo G. Camara, Arnold J. Levine, Raul Rabadan
(Submitted on 21 May 2015)

The recent explosion of genomic data has underscored the need for interpretable and comprehensive analyses that can capture complex phylogenetic relations within and across species. Recombination, reassortment, horizontal gene transfer, and species hybridization constitute examples of pervasive biological phenomena that cannot be captured by tree-like representations. Starting from tens or hundreds of genomes, we are interested in the reconstruction of potential evolutionary histories leading to the observed data. Ancestral recombination graphs (ARGs) represent potential histories that explicitly accommodate recombination and mutation events across orthologous genomes. However, ARGs are computationally costly to reconstruct and usually become infeasible for more than a few tens of genomes. Recently, Topological Data Analysis (TDA) methods have been proposed as robust and scalable methods that can capture the genetic scale and frequency of recombination. We build on previous TDA developments for detecting and quantifying recombination, and present a novel framework that can be applied to hundreds of genomes and can be interpreted in terms of minimal histories of mutation and recombination events, quantifying the scales and identifying the genomic locations of recombinations. To that end, we extend the notion of barcodes in persistent homology, greatly increasing their sensitivity to recombination, and present a new type of summary graph (topological ARG, or tARG), analogous to ARGs, that captures ensembles of minimal recombination histories. We implement this framework in a software package, called TARGet, and apply it to several examples, including small migration between different populations and horizontal evolution in finches inhabiting the Galápagos Islands.

Author post: Coalescent times and patterns of genetic diversity in species with facultative sex

This guest post is by Matthew Hartfield (@mathyhartfield) on “Coalescent times and patterns of genetic diversity in species with facultative sex”.

Our paper “Coalescent times and patterns of genetic diversity in species with facultative sex”, in which we investigate the genealogies of facultative sexuals, is now available from bioRxiv.

Most evolutionary biologists are obsessed with sex. Explaining why organisms reproduce sexually by combining genetic material is a tough problem. The main issue is that asexuals (which reproduce clonally) should be able to outcompete sexuals due to sheer weight of numbers. Various theories have been put forward to explain why sex is so widespread. The majority of these revolve around the idea that exchanging genetic material enables the fittest possible genotype to be created, while that of asexuals should degrade over time.

While such theories are ubiquitous, data to test them have been scarce. Recent years have seen a boom in exploring the evolution of sex experimentally using facultative sexual organisms: species that can switch between sexual and asexual reproduction. Such experiments have demonstrated how sexual reproduction can evolve when exposed to stressful environments, or when moving between environmentally different areas. Yet major questions remain regarding the underlying genetic causes of these transitions. In addition, there are plenty of organisms that undergo ‘cryptic’ sex, which cannot be observed directly but can be detected with genomic sequence analyses.

Coalescent models are important for analysing genomic data. These tools determine the relationship between neutral markers, and hence make predictions on how genetic diversity is affected depending on environmental structuring, localised natural selection, or other effects. However, classic models cannot be applied to systems with partial asexuality, as they assume the population reproduces entirely sexually.

We worked on introducing partial rates of sex into these models. In the simplest case (one population with a fixed rate of sex), we recovered a classic prediction that extensive divergence between alleles at the same site arises. This phenomenon occurs since lack of sex keeps the two alleles distinct over evolutionary time; only a rare bout of sex has any chance of creating the segregation needed for them to be descended from the same allele.

A schematic of Allelic Sequence Divergence (ASD) in asexuals: Xs are distinct mutations at each neutral site.
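
The divergence described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are ours, not the paper's model: two alleles sampled from the same diploid individual stay together until the lineage passes through a sexual generation (probability `sex_rate` per generation), after which they coalesce as in a standard Wright-Fisher population of `n_individuals` diploids (probability 1/(2N) per generation). Gene conversion and the chance of the two lineages re-entering the same individual are ignored, and all function names are hypothetical.

```python
import math
import random


def geometric(rng, p):
    """Generations until the first success, with success probability p."""
    if p >= 1.0:
        return 1
    # Inverse-transform sampling of a geometric distribution on 1, 2, 3, ...
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1


def within_individual_coalescent_time(rng, n_individuals, sex_rate):
    """Toy waiting time for two alleles in one individual to coalesce."""
    # Phase 1: wait (backward in time) for a sexual event that separates
    # the two alleles into different parents.
    t_until_sex = geometric(rng, sex_rate)
    # Phase 2: the separated lineages coalesce at the Wright-Fisher rate.
    t_until_coalescence = geometric(rng, 1.0 / (2 * n_individuals))
    return t_until_sex + t_until_coalescence


def mean_time(n_individuals, sex_rate, replicates=20000, seed=1):
    """Average coalescent time over independent replicates."""
    rng = random.Random(seed)
    total = sum(
        within_individual_coalescent_time(rng, n_individuals, sex_rate)
        for _ in range(replicates)
    )
    return total / replicates


if __name__ == "__main__":
    for sex_rate in (1.0, 0.01, 0.001):
        print(sex_rate, round(mean_time(100, sex_rate)))
```

Under these assumptions the mean time is roughly 1/sex_rate + 2N generations, so once sex is rarer than about once per 2N generations the waiting time for a sexual event dominates, and the two alleles have correspondingly longer to accumulate distinct mutations.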

After recovering this familiar result, we worked to extend coalescent theory in partial asexuals to include various other biological phenomena. Two effects we looked at were gene conversion, and heterogeneity in sex rates that change over time or space.

Gene conversion, where one DNA sequence replaces part of a homologous chromosome, is usually regarded as being of minor evolutionary importance. Yet numerous studies of facultative sexuals observe it as a common force, especially in species not exhibiting allelic sequence divergence (ASD). Could the two be related? Excitingly, we found that low rates of gene conversion become important in organisms with low rates of sex. That is, once sex becomes so rare as to cause ASD, small rates of gene conversion can then reverse the process, homogenizing alleles again. Rather than having higher diversity than otherwise similar sexual populations, as expected with ASD in the absence of gene conversion, asexual populations will have less diversity than comparable sexual populations if gene conversion is not too low.

It is also known that many organisms change their rates of sex over time or location, which can be triggered by environmental cues or organismal stress. By investigating such variation in the rate of sex, we show that even a short burst of obligate sex (over tens of generations) is enough to jumble genomes in the population, giving the same outcome as long-term obligate sex. If rates of sex also differ between geographical locations, then these differences can be detected if there is little gene flow between regions. Otherwise, both areas display intermediate rates of sex.

Coalescent tools are popular since they can be used to simulate complex evolutionary outcomes, which are then tested against genomic data. We used the mathematical analyses to outline a coalescent algorithm to account for partial rates of sex, and predict genetic diversity, under numerous scenarios. The code is available online (http://github.com/MattHartfield/FacSexCoalescent) for others to use.

These are exciting times for population genetics and evolution, with cheaper sequencing costs making it possible to wade through the genomes of more individuals than before. Yet accurately exploring the genetic landscape requires the creation of mathematical tools that account for organismal life history. These results will provide the first of many building blocks to determine the effects of selection and the environment on the evolution of facultative sexuals. They might eventually reveal why sex is so prevalent in nature.

Fundamental limits on the accuracy of demographic inference based on the sample frequency spectrum

Jonathan Terhorst, Yun S. Song
(Submitted on 16 May 2015)

The sample frequency spectrum (SFS) of DNA sequences from a collection of individuals is a summary statistic which is commonly used for parametric inference in population genetics. Despite the popularity of SFS-based inference methods, currently little is known about the information-theoretic limit on the estimation accuracy as a function of sample size. Here, we show that using the SFS to estimate the size history of a population has a minimax error of at least O(1/log s), where s is the number of independent segregating sites used in the analysis. This rate is exponentially worse than known convergence rates for many classical estimation problems in statistics. Another surprising aspect of our theoretical bound is that it does not depend on the dimension of the SFS, which is related to the number of sampled individuals. This means that, for a fixed number s of segregating sites considered, using more individuals does not help to reduce the minimax error bound. Our result pertains to populations that have experienced a bottleneck, and we argue that it can be expected to apply to many populations in nature.
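
To give a feel for what a 1/log s rate means in practice, the sketch below tabulates it next to 1/sqrt(s), which we take here as a stand-in for the "classical" parametric rate (that choice is our assumption; the abstract does not name a specific comparison). Constants are ignored, so only the scaling is meaningful, and the function names are hypothetical.

```python
import math


def log_rate(s):
    """Scaling of the SFS minimax lower bound, constants ignored."""
    return 1.0 / math.log(s)


def sqrt_rate(s):
    """Scaling of a typical parametric error rate, constants ignored."""
    return 1.0 / math.sqrt(s)


def compare(site_counts):
    """Return (s, 1/log s, 1/sqrt s) for each number of segregating sites."""
    return [(s, log_rate(s), sqrt_rate(s)) for s in site_counts]


if __name__ == "__main__":
    for s, slow, fast in compare([10**3, 10**6, 10**12]):
        print(f"s={s:>14d}  1/log s={slow:.4f}  1/sqrt s={fast:.2e}")
```

Halving 1/log s requires squaring s, whereas halving 1/sqrt(s) only requires quadrupling it, which is the sense in which the bound is exponentially worse.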

Worldwide population structure, long term demography, and local adaptation of Helicobacter pylori

Valeria Montano, Xavier Didelot, Matthieu Foll, Bodo Linz, Richard Reinhardt, Sebastian Suerbaum, Yoshan Moodley, Jeffrey David Jensen
doi: http://dx.doi.org/10.1101/019430

Helicobacter pylori is an important human pathogen associated with serious gastric diseases. Owing to its medical importance and close relationship with its human host, understanding genomic patterns of global and local adaptation in H. pylori may be of particular significance for both clinical and evolutionary studies. Here we present the first such whole-genome analysis of 60 globally distributed strains, from which we inferred worldwide population structure and demographic history and shed light on interesting global and local events of positive selection, with particular emphasis on the evolution of San-associated lineages. Our results indicate a more ancient origin for the association of humans and H. pylori than previously thought. We identify several important perspectives for future clinical research on candidate selected regions that include both previously characterized genes (e.g. transcription elongation factor NusA and Tumor Necrosis Factor Alpha-Inducing Protein Tipα) and hitherto unknown functional genes.

Character trees from transcriptome data: origin and individuation of morphological characters and the so-called “species signal”

Jacob Musser, Gunter Wagner
doi: http://dx.doi.org/10.1101/019380

We elaborate a framework for investigating the evolutionary history of morphological characters. We argue that morphological character trees generated from transcriptomes provide a useful tool for identifying causal gene expression differences underlying the development and evolution of morphological characters. They also enable rigorous testing of different models of morphological character evolution and origination, including the hypothesis that characters originate via divergence of repeated ancestral characters. Finally, morphological character trees provide evidence that character transcriptomes undergo concerted evolution. We argue that concerted evolution of transcriptomes can explain the so-called “species-specific clustering” found in several recent comparative transcriptome studies. The species signal is the phenomenon that transcriptomes cluster by species rather than character type, even though the characters are older than the respective species. We suggest that concerted gene expression evolution results from mutations that alter gene regulatory network interactions shared by the characters under comparison. Thus, character trees generated from transcriptomes allow us to investigate the variational independence, or individuation, of morphological characters at the level of genetic programs.

Real-time strain typing and analysis of antibiotic resistance potential using Nanopore MinION sequencing

Minh Duc Cao, Devika Ganesamoorthy, Alysha Elliott, Huihui Zhang, Matthew Cooper, Lachlan Coin
doi: http://dx.doi.org/10.1101/019356

Clinical pathogen sequencing has significant potential to drive informed treatment of patients with unknown bacterial infection. However, the lack of rapid sequencing technologies with concomitant analysis has impeded clinical adoption in infection diagnosis. Here we demonstrate that commercially-available Nanopore sequencing devices can identify bacterial species and strain information with less than one hour of sequencing time, initial drug-resistance profiles within 2 hours, and a complete resistance profile within 12 hours. We anticipate these devices and associated analysis methods may become useful clinical tools to guide appropriate therapy in time-critical clinical presentations such as bacteraemia and sepsis.

The Multi-allelic Genetic Architecture of a Variance-heterogeneity Locus for Molybdenum Accumulation Acts as a Source of Unexplained Additive Genetic Variance

Simon K G Forsberg, Matthew E Andreatta, Xin-Yuan Huang, John Danku, David E Salt, Örjan Carlborg
doi: http://dx.doi.org/10.1101/019323

Most biological traits are regulated by both genetic and environmental factors. Individual loci contributing to the phenotypic diversity in a population are generally identified by their contributions to the trait mean. Genome-wide association (GWA) analyses can also detect loci based on variance differences between genotypes and several hypotheses have been proposed regarding the possible genetic mechanisms leading to such signals. Little is, however, known about what causes them and whether this genetic variance-heterogeneity reflects mechanisms of importance in natural populations. Previously, we identified a variance-heterogeneity GWA (vGWA) signal for leaf molybdenum concentrations in Arabidopsis thaliana. Here, fine-mapping of this association to a ~78 kb Linkage Disequilibrium (LD)-block reveals that it emerges from the independent effects of three genetic polymorphisms on the high-variance associated version of this LD-block. By revealing the genetic architecture underlying this vGWA signal, we uncovered the molecular source of a significant amount of hidden additive genetic variation (“missing heritability”). Two of the three polymorphisms on the high-variance LD-block are promoter variants for Molybdate transporter 1 (MOT1), and the third a variant located ~25 kb downstream of this gene. A fourth independent association was also detected ~600 kb upstream of the LD-block. Testing of T-DNA knockout alleles for genes in the associated regions suggests AT2G25660 (unknown function) and AT2G26975 (Copper Transporter 6; COPT6) as the strongest candidates for the associations outside MOT1. Our results show that multi-allelic genetic architectures within a single LD-block can lead to a variance-heterogeneity between genotypes in natural populations. Further, they provide novel insights into the genetic regulation of ion homeostasis in A. thaliana, and empirically confirm that variance-heterogeneity based GWA methods are a valuable tool to detect novel associations of biological importance in natural populations.

ReproPhylo: An Environment for Reproducible Phylogenomics

Amir Szitenberg, Max John, Mark L Blaxter, David H Lunt
doi: http://dx.doi.org/10.1101/019349

The reproducibility of experiments is key to the scientific process, and particularly necessary for accurate reporting of analyses in data-rich fields such as phylogenomics. We present ReproPhylo, a phylogenomic analysis environment developed to ensure experimental reproducibility, to facilitate the handling of large-scale data, and to assist methodological experimentation. Reproducibility, and instantaneous repeatability, is built into the ReproPhylo system, and does not require user intervention or configuration because it stores the experimental workflow as a single, serialized Python object containing explicit provenance and environment information. This ‘single file’ approach ensures the persistence of provenance across iterations of the analysis, with changes automatically managed by the version control program Git. ReproPhylo produces an extensive human-readable report, and generates a comprehensive experimental archive file, both of which are suitable for submission with publications. The system facilitates thorough experimental exploration of both parameters and data. ReproPhylo is a platform-independent CC0 Python module, and is easily installed as a Docker image, with a Jupyter GUI, or as a slimmer version in a Galaxy distribution.

FIQT: a simple, powerful method to accurately estimate effect sizes in genome scans

Tim B Bigdeli, Donghyung Lee, Brien P Riley, Vladimir I Vladimirov, Ayman H Fanous, Kenneth S Kendler, Silviu-Alin Bacanu
doi: http://dx.doi.org/10.1101/019299

Genome scans, including both genome-wide association studies and deep sequencing, continue to discover a growing number of significant association signals for various traits. However, often variants meeting genome-wide significance criteria explain far less of the overall trait variance than “sub-threshold” association signals. To extract these sub-threshold signals, there is a need for methods which accurately estimate the mean of all (normally-distributed) test-statistics from a genome scan (i.e., Z-scores). This is currently achieved by the difficult procedures of adjusting all Z-score (χ_1^2) statistics for “winner’s curse” (multiple testing). Given that multiple testing adjustments are much simpler for p-values, we propose a method for estimating Z-score means by i) first adjusting their p-values for multiple testing and then ii) transforming the adjusted p-values to upper tail Z-scores with the sign of the original statistics. Because a False Discovery Rate (FDR) procedure is used for multiple testing adjustment, we denote this method FDR Inverse Quantile Transformation (FIQT). When compared to competitors, e.g. Empirical Bayes (including proposed improvements), FIQT is more i) accurate and ii) computationally efficient by orders of magnitude. Its accuracy advantage is substantial at larger sample sizes and/or moderate numbers of association signals. Practical application of FIQT to Z-scores from the first Psychiatric Genetic Consortium (PGC) schizophrenia study predicts a non-trivial fraction of the significant signal regions from the subsequent published PGC schizophrenia studies. Finally, we suggest that FIQT might be i) used to improve subject level risk prediction and ii) further improved by modelling the noncentrality of χ_1^2 statistics.
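
The two steps described in the abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes two-sided p-values and the Benjamini-Hochberg procedure as the FDR adjustment (the abstract only says "a False Discovery Rate (FDR) procedure"), and the function names and the `min_p` floor are hypothetical.

```python
import math
from statistics import NormalDist


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fiqt(zscores, min_p=1e-300):
    """Shrink Z-scores by FDR-adjusting their p-values and mapping back."""
    std_normal = NormalDist()
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    # min_p is an assumed floor that keeps the inverse transform finite.
    pvalues = [max(math.erfc(abs(z) / math.sqrt(2.0)), min_p) for z in zscores]
    adjusted = bh_adjust(pvalues)
    # Upper-tail quantile of the adjusted p-value, with the original sign.
    return [
        math.copysign(-std_normal.inv_cdf(p / 2.0), z)
        for p, z in zip(adjusted, zscores)
    ]


if __name__ == "__main__":
    z = [5.0, -4.0, 0.5, -0.2, 1.0]
    for original, shrunk in zip(z, fiqt(z)):
        print(f"{original:+.2f} -> {shrunk:+.3f}")
```

Because adjusted p-values are never smaller than the raw ones, every Z-score is pulled toward zero, and weak signals are pulled proportionally more than strong ones.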