Genome-Wide Scan for Adaptive Divergence and Association with Population-Specific Covariates

Mathieu Gautier
doi: http://dx.doi.org/10.1101/023721

In population genomics studies, accounting for the neutral covariance structure across population allele frequencies is critical to improve the robustness of genome-wide scan approaches. Elaborating on the BayEnv model, this study investigates several modeling extensions i) to improve the estimation accuracy of the population covariance matrix and all the related measures; ii) to identify significantly overly differentiated SNPs based on a calibration procedure of the XtX statistics; and iii) to consider alternative covariate models for analyses of association with population-specific covariables. In particular, the auxiliary variable model makes it possible to deal with multiple testing issues and, provided the relative marker positions are available, to capture some linkage disequilibrium information. A comprehensive simulation study is further carried out to investigate and compare the performance of the different models. For illustration purposes, genotyping data on 18 French cattle breeds are also analyzed, leading to the identification of thirteen strong signatures of selection. Among these, four (surrounding the KITLG, KIT, EDN3 and ALB genes) contained SNPs strongly associated with the piebald coloration pattern, while a fifth (surrounding PLAG1) could be associated with morphological differences across the populations. Finally, analysis of Pool-Seq data from 12 populations of Littorina saxatilis living in two different ecotypes illustrates how the proposed framework might help address relevant ecological questions in non-model species. Overall, the proposed methods define a robust Bayesian framework to characterize adaptive genetic differentiation across populations. The BayPass program implementing the different models is available at http://www1.montpellier.inra.fr/CBGP/software/baypass/.
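To give a feel for what an XtX-type statistic measures, here is a minimal sketch. It assumes the standardization commonly described for BayEnv-type models: population allele frequencies are centered on an ancestral frequency, scaled by its binomial standard deviation, and decorrelated with the Cholesky factor of the population covariance matrix. The names `cholesky`, `forward_solve` and `xtx` are hypothetical, and all inputs are treated as known point values, whereas the actual Bayesian model estimates them. This is not the BayPass implementation.

```python
import math


def cholesky(matrix):
    """Lower-triangular factor L of a symmetric positive-definite matrix (L L^T = matrix)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def forward_solve(lower, b):
    """Solve L x = b for x by forward substitution."""
    x = []
    for i, row in enumerate(lower):
        s = sum(row[k] * x[k] for k in range(i))
        x.append((b[i] - s) / row[i])
    return x


def xtx(alpha, pi, omega):
    """Sum of squared standardized allele frequencies for one SNP.

    alpha: allele frequency in each population (assumed known here)
    pi:    ancestral allele frequency, strictly between 0 and 1 (assumed known here)
    omega: population covariance matrix (list of lists)
    """
    scale = math.sqrt(pi * (1.0 - pi))
    centered = [(a - pi) / scale for a in alpha]
    x = forward_solve(cholesky(omega), centered)
    return sum(v * v for v in x)


# Two correlated populations that both deviate from the ancestral frequency.
value = xtx([1.0, 1.0], 0.5, [[1.0, 0.5], [0.5, 1.0]])
```

Under this standardization, neutral SNPs should give values scattered around the number of populations, so unusually large values point to excess differentiation. Deciding what counts as unusually large is the job of the calibration procedure mentioned in the abstract.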

Length Distribution of Ancestral Tracks under a General Admixture Model and Its Applications in Population History Inference

Xumin Ni, Xiong Yang, Wei Guo, Kai Yuan, Ying Zhou, Zhiming Ma, Shuhua Xu
doi: http://dx.doi.org/10.1101/023390

As a chromosome is sliced into pieces by recombination after entering an admixed population, ancestral tracks of chromosomes are shortened with the passing of generations. The length distribution of ancestral tracks reflects information about recombination and thus can be used to infer the histories of admixed populations. Previous studies have shown that inference based on ancestral tracks is powerful in recovering the histories of admixed populations. However, population histories are always complex, and previous studies only deduced the length distribution of ancestral tracks under very simple admixture models. The deduction of the length distribution of ancestral tracks under a more general model will greatly increase the power to infer population histories. Here we first deduced the length distribution of ancestral tracks under a general model in an admixed population, and proposed general principles for parameter estimation and model selection with the length distribution. Next, we focused on studying the length distribution of ancestral tracks and its applications under three typical admixture models, which were all special cases of our general model. Extensive simulations showed that the length distribution of ancestral tracks was well predicted by our theoretical models. We further developed a new method based on the length distribution of ancestral tracks, and good performance was observed when it was applied to inferring population histories under the three typical models. Notably, our method was insensitive to demographic history, sample size and the threshold used to discard short tracks. Finally, we applied our method to African Americans and Mexicans from the HapMap dataset, and to several South Asian populations from the Human Genome Diversity Project dataset. The results showed that the histories of African Americans and Mexicans matched the historical records well, and that the population admixture history of South Asians was very complex and could be traced back to around 100 generations ago.
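The link between track lengths and admixture time can be illustrated with the simplest textbook case, which is not the general model of the paper. The sketch below assumes a single admixture pulse `generations` ago in which a source population contributed a proportion `m`. Under that assumption, tracks from that source are approximately exponentially distributed with rate `(1 - m) * generations` per Morgan. The function names are hypothetical.

```python
import random


def track_rate(m, generations):
    """Approximate exponential rate (per Morgan) of tracks from a source
    that contributed proportion m in a single pulse `generations` ago."""
    return (1.0 - m) * generations


def simulate_tracks(m, generations, n, seed=0):
    """Draw n track lengths (in Morgans) under the single-pulse approximation."""
    rng = random.Random(seed)
    rate = track_rate(m, generations)
    return [rng.expovariate(rate) for _ in range(n)]


def estimate_generations(lengths, m):
    """Method-of-moments estimate: mean length = 1 / ((1 - m) * T)."""
    mean_length = sum(lengths) / len(lengths)
    return 1.0 / ((1.0 - m) * mean_length)


# Simulate tracks for a pulse 10 generations ago, then recover the time.
tracks = simulate_tracks(m=0.2, generations=10, n=20000, seed=1)
t_hat = estimate_generations(tracks, m=0.2)
```

Older admixture gives shorter tracks, so the mean track length alone already carries information about timing in this simple setting. More complex histories, such as those covered by the general model in the abstract, need the full shape of the distribution.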

Circlator: automated circularization of genome assemblies using long sequencing reads

Martin Hunt, Nishadi De Silva, Thomas D Otto, Julian Parkhill, Jacqueline A Keane, Simon R Harris
doi: http://dx.doi.org/10.1101/023408

The assembly of DNA sequence data into finished genomes is undergoing a renaissance thanks to emerging technologies producing reads of tens of kilobases. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/.

Origins of de novo genes in human and chimpanzee

Jorge Ruiz-Orera, Jessica Hernandez-Rodriguez, Cristina Chiva, Eduard Sabidó, Ivanela Kondova, Ronald Bontrop, Tomàs Marqués-Bonet, M. Mar Albà
(Submitted on 28 Jul 2015)

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that do not contain any gene or gene copy. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence are still unclear. In order to obtain a comprehensive view of this process we have performed in-depth sequencing of the transcriptomes of four mammalian species, human, chimpanzee, macaque and mouse, and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new transcriptional multiexonic events in human and/or chimpanzee that are not observed in the other species. By comparative genomics we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. We also find that the coding potential of the new genes is higher than expected by chance, consistent with the presence of protein-coding genes in the dataset. Using available human tissue proteomics and ribosome profiling data we identify several de novo genes with translation evidence. These genes show significant purifying selection signatures, indicating that they are probably functional. Taken together, the data support a model in which frequently occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.

Interpreting the dependence of mutation rates on age and time

Ziyue Gao, Minyoung J. Wyman, Guy Sella, Molly Przeworski
(Submitted on 24 Jul 2015)

Mutations can arise from the chance misincorporation of nucleotides during DNA replication or from DNA lesions that are not repaired correctly. We introduce a model that relates the source of mutations to their accumulation with cell divisions, providing a framework for understanding how mutation rates depend on sex, age and absolute time. We show that the accrual of mutations should track cell divisions not only when mutations are replicative in origin but also when they are non-replicative and repaired efficiently. One implication is that the higher incidence of cancer in rapidly renewing tissues, an observation ascribed to replication errors, could instead reflect exogenous or endogenous mutagens. We further find that only mutations that arise from inefficiently repaired lesions will accrue according to absolute time; thus, in the absence of selection on mutation rates, the phylogenetic “molecular clock” should not be expected to run steadily across species.

The Nicrophorus vespilloides genome and methylome, a beetle with complex social behavior

Christopher B Cunningham, Lexiang Ji, R. Axel W Wiberg, Jennifer M Shelton, Elizabeth C McKinney, Darren J Parker, Richard B Meagher, Kyle M Benowitz, Eileen M Roy-Zokan, Michael G Ritchie, Susan J Brown, Robert J Schmitz, Allen J Moore
doi: http://dx.doi.org/10.1101/023093

Testing for conserved and novel mechanisms underlying phenotypic evolution requires a diversity of genomes available for comparison spanning multiple independent lineages. For example, complex social behavior in insects has been investigated primarily with eusocial lineages, nearly all of which are Hymenoptera. If conserved genomic influences on sociality do exist, we need data from a wider range of taxa that also vary in their levels of sociality. Here we present information on the genome of the subsocial beetle Nicrophorus vespilloides, a species long used to investigate evolutionary questions of complex social behavior. We used this genome to address two questions. First, does life history predict overlap in gene models more strongly than phylogenetic groupings? Second, like other insects with highly developed social behavior but unlike other beetles, does N. vespilloides have DNA methylation? We found the overlap in gene models was similar between N. vespilloides and all other insect groups regardless of life history. Unlike previous studies of beetles, we found strong evidence of DNA methylation, which allows this species to be used to address questions about the potential role of methylation in social behavior. This genome adds a coleopteran resource for answering questions about the evolution and mechanistic basis of sociality.

Stable recombination hotspots in birds

Sonal Singhal, Ellen Leffler, Keerthi Sannareddy, Isaac Turner, Oliver Venn, Daniel Hooper, Alva Strand, Qiye Li, Brian Raney, Christopher Balakrishnan, Simon Griffith, Gil McVean, Molly Przeworski
doi: http://dx.doi.org/10.1101/023101

Although the DNA-binding protein PRDM9 plays a critical role in the specification of meiotic recombination hotspots in mice and apes, it appears to be absent from many vertebrate species, including birds. To learn about the determinants of fine-scale recombination rates and their evolution in natural populations lacking PRDM9, we inferred fine-scale recombination maps from population resequencing data for two bird species, the zebra finch Taeniopygia guttata, and the long-tailed finch, Poephila acuticauda, whose divergence is on par with that between human and chimpanzee. We find that both bird species have hotspots, and these are enriched near CpG islands and transcription start sites. In sharp contrast to what is seen in mice and apes, the hotspots are largely shared between the two species, with indirect evidence of conservation extending across bird species tens of millions of years diverged. These observations link the evolution of hotspots to their genetic architecture, suggesting that in the absence of PRDM9 binding specificity, accessibility of the genome to the cellular recombination machinery, particularly around functional genomic elements, both enables increased recombination and constrains its evolution.

Conflict and cooperation in eukaryogenesis: implications for the timing of endosymbiosis and the evolution of sex

Arunas L Radzvilavicius, Neil W Blackstone
doi: http://dx.doi.org/10.1101/023077

The complex eukaryotic cell is a result of an ancient endosymbiosis and one of the major evolutionary transitions. The timing of key eukaryotic innovations relative to the acquisition of mitochondria remains subject to considerable debate, yet the evolutionary process itself might constrain the order of these events. Endosymbiosis entailed levels-of-selection conflicts, and mechanisms of conflict mediation had to evolve for eukaryogenesis to proceed. The initial mechanisms of conflict mediation were based on the pathways inherited from prokaryotic symbionts and led to metabolic homeostasis in the eukaryotic cell, while later mechanisms (e.g., mitochondrial gene transfer) contributed to the expansion of the eukaryotic genome. Perhaps the greatest opportunity for conflict arose with the emergence of sex involving whole-cell fusion. While early evolution of cell fusion may have affected symbiont acquisition, sex together with the competitive symbiont behaviour would have destabilised the emerging higher-level unit. Cytoplasmic mixing, on the other hand, would have been beneficial for selfish endosymbionts, capable of using their own metabolism to manipulate the life history of the host. Given the results of our mathematical modelling, we argue that sex represents a rather late proto-eukaryotic innovation, allowing for the growth of the chimeric nucleus and contributing to the successful completion of the evolutionary transition.

Morphological data is lacking for living mammals

Thomas Guillerme, Natalie Cooper
doi: http://dx.doi.org/10.1101/022970

Combining living and fossil data in the same analysis is crucial for studying changes in global biodiversity through time. One method that allows this data to be combined is the Total Evidence method, which uses both molecular data for living species and morphological data for both living and fossil species. With this method, a good overlap of morphological data between living and fossil taxa is crucial for accurately inferring the phylogenies' topology. Since the advent of DNA, molecular data has become easily and widely available. However, despite two centuries of morphological studies, scientists using and generating such data mainly focus on palaeontological data. Therefore, there is a gap in our knowledge of neontological morphological data even in well studied groups such as mammals. In this study, we quantify the morphological data available for living mammal taxa. We then analyse the structure of the available data by testing if it is clustered or evenly spread across the phylogeny. We found that 78% of mammalian orders have less than 25% data available at the species level. However, we found that the available data is often randomly distributed among these orders, apart from six of them where the data is clustered.

Non-paradoxical evolutionary stability of the recombination initiation landscape in Saccharomycetes

Isabel Lam, Scott Keeney
doi: http://dx.doi.org/10.1101/023176

The nonrandom distribution of meiotic recombination shapes heredity and genetic diversification. A widely held view is that individual hotspots — favored sites of recombination initiation — are always ephemeral because they evolve rapidly toward extinction. An alternative view, often ignored or dismissed as implausible, predicts conservation of the positions of hotspots if they are chromosomal features under selective constraint, such as gene promoters. Here we empirically test opposite predictions of these theories by comparing genome-wide maps of meiotic recombination initiation from widely divergent species in the Saccharomyces clade. We find that the frequent overlap of hotspots with promoters is true of the species tested and, consequently, hotspot positions are well conserved. Remarkably, however, the relative strength of individual hotspots is also highly conserved, as are larger-scale features of the distribution of recombination initiation. This stability, not predicted by prior models, suggests that the particular shape of the yeast recombination landscape is adaptive, and helps in understanding evolutionary dynamics of recombination in other species.