Phylogenetic effective sample size

Krzysztof Bartoszek
doi: http://dx.doi.org/10.1101/023242

In this paper I address the question: how large is a phylogenetic sample? I propose a definition of a phylogenetic effective sample size for Brownian motion and Ornstein-Uhlenbeck processes, the regression effective sample size. I discuss how mutual information can be used to define an effective sample size in the non-normal process case and compare these two definitions to an existing concept of effective sample size (the mean effective sample size). Through a simulation study I find that the AICc is robust if one corrects for the number of species or effective number of species. Lastly I discuss how the concept of the phylogenetic effective sample size can be useful for biodiversity quantification, identification of interesting clades and deciding on the importance of phylogenetic correlations.
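As a rough illustration of the mean effective sample size mentioned in the abstract, the sketch below assumes the standard form n_e = 1^T R^{-1} 1, where R is the correlation matrix between tip values. The function name and the toy four-tip tree (two cherries on an ultrametric tree of height 1, Brownian motion correlations equal to shared branch length) are assumptions made for this example and are not taken from the paper. The regression and mutual-information definitions proposed in the paper are not shown.

```python
import numpy as np

def mean_effective_sample_size(corr):
    """Return 1^T R^{-1} 1 for a tip correlation matrix R (assumed positive definite)."""
    corr = np.asarray(corr, dtype=float)
    ones = np.ones(corr.shape[0])
    # Solve R x = 1 rather than forming the inverse explicitly.
    return float(ones @ np.linalg.solve(corr, ones))

def two_cherry_correlation(rho):
    """Hypothetical 4-tip tree: tips (1,2) and (3,4) each share a fraction rho of the tree height."""
    return np.array([
        [1.0, rho, 0.0, 0.0],
        [rho, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, rho],
        [0.0, 0.0, rho, 1.0],
    ])

# Independent tips (star phylogeny): every species counts fully.
n_star = mean_effective_sample_size(np.eye(4))
# Sister species sharing half of their history count for less than two each.
n_half = mean_effective_sample_size(two_cherry_correlation(0.5))
# Nearly identical sister species: each cherry behaves almost like one observation.
n_close = mean_effective_sample_size(two_cherry_correlation(0.99))
```

For this toy tree each cherry contributes 2 / (1 + rho), so the effective size falls from 4 for independent tips towards 2 as the sister species become more strongly correlated.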

A probabilistic method for identifying sex-linked genes using RNA-seq-derived genotyping data

Aline Muyle, Jos Käfer, Niklaus Zemp, Sylvain Mousset, Franck Picard, Gabriel AB Marais
doi: http://dx.doi.org/10.1101/023358

The genetic basis of sex determination remains unknown for the vast majority of organisms with separate sexes. A key question is whether a species has sex chromosomes (SC). SC presence indicates genetic sex determination, and their sequencing may help identify the sex-determining genes and understand the molecular mechanisms of sex determination. Identifying SC, especially homomorphic SC, can be difficult. Sequencing SC is also very challenging, in particular the repeat-rich non-recombining regions. A novel approach for identifying sex-linked genes and SC has recently been proposed: using RNA-seq to genotype male and female individuals and study sex-linkage. This approach entails a modest sequencing effort and does not require prior genomic or genetic resources, and is thus particularly suited to studying non-model organisms. Applying this approach to many organisms is, however, difficult due to the lack of an appropriate statistically grounded pipeline to analyse the data. Here we propose a model-based method to infer sex-linkage using a maximum likelihood framework and genotyping data from a full-sib family, which can be obtained for most organisms that can be grown in the lab and for economically important animals/plants. Our method works on any type of SC (XY, ZW, UV) and has been embedded in a pipeline that includes a genotyper specifically developed for RNA-seq data. Validation on empirical and simulated data indicates that our pipeline is particularly relevant for studying SC of recent or intermediate age but can return useful information in old systems as well; it is available as a Galaxy workflow.

Interpreting the dependence of mutation rates on age and time

Ziyue Gao, Minyoung J. Wyman, Guy Sella, Molly Przeworski
(Submitted on 24 Jul 2015)

Mutations can arise from the chance misincorporation of nucleotides during DNA replication or from DNA lesions that are not repaired correctly. We introduce a model that relates the source of mutations to their accumulation with cell divisions, providing a framework for understanding how mutation rates depend on sex, age and absolute time. We show that the accrual of mutations should track cell divisions not only when mutations are replicative in origin but also when they are non-replicative and repaired efficiently. One implication is that the higher incidence of cancer in rapidly renewing tissues, an observation ascribed to replication errors, could instead reflect exogenous or endogenous mutagens. We further find that only mutations that arise from inefficiently repaired lesions will accrue according to absolute time; thus, in the absence of selection on mutation rates, the phylogenetic “molecular clock” should not be expected to run steadily across species.

The Nicrophorus vespilloides genome and methylome, a beetle with complex social behavior

Christopher B Cunningham, Lexiang Ji, R. Axel W Wiberg, Jennifer M Shelton, Elizabeth C McKinney, Darren J Parker, Richard B Meagher, Kyle M Benowitz, Eileen M Roy-Zokan, Michael G Ritchie, Susan J Brown, Robert J Schmitz, Allen J Moore
doi: http://dx.doi.org/10.1101/023093

Testing for conserved and novel mechanisms underlying phenotypic evolution requires a diversity of genomes available for comparison spanning multiple independent lineages. For example, complex social behavior in insects has been investigated primarily with eusocial lineages, nearly all of which are Hymenoptera. If conserved genomic influences on sociality do exist, we need data from a wider range of taxa that also vary in their levels of sociality. Here we present information on the genome of the subsocial beetle Nicrophorus vespilloides, a species long used to investigate evolutionary questions of complex social behavior. We used this genome to address two questions. First, does life history predict overlap in gene models more strongly than phylogenetic groupings? Second, like other insects with highly developed social behavior but unlike other beetles, does N. vespilloides have DNA methylation? We found the overlap in gene models was similar between N. vespilloides and all other insect groups regardless of life history. Unlike previous studies of beetles, we found strong evidence of DNA methylation, which allows this species to be used to address questions about the potential role of methylation in social behavior. The addition of this genome adds a coleopteran resource to answer questions about the evolution and mechanistic basis of sociality.

Stable recombination hotspots in birds

Sonal Singhal, Ellen Leffler, Keerthi Sannareddy, Isaac Turner, Oliver Venn, Daniel Hooper, Alva Strand, Qiye Li, Brian Raney, Christopher Balakrishnan, Simon Griffith, Gil McVean, Molly Przeworski
doi: http://dx.doi.org/10.1101/023101

Although the DNA-binding protein PRDM9 plays a critical role in the specification of meiotic recombination hotspots in mice and apes, it appears to be absent from many vertebrate species, including birds. To learn about the determinants of fine-scale recombination rates and their evolution in natural populations lacking PRDM9, we inferred fine-scale recombination maps from population resequencing data for two bird species, the zebra finch Taeniopygia guttata, and the long-tailed finch, Poephila acuticauda, whose divergence is on par with that between human and chimpanzee. We find that both bird species have hotspots, and these are enriched near CpG islands and transcription start sites. In sharp contrast to what is seen in mice and apes, the hotspots are largely shared between the two species, with indirect evidence of conservation extending across bird species tens of millions of years diverged. These observations link the evolution of hotspots to their genetic architecture, suggesting that in the absence of PRDM9 binding specificity, accessibility of the genome to the cellular recombination machinery, particularly around functional genomic elements, both enables increased recombination and constrains its evolution.

Conflict and cooperation in eukaryogenesis: implications for the timing of endosymbiosis and the evolution of sex

Arunas L Radzvilavicius, Neil W Blackstone
doi: http://dx.doi.org/10.1101/023077

The complex eukaryotic cell is a result of an ancient endosymbiosis and one of the major evolutionary transitions. The timing of key eukaryotic innovations relative to the acquisition of mitochondria remains subject to considerable debate, yet the evolutionary process itself might constrain the order of these events. Endosymbiosis entailed levels-of-selection conflicts, and mechanisms of conflict mediation had to evolve for eukaryogenesis to proceed. The initial mechanisms of conflict mediation were based on the pathways inherited from prokaryotic symbionts and led to metabolic homeostasis in the eukaryotic cell, while later mechanisms (e.g., mitochondrial gene transfer) contributed to the expansion of the eukaryotic genome. Perhaps the greatest opportunity for conflict arose with the emergence of sex involving whole-cell fusion. While early evolution of cell fusion may have affected symbiont acquisition, sex together with the competitive symbiont behaviour would have destabilised the emerging higher-level unit. Cytoplasmic mixing, on the other hand, would have been beneficial for selfish endosymbionts, capable of using their own metabolism to manipulate the life history of the host. Given the results of our mathematical modelling, we argue that sex represents a rather late proto-eukaryotic innovation, allowing for the growth of the chimeric nucleus and contributing to the successful completion of the evolutionary transition.

Morphological data is lacking for living mammals

Thomas Guillerme, Natalie Cooper
doi: http://dx.doi.org/10.1101/022970

Combining living and fossil data in the same analysis is crucial for studying changes in global biodiversity through time. One method that allows this data to be combined is the Total Evidence method, which uses both molecular data for living species and morphological data for both living and fossil species. With this method, a good overlap of morphological data between living and fossil taxa is crucial for accurately inferring the phylogenies’ topology. Since the advent of DNA sequencing, molecular data has become easily and widely available. However, despite two centuries of morphological studies, scientists using and generating such data mainly focus on palaeontological data. Therefore, there is a gap in our knowledge of neontological morphological data even in well-studied groups such as mammals. In this study, we quantify the morphological data available for living mammal taxa. We then analyse the structure of the available data by testing if it is clustered or evenly spread across the phylogeny. We found that 78% of mammalian orders have less than 25% data available at the species level. However, we found that the available data is often randomly distributed among these orders, apart from six of them where the data is clustered.