Genomic DNA transposition induced by human PGBD5

Anton Henssen, Elizabeth Henaff, Eileen Jiang, Amy R Eisenberg, Julianne R Carson, Camila Villasante, Mondira Ray, Eric Still, Melissa Burns, Jorge Gandara, Cedric Feschotte, Christopher E. Mason, Alex Kentsis
doi: http://dx.doi.org/10.1101/023887

Transposons are mobile genetic elements that are found in nearly all organisms, including humans. Mobilization of DNA transposons by transposase enzymes can cause genomic rearrangements, but our knowledge of human genes derived from transposases is limited. Here, we find that the protein encoded by human PGBD5, the most evolutionarily conserved transposable element-derived gene in chordates, can induce stereotypical cut-and-paste DNA transposition in human cells. Genomic integration activity of PGBD5 requires distinct aspartic acid residues in its transposase domain, and specific DNA sequences with inverted terminal repeats with similarity to piggyBac transposons. DNA transposition catalyzed by PGBD5 in human cells occurs genome-wide, with precise transposon excision and preference for insertion at TTAA sites. The apparent conservation of DNA transposition activity by PGBD5 raises the possibility that genomic remodeling may contribute to its biological function.

Species Tree Estimation from Genome-wide Data with Guenomu

Leonardo de Oliveira Martins, David Posada
doi: http://dx.doi.org/10.1101/023861

The history of particular genes and that of the species that carry them can differ for several reasons. In particular, gene trees and species trees can truly differ due to well-known evolutionary processes like gene duplication and loss, lateral gene transfer or incomplete lineage sorting. Several species tree reconstruction methods have been developed to take this incongruence into account; they can be divided broadly into supertree and supermatrix approaches. Here, we introduce a new Bayesian hierarchical model, implemented in the program Guenomu, that considers multiple sources of gene tree/species tree disagreement. Guenomu takes as input the posterior distributions of unrooted gene tree topologies for multiple gene families and estimates the posterior distribution of rooted species tree topologies.

Population genomic scans reveal novel genes underlie convergent flowering time evolution in the introduced range of Arabidopsis thaliana

Billie Gould, John R Stinchcombe
doi: http://dx.doi.org/10.1101/023788

A long-standing question in evolutionary biology is whether the evolution of convergent phenotypes results from selection on the same heritable genetic components. Using whole genome sequencing and genome scans, we tested whether the evolution of parallel longitudinal flowering time clines in the native and introduced ranges of Arabidopsis thaliana has a similar genetic basis. We found that common variants of large effect on flowering time in the native range do not appear to have been under recent strong selection in the introduced range. Genes in regions of the genome that are under selection for flowering time are also not enriched for functions related to development or environmental sensing. We instead identified a set of 53 new candidate genes putatively linked to the evolution of flowering time in the species' introduced range. A high degree of conditional neutrality of flowering time variants between the native and introduced ranges may preclude parallel evolution at the level of genes. Overall, neither gene pleiotropy nor available standing genetic variation appears to have restricted the evolution of flowering time in the introduced range to high frequency variants from the native range or to known flowering time pathway genes.

Detection of adaptive shifts on phylogenies using shifted stochastic processes on a tree

Paul Bastide, Mahendra Mariadassou, Stéphane Robin
doi: http://dx.doi.org/10.1101/023804

Comparative and evolutionary ecologists are interested in the distribution of quantitative traits among related species. The classical framework for these distributions consists of a random process running along the branches of a phylogenetic tree relating the species. We consider shifts in the process parameters, which reveal fast adaptation to changes of ecological niches. We show that models with shifts are not identifiable in general. Constraining the models to be parsimonious in the number of shifts partially alleviates the problem, but several evolutionary scenarios can still provide the same joint distribution for the extant species. We provide a recursive algorithm to enumerate all the equivalent scenarios and to count the effectively different scenarios. We introduce an incomplete-data framework and develop a maximum likelihood estimation procedure based on the EM algorithm. Finally, we propose a model selection procedure, based on the cardinality of effective scenarios, to estimate the number of shifts and prove an oracle inequality.

Most viewed on Haldane’s Sieve: July 2015

The most viewed posts on Haldane’s Sieve this month were:

The constant philopater hypothesis: a new life history invariant for dispersal evolution

António M. M. Rodrigues, Andy Gardner
doi: http://dx.doi.org/10.1101/023655
Life history invariants play a pivotal role in the study of social adaptation: they provide theoretical hypotheses that can be empirically tested, and benchmark frameworks against which new theoretical developments can be understood. Here we derive a novel invariant for dispersal evolution: the “constant philopater hypothesis” (CPH). Specifically, we find that, irrespective of variation in maternal fecundity, all mothers are favoured to produce exactly the same number of philopatric offspring, with high-fecundity mothers investing proportionally more, and low-fecundity mothers investing proportionally less, into dispersing offspring. This result holds for female and male dispersal, under haploid, diploid and haplodiploid modes of inheritance, irrespective of the sex ratio, local resource availability, and whether mother or offspring controls the latter’s dispersal propensity. We explore the implications of this result for evolutionary conflicts of interest – and the exchange and withholding of contextual information – both within and between families, and we show that the CPH is the fundamental invariant that underpins and explains a wider family of invariance relationships that emerge from the study of social evolution.

Multilocus sex determination revealed in two populations of gynodioecious wild strawberry, Fragaria vesca subsp. bracteata

Tia-Lynn Ashman, Jacob Tennessen, Rebecca Dalton, Rajanikanth Govindarajulu, Mathew Koski, Aaron Liston
doi: http://dx.doi.org/10.1101/023713

Gynodioecy, the coexistence of females and hermaphrodites, occurs in 20% of angiosperm families and often enables transitions between hermaphroditism and dioecy. Clarifying mechanisms of sex determination in gynodioecious species can thus illuminate sexual system evolution. Genetic determination of gynodioecy, however, can be complex and is not fully characterized in any wild species. We used targeted sequence capture to genetically map a novel nuclear contributor to male sterility in a self-pollinated hermaphrodite of Fragaria vesca subsp. bracteata from the southern portion of its range. To understand its interaction with another identified locus and possibly additional loci, we performed crosses within and between two populations separated by 2000 km, phenotyped the progeny and sequenced candidate markers at both sex-determining loci. The newly mapped locus contains a high density of pentatricopeptide repeat genes, a class commonly involved in restoration of fertility caused by cytoplasmic male sterility. Examination of all crosses revealed three unlinked epistatically interacting loci that determine sexual phenotype and vary in frequency between populations. Fragaria vesca subsp. bracteata represents the first wild gynodioecious species with genomic evidence of both cytoplasmic and nuclear genes in sex determination. We propose a model for the interactions between these loci and new hypotheses for the evolution of sex determining chromosomes in the subdioecious and dioecious Fragaria.

Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines

John G. Cleary, Ross Braithwaite, Kurt Gaastra, Brian S Hilbush, Stuart Inglis, Sean A Irvine, Alan Jackson, Richard Littin, Mehul Rathod, David Ware, Justin M. Zook, Len Trigg, Francisco M. M. De La Vega
doi: http://dx.doi.org/10.1101/023754
To evaluate and compare the performance of variant calling methods and their confidence scores, comparisons between a test call set and a “gold standard” need to be carried out. Unfortunately, these comparisons are not straightforward with the current Variant Call Files (VCF), which are the standard output of most variant calling algorithms for high-throughput sequencing data. Comparisons of VCFs are often confounded by the different representations of indels, MNPs, and combinations thereof with SNVs in complex regions of the genome, resulting in misleading results. A variant caller is inherently a classification method designed to score putative variants with confidence scores that could permit controlling the rate of false positives (FP) or false negatives (FN) for a given application. Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) are efficient metrics to evaluate a test call set versus a gold standard. However, in the case of VCF data this also requires special accounting to deal with discrepant representations. We developed a novel algorithm for comparing variant call sets that deals with complex call representation discrepancies through a dynamic programming method that minimizes false positives and negatives globally across the entire call sets, allowing accurate performance evaluation of VCFs.
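The representation problem described in this abstract can be illustrated with a small sketch. This is not the authors' dynamic programming algorithm; it only shows, under simplifying assumptions (a single haploid sequence, non-overlapping variants, 0-based positions), why record-by-record comparison of two call sets can mislead while replaying each call set onto the reference reveals that they describe the same haplotype. The function names `apply_variants` and `same_haplotype` are hypothetical.

```python
def apply_variants(reference, variants):
    """Replay non-overlapping variants onto a reference string.

    Each variant is a (pos, ref, alt) tuple with a 0-based position
    (an assumption of this sketch; VCF itself is 1-based).
    """
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError("overlapping variants are not supported")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(reference[cursor:pos])  # unchanged sequence before the variant
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # unchanged tail
    return "".join(pieces)


def same_haplotype(reference, calls_a, calls_b):
    """True when both call sets produce the same sequence after replay."""
    return apply_variants(reference, calls_a) == apply_variants(reference, calls_b)


reference = "GCACACAT"

# The same deletion of one "CA" repeat unit, placed at two different positions.
left_aligned = [(0, "GCA", "G")]
right_shifted = [(4, "ACA", "A")]

# One MNP versus two adjacent SNVs.
mnp = [(1, "CA", "TG")]
two_snvs = [(1, "C", "T"), (2, "A", "G")]

print(left_aligned == right_shifted, same_haplotype(reference, left_aligned, right_shifted))
print(mnp == two_snvs, same_haplotype(reference, mnp, two_snvs))
```

In both pairs the records differ textually, yet the replayed sequences are identical, so a naive comparison would count a false positive and a false negative where there is really one matching call.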

Genome-Wide Scan for Adaptive Divergence and Association with Population-Specific Covariates

Mathieu Gautier
doi: http://dx.doi.org/10.1101/023721

In population genomics studies, accounting for the neutral covariance structure across population allele frequencies is critical to improve the robustness of genome-wide scan approaches. Elaborating on the BayEnv model, this study investigates several modeling extensions i) to improve the estimation accuracy of the population covariance matrix and all the related measures; ii) to identify significantly overly differentiated SNPs based on a calibration procedure of the XtX statistics; and iii) to consider alternative covariate models for analyses of association with population-specific covariables. In particular, the auxiliary variable model makes it possible to deal with multiple testing issues and, provided the relative marker positions are available, to capture some linkage disequilibrium information. A comprehensive simulation study is further carried out to investigate and compare the performance of the different models. For illustration purposes, genotyping data on 18 French cattle breeds are also analyzed, leading to the identification of thirteen strong signatures of selection. Among these, four (surrounding the KITLG, KIT, EDN3 and ALB genes) contained SNPs strongly associated with the piebald coloration pattern while a fifth (surrounding PLAG1) could be associated with morphological differences across the populations. Finally, analysis of Pool–Seq data from 12 populations of Littorina saxatilis living in two different ecotypes illustrates how the proposed framework might help address relevant ecological questions in non-model species. Overall, the proposed methods define a robust Bayesian framework to characterize adaptive genetic differentiation across populations. The BayPass program implementing the different models is available at http://www1.montpellier.inra.fr/CBGP/software/baypass/.

Length Distribution of Ancestral Tracks under a General Admixture Model and Its Applications in Population History Inference

Xumin Ni, Xiong Yang, Wei Guo, Kai Yuan, Ying Zhou, Zhiming Ma, Shuhua Xu
doi: http://dx.doi.org/10.1101/023390

As a chromosome is sliced into pieces by recombination after entering an admixed population, ancestral tracks of chromosomes are shortened with the passing of generations. The length distribution of ancestral tracks reflects information about recombination and thus can be used to infer the histories of admixed populations. Previous studies have shown that inference based on ancestral tracks is powerful in recovering the histories of admixed populations. However, population histories are always complex, and previous studies only deduced the length distribution of ancestral tracks under very simple admixture models. The deduction of the length distribution of ancestral tracks under a more general model will greatly elevate the power in inferring population histories. Here we first deduced the length distribution of ancestral tracks under a general model in an admixed population, and proposed general principles in parameter estimation and model selection with the length distribution. Next, we focused on studying the length distribution of ancestral tracks and its applications under three typical admixture models, which were all special cases of our general model. Extensive simulations showed that the length distribution of ancestral tracks was well predicted by our theoretical models. We further developed a new method based on the length distribution of ancestral tracks and good performance was observed when it was applied in inferring population histories under the three typical models. Notably, our method was insensitive to demographic history, sample size and threshold to discard short tracks. Finally, we applied our method to African Americans and Mexicans from the HapMap dataset, and several South Asian populations from the Human Genome Diversity Project dataset. The results showed that the histories of African Americans and Mexicans matched the historical records well, and the population admixture history of South Asians was very complex and could be traced back to around 100 generations ago.
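To make the idea of inferring admixture time from track lengths concrete, here is a minimal sketch under the simplest textbook case, not the general model derived in the preprint. It assumes a single admixture pulse T generations ago, with a fraction m of ancestry from population 1, and the common approximation that population 1 tracks, measured in Morgans, are exponentially distributed with rate (1 - m) * T. Under that assumption the maximum likelihood estimate of T is 1 / ((1 - m) * mean track length). The function names and parameter values are hypothetical.

```python
import random


def simulate_track_lengths(m, generations, n_tracks, rng):
    """Draw track lengths (in Morgans) for ancestry from population 1.

    Assumes a single-pulse model in which track lengths are exponential
    with rate (1 - m) * generations. This is an approximation, not the
    general model of the preprint.
    """
    rate = (1.0 - m) * generations
    return [rng.expovariate(rate) for _ in range(n_tracks)]


def estimate_generations(track_lengths, m):
    """Maximum likelihood estimate of admixture time under the same assumption."""
    mean_length = sum(track_lengths) / len(track_lengths)
    return 1.0 / ((1.0 - m) * mean_length)


rng = random.Random(2015)      # fixed seed so the sketch is reproducible
true_generations = 10          # hypothetical admixture time
m = 0.8                        # hypothetical ancestry proportion from population 1

tracks = simulate_track_lengths(m, true_generations, n_tracks=20000, rng=rng)
estimate = estimate_generations(tracks, m)
print(f"true T = {true_generations}, estimated T = {estimate:.2f}")
```

Real analyses must also handle the truncation caused by discarding short tracks and by finite chromosome length, which this sketch ignores.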