Estimating the temporal and spatial extent of gene flow among sympatric lizard populations (genus Sceloporus) in the southern Mexican highlands

Jared A Grummer, Martha L. Calderón, Adrián Nieto Montes-de Oca, Eric N Smith, Fausto Méndez-de la Cruz, Adam Leaché
doi: http://dx.doi.org/10.1101/008623

Interspecific gene flow is pervasive throughout the tree of life. Although detecting gene flow between populations has been facilitated by new analytical approaches, determining the timing and geography of hybridization has remained difficult, particularly for historical gene flow. A geographically explicit phylogenetic approach is needed to determine the ancestral population overlap. In this study, we performed population genetic analyses, species delimitation, simulations, and a recently developed approach of species tree diffusion to infer the phylogeographic history, timing and geographic extent of gene flow in lizards of the Sceloporus spinosus group. The two species in this group, S. spinosus and S. horridus, are distributed in eastern and western portions of Mexico, respectively, but populations of these species are sympatric in the southern Mexican highlands. We generated data consisting of three mitochondrial genes and eight nuclear loci for 148 and 68 individuals, respectively. We delimited six lineages in this group, but found strong evidence of mito-nuclear discordance in sympatric populations of S. spinosus and S. horridus owing to mitochondrial introgression. We used coalescent simulations to differentiate ancestral gene flow from secondary contact, but found mixed support for these two models. Bayesian phylogeography indicated more than 60% range overlap between ancestral S. spinosus and S. horridus populations since the time of their divergence. Isolation-migration analyses, however, revealed near-zero levels of gene flow between these ancestral populations. Taken together, results from simulations and empirical data indicate that, despite a long history of sympatry between these two species, gene flow in this group has occurred only recently.

Strong selection in the human-chimpanzee ancestor links the X chromosome to speciation

Julien Y Dutheil, Kasper Munch, Thomas Mailund, Kiwoong Nam, Mikkel Schierup
doi: http://dx.doi.org/10.1101/011601

The human and chimpanzee X chromosomes are less divergent than expected based on autosomal divergence. This has led to a controversial hypothesis proposing a unique role of the X chromosome in complex human-chimpanzee speciation. Here, we study incomplete lineage sorting patterns between humans, chimpanzees and gorillas to show that this low divergence is entirely due to megabase-sized regions comprising one-third of the X chromosome, where polymorphism in the human-chimpanzee ancestral species was severely reduced. Background selection can explain 10% of this reduction at most. Instead, we show that several strong selective sweeps in the ancestral species can explain these patterns. We also report evidence of population specific sweeps of a similar magnitude in extant humans that overlap the regions of low diversity in the ancestral species. These regions further correspond to chromosomal sections shown to be devoid of Neanderthal introgression into modern humans. This suggests that these X-linked regions are directly involved in forming reproductive barriers.

The seed-bank coalescent

Jochen Blath, Adrián González Casanova, Noemi Kurt, Maite Wilke-Berenguer
(Submitted on 18 Nov 2014)

We identify a new natural coalescent structure, the seed-bank coalescent, which describes the gene genealogy of populations under the influence of a strong seed-bank effect, where ‘dormant forms’ of individuals (such as seeds or spores) may jump a significant number of generations before joining the ‘active’ population. Mathematically, our seed-bank coalescent appears as a scaling limit in a Wright-Fisher model with geometric seed-bank age structure if the average time of seed dormancy scales with the order of the total population size N. This extends earlier results of Kaj, Krone and Lascoux (2001), who show that the genealogy of a Wright-Fisher model in the presence of a ‘weak’ seed-bank effect is given by a suitably time-changed Kingman coalescent. The qualitatively new feature of the seed-bank coalescent is that ancestral lineages are independently blocked at a certain rate from taking part in coalescence events, thus strongly altering the predictions of classical coalescent models. In particular, the seed-bank coalescent ‘does not come down from infinity’, and the time to the most recent common ancestor of a sample of size n grows like log log n, which is the order also observed for the Bolthausen-Sznitman coalescent. This is in line with the empirical observation that seed-banks drastically increase genetic variability in a population and indicates how they may serve as a buffer against other evolutionary forces such as genetic drift and selection.

Triticeae resources in Ensembl Plants

Dan M Bolser, Arnaud Kerhornou, Brandon Walts, Paul Kersey
doi: http://dx.doi.org/10.1101/011585

Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last two years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison to other, more completely assembled species suggests that coverage of genic regions is likely to be high. Ensembl Plants (http://plants.ensembl.org) is an integrative resource organising, analysing and visualising genome-scale information for important crop and model plants. Available data includes reference genome sequence, variant loci, gene models and functional annotation. For variant loci, individual and population genotypes, linkage information and, where available, phenotypic information, are shown. Comparative analyses are performed on DNA and protein sequence alignments. The resulting genome alignments and gene trees, representing the implied evolutionary history of the gene family, are made available for visualisation and analysis. Driven by the use case of bread wheat, specific extensions to the analysis pipelines and web interface have recently been developed to support polyploid genomes. Data in Ensembl Plants is accessible through a genome browser incorporating various specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These interfaces are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests and pollinators, facilitating the study of the plant in its environment.

Expansion load: recessive mutations and the role of standing genetic variation

Stephan Peischl, Laurent Excoffier
doi: http://dx.doi.org/10.1101/011593

Expanding populations incur a mutation burden – the so-called expansion load. Previous studies of expansion load have focused on co-dominant mutations. An important consequence of this assumption is that expansion load stems exclusively from the accumulation of new mutations occurring in individuals living at the wave front. Using individual-based simulations we study here the dynamics of standing genetic variation at the front of expansions, and its consequences on mean fitness if mutations are recessive. We find that deleterious genetic diversity is quickly lost at the front of the expansion, but the loss of deleterious mutations at some loci is compensated by an increase of their frequencies at other loci. The frequency of deleterious homozygotes therefore increases along the expansion axis whereas the average number of deleterious mutations per individual remains nearly constant across the species range. This reveals two important differences to co-dominant models: (i) mean fitness at the front of the expansion drops much faster if mutations are recessive, and (ii) mutation load can increase during the expansion even if the total number of deleterious mutations per individual remains constant. We use our model to make predictions about the shape of the site frequency spectrum at the front of range expansion, and about correlations between heterozygosity and fitness in different parts of the species range. Importantly, these predictions provide opportunities to empirically validate our theoretical results. We discuss our findings in the light of recent results on the distribution of deleterious genetic variation across human populations, and link them to empirical results on the correlation of heterozygosity and fitness found in many natural range expansions.
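
Point (ii) above can be illustrated with a small worked example. The sketch below is not the authors' simulation: it assumes a simple multiplicative fitness model with a hypothetical selection coefficient `s` and dominance coefficient `h` (heterozygote fitness 1 - h*s, homozygote fitness 1 - s), and compares two individuals that carry the same four deleterious alleles, either spread over four heterozygous loci or concentrated in two homozygous loci.

```python
def fitness(het_loci, hom_loci, s, h):
    """Multiplicative fitness across loci (an assumed model, not from the paper).

    het_loci: number of loci heterozygous for a deleterious allele
    hom_loci: number of loci homozygous for a deleterious allele
    s: selection coefficient (hypothetical value)
    h: dominance coefficient (0.5 = co-dominant, 0.0 = fully recessive)
    """
    return (1 - h * s) ** het_loci * (1 - s) ** hom_loci


s = 0.05  # illustrative value only

# Both individuals carry four deleterious alleles in total.
spread = {"het_loci": 4, "hom_loci": 0}        # one copy at each of four loci
concentrated = {"het_loci": 0, "hom_loci": 2}  # two copies at each of two loci

for label, h in [("co-dominant", 0.5), ("recessive", 0.0)]:
    w_spread = fitness(s=s, h=h, **spread)
    w_conc = fitness(s=s, h=h, **concentrated)
    print(f"{label}: spread={w_spread:.4f} concentrated={w_conc:.4f}")
```

Under these assumptions the two individuals have nearly the same fitness when mutations are co-dominant, whereas with recessive mutations only the individual with homozygous loci pays a cost. This is one way to see how load can rise while the number of mutations per individual stays constant.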

Full-genome evolutionary histories of selfing, splitting and selection in Caenorhabditis

Cristel G. Thomas, Wei Wang, Richard Jovelin, Rajarshi Ghosh, Tatiana Lomasko, Quang Trinh, Leonid Kruglyak, Lincoln D Stein, Asher D Cutter
doi: http://dx.doi.org/10.1101/011502

The nematode Caenorhabditis briggsae is a model for comparative developmental evolution with C. elegans. Worldwide collections of C. briggsae have implicated an intriguing history of divergence among genetic groups separated by latitude, or by restricted geography, that is being exploited to dissect the genetic basis of adaptive evolution and reproductive incompatibility. And yet, the genomic scope and timing of population divergence are unclear. We performed high-coverage whole-genome sequencing of 37 wild isolates of the nematode C. briggsae and applied a pairwise sequentially Markovian coalescent (PSMC) model to 703 combinations of genomic haplotypes to draw inferences about population history, the genomic scope of natural selection, and to compare with 40 wild isolates of C. elegans. We estimate that a diaspora of at least 6 distinct C. briggsae lineages separated from one another approximately 200 thousand generations ago, including the “Temperate” and “Tropical” phylogeographic groups that dominate most samples from around the world. Moreover, an ancient population split in its history 2 million generations ago, coupled with only rare gene flow among lineage groups, validates this system as a model for incipient speciation. Low versus high recombination regions of the genome give distinct signatures of population size change through time, indicative of widespread effects of selection on highly linked portions of the genome owing to extreme inbreeding by self-fertilization. Analysis of functional mutations indicates that genomic context, owing to selection that acts on long linkage blocks, is a more important driver of population variation than are the functional attributes of the individually encoded genes.

Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants

Sarah A Gagliano, Reena Ravji, Michael R Barnes, Michael E Weale, Jo Knight
doi: http://dx.doi.org/10.1101/011445

Although technology has triumphed in facilitating routine genome re-sequencing, new challenges have been created for the data analyst. Genome scale surveys of human disease variation generate volumes of data that far exceed capabilities for laboratory characterization, and importantly also create a substantial burden of type I error. By incorporating a variety of functional annotations as predictors, such as regulatory and protein coding elements, statistical learning has been widely investigated as a mechanism for the prioritization of genetic variants that are more likely to be associated with complex disease. These methods offer a hope of identification of sufficiently large numbers of truly associated variants, to make cost-effective the large-scale functional characterization necessary to progress genome scale experiments. We compared the results from three published prioritization procedures which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding of the functional annotations. In this paper we also explore different combinations of algorithm and annotation set. We train the models on 60% of the data and reserve the remainder for testing the accuracy. As an application, we tested which methodology performed the best for prioritizing sub-genome-wide-significant variants using data from the first and second rounds of a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71). However, predictive accuracy results obtained from the test set do not always reflect results obtained from the application to the schizophrenia meta-analysis. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome scale datasets; however, none offers more than incremental improvement in prediction. We discuss how methods might be evolved towards the step change in the risk variant prediction required to address the impending bottleneck of the new generation of genome re-sequencing studies.
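
The evaluation protocol mentioned in the abstract above (fit on 60% of the data, test on the rest, report AUC) can be sketched in plain Python. This is a generic illustration, not the authors' pipeline: the helper names `split_60_40` and `auc`, the random seed, and the toy scores are all hypothetical. AUC is computed here as the probability that a randomly chosen true risk variant receives a higher score than a randomly chosen non-risk variant, with ties counted as one half.

```python
import random


def split_60_40(items, seed=0):
    """Shuffle a copy of `items` and split it 60% / 40% (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 6 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: (prioritization score, label) pairs, invented for illustration.
variants = [(0.9, 1), (0.7, 1), (0.6, 0), (0.55, 1), (0.4, 0),
            (0.35, 0), (0.3, 1), (0.2, 0), (0.15, 0), (0.1, 0)]

train, test = split_60_40(variants)
# A real analysis would fit a model on `train` and score `test`;
# here the toy scores are simply evaluated on the whole set.
scores = [s for s, _ in variants]
labels = [y for _, y in variants]
print(f"train={len(train)} test={len(test)} AUC={auc(scores, labels):.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported range of 0.64-0.71.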