The Time-Scale of Recombination Rate Evolution in Great Apes

Laurie S Stevison, August E Woerner, Jeffrey M Kidd, Joanna L Kelley, Krishna R Veeramah, Kimberly F McManus, Carlos D Bustamante, Michael F Hammer, Jeffrey D Wall
doi: http://dx.doi.org/10.1101/013755

We present three linkage-disequilibrium (LD)-based recombination maps generated using whole-genome sequencing data of 10 Nigerian chimpanzees, 13 bonobos, and 15 western gorillas, collected as part of the Great Ape Genome Project (Prado-Martinez et al. 2013). Using species-specific PRDM9 sequences to predict potential binding sites, we identified an important role for PRDM9 in predicting recombination rate variation broadly across great apes. Our results contradict previous research suggesting that PRDM9 is not associated with recombination in western chimpanzees (Auton et al. 2012). Additionally, we show that fewer hotspots are shared among chimpanzee subspecies than within human populations, further narrowing the time-scale of complete hotspot turnover. We quantified the variation in the biased distribution of recombination rates towards recombination hotspots across great apes. We found that correlations between broad-scale recombination rates decline more rapidly than nucleotide divergence between species. We also compared the skew of recombination rates at centromeres and telomeres between species and show a skew from chromosome means extending as far as 10-15 Mb from chromosome ends. Further, we examined broad-scale recombination rate changes near a translocation in gorillas and found minimal differences compared to other great ape species, perhaps because the coordinates relative to the chromosome ends were unaffected. Finally, based on multiple linear regression analysis, we found that various correlates of recombination rate persist throughout primates, including repeats, diversity, divergence and local effective population size (Ne). Our study is the first to analyze within- and between-species genome-wide recombination rate variation in several close relatives.

The P-element strikes again: the recent invasion of natural Drosophila simulans populations

Robert Kofler, Tom Hill, Viola Nolte, Andrea Betancourt, Christian Schlötterer
doi: http://dx.doi.org/10.1101/013722

The P-element is one of the best understood eukaryotic transposable elements. It invaded Drosophila melanogaster populations within a few decades, but was thought to be absent from close relatives, including D. simulans. Five decades after the spread in D. melanogaster, we provide evidence that the P-element has also invaded D. simulans. P-elements in D. simulans appear to have been acquired recently from D. melanogaster, probably via a single horizontal transfer event. Expression data indicate that the P-element is processed in the germline of D. simulans, and genomic data show an enrichment of P-element insertions in putative origins of replication, similar to that seen in D. melanogaster. This ongoing spread of the P-element in natural populations provides a unique opportunity to understand the dynamics of transposable element spreads and the associated piRNA defense mechanisms.

Distributions of topological tree metrics between a species tree and a gene tree

Jing Xi, Jin Xie, Ruriko Yoshida
(Submitted on 10 Jan 2015)

In order to conduct a statistical analysis on a given set of phylogenetic gene trees, we often use a distance measure between two trees. In a statistical distance-based method to analyze discordance between gene trees, it is key to choose a “biologically meaningful” and “statistically well-distributed” distance between trees. Thus, in this paper, we study the distributions of three tree distance metrics between two trees: the edge difference, the path difference, and the precise K interval cospeciation distance. First, we focus on the distributions of the three tree distances between two random unrooted trees with n leaves (n≥4); then we focus on the distributions of the three tree distances between a fixed rooted species tree with n leaves and a random gene tree with n leaves generated under the coalescent process given the species tree. We present some theoretical results as well as a simulation study on these distributions.
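
To make one of these metrics concrete, the sketch below computes a path difference between two small unrooted trees. It assumes the commonly used definition (the Euclidean distance between the two trees' vectors of leaf-to-leaf edge counts); the paper's exact definition and normalization may differ. The helper names `make_tree`, `leaf_path_lengths` and `path_difference` are hypothetical, and the two quartet trees are made-up inputs.

```python
from collections import deque
from math import sqrt


def make_tree(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def leaf_path_lengths(adj, leaves):
    """Count the edges on the path between every pair of leaves (BFS per leaf)."""
    lengths = {}
    for src in leaves:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        for dst in leaves:
            if src < dst:  # store each unordered pair once
                lengths[(src, dst)] = dist[dst]
    return lengths


def path_difference(adj1, adj2, leaves):
    """Euclidean distance between the two leaf-pair path-length vectors
    (assumed definition; both trees must share the same leaf labels)."""
    d1 = leaf_path_lengths(adj1, leaves)
    d2 = leaf_path_lengths(adj2, leaves)
    return sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))


# Illustrative quartet trees: AB|CD versus AC|BD, with internal nodes x and y.
leaves = ["A", "B", "C", "D"]
t1 = make_tree([("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")])
t2 = make_tree([("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"), ("D", "y")])

print(path_difference(t1, t2, leaves))  # 2.0: four leaf pairs each differ by one edge
```

Sampling many random tree pairs and collecting this value would give an empirical version of the distributions the abstract describes, though the sampling scheme itself is not sketched here.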

Reprogramming LCLs to iPSCs Results in Recovery of Donor-Specific Gene Expression Signature

Samantha M Thomas, Courtney Kagan, Bryan J Pavlovic, Jonathan Burnett, Kristen Patterson, Jonathan K Pritchard, Yoav Gilad
doi: http://dx.doi.org/10.1101/013631

Renewable in vitro cell cultures, such as lymphoblastoid cell lines (LCLs), have facilitated studies that contributed to our understanding of genetic influence on human traits. However, the degree to which cell lines faithfully maintain differences in donor-specific phenotypes is still debated. We have previously reported that standard cell line maintenance practice results in a loss of donor-specific gene expression signatures in LCLs. An alternative to the LCL model is the induced pluripotent stem cell (iPSC) system, which carries the potential to model tissue-specific physiology through the use of differentiation protocols. Still, existing LCL banks represent an important source of starting material for iPSC generation, and it is possible that the disruptions in gene regulation associated with long-term LCL maintenance could persist through the reprogramming process. To address this concern, we studied the effect of reprogramming mature LCLs to iPSCs on the ensuing gene expression patterns within and between six unrelated donor individuals. We show that the reprogramming process results in a recovery of donor-specific gene regulatory signatures. Since environmental contributions are unlikely to be a source of individual variation in our system of highly passaged cultured cell lines, our observations suggest that the effect of genotype on gene regulation is more pronounced in the iPSCs than in the LCL precursors. Our findings indicate that iPSCs can be a powerful model system for studies of phenotypic variation across individuals in general, and the genetic association with variation in gene regulation in particular. We further conclude that LCLs are an appropriate starting material for iPSC generation.

Software for the analysis and visualization of deep mutational scanning data

Jesse D Bloom
doi: http://dx.doi.org/10.1101/013623

Background: Deep mutational scanning is a technique to estimate the impacts of mutations on a gene by using deep sequencing to count mutations in a library of variants before and after imposing a functional selection. The impacts of mutations must be inferred from changes in their counts after selection. Results: I describe a software package, dms_tools, to infer the impacts of mutations from deep mutational scanning data using a likelihood-based treatment of the mutation counts. I show that dms_tools yields more accurate inferences on simulated data than the widely used but statistically biased approach of calculating ratios of counts pre- and post-selection. Using dms_tools, one can infer the preference of each site for each amino acid given a single selection pressure, or assess the extent to which these preferences change under different selection pressures. The preferences and their changes can be intuitively visualized with sequence-logo-style plots created using an extension to weblogo. Conclusions: dms_tools implements a statistically principled approach for the analysis and subsequent visualization of deep mutational scanning data.
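
For orientation, the count-ratio baseline that the abstract contrasts with dms_tools can be sketched as follows. This is not the dms_tools API or its likelihood-based method; it is a minimal illustration of one common form of the ratio approach, assuming enrichment is measured relative to the wildtype character and normalized to sum to one. The function name `ratio_preferences`, the pseudocount, and the counts are all hypothetical.

```python
def ratio_preferences(pre, post, wildtype, pseudocount=1.0):
    """Estimate per-character preferences at one site from count ratios.

    pre, post : dicts mapping each character to its count before and after
                selection (same keys in both).
    wildtype  : the wildtype character at this site.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    ratios = {}
    for char in pre:
        # Frequency relative to wildtype before and after selection.
        pre_rel = (pre[char] + pseudocount) / (pre[wildtype] + pseudocount)
        post_rel = (post[char] + pseudocount) / (post[wildtype] + pseudocount)
        ratios[char] = post_rel / pre_rel
    total = sum(ratios.values())
    # Normalize so that the preferences at the site sum to one.
    return {char: ratio / total for char, ratio in ratios.items()}


# Made-up counts for a single site with wildtype "A".
pre_counts = {"A": 1000, "G": 50, "V": 50}
post_counts = {"A": 1000, "G": 5, "V": 100}

prefs = ratio_preferences(pre_counts, post_counts, wildtype="A")
for char, pref in sorted(prefs.items()):
    print(char, round(pref, 3))
```

When counts are low, small sampling fluctuations move these ratios substantially, which is one reason a likelihood-based treatment of the counts can be preferable.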

The origin and evolution of maize in the American Southwest

Rute R da Fonseca, Bruce D Smith, Nathan Wales, Enrico Cappellini, Pontus Skoglund, Matteo Fumagalli, José Alfredo Samaniego, Christian Carøe, María C Ávila-Arcos, David E Hufnagel, Thorfinn Sand Korneliussen, Filipe Garrett Vieira, Mattias Jakobsson, Bernardo Arriaza, Eske Willerslev, Rasmus Nielsen, Matthew B Hufford, Anders Albrechtsen, Jeffrey Ross-Ibarra, M Thomas P Gilbert
doi: http://dx.doi.org/10.1101/013540

Maize offers an ideal system through which to demonstrate the potential of ancient population genomic techniques for reconstructing the evolution and spread of domesticates. The diffusion of maize from Mexico into the North American Southwest (SW) remains contentious, with the available evidence being restricted to morphological studies of ancient maize plant material. We captured 1 Mb of nuclear DNA from 32 archaeological maize samples spanning 6000 years and compared them with modern landraces, including those from the Mexican West coast and highlands. We found that the initial diffusion of domesticated maize into the SW is likely to have occurred through a highland route. However, by 2000 years ago a Pacific coastal corridor was also being used. Furthermore, we could distinguish genes that were selected for early during domestication (such as zagl1, involved in shattering) from genes that changed in the SW context (e.g. related to sugar content and adaptation to drought), likely as a response to the local arid environment and new cultural uses of maize.

Response of polygenic traits under stabilising selection and mutation when loci have unequal effects

Kavita Jain, Wolfgang Stephan
(Submitted on 9 Jan 2015)

We consider an infinitely large population under stabilising selection and mutation in which the allelic effects determining a polygenic trait vary between loci. We obtain analytical expressions for the stationary genetic variance as a function of the distribution of effects, mutation rate and selection coefficient. We also study the dynamics of the allele frequencies, focussing on short-term evolution of the phenotypic mean as it approaches the optimum after an environmental change. We find that when most effects are small, the genetic variance does not change appreciably during adaptation, and the time until the phenotypic mean reaches the optimum is short if the number of loci is large. However, when most effects are large, the change of the variance during the adaptive process cannot be neglected. In this case, the short-term dynamics may be described by that of a single locus of large effect. Our results may be used to understand polygenic selection driving rapid adaptation.

A pooling-based approach to mapping genetic variants associated with DNA methylation

Irene Miriam Kaplow, Julia L MacIsaac, Sarah M Mah, Lisa M McEwen, Michael S Kobor, Hunter B Fraser
doi: http://dx.doi.org/10.1101/013649

DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover less than 2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified over 2,000 genetic variants associated with DNA methylation. We found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data.

SWS2 visual pigment evolution as a test of historically contingent patterns of plumage color evolution in Warblers

Natasha Bloch, James M Morrow, Belinda SW Chang, Trevor D Price
doi: http://dx.doi.org/10.1101/013573

Distantly related clades that occupy similar environments may differ due to the lasting imprint of their ancestors – historical contingency. The New World warblers (Parulidae) and Old World warblers (Phylloscopidae) are ecologically similar clades that differ strikingly in plumage coloration. We studied genetic and functional evolution of the short-wavelength sensitive visual pigments (SWS2 and SWS1) to ask if altered color perception could contribute to the plumage color differences between clades. We show that SWS2 is short-wavelength shifted in birds that occupy open environments, such as finches, compared to those in closed environments, including warblers. Sequencing of opsin genes and phylogenetic reconstructions indicate that New World warblers were derived from a finch-like form that colonized from the Old World 15-20 Ma. During this process the SWS2 gene accumulated 6 substitutions in branches leading to New World warblers, inviting the hypothesis that passage through a finch-like ancestor resulted in SWS2 evolution. In fact, we show that spectral tuning remained similar across warblers as well as the finch ancestor. Results reject the hypothesis of historical contingency based on opsin spectral tuning, but point to evolution of other aspects of visual pigment function. Using the approach outlined here, historical contingency becomes a generally testable theory in systems where genotype and phenotype can be connected.

Independent molecular basis of convergent highland adaptation in maize

Shohei Takuno, Peter Ralph, Kelly Swarts, Rob J Elshire, Jeffrey C Glaubitz, Edward S. Buckler, Matthew B Hufford, Jeff Ross-Ibarra
doi: http://dx.doi.org/10.1101/013607

Convergent evolution occurs when multiple species/subpopulations adapt to similar environments via similar phenotypes. We investigate here the molecular basis of convergent adaptation in maize to highland climates in Mexico and South America using genome-wide SNP data. Taking advantage of archaeological data on the arrival of maize to the highlands, we infer demographic models for both populations, identifying evidence of a strong bottleneck and rapid expansion in South America. We then use these models to identify loci showing an excess of differentiation as a means of identifying putative targets of natural selection, and compare our results to expectations from recently developed theory on convergent adaptation. Consistent with predictions across a wide array of parameter space, we see limited evidence for convergent evolution at the nucleotide level in spite of strong similarities in overall phenotypes. Instead, we show that selection appears to have predominantly acted on standing genetic variation, and that introgression from wild teosinte populations appears to have played a role in highland adaptation in Mexican maize.