Complex patterns of local adaptation in teosinte

Tanja Pyhäjärvi, Matthew B. Hufford, Sofiane Mezmouk, Jeffrey Ross-Ibarra
(Submitted on 3 Aug 2012)

Populations of widely distributed species often encounter and adapt to specific environmental conditions. However, comprehensive characterization of the genetic basis of adaptation is demanding, requiring genome-wide genotype data, multiple sampled populations, and a good understanding of population structure. We have used environmental and high-density genotype data to describe the genetic basis of local adaptation in 21 populations of teosinte, the wild ancestor of maize. We found that altitude, dispersal events and admixture among subspecies formed a complex hierarchical genetic structure within teosinte. Patterns of linkage disequilibrium revealed four mega-base scale inversions that segregated among populations and had altitudinal clines. Based on patterns of differentiation and correlation with environmental variation, inversions and nongenic regions play an important role in local adaptation of teosinte. Further, we note that strongly differentiated individual populations can bias the identification of adaptive loci. The role of inversions in local adaptation has been predicted by theory and requires attention as genome-wide data become available for additional plant species. These results also suggest a potentially important role for noncoding variation, especially in large plant genomes in which the gene space represents a fraction of the entire genome.

Analysis of DNA sequence variation within marine species using Beta-coalescents

Matthias Steinrücken, Matthias Birkner, Jochen Blath
(Submitted on 4 Sep 2012)

We apply recently developed inference methods based on general coalescent processes to DNA sequence data obtained from various marine species. Several of these species are believed to exhibit so-called shallow gene genealogies, potentially due to extreme reproductive behaviour, e.g. via Hedgecock’s “reproduction sweepstakes”. Besides the data analysis, in particular the inference of mutation rates and the estimation of the (real) time to the most recent common ancestor, we briefly address the question whether the genealogies might be adequately described by so-called Beta coalescents (as opposed to Kingman’s coalescent), allowing multiple mergers of genealogies.
The choice of the underlying coalescent model for the genealogy has drastic implications for the estimation of the above quantities, in particular the real-time embedding of the genealogy.

The date of interbreeding between Neandertals and modern humans

Sriram Sankararaman, Nick Patterson, Heng Li, Svante Pääbo, David Reich
(Submitted on 10 Aug 2012)

Comparisons of DNA sequences between Neandertals and present-day humans have shown that Neandertals share more genetic variants with non-Africans than with Africans. This could be due to interbreeding between Neandertals and modern humans when the two groups met subsequent to the emergence of modern humans outside Africa. However, it could also be due to population structure that antedates the origin of Neandertal ancestors in Africa. We measure the extent of linkage disequilibrium (LD) in the genomes of present-day Europeans and find that the last gene flow from Neandertals (or their relatives) into Europeans likely occurred 37,000-86,000 years before the present (BP), and most likely 47,000-65,000 years ago. This supports the recent interbreeding hypothesis, and suggests that interbreeding may have occurred when modern humans carrying Upper Paleolithic technologies encountered Neandertals as they expanded out of Africa.
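The dating argument in this abstract rests on a standard expectation: after a pulse of admixture, LD between introgressed variants decays roughly exponentially with genetic distance, at a rate equal to the number of generations since gene flow. The sketch below fits that rate by log-linear regression on synthetic data. It is an illustration only, not the authors' statistic: the function name, the synthetic LD values and the 29-year generation time are all assumptions, and none of the corrections a real analysis needs are included.

```python
import math

def fit_generations_since_admixture(distances_morgans, ld_values):
    """Fit LD(d) = A * exp(-t * d) by least squares on log(LD).

    Returns t, the decay rate, interpreted as generations since admixture
    under a single-pulse model (an assumption of this sketch).
    """
    logs = [math.log(v) for v in ld_values]
    n = len(distances_morgans)
    mean_d = sum(distances_morgans) / n
    mean_l = sum(logs) / n
    sxy = sum((d - mean_d) * (l - mean_l)
              for d, l in zip(distances_morgans, logs))
    sxx = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -sxy / sxx  # slope of log(LD) against d is -t

# Synthetic, noise-free data (hypothetical values, not from the paper).
true_generations = 2000
distances = [i * 0.0001 for i in range(1, 51)]   # 0.01 cM to 0.5 cM, in Morgans
ld = [0.05 * math.exp(-true_generations * d) for d in distances]

t_hat = fit_generations_since_admixture(distances, ld)

# Converting to years needs a generation time; 29 years is assumed here.
ASSUMED_YEARS_PER_GENERATION = 29
years_before_present = t_hat * ASSUMED_YEARS_PER_GENERATION
```

With real data the LD estimates are noisy and the genetic map is uncertain, so both the fitted rate and the conversion to years carry wide intervals, which is consistent with the broad date range the abstract reports.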

Transposable sequence evolution is driven by gene context

Anna-Sophie Fiston-Lavier, Charles E. Vejnar, Hadi Quesneville
(Submitted on 2 Sep 2012)

Transposable elements (TEs) in eukaryote genomes are quantitatively the main components affecting genome size, structure and expression. The dynamics of their insertion and deletion depend on diverse factors varying in strength and nature along the genome. We address here how TE sequence evolution is affected by neighboring genes and the chromatin status (euchromatin or heterochromatin) at their insertion site. We estimated the rates of evolution of TE sequences in Arabidopsis thaliana, and found that they depend on the distance to the nearest genes: TEs located close to genes evolve faster than those that are more distant. Consequently, TE sequences in heterochromatic regions, which are gene-poor regions, are surprisingly younger and longer than those elsewhere. We present a model of TE sequence dynamics in TE-rich genomes, such as maize and wheat, and in TE-poor genomes such as fly and A. thaliana.

Evolutionary genomics of transposable elements in Saccharomyces cerevisiae

Martin Carr, Douda Bensasson, Casey M. Bergman
(Submitted on 1 Sep 2012)

Saccharomyces cerevisiae is one of the premier model systems for studying the genomics and evolution of transposable elements. The availability of the S. cerevisiae genome led to many insights into its five known transposable element families (Ty1-Ty5) in the years shortly after its completion. However, subsequent advances in bioinformatics tools for analysing transposable elements and the recent availability of genome sequences for multiple strains and species of yeast motivate new investigations into Ty evolution in S. cerevisiae. Here we provide a comprehensive phylogenetic and population genetic analysis of Ty families in S. cerevisiae based on a reannotation of Ty elements in the S288c reference genome. We show that previous annotation efforts have underestimated the total copy number of Ty elements for all known families. In addition, we identify a new family of Ty3-like elements related to the S. paradoxus Ty3p which is composed entirely of degenerate solo LTRs. Phylogenetic analyses of LTR sequences identified three families with short-branch, recently active clades nested among long-branch, inactive insertions (Ty1, Ty3, Ty4), one family with essentially all recently active elements (Ty2) and two families with only inactive elements (Ty3p and Ty5). Population genomic data from 38 additional strains of S. cerevisiae show that elements present in active clades are predominantly polymorphic, whereas most of the inactive elements are fixed. Finally, we use comparative genomic data to provide evidence that the Ty2 and Ty3p families have arisen in the S. cerevisiae genome by horizontal transfer. Our results demonstrate that the genome of a single individual contains important information about the state of TE population dynamics within a species and suggest that horizontal transfer may play an important role in shaping the diversity of transposable elements in unicellular eukaryotes.

Our paper: Inference of population splits and mixtures from genome-wide allele frequency data

[This author post is by Joe Pickrell (@joe_pickrell) on Inference of population splits and mixtures from genome-wide allele frequency data, available from arXiv here]

Early last year, I began working (with Jonathan Pritchard) on methods for using genetics to understand population history. As we describe in our preprint, our approach was to build a parameterized model to describe the patterns of correlation in allele frequencies across populations. This type of approach dates back to brilliant work on building population trees by Luca Cavalli-Sforza, AWF Edwards, and Joe Felsenstein from around 40 years ago. The key to our work is that instead of representing history as a bifurcating tree, we additionally allow “migration events” to model admixture between populations. The output from our model (called TreeMix, and available here) is something like that shown below.

A graph of human population history, allowing 10 migration events. Populations are colored according to geographic region.
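The idea of describing allele-frequency correlations with a tree plus migration edges can be made concrete with a toy calculation. In the sketch below, each population is described by the fraction of its ancestry that passes through each branch, and the expected covariance between two populations (relative to the root, in drift units) is the branch length shared between them, weighted by those fractions. This is a minimal illustration under simplifying assumptions, not the TreeMix implementation or its likelihood. The branch names, lengths and the 25% migration weight are hypothetical.

```python
def expected_covariance(ancestry, branch_length):
    """Expected covariance of allele frequencies between populations.

    ancestry[pop][edge] is the fraction of pop's ancestry passing through edge.
    branch_length[edge] is the amount of drift on that edge.
    """
    pops = sorted(ancestry)
    cov = {}
    for a in pops:
        for b in pops:
            cov[(a, b)] = sum(
                ancestry[a].get(edge, 0.0) * ancestry[b].get(edge, 0.0) * length
                for edge, length in branch_length.items()
            )
    return cov

# Hypothetical tree: root -> I -> (A, B), and root -> C.
branch_length = {"I": 0.02, "A": 0.01, "B": 0.01, "C": 0.03}

# Pure bifurcating tree: every population has one path to the root.
tree_only = {
    "A": {"I": 1.0, "A": 1.0},
    "B": {"I": 1.0, "B": 1.0},
    "C": {"C": 1.0},
}

# Add one migration event: B draws a fraction w of its ancestry from C.
w = 0.25
with_migration = dict(tree_only)
with_migration["B"] = {"I": 1.0 - w, "B": 1.0 - w, "C": w}

cov_tree = expected_covariance(tree_only, branch_length)
cov_mix = expected_covariance(with_migration, branch_length)
```

Under the tree alone, B and C share no drift, so their expected covariance is zero. The migration edge makes it positive and also lowers the covariance between A and B. Residual covariance that a pure tree cannot explain is the kind of signal a migration edge is meant to absorb.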

We applied this method to both human and dog history, with a mix of both known and novel historical results. I thought here I’d speculate about a couple of the novel results:

1. In the human data (see the graph above), one of the more surprising things to me was the arrow to the Cambodian population. The Cambodians appear to be an admixed population, with ~85% of their ancestry related to other southeast Asian populations (like the Dai) and ~15% of their ancestry from…it’s not totally clear. As you can see in the graph, the source of this admixture appears to be a population not particularly closely related to any other population in these data. So who was this population? A speculation is that this represents ancestry from a population related to the “Ancestral South Indian” population described by Reich et al. (2009), though other sources (e.g. Oceania) are plausible.

2. In the dog data (see Figures 5 and 6 in the pre-print), the most overwhelming signal in the data is that the Basenji, a central African dog breed, appears to trace ~25% of its ancestry to admixture with wolves since domestication. This signal is made somewhat surprising by the fact that there are no wolf populations currently living in Africa, which would seem to be a formidable barrier to admixture with an African dog breed. A hint for what’s going on here is provided by vonHoldt et al. (2010), who show that the Basenji have an unusual amount of shared variation with wolves from the Middle East. One speculation, then, is that as the ancestors of the Basenji moved into Africa, they came into contact with Middle Eastern wolves and admixed with them.

Other suggestions for scenarios to explain these results are of course welcome. Overall, I’m hopeful that approaches like TreeMix will eventually supplant “standard” tree-building algorithms for situations in which gene flow is known to occur, though of course further development is necessary before this becomes reality.

Joe Pickrell

The genetic prehistory of southern Africa

Joseph K. Pickrell, Nick Patterson, Chiara Barbieri, Falko Berthold, Linda Gerlach, Mark Lipson, Po-Ru Loh, Tom Güldemann, Blesswell Kure, Sununguko Wata Mpoloka, Hirosi Nakagawa, Christfried Naumann, Joanna L. Mountain, Carlos D. Bustamante, Bonnie Berger, Brenna M. Henn, Mark Stoneking, David Reich, Brigitte Pakendorf
(Submitted on 23 Jul 2012)

The hunter-gatherer populations of southern and eastern Africa are known to harbor some of the most ancient human lineages, but their historical relationships are poorly understood. We report data from 22 populations analyzed at over half a million single nucleotide polymorphisms (SNPs), using a genome-wide array designed for studies of history. The southern Africans, here called Khoisan, fall into two groups, loosely corresponding to the northwestern and southeastern Kalahari, which we show separated within the last 30,000 years. All individuals derive at least a few percent of their genomes from admixture with non-Khoisan populations that began 1,200 years ago. In addition, the Hadza, an east African hunter-gatherer population that speaks a language with click consonants, derive about a quarter of their ancestry from admixture with a population related to the Khoisan, implying an ancient genetic link between southern and eastern Africa.

Our paper: Blood ties: ABO is a trans-species polymorphism in primates

[This author post is by Laure Ségurel [a postdoc in the Przeworski Lab] on the paper Blood ties: ABO is a trans-species polymorphism in primates, posted on the arXiv here]

The mysteries of the ABO blood group were first brought to our attention by Carole Ober. When we started working on it, we were mostly surprised by how little was known about the function of such a heavily studied gene and such an important clinical phenotype. Indeed, the expression of A, B and/or O antigens at the surface of some cells is a polymorphic phenotype shared by species as diverse as macaques and baboons in Africa, gibbons in Asia, squirrel monkeys in the Americas and, of course, humans throughout most of the world. Yet many questions remain unanswered, such as: What is the biological role of ABO in different cell types? Why did Hominoids evolve toward its expression on blood cells whereas other primates express it only on epithelial/endothelial cells? Why is the O allele at such high frequency only in humans? What are the selective agents responsible for the maintenance of this polymorphism? And why did chimpanzees and bonobos apparently lose the polymorphism?

One question that we became interested in answering with population genetic tools was that of the origin of such blood types. When did the genetic polymorphism first emerge and which species share it identical by descent (as opposed to by convergent evolution)? Answers to these questions could tell us where and when having multiple alleles at this locus became advantageous. We therefore sequenced as many Hominoids, Old World monkeys and New World monkeys as we could get our hands on, and, even more interestingly, we started thinking about the expectations under a model of convergent evolution, i.e., one where the AB genetic polymorphism was created independently multiple times in different species (and then maintained by balancing selection in these lineages) versus under a model of trans-species polymorphism, i.e., in which the AB genetic polymorphism arose early in time and was transmitted identical by descent to distinct species. Key to distinguishing the two predictions is the age of different selected alleles within a polymorphic population.

We therefore compared alleles within humans, orangutans, gibbons, macaques, baboons and colobus monkeys (all polymorphic species for the A and B alleles), and showed that, at least among Hominoids and among Old World monkeys, the observed genetic pattern is not compatible with a model of convergent evolution but on the contrary matches the expectations under a model of a trans-species polymorphism maintained by multi-allelic balancing selection. In other words, the data indicate that the AB polymorphism was present at least around 20 million years ago, if not earlier. Also, interestingly, it seems that the A, B and O functional classes do not provide a complete description of the allelic classes natural selection is acting on, which underscores the need for more detailed functional studies of ABO sub-groups.

By submitting the paper to arXiv, we hope to circulate it to a diverse audience and without delay. In particular, we hope that the study will motivate more experimental/functional work about the role of this polymorphism in immune response, e.g., to pathogen infections.

Laure Ségurel

Our paper: Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster

Casey Bergman [@caseybergman and @bergmanlab] kindly wrote a post about his recently arXived paper:
Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster
ArXived here.
__________________________________________________________________
As part of the Drosophila 12 Genome Project, Steve Salzberg and colleagues published a pioneering paper in 2005 showing that complete genomes of the bacterial endosymbiont Wolbachia pipientis can be extracted from the whole-genome shotgun sequence assemblies of Drosophila species. This paper always left an impression on me as a very clever way of extracting new biology from existing genomic data, and when the era of resequencing multiple strains of D. melanogaster kicked off a few years ago, it seemed like a natural extension to ask if this approach could be adapted to next-generation sequencing data to study the co-evolution of Wolbachia and Drosophila using whole genome data.

In the current work, we used short-read next generation sequencing data from two major resequencing efforts in D. melanogaster — the Drosophila Genetic Reference Panel (DGRP) and Drosophila Population Genomics Project (DPGP) — together with the reference Wolbachia genome published by Wu et al. (2005) and extracted over 175 complete Wolbachia genomes and nearly 300 complete mitochondrial genomes. Readers can find the main results in the paper, which is currently in review. I’d like to discuss here the social context of the project and some of the reasons we submitted to arXiv.

This project started out in 2010 as a summer project for a master's student, Mark Richardson, who did an amazing job developing the initial pipeline and made most of the initial discoveries in the paper. Mark and I started a collaboration with Frank Jiggins and Mike Magwire shortly after to verify that our in silico genotyping results were making sense, and they suggested bringing in Lucy Weinert and John Welch to help with the more sophisticated Bayesian phylogenetic analysis. Another PhD student in my lab, Raquel Linheiro, adapted her transposable element detection pipeline to identify particular Wolbachia sublineages, which was crucial to linking our data with previous results. This was a great collaboration, where everyone made significant contributions, and I would collaborate with everyone again (and I hope to!).

At the time (summer 2010), we only had access to the North American strains from the DGRP sample; knowing that North American D. melanogaster are derived populations, we were cautious about the impact that population structure had on our results. We planned in early 2011 to publish on only the DGRP dataset since Mark was going off to do a PhD in Australia and I didn’t have anyone else in the group working on this project. In the summer of 2011, the African DPGP data came online and I decided to take a peek and run the pipeline on the African strains as well. This led to a major overhaul of the project and set us back a year, since all the data had to be reanalyzed together and the interpretation of the biogeography results was substantially altered. This was in some ways lucky because our initial interpretation of evidence for a selective sweep on one of the cytoplasmic lineages was probably wrong, and it saved us from having to backpedal on this misinterpretation in a later publication.

As we plugged away at trying to finish this project, we had inquiries about the status of the project from several other groups working in the Wolbachia field. Honestly this stressed me out quite a bit, since some of the inquiries were coming from post-docs in big labs. But instead of just sitting on the data, after we finalized the dataset we decided to release these data openly on our lab blog in April 2012. We decided on an open release as a way to help these teams (and others we didn’t know about), but also to get some priority in this area by providing the “gold standard” that other groups could use (and cite!). For the record, I will note that we asked two teams who contacted us about our project if they would reciprocate by sharing unpublished genomic data or in one case published genomic data that was not submitted to GenBank; both declined.

After making the decision to release the data pre-publication, it was a natural step to submit the manuscript to arXiv. I’m an open science advocate and used the Nature Preprint server occasionally in the past. I never really liked the Nature Preprint server, though, since I thought people posted there to give their manuscript the stink of being “Nature (in prep)” on their CV. And I never posted to arXiv in the past, since I always thought it was for more hardcore computational or mathematical biology. But recently, I was convinced by Rosie Redfield, Leonid Kruglyak and colleagues putting their Arsenic Life paper on arXiv that more empirical work in quantitative biology was arXiv-able. And just as with releasing our data early, it seemed like the best way to prevent being scooped was to get our results out as quickly as possible and let people know about them.

So we went for it. And I have to say the experience has been thoroughly rewarding. Submitting was a piece of cake, easier than any journal I’ve ever submitted to. Having a URL to point to allowed me to tweet about it, which got the paper some exposure and got me some new colleagues on Twitter. It also allowed me to send a submitted manuscript around to colleagues for informal review, without cluttering up their inboxes with big attachments or creating a moral dilemma about who they can share the manuscript with. And somehow submitting to arXiv pushed the “it’s submitted” button in my brain, which made me a whole lot less stressed about the possibility of being scooped, and I’ve been more relaxed throughout the formal submission process. Finally, I know that the pre-publication release of the data and posting of the manuscript has led to a group in Russia incorporating these sequences into their work, and I’ve just gotten a manuscript to review from this group citing our arXiv manuscript and extending our results before our paper is even published! This is what research is all about, right: doing science, getting it out, and letting others build on it. I’ll definitely submit all papers from my lab to arXiv, and look forward to the Haldane’s Sieve readership giving us a hard time about our manuscripts while they evolve into formal publications.

Casey Bergman

Population genomics of sub-Saharan Drosophila melanogaster: African diversity and non-African admixture

John E. Pool, Russell B. Corbett-Detig, Ryuichi P. Sugino, Kristian A. Stevens, Charis M. Cardeno, Marc W. Crepeau, Pablo Duchen, J. J. Emerson, Perot Saelao, David J. Begun, Charles H. Langley
(Submitted on 23 Aug 2012)

(ABRIDGED) We report the genome sequencing of 139 wild-derived strains of D. melanogaster, representing 22 population samples from the sub-Saharan ancestral range of this species, along with one European population. Most genomes were sequenced above 25X depth from haploid embryos. Results indicated a pervasive influence of non-African admixture in many African populations, motivating the development and application of a novel admixture detection method. Admixture proportions varied among populations, with greater admixture in urban locations. Admixture levels also varied across the genome, with localized peaks and valleys suggestive of a non-neutral introgression process. Genomes from the same location differed starkly in ancestry, suggesting that isolation mechanisms may exist within African populations. After removing putatively admixed genomic segments, the greatest genetic diversity was observed in southern Africa (e.g. Zambia), while diversity in other populations was largely consistent with a geographic expansion from this potentially ancestral region. The European population showed different levels of diversity reduction on each chromosome arm, and some African populations displayed chromosome arm-specific diversity reductions. Inversions in the European sample were associated with strong elevations in diversity across chromosome arms. Genomic scans were conducted to identify loci that may represent targets of positive selection. A disproportionate number of candidate selective sweep regions were located near genes with varied roles in gene regulation. Outliers for Europe-Africa FST were found to be enriched in genomic regions of locally elevated cosmopolitan admixture, possibly reflecting a role for some of these loci in driving the introgression of non-African alleles into African populations.
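For readers unfamiliar with the FST outlier scans mentioned at the end of this abstract, here is a small sketch of one common way to compute FST between two population samples from allele frequencies: Hudson's estimator, combined across sites as a ratio of sums. The abstract does not say which estimator the authors used, so this choice is an assumption. The function name and the example frequencies are hypothetical.

```python
def hudson_fst(freqs_pop1, freqs_pop2, n1, n2):
    """Hudson's FST estimator across sites, as a ratio of summed terms.

    freqs_pop1, freqs_pop2: per-site sample allele frequencies.
    n1, n2: number of sampled chromosomes in each population.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n1 - 1)
                      - p2 * (1 - p2) / (n2 - 1))
        # Probability that two chromosomes, one from each population, differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator

# Hypothetical frequencies at three sites in two samples of 20 chromosomes.
fst_fixed = hudson_fst([1.0, 1.0, 0.0], [0.0, 0.0, 1.0], 20, 20)
fst_same = hudson_fst([0.5, 0.3, 0.8], [0.5, 0.3, 0.8], 20, 20)
```

Fixed differences at every site give an estimate of 1, while identical sample frequencies give a slightly negative value because of the sample-size correction. An outlier scan would compute this in genomic windows and flag those in the extreme tail of the genome-wide distribution.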