The Population Genetic Signature of Polygenic Local Adaptation

Jeremy J. Berg, Graham Coop
(Submitted on 29 Jul 2013)

Adaptation in response to selection on polygenic phenotypes occurs via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that have undergone local adaptation. Using GWAS data, we estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We model the expected differentiation of GWAS loci among populations under neutrality to develop simple tests of selection across an arbitrary number of populations with arbitrary population structure. To find support for the role of specific environmental variables in local adaptation, we test for correlations with the estimated genetic values. We also develop a general test of local adaptation to identify overdispersion of the estimated genetic values among populations. This test is a natural generalization of QST/FST comparisons based on GWAS predictions. Finally, we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single-locus equivalents because they look for positive covariance between like-effect alleles. We apply our tests to the Human Genome Diversity Panel dataset using GWAS data for six different traits. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
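
The "simple weighted sums of allele frequencies" mentioned in the abstract can be illustrated with a minimal sketch. The function name, the array layout, and the factor of 2 (a diploid additive model) are assumptions made for illustration; the abstract does not specify them, and this is not the authors' code.

```python
import numpy as np

def estimated_genetic_values(freqs, effects):
    """Sketch: per-population genetic value as a weighted sum of allele frequencies.

    freqs   : array of shape (n_populations, n_loci), frequency of the
              effect allele at each GWAS locus in each population.
    effects : array of shape (n_loci,), GWAS effect size of that allele.

    Assumption (not from the abstract): diploid additive model, so each
    population's value is 2 * sum_l effect_l * freq_l.
    """
    freqs = np.asarray(freqs, dtype=float)
    effects = np.asarray(effects, dtype=float)
    if freqs.ndim != 2 or effects.ndim != 1 or freqs.shape[1] != effects.shape[0]:
        raise ValueError("freqs must be (n_populations, n_loci) and effects (n_loci,)")
    return 2.0 * freqs @ effects

# Hypothetical toy data: two populations, two loci.
toy_freqs = [[0.1, 0.5],
             [0.3, 0.5]]
toy_effects = [1.0, -0.5]
toy_values = estimated_genetic_values(toy_freqs, toy_effects)
```

In this toy example the two populations differ only at the first locus, so their estimated genetic values differ by twice the effect size times the frequency difference at that locus.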

Ancient west Eurasian ancestry in southern and eastern Africa

Joseph K. Pickrell, Nick Patterson, Po-Ru Loh, Mark Lipson, Bonnie Berger, Mark Stoneking, Brigitte Pakendorf, David Reich
(Submitted on 30 Jul 2013)

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter-gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger-Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to approximately 900-1,800 years ago, and show that it had the largest demographic impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to approximately 2,700 – 3,300 years ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa, and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.

The genomic impacts of drift and selection for hybrid performance in maize

Justin P. Gerke, Jode W. Edwards, Katherine E. Guill, Jeffrey Ross-Ibarra, Michael D. McMullen
(Submitted on 27 Jul 2013)

Modern maize breeding relies upon selection in inbreeding populations to improve the performance of cross-population hybrids. The United States Department of Agriculture – Agricultural Research Service reciprocal recurrent selection experiment between the Iowa Stiff Stalk Synthetic (BSSS) and the Iowa Corn Borer Synthetic No. 1 (BSCB1) populations represents one of the longest standing models of selection for hybrid performance. To investigate the genomic impact of this selection program, we used the Illumina MaizeSNP50 high-density SNP array to determine genotypes of progenitor lines and over 600 individuals across multiple cycles of selection. Consistent with previous research (Messmer et al., 1991; Labate et al., 1997; Hagdorn et al., 2003; Hinze et al., 2005), we found that genetic diversity within each population steadily decreases, with a corresponding increase in population structure. High marker density also enabled the first view of haplotype ancestry, fixation and recombination within this historic maize experiment. Extensive regions of haplotype fixation within each population are visible in the pericentromeric regions, where large blocks trace back to single founder inbreds. Simulation attributes most of the observed reduction in genetic diversity to genetic drift. Signatures of selection were difficult to observe in the background of this strong genetic drift, but heterozygosity in each population has fallen more than expected. Regions of haplotype fixation represent the most likely targets of selection, but as observed in other germplasm selected for hybrid performance (Feng et al., 2006), there is no overlap between the most likely targets of selection in the two populations. We discuss how this pattern is likely to occur during selection for hybrid performance, and how it poses challenges for dissecting the impacts of modern breeding and selection on the maize genome.

A model-based approach for identifying signatures of balancing selection in genetic data

Michael DeGiorgio, Kirk E. Lohmueller, Rasmus Nielsen
(Submitted on 16 Jul 2013)

While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. Here we develop two composite likelihood ratio tests for detecting balancing selection. Using simulations, we show that these methods outperform competing methods under a variety of assumptions and demographic models. We apply the new methods to whole-genome human data, and find a number of previously-identified loci with strong evidence of balancing selection, including several HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Thus, our methods are able to reproduce many previously-hypothesized signals of balancing selection, as well as discover novel interesting candidates.

Genomic identification of founding haplotypes reveals the history of the selfing species Capsella rubella

Yaniv Brandvain, Tanja Slotte, Khaled Hazzouri, Stephen Wright, Graham Coop
(Submitted on 15 Jul 2013)

The shift from outcrossing to self-fertilization is among the most common transitions in plants. Until recently, however, a genome-wide view of this transition has been obscured by a dearth of appropriate data and the lack of appropriate population genomic methods to interpret such data. Here, we present novel analyses detailing the origin of the selfing species, Capsella rubella, which recently split from its outcrossing sister, Capsella grandiflora. Due to the recency of the split, most variation within C. rubella is found within C. grandiflora. We can therefore identify genomic regions where two C. rubella individuals have inherited the same or different segments of ancestral diversity (i.e. founding haplotypes) present in C. rubella’s founder(s). Based on this analysis, we show that C. rubella was founded by multiple individuals drawn from a diverse ancestral population closely related to extant C. grandiflora, that drift and selection have rapidly homogenized most of this ancestral variation since C. rubella’s founding, and that little novel variation has accumulated within this time. Despite the extensive loss of ancestral variation, the approximately 25% of the genome for which two C. rubella individuals have inherited different founding haplotypes makes up roughly 90% of the genetic variation between them. To extend these findings, we develop a coalescent model that utilizes the inferred frequency of founding haplotypes and variation within founding haplotypes to estimate that C. rubella was founded by a potentially large number of individuals 50-100 kya, and has subsequently experienced a 20X reduction in its effective population size. As population genomic data from an increasing number of outcrossing/selfing pairs are generated, analyses like this one will facilitate a fine-scaled view of the evolutionary and demographic impact of the transition to self-fertilization.
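
A short worked calculation shows what the 25% and 90% figures imply together. The symbols are assumptions introduced here, not taken from the abstract: $\pi_d$ is the average per-site pairwise difference in regions where two individuals carry different founding haplotypes, and $\pi_s$ is the same quantity in regions where they carry the same founding haplotype. Both quoted figures are approximate, so the result is only a rough ratio.

```latex
\[
\frac{0.25\,\pi_d}{0.25\,\pi_d + 0.75\,\pi_s} = 0.90
\;\Longrightarrow\;
0.025\,\pi_d = 0.675\,\pi_s
\;\Longrightarrow\;
\frac{\pi_d}{\pi_s} = 27 .
\]
```

Under these assumptions, per-site divergence between two individuals would be roughly 27 times higher where they carry different founding haplotypes than where they carry the same one.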

Inference of chromosomal inversion dynamics from Pool-Seq data in natural and laboratory populations of Drosophila melanogaster

Martin Kapun, Hester van Schalkwyk, Bryant McAllister, Thomas Flatt, Christian Schlötterer
(Submitted on 9 Jul 2013)

Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes, so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for 7 cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations.
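
One simple way diagnostic marker SNPs could be turned into an inversion frequency estimate is sketched below. The abstract does not describe the estimator, so this is an assumption: the sketch averages the frequency of the inversion-specific allele across markers, and the function name and input layout are hypothetical.

```python
def inversion_frequency(marker_counts):
    """Sketch: estimate an inversion's frequency in a pooled sample.

    marker_counts : iterable of (inversion_allele_reads, total_reads) pairs,
                    one pair per diagnostic marker SNP for the inversion.

    Assumption (not from the abstract): each marker's inversion-specific
    allele frequency is an independent estimate of the inversion frequency,
    and the estimates are combined with an unweighted mean. Markers with
    zero coverage are skipped.
    """
    per_marker = [alt / total for alt, total in marker_counts if total > 0]
    if not per_marker:
        raise ValueError("no diagnostic marker has read coverage")
    return sum(per_marker) / len(per_marker)

# Hypothetical read counts at four markers; the third has no coverage.
toy_counts = [(10, 100), (20, 100), (0, 0), (15, 50)]
toy_estimate = inversion_frequency(toy_counts)
```

An unweighted mean treats every marker equally. Weighting by read depth would be a natural alternative when coverage varies strongly across markers.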

The complex hybrid origins of the root knot nematodes revealed through comparative genomics

David H Lunt, Sujai Kumar, Georgios Koutsovoulos, Mark L Blaxter
(Submitted on 26 Jun 2013)

Meloidogyne root knot nematodes (RKN) can infect most of the world’s agricultural crop species and are among the most important of all plant pathogens. As yet, however, we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to originate by interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and used a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species M. hapla was used to trace the evolutionary history of these species’ genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization, and their extreme polyphagy and agricultural success may be related to this hybridization, producing transgressive variation on which natural selection acts. Studying RKN variation via individual marker loci may fail due to the species’ convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally.

Genome-wide inference of ancestral recombination graphs

Matthew D. Rasmussen, Adam Siepel
(Submitted on 21 Jun 2013)

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of all coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are extremely computationally intensive, depend on fairly crude approximations, or are limited to small numbers of samples. As a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to be applied on the scale of dozens of complete human genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call “threading”. Using techniques based on hidden Markov models, this threading operation can be performed exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated applications of these threading operations result in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG, for twenty or more sequences generated under realistic parameters for human populations. We also report initial results from applications of ARGweaver to high-coverage individual human genome sequences from Complete Genomics. Work is in progress on further applications of these methods to genome-wide sequence data.

Conservation of nuclear SSR loci reveals high affinity of Quercus infectoria ssp. veneris A. Kern (Fagaceae) to section Robur

Charalambos Neophytou, Aikaterini Dounavi, Filippos A. Aravanopoulos
(Submitted on 21 Jun 2013)

Conservation of 16 nuclear microsatellite loci, originally developed for Quercus macrocarpa (section Albae), Q. petraea, Q. robur (section Robur) and Q. myrsinifolia (subgenus Cyclobalanopsis), was tested in a Q. infectoria ssp. veneris population from Cyprus. All loci could be amplified successfully and displayed allele size and diversity patterns that match those of oak species belonging to the section Robur. In at least one case, limited amplification and high levels of homozygosity support the occurrence of ‘null alleles’, caused by a possible mutation in the highly conserved primer areas, thus hindering PCR. The sampled population exhibited high levels of diversity despite the very limited distribution of this species in Cyprus and extended population fragmentation. Allele sizes of Q. infectoria at locus QpZAG9 partially match those of Q. alnifolia and Q. coccifera from neighboring populations. However, sequencing showed homoplasy, excluding a case of interspecific introgression with the latter, phylogenetically remote species. Q. infectoria ssp. veneris sequences at this locus were concordant with those of other species of section Robur, while sequences of Quercus alnifolia and Quercus coccifera were almost identical to Q. cerris.

Reconstructing Native American Migrations from Whole-genome and Whole-exome Data

Simon Gravel, Fouad Zakharia, Jake K Byrnes, Marina Muzzio, Andres Moreno-Estrada, Juan L. Rodriguez-Flores, Eimear E. Kenny, Christopher R. Gignoux, Brian K. Maples, Wilfried Guiblet, Julie Dutil, Karla Sandoval, Gabriel Bedoya, The 1000 Genomes Project, Taras K Oleksyk, Andres Ruiz-Linares, Esteban G Burchard, Juan Carlos Martinez-Cruzado, Carlos D. Bustamante
(Submitted on 17 Jun 2013)

There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR appears most closely related to Equatorial-Tucanoan-speaking populations, supporting a South American ancestry of the Taino people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a three-population demographic model. The ancestral populations to the three groups likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL ancestors split 12.2 kya, with a subsequent split of the ancestors to CLM and PUR 11.7 kya. The model also features a Mexican population of 62,000, a Colombian population of 8,700, and a Puerto Rican population of 1,900. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest size and the earliest migration from Europe.