Efficient moment-based inference of admixture parameters and sources of gene flow
Mark Lipson, Po-Ru Loh, Alex Levin, David Reich, Nick Patterson, Bonnie Berger
(Submitted on 11 Dec 2012)
The recent explosion in available genetic data has led to significant advances in understanding the demographic histories of and relationships among human populations. It is still a challenge, however, to infer reliable parameter values for complicated models involving many populations. Here we present MixMapper, an efficient, interactive method for constructing phylogenetic trees including admixture events using single nucleotide polymorphism (SNP) genotype data. MixMapper implements a novel two-phase approach to admixture inference using moment statistics, first building an unadmixed scaffold tree and then adding admixed populations by solving systems of equations that express allele frequency divergences in terms of mixture parameters. Importantly, all features of the tree, including topology, sources of gene flow, branch lengths, and mixture proportions, are optimized automatically from the data and include estimates of statistical uncertainty. MixMapper also uses a new method to express branch lengths in easily interpretable drift units. We apply MixMapper to recently published data for HGDP individuals genotyped on a SNP array designed especially for use in population genetics studies, obtaining confident results for 30 populations, 20 of them admixed. Notably, we confirm a signal of ancient admixture in European populations—including previously undetected admixture in Sardinians and Basques—involving a proportion of 20-40% ancient northern Eurasian ancestry.
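As a rough illustration of how an allele frequency divergence can be written in terms of a mixture parameter, the sketch below uses a naive f2-style moment statistic (mean squared allele-frequency difference). It is a toy under strong assumptions, not MixMapper's actual equations: the function names `f2` and `mix` are hypothetical, drift after admixture is ignored, and no finite-sample bias correction is applied. Under those assumptions, a population C formed as alpha*A + (1 - alpha)*B satisfies f2(C, B) = alpha^2 * f2(A, B), so alpha can be recovered from two divergences.

```python
import math


def f2(freqs_a, freqs_b):
    """Naive (uncorrected) f2: mean squared allele-frequency difference."""
    if len(freqs_a) != len(freqs_b) or not freqs_a:
        raise ValueError("need two non-empty frequency lists of equal length")
    return sum((a - b) ** 2 for a, b in zip(freqs_a, freqs_b)) / len(freqs_a)


def mix(freqs_a, freqs_b, alpha):
    """Allele frequencies of an idealized mixture: alpha from A, rest from B."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(freqs_a, freqs_b)]


# Hypothetical per-SNP allele frequencies in two source populations.
pop_a = [0.1, 0.5, 0.9]
pop_b = [0.3, 0.2, 0.6]
pop_c = mix(pop_a, pop_b, alpha=0.25)

# With no drift, the divergence ratio recovers the mixture proportion.
alpha_hat = math.sqrt(f2(pop_c, pop_b) / f2(pop_a, pop_b))
```

With real data the sources are themselves unknown and every branch accumulates drift, which is why a full method has to solve a system of such equations rather than a single ratio.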
Detection of selective sweeps in cattle using genome-wide SNP data
Holly R. Ramey, Jared E. Decker, Stephanie D. McKay, Megan M. Rolf, Robert D. Schnabel, Jeremy F. Taylor
(Submitted on 11 Dec 2012)
The domestication and subsequent selection by humans to create breeds of cattle undoubtedly altered the patterning of variation within their genomes. Strong selection to fix advantageous large-effect mutations underlying domesticability, breed characteristics or productivity created selective sweeps in which variation was lost in the chromosomal region flanking the selected allele. Selective sweeps have been identified in the genomes of many species including humans, dogs, horses, and chickens. We attempt to identify regions of the bovine genome that have been subjected to selective sweeps. Two datasets were used for the discovery and validation of selective sweeps via the fixation of alleles at a series of contiguous SNP loci. BovineSNP50 data were used to identify 28 putative sweep regions among 14 cattle breeds. Affymetrix BOS 1 prescreening assay data for five breeds were used to identify 114 regions and validate 5 regions identified using the BovineSNP50 data. Many genes are located within these regions; however, phenotypes that we predict to have historically been under strong selection include horned-polled, coat color, stature, ear morphology, and behavior. The identified selective sweeps represent recent events associated with breed formation rather than ancient events associated with domestication. No sweep regions were shared between indicine and taurine breeds reflecting their divergent selection histories. A primary finding of this study is the sensitivity of results to assay resolution. Despite the bias towards common SNPs in the BovineSNP50 design, false positive sweep regions appear to be common due to the limited resolution of the assay. This assay design bias leads to the detection of breed-specific sweep regions, or regions shared by a small number of breeds, restricting the suite of selected phenotypes detected to primarily those associated with breed characteristics.
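The idea of flagging sweeps through "fixation of alleles at a series of contiguous SNP loci" can be sketched as a scan for runs of fixed markers. This is a minimal sketch, not the authors' pipeline: the function name `fixed_runs`, the `min_len` threshold, and the `tol` tolerance are assumptions chosen for illustration, and the abstract does not state the actual criteria used.

```python
def fixed_runs(allele_freqs, min_len=3, tol=0.0):
    """Return inclusive (start, end) index pairs for runs of at least
    min_len consecutive SNPs whose allele frequency is fixed (0 or 1,
    within tol). SNPs are assumed to be ordered along the chromosome."""
    runs = []
    start = None
    for i, p in enumerate(allele_freqs):
        is_fixed = p <= tol or p >= 1.0 - tol
        if is_fixed:
            if start is None:
                start = i  # a new run of fixed SNPs begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last SNP.
    if start is not None and len(allele_freqs) - start >= min_len:
        runs.append((start, len(allele_freqs) - 1))
    return runs


# Hypothetical within-breed allele frequencies at 12 ordered SNPs.
freqs = [0.4, 1.0, 0.0, 1.0, 1.0, 0.3, 0.0, 0.0, 0.7, 1.0, 1.0, 1.0]
candidate_regions = fixed_runs(freqs, min_len=3)
```

A short `min_len` on a sparse assay will flag many runs by chance, which fits the abstract's point that results are sensitive to assay resolution.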
This guest post is by Pleuni Pennings on the paper “Oh sister, where art thou? Indirect fitness benefit could maintain a host defense trait”, available from the arXiv here. This is cross-posted from her website here.
Tobias Pamminger, Susanne Foitzik, Dirk Metzler and I analyzed the small scale spatial structure of ants of the species Temnothorax longispinosus. These ants are the host of a slavemaking ant. The slavemakers go on raids, and steal young from the host species to work as slaves in their nests. We wanted to know whether the slaves still have relatives in the nearby nests. If they do, then their behavior – which influences the slavemakers – could have an effect on their relatives and therefore on their indirect fitness.
To find out if slaves are related to their neighbours, we collected lots of ant nests (they nest in acorns), both in New York and in West Virginia, marked exactly where we found them and genotyped them at six microsatellites.
Photograph by Andreas Gros
Temnothorax longispinosus in acorn
We put little flags at the exact location of an ant nest to measure the distances between the nests.
This is one of the figures from the manuscript. Plot R (from West Virginia) is shown to demonstrate the distribution of colonies within a plot and to show the distribution of alleles of one of the six microsatellite loci (GT1) among colonies. Each colony is represented by a pie-diagram with the frequencies of different GT1 alleles amongst the genotyped individuals of the colony. R3 is a slavemaker nest (we genotyped the slaves, not the slavemakers) and shares most of its alleles with the free nest R7. R13 and R15 are free living host colonies in close proximity and appear to be related.
Our main conclusion is that the enslaved ants are indeed related to their neighbors. The manuscript can be found on the arXiv here: http://arxiv.org/abs/1212.0790
The manuscript was peer-reviewed at Peerage of Science, a new and very useful community of scientists who agree to review each other's papers fairly. See http://www.peerageofscience.org/
The manuscript is part of Tobias Pamminger’s PhD thesis. Tobias defends his thesis this week in Mainz!! Congrats Tobias!
Tobias came up with the awesome title for the paper “Oh sister, where art thou? Indirect fitness benefit could maintain a host defense trait.”
Age of an allele and gene genealogies of nested subsamples for populations admitting large offspring numbers
(Submitted on 8 Dec 2012)
Coalescent processes, including mutation, are derived from Moran type population models admitting large offspring numbers. Including mutation in the coalescent process allows for quantifying the turnover of alleles by computing the distribution of the number of original alleles still segregating in the population at a given time in the past. The turnover of alleles is considered for specific classes of the Moran model admitting large offspring numbers. Versions of the Kingman coalescent are also derived whose rates are functions of the mean and variance of the offspring distribution. High variance in the offspring distribution results in higher turnover and younger age of alleles than predicted by the usual Kingman coalescent.
Fast Algorithms for Reconciliation under Hybridization and Incomplete Lineage Sorting
Yun Yu, Luay Nakhleh
(Submitted on 9 Dec 2012)
Reconciling a gene tree with a species tree is an important task that reveals much about the evolution of genes, genomes, and species, as well as about the molecular function of genes. A wide array of computational tools have been devised for this task under certain evolutionary events such as hybridization, gene duplication/loss, or incomplete lineage sorting. Work on reconciling gene trees with species phylogenies under two or more of these events has also begun to emerge. Our group recently devised both parsimony and probabilistic frameworks for reconciling a gene tree with a phylogenetic network, thus allowing for the detection of hybridization in the presence of incomplete lineage sorting. While the frameworks were general and could handle any topology, they are computationally intensive, rendering their application to large datasets infeasible. In this paper, we present two novel approaches to address the computational challenges of the two frameworks that are based on the concept of ancestral configurations. Our approaches still compute exact solutions while improving the computational time by up to five orders of magnitude. These substantial gains in speed scale the applicability of these unified reconciliation frameworks to much larger data sets. We discuss how the topological features of the gene tree and phylogenetic network may affect the performance of the new algorithms. We have implemented the algorithms in our PhyloNet software package, which is publicly available in open source.
Reconstructing Roma history from genome-wide data
Priya Moorjani, Nick Patterson, Po-Ru Loh, Mark Lipson, Péter Kisfali, Bela I Melegh, Michael Bonin, Ľudevít Kádaši, Olaf Rieß, Bonnie Berger, David Reich, Béla Melegh
(Submitted on 7 Dec 2012)
The Roma people, living throughout Europe, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1000-1500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry, deriving from a combination of European and South Asian sources, and that the date of admixture of South Asian and European ancestry was about 850 years ago. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which we hypothesize was followed by a major demographic expansion once the population arrived in Europe.
Oh sister, where art thou? Indirect fitness benefit could maintain a host defense trait
Tobias Pamminger, Susanne Foitzik, Dirk Metzler, Pleuni S. Pennings
(Submitted on 4 Dec 2012)
Population structure can affect the evolution of parasite virulence and host defense, a hypothesis that has been confirmed by studies focusing on large spatial scales. In contrast, we examine the small scale population structure of a host species and investigate whether it could explain the evolution of a defense trait against slavemaking ants. Slavemaking ants steal worker brood from host colonies, which will later serve as slaves to rear parasite offspring. The host species Temnothorax longispinosus has evolved an effective post-enslavement defense mechanism; instead of taking care of the slavemaker young, these slaves kill a high proportion of the parasite offspring. Because slaves never reproduce, they were thought to be trapped in an evolutionary dead end without the possibility of evolving such defense traits. Using detailed microsatellite data on a small spatial scale we can demonstrate that slaves can gain indirect fitness benefits by reducing parasite pressure on nearby host colonies, because these are often closely related to the slaves. Our genetic analyses indicate that polydomy, i.e., the occupation of several nest sites by a single colony, is sufficient to explain the elevated relatedness values between slaves and the surrounding host colonies, which may benefit from the slaves’ rebellion behavior.
GWAPP: A Web Application for Genome-wide Association Mapping in A. thaliana
Ümit Seren (1), Bjarni J. Vilhjálmsson (1 and 2), Matthew W. Horton (1 and 3), Dazhe Meng (4), Petar Forai (1), Yu S. Huang (4), Quan Long (1), Vincent Segura (5), Magnus Nordborg (1 and 2) ((1) Gregor Mendel Institute, Austrian Academy of Sciences, (2) Molecular and Computational Biology, University of Southern California, (3) Department of Ecology and Evolution, University of Chicago, (4) Center for Neurobehavioral Genetics, Semel Institute, University of California Los Angeles, (5) INRA, France)
(Submitted on 4 Dec 2012)
Arabidopsis thaliana is an important model organism for understanding the genetics and molecular biology of plants. Its highly selfing nature, together with other important features, such as small size, short generation time, small genome size, and wide geographic distribution, make it an ideal model organism for understanding natural variation. Genome-wide association studies (GWAS) have proven a useful technique for identifying genetic loci responsible for natural variation in A. thaliana. Previously genotyped accessions (natural inbred lines) can be grown in replicate under different conditions, and phenotyped for different traits. These important features greatly simplify association mapping of traits and allow for systematic dissection of the genetics of natural variation by the entire Arabidopsis community. To facilitate this, we present GWAPP, an interactive web-based application for conducting GWAS in A. thaliana. Using an efficient Python implementation of a linear mixed model, traits measured for a subset of 1386 publicly available ecotypes can be uploaded and mapped with an efficient mixed model and other methods in just a couple of minutes. GWAPP features an extensive, interactive, and user-friendly interface that includes interactive Manhattan plots and interactive local and genome-wide LD plots. It facilitates exploratory data analysis by implementing features such as the inclusion of candidate SNPs in the model as cofactors.
Deep-sequencing of the Peach Latent Mosaic Viroid Reveals New Aspects of Population Heterogeneity
Jean-Pierre Sehi Glouzon, François Bolduc, Rafael Najmanovich, Shengrui Wang, Jean-Pierre Perreault
(Submitted on 3 Dec 2012)
Viroids are small circular single-stranded infectious RNAs that are characterized by a relatively high mutation level. Knowledge of their sequence heterogeneity remains largely elusive, and, as yet, no strategy attempting to address this question from a population dynamics point of view is in place. In order to address these important questions, a GF305 indicator peach tree was infected with a single variant of the Avsunviroidae family member Peach latent mosaic viroid (PLMVd). Six months post-inoculation, full-length circular conformers of PLMVd were isolated, deep-sequenced and the resulting sequences analyzed using an original bioinformatics scheme specifically designed and developed in order to evaluate the richness of a given sequence population. Two distinct libraries were analyzed, and yielded 1125 and 1061 different PLMVd variants respectively, making this study the most productive to date (by more than an order of magnitude) in terms of the reporting of novel viroid sequences. Sequence variants exhibiting up to ~20% of mutations relative to the inoculated viroid were retrieved, clearly illustrating the high divergence dynamic inside a unique population. Using a novel hierarchical clustering algorithm, the different variants obtained were grouped into either 7 or 8 clusters depending on the library being analyzed. Most of the sequences contained, on average, between 4.6 and 6.3 mutations relative to the variant used initially to inoculate the plant. Interestingly, it was possible to reconstitute the sequence evolution between these clusters. On top of providing a reliable pipeline for the treatment of viroid deep-sequencing, this study sheds new light on the importance of the sequence variation that may take place in a viroid population and which may result in the formation of a quasi-species.
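Figures such as "mutations relative to the inoculated variant" can be pictured with a simple mismatch count. The sketch below is only an illustration and not the authors' bioinformatics scheme: the names `count_mutations` and `mean_mutations` are hypothetical, the sequences are invented, and it assumes variants are already aligned to the reference at equal length, so insertions and deletions are not handled.

```python
def count_mutations(reference, variant):
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for r, v in zip(reference, variant) if r != v)


def mean_mutations(reference, variants):
    """Average number of mismatches per variant relative to the reference."""
    if not variants:
        raise ValueError("need at least one variant")
    return sum(count_mutations(reference, v) for v in variants) / len(variants)


# Invented toy RNA sequences; real PLMVd genomes are far longer.
inoculated = "ACGUACGU"
library = ["ACGUACGU", "ACGAACGU", "UCGAACGG"]

per_variant = [count_mutations(inoculated, v) for v in library]
average = mean_mutations(inoculated, library)
```

The same pairwise counts could serve as the distance matrix for a clustering step, although the abstract does not say which distance the authors' algorithm uses.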
Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets
Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah Tringe, Janet Jansson, James M. Tiedje, C. Titus Brown
(Submitted on 1 Dec 2012)
Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify the presence of problematic and likely a-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences which join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing artifacts, and are only minimally incorporated into contigs by assembly. The removal of these sequences prior to assembly results in similar assembly content for most metagenomes and enables the use of graph partitioning to decrease assembly memory and time requirements.
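One simple way to picture "highly connected sequences" is to count how many distinct neighbours each k-mer has when consecutive k-mers within reads are linked. This is a minimal sketch under stated assumptions, not the authors' connectivity analysis: the function names, the value of `k`, and the `min_degree` cutoff are all hypothetical, and reverse complements are ignored.

```python
from collections import defaultdict


def kmer_neighbors(reads, k):
    """Map each k-mer to the set of k-mers adjacent to it in any read."""
    neighbors = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        # Consecutive k-mers in a read overlap by k-1 bases; link them.
        for left, right in zip(kmers, kmers[1:]):
            neighbors[left].add(right)
            neighbors[right].add(left)
    return neighbors


def highly_connected(reads, k, min_degree):
    """Return, sorted, the k-mers with at least min_degree distinct neighbours."""
    graph = kmer_neighbors(reads, k)
    return sorted(kmer for kmer, nbrs in graph.items() if len(nbrs) >= min_degree)


# Invented toy reads that all pass through the same 3-mer.
reads = ["AACGT", "TACGA", "GACGC"]
hubs = highly_connected(reads, k=3, min_degree=4)
```

Filtering such hub k-mers before assembly is the kind of step that could let an assembly graph break into smaller partitions, in the spirit of the abstract's final sentence.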