Efficient moment-based inference of admixture parameters and sources of gene flow

Mark Lipson, Po-Ru Loh, Alex Levin, David Reich, Nick Patterson, Bonnie Berger
(Submitted on 11 Dec 2012)

The recent explosion in available genetic data has led to significant advances in understanding the demographic histories of and relationships among human populations. It is still a challenge, however, to infer reliable parameter values for complicated models involving many populations. Here we present MixMapper, an efficient, interactive method for constructing phylogenetic trees including admixture events using single nucleotide polymorphism (SNP) genotype data. MixMapper implements a novel two-phase approach to admixture inference using moment statistics, first building an unadmixed scaffold tree and then adding admixed populations by solving systems of equations that express allele frequency divergences in terms of mixture parameters. Importantly, all features of the tree, including topology, sources of gene flow, branch lengths, and mixture proportions, are optimized automatically from the data and include estimates of statistical uncertainty. MixMapper also uses a new method to express branch lengths in easily interpretable drift units. We apply MixMapper to recently published data for HGDP individuals genotyped on a SNP array designed especially for use in population genetics studies, obtaining confident results for 30 populations, 20 of them admixed. Notably, we confirm a signal of ancient admixture in European populations—including previously undetected admixture in Sardinians and Basques—involving a proportion of 20-40% ancient northern Eurasian ancestry.
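As a rough illustration of what a moment statistic looks like (this is not MixMapper's actual algorithm), the Python sketch below computes an uncorrected f2-style statistic, the mean squared allele-frequency difference between two populations. It then recovers a mixture proportion by least squares in the idealized case where the admixed population is an exact blend of two sources with no subsequent drift. The function names and the toy frequencies are hypothetical and chosen only for this example.

```python
def f2(p, q):
    """Uncorrected f2-style statistic: mean squared allele-frequency
    difference across SNPs (no sampling-bias correction)."""
    if len(p) != len(q):
        raise ValueError("frequency vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def mix(p_a, p_b, alpha):
    """Allele frequencies of an idealized admixed population:
    a fraction alpha from source A and (1 - alpha) from source B."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p_a, p_b)]


def estimate_alpha(p_c, p_a, p_b):
    """Least-squares alpha such that p_c ~ alpha * p_a + (1 - alpha) * p_b.
    Assumption: no drift after admixture, so this is a toy estimator."""
    num = sum((c - b) * (a - b) for c, a, b in zip(p_c, p_a, p_b))
    den = sum((a - b) ** 2 for a, b in zip(p_a, p_b))
    if den == 0:
        raise ValueError("source populations have identical frequencies")
    return num / den


if __name__ == "__main__":
    # Toy allele frequencies at three SNPs (made-up numbers).
    source_a = [0.1, 0.5, 0.9]
    source_b = [0.3, 0.2, 0.4]
    admixed = mix(source_a, source_b, 0.25)
    print("f2(A, B) =", f2(source_a, source_b))
    print("estimated alpha =", estimate_alpha(admixed, source_a, source_b))
```

In this idealized setting, f2 between the admixed population and source B equals alpha squared times f2(A, B). This is a simple example of an equation that expresses an allele-frequency divergence in terms of a mixture parameter. Real data would also need drift terms and sampling corrections.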

Our paper: Oh sister, where art thou? Indirect fitness benefit could maintain a host defense trait

This guest post is by Pleuni Pennings on the paper “Oh sister, where art thou? Indirect fitness benefit could maintain a host defense trait”, available from the arXiv. It is cross-posted from her website.

Tobias Pamminger, Susanne Foitzik, Dirk Metzler and I analyzed the small-scale spatial structure of ants of the species Temnothorax longispinosus. These ants are the host of a slavemaking ant. The slavemakers go on raids and steal young from the host species to work as slaves in their nests. We wanted to know whether the slaves still have relatives in the nearby nests. If they do, then their behavior – which influences the slavemakers – could have an effect on their relatives and therefore on their indirect fitness.

To find out if slaves are related to their neighbours, we collected lots of ant nests (they nest in acorns), both in New York and in West Virginia, marked exactly where we found them and genotyped them at six microsatellites.

[Photo: Temnothorax longispinosus in an acorn. Photograph by Andreas Gros.]

We put little flags at the exact location of an ant nest to measure the distances between the nests.

[Figure: microsatellite data]

This is one of the figures from the manuscript. Plot R (from West Virginia) is shown to demonstrate the distribution of colonies within a plot and to show the distribution of alleles of one of the six microsatellite loci (GT1) among colonies. Each colony is represented by a pie-diagram with the frequencies of different GT1 alleles amongst the genotyped individuals of the colony. R3 is a slavemaker nest (we genotyped the slaves, not the slavemakers) and shares most of its alleles with the free nest R7. R13 and R15 are free living host colonies in close proximity and appear to be related.

Our main conclusion is that the enslaved ants are indeed related to their neighbors. The manuscript can be found on the arXiv here: http://arxiv.org/abs/1212.0790

The manuscript was peer-reviewed at Peerage of Science, a new and very useful community of scientists who agree to review each other's papers fairly. See http://www.peerageofscience.org/

The manuscript is part of Tobias Pamminger’s PhD thesis. Tobias defends his thesis this week in Mainz!! Congrats Tobias!

Tobias came up with the awesome title for the paper “Oh sister, where art thou? Indirect fitness benefit could maintain a host defense trait.”

Age of an allele and gene genealogies of nested subsamples for populations admitting large offspring numbers

Bjarki Eldon
(Submitted on 8 Dec 2012)

Coalescent processes, including mutation, are derived from Moran type population models admitting large offspring numbers. Including mutation in the coalescent process allows for quantifying the turnover of alleles by computing the distribution of the number of original alleles still segregating in the population at a given time in the past. The turnover of alleles is considered for specific classes of the Moran model admitting large offspring numbers. Versions of the Kingman coalescent are also derived whose rates are functions of the mean and variance of the offspring distribution. High variance in the offspring distribution results in higher turnover and younger age of alleles than predicted by the usual Kingman coalescent.

Fast Algorithms for Reconciliation under Hybridization and Incomplete Lineage Sorting

Yun Yu, Luay Nakhleh
(Submitted on 9 Dec 2012)

Reconciling a gene tree with a species tree is an important task that reveals much about the evolution of genes, genomes, and species, as well as about the molecular function of genes. A wide array of computational tools has been devised for this task under certain evolutionary events such as hybridization, gene duplication/loss, or incomplete lineage sorting. Work on reconciling gene trees with species phylogenies under two or more of these events has also begun to emerge. Our group recently devised both parsimony and probabilistic frameworks for reconciling a gene tree with a phylogenetic network, thus allowing for the detection of hybridization in the presence of incomplete lineage sorting. While the frameworks are general and can handle any topology, they are computationally intensive, rendering their application to large datasets infeasible. In this paper, we present two novel approaches, based on the concept of ancestral configurations, to address the computational challenges of the two frameworks. Our approaches still compute exact solutions while improving the computational time by up to five orders of magnitude. These substantial gains in speed extend the applicability of these unified reconciliation frameworks to much larger datasets. We discuss how the topological features of the gene tree and phylogenetic network may affect the performance of the new algorithms. We have implemented the algorithms in our PhyloNet software package, which is publicly available in open source.

Reconstructing Roma history from genome-wide data

Priya Moorjani, Nick Patterson, Po-Ru Loh, Mark Lipson, Péter Kisfali, Bela I Melegh, Michael Bonin, Ľudevít Kádaši, Olaf Rieß, Bonnie Berger, David Reich, Béla Melegh
(Submitted on 7 Dec 2012)

The Roma people, living throughout Europe, are a diverse population linked by the Romani language and culture. Previous linguistic and genetic studies have suggested that the Roma migrated into Europe from South Asia about 1000-1500 years ago. Genetic inferences about Roma history have mostly focused on the Y chromosome and mitochondrial DNA. To explore what additional information can be learned from genome-wide data, we analyzed data from six Roma groups that we genotyped at hundreds of thousands of single nucleotide polymorphisms (SNPs). We estimate that the Roma harbor about 80% West Eurasian ancestry, deriving from a combination of European and South Asian sources, and that the date of admixture of South Asian and European ancestry was about 850 years ago. We provide evidence for Eastern Europe being a major source of European ancestry, and North-west India being a major source of the South Asian ancestry in the Roma. By computing allele sharing as a measure of linkage disequilibrium, we estimate that the migration of Roma out of the Indian subcontinent was accompanied by a severe founder event, which we hypothesize was followed by a major demographic expansion once the population arrived in Europe.

Deep-sequencing of the Peach Latent Mosaic Viroid Reveals New Aspects of Population Heterogeneity

Jean-Pierre Sehi Glouzon, François Bolduc, Rafael Najmanovich, Shengrui Wang, Jean-Pierre Perreault
(Submitted on 3 Dec 2012)

Viroids are small circular single-stranded infectious RNAs that are characterized by a relatively high mutation level. Knowledge of their sequence heterogeneity remains largely elusive, and, as yet, no strategy attempting to address this question from a population dynamics point of view is in place. In order to address these important questions, a GF305 indicator peach tree was infected with a single variant of the Avsunviroidae family member Peach latent mosaic viroid (PLMVd). Six months post-inoculation, full-length circular conformers of PLMVd were isolated, deep-sequenced and the resulting sequences analyzed using an original bioinformatics scheme specifically designed and developed in order to evaluate the richness of a given sequence population. Two distinct libraries were analyzed, and yielded 1125 and 1061 different PLMVd variants respectively, making this study the most productive to date (by more than an order of magnitude) in terms of the reporting of novel viroid sequences. Sequence variants exhibiting up to ~20% of mutations relative to the inoculated viroid were retrieved, clearly illustrating the high divergence dynamic inside a unique population. Using a novel hierarchical clustering algorithm, the different variants obtained were grouped into either 7 or 8 clusters depending on the library being analyzed. Most of the sequences contained, on average, between 4.6 and 6.3 mutations relative to the variant used initially to inoculate the plant. Interestingly, it was possible to reconstitute the sequence evolution between these clusters. On top of providing a reliable pipeline for the treatment of viroid deep-sequencing, this study sheds new light on the importance of the sequence variation that may take place in a viroid population and which may result in the formation of a quasi-species.

Illumina Sequencing Artifacts Revealed by Connectivity Analysis of Metagenomic Datasets

Adina Chuang Howe, Jason Pell, Rosangela Canino-Koning, Rachel Mackelprang, Susannah Tringe, Janet Jansson, James M. Tiedje, C. Titus Brown
(Submitted on 1 Dec 2012)

Sequencing errors and biases in metagenomic datasets affect coverage-based assemblies and are often ignored during analysis. Here, we analyze read connectivity in metagenomes and identify the presence of problematic and likely a-biological connectivity within metagenome assembly graphs. Specifically, we identify highly connected sequences which join a large proportion of reads within each real metagenome. These sequences show position-specific bias in shotgun reads, suggestive of sequencing artifacts, and are only minimally incorporated into contigs by assembly. The removal of these sequences prior to assembly results in similar assembly content for most metagenomes and enables the use of graph partitioning to decrease assembly memory and time requirements.
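To give a feel for what "highly connected sequences" can mean in an assembly graph, here is a minimal Python sketch that builds a toy k-mer graph from reads and counts how many distinct neighboring k-mers each k-mer has. It is only an assumption-laden illustration, not the authors' pipeline: the function names are hypothetical, reverse complements are ignored, and everything is held in plain in-memory sets, which would not scale to a real metagenome.

```python
def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_degrees(reads, k):
    """Map each observed k-mer to the number of distinct observed k-mers
    that overlap it by k-1 bases (successors or predecessors)."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))

    degrees = {}
    for node in nodes:
        # Candidate neighbors: shift by one base in either direction.
        successors = {node[1:] + base for base in "ACGT"} & nodes
        predecessors = {base + node[:-1] for base in "ACGT"} & nodes
        degrees[node] = len((successors | predecessors) - {node})
    return degrees


def highly_connected(reads, k, min_degree):
    """K-mers whose neighbor count reaches a chosen (hypothetical) cutoff."""
    degrees = kmer_degrees(reads, k)
    return sorted(km for km, d in degrees.items() if d >= min_degree)


if __name__ == "__main__":
    toy_reads = ["ACGTA", "CGTC", "CGTG"]  # made-up reads
    print(kmer_degrees(toy_reads, 3))
    print(highly_connected(toy_reads, 3, min_degree=3))
```

In a simple linear stretch of sequence, a k-mer has at most one predecessor and one successor. A node with many neighbors is therefore a point where many otherwise separate paths meet, which is the kind of structure that makes graph partitioning ineffective.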

ZRT1 harbors an excess of nonsynonymous polymorphism and shows evidence of balancing selection in Saccharomyces cerevisiae

Elizabeth K. Engle, Justin C. Fay
(Submitted on 1 Dec 2012)

Estimates of the fraction of nucleotide substitutions driven by positive selection vary widely across different species. Accounting for different estimates of positive selection has been difficult, in part because selection on polymorphism within a species is known to obscure a signal of positive selection between species. While methods have been developed to control for the confounding effects of negative selection against deleterious polymorphism, the impact of balancing selection on estimates of positive selection has not been assessed. In Saccharomyces cerevisiae, there is no signal of positive selection within protein coding sequences as the ratio of nonsynonymous to synonymous polymorphism is higher than that of divergence. To investigate the impact of balancing selection on estimates of positive selection we examined five genes with high rates of nonsynonymous polymorphism in S. cerevisiae relative to divergence from S. paradoxus. One of the genes, a high affinity zinc transporter ZRT1, shows an elevated rate of synonymous polymorphism indicative of balancing selection. The high rate of synonymous polymorphism coincides with nonsynonymous divergence between three haplotype groups, which we find to be functionally indistinguishable. We conclude that balancing selection is not likely to be a common cause of genes harboring a large excess of nonsynonymous polymorphism in yeast.
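The comparison of nonsynonymous-to-synonymous ratios within and between species mentioned above is the logic of the McDonald-Kreitman framework. The short Python sketch below computes two standard summaries from the four counts: the neutrality index and the commonly used estimate alpha = 1 - (Ds * Pn) / (Dn * Ps). The abstract does not say which estimator the authors used, so treat this as a generic illustration. The function names and the counts in the demo are hypothetical.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index: (Pn / Ps) / (Dn / Ds).
    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    dn, ds: nonsynonymous and synonymous fixed differences between species.
    Values above 1 indicate an excess of nonsynonymous polymorphism."""
    if ps == 0 or dn == 0 or ds == 0:
        raise ValueError("ps, dn and ds must be non-zero")
    return (pn / ps) / (dn / ds)


def alpha(pn, ps, dn, ds):
    """Estimated fraction of nonsynonymous substitutions driven by positive
    selection: 1 - (Ds * Pn) / (Dn * Ps). Negative values mean there is no
    signal of positive selection under this simple estimator."""
    return 1.0 - neutrality_index(pn, ps, dn, ds)


if __name__ == "__main__":
    # Made-up counts for a single gene, for illustration only.
    pn, ps, dn, ds = 20, 10, 10, 10
    print("neutrality index =", neutrality_index(pn, ps, dn, ds))
    print("alpha =", alpha(pn, ps, dn, ds))
```

When the polymorphism ratio exceeds the divergence ratio, as the abstract reports for S. cerevisiae, the neutrality index is above 1 and alpha is negative. Either purifying selection against deleterious polymorphism or balancing selection can produce that pattern, which is why the two need to be told apart.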

Most viewed on Haldane’s Sieve: November 2012

The most viewed preprints on Haldane’s Sieve in November 2012 were:

Membrane environment imposes unique selection pressures on transmembrane domains of G protein-coupled receptors

Stephanie J. Spielman, Claus O. Wilke
(Submitted on 25 Nov 2012)

We have investigated the influence of the plasma membrane environment on the molecular evolution of G protein-coupled receptors (GPCRs), the largest receptor family in Metazoa. In particular, we have analyzed the site-specific rate variation across the two primary structural partitions, transmembrane (TM) and extramembrane (EM), of these membrane proteins. We find that transmembrane domains evolve more slowly than do extramembrane domains, though TM domains display increased rate heterogeneity relative to their EM counterparts. Although the majority of residues across GPCRs experience strong to weak purifying selection, many GPCRs experience positive selection at both TM and EM residues, albeit with a slight bias towards the EM. Further, a subset of GPCRs, chemosensory receptors (including olfactory and taste receptors), exhibit increased rates of evolution relative to other GPCRs, an effect which is more pronounced in their TM spans. Although it has been previously suggested that the low evolutionary rate of TM domains is caused by their high percentage of buried residues, we show that their attenuated rate seems to stem from the strong biophysical constraints of the membrane itself, or from functional requirements.