Population subdivision with migration can facilitate evolution on rugged fitness landscapes

Population subdivision with migration can facilitate evolution on rugged fitness landscapes
Anne-Florence Bitbol, David J. Schwab
(Submitted on 1 Aug 2013)

We show that subdivision of an asexual population into demes connected by migration significantly accelerates the crossing of fitness valleys and plateaus over a wide parameter range, both with respect to the non-subdivided population and with respect to a single deme. We predict the existence of a parameter range where valley or plateau crossing by the metapopulation is as fast as that of the fastest deme, and we verify this prediction using stochastic simulations. Finally, we extend our work to the case of a large population connected by migration to one or several smaller islands.

Distortion of genealogical properties when the sample is very large

Distortion of genealogical properties when the sample is very large
Anand Bhaskar, Andrew G. Clark, Yun S. Song
(Submitted on 1 Aug 2013)

Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands if not millions of individuals. In addition to posing computational challenges, such large sample sizes call for carefully re-examining the theoretical foundation underlying commonly-used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For realistic demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the tradeoff between accuracy and computational efficiency, we propose a hybrid algorithm that utilizes the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model.

Maximum likelihood evidence for Neandertal admixture in Eurasian populations from three genomes

Maximum likelihood evidence for Neandertal admixture in Eurasian populations from three genomes
Konrad Lohse, Laurent A.F. Frantz
(Submitted on 31 Jul 2013)

Although there has been much interest in estimating divergence and admixture from genomic data, it has proven difficult to distinguish gene flow after divergence from alternative histories involving structure in the ancestral population. The lack of a formal test to distinguish these scenarios has sparked recent controversy about the possibility of interbreeding between Neandertals and modern humans in Eurasia. We derive the probability of mutational configurations in non-recombining sequence blocks under alternative histories of divergence with admixture and ancestral structure. Dividing the genome into short blocks makes it possible to compute maximum likelihood estimates of parameters under both models. We apply this method to triplets of human Neandertal genomes and quantify the relative support for models of long-term population structure in the ancestral African popuation and admixture from Neandertals into Eurasian populations after their expansion out of Africa. Our analysis allows us — for the first time — to formally reject a history of ancestral population structure and instead reveals strong support for admixture from Neandertals into Eurasian populations at a higher rate (3.4%-7.9%) than suggested previously.

Late-replicating CNVs as a source of new genes

Late-replicating CNVs as a source of new genes
David Juan, Daniel Rico, Tomas Marques-Bonet, Oscar Fernandez-Capetillo, Alfonso Valencia
(Submitted on 31 Jul 2013)

Asynchronous replication of the genome has been associated with different rates of point mutation and copy number variation (CNV) in human populations. Here, we explored if the bias in the generation of CNV that is associated to DNA replication timing might have conditioned the birth of new protein-coding genes during evolution. We show that genes that were duplicated during primate evolution are more commonly found among the human genes located in late-replicating CNV regions. We traced the relationship between replication timing and the evolutionary age of duplicated genes. Strikingly, we found that there is a significant enrichment of evolutionary younger duplicates in late replicating regions of the human and mouse genome. Indeed, the presence of duplicates in late replicating regions gradually decreases as the evolutionary time since duplication extends. Our results suggest that the accumulation of recent duplications in late replicating CNV regions is an active process influencing genome evolution.

Most viewed on Haldane’s Sieve: July 2013

The most viewed preprints in July 2013 were:

SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching


SlopMap: a software application tool for quick and flexible identification of similar sequences using exact k-mer matching

Ilya Y. Zhbannikov, Samuel S. Hunter, Matthew L. Settles, James A. Foster
(Submitted on 31 Jul 2013)

With the advent of Next-Generation (NG) sequencing, it has become possible to sequence an entire genome quickly and inexpensively. However, in some experiments one only needs to extract and assembly a portion of the sequence reads, for example when performing transcriptome studies, sequencing mitochondrial genomes, or characterizing exomes. With the raw DNA-library of a complete genome it would appear to be a trivial problem to identify reads of interest. But it is not always easy to incorporate well-known tools such as BLAST, BLAT, Bowtie, and SOAP directly into a bioinformatics pipelines before the assembly stage, either due to in- compatibility with the assembler’s file inputs, or because it is desirable to incorporate information that must be extracted separately. For example, in order to incorporate flowgrams from a Roche 454 sequencer into the Newbler assembler it is necessary to first extract them from the original SFF files. We present SlopMap, a bioinformatics software utility which allows rapid identification similar to provided target sequences from either Roche 454 or Illumnia DNA library. With a simple and intuitive command- line interface along with file output formats compatible with assembly programs, SlopMap can be directly embedded in biological data processing pipeline without any additional programming work. In addition, SlopMap preserves flowgram information needed for Roche 454 assembler.

The missing heritability revealed in Arabidopsis thaliana

The missing heritability revealed in Arabidopsis thaliana
Xia Shen
(Submitted on 30 Jul 2013)

Although high-throughput genomic data are widely available, a large proportion of the narrow sense heritability of many complex traits have not been successfully uncovered. In this study, focusing on phenotype prediction, I show that by properly selecting a small number of loci, a significant amount of missing heritability can be revealed. The results provide new insights into the missing heritability problem and the underlying genetic architecture of complex traits.

The Population Genetic Signature of Polygenic Local Adaptation

The Population Genetic Signature of Polygenic Local Adaptation
Jeremy J. Berg, Graham Coop
(Submitted on 29 Jul 2013)

Adaptation in response to selection on polygenic phenotypes occurs via subtle allele frequencies shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that have undergone local adaptation. Using GWAS data, we estimate the mean additive genetic value for a give phenotype across many populations as simple weighted sums of allele frequencies. We model the expected differentiation of GWAS loci among populations under neutrality to develop simple tests of selection across an arbitrary number of populations with arbitrary population structure. To find support for the role of specific environmental variables in local adaptation we test for correlations with the estimated genetic values. We also develop a general test of local adaptation to identify overdispersion of the estimated genetic values values among populations. This test is a natural generalization of QST /FST comparisons based on GWAS predictions. Finally we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single locus equivalents due to the fact that they look for positive covariance between like effect alleles. We apply our tests to the human genome diversity panel dataset using GWAS data for six different traits. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.

A path integral formulation of the Wright-Fisher process with genic selection

A path integral formulation of the Wright-Fisher process with genic selection
Joshua G. Schraiber
(Submitted on 29 Jul 2013)

The Wright-Fisher process with selection is an important tool in population genetics theory. Traditional analysis of this process relies on the diffusion approximation. The diffusion approximation is usually studied in a partial differential equations framework. In this paper, I introduce a path integral formalism to study the Wright-Fisher process with selection and use that formalism to obtain a simple perturbation series to approximate the transition density. The perturbation series can be understood in terms of Feynman diagrams, which have a simple probabilistic interpretation in terms of selective events. The perturbation series proves to be an accurate approximation of the transition density for weak selection and is shown to be arbitrarily accurate for any selection coefficient.

Ancient west Eurasian ancestry in southern and eastern Africa

Ancient west Eurasian ancestry in southern and eastern Africa
Joseph K. Pickrell, Nick Patterson, Po-Ru Loh, Mark Lipson, Bonnie Berger, Mark Stoneking, Brigitte Pakendorf, David Reich
(Submitted on 30 Jul 2013)

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter-gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger-Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to approximately 900-1,800 years ago, and show that it had the largest demographic impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to approximately 2,700 – 3,300 years ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa, and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.