An efficient group test for genetic markers that handles confounding.

arXiv:1205.0793v1 [q-bio.GN]
by Jennifer Listgarten, Christoph Lippert, David Heckerman

Approaches for testing groups of variants for association with complex traits are becoming critical. Examples of groups typically include a set of rare or common variants within a gene, but could also be variants within a pathway or any other set. These tests are critical for aggregation of weak signal within a group, allow interplay among variants to be captured, and also reduce the problem of multiple hypothesis testing. Unfortunately, these approaches do not address confounding by, for example, family relatedness and population structure, a problem that is becoming more important as larger data sets are used to increase power. We introduce a new approach for group tests that can handle confounding, based on Bayesian linear regression, which is equivalent to the linear mixed model. The approach uses two sets of covariates (equivalently, two random effects), one to capture the group association signal and one to capture confounding. We also introduce a computational speedup for the two-random-effects model that makes this approach feasible even for extremely large cohorts, whereas it otherwise would not be. Application of our approach to richly structured GAW14 data, comprising over eight ethnicities and many related family members, demonstrates that our method successfully corrects for population structure, while application of our method to WTCCC Crohn’s disease and hypertension data demonstrates that our method recovers genes not recoverable by univariate analysis, while still correcting for confounding structure.

Landscape genomic tests for associations between loci and environmental gradients

Eric Frichot (1), Sean Schoville (1), Guillaume Bouchard (2), Olivier François (1) ((1) UJF, CNRS, TIMC-IMAG, FRANCE, (2) Xerox Research Center Europe, France)
(Submitted on 15 May 2012)

Adaptation to local environments often occurs through natural selection acting on a large number of alleles, each having a weak phenotypic effect. One way to detect those alleles is by identifying genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures. Here we proposed an integrated framework based on population genetics, ecological modeling and machine learning techniques for screening genomes for signatures of local adaptation. We implemented fast algorithms using a hierarchical Bayesian mixed model based on a variant of principal component analysis in which residual population structure is introduced via unobserved or latent factors. Our algorithms can detect correlations between environmental and genetic variation at the same time as they infer the background levels of population structure. We provided evidence that latent factor models efficiently estimated random effects due to population history and isolation-by-distance mechanisms when computing gene-environment correlations, and that they decreased the number of false-positive associations in genome scans for selection. We applied these models to plant and human genetic data and we detected several genes with functions related to multicellular organ development exhibiting unusual correlations with climatic gradients.

Emergence of clones in sexual populations

Richard A. Neher, Marija Vucelja, Marc Mézard, Boris I. Shraiman
(Submitted on 9 May 2012 (v1), last revised 21 Jul 2012 (this version, v2))

In sexual populations, recombination reshuffles genetic variation and produces novel combinations of existing alleles, while selection amplifies the fittest genotypes in the population. If recombination is more rapid than selection, populations consist of a diverse mixture of many genotypes, as is observed in many populations. In the opposite regime, which is realized for example in the facultatively sexual populations that outcross in only a fraction of reproductive cycles, selection can amplify individual genotypes into large clones. Such clones emerge when the fitness advantage of some of the genotypes is large enough that they grow to a significant fraction of the population despite being broken down by recombination. The occurrence of this “clonal condensation” depends, in addition to the outcrossing rate, on the heritability of fitness. Clonal condensation leads to a strong genetic heterogeneity of the population which is not adequately described by traditional population genetics measures, such as linkage disequilibrium. Here we point out the similarity between clonal condensation and the freezing transition in the Random Energy Model of spin glasses. Guided by this analogy we explicitly calculate the probability, Y, that two individuals are genetically identical as a function of the key parameters of the model. While Y is the analog of the spin-glass order parameter, it is also closely related to the rate of coalescence in population genetics: two individuals that are part of the same clone have a recent common ancestor.
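To make the quantity Y concrete, here is a minimal sketch of how one might estimate it empirically from a sample of genotypes. It is not the authors' calculation, which is analytical. The function name `identity_probability` and the toy genotype labels are hypothetical, and the sketch assumes two individuals are drawn independently with replacement, so that Y is the sum of squared genotype frequencies.

```python
from collections import Counter


def identity_probability(genotypes):
    """Estimate Y, the chance that two sampled individuals share a genotype.

    Assumption (not from the paper): individuals are drawn independently
    with replacement, so Y is the sum of squared genotype frequencies.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")
    counts = Counter(genotypes)
    return sum((c / n) ** 2 for c in counts.values())


# A diverse population: every genotype is unique, so Y is small (1/n).
diverse = ["g1", "g2", "g3", "g4"]
# A population dominated by one clone: Y is large.
clonal = ["g1", "g1", "g1", "g2"]

print(identity_probability(diverse))
print(identity_probability(clonal))
```

In this toy example Y rises as a single genotype takes over the sample, which is the qualitative signature of clonal condensation described in the abstract.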

Welcome to Haldane’s sieve

The ease of communication facilitated by the Internet has dramatically affected the process of scientific communication in many fields. Most notably, many physics, math, and economics communities have adopted a system in which new research papers are immediately distributed throughout the world prior to formal evaluation in the form of peer review. This system allows for rapid distribution of “bleeding edge” results among all the experts in a field, allowing them to see and build upon the most recent advances.

This practice has historically been uncommon in biology, where results generally reach the community (including many people qualified to judge them) only after a delay of around six months to a year, during which a paper is reviewed, formatted, and published. We believe this is unfortunate. There is, however, growing pressure in some parts of biology (in particular our fields of evolutionary and population genetics) to follow physics and math in posting papers to preprint servers ahead of formal publication.

Some authors have a variety of reasonable concerns about posting their papers to preprint servers. In particular, one worry is that, in a morass of online content, their work will not reach the relevant audience. Others see no benefit in posting their papers prior to review if they will not receive useful feedback. The goal of Haldane’s Sieve is to partially remedy these issues. We aim to provide a simple feed of preprints in the fields of evolutionary and population genetics (though we may later expand to other fields). Thus, instead of checking arXiv, PeerJ, or Figshare for relevant preprints, readers in these fields could simply check Haldane’s Sieve.

What to expect

As described above, most posts to Haldane’s Sieve will be basic descriptions of relevant preprints, with little to no commentary. All posts will have comment sections where discussion of the papers will be welcome. A second type of post will be detailed comments on a preprint of particular interest to a contributor. These posts could take the style of a journal review, or may simply be some brief comments. We hope they will provide useful feedback to the authors of the preprint. Finally, there will be posts by authors of preprints in which they describe their work and place it in broader context.

We ask the commenters to remember that by submitting articles to preprint servers the authors (often biologists) are taking a somewhat unusual step. Therefore, comments should be phrased in a constructive manner to aid the authors.

Authors: Our choice of what to post reflects our interests and knowledge, so we will post only a biased subset of evolutionary, population, and statistical genetics preprints. We will endeavor to be reasonably thorough, but we will doubtless miss some interesting preprints, especially those outside the quantitative biology section of arXiv. If you want us to link to your preprint, please drop us a line; our email addresses can easily be found via our university sites. Alternatively, send a tweet to @Haldanessieve.

Why “Haldane’s Sieve”?

A brief explanation of the name of this site is perhaps in order. When a new beneficial allele arises in a population, the probability that it eventually reaches fixation is influenced by a number of factors. One of these is the dominance coefficient of the allele. The dominance coefficient matters because early in the life of the allele, while it is at low frequency, it is almost always present in the population in heterozygous form. Therefore, all else being equal, dominant beneficial alleles can increase in frequency due to selection faster than recessive alleles, increasing their probability of eventual fixation (or establishment in the population). This effect was noted by Haldane (Haldane 1924, 1927) and has become known as “Haldane’s sieve” (Turner 1981; Charlesworth 1992). Analogously, we seek to increase the exposure of interesting papers early in their lifespan, hopefully increasing the probability that they reach their target audience.

A nameless wit has pointed out to us that preprints would really count as standing variation in this analogy and might therefore not be subject to Haldane’s sieve (see Orr and Betancourt, Genetics 2001). We leave it to the reader to decide whether the analogy holds.
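For readers who want to see the sieve in numbers, here is a minimal sketch of the effect on new mutations. It rests on assumptions that are not stated in this post: heterozygotes have relative fitness 1 + h*s (h is the dominance coefficient, s the selection coefficient), the rare allele's early fate is modeled as a branching process with Poisson-distributed offspring, and the function name `establishment_probability` is hypothetical. Under those assumptions the establishment probability is roughly 2*h*s when h*s is small.

```python
import math


def establishment_probability(s, h, tol=1e-12, max_iter=100_000):
    """Survival probability of a new beneficial allele (branching process).

    Assumptions (illustrative, not from the post): the allele is rare and
    always heterozygous, heterozygote fitness is 1 + h*s, and offspring
    number is Poisson with mean 1 + h*s.
    """
    mean_offspring = 1.0 + h * s
    if mean_offspring <= 1.0:
        return 0.0  # no advantage in heterozygotes: the lineage dies out
    # The extinction probability q is the smallest root of
    # q = exp(mean_offspring * (q - 1)); iterating from q = 0 converges to it.
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(mean_offspring * (q - 1.0))
        converged = abs(q_next - q) < tol
        q = q_next
        if converged:
            break
    return 1.0 - q


s = 0.05
for h in (0.1, 0.5, 0.9):  # mostly recessive, additive, mostly dominant
    p = establishment_probability(s, h)
    print(f"h={h}: establishment ~ {p:.4f}, 2*h*s = {2 * h * s:.4f}")
```

The more dominant allele has the higher establishment probability even though the selection coefficient s is the same in every case, which is the sieve. A fully recessive allele (h = 0) gets probability zero here only because the branching approximation ignores homozygotes entirely.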

Image

The image of Haldane is from Wikipedia.
The image of the sieve is from fdctsevilla, who kindly uses a Creative Commons 2.0 license. It’s surprisingly difficult to find a usable picture of a sieve.

Graham Coop and Joe Pickrell