On The External Branches Of Coalescent Processes With Multiple Collisions With An Emphasis On The Bolthausen-Sznitman Coalescent

Jean-Stephane Dhersin (IG, LAGA), Martin Moehle
(Submitted on 15 Sep 2012)

A recursion for the joint moments of the external branch lengths for coalescents with multiple collisions (Λ-coalescents) is provided. This recursion is used to derive asymptotic expansions, as the sample size n tends to infinity, for the moments of the total external branch length of the Bolthausen–Sznitman coalescent. The proof is based on an elementary difference method. An alternative differential-equation method is developed, which can be used to obtain exact solutions for the joint moments of the external branch lengths of the Bolthausen–Sznitman coalescent. The results show, for example, that the lengths of two randomly chosen external branches are positively correlated for the Bolthausen–Sznitman coalescent, whereas they are negatively correlated for the Kingman coalescent provided that n ≥ 4.
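The sign of the correlation between external branches can be checked numerically. The following is a minimal Monte Carlo sketch, not the paper's recursion or differential-equation method. It simulates the Kingman coalescent and the Bolthausen–Sznitman coalescent (the Λ-coalescent with Λ uniform on [0, 1]) and records each leaf's external branch length. It assumes the usual time scaling: with b blocks, each pair merges at rate 1 under Kingman, and each specific group of k blocks merges at rate (k-2)!(b-k)!/(b-1)! under Bolthausen–Sznitman, so that some merger occurs at total rate b-1. The function names are hypothetical.

```python
import math
import random


def bs_merger_size(b, rng):
    """Size k of the next merger in a Bolthausen-Sznitman coalescent with b blocks.

    P(k) is proportional to C(b, k) * (k-2)!(b-k)!/(b-1)!, i.e. to 1 / (k(k-1)).
    """
    sizes = range(2, b + 1)
    weights = [1.0 / (k * (k - 1)) for k in sizes]
    return rng.choices(sizes, weights=weights)[0]


def external_branch_lengths(n, kind, rng):
    """Simulate one genealogy of n leaves; return the external branch length of each leaf."""
    blocks = [(i,) for i in range(n)]
    lengths = [0.0] * n
    t = 0.0
    while len(blocks) > 1:
        b = len(blocks)
        if kind == "kingman":
            rate, k = b * (b - 1) / 2, 2
        elif kind == "bolthausen-sznitman":
            rate, k = b - 1, bs_merger_size(b, rng)
        else:
            raise ValueError(f"unknown coalescent: {kind}")
        t += rng.expovariate(rate)
        chosen = set(rng.sample(range(b), k))
        merged = []
        for j in chosen:
            if len(blocks[j]) == 1:  # a leaf's external branch ends at its first merger
                lengths[blocks[j][0]] = t
            merged.extend(blocks[j])
        blocks = [blk for j, blk in enumerate(blocks) if j not in chosen]
        blocks.append(tuple(merged))
    return lengths


def summarize(n, kind, reps=20000, seed=1):
    """Monte Carlo mean total external length and correlation of two external branches."""
    rng = random.Random(seed)
    totals, x, y = [], [], []
    for _ in range(reps):
        lengths = external_branch_lengths(n, kind, rng)
        totals.append(sum(lengths))
        x.append(lengths[0])  # leaves are exchangeable, so leaves 0 and 1
        y.append(lengths[1])  # play the role of two randomly chosen branches
    mx, my = sum(x) / reps, sum(y) / reps
    cov = sum((a - mx) * (c - my) for a, c in zip(x, y)) / reps
    vx = sum((a - mx) ** 2 for a in x) / reps
    vy = sum((c - my) ** 2 for c in y) / reps
    return sum(totals) / reps, cov / math.sqrt(vx * vy)


for kind in ("kingman", "bolthausen-sznitman"):
    for n in (3, 4, 8):
        mean_total, corr = summarize(n, kind)
        print(f"{kind:20s} n={n}  E[total external] ~ {mean_total:.3f}  corr ~ {corr:+.3f}")
```

For n = 3, a short hand calculation under these rates gives a correlation of exactly 0 for Kingman and 3/11 for Bolthausen–Sznitman, so the simulated values should sit near those. The abstract's negative Kingman correlation for n ≥ 4 may be small enough that more replicates are needed to resolve it from Monte Carlo noise.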

Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data

Darren Kessner, Tom Turner, John Novembre
(Submitted on 19 Sep 2012)

DNA samples are often pooled, either by experimental design, or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of sequencing. Alternatively, the sample itself may be a mixture of multiple species or strains (e.g. bacterial species comprising a microbiome, or pathogen strains in a blood sample). We present an expectation-maximization (EM) algorithm for estimating haplotype frequencies in a pooled sample directly from mapped sequence reads, in the case where the possible haplotypes are known. This method is relevant to the analysis of pooled sequencing data from selection experiments, as well as the calculation of proportions of different strains within a metagenomics sample. Our method outperforms existing methods based on single-site allele frequencies, as well as simple approaches using sequence read data. We have implemented the method in a freely available open-source software tool.
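A toy version of the EM idea fits in a few lines. This sketch is not the authors' software. It assumes, as hypothetical simplifications, that each mapped read is summarized as a dictionary from haplotype position to observed base, that the known haplotypes are strings in the same coordinates, and that sequencing errors occur independently at each base with a fixed rate. The E-step assigns each read fractionally to haplotypes in proportion to frequency × read likelihood; the M-step sets each frequency to the average of those assignments.

```python
def read_likelihood(read, haplotype, error_rate):
    """P(read | haplotype) for a read given as {position: base}, with independent base errors."""
    like = 1.0
    for pos, base in read.items():
        if haplotype[pos] == base:
            like *= 1.0 - error_rate
        else:
            like *= error_rate / 3.0  # error to any one of the 3 other bases
    return like


def em_haplotype_frequencies(reads, haplotypes, error_rate=0.01, iters=500, tol=1e-10):
    """Estimate haplotype frequencies in a pool by expectation-maximization."""
    k = len(haplotypes)
    freqs = [1.0 / k] * k  # uniform starting point
    table = [[read_likelihood(r, h, error_rate) for h in haplotypes] for r in reads]
    for _ in range(iters):
        counts = [0.0] * k
        for row in table:
            denom = sum(f * l for f, l in zip(freqs, row))
            for i in range(k):
                counts[i] += freqs[i] * row[i] / denom  # E-step: expected read membership
        new = [c / len(reads) for c in counts]  # M-step: average membership
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs


haplotypes = ["ACGT", "ACTT", "GCTT"]
reads = [{0: "A", 2: "G"}] * 50 + [{0: "A", 2: "T"}] * 30 + [{0: "G", 2: "T"}] * 20
print(em_haplotype_frequencies(reads, haplotypes))  # close to [0.5, 0.3, 0.2]
```

A read that covers only sites shared by all haplotypes has equal likelihood under each of them and so contributes nothing to distinguishing frequencies, which is why informative reads must span sites that differ among the haplotypes.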

Our paper: Integrated analysis of variants and pathways in genome-wide association studies using polygenic models of disease

[This author post is by Peter Carbonetto on Integrated analysis of variants and pathways in genome-wide association studies using polygenic models of disease, available from the arXiv here.]

I expect that most readers of this blog appreciate the impact that genome-wide association studies have had on our understanding of many common diseases. Still, I think it is important to reiterate a major appeal of genome-wide association studies: the analysis is conceptually straightforward to understand, even for people who have never had to suffer through a course on statistics or epidemiology. To find links between genetic loci and disease, the analysis consists of systematically searching across the genome for variants that show statistically significant correlation with susceptibility to disease. These correlations signal the presence of nearby genes—or perhaps DNA elements that regulate other genes—that are risk factors for disease.

Many readers of this blog will also appreciate how difficult it is, given the multifactorial nature of most common diseases, to establish compelling evidence for disease-variant correlations. Hence the search for more effective data-driven strategies for discovering the genetic factors underlying common diseases.

One strategy is to assess evidence for the accumulation, or “enrichment,” of disease-conferring mutations within known biological pathways. The intuition is that identifying the accumulation of small genetic effects acting in a common pathway is easier than mapping the individual genes within the pathway that contribute to disease susceptibility.

We asked whether identifying these enriched pathways can also give us useful feedback about the individual gene variants associated with disease. To answer this question, we developed a statistical method that adjusts the support for disease-variant associations to reflect enrichment of associations in a pathway. Our approach was to introduce an enrichment parameter that quantifies the increase in the probability that each variant in the pathway is associated with disease risk.
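To make the enrichment parameter concrete, here is a small sketch. It assumes, as an illustration rather than the paper's exact parameterization, that the prior log10-odds that a variant is associated with disease is θ0 for variants outside the pathway and θ0 + θ for variants inside it. It then converts a single-variant Bayes factor into a posterior probability of association with the one-variant formula π·BF / (π·BF + 1 - π). The paper's model is multi-SNP and fitted jointly, so this only shows the direction of the effect: enrichment raises the prior, and hence the posterior, for every variant in the pathway.

```python
def prior_probability(theta0, theta, in_pathway):
    """Prior P(variant associated) from log10 prior odds theta0 (+ theta inside the pathway)."""
    log10_odds = theta0 + (theta if in_pathway else 0.0)
    return 1.0 / (1.0 + 10.0 ** (-log10_odds))


def posterior_probability(bayes_factor, prior):
    """Posterior P(associated | data) for a single variant with the given Bayes factor."""
    return prior * bayes_factor / (prior * bayes_factor + 1.0 - prior)


bf = 100.0  # hypothetical Bayes factor for one variant
for theta in (0.0, 1.0, 2.0):
    pi = prior_probability(theta0=-4.0, theta=theta, in_pathway=True)
    print(f"theta={theta:.0f}: prior={pi:.2e}, posterior={posterior_probability(bf, pi):.3f}")
```

With θ0 = -4 and a Bayes factor of 100, the posterior rises from about 0.01 with no enrichment to 0.5 when θ = 2, even though the variant's own evidence is unchanged.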

Is this a valid approach? To investigate, we applied our approach to data from the Wellcome Trust Crohn’s disease study from 2007. First, we identified a broad class of cytokine signaling genes that were enriched for genetic associations with Crohn’s disease. Next, by prioritizing variants in this pathway, we discovered candidates for association—including the STAT3 gene, the IBD5 locus, and the MHC class II genes—that were not identified in conventional analyses of the same data. These results help validate our approach, as these genetic associations have been independently confirmed in other studies and meta-analyses with much larger combined samples.

Several other important lessons emerged from our case study:

1. Interrogate as many pathways as possible. Because we collected over 3000 candidate pathways from several sources (Reactome, KEGG, BioCarta, BioCyc, etc.), many of the pathways highlighted in previous analyses of the same data were eclipsed by much stronger enrichment signals in our analysis.

2. Assess evidence for combinations of enriched pathways. Some pathways become interesting only after assessing enrichment of the pathway in combination with another pathway.

3. Account for the heterogeneity of effect sizes in Crohn’s disease. One of the assumptions we made in our analysis, mainly out of convenience, was that the additive effects on disease risk are normally distributed. While this assumption simplified this analysis, we suspect that a normal distribution does not adequately capture the smaller effect sizes in pathways, leading to a loss of power to detect enriched pathways.

At conferences, and around the lab, I’ve heard many complaints about pathway analysis (or gene set enrichment analysis) for genome-wide association studies. One complaint is that the results are difficult to interpret. Another common complaint is that the findings are sensitive to arbitrary significance thresholds. While we didn’t devote much space in the paper to a discussion of these issues, we believe that our approach offers a coherent solution to many of these problems.

Ultimately, we would like other researchers to use our methods to analyze data from their own genome-wide association studies. We tried to make our paper as accessible as possible, especially to biologists who are not well acquainted with Bayesian approaches, by carefully explaining how to interpret the Bayes factors and posterior statistics used in the analysis. We are working on releasing the full source code (in R and MATLAB) for all our methods, and accompanying documentation.

Peter Carbonetto

Our paper: The genetic prehistory of southern Africa

[This author post is by Joe Pickrell (@joe_pickrell), Nick Patterson, Mark Stoneking, David Reich, and Brigitte Pakendorf on The genetic prehistory of southern Africa, available from arXiv here]

The indigenous populations of southern Africa are phenotypically, linguistically, culturally, and genetically diverse. Although many groups speak Bantu languages (having arrived in the region during an expansion of Iron-Age agriculturalists), there are a number of populations who speak diverse non-Bantu languages with heavy use of click consonants. We refer to these populations as “Khoisan”. Most of the Khoisan populations are hunter-gatherers, but some are pastoralists; the extensive linguistic and cultural diversity of the Khoisan (who live in a relatively small region around the Kalahari semi-desert) is historically puzzling.

Two hunter-gatherer (or formerly hunter-gatherer) populations in East Africa, the Hadza and Sandawe, also speak languages that make use of click consonants. Linguists see little in common between the languages in southern Africa and Hadza, although Sandawe might be genealogically related to some of the Khoisan languages. Nevertheless, the shared use of click consonants and a foraging lifestyle led many to hypothesize that the southern African Khoisan populations are genetically related to the Hadza and Sandawe, which would imply that their ancestors were once considerably more widespread. This hypothesis has been controversial for decades.

Tree relating the Khoisan-like proportion of ancestry (shown in blue in the barplot) in Khoisan, Hadza, and Sandawe after accounting for non-Khoisan admixture.

In our study, we use genetic data to address the history of the diverse groups within southern Africa and their relationship to the Hadza and Sandawe. Specifically, we genotyped individuals from 16 Khoisan populations, 5 neighboring populations that speak Bantu languages, and the Hadza (the latter thanks to Brenna Henn, Joanna Mountain, and Carlos Bustamante) on a SNP array designed for studies of human history: its SNP ascertainment scheme is known and includes SNPs ascertained in the Khoisan. We then merged in Hadza and Sandawe samples from a recent paper by Joseph Lachance, Sarah Tishkoff and colleagues. The main conclusions are as follows:

  1. Within the southern African Khoisan, there are two genetic groups, which correspond roughly to populations in the northwest and southeast Kalahari semi-desert. Populations from these two groups have been labeled in the tree in this post (see also Figure 1B in the preprint). We estimate that these two groups diverged within the last 30,000 years. However, this date should be taken as an upper bound due to point #2 below.
  2. All southern African Khoisan groups are admixed with non-Khoisan populations. Even the most isolated Khoisan groups (i.e. the “San” from the HGDP, who are included in the “Ju|’hoan_North” group in our paper) show some evidence of admixture with agriculturalist and/or pastoralist groups. A subtle technical point is that this had not been noticed previously because methods that rely on correlations in allele frequencies are sometimes unable to detect admixture if all populations are admixed (this is related to Mr. Razib Khan’s post on why ADMIXTURE is not a test for admixture). To get around this, we developed new methods based on the decay of linkage disequilibrium.
  3. The Hadza and Sandawe trace part of their ancestry to admixture with a population related to the Khoisan. After accounting for admixture, we built a tree of “Khoisan-like” ancestry in the southern and eastern African populations (see the Figure above). The striking thing is that the Hadza and Sandawe fall with high confidence on the same branch as the Khoisan. This suggests that, prior to subsequent migrations of food-producing peoples over most of sub-Saharan Africa, populations related to the Khoisan were indeed spread continuously over a huge geographic range including Tanzania and southern Africa.

We’re excited about these results for a number of reasons. First of all, we’re now on our way towards understanding the history of the diverse Khoisan populations: for years these populations have been treated as genetically equivalent, but it’s clear that each population has its own complex history. Secondly, with the new statistical methods we’ve developed, we were able not only to show the varying amounts of admixture that have occurred at different times in southern African populations, but also to peel away these layers of admixture to learn about the relationships among Khoisan populations that existed thousands of years ago. Finally, we think that these results have important implications for work using genetics to understand the geographic origin of modern humans within Africa. Though both southern and eastern Africa have been proposed as potential origins, from the tree in this post we see no genetic evidence in favor of either; from our point of view this question remains open.

Joe Pickrell, Nick Patterson, Mark Stoneking, David Reich, and Brigitte Pakendorf

An age-of-allele test of neutrality for transposable element insertions not at equilibrium

Justin P. Blumenstiel, Miaomiao He, Casey M. Bergman
(Submitted on 16 Sep 2012)

How natural selection acts to limit the proliferation of transposable elements (TEs) in genomes has been of interest to evolutionary biologists for many years. To describe TE dynamics in populations, many previous studies have relied on the assumption of equilibrium between transposition and selection. However, since TE invasions are known to happen in bursts through time, this assumption may not be reasonable. Here we derive a test of neutrality for TE insertions that does not rely on the assumption of transpositional equilibrium. We consider the case of TE insertions that have been ascertained from a single haploid reference genome sequence and have had their allele frequency estimated in a population sample. By conditioning on age information provided within the sequence of a TE insertion in the form of the number of substitutions that have occurred within the fragment since insertion into a reference genome, we derive the probability distribution for the TE allele frequency in a population sample under neutrality. Taking models of population fluctuation into account, we then test the fit of predictions of our model to allele frequency data from 190 retrotransposon insertion loci in North American and African populations of Drosophila melanogaster. Using this non-equilibrium model, we are able to explain about 80% of the variance in TE insertion allele frequencies. Controlling for nonequilibrium dynamics of transposition and host demography, we demonstrate how one may detect negative selection acting against most TEs as well as evidence for a small subset of TEs being driven to high frequency by positive selection. Our work establishes a new framework for the analysis of the evolutionary forces governing large insertion mutations like TEs or gene duplications.
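The age information can be illustrated with a deliberately simple model. Assume (a simplification for illustration, not the paper's full model) that substitutions accumulate in an inserted fragment of L sites as a Poisson process with rate μ per site per generation. The number of substitutions k observed since insertion then has likelihood Poisson(k; μLt) for age t, which is maximized at t = k/(μL). The values of μ and L below are placeholders, not estimates from the paper.

```python
import math


def log_likelihood_age(t, k, sites, mu):
    """Poisson log-likelihood of age t > 0 (generations) given k substitutions in `sites` bases."""
    lam = mu * sites * t  # expected number of substitutions since insertion
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def age_mle(k, sites, mu):
    """Maximum-likelihood age: set the Poisson mean mu * sites * t equal to k."""
    return k / (mu * sites)


t_hat = age_mle(k=3, sites=500, mu=1e-8)  # placeholder numbers
print(f"estimated age ~ {t_hat:.3g} generations")
```

In the paper this kind of age information is then combined with a neutral model, including population-size changes, to obtain a distribution for the insertion's population frequency; that step is not reproduced here.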

Thoughts on: The date of interbreeding between Neandertals and modern humans.

The following are my (Graham Coop, @graham_coop) brief thoughts on Sriram Sankararaman et al.’s arXived article “The date of interbreeding between Neandertals and modern humans”. You can read the authors’ guest post here, along with comments by Sriram and others.

Overall it’s a great article, so I thought I’d spend some time talking about the interpretation of the results. Please feel free to comment; our main reason for doing these posts is to facilitate early discussion of preprints.

The authors’ analysis relies on measuring the correlation along the genome between alleles that may have been inherited from the putative admixture event [so-called admixture linkage disequilibrium]. The idea is that if there was in fact no admixture, and these alleles have just been inherited from the common ancestral population (>300kya), then these correlations should be very weak, as there has been plenty of time for recombination to break down the correlation between these markers. If there has been a single admixture event, the rate at which the correlation decays with the genetic distance between the markers is proportional to the time since admixture [i.e. slower decay for a more recent event, as there has been less time for recombination]. These ideas for testing for admixture have been around in the literature for some time [e.g. Machado et al]; it is their genome-wide application that is novel.
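For readers less familiar with the correlation measure involved: linkage disequilibrium between two biallelic loci can be computed from phased haplotypes as D = p_AB - p_A·p_B, the covariance between the allele indicators at the two loci, and r² rescales D by the allele-frequency variances. The sketch below is a generic illustration on 0/1-coded haplotypes, not the paper's ascertainment-conditioned statistic; the function name is hypothetical.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic loci.

    `haplotypes` is a list of (allele_at_locus1, allele_at_locus2) pairs coded 0/1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # covariance of the allele indicators
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


haps = [(1, 1)] * 30 + [(0, 0)] * 50 + [(1, 0)] * 10 + [(0, 1)] * 10
print(linkage_disequilibrium(haps))
```

Here p_A = p_B = 0.4 and p_AB = 0.3, so D = 0.14. In an admixture-dating analysis this quantity is averaged over many pairs of markers in bins of genetic distance, and it is the decay of that average with distance that carries the timing information.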

As you can tell from the title and abstract of the paper, the authors find pretty robust evidence that this curve is decaying slower than we’d expect if there had been no gene flow, and estimate this “admixture time” to be 37k-86k years ago. However, as the authors are careful to note in their discussion, this is not a definitive answer to whether modern humans and Neandertals interbred, nor is this number a definite time of admixture. Obviously the biological implications of the admixture result will get a lot of discussion, so I thought I’d instead spend a moment on these caveats. [This post has run long, so I’ll only get to the 1st point in this post and perhaps return to write another post on this later].

Okay so did Neandertals actually mate with humans?

The difficulty [as briefly discussed by the authors] is that we cannot know for sure from this analysis that the time estimated is the time of gene flow from Neandertals, and not some [now extinct] population that is somewhat closer to Neandertals than any modern humans.

Consider the figure below. We would like to say that the cartoon history on the left is true, where gene flow has happened directly from Neandertals into some subset of humans. The difficulty is that the same decay curve could be generated by the scenario on the right, where gene flow has occurred from some other population that shares more of its population history with Neandertals than any current day human population does.

Why is this? Well, allele frequency change that occurred on the red branch [e.g. due to genetic drift] means that the frequencies in population X and Neandertals are correlated. This means that when we ask questions about correlations along the genome between alleles shared between Neandertals and humans, we are also asking questions about correlations along the genome between population X and modern humans. So under scenario B, I think the rate of decay of the correlation calculated in the paper is a function only of the admixture time of population X with Europeans, and so there may have been no direct admixture from Neandertals into Eurasians*.

First things first: that doesn’t diminish how interesting the result is. If the interpretation of the decay as a signal of admixture is correct, then it still means that Eurasians interbred with some ancient human population that was closer to Neandertals than other modern humans are. That seems pretty awesome, regardless of whether that population is Neandertals or some yet undetermined group.

At this point you are likely saying: well, we know that Neandertals existed as a [somewhat] separate population/species, so who is this population X you keep talking about, and where are their remains? Population X could easily be a subset of what we call Neandertals, in which case you’ve been reading this all for no reason [if you only want to know whether we interbred with Neandertals]. However, my view is that over the next decade the study of ancient human population history is going to get really interesting. We have already seen this from the Denisovan papers [1,2], and from the work on ancient admixture in Africa (e.g. Hammer et al. 2011, Lachance et al. 2012). We will likely discover a bunch of cryptic, somewhat distinct ancient populations that we’ve previously [rightly] grouped into a relatively small number of labels based on their morphology and timing in the fossil record. We are not going to have names for many of these groups, but with large amounts of genomic data [ancient and modern] we are going to find all sorts of population structure. The question then becomes not one of naming these populations, but of understanding the divergence and population genetic relationships among them.

There’s a huge range of (likely more plausible) scenarios that are hybrids between A and B that I think would still give the same difficulties with interpretation. For example, there could have been ongoing low levels of gene flow from population X into the ancestral “population” of modern humans, consistent with us calling population X modern humans [see Figure below, **]. But all of these scenarios likely involve something pretty interesting happening in the past 100,000 years, with some form of contact between Eurasians and a somewhat diverged population.

As I say, the authors to their credit take the time in the discussion to point out this caveat. I thought some clarification of why this is the case would be helpful. The tools to address this problem more thoroughly are under development by some of the authors on this paper [Patterson et al 2012] and others [Lawson et al.]. So these tools along with more sequencing of ancient remains will help clarify all of this. It is an exciting time for human population genomics!

* I think I’m right in saying that the intercept of the curve with zero is the only thing that changes between Fig 1A and Fig 1B.

** Note that in the case shown in Figure 2, I think Sriram et al are mostly dating the red arrow, not any of the earlier arrows. This is because they condition their subset of alleles to represent introgression into Europeans and to be at low frequency in Africa. We would likely not be able to date the deeper admixture arrow into the ancestor of Eurasians/Africans using the authors’ approach, as [I think] it relies on having a relatively non-admixed population to use as a control.

Robust identification of local adaptation from allele frequencies

Torsten Günther, Graham Coop
(Submitted on 13 Sep 2012)

Comparing allele frequencies among populations that differ in environment has long been a tool for detecting loci involved in local adaptation. However, such analyses are complicated by an imperfect knowledge of population allele frequencies and neutral correlations of allele frequencies among populations due to shared population history and gene flow. Here we develop a set of methods to robustly test for unusual allele frequency patterns, and correlations between environmental variables and allele frequencies while accounting for these complications based on a Bayesian model previously implemented in the software Bayenv. Using this model, we calculate a set of ‘standardized allele frequencies’ that allows investigators to apply tests of their choice to multiple populations, while accounting for sampling and covariance due to population history. We illustrate this first by showing that these standardized frequencies can be used to calculate powerful tests to detect non-parametric correlations with environmental variables, which are also less prone to spurious results due to outlier populations. We then demonstrate how these standardized allele frequencies can be used to construct a test to detect SNPs that deviate strongly from neutral population structure. This test is conceptually related to FST but should be more powerful as we account for population history. We also extend the model to next-generation sequencing of population pools, which is a cost-efficient way to estimate population allele frequencies, but it implies an additional level of sampling noise. The utility of these methods is demonstrated in simulations and by re-analyzing human SNP data from the HGDP populations. An implementation of our method will be available from this http URL.
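The idea behind standardized allele frequencies can be sketched under a Gaussian approximation. Assume, as a simplification of the Bayenv model with sampling noise ignored, that a SNP's frequencies across M populations are multivariate normal with mean ε and covariance ε(1 - ε)Ω, where Ω captures shared population history. Writing Ω = CCᵀ (Cholesky factorization), the transformed vector z = C⁻¹(p - ε)/√(ε(1 - ε)) has identity covariance, so the populations become independent and exchangeable. Summing z² gives an FST-like statistic (XtX in the Bayenv literature) that is χ²-distributed with M degrees of freedom when ε and Ω are known. The function names and simulated numbers are hypothetical.

```python
import numpy as np


def standardize(freqs, omega, eps):
    """Whiten population allele frequencies using the population covariance matrix.

    freqs: array of shape (M,) or (M, n_snps); omega: (M, M) covariance matrix;
    eps: ancestral (mean) allele frequency, scalar or shape (n_snps,).
    """
    chol = np.linalg.cholesky(omega)  # omega = chol @ chol.T
    centered = (np.asarray(freqs) - eps) / np.sqrt(eps * (1.0 - eps))
    return np.linalg.solve(chol, centered)  # chol^{-1} @ centered


def xtx(z):
    """Sum of squared standardized frequencies, one value per SNP (column)."""
    return np.sum(np.asarray(z) ** 2, axis=0)


# Simulated check: draw correlated frequencies, then standardize them.
rng = np.random.default_rng(0)
omega = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]]) * 0.05
eps = 0.4
noise = rng.standard_normal((3, 20000))
draws = eps + np.sqrt(eps * (1 - eps)) * np.linalg.cholesky(omega) @ noise
z = standardize(draws, omega, eps)
print(np.round(np.cov(z), 2))            # approximately the identity matrix
print(round(float(np.mean(xtx(z))), 2))  # approximately M = 3
```

Because the standardized values are roughly independent across populations under neutrality, ordinary tests (including rank-based ones) can be applied to them without being misled by shared history; the real model additionally propagates uncertainty in the frequencies themselves.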

Our paper: A genetic variant near olfactory receptor genes influences cilantro preference

For our next guest post Nick Eriksson (@nkeriks) writes about his ArXived paper with other 23andMe folks: A genetic variant near olfactory receptor genes influences cilantro preference ArXived here

First, a little background about research at 23andMe. We have over 150,000 genotyped customers, a large proportion of whom answer surveys online. We run GWAS on pretty much every trait you can think of (at least everything that is easily reported and possibly related to genetics). Around 2010, we started to ask a couple of questions about cilantro: whether people like it, and whether they perceive a soapy taste to it.

Fast forward a couple of years, and we have tens of thousands of people answering these questions. We start to see an interesting finding: one SNP significantly associated with both cilantro dislike and perceiving a soapy taste. Best of all, it was in a cluster of olfactory receptor genes.

The sense of smell is pretty cool. Humans have hundreds of olfactory receptor genes that encode G protein-coupled receptors. We perceive smells due to the binding of specific chemicals (“odorants”) to these receptors. There are maybe 1000 total olfactory receptors in various mammalian genomes, but it’s not totally clear which are pseudogenes. There has probably been some loss of these genes in humans as our sense of smell has become less critical. These genes appear in clusters in the genome, which makes it pretty hard for GWAS to pick out a specific gene. For example, in the first 23andMe paper, we identified a variant in a different cluster of olfactory receptors that affected whether you perceive a certain smell in your urine after eating asparagus. However, we still don’t know what the true functional variant in that region is.

Luckily, one of the olfactory receptors near our cilantro SNP turns out to be very well studied. It is known to bind to about 30 different aldehydes, including some of the chemicals that give cilantro its famous odor. So at the core this is a pretty simple paper. We found one significant association; it has as good of a functional story as you’ll see in nearly any GWAS. There are a couple of complications, however. First, we studied two related traits: soapy taste detection and cilantro dislike. They’re relatively correlated (r^2 about 0.33), and they are both associated with the same SNP. It looks like the association is stronger with soapy taste detection (and this trait seemed like it would be less influenced by environment than cilantro dislike), so we used soapy taste as the main phenotype.

The second complicated story is our heritability calculation. We saw about 9% heritability (tagged by the SNPs on our array). However, the confidence interval was pretty huge (-3% to 21%). Roughly, you could think of things falling into three heritability classes: high (height, celiac, type 1 diabetes), medium (type 2 diabetes, Crohn’s) and low (lung, colorectal, and maybe breast cancer). I think that’s about as accurate as the current heritability numbers can get. Our calculation puts cilantro soapy-taste detection into the low heritability group. There is the complication that this is only additive heritability tagged by common SNPs, so this phenotype could actually be very heritable, with most of the action coming from rare variants. But in my opinion, that’s doubtful.

Coming out of mathematics, I’ve always posted my papers to preprint servers. Luckily, this fits in well with 23andMe’s mission of making research faster, more participatory, and more fun. We’ve published all our papers so far in open access journals and have posted a couple of them to Nature Precedings (before it shut down). I also write everything in LaTeX, so posting to the arXiv is a refreshing change (as compared to most biology journals, where you have to undergo a conversion from LaTeX to Word that makes everything look terrible; a particular pet peeve of mine with PLOS journals, which I otherwise love).

I’m very curious to see how posting to the arXiv will affect publicity. Our papers tend to get a fair bit of press. However, I don’t know how the press will deal with one opportunity to report on the paper now (when the results are fresh and novel, but published on a site reporters will mostly not know about) and then another opportunity when the paper gets “blessed” via peer review. Because most of our papers are relatively straightforward GWAS (and we have a lot of coauthors here who have read and written a huge number of such papers), I think getting the data out on a preprint server is particularly important. However, we really need a Genetics category in q-bio!

Feedback on the paper would be most welcome. I’d love to see a replication or a nice functional study to follow up, of course. I also think this is a good example for teaching people about genetics. A number of the issues that come up in this paper are a little tricky, but they are good examples for understanding how difficult it is to predict something based on genetics. On the technical side, I’m most curious whether there are methods that might give a nice way of analyzing these two correlated traits together. We’ve tried a few regression-based approaches for this sort of problem, but haven’t thought of anything entirely satisfactory.

Nick Eriksson

Our paper: The date of interbreeding between Neandertals and modern humans

This post is by Sriram Sankararaman, Nick Patterson, Heng Li, Svante Pääbo, and David Reich on their paper The date of interbreeding between Neandertals and modern humans arXived here

The relationship between modern humans and archaic hominins such as Neandertals has been the subject of intense debate. The sequencing of a Neandertal genome a couple of years back (Green et al, Science 2010) showed that Neandertals are more closely related to non-African genomes than to African genomes. One model consistent with this observation involves gene flow from Neandertals to modern non-Africans after the divergence of African and non-African populations. Another model that can explain these observations is one in which the population ancestral to modern humans and Neandertals was structured. For example, imagine that this ancestral population consists of three groups, A, B and C, representing the ancestors of modern Africans, non-Africans and Neandertals respectively. The extra proximity of Neandertals to non-Africans over Africans could occur if A and B, and B and C, exchanged genes with each other, followed by C diverging to form Neandertals, and A and B not completely hybridizing before their divergence to form Africans and non-Africans.

The Neandertal (Green et al, Science 2010) and the Denisova genome (Reich et al, Nature 2010) papers considered the possibility of both models — either scenario was shown to produce the skew in the observed D-statistics (a measure of the excess sharing of alleles across groups) that led to Neandertals appearing closer to non-Africans than Africans. Indeed, a recent paper by Eriksson and Manica (Eriksson and Manica, PNAS 2012) used an Approximate Bayesian Computation framework with D-statistics as the summary statistics and arrived at similar conclusions.
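For concreteness, the simplest form of the D-statistic compares two site patterns where an outgroup (e.g. chimpanzee) carries the ancestral allele and Neandertal carries the derived one: ‘ABBA’ sites, where a non-African genome shares the derived allele with Neandertal and an African genome does not, and ‘BABA’ sites, where the African genome shares it instead. D = (n_ABBA - n_BABA)/(n_ABBA + n_BABA) is zero in expectation without gene flow or ancestral structure, and positive when Neandertal is closer to the non-African. The sketch below uses single haploid genomes coded 0 (ancestral) and 1 (derived); published analyses use allele frequencies and block-jackknife standard errors, which are omitted here.

```python
def d_statistic(sites):
    """Patterson's D from (P1, P2, P3, outgroup) alleles coded 0=ancestral, 1=derived.

    P1: e.g. an African genome, P2: a non-African genome, P3: Neandertal.
    Only sites where the outgroup is ancestral and P3 is derived are informative.
    """
    abba = sum(1 for p1, p2, p3, out in sites if out == 0 and p3 == 1 and p1 == 0 and p2 == 1)
    baba = sum(1 for p1, p2, p3, out in sites if out == 0 and p3 == 1 and p1 == 1 and p2 == 0)
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


toy_sites = [(0, 1, 1, 0)] * 52 + [(1, 0, 1, 0)] * 48 + [(1, 1, 1, 0)] * 200
print(d_statistic(toy_sites))  # (52 - 48) / 100 = 0.04
```

Sites where both test genomes carry the derived allele (the last group in the toy data) say nothing about which one is closer to Neandertal, so they drop out of both counts.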

A paper from Monty Slatkin’s group (Yang et al, MBE 2012) attempted to differentiate the two scenarios by using the site frequency spectrum. Yang et al considered the site frequency spectrum in Europeans conditioned on observing a derived allele in Neandertal and an ancestral allele in Africans (termed the doubly-conditioned frequency spectrum, dcfs). They used theory and simulations to show that an ancient structure model produces a linear dcfs. On the other hand, they showed that recent gene flow can produce an excess of rare variants which matches the observed dcfs. Interestingly, they also observed that bottlenecks post gene flow had the effect of making the dcfs linear suggesting that gene flow from Neandertals could not have preceded strong bottlenecks in the non-African populations.
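The doubly-conditioned frequency spectrum can be tabulated directly from its definition. This sketch uses a hypothetical data layout: one record per site holding whether Neandertal carries the derived allele, the derived-allele count in the African sample, and the derived-allele count among n European chromosomes. It keeps sites where Neandertal is derived and no African chromosome is derived, then tabulates European derived counts 1 to n-1. Excluding counts 0 and n is a choice made for this sketch; Yang et al.'s exact conventions may differ.

```python
from collections import Counter


def dcfs(sites, n_europeans):
    """Doubly-conditioned frequency spectrum.

    sites: iterable of (neandertal_derived: bool, african_derived_count: int,
    european_derived_count: int). Returns proportions for European counts 1..n-1.
    """
    counts = Counter(
        eur for nea, afr, eur in sites
        if nea and afr == 0 and 0 < eur < n_europeans
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * (n_europeans - 1)
    return [counts[i] / total for i in range(1, n_europeans)]


example = ([(True, 0, 1)] * 6 + [(True, 0, 2)] * 3 + [(True, 0, 3)] * 1
           + [(False, 0, 1)] * 9 + [(True, 2, 1)] * 4)
print(dcfs(example, n_europeans=4))  # [0.6, 0.3, 0.1]
```

Following the results described above, a roughly linear decline across these bins is what ancient structure produces, whereas an excess in the lowest bins is what recent gene flow can produce.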

A different idea that we explored was to ask if patterns of linkage disequilibrium (LD) might discriminate the two scenarios. If we could pick out haplotypes that came into modern humans from Neandertals, recombination is expected to break these haplotypes down at a fixed rate every generation (assuming neutrality). Haplotypes that came in 1000 generations ago (under recent gene flow) should be expected to be 10 times longer on average than haplotypes that came in 10000 generations ago (under ancient structure). And if we could measure LD precisely enough, we could even date these ancient events. To date such ancient events, we had to address two technical challenges: i) measures of LD can be sensitive to demographic events, and ii) for events that occurred 1000s of generations ago, we need to measure LD at size scales at which genetic maps can be quite noisy, and this noise can bias estimates of dates.
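The factor of 10 follows from a simple approximation. Recombination breaks an introgressed segment at a rate of about one crossover per Morgan per generation, so a segment that entered t generations ago has a mean length of roughly 1/t Morgans (ignoring the small admixture fraction). Converting with a rough genome-wide average of about 1 cM per Mb, an assumption for illustration since local rates vary widely:

```latex
\bar{\ell}(t) \approx \frac{1}{t}\,\text{M} = \frac{100}{t}\,\text{cM}, \qquad
\bar{\ell}(1000) \approx 0.1\,\text{cM}\ (\approx 100\,\text{kb}), \qquad
\bar{\ell}(10000) \approx 0.01\,\text{cM}\ (\approx 10\,\text{kb}).
```

Segments only tens of kilobases long are where genetic-map noise starts to matter, which is one reason the map-error correction is needed.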

Theory indicates that the expected LD (measured by Lewontin’s D) across SNPs that arose on the Neandertal lineage and introgressed decays exponentially with genetic distance, at a rate given by the time of gene flow, and is robust to demographic events. This result does not hold exactly in practice due to imperfect ascertainment of these SNPs, so we did simulations to show that this decay of LD still provides accurate estimates and can differentiate gene flow from ancient structure. We also came up with a model to assess errors in genetic maps, which we then used to obtain a corrected date.
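A minimal sketch of how a date can be read off such a decay curve: assume, hypothetically, that average LD in bins of genetic distance d (in Morgans) follows A·exp(-t·d) + c, where t is the number of generations since gene flow and c absorbs background LD. For each candidate t, the best A and c come from ordinary least squares, so t can be found by a one-dimensional grid search. The data here are simulated, and the paper's corrections for ascertainment and genetic-map error are not included.

```python
import numpy as np


def fit_decay(d, ld, t_grid):
    """Fit ld ~ A * exp(-t * d) + c; return (t, A, c) minimizing squared error over t_grid."""
    best = None
    for t in t_grid:
        design = np.column_stack([np.exp(-t * d), np.ones_like(d)])
        coef, *_ = np.linalg.lstsq(design, ld, rcond=None)  # best A, c for this t
        sse = float(np.sum((design @ coef - ld) ** 2))
        if best is None or sse < best[0]:
            best = (sse, t, coef[0], coef[1])
    return best[1], best[2], best[3]


rng = np.random.default_rng(42)
d = np.linspace(0.0005, 0.01, 40)  # 0.05 cM to 1 cM, in Morgans
true_t = 2000                      # hypothetical generations since gene flow
ld = 1e-3 * np.exp(-true_t * d) + 1e-5 + rng.normal(0, 2e-6, d.size)
t_hat, a_hat, c_hat = fit_decay(d, ld, t_grid=np.arange(100, 5001, 10))
print(t_hat)
```

Because d is measured in Morgans, the fitted decay rate is directly in generations; converting it to years requires an assumed generation time, and any systematic error in the genetic map rescales d and therefore biases t.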

Our results support the recent gene flow scenario, with a likely date of gene flow into the ancestors of modern Europeans of 37,000-86,000 years BP, although this does not exclude the possibility of ancient structure. A broader methodological question we are exploring is whether LD-based analyses might be generally applicable as a tool for dating other ancient gene flow events.

Sriram Sankararaman, Nick Patterson, Heng Li, Svante Pääbo, and David Reich

Genome-wide analysis points to roles for extracellular matrix remodeling, the visual cycle, and neuronal development in myopia

Amy K. Kiefer, Joyce Y. Tung, Chuong B. Do, David A. Hinds, Joanna L. Mountain, Uta Francke, Nicholas Eriksson
(Submitted on 10 Sep 2012)

Myopia, or nearsightedness, is the most common eye disorder, resulting primarily from excess elongation of the eye. The etiology of myopia, although known to be complex, is poorly understood. Here we report the largest ever genome-wide association study (43,360 participants) on myopia in Europeans. We performed a survival analysis on age of myopia onset and identified 19 significant associations (p < 5e-8), two of which are replications of earlier associations with refractive error. These 19 associations in total explain 2.7% of the variance in myopia age of onset, and point towards a number of different mechanisms behind the development of myopia. One association is in the gene PRSS56, which has previously been linked to abnormally small eyes; one is in a gene that forms part of the extracellular matrix (LAMA2); two are in or near genes involved in the regeneration of 11-cis-retinal (RGR and RDH5); two are near genes known to be involved in the growth and guidance of retinal ganglion cells (ZIC2, SFRP1); and five are in or near genes involved in neuronal signaling or development. These novel findings point towards multiple genetic factors involved in the development of myopia and suggest that complex interactions between extracellular matrix remodeling, neuronal development, and visual signals from the retina may underlie the development of myopia in humans.