For our next guest post, Nick Eriksson (@nkeriks) writes about his arXived paper with other 23andMe folks: “A genetic variant near olfactory receptor genes influences cilantro preference” (arXived here).
First, a little background about research at 23andMe. We have over 150,000 genotyped customers, a large proportion of whom answer surveys online. We run GWAS on pretty much every trait you can think of (at least everything that is easily reported and possibly related to genetics). Around 2010, we started to ask a couple of questions about cilantro: whether people like it, and whether they perceive a soapy taste to it.
Fast forward a couple of years, and we have tens of thousands of people answering these questions. We start to see an interesting finding: one SNP significantly associated with both cilantro dislike and perceiving a soapy taste. Best of all, it was in a cluster of olfactory receptor genes.
The sense of smell is pretty cool. Humans have hundreds of olfactory receptor genes that encode G protein-coupled receptors. We perceive smells due to the binding of specific chemicals (“odorants”) to these receptors. There are maybe 1000 total olfactory receptors in various mammalian genomes, but it’s not totally clear which are pseudogenes. There has probably been some loss of these genes in humans as our sense of smell has become less critical. These genes appear in clusters in the genome, which makes it pretty hard for GWAS to pick out a specific gene. For example, in the first 23andMe paper, we identified a variant in a different cluster of olfactory receptors that affected whether you perceive a certain smell in your urine after eating asparagus. However, we still don’t know what the true functional variant in that region is.
Luckily, one of the olfactory receptors near our cilantro SNP turns out to be very well studied. It is known to bind to about 30 different aldehydes, including some of the chemicals that give cilantro its famous odor. So at the core this is a pretty simple paper. We found one significant association, and it has as good a functional story as you’ll see in nearly any GWAS. There are a couple of complications, however. First, we studied two related traits: soapy taste detection and cilantro dislike. They’re moderately correlated (r^2 about 0.33), and they are both associated with the same SNP. It looks like the association is stronger with soapy taste detection (and this trait seemed like it would be less influenced by environment than cilantro dislike), so we used soapy taste as the main phenotype.
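To make the r^2 figure concrete, here is a minimal sketch of how a squared correlation between two yes/no traits can be computed from a 2x2 table of counts (the squared phi coefficient). It assumes both traits are coded as binary, which may not match exactly how the number above was calculated, and the counts are made up purely for illustration; they are not 23andMe data. The function name `phi_squared` is hypothetical.

```python
def phi_squared(both, first_only, second_only, neither):
    """Squared phi coefficient (r^2) for two binary traits.

    Arguments are the four cell counts of the 2x2 table:
    both        -- people reporting both traits
    first_only  -- people reporting only the first trait
    second_only -- people reporting only the second trait
    neither     -- people reporting neither trait
    """
    numerator = (both * neither - first_only * second_only) ** 2
    denominator = (
        (both + first_only)
        * (second_only + neither)
        * (both + second_only)
        * (first_only + neither)
    )
    return numerator / denominator


# Made-up counts for illustration only (NOT real survey data).
toy_r2 = phi_squared(both=10, first_only=10, second_only=10, neither=70)
print(f"toy r^2 = {toy_r2:.3f}")
```

An r^2 of about a third means the two traits overlap substantially but are far from interchangeable, which is why the choice of main phenotype matters.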
The second complication is our heritability calculation. We saw about 9% heritability (tagged by the SNPs on our array). However, the confidence interval was pretty huge (-3% to 21%). Roughly, you could think of things falling into three heritability classes: high (height, celiac, type 1 diabetes), medium (type 2 diabetes, Crohn’s) and low (lung, colorectal, and maybe breast cancer). I think that’s about as accurate as the current heritability numbers can get. Our calculation puts cilantro soapy-taste detection into the low heritability group. There is the complication that this is only additive heritability tagged by common SNPs, so this phenotype could actually be very heritable, with most of the action coming from rare variants. But in my opinion, that’s doubtful.
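As a rough illustration of how wide that interval is, the sketch below backs out the implied standard error from the reported numbers. It assumes the interval is a symmetric 95% normal-approximation interval, which is not stated above, so treat the result as a back-of-the-envelope figure. The helper name `se_from_ci` is hypothetical.

```python
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error implied by a symmetric normal confidence interval.

    ASSUMPTION: the interval is estimate +/- z * SE at the given level.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return (upper - lower) / (2 * z)


estimate = 0.09              # reported heritability
lower, upper = -0.03, 0.21   # reported confidence interval

se = se_from_ci(lower, upper)
z_score = estimate / se      # how many SEs the estimate sits above zero

print(f"implied SE = {se:.3f}")
print(f"estimate / SE = {z_score:.2f}")
```

Under that assumption the standard error is roughly 6 percentage points, so the 9% estimate sits only about one and a half standard errors above zero, consistent with an interval that includes zero.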
Coming out of mathematics, I’ve always posted my papers to preprint servers. Luckily, this fits in well with 23andMe’s mission of making research faster, more participatory, and more fun. We’ve published all our papers so far in open access journals and have posted a couple of them to Nature Precedings (before it shut down). I also write everything in LaTeX, so posting to the arXiv is a refreshing change compared to most biology journals, where you have to undergo a conversion from LaTeX to Word that makes everything look terrible (a particular pet peeve of mine with PLOS journals, which I otherwise love).
I’m very curious to see how posting to the arXiv will affect publicity. Our papers tend to get a fair bit of press. However, I don’t know how the press will deal with one opportunity to report on the paper now (when the results are fresh and novel, but published on a site reporters will mostly not know about) and then another opportunity when the paper gets “blessed” via peer review. Because most of our papers are relatively straightforward GWAS (and we have a lot of coauthors here who have read and written a huge number of such papers), I think getting the data out on a preprint server is particularly important. However, we really need a Genetics category in q-bio!
Feedback on the paper would be most welcome. I’d love to see a replication or a nice functional study to follow up, of course. I also think this is a good example for teaching people about genetics. A number of the issues that come up in this paper are a little tricky, but are good examples for understanding how difficult it is to predict something based on genetics. On the technical side, I’m most curious whether there are methods that might give a nice way of analyzing these two correlated traits together. We’ve tried a few regression-based approaches for this sort of problem, but haven’t thought of anything entirely satisfactory.