Bayesian test for co-localisation between pairs of genetic association studies using summary statistics

Claudia Giambartolomei (1), Damjan Vukcevic (2), Eric E. Schadt (3), Aroon D. Hingorani (1), Chris Wallace (4), Vincent Plagnol (1) ((1) University College London (UCL), London, UK, (2) Royal Children’s Hospital, Melbourne, Australia, (3) Mount Sinai School of Medicine, New York, USA, (4) University of Cambridge, Cambridge, UK)
(Submitted on 17 May 2013)

Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, notably cardiovascular diseases and lipid biomarkers. The next challenge is to understand the molecular basis of these associations. Integrating multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. One application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. A key feature of the method is that its main output statistics can be derived from single-SNP summary statistics, making it possible to perform systematic meta-analysis-type comparisons across multiple GWAS datasets (implemented online at this http URL). We demonstrate the value of the approach by re-analysing a gene expression dataset of 966 liver samples together with a published meta-analysis of lipid traits including more than 100,000 individuals of European ancestry. Our co-localisation results are broadly consistent with the conclusions of the published meta-analysis: combining all lipid biomarkers, our re-analysis supported 29 of the 38 reported co-localisation results with eQTLs. Two clearly discordant findings (IFT172, CPNE1), as well as multiple new co-localisation results, highlight the value of a formal, systematic statistical test. Our findings provide information about the causal gene in associated intervals and have direct implications for understanding complex diseases and for designing drugs that target disease pathways.
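The abstract does not spell out the test itself. As a hedged illustration of how co-localisation evidence can be computed from single-SNP summary statistics, the sketch below assumes Wakefield's approximate Bayes factors (per-SNP effect estimate and standard error, with an assumed prior standard deviation of 0.15 for the true effect) and five hypotheses for one region: H0 no association, H1 trait 1 only, H2 trait 2 only, H3 both traits with distinct causal variants, H4 both traits with a shared causal variant. It also assumes at most one causal variant per trait in the region. The function names and the prior probabilities `p1`, `p2`, `p12` are illustrative assumptions, not values taken from the abstract.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield's log approximate Bayes factor (association vs. none) for one SNP.

    beta, se: estimated effect and its standard error (summary statistics).
    prior_sd: ASSUMED prior standard deviation of the true effect.
    """
    v = se ** 2              # sampling variance of the estimate
    w = prior_sd ** 2        # prior variance of the true effect
    r = w / (v + w)          # shrinkage ratio
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))); returns -inf for an empty input."""
    xs = list(xs)
    if not xs:
        return float("-inf")
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for one region (hypothetical helper).

    labf1, labf2: per-SNP log ABFs for traits 1 and 2, same SNPs in the same order.
    p1, p2, p12: ASSUMED prior probabilities that a given SNP is causal for
    trait 1 only, trait 2 only, or both traits.
    """
    n = len(labf1)
    assert n == len(labf2), "both traits must cover the same SNPs"
    log_h = [
        0.0,                                      # H0: no causal SNP
        math.log(p1) + logsumexp(labf1),          # H1: trait 1 only
        math.log(p2) + logsumexp(labf2),          # H2: trait 2 only
        # H3: distinct causal SNPs, summed over all pairs i != j (O(n^2) for clarity)
        math.log(p1) + math.log(p2) + logsumexp(
            labf1[i] + labf2[j] for i in range(n) for j in range(n) if i != j
        ),
        # H4: one shared causal SNP
        math.log(p12) + logsumexp(a + b for a, b in zip(labf1, labf2)),
    ]
    total = logsumexp(log_h)
    return [math.exp(lh - total) for lh in log_h]


# Usage with made-up summary statistics for three SNPs.
shared = [log_abf(0.5, 0.05), log_abf(0.0, 0.05), log_abf(0.0, 0.05)]
distinct = [log_abf(0.0, 0.05), log_abf(0.5, 0.05), log_abf(0.0, 0.05)]
print(coloc_posteriors(shared, shared))    # posterior mass concentrated on H4
print(coloc_posteriors(shared, distinct))  # posterior mass concentrated on H3
```

The pairwise sum for H3 is written out explicitly for readability; it can also be obtained in closed form as the product of the two per-trait sums minus the shared-SNP sum, at the cost of some numerical cancellation.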

From Many, One: Genetic Control of Prolificacy during Maize Domestication

David M. Wills, Clinton Whipple, Shohei Takuno, Lisa E. Kursel, Laura M. Shannon, Jeffrey Ross-Ibarra, John F. Doebley
(Submitted on 4 Mar 2013)

A reduction in number and an increase in size of inflorescences is a common aspect of plant domestication. When maize was domesticated from teosinte, the number and arrangement of ears changed dramatically. Teosinte has long lateral branches that bear multiple small ears at their nodes and tassels at their tips. Maize has much shorter lateral branches that are tipped by a single large ear with no additional ears at the branch nodes. To investigate the genetic basis of this difference in prolificacy (the number of ears on a plant), we performed a genome-wide QTL scan. A large-effect QTL for prolificacy (prol1.1) was detected on the short arm of chromosome one in a location that has previously been shown to influence multiple domestication traits. We fine-mapped prol1.1 to a 2.7 kb interval or causative region upstream of the grassy tillers1 (gt1) gene, which encodes a homeodomain leucine zipper transcription factor. Tissue in situ hybridizations reveal that the maize allele of prol1.1 is associated with up-regulation of gt1 expression in the nodal plexus. Given that maize does not initiate secondary ear buds, the expression of gt1 in the nodal plexus in maize may suppress their initiation. Population genetic analyses indicate positive selection on the maize allele of prol1.1, causing a partial sweep that fixed the maize allele throughout most of domesticated maize. This work shows how a subtle cis-regulatory change in tissue-specific gene expression altered plant architecture in a way that improved the harvestability of maize.

Population Genetics of Rare Variants and Complex Diseases

M. Cyrus Maher, Lawrence H. Uricchio, Dara G. Torgerson, Ryan D. Hernandez
(Submitted on 12 Feb 2013)

Identifying drivers of complex traits from the noisy signals of genetic variation obtained from high-throughput genome sequencing technologies is a central challenge faced by human geneticists today. We hypothesize that the variants involved in complex diseases are likely to exhibit non-neutral evolutionary signatures. Uncovering the evolutionary history of all variants is therefore of intrinsic interest for complex disease research. However, doing so requires simultaneously elucidating the targets of natural selection and population-specific demographic history. Here we characterize the action of natural selection across complex disease categories, and use population genetic simulations to evaluate the expected patterns of genetic variation in large samples. We focus on populations that have experienced historical bottlenecks followed by explosive growth (consistent with most human populations), and describe the differences between evolutionarily deleterious mutations and those that are neutral. Genes associated with several complex disease categories exhibit stronger signatures of purifying selection than non-disease genes. In addition, loci identified through genome-wide association studies of complex traits also exhibit signatures consistent with being in regions recurrently targeted by purifying selection. Through simulations, we show that population bottlenecks and rapid growth enable deleterious rare variants to persist at low frequencies just as long as neutral variants, but deleterious low-frequency and common variants tend to be much younger than neutral ones. As a result, a large proportion of modern-day rare alleles have a deleterious effect on function and potentially contribute to disease susceptibility.

Our paper: An experimental test for genetic constraints in Drosophila melanogaster

Our next guest post is by Ian Dworkin (@IanDworkin) on his paper (along with coauthors) An experimental test for genetic constraints in Drosophila melanogaster.

We have recently posted a (heavily revised) manuscript to arXiv detailing how we used the fruit fly Drosophila melanogaster (you can read here about why these little flies are so wonderful) to test a particular hypothesis about a genetic constraint, and more generally how our knowledge of development may inform us about the structure of the genetic variance-covariance matrix, G. We also developed a really cool set of statistical models that evaluated our explicit hypotheses (more on that right at the end of the post)!

As a quick reminder (or introduction), G summarizes both how much genetic variation particular traits have and how much traits co-vary genetically. This covariation can be due to “pleiotropy”, which is a fancy word for when a gene (or a mutation in that gene) influences more than one trait (e.g. a mutation might influence both your eye and hair colour). Traits can also covary when two or more alleles (each influencing different traits) are physically close to each other (linked) and recombination has not had enough time to break these combinations apart. I highly recommend Jeff Conner’s recent review in Evolution for a nice overview of these (and other concepts related to some issues I discuss below).

Evolutionary biology, in particular evolutionary quantitative genetics, thinks a lot about the G-matrix and how it interacts with natural selection (or drift) to generate evolutionary change. This is summarized by the now famous equation linking the change in trait means (Δz̄) to both genetic variation (and covariation) and the strength of natural selection (usually measured as a so-called selection gradient, β). This is the multivariate (more than one trait) version of the breeder's equation (made most famous by the seminal work of R. Lande).

Δz̄=Gβ


Why do we care so much about this little equation? It encapsulates many pretty heady ideas. First and foremost, you cannot have evolutionary change without genetic variation. That’s right, natural selection by itself is not enough. You can have very strong selection on a trait (such as running speed) to survive better with a predator around, but if there is no heritable variation for running speed, no (evolutionary) change will happen in the succeeding generations (and good luck with that tiger coming your way). However, once you have to consider multiple traits (running speed, endurance and hearing), we have to think about whether there is available genetic variation for combinations of traits, and whether it is “oriented” in a similar direction to natural selection. If not, evolutionary change may be slowed considerably (even if each trait seems to have lots of heritable variation). Of course, if the genetic variation for all of these traits is pointing in the same direction as selection, then evolution may proceed very quickly indeed! The ideas get more interesting and complex from there, but they are not the focus of this discussion (the paper above by Jeff Conner, and this great review by Katrina McGuigan, are definitely worth reading for more on this).
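To make the point about orientation concrete, here is a minimal sketch of Δz̄ = Gβ for two traits. The G matrix (genetic variances of 1 and a genetic covariance of 0.8) and the two selection gradients are made-up numbers chosen purely for illustration; they do not come from the paper.

```python
def response_to_selection(G, beta):
    """Multivariate breeder's equation: change in trait means = G times beta."""
    return [sum(g * b for g, b in zip(row, beta)) for row in G]


# Hypothetical G: both traits heritable and strongly positively genetically correlated.
G = [[1.0, 0.8],
     [0.8, 1.0]]

# Selection favouring both traits together (aligned with the genetic covariance).
aligned = response_to_selection(G, [1.0, 1.0])    # approximately [1.8, 1.8]

# Equally strong selection favouring one trait and disfavouring the other.
opposed = response_to_selection(G, [1.0, -1.0])   # approximately [0.2, -0.2]

print(aligned, opposed)
```

Both selection gradients have the same magnitude and each trait has the same genetic variance in both scenarios, yet the response when selection opposes the genetic correlation is nine times smaller.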

In any case, much thought has been given to how this G matrix can change both by natural selection and by other factors such as new mutation. Depending on how G changes, future evolutionary potential might change, which is pretty cool if you think about it! How might G change then? These are important ideas, because while we can estimate what G looks like, and how it might change (in particular due to natural selection), it is much harder to know what it will look like far in the future, making our ability to predict long term evolutionary change more difficult.
So what might help us predict G? One idea is that our knowledge of developmental biology will help us understand the effects of mutations, and thus G. If so, developmental biology could be a particularly powerful way of predicting the potential for evolutionary change, or the lack thereof (a so-called developmental constraint).

To test this idea, I decided to use a homeotic mutation. Homeosis is the term used when one structure (like an arm) is transformed during development into another, related structure (like a leg). In fruit flies, homeotic mutations are the stuff of legend (and Nobel Prizes), in particular the wonderful cases of poor critters growing legs (instead of antennae) out of their heads, or four-winged flies. You can see wonderful examples of mutations causing such homeotic changes in flies and other critters here.

In our case we used a much weaker and subtler homeotic mutation, Ubx1, which causes slight, largely quantitative changes. For example, with this mutation the third set of legs on the fly would be expected to resemble the second set of legs in the lengths of their different parts (flies, like all insects, have three sets of legs as adults). We wanted to know whether, when we changed third legs to look like second legs, G for the transformed third leg would look like that of a normal third leg or that of a normal second leg. Thus we were trying to predict changes in G based on what we know (a priori) of development and genetics in the fruit fly.

So what did we find? The most important points are summarized in figure 2 and table 3 (if you want to check out the paper, that is). The TL;DR version is this: yes, the legs homeotically transformed as we expected, but G of the mutant legs did not really change very much from that of a normal third leg. In other words, our knowledge of development did not really help us much in understanding changes in G. There are a few reasons why (which we explain in the paper), but I think it is an interesting punchline, and I will leave it up to you to decide what it means (and whether our experiment, analysis and interpretation are reasonable and logically consistent).

I also really want to give a shout out to one of the co-authors (JH) who developed the particular statistical model that we ended up using. He developed a set of explicit models that really helped us test our specific hypotheses directly with the data and experimental design at hand. This is sadly rarely done with statistics, so it is worth reading just for that! I really think (hope?) that this combination of approaches can be very useful for evolutionary genetics. Let me know what you think!

Natural selection. VI. Partitioning the information in fitness and characters by path analysis

Steven A. Frank
(Submitted on 22 Jan 2013)

Three steps aid in the analysis of selection. First, describe phenotypes by their component causes. Components include genes, maternal effects, symbionts, and any other predictors of phenotype that are of interest. Second, describe fitness by its component causes, such as an individual’s phenotype, its neighbors’ phenotypes, resource availability, and so on. Third, put the predictors of phenotype and fitness into an exact equation for evolutionary change, providing a complete expression of selection and other evolutionary processes. The complete expression separates the distinct causal roles of the various hypothesized components of phenotypes and fitness. Traditionally, those components are given by the covariance, variance, and regression terms of evolutionary models. I show how to interpret those statistical expressions with respect to information theory. The resulting interpretation allows one to read the fundamental equations of selection and evolution as sentences that express how various causes lead to the accumulation of information by selection and the decay of information by other evolutionary processes. The interpretation in terms of information leads to a deeper understanding of selection and heritability, and a clearer sense of how to formulate causal hypotheses about evolutionary process. Kin selection appears as a particular type of causal analysis that partitions social effects into meaningful components.
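The covariance and regression terms mentioned above can be illustrated with the selection term of the Price equation (the Robertson–Price identity): Δz̄_sel = Cov(w, z)/w̄ = β_wz Var(z)/w̄, where w is fitness, z is a phenotype and β_wz is the regression of fitness on phenotype. This is a minimal sketch on made-up data; it shows only these standard statistical terms, not the information-theoretic or path-analytic partition developed in the paper.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divide by n), as used in the Price equation."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def selection_term(w, z):
    """Selection component of the change in mean phenotype: Cov(w, z) / mean(w)."""
    return cov(w, z) / mean(w)


# Hypothetical data: phenotype z and fitness w (offspring number) for four individuals.
z = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 2.0, 2.0]

s = selection_term(w, z)
beta_wz = cov(w, z) / cov(z, z)          # regression of fitness on phenotype
same_s = beta_wz * cov(z, z) / mean(w)   # regression form of the same term
# Fitness-weighted mean of z minus the unweighted mean gives the same quantity.
direct = sum(wi * zi for wi, zi in zip(w, z)) / sum(w) - mean(z)
print(s, same_s, direct)                 # all approximately 1/3
```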

Does your gene need a background check? How genetic background impacts the analysis of mutations, genes, and evolution

Chris H. Chandler, Sudarshan Chari, Ian Dworkin
(Submitted on 12 Jan 2013)

The premise of genetic analysis is that a causal link exists between phenotypic and allelic variation. Yet it has long been documented that mutant phenotypes are not a simple result of a single DNA lesion, but rather are due to interactions of the focal allele with other genes and the environment. Although an experimentally rigorous approach, focusing on individual mutations and isogenic control strains, has facilitated amazing progress within genetics and related fields, a glimpse back suggests that a vast complexity has been omitted from our current understanding of allelic effects. Armed with traditional genetic analyses and the foundational knowledge they have provided, we argue that the time and tools are ripe to return to the under-explored aspects of gene function and embrace the context-dependent nature of genetic effects. We assert that a broad understanding of genetic effects and the evolutionary dynamics of alleles requires identifying how mutational outcomes depend upon the wild-type genetic background. Furthermore, we discuss how best to exploit genetic background effects to broaden genetic research programs.

Improving the Efficiency of Genomic Selection

Marco Scutari, Ian Mackay, David J. Balding
(Submitted on 10 Jan 2013)

We investigate two approaches to increase the efficiency of phenotypic prediction from genome-wide markers, which is a key step for genomic selection (GS) in plant and animal breeding. The first approach is feature selection based on Markov blankets, which provide a theoretically-sound framework for identifying non-informative markers. Fitting GS models using only the informative markers results in simpler models, which may allow cost savings from reduced genotyping. We show that this is accompanied by no loss, and possibly a small gain, in predictive power for four GS models: partial least squares (PLS), ridge regression, LASSO and elastic net. The second approach is the choice of kinship coefficients for genomic best linear unbiased prediction (GBLUP). We compare kinships based on different combinations of centring and scaling of marker genotypes, and a newly proposed kinship measure that adjusts for linkage disequilibrium (LD).
We illustrate the use of both approaches and examine their performances using three real-world data sets from plant and animal genetics. We find that elastic net with feature selection and GBLUP using LD-adjusted kinships performed similarly well, and were the best-performing methods in our study.
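As a hedged illustration of what "centring and scaling of marker genotypes" means for a kinship matrix (not the authors' code, and not their LD-adjusted measure), the sketch below builds K = ZZᵀ/m from 0/1/2 genotype counts, where Z is the genotype matrix after optional centring by 2p and optional scaling by √(2p(1−p)), p being each marker's sample allele frequency. The function name is hypothetical, and skipping monomorphic markers is an assumption made here to avoid division by zero.

```python
import math


def kinship(genotypes, centre=True, scale=True):
    """Marker-based kinship matrix K = Z Z^T / m (hypothetical helper).

    genotypes: one list per individual of allele counts (0, 1 or 2) per marker.
    centre: subtract 2p from each marker, p being its sample allele frequency.
    scale: divide each marker by sqrt(2p(1 - p)).
    Monomorphic markers (p == 0 or p == 1) are skipped (an assumption).
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [ind[j] for ind in genotypes]
        p = sum(col) / (2 * n)
        if p in (0.0, 1.0):
            continue
        shift = 2 * p if centre else 0.0
        sd = math.sqrt(2 * p * (1 - p)) if scale else 1.0
        columns.append([(x - shift) / sd for x in col])
    m = len(columns)
    if m == 0:
        raise ValueError("no polymorphic markers")
    return [[sum(c[a] * c[b] for c in columns) / m for b in range(n)]
            for a in range(n)]


# Hypothetical genotypes: 4 individuals x 5 markers.
genos = [[0, 1, 2, 1, 0],
         [1, 1, 2, 0, 0],
         [2, 0, 1, 1, 1],
         [1, 2, 0, 2, 1]]
for row in kinship(genos):
    print([round(k, 3) for k in row])
```

Setting `centre=False, scale=False` gives the raw cross-product of allele counts; the different combinations change how rare and common markers are weighted in the resulting GBLUP kinship.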

easyGWAS: An integrated interspecies platform for performing genome-wide association studies

Dominik Grimm, Bastian Greshake, Stefan Kleeberger, Christoph Lippert, Oliver Stegle, Bernhard Schölkopf, Detlef Weigel, Karsten Borgwardt
(Submitted on 19 Dec 2012)

Motivation: The rapid growth in genome-wide association studies (GWAS) in plants and animals has brought about the need for a central resource that facilitates i) performing GWAS, ii) accessing data and results of other GWAS, and iii) enabling all users regardless of their background to exploit the latest statistical techniques without having to manage complex software and computing resources.
Results: We present easyGWAS, a web platform that provides methods, tools and dynamic visualizations to perform and analyze GWAS. In addition, easyGWAS makes it simple to reproduce results of others, validate findings, and access larger sample sizes through merging of public datasets.
Availability: Detailed method and data descriptions as well as tutorials are available in the supplementary materials. easyGWAS is available at this http URL
Contact: dominik.grimm@tuebingen.mpg.de

GWAPP: A Web Application for Genome-wide Association Mapping in A. thaliana

Ümit Seren (1), Bjarni J. Vilhjálmsson (1 and 2), Matthew W. Horton (1 and 3), Dazhe Meng (4), Petar Forai (1), Yu S. Huang (4), Quan Long (1), Vincent Segura (5), Magnus Nordborg (1 and 2) ((1) Gregor Mendel Institute, Austrian Academy of Sciences, (2) Molecular and Computational Biology, University of Southern California, (3) Department of Ecology and Evolution, University of Chicago, (4) Center for Neurobehavioral Genetics, Semel Institute, University of California Los Angeles, (5) INRA, France)
(Submitted on 4 Dec 2012)

Arabidopsis thaliana is an important model organism for understanding the genetics and molecular biology of plants. Its highly selfing nature, together with other important features such as small size, short generation time, small genome size and wide geographic distribution, makes it an ideal model organism for understanding natural variation. Genome-wide association studies (GWAS) have proven a useful technique for identifying genetic loci responsible for natural variation in A. thaliana. Previously genotyped accessions (natural inbred lines) can be grown in replicate under different conditions and phenotyped for different traits. These features greatly simplify association mapping of traits and allow for systematic dissection of the genetics of natural variation by the entire Arabidopsis community. To facilitate this, we present GWAPP, an interactive web-based application for conducting GWAS in A. thaliana. Traits measured for a subset of 1386 publicly available ecotypes can be uploaded and mapped in just a couple of minutes, using an efficient Python implementation of a linear mixed model or other methods. GWAPP features an extensive, interactive and user-friendly interface that includes interactive Manhattan plots and interactive local and genome-wide LD plots. It facilitates exploratory data analysis by implementing features such as the inclusion of candidate SNPs in the model as cofactors.

Evolution of male life histories and age-dependent sexual signals under female choice

Joel James Adamson
(Submitted on 16 Nov 2012)

Strategic models have predicted that males could benefit from age-dependent sexual advertisement following the evolution of increased lifespan. Dynamical considerations may play a crucial role in the origin of age-dependent sexual signals, despite their strategic advantages in populations with established signals and preferences. I investigated the problem that rare trait-bearing males may suffer low viability due to small young-age signals, restricting the favorable conditions for age-dependent trait evolution. I also asked when age-dependence would prevail during trait evolution if males bearing age-dependent traits co-occur with males carrying age-independent traits. I used numerical simulations to analyze the evolution of an age-structured haploid population with no genetic drift. Age-dependence limits the evolution of male traits to cases of relatively weak selection against the trait, but the trait fixes at smaller sizes when age-dependent than when age-independent. When mode of expression (age-dependence versus age-independence) evolved along with the trait, age-independence prevailed over much of parameter space, although mode of expression remained polymorphic at small trait sizes under weak selection. The ubiquity of age-dependent traits in nature shows that many species’ life histories satisfy the conditions for age-dependent trait evolution. My results suggest that high adult male survival facilitates sexual selection by favoring the evolution of age-dependent sexual signals under fairly broad conditions.