Bayesian test for co-localisation between pairs of genetic association studies using summary statistics

Claudia Giambartolomei (1), Damjan Vukcevic (2), Eric E. Schadt (3), Aroon D. Hingorani (1), Chris Wallace (4), Vincent Plagnol (1) ((1) University College London (UCL), London, UK, (2) Royal Children’s Hospital, Melbourne, Australia, (3) Mount Sinai School of Medicine, New York USA, (4) University of Cambridge, Cambridge, UK)
(Submitted on 17 May 2013)

Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, notably cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. A key feature of the method is that its output statistics can be derived from single-SNP summary statistics, which makes it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at this http URL). We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including > 100,000 individuals of European ancestry. Our co-localisation results are broadly consistent with the conclusion from the published meta-analysis. Combining all lipid biomarkers, our re-analysis supported 29 out of 38 reported co-localisation results with eQTLs. Two clearly discordant findings (IFT172, CPNE1), as well as multiple new co-localisation results, highlight the value of a formal systematic statistical test. Our findings provide information about the causal gene in associated intervals and have direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.

The deleterious mutation load is insensitive to recent population history

Yuval B. Simons, Michael C. Turchin, Jonathan K. Pritchard, Guy Sella
(Submitted on 9 May 2013)

Human populations have undergone dramatic changes in population size in the past 100,000 years, including a severe bottleneck of non-African populations and recent explosive population growth. There is currently great interest in how these demographic events may have affected the burden of deleterious mutations in individuals and the allele frequency spectrum of disease mutations in populations. Here we use population genetic models to show that, contrary to previous conjectures, recent human demography has likely had very little impact on the average burden of deleterious mutations carried by individuals. This prediction is supported by exome sequence data showing that African American and European American individuals carry very similar burdens of damaging mutations. We next consider whether recent population growth has increased the importance of very rare mutations in complex traits. Our analysis predicts that for most classes of disease variants, rare alleles are unlikely to contribute a large fraction of the total genetic variance, and that the impact of recent growth is likely to be modest. However, for diseases that have a direct impact on fitness, strongly deleterious rare mutations likely do play important roles, and the impact of very rare mutations will be far greater as a result of recent growth. In summary, demographic history has dramatically impacted patterns of variation in different human populations, but these changes have likely had little impact on either genetic load or on the importance of rare variants for most complex traits.

Statistical Physics of Evolutionary Trajectories on Fitness Landscapes

Michael Manhart, Alexandre V. Morozov
(Submitted on 6 May 2013)

Random walks on multidimensional nonlinear landscapes are of interest in many areas of science and engineering. In particular, properties of adaptive trajectories on fitness landscapes determine population fates and thus play a central role in evolutionary theory. The topography of fitness landscapes and its effect on evolutionary dynamics have been extensively studied in the literature. We will survey the current research knowledge in this field, focusing on a recently developed systematic approach to characterizing path lengths, mean first-passage times, and other statistics of the path ensemble. This approach, based on general techniques from statistical physics, is applicable to landscapes of arbitrary complexity and structure. It is especially well-suited to quantifying the diversity of stochastic trajectories and repeatability of evolutionary events. We demonstrate this methodology using a biophysical model of protein evolution that describes how proteins maintain stability while evolving new functions.
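As a rough illustration of one statistic mentioned in the abstract, the mean first-passage time of a random walk can be computed by solving a small linear system. The sketch below is not the authors' method: it assumes a discrete-time Markov chain given as a transition matrix, and the helper names (`mean_first_passage_times`, `unbiased_walk`) and the toy one-dimensional walk are hypothetical choices made only for this example.

```python
def mean_first_passage_times(P, target):
    """Mean number of steps to first reach `target` from every state.

    P is a row-stochastic transition matrix (list of lists).
    Solves t_i = 1 + sum_j P[i][j] * t_j for i != target, with t_target = 0.
    """
    n = len(P)
    states = [i for i in range(n) if i != target]
    m = len(states)
    # Augmented matrix for (I - Q) t = 1, where Q is P restricted to non-target states.
    A = []
    for i in states:
        row = [(1.0 if i == j else 0.0) - P[i][j] for j in states]
        row.append(1.0)
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("target is not reachable from every state")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    times = [0.0] * n
    for k, i in enumerate(states):
        times[i] = A[k][m]
    return times


def unbiased_walk(n_states):
    """Toy landscape: unbiased walk on 0..n_states-1, reflecting at state 0."""
    P = [[0.0] * n_states for _ in range(n_states)]
    P[0][1] = 1.0
    for i in range(1, n_states - 1):
        P[i][i - 1] = 0.5
        P[i][i + 1] = 0.5
    P[n_states - 1][n_states - 1] = 1.0  # the target state is absorbing
    return P


times = mean_first_passage_times(unbiased_walk(6), target=5)
```

For this flat toy landscape the result can be checked by hand: the mean first-passage time from state i to the last state N is N^2 - i^2. A fitness landscape would replace the 0.5 step probabilities with selection-dependent ones.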

Response to No gene-specific optimization of mutation rate in Escherichia coli

Inigo Martincorena, Nicholas M. Luscombe
(Submitted on 7 May 2013)

In a letter published in Molecular Biology and Evolution [10], Chen and Zhang argue that the variation of the mutation rate along the Escherichia coli genome that we recently reported [3] cannot be evolutionarily optimised. To support this claim they first attempt to calculate the selective advantage of a local reduction in the mutation rate and conclude that it is not strong enough to be favoured by selection. Second, they analyse the distribution of 166 mutations from a wild-type E. coli K12 MG1655 strain and 1,346 mutations from a repair-deficient strain, and claim to find a positive association between transcription and mutation rate rather than the negative association that we reported. Here we respond to this communication. Briefly, we explain how the long-standing theory of mutation-modifier alleles supports the evolution of local mutation rates within a genome by mechanisms acting on sufficiently large regions of a genome, which is consistent with our original observations [3,4]. We then explain why caution must be exercised when comparing mutations from repair-deficient strains to data from wild-type strains, as different mutational processes dominate these conditions. Finally, a reanalysis of the data used by Chen and Zhang with an alternative expression dataset reveals that their conclusions are unreliable.

The role of twitter in the life cycle of a scientific publication

Emily S. Darling, David Shiffman, Isabelle M. Côté, Joshua A. Drew
(Submitted on 2 May 2013)

Twitter is a micro-blogging social media platform for short messages that can have a long-term impact on how scientists create and publish ideas. We investigate the usefulness of Twitter in the development and distribution of scientific knowledge. At the start of the life cycle of a scientific publication, Twitter provides a large virtual department of colleagues that can help to rapidly generate, share and refine new ideas. As ideas become manuscripts, Twitter can be used as an informal arena for the pre-review of works in progress. Finally, tweeting published findings can communicate research to a broad audience of other researchers, decision makers, journalists and the general public, which can amplify the scientific and social impact of publications. However, there are limitations, largely surrounding issues of intellectual property and ownership, inclusiveness and misrepresentations of science sound bites. Nevertheless, we believe Twitter is a useful social media tool that can provide a valuable contribution to scientific publishing in the 21st century.

Our paper: Integrating influenza antigenic dynamics with molecular evolution

This guest post is by Trevor Bedford (@trvrb) on his paper (along with coauthors): Bedford et al. Integrating influenza antigenic dynamics with molecular evolution arXived here.

The influenza virus shows a remarkable capacity to evolve to escape human immunity. Many other viruses, like measles, do not have this capacity. After infection with measles, a person gains life-long immunity to the virus, and hence measles has become constrained to be a childhood infection. Continual antigenic evolution in influenza necessitates frequent vaccine updates to provide sufficient protection to circulating strains.

Antigenic differences between strains are commonly quantified using the hemagglutination inhibition (HI) assay, which measures the ability of antibodies created against one strain to interfere with virus from another strain. The resulting HI data is represented as a sparse matrix of comparisons between viruses from strains A, B, C… and sera from strains X, Y, Z… Taken by itself, this matrix is difficult to work with. Experienced virologists can pick up the loss of reactivity between groups of viruses in the noisy HI data, but these patterns are not fully quantified.

In our new paper, available on the arXiv, we extend techniques of multidimensional scaling (MDS) pioneered by Derek Smith and colleagues for the analysis of influenza antigenic data. Here, we attempted to bring the MDS antigenic model into a fully Bayesian framework and refer to the revised technique as Bayesian MDS (BMDS). In this model, viruses and sera are represented as 2D coordinates on an antigenic map in which their pairwise distances yield expectations for the HI titers, with antigenically similar viruses lying close to one another and antigenically distant viruses lying far apart.
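For readers new to MDS, the basic idea of placing points on a 2D map so that their pairwise distances match a set of target distances can be sketched in a few lines. This is plain least-squares MDS fitted by gradient descent, not the Bayesian model from the paper and not the BEAST implementation; the function names, step size and the three made-up target distances are assumptions chosen only for illustration (real targets would be derived from HI titers).

```python
import math


def stress(coords, targets):
    """Sum of squared differences between map distances and target distances."""
    total = 0.0
    for (i, j), d_target in targets.items():
        total += (math.dist(coords[i], coords[j]) - d_target) ** 2
    return total


def fit_map(coords, targets, step=0.01, n_iter=3000):
    """Move 2D points by gradient descent to reduce the stress."""
    coords = [list(p) for p in coords]
    for _ in range(n_iter):
        grads = [[0.0, 0.0] for _ in coords]
        for (i, j), d_target in targets.items():
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # coincident points give no usable direction
            scale = 2.0 * (d - d_target) / d
            for k in range(2):
                g = scale * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        for p, g in zip(coords, grads):
            p[0] -= step * g[0]
            p[1] -= step * g[1]
    return coords


# Hypothetical target distances between three points (not real HI data).
targets = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
start = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
fitted = fit_map(start, targets)
```

A Bayesian treatment replaces this single best-fit map with a posterior distribution over maps, which is what allows priors such as the tree-based diffusion described below to be added.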

By placing antigenic cartography in a Bayesian context, we are able to integrate other data sources, most notably sequence data. In this case, genetic sequences provide an evolutionary tree relating virus strains and we assume that antigenic location evolves along this tree in a 2D diffusion process. This process imposes a prior on antigenic locations in which evolutionarily similar viruses have a prior expectation of lying close to one another on the map. In the paper, we use this BMDS / diffusion model to investigate patterns of antigenic evolution in 4 circulating lineages of influenza and show that antigenic drift determines to a large degree incidence patterns across time and across lineages.

The paper is also up on GitHub, which I’ll keep updating as it goes through the review process. The BMDS model is implemented in the software package BEAST and is available in the latest source code. I hope to provide tutorials on running the BMDS model in the not-too-distant future.

Slowing evolution is more effective than enhancing drug development for managing resistance

Nathan S. McClure, Troy Day
(Submitted on 29 Apr 2013)

Drug resistance is a serious public health problem that threatens to thwart our ability to treat many infectious diseases. Repeatedly, the introduction of new drugs has been followed by the evolution of resistance. In principle there are two ways to address this problem: (i) enhancing drug development, and (ii) slowing drug resistance. We present data and a modeling approach based on queueing theory that explores how interventions aimed at these two facets affect the ability of the entire drug supply system to provide service. Analytical and simulation-based results show that, all else equal, slowing the evolution of drug resistance is more effective at ensuring an adequate supply of effective drugs than is enhancing the rate at which new drugs are developed. This lends support to the idea that evolution management is not only a significant component of the solution to the problem of drug resistance, but may in fact be the most important component.

Positive selection drives faster-Z evolution in silkmoths

Timothy B. Sackton (1), Russell B. Corbett-Detig (1), Javaregowda Nagaraju (2), R. Lakshmi Vaishna (2), Kallare P. Arunkumar (2), Daniel L. Hartl (1) ((1) Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, USA, (2) Centre of Excellence for Genetics and Genomics of Silkmoths, Laboratory of Molecular Genetics, Centre for DNA Fingerprinting and Diagnostics, Hyderabad, India)
(Submitted on 29 Apr 2013)

Genes linked to X or Z chromosomes, which are hemizygous in the heterogametic sex, are predicted to evolve at different rates than those on autosomes. This faster-X effect can arise either as a consequence of hemizygosity, which leads to more efficient selection for recessive beneficial mutations in the heterogametic sex, or as a consequence of reduced effective population size on the hemizygous chromosome, which leads to increased fixation of weakly deleterious mutations due to random genetic drift. Empirical results to date have suggested that, while the overall pattern across taxa is complicated, in general systems with male-heterogamy show a faster-X effect primarily attributable to more efficient selection, whereas systems with female-heterogamy show a faster-Z effect primarily attributable to increased drift. However, to date only a single female-heterogamic taxon has been investigated. In order to test the generality of the faster-Z pattern seen in birds, we sequenced the genome of the Lepidopteran insect Bombyx huttoni, a close outgroup of the domesticated silkmoth Bombyx mori. We show that silkmoths experience faster-Z evolution, but unlike in birds, the faster-Z effect appears to be attributable to more efficient positive selection in females. These results suggest that female-heterogamy alone is unlikely to be sufficient to explain the reduced efficacy of selection on the bird Z chromosome. Instead, it is likely that a combination of patterns of dosage compensation and overall effective population size, among other factors, influence patterns of faster-Z evolution.

Remote Homology Detection in Proteins Using Graphical Models

Noah M. Daniels
(Submitted on 24 Apr 2013)

Given the amino acid sequence of a protein, researchers often infer its structure and function by finding homologous, or evolutionarily-related, proteins of known structure and function. Since structure is typically more conserved than sequence over long evolutionary distances, recognizing remote protein homologs from their sequence poses a challenge.
We first consider all proteins of known three-dimensional structure, and explore how they cluster according to different levels of homology. An automatic computational method reasonably approximates a human-curated hierarchical organization of proteins according to their degree of homology.
Next, we return to homology prediction, based only on the one-dimensional amino acid sequence of a protein. Menke, Berger, and Cowen proposed a Markov random field model to predict remote homology for beta-structural proteins, but their formulation was computationally intractable on many beta-strand topologies.
We show two different approaches to approximate this random field, both of which make it computationally tractable, for the first time, on all protein folds. One method simplifies the random field itself, while the other retains the full random field, but approximates the solution through stochastic search. Both methods achieve improvements over the state of the art in remote homology detection for beta-structural protein folds.

Timing of ancient human Y lineage depends on the mutation rate: A comment on Mendez et al

Melissa A. Wilson Sayres
(Submitted on 22 Apr 2013)

Mendez et al. recently report the identification of a Y chromosome lineage from an African American that is an outgroup to all other known Y haplotypes, and report a time to most recent common ancestor (TMRCA) for human Y lineages that is substantially longer than any previous estimate. The identification of a novel Y haplotype is always exciting, and this haplotype, in particular, is unique in its basal position on the Y haplotype tree. However, at 338 (237-581) thousand years ago (kya), the extremely ancient TMRCA reported by Mendez et al. is inconsistent with the known human fossil record (which estimates the age of anatomically modern humans at 195 ± 5 kya), with estimates from mtDNA (176.6 ± 11.3 kya, and 204.9 (116.8-295.7) kya) and with population genetic theory. The inflated TMRCA can quite easily be attributed to the extremely low Y chromosome mutation rate used by the authors.
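The dependence on the mutation rate can be made concrete with a small sketch. It assumes a strict molecular clock, under which a TMRCA estimated from a fixed amount of sequence divergence is inversely proportional to the assumed mutation rate. The function name and the relative rates (1.0 and 2.0) are hypothetical; no actual mutation rate from Mendez et al. or from the comment is used here.

```python
def rescale_tmrca(tmrca, rate_used, rate_alternative):
    """Rescale a TMRCA estimate to a different assumed mutation rate.

    Assumes a strict molecular clock: observed divergence = 2 * rate * TMRCA,
    so for fixed divergence the TMRCA scales as 1 / rate.
    Both rates must be in the same units.
    """
    if rate_used <= 0 or rate_alternative <= 0:
        raise ValueError("mutation rates must be positive")
    return tmrca * rate_used / rate_alternative


# Point estimate and interval from the abstract (kya), with hypothetical
# relative rates: the alternative rate is taken to be twice the rate used.
point, low, high = (rescale_tmrca(t, 1.0, 2.0) for t in (338.0, 237.0, 581.0))
```

Under this assumption, doubling the mutation rate halves the point estimate and both ends of the interval, which is why the choice of rate matters so much for the comparison with fossil and mtDNA dates.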