Error-prone polymerase activity causes multinucleotide mutations in humans

Kelley Harris, Rasmus Nielsen
(Submitted on 5 Dec 2013)

About 2% of human genetic polymorphisms have been hypothesized to arise via multinucleotide mutations (MNMs), complex events that generate SNPs at multiple sites in a single generation. MNMs have the potential to accelerate the pace at which single genes evolve and to confound studies of demography and selection that assume all SNPs arise independently. In this paper, we examine clustered mutations that are segregating in a set of 1,092 human genomes, demonstrating that MNMs become enriched as large numbers of individuals are sampled. We leverage the size of the dataset to deduce new information about the allelic spectrum of MNMs, estimating the percentage of linked SNP pairs that were generated by simultaneous mutation as a function of the distance between the affected sites and showing that MNMs exhibit a high percentage of transversions relative to transitions. These findings are reproducible in data from multiple sequencing platforms. Among tandem mutations that occur simultaneously at adjacent sites, we find an especially skewed distribution of ancestral and derived dinucleotides, with GC→AA, GA→TT and their reverse complements making up 36% of the total. These same mutations dominate the spectrum of tandem mutations produced by the upregulation of low-fidelity Polymerase ζ in mutator strains of S. cerevisiae that have impaired DNA excision repair machinery. This suggests that low-fidelity DNA replication by Pol ζ is at least partly responsible for the MNMs that are segregating in the human population, and that useful information about the biochemistry of MNM can be extracted from ordinary population genomic data. We incorporate our findings into a mathematical model of the multinucleotide mutation process that can be used to correct phylogenetic and population genetic methods for the presence of MNMs.
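As a rough illustration of two bookkeeping steps the abstract relies on, the sketch below classifies single-site changes as transitions or transversions and collapses a tandem (dinucleotide) mutation with its reverse complement, so that GC→AA and GC→TT count as the same event. The function names are hypothetical and this is not the authors' pipeline.

```python
# Illustrative helpers only; names are assumptions, not taken from the paper.
PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_transition(ancestral, derived):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    if ancestral == derived:
        raise ValueError("ancestral and derived bases must differ")
    return (ancestral in PURINES) == (derived in PURINES)


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_tandem(ancestral, derived):
    """Represent a dinucleotide change and its reverse complement by one key."""
    forward = (ancestral, derived)
    reverse = (reverse_complement(ancestral), reverse_complement(derived))
    return min(forward, reverse)


def transversion_fraction(substitutions):
    """Fraction of (ancestral, derived) single-base changes that are transversions."""
    substitutions = list(substitutions)
    transversions = sum(not is_transition(a, d) for a, d in substitutions)
    return transversions / len(substitutions)
```

With these helpers, `canonical_tandem("GA", "TT")` and `canonical_tandem("TC", "AA")` return the same key, which is how a mutation and its reverse complement can be tallied together.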

Author post: The evolution of sex differences in disease genetics


This guest post is by Ted Morrow, Jessica Abbott, and Will Gilks on their review paper Gilks et al. “The evolution of sex differences in disease genetics”

Our paper forms part of a research project (2Sexes_1Genome, 2012-16) devoted to investigating how sex-specific and sexually antagonistic selection influences the genome, and in particular whether genetic variants that are maintained as a result of these forms of selection could contribute to disease risk. We had three main aims with our paper, which we outline below together with a motivation for each.

Our first aim was to summarise evidence for sex-dependent genetic architecture in complex traits that are otherwise shared between the sexes. We focused particularly on disease phenotypes in humans, although a range of complex traits from diverse taxa were considered. The motivation for this was to establish a baseline for how widespread or rare sex-specific genetic architecture is. An important paper in this respect, published in Nature Reviews Genetics (Ober et al., 2008), specifically addressed the question of sex-specific genetic architecture in human diseases. It reviewed selected examples from the human disease genetics literature for sex-specific effects on a range of phenotypes, and concluded that studies that ignore sex would miss some important variants that contribute to disease risk. While the Ober et al. (2008) paper makes a robust case for investigating sex as a factor in genetic analyses, several other genome-wide association studies have been published in the primary literature since, suggesting that an up-to-date review of these would be worthwhile. We did not intend to conduct a full-scale meta-analysis, although that would probably be a very informative exercise given potential problems in terms of reporting bias, non-independence of traits, and selection of traits with known sexual dimorphism. Nonetheless, a clear pattern emerges of widespread evidence of sex-specific genetic architecture based on heritability estimates (see Figure 1 in our paper), eQTLs, gene manipulations, expression studies, and SNPs with sex-by-genotype effects (see Table 1 in our paper). A recently published paper (not included in our review) even reports that 10 out of 13 loci reaching genome-wide significance for recombination rate have sex-specific effects (Kong et al., 2013).

The second aim was to show how evolutionary theory could provide ultimate explanations for the origins of sex-specific genetic architecture. In this way, we propose that a deeper understanding of why genes cause disease, and why some common diseases show sexually dimorphic expression, may emerge. The evolutionary theory of why the sexes may differ phenotypically goes back to Darwin’s observations (1871) of how selection acts in males and females. He characterized males as active competitors, engaging in physical battles with rivals or investing in costly signals with which to woo potential mates. Females, on the other hand, were characterized as being coy and choosy. There is now good evidence that mate choice is not limited to females, and that sexual selection also operates well after copulation (i.e. sperm competition and cryptic female choice). The key point is that fundamental differences between the sexes occur in terms of investment in reproduction, and as a consequence the routes by which males and females may maximize their fitness are often different. In other words, both natural and sexual selection frequently take sex-specific forms in terms of strength and/or direction. The latter possibility, that selection acts antagonistically between the sexes, is well established in several laboratory and wild populations, including humans. From a human disease perspective, disease may occur as a result of an individual’s phenotypic difference (or departure) from an optimal phenotype (where a particular trait value has the greatest fitness). This difference could be the result of a genetic constraint imposed by an intersexual genetic correlation for that trait, or could arise indirectly (i.e. pleiotropically) through genetic correlations with other traits. Sex-specific or sexually antagonistic selection could therefore maintain genetic variation within a population that is either less favourable or actually deleterious for one sex.
A recent model (Morrow & Connallon, 2013) shows how alleles with sex-specific or sexually antagonistic effects will contribute more to genetic variation for disease predisposition than alleles that are deleterious to both sexes in equal measure, and achieve higher allele frequencies. As a result, sexual dimorphism in the genetic architecture of complex polygenic diseases would emerge within the population. This evolutionary model clearly indicates that the search for loci contributing to disease risk in humans would benefit from exploring sex-specific genetic effects.

The final aim was to provide readers with an overview of the analytical options available for detecting sex-specific associations in genome-wide studies of complex diseases and phenotypes. As we show, more studies are investigating and discovering sex-dependent effects using GWAS data. Common strategies are to separate or stratify the samples within case and control groups by sex, or to model sex as a covariate. The first approach reduces the statistical power to detect sex-dependent effects, and thus only strong ones will be detected. The second simply controls for any sex-specific effects; it is not intended to identify them. We instead advocate the inclusion of a genotype-by-sex interaction term in statistical models, available as an option in some of the commonly used analytical platforms such as GenABEL and PLINK.
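To make the idea of a genotype-by-sex interaction test concrete, here is a minimal sketch for the simplest case: a binary carrier status, a binary sex, and a case/control outcome. In that saturated setting the interaction coefficient of a logistic model equals the difference between the female and male log odds ratios, and its Wald standard error follows from the cell counts (Woolf's variance). The function names and the example counts are illustrative assumptions; this is not the interface of GenABEL or PLINK, and real analyses would usually fit a full regression with covariates.

```python
import math


def log_odds_ratio(case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier):
    """Log odds ratio and its Woolf variance for one 2x2 table of counts."""
    cells = (case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier)
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((case_carrier * ctrl_noncarrier) / (case_noncarrier * ctrl_carrier))
    variance = sum(1.0 / count for count in cells)
    return log_or, variance


def genotype_by_sex_wald(female_table, male_table):
    """Wald test of the difference between sex-specific log odds ratios.

    Each table is (case_carrier, case_noncarrier, ctrl_carrier, ctrl_noncarrier).
    Returns (interaction estimate, z statistic, two-sided p-value).
    """
    log_or_f, var_f = log_odds_ratio(*female_table)
    log_or_m, var_m = log_odds_ratio(*male_table)
    estimate = log_or_f - log_or_m
    z = estimate / math.sqrt(var_f + var_m)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return estimate, z, p_value


# Hypothetical counts: an effect in females (OR = 2.25) and none in males (OR = 1).
estimate, z, p_value = genotype_by_sex_wald((60, 40, 40, 60), (50, 50, 50, 50))
```

Unlike a sex-stratified analysis, this tests directly whether the genotype effect differs between the sexes, rather than asking whether it is significant in each sex separately.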

Overall, we hope our article raises the profile of sex-specific genetic effects, a topic that is apparently already receiving increasing interest, judging by the recent crop of sex-specific associations appearing in the GWAS literature. This forms part of a more general theme within the field of human disease genetics: exploring the impact of interaction effects, such as genotype-by-environment interactions. The identification of strong main effects has had successes, but the debate over the ‘missing heritability’ of complex traits has prompted researchers to look beyond main effects to more complex processes such as epistasis and environmental effects. We welcome any comments either here on Haldane’s Sieve or in the comments section of bioRxiv, where our article is currently posted.

References
2Sexes_1Genome. 2012-16. Edward H. Morrow. FP7 ERC Starting Grant – Evolutionary, population and environmental biology. http://www.2020-horizon.com/2SEXES-1GENOME-Sex-specific-genetic-effects-on-fitness-and-human-disease(2SEXES-1GENOME)-s2903.html
Darwin, C. 1871. The Descent of Man. Prometheus Books, New York.
Kong, A., Thorleifsson, G., Frigge, M.L., Masson, G., Gudbjartsson, D.F., Villemoes, R., et al. 2013. Common and low-frequency variants associated with genome-wide recombination rate. Nat. Genet. doi:10.1038/ng.2833.
Morrow, E.H. & Connallon, T. 2013. Implications of sex-specific selection for the genetic basis of disease. Evol. Appl. doi:10.1111/eva.12097.
Ober, C., Loisel, D.A. & Gilad, Y. 2008. Sex-specific genetic architecture of human disease. Nat Rev Genet 9: 911–922.

Biophysical Fitness Landscapes for Transcription Factor Binding Sites

Allan Haldane, Michael Manhart, Alexandre V. Morozov
(Submitted on 3 Dec 2013)

Evolutionary trajectories and phenotypic states available to cell populations are ultimately dictated by intermolecular interactions between DNA, RNA, proteins, and other molecular species. Here we study how evolution of gene regulation in a single-cell eukaryote S. cerevisiae is affected by the interactions between transcription factors (TFs) and their cognate genomic sites. Our study is informed by high-throughput in vitro measurements of TF-DNA binding interactions and by a comprehensive collection of genomic binding sites. Using an evolutionary model for monomorphic populations evolving on a fitness landscape, we infer fitness as a function of TF-DNA binding energy for a collection of 12 yeast TFs, and show that the shape of the predicted fitness functions is in broad agreement with a simple thermodynamic model of two-state TF-DNA binding. However, the effective temperature of the model is not always equal to the physical temperature, indicating selection pressures in addition to biophysical constraints caused by TF-DNA interactions. We find little statistical support for the fitness landscape in which each position in the binding site evolves independently, showing that epistasis is common in evolution of gene regulation. Finally, by correlating TF-DNA binding energies with biological properties of the sites or the genes they regulate, we are able to rule out several scenarios of site-specific selection, under which binding sites of the same TF would experience a spectrum of selection pressures depending on their position in the genome. These findings argue for the existence of universal fitness landscapes which shape evolution of all sites for a given TF, and whose properties are determined in part by the physics of protein-DNA interactions.
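For readers unfamiliar with the two-state picture mentioned in the abstract, the sketch below shows the standard form of a two-state binding probability as a function of binding energy, with an inverse temperature parameter that can be set to an "effective" value. Treating fitness as following this shape is an assumption made here for illustration; the parameter names (`chemical_potential`, `beta`) are not taken from the paper.

```python
import math


def occupancy(energy, chemical_potential, beta=1.0):
    """Two-state probability that a site is bound: 1 / (1 + exp(beta * (E - mu))).

    Lower (more negative) binding energy means stronger binding. `beta` is an
    inverse temperature; an effective value different from the physical one
    changes how sharply occupancy falls off around the chemical potential.
    """
    x = beta * (energy - chemical_potential)
    # Numerically stable evaluation of the logistic function for large |x|.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


# Hypothetical energies (arbitrary units) on either side of the chemical potential.
strong_site = occupancy(-2.0, chemical_potential=0.0)
weak_site = occupancy(2.0, chemical_potential=0.0)
```

A larger `beta` makes the curve more step-like, which is one way to picture what a non-physical effective temperature does to the inferred fitness function.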

High Genetic Diversity and Adaptive Potential of Two Simian Hemorrhagic Fever Viruses in a Wild Primate Population

Adam L. Bailey, Michael Lauck, Andrea Weiler, Samuel D. Sibley, Jorge M. Dinis, Zachary Bergman, Chase W. Nelson, Michael Correll, Michael Gleicher, David Hyeroba, Alex Tumukunde, Geoffrey Weny, Colin Chapman, Jens Kuhn, Austin Hughes, Thomas C. Friedrich, Tony L. Goldberg, David H. O’Connor

Key biological properties such as high genetic diversity and high evolutionary rate enhance the potential of certain RNA viruses to adapt and emerge. Identifying viruses with these properties in their natural hosts could dramatically improve disease forecasting and surveillance. Recently, we discovered two novel members of the viral family Arteriviridae: simian hemorrhagic fever virus (SHFV)-krc1 and SHFV-krc2, infecting a single wild red colobus (Procolobus rufomitratus tephrosceles) in Kibale National Park, Uganda. Nearly nothing is known about the biological properties of SHFVs in nature, although the SHFV type strain, SHFV-LVR, has caused devastating outbreaks of viral hemorrhagic fever in captive macaques. Here we detected SHFV-krc1 and SHFV-krc2 in 40% and 47% of 60 wild red colobus tested, respectively. We found viral loads in excess of 1×10^6-1×10^7 RNA copies per milliliter of blood plasma for each of these viruses. SHFV-krc1 and SHFV-krc2 also showed high genetic diversity at both the inter- and intra-host levels. Analyses of synonymous and non-synonymous nucleotide diversity across viral genomes revealed patterns suggestive of positive selection in SHFV open reading frames (ORF) 5 (SHFV-krc2 only) and 7 (SHFV-krc1 and SHFV-krc2). Thus, these viruses share several important properties with some of the most rapidly evolving, emergent RNA viruses.

Variational Inference of Population Structure in Large SNP Datasets

Anil Raj, Matthew Stephens, Jonathan K Pritchard

Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a dataset and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data, and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias towards detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.

Author post: Evolution at two levels of gene expression in yeast

This guest post is by Carlo Artieri and Hunter Fraser on their preprint Evolution at two levels of gene expression in yeast, arXived here

Taking studies of regulatory evolution to the next level: translation

Understanding the molecular basis of regulatory variation within and between species has become a major focus of modern genetics. For instance, the majority of identified human disease-risk alleles lie in non-coding regions of the genome, suggesting that they affect gene regulation (Epstein 2009). Furthermore, it has been argued that regulatory changes have played a dominant role in explaining uniquely human attributes (King and Wilson 1975). However, our knowledge of gene regulatory evolution is based almost entirely on studies of mRNA levels, despite both the greater functional importance of protein abundance, and evidence that post-transcriptional regulation is pervasive. The availability of high-throughput methods for measuring mRNA abundance coupled to the lack of comparable methods at the protein level have contributed to this focus; however, a new method known as ribosome profiling (Ingolia et al. 2009) has enabled us to study divergence in the regulation of translation.

‘Riboprofiling’ involves the construction of two RNA-seq libraries: one measuring mRNA abundance (the ‘mRNA’ fraction), and the second capturing the portion of the transcriptome that is actively being translated by ribosomes (the ‘Ribo’ fraction). We performed riboprofiling on interspecific hybrids of two closely related species of budding yeast, Saccharomyces cerevisiae and S. paradoxus (~5 million years diverged), as well as the parental strains. As both parental alleles at a locus share the same trans cellular environment in the hybrid, differences in the relative allelic abundance (termed allele-specific expression, or ASE) reveal cis-regulatory divergence. Consequently, interspecies differences not attributable to cis-effects indicate trans divergence. By measuring differences in the magnitudes of ASE between the two hybrid riboprofiling fractions, we identified independent cis and trans regulatory changes in both mRNA abundance and translational efficiency.
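The logic of the cis/trans decomposition can be written down in a few lines. The sketch below is a simplified illustration on log2 ratios, assuming normalized abundances are already in hand; the function names and numbers are hypothetical, and it omits the read-count statistics a real analysis would need.

```python
import math


def log2_ratio(scer, spar):
    """log2 of the S. cerevisiae : S. paradoxus abundance ratio."""
    if scer <= 0 or spar <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(scer / spar)


def cis_trans(parent_scer, parent_spar, hybrid_scer, hybrid_spar):
    """Split total interspecies divergence into cis and trans components.

    Total divergence is the ratio between the parental species; the cis
    component is the allelic ratio (ASE) in the hybrid, where both alleles
    share one trans environment; trans is what remains.
    """
    total = log2_ratio(parent_scer, parent_spar)
    cis = log2_ratio(hybrid_scer, hybrid_spar)
    return {"total": total, "cis": cis, "trans": total - cis}


def translational_cis(mrna_hybrid, ribo_hybrid):
    """cis divergence in translational efficiency: Ribo ASE minus mRNA ASE.

    Each argument is a (scer_allele, spar_allele) pair from the hybrid.
    """
    return log2_ratio(*ribo_hybrid) - log2_ratio(*mrna_hybrid)


# Hypothetical gene: 4-fold parental difference, 2-fold allelic difference in the hybrid.
mrna = cis_trans(parent_scer=400, parent_spar=100, hybrid_scer=200, hybrid_spar=100)
```

In this toy example half of the log2 divergence is cis and half is trans. A negative `translational_cis` value alongside a positive mRNA cis value would be an instance of divergence in opposing directions.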

We found that both cis and trans regulatory divergence in translation are widespread, and of comparable magnitude to divergence at the mRNA level – indicating that we miss much regulatory evolution by focusing on mRNA in isolation. Moreover, we observed an overwhelming bias towards divergence in opposing parental directions, suggesting the action of stabilizing selection in order to maintain more similar protein levels between species than would be expected by comparing mRNA abundances alone. Interestingly, while we confirmed the results of previous studies indicating that both cis and trans regulatory divergence at the mRNA level are associated with the presence of TATA boxes and nucleosome free regions in promoters, no such relationship was found for translational divergence, indicating that these regulatory systems have different underlying architectures.

We also searched for evidence of polygenic selection in and between both regulatory levels by applying a recently developed modification of Orr’s sign test (Orr 1998; Fraser et al. 2010; Bullard et al. 2010). Under neutral divergence, no pattern is expected with regards to the parental direction of up or down-regulating alleles among orthologs within a functional group (e.g., a pathway or multi-gene complex). However, a significant bias towards one parental lineage is evidence of lineage-specific selection. This analysis uncovered evidence of polygenic selection at both regulatory levels in a number of functional groups. In particular, genes involved in tolerance to heavy metals were enriched for reinforcing divergence in mRNA abundance and translation favoring S. cerevisiae. Increased tolerance to these metals has been observed in S. cerevisiae (Warringer et al. 2011), suggesting that domesticated yeasts have experienced a history of polygenic adaptation across regulatory levels allowing them to grow on metals such as copper. Finally, we also uncovered multiple instances of stop-codon readthrough that are conserved between species, highlighting yet another post-transcriptional mechanism leading to increased proteomic diversity.
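The intuition behind a sign test of this kind can be sketched with an exact binomial test: under neutrality, the direction of divergence for each gene in a functional group is treated as an independent coin flip. This is a deliberately simplified stand-in, not the modified test used in the cited papers, which may differ in how the null expectation is set; the function names are hypothetical.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count k out of n under success probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def lineage_bias(directions, null_fraction=0.5):
    """Test whether a gene group is biased towards one parental lineage.

    `directions` holds +1 where the S. cerevisiae allele is up-regulating and
    -1 where the S. paradoxus allele is. `null_fraction` is the expected share
    of +1 under neutrality (for example, a genome-wide background fraction).
    """
    n = len(directions)
    k = sum(1 for d in directions if d > 0)
    return k, n, binomial_two_sided(k, n, null_fraction)


# Hypothetical group of 10 genes, all diverged in the same parental direction.
k, n, p_value = lineage_bias([+1] * 10)
```

Ten of ten genes in one direction gives p = 2/1024 under a fair-coin null, whereas an even 5:5 split gives p = 1.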

By applying a novel approach to a long-standing question, our analysis has revealed the underappreciated complexity of post-transcriptional regulatory divergence. We argue that partitioning the search for the locus of selection into the binary categories of ‘coding’ vs. ‘regulatory’ overlooks the many opportunities for selection to act at multiple regulatory levels along the path from genotype to phenotype.

References:

Bullard JH, Mostovoy Y, Dudoit S, Brem RB. 2010. Polygenic and directional regulatory evolution across pathways in Saccharomyces. Proc Natl Acad Sci USA 107: 5058-5063.

Epstein DJ. 2009. Cis-regulatory mutations in human disease. Brief Funct Genomic Proteomic 8: 310–316.

Fraser HB, Moses AM, Schadt EE. 2010. Evidence for widespread adaptive evolution of gene expression in budding yeast. Proc Natl Acad Sci USA 107: 2977-2982.

Ingolia NT, Ghaemmaghami S, Newman JR, Weissman JS. 2009. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324:218-223.

King MC, Wilson AC. 1975. Evolution at two levels in humans and chimpanzees. Science 188: 107-116.

Orr HA. 1998. Testing natural selection vs. genetic drift in phenotypic evolution using quantitative trait locus data. Genetics 149: 2099-2104.

Warringer J, Zörgö E, Cubillos FA, Zia A, Gjuvsland A, Simpson JT, Forsmark A, Durbin R, Omholt SW, Louis EJ, Liti G, Moses A, Blomberg A. 2011. Trait variation in yeast is defined by population history. PLoS Genet 7: e1002111.

A Robust Model-free Approach for Rare Variants Association Studies Incorporating Gene-Gene and Gene-Environmental Interactions

Ruixue Fan, Shaw-Hwa Lo
(Submitted on 2 Dec 2013)

Recently more and more evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare variants association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants but do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method that is designed specifically for detecting both marginal effects and effects due to gene-gene (G-G) and gene-environmental (G-E) interactions for rare variants association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G-G or G-E interactions. Secondly, it does not sacrifice the marginal detection power; in the situation when rare variants only have marginal effects it is comparable with the most competitive method in current literature. Thirdly, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can tailor the procedure to fit their own study friendly. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G-G and G-E interactions.

Ploidy and the Predictability of Evolution in Fisher’s Geometric Model

Sandeep Venkataram, Diamantis Sellis, Dmitri A Petrov

Predicting adaptive evolutionary trajectories is a primary goal of evolutionary biology. One can differentiate between forward and backward predictability, where forward predictability measures the likelihood of the same adaptive trajectory occurring in independent evolutions and backward predictability measures the likelihood of a particular adaptive path given the knowledge of starting and final states. Recent studies have attempted to measure both forward and backward predictability using experimental evolution in asexual haploid microorganisms. Similar experiments in diploid organisms have not been conducted. Here we simulate adaptive walks using Fisher’s Geometric Model in haploids and diploids and find that adaptive walks in diploids are less forward- and more backward-predictable than adaptive walks in haploids. We argue that the difference is due to the ability of diploids in our simulations to generate transiently stable polymorphisms and to allow adaptive mutations of larger phenotypic effect. As stable polymorphisms can be generated in both haploid and diploid natural populations through a number of mechanisms, we argue that inferences based on experiments in which adaptive walks proceed through succession of monomorphic states might miss many of the key features of adaptation.
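To give a feel for what an adaptive walk in Fisher's Geometric Model looks like, here is a minimal haploid-only sketch. It assumes a Gaussian fitness function centred on an optimum at the origin, isotropic Gaussian mutations, and that every beneficial mutation fixes immediately; all of these are simplifying assumptions made here, and the sketch does not model diploids, heterozygote fitness, or the transient polymorphisms that drive the paper's results.

```python
import math
import random


def fitness(phenotype):
    """Gaussian fitness that peaks at the origin of phenotype space."""
    return math.exp(-sum(x * x for x in phenotype))


def haploid_adaptive_walk(start, attempts, mutation_sd, rng):
    """Greedy adaptive walk: accept a mutation only if it raises fitness.

    Returns the list of successive monomorphic phenotypes visited.
    """
    current = list(start)
    path = [tuple(current)]
    for _ in range(attempts):
        candidate = [x + rng.gauss(0.0, mutation_sd) for x in current]
        if fitness(candidate) > fitness(current):
            current = candidate
            path.append(tuple(current))
    return path


# Two replicate walks from the same start; comparing such replicates is the
# spirit of "forward predictability" described above.
walk_a = haploid_adaptive_walk((2.0, 0.0), 200, 0.3, random.Random(1))
walk_b = haploid_adaptive_walk((2.0, 0.0), 200, 0.3, random.Random(2))
```

Because each accepted step is a substitution between monomorphic states, this walk is exactly the kind of simplified picture the abstract argues may miss key features of adaptation.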

Evolution at two levels of gene expression in yeast

Carlo G. Artieri, Hunter B. Fraser
(Submitted on 27 Nov 2013)

Despite the greater functional importance of protein levels, our knowledge of gene expression evolution is based almost entirely on studies of mRNA levels. In contrast, our understanding of how translational regulation evolves has lagged far behind. Here we have applied ribosome profiling – which measures both global mRNA levels and their translation rates – to two species of Saccharomyces yeast and their interspecific hybrid in order to assess the relative contributions of changes in mRNA abundance and translation to regulatory evolution. We report that both cis and trans-acting regulatory divergence in translation are abundant, affecting at least 35% of genes. The majority of translational divergence acts to buffer changes in mRNA abundance, suggesting a widespread role for stabilizing selection acting across regulatory levels. Nevertheless, we observe evidence of lineage-specific selection acting on a number of yeast functional modules, including instances of reinforcing selection acting at both levels of regulation. Finally, we also uncover multiple instances of stop-codon readthrough that are conserved between species. Our analysis reveals the under-appreciated complexity of post-transcriptional regulatory divergence and indicates that partitioning the search for the locus of selection into the binary categories of ‘coding’ vs. ‘regulatory’ may overlook a significant source of selection, acting at multiple regulatory levels along the path from genotype to phenotype.

Author post: Patterns of positive selection in seven ant genomes

This guest post is by Julien Roux on Roux et al. “Patterns of positive selection in seven ant genomes”, arXived here.

The publication of the honeybee genome in 2006 can be considered the birth date of "sociogenomics", a research field whose agenda is to understand social life in molecular terms. Recently, this field has entered a period of rapid discovery with the publication of full genome sequences of multiple Hymenoptera species. In particular, the release of seven ant genome sequences gave us the opportunity to look for the molecular origins of some of the spectacular adaptations of the ant lineage, through patterns of positive selection on amino-acid substitutions in ant genes. We used rigorous methods, inspired by the Selectome database, to detect episodic positive selection while controlling for false positives. All data are publicly available for people to reuse.

An original aspect of our paper is that we analyzed not only ant genomes, but also data from 10 species of bees and 12 species of flies with the same methods to permit an unbiased comparison of positive selection patterns between lineages. For example, immune genes were enriched for positive selection signal in all three lineages. This may not look surprising since these are classical hits of positive selection scans, but it was previously hypothesized that the evolution of social hygienic behaviors in ants and bees may have relaxed the selective pressure on immune genes. Our analysis indicates that this effect is either absent or relatively small.

Other hypotheses have been put forward in relation to the evolution of sociality in Hymenoptera. Notably, it was proposed that the challenges of social life in the colonies should be reflected by increased positive selection signal on neurogenesis genes. Similarly, because communication is mostly based on chemical signals in colonies of social insects, it was suggested that increased positive selection should be observed on olfactory receptors compared to non-social insects. Our results question both these hypotheses, since we observed that increased positive selection on these classes of genes did not coincide with, but rather predated, the evolution of sociality in Hymenoptera.

Finally, the comparison between the three lineages allowed us to pinpoint some patterns that were most likely specific to the ant lineage. We found less positive selection on genes related to metabolism in ants compared to bees and flies. We think this could be the sign of relaxed selection on these genes, possibly in relation to the substantial reduction in metabolic needs with the loss of flight in ant workers. By contrast, we identified a robust pattern of directional selection specific to the ant lineage on genes functioning in the mitochondria. Several pieces of evidence suggest that this pattern might be linked to the remarkable lifespan extension that evolved in the ant lineage. Queens of some ant species can indeed live up to 100 times longer than solitary insects (that is, up to 30 years!). Positive selection possibly played a role in optimizing the activity of mitochondria, where the respiratory chain is the primary source of reactive oxygen species (ROS), an important proximal cause of aging, thus contributing to the evolution of increased lifespan in ants.

In conclusion, protein-level episodic positive selection appears to have played an important role in the evolution of social insects, notably in the form of strong mitochondrial adaptation in ants.