Selection and explosive growth may hamper the performance of rare variant association tests

Lawrence H. Uricchio , John S. Witte , Ryan D. Hernandez
doi: http://dx.doi.org/10.1101/015917

Much recent debate has focused on the role of rare variants in complex phenotypes. However, it is well known that rare alleles can only contribute a substantial proportion of the phenotypic variance when they have much larger effect sizes than common variants, which is most easily explained by natural selection constraining trait-altering alleles to low frequency. It is also plausible that demographic events will influence the genetic architecture of complex traits. Unfortunately, most rare variant association tests do not explicitly model natural selection or non-equilibrium demography. Here, we develop a novel evolutionary model of complex traits. We perform numerical calculations and simulate phenotypes under this model using inferred human demographic and selection parameters. We show that rare variants only contribute substantially to complex traits under very strong assumptions about the relationship between effect size and selection strength. We then assess the performance of state-of-the-art rare variant tests using our simulations across a broad range of model parameters. Counterintuitively, we find that statistical power is lowest when rare variants make the greatest contribution to the additive variance, and that power is substantially lower under our model than previously studied models. While many empirical studies have attempted to identify causal loci using rare variant association methods, few have reported novel associations. Some authors have interpreted this to mean that rare variants contribute little to heritability, but our results show that an alternative explanation is that rare variant tests have less power than previously estimated.
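The abstract's point that rare alleles need much larger effect sizes to matter can be illustrated with the standard single-locus expression for additive variance under Hardy-Weinberg equilibrium, 2p(1-p)β². The sketch below is not the authors' evolutionary model; the function names, allele frequencies and effect size are assumptions chosen only for illustration.

```python
import math

def additive_variance(freq, effect):
    """Additive variance contributed by one biallelic locus under
    Hardy-Weinberg equilibrium: 2 * p * (1 - p) * beta^2."""
    return 2.0 * freq * (1.0 - freq) * effect ** 2

def effect_needed(freq, target_variance):
    """Effect size a variant at frequency `freq` needs in order to
    contribute `target_variance` to the additive variance."""
    return math.sqrt(target_variance / (2.0 * freq * (1.0 - freq)))

# Hypothetical values, not taken from the preprint.
common_freq, rare_freq = 0.3, 0.001
common_effect = 0.1

# Variance explained by the common variant, then the effect size a rare
# variant would need to explain the same amount.
target = additive_variance(common_freq, common_effect)
rare_effect = effect_needed(rare_freq, target)
ratio = rare_effect / common_effect
print(f"rare variant needs an effect {ratio:.1f}x larger")
```

With these assumed frequencies, the rare variant needs an effect more than ten times larger than the common one to explain the same variance, which is why some force such as selection is usually invoked to couple large effects to low frequency.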

Most viewed on Haldane’s Sieve: February 2015

The most viewed posts on Haldane’s Sieve this month were:

Association mapping reveals the role of mutation-selection balance in the maintenance of genomic variation for gene expression.

Emily Josephs , Young Wha Lee , John R. Stinchcombe , Stephen I Wright
doi: http://dx.doi.org/10.1101/015743

The evolutionary forces that maintain genetic variation for quantitative traits within populations remain unknown. One hypothesis suggests that variation is maintained by a balance between new mutations and their removal by selection and drift. Theory predicts that this mutation-selection balance will result in an excess of low-frequency variants and a negative correlation between minor allele frequency and selection coefficients. Here, we test these predictions using the genetic loci associated with total expression variation (‘eQTLs’) and allele-specific expression variation (‘aseQTLs’) mapped within a single population of the plant Capsella grandiflora. In addition to finding eQTLs and aseQTLs for a large fraction of genes, we show that alleles at these loci are rarer than expected and exhibit a negative correlation between effect size and frequency. Overall, our results show that mutation-selection balance is the dominant contributor to genomic variation for expression within a single, outcrossing population.

Extensive de novo mutation rate variation between individuals and across the genome of Chlamydomonas reinhardtii

Rob W Ness , Andrew D Morgan , Radhakrishnan B Vasanthakrishnan , Nick Colegrave , Peter D Keightley
doi: http://dx.doi.org/10.1101/015693

Describing the process of spontaneous mutation is fundamental for understanding the genetic basis of disease, the threat posed by declining population size in conservation biology, and much of evolutionary biology. However, directly studying spontaneous mutation is difficult because of the rarity of de novo mutations. Mutation accumulation (MA) experiments overcome this by allowing mutations to build up over many generations in the near absence of natural selection. In this study, we sequenced the genomes of 85 MA lines derived from six genetically diverse wild strains of the green alga Chlamydomonas reinhardtii. We identified 6,843 spontaneous mutations, more than any other study of spontaneous mutation. We observed seven-fold variation in the mutation rate among strains and found that mutator genotypes arose in some replicates, increasing the mutation rate dramatically. We also found evidence for fine-scale heterogeneity in the mutation rate, driven largely by the sequence flanking mutated sites, and by clusters of multiple mutations at closely linked sites. There was little evidence, however, for mutation rate heterogeneity between chromosomes or over large genomic regions of 200 kbp. Using logistic regression, we generated a predictive model of the mutability of sites based on their genomic properties, including local GC content, gene expression level and local sequence context. Our model accurately predicted the average mutation rate and natural levels of genetic diversity of sites across the genome. Notably, trinucleotides vary 17-fold in rate between the most mutable and least mutable sites. Our results uncover a rich heterogeneity in the process of spontaneous mutation both among individuals and across the genome.
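The dependence of mutability on flanking sequence can be made concrete by tallying how often each trinucleotide context is hit. The following is a minimal sketch, not the authors' logistic regression model: the function names, the toy sequence and the mutated positions are all hypothetical, and the output is the fraction of sites mutated per context rather than a per-generation rate.

```python
from collections import Counter

def trinucleotide_counts(sequence):
    """Count each trinucleotide context, centred on every interior site."""
    return Counter(sequence[i - 1:i + 2] for i in range(1, len(sequence) - 1))

def context_mutated_fraction(sequence, mutated_positions):
    """Fraction of sites in each trinucleotide context that carry a
    mutation. Positions are 0-based; edge sites are skipped because
    they lack a full flanking context."""
    sites = trinucleotide_counts(sequence)
    hits = Counter(
        sequence[p - 1:p + 2]
        for p in mutated_positions
        if 0 < p < len(sequence) - 1
    )
    return {context: hits[context] / n for context, n in sites.items()}

# Hypothetical toy data for illustration only.
sequence = "ACGACGACGT"
mutated_positions = [1, 4]

fractions = context_mutated_fraction(sequence, mutated_positions)
for context in sorted(fractions):
    print(context, round(fractions[context], 3))
```

Comparing the largest and smallest values of such a table across contexts is one simple way to express fold variation between the most and least mutable trinucleotides.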

Phen-Gen: Combining Phenotype and Genotype to Analyze Rare Disorders

Asif Javed , Saloni Agrawal , Pauline Ng
doi: http://dx.doi.org/10.1101/015727

We introduce Phen-Gen, a method that combines a patient's disease symptoms and sequencing data with prior domain knowledge to identify the causative gene(s) for rare disorders. Simulations reveal that the causal variant is ranked first in 88% of cases when it is coding, which is a 52% advantage over a genotype-only approach and outperforms existing methods by 13-58%. If disease etiology is unknown, the causal variant is assigned top rank in 71% of simulations.

Catch me if you can: Adaptation from standing genetic variation to a moving phenotypic optimum

Sebastian Matuszewski , Joachim Hermisson , Michael Kopp
doi: http://dx.doi.org/10.1101/015685

Adaptation lies at the heart of Darwinian evolution. Accordingly, numerous studies have tried to provide a formal framework for the description of the adaptive process. Out of these, two complementary modelling approaches have emerged: While so-called adaptive-walk models consider adaptation from the successive fixation of de-novo mutations only, quantitative genetic models assume that adaptation proceeds exclusively from pre-existing standing genetic variation. The latter approach, however, has focused on short-term evolution of population means and variances rather than on the statistical properties of adaptive substitutions. Our aim is to combine these two approaches by describing the ecological and genetic factors that determine the genetic basis of adaptation from standing genetic variation in terms of the effect-size distribution of individual alleles. Specifically, we consider the evolution of a quantitative trait to a gradually changing environment. By means of analytical approximations, we derive the distribution of adaptive substitutions from standing genetic variation, that is, the distribution of the phenotypic effects of those alleles from the standing variation that become fixed during adaptation. Our results are checked against individual-based simulations. We find that, compared to adaptation from de-novo mutations, (i) adaptation from standing variation proceeds by the fixation of more alleles of small effect; (ii) populations that adapt from standing genetic variation can traverse larger distances in phenotype space and, thus, have a higher potential for adaptation if the rate of environmental change is fast rather than slow.

Pervasive adaptation of gene expression in Drosophila

Armita Nourmohammad, Joachim Rambeau, Torsten Held, Johannes Berg, Michael Lassig
(Submitted on 23 Feb 2015)

Gene expression levels are important molecular quantitative traits that link genotypes to molecular functions and fitness. In Drosophila, population-genetic studies in recent years have revealed substantial adaptive evolution at the genomic level. However, the evolutionary modes of gene expression have remained controversial. Here we present evidence that adaptation dominates the evolution of gene expression levels in flies. We show that 64% of the observed expression divergence across seven Drosophila species is adaptive change driven by directional selection. Our results are derived from the variation of expression within species and the time-resolved divergence across a family of related species, using a new inference method for selection. We identify functional classes of adaptively regulated genes, as well as sex-specific adaptation occurring predominantly in males. Our analysis opens a new avenue to map system-wide selection on molecular quantitative traits independently of their genetic basis.

Calibrating the Human Mutation Rate via Ancestral Recombination Density in Diploid Genomes

Mark Lipson , Po-Ru Loh , Sriram Sankararaman , Nick Patterson , Bonnie Berger , David Reich
doi: http://dx.doi.org/10.1101/015560

The human mutation rate is an essential parameter for studying the evolution of our species, interpreting present-day genetic variation, and understanding the incidence of genetic disease. Nevertheless, our current estimates of the rate are uncertain. Classical methods based on sequence divergence have yielded significantly larger values than more recent approaches based on counting de novo mutations in family pedigrees. Here, we propose a new method that uses the fine-scale human recombination map to calibrate the rate of accumulation of mutations. By comparing local heterozygosity levels in diploid genomes to the genetic distance scale over which these levels change, we are able to estimate a long-term mutation rate averaged over hundreds or thousands of generations. We infer a rate of (1.65 ± 0.10) × 10^(-8) mutations per base per generation, which falls in between phylogenetic and pedigree-based estimates, and we suggest possible mechanisms to reconcile our estimate with previous studies. Our results support intermediate-age divergences among human populations and between humans and other great apes.

Differential Evolution Approach to Detect Recent Admixture

Konstantin Kozlov , Dmitry Chebotarov , Mehedi Hassan , Petr Triska , Martin Triska , Pavel Flegontov , Tatiana V Tatarinova
doi: http://dx.doi.org/10.1101/015446

The genetic structure of human populations is extraordinarily complex and of fundamental importance to studies of anthropology, evolution, and medicine. As increasingly many individuals are of mixed origin, there is an unmet need for tools that can infer multiple origins. Misclassification of such individuals can lead to incorrect and costly misinterpretations of genomic data, primarily in disease studies and drug trials. We present reAdmix, an advanced tool to infer ancestry that can identify the biogeographic origins of highly mixed individuals. reAdmix can incorporate an individual's knowledge of their ancestors (e.g. having some ancestors from Turkey or a Scottish grandmother). reAdmix is an online tool available at http://chcb.saban-chla.usc.edu/reAdmix/.

Maximum Likelihood Estimation and Phylogenetic Tree based Backward Elimination for reconstructing Viral Haplotypes in a Population

Raunaq Malhotra, Steven Wu, Allen Rodrigo, Mary Poss, Raj Acharya
(Submitted on 14 Feb 2015)

A viral population can contain a large and diverse collection of viral haplotypes that play important roles in maintaining the viral population. We present an algorithm for reconstructing viral haplotypes in a population from paired-end Next Generation Sequencing (NGS) data. We propose a novel polynomial-time, dynamic-programming-based approximation algorithm for generating top paths through each node in a De Bruijn graph constructed from the paired-end NGS data. We also propose two novel formulations for obtaining an optimal set of viral haplotypes for the population using the paths generated by the approximation algorithm. The first formulation obtains a maximum likelihood estimate of the viral population given the observed paired-end reads. The second formulation obtains a minimal set of viral haplotypes retaining the phylogenetic information in the population. We evaluate our algorithm on simulated datasets varying in the mutation rate and genome length of the viral haplotypes. The results of our method are compared to other methods for viral haplotype estimation. While all the methods overestimate the number of viral haplotypes in a population, the two proposed optimality formulations correctly estimate the exact sequence of all the haplotypes in most datasets, and recover the overall diversity of the population in all datasets. The haplotypes recovered from popular methods are biased toward the reference sequence used for mapping of reads, while the proposed formulations are reference-free and retain the overall diversity in the population.
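For readers unfamiliar with the data structure, a De Bruijn graph can be built from reads in a few lines. This is a minimal sketch of the graph construction only, under the assumption of error-free single-end reads; it does not use paired-end information and does not implement the authors' path-generation algorithm or either optimality formulation. The function name and the toy reads are hypothetical.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a De Bruijn graph from reads. Nodes are (k-1)-mers; every
    k-mer in a read adds an edge from its prefix to its suffix, and the
    edge weight counts how many times that k-mer was observed."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph

# Hypothetical toy reads for illustration only.
reads = ["ACGTAC", "CGTACG"]
graph = de_bruijn_graph(reads, k=3)

for node in sorted(graph):
    print(node, dict(graph[node]))
```

Edge weights act as a rough coverage signal: paths through heavily supported edges are the natural candidates when ranking paths through a node.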