Deleterious passengers in adapting populations

Benjamin H Good, Michael M Desai
Subjects: Populations and Evolution (q-bio.PE)

Most new mutations are deleterious and are eventually eliminated by natural selection. But in an adapting population, the rapid amplification of beneficial mutations can hinder the removal of deleterious variants in nearby regions of the genome, altering the patterns of sequence evolution. Here, we analyze the interactions between beneficial “driver” mutations and linked deleterious “passengers” during the course of adaptation. We derive analytical expressions for the substitution rate of a deleterious mutation as a function of its fitness cost, as well as the reduction in the beneficial substitution rate due to the genetic load of the passengers. We find that the fate of each deleterious mutation varies dramatically with the rate and spectrum of beneficial mutations, with a non-monotonic dependence on both the population size and the rate of adaptation. By quantifying this dependence, our results allow us to estimate which deleterious mutations are likely to fix, and how many of these mutations must arise before the progress of adaptation is significantly reduced.
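For orientation, the classical single-locus reference point is Kimura's diffusion approximation for the fixation probability of a new mutation with selection coefficient s. The sketch below assumes a diploid Wright-Fisher population with N_e = N and, crucially, no linked beneficial drivers at all. It is not the authors' passenger result, only the unlinked baseline against which hitchhiking of deleterious passengers can be compared; the function name `fixation_probability` is hypothetical.

```python
import math

def fixation_probability(s: float, N: int) -> float:
    """Kimura's approximation u = (1 - e^{-2s}) / (1 - e^{-4Ns}) for a new
    mutation at initial frequency 1/(2N) in a diploid population (N_e = N).

    Assumption: no linkage to other selected sites (no driver mutations).
    s < 0 describes a deleterious passenger.
    """
    if s == 0.0:
        return 1.0 / (2 * N)  # neutral limit
    x = 4.0 * N * s
    if x < -700.0:
        # e^{-x} would overflow; use the asymptotic form (e^{-2s} - 1) e^{x}
        return math.expm1(-2.0 * s) * math.exp(x)
    # expm1 keeps precision when s is small
    return math.expm1(-2.0 * s) / math.expm1(-x)
```

Without linked drivers, mutations with N|s| well above 1 essentially never fix; the abstract's point is that linked sweeps change this picture.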

Locus architecture affects mRNA expression levels in Drosophila embryos

Tara Lydiard-Martin, Meghan Bragdon, Kelly B Eckenrode, Zeba Wunderlich, Angela H DePace

Structural variation in the genome is common due to insertions, deletions, duplications and rearrangements. However, little is known about the ways structural variants impact gene expression. Developmental genes are controlled by multiple regulatory sequence elements scattered over thousands of bases; developmental loci are therefore a good model to test the functional impact of structural variation on gene expression. Here, we measured the effect of rearranging two developmental enhancers from the even-skipped (eve) locus in Drosophila melanogaster blastoderm embryos. We systematically varied orientation, order, and spacing of the enhancers in transgenic reporter constructs and measured expression quantitatively at single cell resolution in whole embryos to detect changes in both level and position of expression. We found that the position of expression was robust to changes in locus organization, but levels of expression were highly sensitive to the spacing between enhancers and order relative to the promoter. Our data demonstrate that changes in locus architecture can dramatically impact levels of gene expression. To quantitatively predict gene expression from sequence, we must therefore consider how information is integrated both within enhancers and across gene loci.

RNA-seq gene profiling – a systematic empirical comparison

Nuno A Fonseca, John A Marioni, Alvis Brazma

Accurately quantifying gene expression levels is a key goal of experiments using RNA-sequencing to assay the transcriptome. This typically requires aligning the short reads generated to the genome or transcriptome before quantifying expression of pre-defined sets of genes. Differences in the alignment/quantification tools can have a major effect upon the expression levels found, with important consequences for biological interpretation. Here we address two main questions: do different analysis pipelines affect the gene expression levels inferred from RNA-seq data, and how close are the inferred expression levels to the “true” expression levels? We evaluate fifty gene profiling pipelines on experimental and simulated data sets with different characteristics (e.g., read length and sequencing depth). Because the “ground truth” is unknown in real RNA-seq data sets, we used simulated data to assess the differences between the true expression levels and those reconstructed by the analysis pipelines. Although this approach does not account for all known biases present in RNA-seq data, it still allows us to assess the accuracy of the gene expression values inferred by different analysis pipelines. The results show that i) overall there is a high correlation between the expression levels inferred by the best pipelines and the true quantification values; ii) the error in the estimated gene expression values can vary considerably across genes; and iii) a small set of genes has expression estimates with consistently high error (across data sets and methods). Finally, although the mapping software is important, the quantification method makes a greater difference to the results.
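The comparison against simulated truth can be sketched as follows. This is a minimal illustration under assumed choices, not the authors' evaluation code: expression is compared on a log2(x + 1) scale, agreement is summarized by a Pearson correlation, per-gene error is the absolute log2 difference, and a gene is flagged as "consistently high error" if that error exceeds a hypothetical threshold in every pipeline. All names and the example values are illustrative.

```python
import math

def _log_expr(values, pseudocount=1.0):
    # log2(x + 1) damps the dominance of very highly expressed genes
    return [math.log2(v + pseudocount) for v in values]

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def evaluate_pipeline(true_expr, est_expr):
    """Return (overall correlation, {gene: absolute log2 error})."""
    genes = sorted(true_expr)
    t = _log_expr(true_expr[g] for g in genes)
    e = _log_expr(est_expr.get(g, 0.0) for g in genes)  # missing gene -> 0
    errors = {g: abs(a - b) for g, a, b in zip(genes, t, e)}
    return _pearson(t, e), errors

def consistently_bad_genes(true_expr, pipelines, threshold=1.0):
    """Genes whose error exceeds `threshold` in every pipeline."""
    bad = None
    for est in pipelines.values():
        _, errors = evaluate_pipeline(true_expr, est)
        flagged = {g for g, err in errors.items() if err > threshold}
        bad = flagged if bad is None else bad & flagged
    return bad or set()

# Hypothetical toy data: two pipelines, four genes
truth = {"g1": 100, "g2": 50, "g3": 0, "g4": 10}
runs = {"A": {"g1": 110, "g2": 45, "g3": 0, "g4": 40},
        "B": {"g1": 90, "g2": 55, "g3": 1, "g4": 35}}
```

Here `consistently_bad_genes(truth, runs)` flags only `g4`, which both toy pipelines overestimate by more than twofold.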

qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots

Stephen D. Turner

Summary: Genome-wide association studies (GWAS) have identified thousands of human trait-associated single nucleotide polymorphisms. Here, I describe a freely available R package for visualizing GWAS results using Q-Q and Manhattan plots. The qqman package enables the flexible creation of Manhattan plots, both genome-wide and for single chromosomes, with optional highlighting of SNPs of interest. Availability: qqman is released under the GNU General Public License, and is freely available on the Comprehensive R Archive Network (http://cran.r-project.org/package=qqman). The source code is available on GitHub (https://github.com/stephenturner/qqman).
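qqman itself is an R package (its `qq()` and `manhattan()` functions do the plotting). As a language-neutral illustration of the coordinates behind such plots, the Python sketch below computes expected versus observed -log10(p) pairs for a Q-Q plot under a uniform null, using (i - 0.5)/n as the expected quantiles (an assumed, common choice), and cumulative x positions that lay chromosomes end to end for a Manhattan plot. The helpers `qq_points` and `manhattan_x` are hypothetical; this is not qqman's source.

```python
import math

def qq_points(pvalues):
    """(expected, observed) -log10 p pairs, most significant first."""
    n = len(pvalues)
    observed = sorted((-math.log10(p) for p in pvalues), reverse=True)
    # Uniform-null quantiles (i - 0.5)/n; i = 1 gives the largest -log10 value
    expected = [-math.log10((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, observed))

def manhattan_x(snps):
    """snps: list of (chrom, bp, p). Returns (x, -log10 p, chrom) per SNP,
    offsetting each chromosome by the total length of those before it."""
    max_bp = {}
    for chrom, bp, _ in snps:
        max_bp[chrom] = max(bp, max_bp.get(chrom, 0))
    offsets, running = {}, 0
    for chrom in sorted(max_bp):
        offsets[chrom] = running
        running += max_bp[chrom]
    return [(offsets[c] + bp, -math.log10(p), c) for c, bp, p in snps]
```

Points lying on the diagonal of the Q-Q plot indicate p-values consistent with the null; an early upward departure suggests inflation or true signal.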

Cosi2: An efficient simulator of exact and approximate coalescent with selection

Ilya Shlyakhter, Pardis C. Sabeti, Stephen F. Schaffner

Motivation: Efficient simulation of population genetic samples under a given demographic model is a prerequisite for many analyses. Coalescent theory provides an efficient framework for such simulations, but simulating longer regions and higher recombination rates remains challenging. Simulators based on a Markovian approximation to the coalescent scale well, but do not support simulation of selection. Gene conversion is not supported by any published coalescent simulator that supports selection. Results: We describe cosi2, an efficient simulator that supports both exact and approximate coalescent simulation with positive selection. cosi2 improves on the speed of existing exact simulators, and permits further speedup in approximate mode while retaining support for selection. cosi2 supports a wide range of demographic scenarios including recombination hot spots, gene conversion, population size changes, population structure and migration. cosi2 implements coalescent machinery efficiently by tracking only a small subset of the Ancestral Recombination Graph, sampling only relevant recombination events, and using augmented skip lists to represent tracked genetic segments. To preserve support for selection in approximate mode, the Markov approximation is implemented not by moving along the chromosome but by performing a standard backwards-in-time coalescent simulation while restricting coalescence to node pairs with overlapping or near-overlapping genetic material. We describe the algorithms used by cosi2 and present comparisons with existing selection simulators.
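The "standard backwards-in-time coalescent simulation" that cosi2 builds on can be illustrated in its simplest form: the Kingman coalescent for a single non-recombining locus under neutrality and constant size. With k lineages, the waiting time to the next merger is exponential with rate k(k - 1)/2 (time in units of 2N generations), and the merging pair is chosen uniformly. This sketch omits recombination, gene conversion, selection, demography, and the skip-list machinery entirely; the function name is hypothetical.

```python
import random

def simulate_genealogy(n, rng):
    """Kingman coalescent for n sampled lineages.

    Returns (topology as nested parentheses, TMRCA in units of 2N generations).
    Assumptions: one locus, no recombination, neutral, constant population size.
    """
    lineages = [str(i) for i in range(n)]  # leaf labels
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time to next merger
        # pick a uniformly random pair; pop the larger index first
        i, j = sorted(rng.sample(range(k), 2), reverse=True)
        a, b = lineages.pop(i), lineages.pop(j)
        lineages.append(f"({a},{b})")
    return lineages[0], t
```

Averaged over many replicates, the TMRCA should approach its known expectation 2(1 - 1/n), which is a convenient sanity check for any coalescent simulator.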

Properties of selected mutations and genotypic landscapes under Fisher’s Geometric Model

François Blanquart, Guillaume Achaz, Thomas Bataillon, Olivier Tenaillon
(Submitted on 14 May 2014)

The fitness landscape – the mapping between genotypes and fitness – determines properties of the process of adaptation. Several small genetic fitness landscapes have recently been built by selecting a handful of beneficial mutations and measuring the fitness of all combinations of these mutations. Here we generate several testable predictions for the properties of these landscapes under Fisher’s geometric model of adaptation (FGMA). When the ancestral genotype is far from the fitness optimum, we analytically compute the fitness effects of beneficial mutations and their epistatic interactions. We show that epistasis may be negative or positive on average, depending on the distance of the ancestral genotype from the optimum and on whether mutations were selected independently or co-selected in an adaptive walk. Using simulations, we show that genetic landscapes built from FGMA are very close to an additive landscape when the ancestral strain is far from the optimum. However, when the ancestral strain is close to the optimum, a large diversity of landscapes with substantial ruggedness and sign epistasis emerges. Strikingly, landscapes built from different realizations of stochastic adaptive walks under exactly the same conditions were highly variable, suggesting that several realizations of small genetic landscapes are needed to gain information about the underlying architecture of the global adaptive landscape.
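Building a small genetic landscape under a geometric model can be sketched as follows. The sketch assumes a Gaussian fitness function with log w(z) = -||z||^2 / 2 (optimum at the origin) and mutations that add phenotypic displacement vectors; the authors' exact fitness function and mutation distribution may differ. It enumerates all 2^L combinations of L mutations and computes pairwise epistasis on the log-fitness scale. All function names are hypothetical.

```python
import itertools

def log_fitness(z):
    # Assumed Gaussian FGM: log w(z) = -||z||^2 / 2, optimum at the origin
    return -0.5 * sum(x * x for x in z)

def build_landscape(z0, mutations):
    """Map every 0/1 genotype over the given mutations to its log-fitness.

    z0: ancestral phenotype; mutations: phenotypic displacement vectors,
    assumed to combine additively in phenotype space.
    """
    landscape = {}
    for genotype in itertools.product((0, 1), repeat=len(mutations)):
        z = list(z0)
        for present, dz in zip(genotype, mutations):
            if present:
                z = [a + b for a, b in zip(z, dz)]
        landscape[genotype] = log_fitness(z)
    return landscape

def pairwise_epistasis(landscape, i, j):
    """e_ij = log w(ij) - log w(i) - log w(j) + log w(ancestor)."""
    L = len(next(iter(landscape)))
    def lw(*on):
        g = [0] * L
        for k in on:
            g[k] = 1
        return landscape[tuple(g)]
    return lw(i, j) - lw(i) - lw(j) + lw()
```

Under this quadratic log-fitness, pairwise epistasis equals exactly -(dz_i . dz_j), independent of the ancestral phenotype. Any dependence of average epistasis on the distance to the optimum therefore enters through which mutations are beneficial and get selected, for example two selected mutations that both point toward the optimum have a positive dot product and hence negative epistasis.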

When genomes collide: multiple modes of germline misregulation in a dysgenic syndrome of Drosophila virilis

Mauricio A. Galdos, Alexandra A. Erwin, Michelle L. Wickersheim, Chris C. Harrison, Kendra D. Marr, Justin Blumenstiel

In sexually reproducing species, the union of gametes that are not closely related can result in genomic incompatibility. Hybrid dysgenic syndromes represent a form of genomic incompatibility that can arise when transposable element (TE) abundance differs between the two parents. When TEs lacking in the female parent are transmitted paternally, the absence of corresponding silencing small RNAs (piRNAs) transmitted through the female germline can lead to TE mobilization in the progeny. The epigenetic nature of this phenomenon is demonstrated by the fact that genetically identical females from the reciprocal cross are normal. Here we show that in the hybrid dysgenic syndrome of Drosophila virilis, an excess of paternally inherited TE families not only leads to increased expression of these TEs but also coincides with derepression of TEs present in equal abundance in both parents. Moreover, TE derepression is stable as flies age and is associated with piRNA biogenesis defects for only some TEs. At the same time, TE activation is associated with a genome-wide shift in the distribution of endogenous gene expression and an increase in the abundance of off-target genic piRNAs. To identify the regions of the maternal genome that most protect against dysgenesis, we performed an F3 backcross analysis. We find that pericentric regions play a dominant role in maternal protection. This F3 backcross approach additionally allowed us to clarify the properties of genic paramutation in D. virilis. Overall, the results support a model in which early germline events in dysgenesis establish a chronic, stable state of misexpression that is maintained through adulthood. Such early germline events mediated by parent-of-origin effects may be important in determining patterns of gene expression in natural populations.

Quadri-allele frequency spectrum in a coalescent topology for mutations in non-constant population size

Arka Bhattacharya
(Submitted on 11 May 2014)

The sample frequency spectrum of a segregating site is the probability distribution of allele counts in a sample drawn from a genetic locus, conditional on the sample containing more than one distinct allelic type. We present a model for analyzing the quadri-allele frequency spectrum, in which an ancestral population splits into three populations at a given divergence time, and mutations on the branches of the coalescent tree give rise to three distinct derived alleles that can be observed in the present generation alongside the ancestral allele. We analyze the model for non-constant population size, assuming a given number of extant lineages at the divergence time and no migration between the populations.
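As a point of reference for the conditioning described above, the classical bi-allelic case has a closed form: under the neutral infinite-sites model with constant population size and a single population, the expected number of sites with i derived copies in a sample of n is proportional to 1/i, so conditioning on the site being segregating gives P(i) = (1/i) / a_n with a_n = sum_{j=1}^{n-1} 1/j. The sketch below computes only this textbook baseline; it does not implement the paper's quadri-allele, three-population, variable-size model. The function name is hypothetical.

```python
from fractions import Fraction

def neutral_sfs(n):
    """Bi-allelic SFS conditioned on a segregating site (1 <= i <= n - 1).

    Assumptions: neutral infinite-sites model, constant size, one population.
    Exact rational arithmetic so the probabilities sum to exactly 1.
    """
    a_n = sum(Fraction(1, j) for j in range(1, n))
    return {i: Fraction(1, i) / a_n for i in range(1, n)}
```

For a sample of n = 4, this gives P(1) = 6/11, P(2) = 3/11 and P(3) = 2/11; singletons dominate, as expected under neutrality.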

Effective Genetic Risk Prediction Using Mixed Models

David Golan, Saharon Rosset
(Submitted on 12 May 2014)

To date, efforts to produce high-quality polygenic risk scores from genome-wide studies of common disease have focused on estimating and aggregating the effects of multiple SNPs. Here we propose a novel statistical approach for genetic risk prediction, based on random and mixed effects models. Our approach (termed GeRSI) circumvents the need to estimate the effect sizes of numerous SNPs by treating these effects as random, producing predictions that are consistently superior to the current state of the art, as we demonstrate in extensive simulations. When applying GeRSI to seven phenotypes from the WTCCC study, we confirm that the use of random effects is most beneficial for diseases that are known to be highly polygenic: hypertension (HT) and bipolar disorder (BD). For HT, there are no significant associations in the WTCCC data. The best existing model yields an AUC of 54%, while GeRSI improves it to 59%. For BD, using GeRSI improves the AUC from 55% to 62%. For individuals ranked in the top 10% of BD risk predictions, using GeRSI substantially increases the BD relative risk from 1.4 to 2.5.
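The core idea of treating SNP effects as random rather than estimating each one can be illustrated with a generic best linear unbiased predictor (BLUP), which is equivalent to ridge regression. This is a swapped-in textbook technique, not GeRSI itself (GeRSI's handling of case-control data is not reproduced). The sketch assumes standardized genotypes, a quantitative phenotype, effects beta ~ N(0, h2/m) and residual variance 1 - h2, with h2 supplied by the user. It also includes an AUC helper based on the Mann-Whitney statistic. Function names are hypothetical.

```python
import numpy as np

def blup_predict(X_train, y_train, X_test, h2):
    """Predict phenotypes treating all m SNP effects as random.

    Assumes standardized genotype columns and beta ~ N(0, h2/m),
    residual variance 1 - h2. Then E[beta | y] = X^T (X X^T + lam I)^{-1} y
    with lam = sigma_e^2 / sigma_b^2 = m (1 - h2) / h2.
    """
    n, m = X_train.shape
    lam = m * (1.0 - h2) / h2
    mu = y_train.mean()
    K = X_train @ X_train.T  # n x n genetic similarity (kernel form)
    alpha = np.linalg.solve(K + lam * np.eye(n), y_train - mu)
    beta = X_train.T @ alpha  # shrunken, never individually "tested" effects
    return mu + X_test @ beta

def auc(scores, labels):
    """Fraction of (case, control) pairs ranked correctly; ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

The kernel form solves an n x n system, which is cheaper than the m x m ridge system when there are far more SNPs than individuals.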

diCal-IBD: demography-aware inference of identity-by-descent tracts in unrelated individuals

Paula Tataru, Jasmine A. Nirody, Yun S. Song

Summary: We present a tool, diCal-IBD, for detecting identity-by-descent (IBD) tracts between pairs of genomic sequences. Our method builds on a recent demographic inference method based on the coalescent with recombination, and is able to incorporate demographic information as a prior. A simulation study shows that diCal-IBD has significantly higher recall and precision than existing IBD detection methods, while retaining reasonable accuracy for IBD tracts as small as 0.1 cM. Availability: https://sourceforge.net/projects/dical-ibd/ Contact: yss@eecs.berkeley.edu
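Recall and precision for IBD tract detection can be computed in several ways; one simple length-based version is sketched below, assuming tracts are (start, end) intervals in cM and that tracts within each list do not overlap one another. Recall is the fraction of true IBD length that is covered by called tracts, and precision is the fraction of called length that is truly IBD. The paper's exact definitions may differ (for example, per-tract rather than per-length), and the function names are hypothetical.

```python
def total_length(tracts):
    return sum(end - start for start, end in tracts)

def overlap(true_tracts, called_tracts):
    # Assumes no overlaps within each list, so pairwise overlaps can be summed
    return sum(max(0.0, min(e1, e2) - max(s1, s2))
               for s1, e1 in true_tracts for s2, e2 in called_tracts)

def recall_precision(true_tracts, called_tracts):
    """Length-based recall and precision of called IBD tracts (cM intervals)."""
    ov = overlap(true_tracts, called_tracts)
    t, c = total_length(true_tracts), total_length(called_tracts)
    recall = ov / t if t > 0 else 0.0
    precision = ov / c if c > 0 else 0.0
    return recall, precision
```

With true tracts [(0.0, 1.0), (2.0, 2.5)] and calls [(0.5, 1.5), (2.0, 2.5)], 1.0 cM of the 1.5 cM is shared, so both recall and precision are 2/3.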