The Genetic Architecture of Gene Expression Levels in Wild Baboons

Jenny Tung, Xiang Zhou, Susan C Alberts, Matthew Stephens, Yoav Gilad

Gene expression variation is well documented in human populations and its genetic architecture has been extensively explored. However, we still know little about the genetic architecture of gene expression variation in other species, particularly our closest living relatives, the nonhuman primates. To address this gap, we performed an RNA sequencing (RNA-seq)-based study of 63 wild baboons, members of the intensively studied Amboseli baboon population in Kenya. Our study design allowed us to measure gene expression levels and identify genetic variants using the same data set, enabling us to perform complementary mapping of putative cis-acting expression quantitative trait loci (eQTL) and measurements of allele-specific expression (ASE) levels. We discovered substantial evidence for genetic effects on gene expression levels in this population. Surprisingly, we found more power to detect individual eQTL in the baboons relative to a HapMap human data set of comparable size, probably as a result of greater genetic variation, enrichment of SNPs with high minor allele frequencies, and longer-range linkage disequilibrium in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes. Interestingly, genes with eQTL significantly overlapped between the baboon and human data sets, suggesting that some genes may tolerate more genetic perturbation than others, and that this property may be conserved across species. Finally, we used a Bayesian sparse linear mixed model to partition genetic, demographic, and early environmental contributions to variation in gene expression levels. We found a strong genetic contribution to gene expression levels for almost all genes, while individual demographic and environmental effects tended to be more modest. 
Together, our results establish the feasibility of eQTL mapping using RNA-seq data alone, and act as an important first step towards understanding the genetic architecture of gene expression variation in nonhuman primates.

Sampling through time and phylodynamic inference with coalescent and birth-death models

Erik M. Volz, Simon DW Frost
(Submitted on 28 Aug 2014)

Many population genetic models have been developed for the purpose of inferring population size and growth rates from random samples of genetic data. We examine two popular approaches to this problem, the coalescent and the birth-death-sampling model, in the context of estimating population size and birth rates in a population growing exponentially according to the birth-death branching process. For sequences sampled at a single time, we find that the coalescent and the birth-death-sampling model give virtually indistinguishable results in terms of the growth rates and fraction of the population sampled, even when sampling from a small population. For sequences sampled at multiple time points, we find that the birth-death model estimators are subject to large bias if the sampling process is misspecified. Since birth-death-sampling models incorporate a model of the sampling process, we show how much of the statistical power of birth-death-sampling models arises from the sequence of sample times and not from the genealogical tree. This motivates the development of a new coalescent estimator, which is augmented with a model of the known sampling process and is potentially more precise than the coalescent that does not use sample time information.

C. elegans harbors pervasive cryptic genetic variation for embryogenesis

Annalise Paaby, Amelia White, David Riccardi, Kristin Gunsalus, Fabio Piano, Matthew Rockman

Conditionally functional mutations are an important class of natural genetic variation, yet little is known about their prevalence in natural populations or their contribution to disease risk. Here, we describe a vast reserve of cryptic genetic variation, alleles that are normally silent but which affect phenotype when the function of other genes is perturbed, in the gene networks of C. elegans embryogenesis. We find evidence that cryptic-effect loci are ubiquitous and segregate at intermediate frequencies in the wild. The cryptic alleles demonstrate low developmental pleiotropy, in that specific, rather than general, perturbations are required to reveal them. Our findings underscore the importance of genetic background in characterizing gene function and provide a model for the expression of conditionally functional effects that may be fundamental in basic mechanisms of trait evolution and the genetic basis of disease susceptibility.

Determination of Nonlinear Genetic Architecture using Compressed Sensing

Chiu Man Ho, Stephen D.H. Hsu
(Submitted on 27 Aug 2014)

We introduce a statistical method that can reconstruct nonlinear genetic models (i.e., including epistasis, or gene-gene interactions) from phenotype-genotype (GWAS) data. The computational and data resource requirements are similar to those necessary for reconstruction of linear genetic models (or identification of gene-trait associations), assuming a condition of generalized sparsity, which limits the total number of gene-gene interactions. An example of a sparse nonlinear model is one in which a typical locus interacts with several or even many others, but only a small subset of all possible interactions exist. It seems plausible that most genetic architectures fall in this category. Our method uses a generalization of compressed sensing (L1-penalized regression) applied to nonlinear functions of the sensing matrix. We give theoretical arguments suggesting that the method is nearly optimal in performance, and demonstrate its effectiveness on broad classes of nonlinear genetic models using both real and simulated human genomes.
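To make the idea of "L1-penalized regression applied to nonlinear functions of the sensing matrix" concrete, here is a minimal sketch. It is not the authors' algorithm or code: it simply augments a simulated genotype matrix with all pairwise products and fits a plain LASSO by cyclic coordinate descent. The sample size, number of loci, penalty `lam`, effect sizes, and the helper names `expand_with_pairs` and `lasso_cd` are all assumptions made for illustration.

```python
import itertools
import numpy as np

def expand_with_pairs(G):
    """Append all pairwise products g_i * g_j (i < j) to the genotype matrix."""
    n, p = G.shape
    pairs = list(itertools.combinations(range(p), 2))
    inter = np.column_stack([G[:, i] * G[:, j] for i, j in pairs])
    labels = [(i,) for i in range(p)] + pairs  # (i,) = main effect, (i, j) = interaction
    return np.hstack([G, inter]), labels

def soft_threshold(x, t):
    """Shrink x towards zero by t (the proximal step of the L1 penalty)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - Xb
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]  # remove feature j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]  # add it back with the updated weight
    return b

# Simulated data (hypothetical sizes): 500 individuals, 12 loci coded 0/1/2.
rng = np.random.default_rng(0)
n, p = 500, 12
G = rng.binomial(2, 0.5, size=(n, p)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)  # centre first so products are not confounded with main effects
X, labels = expand_with_pairs(G)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Sparse nonlinear truth: one main effect and one gene-gene interaction.
truth = {(2,): 1.0, (3, 7): 1.5}
y = sum(w * X[:, labels.index(k)] for k, w in truth.items())
y = y + rng.normal(0.0, 0.1, size=n)
y = y - y.mean()

b = lasso_cd(X, y, lam=0.2)
selected = {labels[j]: round(float(b[j]), 2) for j in np.flatnonzero(np.abs(b) > 1e-8)}
print(selected)
```

The expanded matrix here has 12 main-effect columns and 66 interaction columns, yet the penalty should leave only a handful of nonzero weights. The recovered coefficients are shrunk towards zero by roughly `lam`, which is the usual price of the L1 penalty.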

Genomic and transcriptomic insights into the regulation of snake venom production

Adam D Hargreaves, Martin T Swain, Matthew J Hegarty, Darren W Logan, John F Mulley

The gene regulatory mechanisms underlying the rapid replenishment of snake venom following expenditure are currently unknown. Using a comparative transcriptomic approach we find that venomous and non-venomous species produce similar numbers of secreted products in their venom or salivary glands and that only one transcription factor (Tbx3) is expressed in venom glands but not salivary glands. We also find evidence for temporal variation in venom production. We have generated a draft genome sequence for the painted saw-scaled viper, Echis coloratus, and identified conserved transcription factor binding sites in the upstream regions of venom genes. We find binding sites to be conserved across members of the same gene family, but not between gene families, indicating that multiple gene regulatory networks are involved in venom production. Finally, we suggest that negative regulation may be important for rapid activation of the venom replenishment cycle.

Fixation in large populations: a continuous view of a discrete problem

Fabio A. C. C. Chalub, Max O. Souza
(Submitted on 27 Aug 2014)

We study fixation in large, but finite populations with two types, and dynamics governed by birth-death processes. By considering a restricted class of such processes, which includes most classical evolutionary processes, we derive a continuous approximation for the probability of fixation that is valid beyond the weak-selection (WS) limit. Indeed, in the derivation three regimes naturally appear: selection-driven, balanced, and quasi-neutral; the latter two require WS, while the first can appear with or without WS. From the continuous approximations, we then obtain asymptotic approximations for evolutions with at most one equilibrium, in the selection-driven regime, which does not preclude a weak-selection regime. As an application, we show that the fixation pattern for the Hawk and Dove game satisfies what we term the one-half law: if the Evolutionarily Stable Strategy (ESS) is outside a small interval around $1/2$, the fixation is of dominance type. We also show that outside of the weak-selection regime the dynamics of large populations can have very little resemblance to the infinite population case. In addition, we show results for the case of two equilibria. Finally, we present a continuous restatement of the definition of an ESS$_N$ strategy that is valid for large populations. We then present two applications of this restatement: we obtain a definition valid in the quasi-neutral regime that recovers the one-third law under linear fitness and, as a generalisation, we introduce the concept of critical-frequency.
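For readers who want a reference point for the discrete problem, the sketch below evaluates the standard exact fixation probability of a birth-death chain with absorbing states 0 and N. This is the textbook discrete formula, not the continuous approximation derived in the paper. The function names, the Moran-type payoff parameterisation, and the payoff values are illustrative assumptions.

```python
def fixation_probability(N, gamma, i=1):
    """Exact probability of reaching state N before state 0, starting from i.

    gamma(j) is the ratio T_j^- / T_j^+ of the down and up transition
    probabilities in state j, for j = 1, ..., N - 1. The result is
        (1 + sum_{k=1}^{i-1} prod_{j<=k} gamma(j))
      / (1 + sum_{k=1}^{N-1} prod_{j<=k} gamma(j)).
    """
    running_product = 1.0
    numerator = 1.0
    denominator = 1.0
    for k in range(1, N):
        running_product *= gamma(k)
        denominator += running_product
        if k < i:
            numerator += running_product
    return numerator / denominator

def moran_gamma(N, a, b, c, d):
    """gamma(j) for a frequency-dependent Moran process with payoff matrix
    [[a, b], [c, d]] and no self-interaction (an assumed parameterisation)."""
    def gamma(j):
        f = (a * (j - 1) + b * (N - j)) / (N - 1)  # fitness of the focal type
        g = (c * j + d * (N - j - 1)) / (N - 1)    # fitness of the resident type
        return g / f
    return gamma

N = 100
neutral = fixation_probability(N, lambda j: 1.0)

r = 1.05  # constant relative fitness of the mutant
advantageous = fixation_probability(N, lambda j: 1.0 / r)
closed_form = (1 - 1 / r) / (1 - r ** (-N))

# Hawk-Dove-like payoffs (illustrative values with an interior equilibrium).
hawk_dove = fixation_probability(N, moran_gamma(N, a=1.0, b=3.0, c=2.0, d=2.5))

print(neutral, advantageous, closed_form, hawk_dove)
```

The neutral case returns 1/N and the constant-fitness case matches the familiar closed form, which gives a quick sanity check before comparing any continuous approximation against the exact sum.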

Sexual dimorphism in epigenomic responses of stem cells to extreme fetal growth

Fabien Delahaye, Neil Ari Wijetunga, Hye J Heo, Jessica N Tozour, Yong Mei Zhao, John M Greally, Francine H Einstein

Extreme fetal growth is associated with increased susceptibility to a range of adult diseases through an unknown mechanism of cellular memory. We tested whether heritable epigenetic processes in long-lived CD34+ hematopoietic stem/progenitor cells (HSPCs) showed evidence for re-programming associated with the extremes of fetal growth. Here we show that both fetal growth restriction and over-growth are associated with global shifts towards DNA hypermethylation, targeting cis-regulatory elements in proximity to genes involved in glucose homeostasis and stem cell function. The response was sexually dimorphic: intrauterine growth restriction (IUGR) was associated with substantially greater epigenetic dysregulation in males, whereas large-for-gestational-age (LGA) growth predominantly affected females. The findings are consistent with extreme fetal growth interacting with variable fetal susceptibility to influence cellular aging and metabolic characteristics through epigenetic mechanisms, potentially generating biomarkers that could identify infants at higher risk for chronic disease later in life.

Author post: An amino acid polymorphism in the Drosophila insulin receptor demonstrates pleiotropic and adaptive function in life history traits

This next guest post is by Annalise Paaby on her paper: Paaby et al. “An amino acid polymorphism in the Drosophila insulin receptor demonstrates pleiotropic and adaptive function in life history traits” bioRxived here.

Find the alleles!
Organisms vary, even within populations, in ways that appear adaptive. We would very much like to identify the genetic elements that encode these phenotypic differences, but this is a challenging task. For polygenic traits, the tiny contributions of single loci can be near-impossible to detect in an experimental setting. In contrast, natural selection operates on a grand scale, with power to discriminate between alleles. We took advantage of the fact that Drosophila melanogaster are distributed across an extreme environmental gradient in order to identify a specific polymorphism that contributes to adaptive variation.
D. melanogaster live along the east coasts of North America and Australia. On both continents, flies in low-latitude, warm environments develop faster and are more fecund, while flies in high-latitude, cold environments live longer and are more resistant to most stresses.
Knocking out insulin signaling genes extends lifespan, increases stress tolerance, and reduces reproduction. Given these phenotypes, we wondered whether insulin signaling genes might vary in natural populations and influence life history. In a paper published a few years ago, we showed that alleles of a polymorphism in the Insulin-like Receptor (InR) showed clines in frequency in both North America and Australia. Since the populations were founded at different times from different source populations, the replicated pattern on separate continents is good evidence that the polymorphism is a target of selection.

What is this polymorphism?
The polymorphism we discovered is a complex indel that disrupts a region of glutamines and histidines in the first exon of InR. In our original survey, we found many segregating alleles, all differing in length by multiples of three nucleotides. However, two alleles comprise the majority. An allele we call InRshort is common at high latitudes, and InRlong, which is six nucleotides longer, is common at low latitudes. The alleles differ in four amino acids across a span of 16 residues.

The alleles affect signaling
In our current study, we show that InRshort and InRlong affect levels of insulin signaling. We took InRshort and InRlong flies from a single population in New York, replaced the X and second chromosomes, and randomized the genetic backgrounds of the third chromosome, on which InR resides. We measured levels of insulin signaling in test lines by performing qPCR on seven transcriptional targets in the pathway, all downstream of the receptor.
We found that for five of the seven targets (four of which were significant), signaling was highest in InRlong, lowest in InRshort, and intermediate in the heterozygote, suggesting that InRshort and InRlong act additively on signaling levels. The directionality of these results makes sense: reduction of insulin signaling is known to extend lifespan, increase stress tolerance and reduce reproductive success, and these are the phenotypes we see at high latitudes where InRshort is common.

Fluctuations over time
In our new study, we returned to the North American populations we evaluated five years prior. However, this time around we mapped 100-bp paired-end reads from pooled population samples. (These data relate to Alan Bergland’s larger exploration of spatial and temporal variation in D. melanogaster, described here on arXiv.) We called each of the discrete polymorphisms within the complex indel polymorphism (SNPs or small indels) individually. Some of those discrete polymorphisms distinguish between the InRshort and InRlong alleles, and they confirm that the clines persist in North America.
We reasoned that alleles prevalent in high-latitude, cold climates might be selected for in the winter, and alleles prevalent in low-latitude, warm climates might be selected for in the summer. We examined a Pennsylvania population at multiple timepoints over three years and saw dramatic fluctuations in allele frequency (changes of approximately 20%) for discrete polymorphisms associated with InRshort and InRlong. As predicted, the “winter” and “summer” alleles were those common at high and low latitudes, respectively.
However, the polymorphisms that showed the most dramatic fluctuations over seasonal time were not necessarily those with the strongest clines in frequency across geographical space. We suggest that aspects of demography and selection probably vary between seasonal and geographical environments, even in the face of apparently similar climatic pressures.

A question of pleiotropy
A longstanding question in the field of life history evolution is whether single alleles affect multiple traits at once (pleiotropy) or affect traits individually but reside near each other (linkage). The question itself arises from the observation, made many times over, that life history traits are typically correlated. For example, long-lived individuals often show reduced reproductive fitness. Longevity is also often positively correlated with the ability to tolerate stress. Do the same genetic variants encode multiple trait phenotypes?
We assayed our InRshort and InRlong test lines for multiple phenotypes: fecundity, development time, body size and allometry, body weight and lipid content, tolerance for multiple stresses, and lifespan. We used the test lines described above, a replicate set of InRshort and InRlong lines derived from a second population, and lines in which we measured the effects of InRshort and InRlong in an InR hypomorph mutant background.
Our full report can be found in the manuscript, but the take-home message is that InRshort and InRlong are significantly associated with all of the tested traits, in directions predicted by a selection regime favoring fast development time, rapid egg-laying, and high heat tolerance in warm climates, and resistance to cold and starvation stresses in cold climates. The InRshort allele was also associated with increased lifespan in males, though we do not necessarily expect that lifespan itself is associated with fitness.
In conclusion, our results implicate insulin signaling as a major mediator of life history adaptation in D. melanogaster, and suggest that tradeoffs can be explained by extensive pleiotropy at a single locus.

Some other things I would like to mention
I value this study for its functional tests: phenotypic effects of candidate polymorphisms are often missing from evolutionary studies. However, and this is a major caveat: the InRshort and InRlong alleles were embedded in genotypic backgrounds that extended well beyond the locus in the test lines. On their own, I do not consider the functional tests definitive. But D. melanogaster have low linkage disequilibrium, which we know decays rapidly just outside our candidate polymorphism. In my opinion, the segregation of InRshort and InRlong in large, recombining wild populations pinpoints the functional alleles, while the experimental assays confirm our hypotheses about the selection regime.
When we first measured fecundity, we counted every single egg laid by every single female over every single one of their lives. And the InRlong females, which we knew were more fecund (their culture bottles grew like gangbusters), laid only five more eggs on average than InRshort females! Highly non-significant. But it looked like the InRlong flies laid eggs faster. We set up a different assay to measure eggs laid in the first day, and InRlong was six times more fecund. I think this provides an important lesson. We can easily imagine big fitness consequences for egg laying rate, but we might not think to measure it in the lab. Many studies, especially those from a molecular genetics point of view, have been keen to emphasize decoupling of lifespan and reproduction for so-called longevity genes. For conclusions drawn about natural genetic variants (which are the ones of utmost relevance, in my opinion), the question of tradeoffs must consider those fitness axes that are relevant to the wild organism. And these are often unknowable.
We found that InRshort and InRlong were associated with smaller and larger body sizes, respectively. This makes sense in terms of levels of insulin signaling, but not in terms of body sizes in wild populations. High latitude flies are typically larger, not smaller. So, if InRshort and InRlong alleles affect body size, they either do so epistatically with other body size loci or they suffer antagonistic selection pressures along multiple fitness axes. Interesting!

DISEASES: Text mining and data integration of disease–gene associations

Sune Pletscher-Frankild, Albert Pallejà, Kalliopi Tsafou, Janos X Binder, Lars Juhl Jensen

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease–gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease–gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a user-friendly web interface at, where the text-mining software and all associations are also freely available for download.
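As a rough illustration of a scoring scheme that "takes into account co-occurrences both within and between sentences", the sketch below counts gene-disease pairs per abstract, adds a bonus when the pair shares a sentence, and then combines the weighted count with an observed-over-expected ratio. The abstract does not give the actual formula, so the weights `W_ABSTRACT` and `W_SENTENCE`, the exponent `ALPHA`, the input format, and the function name are all assumptions and should not be read as the DISEASES implementation.

```python
from collections import defaultdict
from itertools import product

W_ABSTRACT = 1.0  # assumed weight for co-mention anywhere in the same abstract
W_SENTENCE = 0.2  # assumed extra weight for co-mention in the same sentence
ALPHA = 0.6       # assumed balance between raw count and enrichment

def cooccurrence_scores(abstracts):
    """Score gene-disease pairs from already-tagged abstracts.

    `abstracts` is a list of abstracts; each abstract is a list of sentences;
    each sentence is a (set_of_genes, set_of_diseases) tuple.
    """
    pair = defaultdict(float)
    for sentences in abstracts:
        genes = set().union(*(g for g, _ in sentences))
        diseases = set().union(*(d for _, d in sentences))
        for g, d in product(genes, diseases):
            same_sentence = any(g in sg and d in sd for sg, sd in sentences)
            pair[(g, d)] += W_ABSTRACT + (W_SENTENCE if same_sentence else 0.0)

    gene_total = defaultdict(float)
    disease_total = defaultdict(float)
    for (g, d), c in pair.items():
        gene_total[g] += c
        disease_total[d] += c
    total = sum(pair.values())

    scores = {}
    for (g, d), c in pair.items():
        # Observed weighted count relative to what independence would predict.
        enrichment = c * total / (gene_total[g] * disease_total[d])
        scores[(g, d)] = (c ** ALPHA) * (enrichment ** (1.0 - ALPHA))
    return scores

# Toy corpora: the same pair mentioned within one sentence vs. across sentences.
within = [[({"BRCA1"}, {"breast cancer"})]]
between = [[({"BRCA1"}, set()), (set(), {"breast cancer"})]]
score_within = cooccurrence_scores(within)[("BRCA1", "breast cancer")]
score_between = cooccurrence_scores(between)[("BRCA1", "breast cancer")]
print(score_within, score_between)
```

In this toy setup the same-sentence mention scores higher than the cross-sentence mention, which is the qualitative behaviour one would want from any scheme of this kind.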

A genomic map of the effects of linked selection in Drosophila

Eyal Elyashiv, Shmuel Sattath, Tina T. Hu, Alon Strustovsky, Graham McVicker, Peter Andolfatto, Graham Coop, Guy Sella
(Submitted on 23 Aug 2014)

Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of ‘linked selection’ on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of linked selection from non-classic sweeps. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.