The roles of standing genetic variation and evolutionary history in determining the evolvability of anti-predator strategies

Jordan Fish, Daniel R O’Donnell, Abhijna Parigi, Ian Dworkin, Aaron P Wagner
Standing genetic variation and the historical environment in which that variation arises (evolutionary history) are both potentially significant determinants of a population’s capacity for evolutionary response to a changing environment. We evaluated the relative importance of these two factors in shaping evolutionary trajectories in the face of sudden environmental change. We used the open-ended digital evolution software Avida to examine how historic exposure to predation pressures, different levels of genetic variation, and combinations of the two, impact the anti-predator strategies and competitive abilities that evolve in the face of threats from new, invasive predator populations. We show that while standing genetic variation plays some role in determining evolutionary responses, evolutionary history has the greater influence on a population’s capacity to evolve effective anti-predator traits. This adaptability likely reflects the relative ease of repurposing existing, relevant genes and traits, and the broader potential value of the generation and maintenance of adaptively flexible traits in evolving populations.

Genome-Wide Introgression Revealed Pervasive Hybrid Incompatibilities (HI) between Caenorhabditis species

Yu Bi, Xiaoliang Ren, Cheung Yan, Jiaofang Shao, Dongying Xie, Zhongying Zhao

Systematic characterization of hybrid incompatibility (HI) between related species remains the key to understanding speciation. The genetic basis of HI has been intensively studied in Drosophila species, but remains largely unknown in other species, including nematodes. This is mainly due to the lack of a sister species with which C. elegans can mate and produce viable progeny. The recent discovery of a C. briggsae sister species, C. sp.9, opened up the possibility of dissecting the genetic basis of HI in nematode species. However, a paucity of molecular and genetic tools has prevented the precise mapping of HI loci between the two species. To systematically isolate the HI loci between the nematode species pair, we first generated 96 chromosomally integrated, independent GFP insertions in the C. briggsae genome. We next mapped the GFP insertion sites to defined locations using a method we had developed earlier. The dominant and visible markers facilitated the directional crossing of their linked genomic sequences into C. sp.9. We then backcrossed each individual marker into C. sp.9 for at least 15 generations and produced 111 independent introgression lines, which together represent most of the C. briggsae genome. Finally, we dissected the HI patterns by scoring embryonic lethality, larval arrest, sex ratio, fertility, male sterility and inviability in a subset of the introgression lines, and identified pervasive HIs between the two species. The study produced a genome-wide landscape of HI between nematode species for the first time. The initial crossing results confirmed Haldane’s rule, and the fertility data from homozygous introgressions supported the rule of large X effect. The large collection of introgression lines allows mapping of numerous HI loci into defined genomic regions between C. briggsae and C. sp.9, thus facilitating further characterization of their genetic and molecular mechanisms. Importantly, the study permits comparative analysis of speciation genetics between nematodes and other species.

Investigating speciation in face of polyploidization: what can we learn from approximate Bayesian computation approach?

Camille Roux, John Pannell

Despite the importance of polyploidy in the diversification of many eukaryote clades, particularly plants, detailed genomic analysis of polyploid species is still in its infancy, with published analyses of only a handful of model species to date. Fundamental questions concerning the origin of polyploid lineages (e.g., auto- vs. allopolyploidy) and the extent to which polyploid genomes display different modes of inheritance are poorly resolved for most polyploids, not least because they have hitherto required detailed karyotypic analysis or the analysis of allele segregation at multiple loci in pedigrees or artificial crosses, which are often not practical for non-model species. However, the increasing availability of sequence data for non-model species now presents an opportunity to apply established approaches for the evolutionary analysis of genomic data to polyploid species complexes. Here, we ask whether approximate Bayesian computation (ABC), applied to sequence data produced by next-generation sequencing technologies from polyploid taxa, allows correct inference of the evolutionary and demographic history of polyploid lineages and their close relatives. We use simulations to investigate how the number of sampled individuals, the number of surveyed loci and their length affect the accuracy and precision of evolutionary and demographic inferences by ABC, including the mode of polyploidisation, the mode of inheritance of polyploid taxa, the relative timing of genome duplication and speciation, and the effective population sizes of contributing lineages. We also apply the ABC framework we develop to sequence data from diploid and polyploid species of the plant genus Capsella, for which we infer an allopolyploid origin for the tetraploid C. bursa-pastoris ≈ 90,000 years ago. In general, our results indicate that ABC is a promising and powerful method for uncovering the origin and subsequent evolution of polyploid species.
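For readers unfamiliar with ABC, the core idea can be illustrated with a minimal rejection sampler. This is a toy sketch, not the authors' framework: the simulator (the mean of normal draws), the flat prior on a single parameter `theta`, the tolerance, and all function names are assumptions chosen for illustration. A real analysis would replace the simulator with a coalescent model of polyploid lineages and use several summary statistics.

```python
import random
import statistics

def simulate_summary(theta, n, rng):
    """Toy simulator (assumed): summary statistic = mean of n draws from Normal(theta, 1)."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))

def abc_rejection(observed_summary, n, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lands within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 10.0)  # draw a candidate from a flat prior on [0, 10]
        if abs(simulate_summary(theta, n, rng) - observed_summary) < tolerance:
            accepted.append(theta)
    return accepted

rng = random.Random(1)
observed = simulate_summary(4.0, 50, rng)  # pseudo-observed data with true theta = 4
posterior = abc_rejection(observed, 50, 20000, 0.1, rng)
posterior_mean = statistics.fmean(posterior)
print(len(posterior), round(posterior_mean, 2))
```

The accepted values of `theta` approximate the posterior distribution. Shrinking the tolerance improves the approximation at the cost of fewer accepted draws.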

Cross-phenotype meta-analysis reveals large-scale trans-eQTLs mediating patterns of transcriptional co-regulation

Boel Brynedal, Towfique Raj, Barbara E Stranger, Robert Bjornson, Benjamin M Neale, Benjamin F Voight, Chris Cotsapas
(Submitted on 7 Feb 2014)

Genetic variation affecting gene regulation is a central driver of phenotypic differences between individuals and can be used to uncover how biological processes are organized in a cell. Although detecting cis-eQTLs is now routine, trans-eQTLs have proven more challenging to find due to the modest variance they explain and the multiple-testing burden of testing millions of SNPs for association with thousands of transcripts. Here, we successfully map trans-eQTLs with the complementary approach of looking for SNPs associated with the expression of multiple genes simultaneously. We find 732 trans-eQTLs that replicate across two continental populations; each trans-eQTL controls large groups of target transcripts (regulons), which are part of interacting networks controlled by transcription factors. We are thus able to uncover co-regulated gene sets and begin describing the cell circuitry of gene regulation.

The arrival of the frequent: how bias in genotype-phenotype maps can steer populations to local optima

Ard A Louis, Steffen Schaper
(Submitted on 6 Feb 2014)

Genotype-phenotype (GP) maps specify how the random mutations that change genotypes generate variation by altering phenotypes, which, in turn, can trigger selection. Many GP maps share the following general properties: 1) The number of genotypes N_G is much larger than the number of selectable phenotypes; 2) Neutral exploration changes the variation that is accessible to the population; 3) The distribution of phenotype frequencies F_p = N_p/N_G, with N_p the number of genotypes mapping onto phenotype p, is highly biased: the majority of genotypes map to only a small minority of the phenotypes. Here we explore how these properties affect the evolutionary dynamics of haploid Wright-Fisher models that are coupled to a simplified and general random GP map or to a more complex RNA sequence to secondary structure map. For both maps the probability of a mutation leading to a phenotype p scales to first order as F_p, although for the RNA map there are further correlations as well. By using mean-field theory, supported by computer simulations, we show that the discovery time T_p of a phenotype p similarly scales to first order as 1/F_p for a wide range of population sizes and mutation rates in both the monomorphic and polymorphic regimes. These differences in the rate at which variation arises can vary over many orders of magnitude. Phenotypic variation with a larger F_p is therefore much more likely to arise than variation with a small F_p. We show, using the RNA model, that frequent phenotypes (with larger F_p) can fix in a population even when alternative, but less frequent, phenotypes with much higher fitness are potentially accessible. In other words, if the fittest never ‘arrive’ on the timescales of evolutionary change, then they can’t fix. We call this highly non-ergodic effect the ‘arrival of the frequent’.
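The inverse scaling between discovery time and phenotype frequency can be illustrated with a deliberately stripped-down simulation. This is a sketch under strong assumptions, not the authors' Wright-Fisher model: each generation supplies a fixed number of new mutants (`mutants_per_gen`, a hypothetical stand-in for population size times mutation rate), and each mutant independently maps to the target phenotype with probability equal to that phenotype's frequency. Selection, neutral drift, and the correlations present in the RNA map are ignored.

```python
import random

def discovery_time(freq, mutants_per_gen, rng):
    """Generations until some new mutant first maps to a phenotype of frequency `freq`."""
    generation = 0
    while True:
        generation += 1
        for _ in range(mutants_per_gen):
            if rng.random() < freq:  # this mutant lands on the target phenotype
                return generation

def mean_discovery_time(freq, mutants_per_gen, trials, rng):
    """Average discovery time over independent replicate runs."""
    total = sum(discovery_time(freq, mutants_per_gen, rng) for _ in range(trials))
    return total / trials

rng = random.Random(0)
t_common = mean_discovery_time(0.01, 2, 2000, rng)   # phenotype frequency 1e-2
t_rare = mean_discovery_time(0.001, 2, 2000, rng)    # phenotype frequency 1e-3
ratio = t_rare / t_common
print(round(t_common, 1), round(t_rare, 1), round(ratio, 2))
```

Under these assumptions a tenfold drop in phenotype frequency lengthens the mean discovery time by roughly a factor of ten, which is the first-order scaling described in the abstract.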

The disruption of trace element homeostasis due to aneuploidy as a unifying theme in the etiology of cancer

Johannes Engelken, Matthias Altmeyer, Renty Franklin

Abstract for Scientists: While decades of cancer research have firmly established multiple “hallmarks of cancer”, cancer’s genomic landscape remains to be fully understood. Particularly, the phenomenon of aneuploidy – gains and losses of large genomic regions, i.e. whole chromosomes or chromosome arms – and why most cancer cells are aneuploid remains enigmatic. This is despite the achievements of cytogenomics and whole genome sequencing which have successfully pinpointed focal amplifications and focal deletions as well as point mutations affecting numerous genes involved in carcinogenesis. A characteristic of many different cancers is the deregulation of the homeostasis of trace elements, such as copper (Cu), zinc (Zn) and iron (Fe). Concentrations of copper are markedly increased in cancer tissue and the blood plasma of cancer patients, while zinc levels are typically decreased. Here we discuss the hypothesis that the disruption of trace element homeostasis and the phenomenon of aneuploidy might be linked. Our tentative analysis of genomic data from diverse tumor types mainly from The Cancer Genome Atlas (TCGA) project suggests that gains and losses of metal transporter genes occur frequently and correlate well with transporter gene expression levels. They may thereby confer a cancer-driving selective growth advantage at early and possibly also later stages during cancer development. This idea is consistent with recent observations in yeast, which suggest that through chromosomal gains and losses cells can adapt quickly to new carbon sources, nutrient starvation as well as to copper toxicity. In human cancer development, candidate driving events may include, among others, the gains of zinc transporter genes SLC39A1 and SLC39A4 on chromosome arms 1q and 8q, respectively, and the losses of zinc transporter genes SLC30A5, SLC39A14 and SLC39A6 on 5q, 8p and 18q.
The recurrent gain of 3q might be associated with the iron transporter gene TFRC and the loss of 13q with the copper transporter gene ATP7B. By altering cellular trace element homeostasis (especially fluctuations in labile and total zinc) such events might contribute to the initiation of the malignant transformation. Consistently, it has been shown that zinc affects a number of the observed hallmark characteristics including DNA repair, inflammation and apoptosis. We term this model the “aneuploidy metal transporter cancer” (AMTC) hypothesis. While the AMTC hypothesis does not contradict the cancer-promoting role of point and focal mutations in established tumor suppressor genes and oncogenes (e.g. MYC, MYCN, TP53, PIK3CA, BRCA1, ERBB2), it seems possible that some of these mutations may be a response to the prior disruption of trace element homeostasis. We suggest a number of approaches for how this hypothesis could be tested experimentally and briefly touch on possible implications for cancer etiology, metastasis, drug resistance and therapy.

Nonparametric inference of the distribution of fitness effects across functional categories in humans

Fernando Racimo, Joshua G Schraiber

Quantifying the proportion of polymorphic mutations that are deleterious or neutral is of fundamental importance to our understanding of evolution, disease genetics and the maintenance of variation genome-wide. Here, we develop an approximation to the distribution of fitness effects (DFE) of segregating single-nucleotide mutations in humans. Unlike previous methods, we do not assume that synonymous mutations are neutral, or rely on fitting the DFE of new nonsynonymous mutations to a particular parametric probability distribution, which is poorly motivated on a biological level. We rely on a previously developed method that utilizes a variety of published annotations (including conservation scores, protein deleteriousness estimates and regulatory data) to score all mutations in the human genome based on how likely they are to be affected by negative selection, controlling for mutation rate. We map this score to a scale of fitness coefficients via maximum likelihood using diffusion theory and a Poisson random field model. We then use our coefficient mapping to quantify the distribution of all scored single-nucleotide polymorphisms in Yoruba and Europeans. Our method serves to approximate the DFE of any type of segregating mutations, regardless of its genomic consequence, and so allows us to compare the proportion of mutations that are negatively selected or neutral across various genomic categories, including different types of regulatory sites. We observe that the distribution of intergenic polymorphisms is highly leptokurtic, with a strong peak at neutrality, while the distribution of nonsynonymous polymorphisms is bimodal, with a neutral peak and a second peak at s ≈ −10^(−4). Other types of polymorphisms have shapes that fall roughly in between these two.

Identifying Keystone Species in the Human Gut Microbiome from Metagenomic Timeseries using Sparse Linear Regression

Charles K. Fisher, Pankaj Mehta
(Submitted on 3 Feb 2014)

Human-associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the interactions between species from sequence data. Any algorithm for inferring species interactions must overcome three obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in timeseries models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions. Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct “keystone species”, Bacteroides fragilis and Bacteroides stercoris, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome.
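The regression idea behind this kind of inference can be sketched on synthetic data. The code below is not LIMITS itself. It assumes a two-species discrete-time Lotka-Volterra (Ricker-type) model of the form x_i(t+1) = x_i(t) exp(r_i + sum_j c_ij x_j(t) + noise), fits ordinary least squares to the log growth ratio, and takes the median over bootstrap resamples. The sparsity-inducing variable selection step is omitted, absolute rather than relative abundances are used, and all parameter values and function names are hypothetical.

```python
import math
import random
import statistics

def simulate(r, c, steps, noise_sd, rng):
    """Simulate the assumed model x_i(t+1) = x_i(t) * exp(r_i + sum_j c_ij x_j(t) + noise)."""
    x = [0.5, 0.5]
    series = [x]
    for _ in range(steps):
        x = [x[i] * math.exp(r[i] + sum(c[i][j] * x[j] for j in range(2))
                             + rng.gauss(0.0, noise_sd)) for i in range(2)]
        series.append(x)
    return series

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            f = m[k][col] / m[col][col]
            for q in range(col, n + 1):
                m[k][q] -= f * m[col][q]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][q] * x[q] for q in range(i + 1, n))) / m[i][i]
    return x

def least_squares(rows, ys):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)

def bagged_interactions(series, species, n_boot, rng):
    """Median over bootstrap fits of the interaction coefficients acting on one species."""
    rows = [[1.0] + list(x) for x in series[:-1]]  # intercept plus abundances at time t
    ys = [math.log(series[t + 1][species] / series[t][species])
          for t in range(len(series) - 1)]         # log growth ratio from t to t+1
    idx = range(len(rows))
    fits = []
    for _ in range(n_boot):
        sample = [rng.choice(idx) for _ in idx]    # resample time points with replacement
        fits.append(least_squares([rows[k] for k in sample], [ys[k] for k in sample]))
    return [statistics.median(f[j] for f in fits) for j in (1, 2)]

rng = random.Random(42)
true_c = [[-1.0, -0.3], [0.4, -1.0]]               # hypothetical interaction matrix
series = simulate([1.0, 0.8], true_c, 1000, 0.1, rng)
c_hat = [bagged_interactions(series, i, 50, rng) for i in range(2)]
print([[round(v, 2) for v in row] for row in c_hat])
```

Regressing the log growth ratio on abundances, rather than correlating abundances directly, is what lets the fit separate interaction from mere co-variation in this toy setting. The process noise keeps the trajectory moving around its fixed point so that the coefficients remain identifiable.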

Genetic variants associated with motion sickness point to roles for inner ear development, neurological processes, and glucose homeostasis

Bethann S Hromatka, Joyce Y Tung, Amy K Kiefer, Chuong B Do, David A Hinds, Nicholas Eriksson

Roughly one in three individuals is highly susceptible to motion sickness, and yet the underlying causes of this condition are not well understood. Despite high heritability, no associated genetic factors have been discovered to date. Here, we conducted the first genome-wide association study on motion sickness in 80,494 individuals from the 23andMe database who were surveyed about car sickness. Thirty-five single-nucleotide polymorphisms (SNPs) were associated with motion sickness at a genome-wide-significant level (p < 5e-8). Many of these SNPs are near genes involved in balance, and eye, ear, and cranial development (e.g., PVRL3, TSHZ1, MUTED, HOXB3, HOXD3). Other SNPs may affect motion sickness through nearby genes with roles in the nervous system, glucose homeostasis, or hypoxia. We show that several of these SNPs display sex-specific effects, with as much as three times stronger effects in women. We searched for phenotypes comorbid with motion sickness, confirming associations with known comorbidities including migraines, postoperative nausea and vomiting (PONV), vertigo, and morning sickness, and observing new associations with altitude sickness and many gastrointestinal conditions. We also show that two of these related phenotypes (PONV and migraines) share underlying genetic factors with motion sickness. These results point to the importance of the nervous system in motion sickness and suggest a role for glucose levels in motion-induced nausea and vomiting, a finding that may provide insight into other nausea-related phenotypes such as PONV. They also highlight personal characteristics (e.g., being a poor sleeper) that correlate with motion sickness, findings that could help identify risk factors or treatments.

motifDiverge: a model for assessing the statistical significance of gene regulatory motif divergence between two DNA sequences

Dennis Kostka, Tara Friedrich, Alisha K. Holloway, Katherine S. Pollard
(Submitted on 1 Feb 2014)

Next-generation sequencing technology enables the identification of thousands of gene regulatory sequences in many cell types and organisms. We consider the problem of testing if two such sequences differ in their number of binding site motifs for a given transcription factor (TF) protein. Binding site motifs impart regulatory function by providing TFs the opportunity to bind to genomic elements and thereby affect the expression of nearby genes. Evolutionary changes to such functional DNA are hypothesized to be major contributors to phenotypic diversity within and between species; but despite the importance of TF motifs for gene expression, no method exists to test for motif loss or gain. Assuming that motif counts are binomially distributed, and allowing for dependencies between motif instances in evolutionarily related sequences, we derive the probability mass function of the difference in motif counts between two nucleotide sequences. We provide a method to numerically estimate this distribution from genomic data and show through simulations that our estimator is accurate. Finally, we introduce the R package motifDiverge that implements our methodology and illustrate its application to gene regulatory enhancers identified by a mouse developmental time course experiment. While this study was motivated by analysis of regulatory motifs, our results can be applied to any problem involving two correlated Bernoulli trials.
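As a point of reference, the simpler independent case can be written down directly. The sketch below is not the motifDiverge method, which allows for dependence between motif instances in related sequences. It assumes the two motif counts are independent binomials, and the number of candidate positions (`n1`, `n2`) and per-position motif probabilities (`p1`, `p2`) are hypothetical parameters.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def diff_pmf(d, n1, p1, n2, p2):
    """P(X - Y = d) for independent X ~ Bin(n1, p1) and Y ~ Bin(n2, p2)."""
    return sum(binom_pmf(k + d, n1, p1) * binom_pmf(k, n2, p2) for k in range(n2 + 1))

def upper_tail(d_obs, n1, p1, n2, p2):
    """P(X - Y >= d_obs): one-sided p-value for excess motifs in the first sequence."""
    return sum(diff_pmf(d, n1, p1, n2, p2) for d in range(d_obs, n1 + 1))

# Hypothetical example: 200 candidate positions per sequence, motif probability 0.02 each,
# and an observed excess of 5 motifs in the first sequence.
p_value = upper_tail(5, 200, 0.02, 200, 0.02)
print(p_value)
```

Positive correlation between the two counts, as expected for evolutionarily related sequences, would narrow the distribution of the difference, so the independent calculation tends to give conservative p-values in that setting.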