The genetic architecture of local adaptation I: The genomic landscape of foxtail pine (Pinus balfouriana Grev. & Balf.) as revealed from a high-density linkage map

Christopher J Friedline, Brandon M Lind, Erin M Hobson, Douglas E Harwood, Annette Delfino-Mix, Patricia E Maloney, Andrew J Eckert
doi: http://dx.doi.org/10.1101/011106

Explaining the origin and evolutionary dynamics of the genetic architecture of adaptation is a major research goal of evolutionary genetics. Despite controversy surrounding the success of attempts to accomplish this goal, a full understanding of adaptive genetic variation necessitates knowledge about the genomic location and patterns of dispersion for the genetic components affecting fitness-related phenotypic traits. Even with advances in next generation sequencing technologies, the production of full genome sequences for non-model species is often cost prohibitive, especially for tree species such as pines where genome size often exceeds 20 to 30 Gbp. We address this need by constructing a dense linkage map for foxtail pine (Pinus balfouriana Grev. & Balf.), with the ultimate goal of uncovering and explaining the origin and evolutionary dynamics of adaptive genetic variation in natural populations of this forest tree species. We utilized megagametophyte arrays (n = 76–95 megagametophytes/tree) from four maternal trees in combination with double-digestion restriction site associated DNA sequencing (ddRADseq) to produce a consensus linkage map covering 98.58% of the foxtail pine genome, which was estimated to be 1276 cM in length (95% CI: 1174 cM to 1378 cM). A novel bioinformatic approach using iterative rounds of marker ordering and imputation was employed to produce single-tree linkage maps (507–17 066 contigs/map; lengths: 1037.40–1572.80 cM). These linkage maps were collinear across maternal trees, with highly correlated marker orderings (Spearman’s ρ > 0.95). A consensus linkage map derived from these single-tree linkage maps contained 12 linkage groups along which 20 655 contigs were non-randomly distributed across 901 unique positions (n = 23 contigs/position), with an average spacing of 1.34 cM between adjacent positions.
Of the 20 655 contigs positioned on the consensus linkage map, 5627 had enough sequence similarity to contigs contained within the most recent build of the loblolly pine (P. taeda L.) genome to identify them as putative homologs containing both genic and non-genic loci. Importantly, all 901 unique positions on the consensus linkage map had at least one contig with putative homology to loblolly pine. When combined with the other biological signals that predominate in our data (e.g., correlations of recombination fractions across single trees), we show that dense linkage maps for non-model forest tree species can be efficiently constructed using next generation sequencing technologies. We subsequently discuss the usefulness of these maps as community-wide resources and as tools with which to test hypotheses about the genetic architecture of adaptation.
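The collinearity check mentioned in the abstract (Spearman’s ρ between marker orderings on different single-tree maps) can be illustrated with a minimal sketch. The marker positions below are hypothetical, and the functions `average_ranks` and `spearman_rho` are illustrative names, not the authors' pipeline; the abstract does not say which software computed the statistic.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ra, rb))
    var_a = sum((p - mean_a) ** 2 for p in ra)
    var_b = sum((q - mean_b) ** 2 for q in rb)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical positions (cM) of six shared markers on two single-tree maps.
# Markers 3 and 4 swap order on the second map.
tree_a = [0.0, 3.1, 7.4, 12.0, 18.5, 25.2]
tree_b = [0.0, 2.8, 9.0, 8.1, 17.9, 26.0]
rho = spearman_rho(tree_a, tree_b)
```

A single swap of adjacent markers among six already lowers ρ to about 0.94, which gives a feel for how strict a threshold of ρ > 0.95 is once thousands of markers are involved.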

CauseMap: Fast inference of causality from complex time series

M. Cyrus Maher, Ryan D. Hernandez

Background: Establishing health-related causal relationships is a central pursuit in biomedical research. Yet, the interdependent non-linearity of biological systems renders causal dynamics laborious and at times impractical to disentangle. This pursuit is further impeded by the dearth of time series that are sufficiently long to observe and understand recurrent patterns of flux. However, as data generation costs plummet and technologies like wearable devices democratize data collection, we anticipate a coming surge in the availability of biomedically-relevant time series data. Given the life-saving potential of these burgeoning resources, it is critical to invest in the development of open source software tools that are capable of drawing meaningful insight from vast amounts of time series data.

Results: Here we present CauseMap, the first open source implementation of convergent cross mapping (CCM), a method for establishing causality from long time series data (> ~25 observations). Compared to existing time series methods, CCM has the advantage of being model-free and robust to unmeasured confounding that could otherwise induce spurious associations. CCM builds on Takens’ Theorem, a well-established result from dynamical systems theory that requires only mild assumptions. This theorem allows us to reconstruct high dimensional system dynamics using a time series of only a single variable. These reconstructions can be thought of as shadows of the true causal system. If the reconstructed shadows can predict points from the opposing time series, we can infer that the corresponding variables are providing views of the same causal system, and so are causally related. Unlike traditional metrics, this test can establish the directionality of causation, even in the presence of feedback loops. Furthermore, since CCM can extract causal relationships from time series of, e.g., a single individual, it may be a valuable tool for personalized medicine. We implement CCM in Julia, a high-performance programming language designed for facile technical computing. Our software package, CauseMap, is platform-independent and freely available as an official Julia package.
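The cross-mapping idea described above can be sketched in a few lines. This is a simplified illustration in Python, not the CauseMap API (which is written in Julia) and not the authors' implementation. The function names, the toy coupled logistic maps, and the defaults `E=2`, `tau=1` are all assumptions made for the example. It builds a delay embedding (the "shadow") of one series, then estimates the other series from nearest neighbours on that shadow using exponentially decaying distance weights.

```python
import math


def coupled_logistic(n, x0=0.4, y0=0.2):
    """Toy data: x evolves on its own and weakly drives y (hypothetical parameters)."""
    x, y = [x0], [y0]
    for _ in range(n - 1):
        xt, yt = x[-1], y[-1]
        x.append(xt * (3.8 - 3.8 * xt))
        y.append(yt * (3.5 - 3.5 * yt - 0.1 * xt))
    return x, y


def delay_embed(series, E, tau):
    """Return lagged vectors (s[t], s[t-tau], ..., s[t-(E-1)*tau]) and their times."""
    times = list(range((E - 1) * tau, len(series)))
    vectors = [[series[t - k * tau] for k in range(E)] for t in times]
    return vectors, times


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def cross_map_skill(source, target, E=2, tau=1, lib_size=None):
    """Estimate `target` from the shadow manifold of `source`.

    Only the first `lib_size` embedded points serve as the neighbour library.
    Returns the correlation between the estimates and the observed target.
    """
    vectors, times = delay_embed(source, E, tau)
    library = range(len(vectors) if lib_size is None else lib_size)
    predicted, observed = [], []
    for i, v in enumerate(vectors):
        # E + 1 nearest library neighbours, excluding the point itself.
        nearest = sorted(
            (math.dist(v, vectors[j]), j) for j in library if j != i
        )[: E + 1]
        d_min = max(nearest[0][0], 1e-12)  # guard against division by zero
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimate = sum(
            w * target[times[j]] for w, (_, j) in zip(weights, nearest)
        ) / sum(weights)
        predicted.append(estimate)
        observed.append(target[times[i]])
    return pearson(predicted, observed)


if __name__ == "__main__":
    x, y = coupled_logistic(400)
    for lib_size in (25, 100, 399):
        # Skill of recovering x from y's shadow, and y from x's shadow.
        print(lib_size,
              round(cross_map_skill(y, x, lib_size=lib_size), 3),
              round(cross_map_skill(x, y, lib_size=lib_size), 3))
```

In the usual CCM convention, a driver leaves its signature in the driven variable, so if x drives y, it is the shadow of y that should recover x. The diagnostic is convergence: the skill of `cross_map_skill(y, x, ...)` is expected to rise as `lib_size` grows. A full analysis would also scan `E` and `tau` rather than fixing them, which this sketch does not do.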

Conclusions: CauseMap is an efficient implementation of a state-of-the-art algorithm for detecting causality from time series data. We believe this tool will be a valuable resource for biomedical research and personalized medicine.

Most viewed on Haldane’s Sieve: October 2014

The most viewed preprints this month were:

Analyses of Eurasian wild and domestic pig genomes reveals long-term gene-flow during domestication

Laurent A.F. Frantz, Joshua Schraiber, Ole Madsen, Hendrik-Jan Megens, Alex Cagan, Mirte Bosse, Yogesh Paudel, Richard P.M.A. Crooijmans, Greger Larson, Martien A.M. Groenen
doi: http://dx.doi.org/10.1101/010959

Traditionally, the process of domestication is assumed to be initiated by people, involve few individuals and rely on reproductive isolation between wild and domestic forms. However, an emerging zooarcheological consensus depicts animal domestication as a long-term process without reproductive isolation or strong intentional selection. Here, we ask whether pig domestication followed a traditional linear model, or a complex, reticulate model as predicted by zooarcheologists. To do so, we fit models of domestication to whole genome data from over 100 wild and domestic pigs. We found that the assumptions of traditional models, such as reproductive isolation and strong domestication bottlenecks, are incompatible with the genetic data and provide support for the zooarcheological theory of a complex domestication process. In particular, gene-flow from wild to domestic pigs was a ubiquitous feature of the domestication of pigs. In addition, we show that despite gene-flow, the genomes of domestic pigs show strong signatures of selection at loci that affect behaviour and morphology. Specifically, our results are consistent with independent parallel sweeps in two independent domestication areas (China and Anatolia) at loci linked to morphological traits. We argue that recurrent selection for domestic traits likely counteracted the homogenising effect of gene-flow from wild boars and created “islands of domestication” in the genome. Overall, our results suggest that genomic approaches should embrace more complex models of domestication. The results from these studies will have significant ramifications for studies that attempt to infer the origin of domesticated animals.

E. coli populations in unpredictably fluctuating environments evolve to face novel stresses through enhanced efflux activity

Shraddha Madhav Karve, Sachit Daniel, Yashraj Chavhan, Abhishek Anand, Somendra Singh Kharola, Sutirth Dey
doi: http://dx.doi.org/10.1101/011007

There is considerable understanding about how laboratory populations respond to predictable (constant or deteriorating-environment) selection for single environmental variables like temperature or pH. However, such insights may not apply when selection environments comprise multiple variables that fluctuate unpredictably, as is common in nature. To address this issue, we grew replicate laboratory populations of E. coli in nutrient broth whose pH and concentrations of salt (NaCl) and hydrogen peroxide (H2O2) were randomly changed daily. After ~170 generations, the fitness of the selected populations had not increased in any of the three selection environments. However, these selected populations had significantly greater fitness in four novel environments which have no known fitness-correlation with tolerance to pH, NaCl or H2O2. Interestingly, contrary to expectations, hypermutators did not evolve. Instead, the selected populations evolved an increased ability for energy-dependent efflux activity that might enable them to throw out toxins, including antibiotics, from the cell at a faster rate. This provides an alternative mechanism for how evolvability can evolve in bacteria and potentially lead to broad-spectrum antibiotic resistance, even in the absence of prior antibiotic exposure. Given that environmental variability is increasing in nature, this might have serious consequences for public health.

Enhanced Transcriptome Maps from Multiple Mouse Tissues Reveal Evolutionary Constraint in Gene Expression for Thousands of Genes

Dmitri Pervouchine, Sarah Djebali, Alessandra Breschi, Carrie A Davis, Pablo Prieto Barja, Alex Dobin, Andrea Tanzer, Julien Lagarde, Chris Zaleski, Lei-Hoon See, Meagan Fastuca, Jorg Drenkow, Huaien Wang, Giovanni Bussotti, Baikang Pei, Suganthi Balasubramanian, Jean Monlong, Arif Harmanci, Mark Gerstein, Michael A Beer, Cedric Notredame, Roderic Guigo, Thomas R Gingeras
doi: http://dx.doi.org/10.1101/010884

We characterized by RNA-seq the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles obtained in human cell lines reveals substantial conservation of transcriptional programs, and uncovers a distinct class of genes with levels of expression across cell types and species that have been constrained early in vertebrate evolution. This core set of genes captures a substantial and constant fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but it is associated with strong and conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory program in which sub-cellular localization and alternative splicing play comparatively large roles.

Comparative genomics reveals the origins and diversity of arthropod immune systems

William J Palmer, Francis M Jiggins

While the innate immune system of insects is well-studied, comparatively little is known about how other arthropods defend themselves against infection. We have characterised key immune components in the genomes of five chelicerates, a myriapod and a crustacean. We found clear traces of an ancient origin of innate immunity, with some arthropods having Toll-like receptors and C3-complement factors that are more closely related in sequence or structure to those of vertebrates than to those of other arthropods. Across the arthropods some components of the immune system, like the Toll signalling pathway, are highly conserved. However, there is also remarkable diversity. The chelicerates apparently lack the Imd signalling pathway and BGRPs, a key class of pathogen recognition receptors. Many genes have large copy number variation across species, and this may sometimes be accompanied by changes in function. For example, peptidoglycan recognition proteins (PGRPs) have frequently lost their catalytic activity and switch between secreted and intracellular forms. There has been extensive duplication of the cellular immune receptor Dscam in several species, which may be an alternative way to generate the high diversity that is produced by alternative splicing in insects. Our results provide a detailed analysis of the immune systems of several important groups of animals and lay the foundations for functional work on these groups.

Counterinsurgency Doctrine Applied to Infectious Disease

Benjamin C Kirkup

Recent scientific discoveries lead inexorably to the conclusion that the ‘total human’ incorporates a necessary body of numerous microbes, including bacteria. These bacteria play a very important role in immunity by actively resisting infections by outside bacteria; however, under certain conditions they can degrade their community. They can arrogate to themselves resources that normally flow through other metabolic pathways and form persistent biological structures. In this situation, these bacteria constitute an insurgency, with strategic ramifications.

Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution

Olgert Denas, Richard Sandstrom, Yong Cheng, Kathryn Beal, Javier Herrero, Ross Hardison, James Taylor
doi: http://dx.doi.org/10.1101/010926

Background: Because species-specific gene expression is driven by species-specific regulation, understanding the relationship between sequence and function of the regulatory regions in different species will help elucidate how differences among species arise. Despite active experimental and computational research, the relationships among sequence, conservation, and function are still poorly understood.

Results: We compared transcription factor occupied segments (TFos) for 116 human and 35 mouse TFs in 546 human and 125 mouse cell types and tissues from the Human and the Mouse ENCODE projects. We based the map between human and mouse TFos on a one-to-one nucleotide cross-species mapper, bnMapper, that utilizes whole genome alignments (WGA). Our analysis shows that TFos are under evolutionary constraint, but a substantial portion (25.1% of mouse and 25.85% of human on average) of the TFos does not have a homologous sequence in the other species; this portion varies among cell types and TFs. Furthermore, 47.67% and 57.01% of the homologous TFos sequence shows binding activity in the other species for human and mouse respectively. However, 79.87% and 69.22% is repurposed such that it binds the same TF in different cells or different TFs in the same cells. Remarkably, within the set of TFos not showing conservation of occupancy, the corresponding genome regions in the other species are preferred locations of novel TFos. These events suggest that a substantial amount of functional regulatory sequence is exapted from other biochemically active genomic material. Despite substantial repurposing of TFos, we did not find substantial changes in their predicted target genes, suggesting that cis-regulatory modules (CRMs) buffer evolutionary events, allowing little or no change in the TF–target gene associations. Thus, the small portion of TFos with strictly conserved occupancy underestimates the degree of conservation of regulatory interactions.

Conclusion: We mapped regulatory sequences from an extensive number of TFs and cell types between human and mouse. A comparative analysis of this correspondence unveiled the extent of the shared regulatory sequence across the TFs and cell types under study. Importantly, a large part of the shared regulatory sequence is repurposed in the other species. This sequence, fueled by turnover events, provides a strong case for exaptation in regulatory elements.

When is selection effective?

Simon Gravel
doi: http://dx.doi.org/10.1101/010934

Deleterious alleles are more likely to reach high frequency in small populations because of chance fluctuations in allele frequency. This may lead, over time, to reduced average fitness in the population. In that sense, selection is more ‘effective’ in larger populations. Many recent studies have considered whether the different demographic histories across human populations have resulted in differences in the number, distribution, and severity of deleterious variants, leading to an animated debate. This article seeks to clarify some terms of the debate by identifying differences in definitions and assumptions used in these studies and providing an intuitive explanation for the observed similarity in genetic load among populations. The intuition is verified through analytical and numerical calculations. First, even though rare variants contribute to load, they contribute little to load differences across populations. Second, the accumulation of non-recessive load after a bottleneck is slow for the weakly deleterious variants that contribute much of the long-term variation among populations. Whereas a bottleneck increases drift instantly, it affects selection only indirectly, so that fitness differences can keep accumulating long after a bottleneck is over. Third, drift and selection tend to have opposite effects on load differentiation under dominance models. Because of this competition, load differences across populations depend sensitively and intricately on past demographic events and on the distribution of fitness effects. A given bottleneck can lead to increased or decreased load for variants with identical fitness effects, depending on the subsequent population history. Because of this sensitivity, both classical population genetic intuition and detailed simulations are required to understand differences in load across populations.
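The opening claim, that deleterious alleles fare better in small populations, can be made concrete with Kimura's classical diffusion approximation for the fixation probability of a single new mutant copy. The sketch below is not the calculation performed in the preprint. It assumes a constant diploid population of size N with effective size equal to N and an additive selection coefficient s, and the function names are illustrative.

```python
import math


def fixation_probability(N, s):
    """Kimura's approximation u = (1 - exp(-2s)) / (1 - exp(-4Ns)).

    Assumes one new mutant copy (initial frequency 1/(2N)) in a diploid
    population of constant size N with Ne = N and additive selection s.
    """
    if s == 0:
        return 1.0 / (2 * N)  # neutral limit
    exponent = -4.0 * N * s
    if exponent > 700:
        # exp() would overflow; the fixation probability is effectively zero.
        return 0.0
    # expm1 keeps precision when s is small; the two minus signs cancel.
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_to_neutral(N, s):
    """Fixation probability divided by the neutral expectation 1/(2N)."""
    return fixation_probability(N, s) * 2 * N


s = -0.001  # hypothetical weakly deleterious allele
small = relative_to_neutral(100, s)
large = relative_to_neutral(10_000, s)
```

With these hypothetical values, the allele fixes almost as readily as a neutral one when N = 100 (4Ns = -0.4), but essentially never when N = 10 000 (4Ns = -40). The abstract's point is that translating this into load differences between real populations is far subtler, because demography changes over time and dominance matters.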