An experimental test of the relationship between melanism and desiccation survival in insects

Subhash Rajpurohit, Lisa Marie Peterson, Andrew Orr, Anthony J. Marlon, Allen G Gibbs
doi: http://dx.doi.org/10.1101/012369

We used experimental evolution to test the "melanism-desiccation" hypothesis, which proposes that dark cuticle in several Drosophila species is an adaptation for increased desiccation tolerance. We selected for dark and light body pigmentation in replicated populations of D. melanogaster and assayed traits related to water balance. We also scored pigmentation and desiccation tolerance in populations selected for desiccation survival. Populations in both selection regimes showed large differences in the traits directly under selection. However, after over 40 generations of pigmentation selection, dark-selected populations were not more desiccation-tolerant than light-selected and control populations, nor did we find significant changes in carbohydrate amounts that could affect desiccation resistance. Body pigmentation of desiccation-selected populations did not differ from control populations after over 140 generations of selection. Our results do not support an important role for melanization in Drosophila water balance.

DensiTree 2: Seeing Trees Through the Forest

Remco Bouckaert, Joseph Heled
doi: http://dx.doi.org/10.1101/012401

Motivation: Phylogenetic analyses such as Bayesian MCMC or bootstrapping result in a collection of trees. Trees are discrete objects and it is generally difficult to get a mental grip on a distribution over trees. Visualisation tools like DensiTree can give good intuition on tree distributions. DensiTree works by drawing all trees in the set transparently, thus highlighting areas where the trees in the set agree. In this way, both uncertainty in clade heights and uncertainty in topology can be visualised. In our experience, a vanilla DensiTree can turn out to be misleading in that it shows too much uncertainty due to wrongly ordering taxa or due to unlucky placement of internal nodes. Results: DensiTree is extended to allow visualisation of meta-data associated with branches such as population size and evolutionary rates. Furthermore, geographic locations of taxa can be shown on a map, making it easy to visually check whether there is some geographic pattern in a phylogeny. Taxa orderings have a large impact on the layout of the tree set, and advances have been made in finding better orderings resulting in significantly more informative visualisations. We also explored various methods for positioning internal nodes, which can improve the quality of the image. Together, these advances make it easier to comprehend distributions over trees. Availability: DensiTree is freely available from http://compevol.auckland.ac.nz/software/.

The genomic signature of social interactions regulating honey bee caste development

Svjetlana Vojvodic, Brian R Johnson, Brock Harpur, Clement Kent, Amro Zayed, Kirk E Anderson, Timothy Linksvayer
doi: http://dx.doi.org/10.1101/012385

Social evolution theory posits the existence of genes expressed in one individual that affect the traits and fitness of social partners. The archetypal example of reproductive altruism, honey bee reproductive caste, involves strict social regulation of larval caste fate by care-giving nurses. However, the contribution of nurse-expressed genes, which are prime socially-acting candidate genes, to the caste developmental program and to caste evolution remains mostly unknown. We experimentally induced new queen production by removing the current colony queen, and we used RNA sequencing to study the gene expression profiles of both developing larvae and their care-giving nurses before and after queen removal. By comparing the gene expression profiles between both queen-destined larvae and their nurses to worker-destined larvae and their nurses in queen-present and queen-absent conditions, we identified larval and nurse genes associated with larval caste development and with queen presence. Of 950 differentially-expressed genes associated with larval caste development, 82% were expressed in larvae and 18% were expressed in nurses. Behavioral and physiological evidence suggests that nurses may specialize in the short term feeding queen- versus worker-destined larvae. Estimated selection coefficients indicated that both nurse and larval genes associated with caste are rapidly evolving, especially those genes associated with worker development. Of the 1863 differentially-expressed genes associated with queen presence, 90% were expressed in nurses. Altogether, our results suggest that socially-acting genes play important roles in both the expression and evolution of socially-influenced traits like caste.
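
As a rough check on the proportions quoted above, the reported percentages can be converted back into approximate gene counts. This is only an illustrative sketch: the abstract gives rounded percentages, so the counts below are approximations, and the helper name `approx_count` is hypothetical rather than anything from the paper.

```python
def approx_count(total, percent):
    """Approximate count implied by a rounded percentage of a total."""
    return round(total * percent / 100)

# Figures taken from the abstract; the derived counts are approximate.
caste_total = 950    # differentially-expressed genes tied to caste development
queen_total = 1863   # differentially-expressed genes tied to queen presence

caste_larval = approx_count(caste_total, 82)  # expressed in larvae
caste_nurse = approx_count(caste_total, 18)   # expressed in nurses
queen_nurse = approx_count(queen_total, 90)   # expressed in nurses

print("caste-associated, larval:", caste_larval)
print("caste-associated, nurse:", caste_nurse)
print("queen-presence-associated, nurse:", queen_nurse)
```

Under these assumptions, roughly 171 of the caste-associated genes and roughly 1677 of the queen-presence-associated genes would be nurse-expressed.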

Evaluating intra- and inter-individual variation in the human placental transcriptome

David A Hughes, Martin Kircher, Zhisong He, Song Guo, Genevieve L Fairbrother, Carlos S Moreno, Philipp Khaitovich, Mark Stoneking
doi: http://dx.doi.org/10.1101/012468

Background: Gene expression variation is a phenotypic trait of particular interest as it represents the initial link between genotype and other phenotypes. Analyzing how such variation apportions among and within groups allows for the evaluation of how genetic and environmental factors influence such traits. It also provides opportunities to identify genes and pathways that may have been influenced by non-neutral processes. Here we use a population genetics framework and next generation sequencing to evaluate how gene expression variation is apportioned among four human groups in a natural biological tissue, the placenta. Results: We estimate that on average, 33.2%, 58.9% and 7.8% of the placental transcriptome is explained by variation within individuals, among individuals and among human groups, respectively. Additionally, when technical and biological traits are included in models of gene expression they account for roughly 2% of total gene expression variation. Notably, the variation that is significantly different among groups is enriched in biological pathways associated with immune response, cell signaling and metabolism. Many biological traits demonstrated correlated changes in expression in numerous pathways of potential interest to clinicians and evolutionary biologists. Finally, we estimate that the majority of the human placental transcriptome (65% of expressed genes) exhibits expression profiles consistent with neutrality; the remainder are consistent with stabilizing selection (26%), directional selection (4.9%), or diversifying selection (4.8%). Conclusion: We apportion placental gene expression variation into individual, population and biological trait factors and identify how each influences the transcriptome. Additionally, we advance methods to associate expression profiles with different forms of selection.

Synthesis of phylogeny and taxonomy into a comprehensive tree of life

Karen A Cranston, Open Tree of Life
doi: http://dx.doi.org/10.1101/012260

Reconstructing the phylogenetic relationships that unite all biological lineages (the tree of life) is a grand challenge of biology. However, the paucity of readily available homologous character data across disparately related lineages renders direct phylogenetic inference currently untenable. Our best recourse towards realizing the tree of life is therefore the synthesis of existing collective phylogenetic knowledge available from the wealth of published primary phylogenetic hypotheses, together with taxonomic hierarchy information for unsampled taxa. We combined phylogenetic and taxonomic data to produce a draft tree of life, the Open Tree of Life, containing 2.3 million tips. Realization of this draft tree required the assembly of two resources that should prove valuable to the community: 1) a novel comprehensive global reference taxonomy, and 2) a database of published phylogenetic trees mapped to this common taxonomy. Our open source framework facilitates community comment and contribution, enabling a continuously updatable tree when new phylogenetic and taxonomic data become digitally available. While data coverage and phylogenetic conflict across the Open Tree of Life illuminate significant gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point from which we can continue to improve through community contributions. Having a comprehensive tree of life will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change studies, agriculture, and genomics.

SpeedSeq: Ultra-fast personal genome analysis and interpretation

Colby Chiang, Ryan M Layer, Gregory G Faust, Michael R Lindberg, David B Rose, Erik P Garrison, Gabor T Marth, Aaron R Quinlan, Ira M Hall
doi: http://dx.doi.org/10.1101/012179

Comprehensive interpretation of human genome sequencing data is a challenging bioinformatic problem that typically requires weeks of analysis, with extensive hands-on expert involvement. This informatics bottleneck inflates genome sequencing costs, poses a computational burden for large-scale projects, and impedes the adoption of time-critical clinical applications such as personalized cancer profiling and newborn disease diagnosis, where the actionable timeframe can be measured in hours or days. We developed SpeedSeq, an open-source genome analysis platform that vastly reduces computing time. SpeedSeq accomplishes read alignment, duplicate removal, variant detection and functional annotation of a 50X human genome in <24 hours, even using one low-cost server. SpeedSeq offers competitive or superior performance to current methods for detecting germline and somatic single nucleotide variants (SNVs), indels, and structural variants (SVs) and includes novel functionality for SV genotyping, SV annotation, fusion gene detection, and rapid identification of actionable mutations. SpeedSeq will help bring timely genome analysis into the clinical realm.

Local and systemic gene expression responses to a white syndrome-like disease in a reef building coral, Acropora hyacinthus.

Rachel M Wright, Galina V Aglyamova, Eli Meyer, Mikhail V Matz
doi: http://dx.doi.org/10.1101/012211

Background: Corals are capable of launching diverse immune defenses at the site of direct contact with pathogens, but the molecular mechanisms of this activity and the colony-wide effects of such stressors remain poorly understood. Here we compared gene expression profiles in eight healthy Acropora hyacinthus colonies against eight colonies exhibiting white syndrome-like symptoms, all collected from a natural reef environment near Palau. Two types of tissues were sampled from diseased corals: visibly affected and apparently healthy tissues. Results: Tag-based RNA-Seq followed by weighted gene co-expression network analysis identified groups of co-regulated differentially expressed genes between all disease states (diseased, ahead of the lesion, and healthy). Most of the differentially expressed genes were found between tissues at the lesions and asymptomatic (healthy and ahead of the lesion) tissues. These genes were related to innate immunity, oxidative stress responses, lipid metabolism, and calcification. Network analysis also revealed groups of genes regulated specifically in the tissues from diseased colonies that were not yet showing obvious symptoms of disease, indicating a systemic response to infection. Conclusions: These observations suggest that tissues ahead of the lesion of disease progression exist in a transitional state between health and lesion appearance. Alternatively, these gene expression profiles capture physiological differences between colonies with varying disease susceptibilities.

Likelihood Estimation with Incomplete Array Variate Observations

Deniz Akdemir
doi: http://dx.doi.org/10.1101/012278

Missing data present an important challenge when dealing with high dimensional data arranged in the form of an array. In this paper, we propose methods for estimating the parameters of the array variate normal probability model from partially observed multi-way data. The methods developed here are useful for missing data imputation and for estimation of mean and covariance parameters for multi-way data. A multi-way semi-parametric mixed effects model that allows separation of multi-way covariance effects is also defined, and an efficient algorithm for estimation based on the spectral decompositions of the covariance parameters is recommended. We demonstrate our methods with simulations and with real life data involving the estimation of genotype and environment interaction effects on possibly correlated traits.

A new hierarchy of phylogenetic models consistent with heterogeneous substitution rates

Michael D. Woodhams, Jesús Fernández-Sánchez, Jeremy G. Sumner
(Submitted on 4 Dec 2014)

When the process underlying DNA substitutions varies across evolutionary history, the standard Markov models underlying standard phylogenetic methods are mathematically inconsistent. The most prominent example is the general time reversible model (GTR) together with some, but not all, of its submodels. To rectify this deficiency, Lie Markov models have been developed as the class of models that are consistent in the face of a changing process of DNA substitutions. Some well-known models in popular use are within this class, but are either overly simplistic (e.g. the Kimura two-parameter model) or overly complex (the general Markov model). On a diverse set of biological data sets, we test a hierarchy of Lie Markov models spanning the full range of parameter richness. Compared against the benchmark of the ever-popular GTR model, we find that as a whole the Lie Markov models perform remarkably well, with the best performing models having eight parameters and the ability to recognise the distinction between purines and pyrimidines.
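
For readers unfamiliar with the Kimura two-parameter model mentioned above, the following minimal sketch builds its substitution rate matrix in the textbook form: one rate for transitions (purine to purine, or pyrimidine to pyrimidine) and another for transversions. The function name `k2p_rate_matrix` and the parameter names `alpha` and `beta` are assumptions chosen for illustration; this is not the paper's Lie Markov hierarchy, only the simplest member the abstract names.

```python
BASES = "ACGT"
PURINES = {"A", "G"}  # C and T are the pyrimidines


def is_transition(x, y):
    """True if x -> y stays within purines or within pyrimidines."""
    return (x in PURINES) == (y in PURINES)


def k2p_rate_matrix(alpha, beta):
    """Rate matrix with transition rate alpha and transversion rate beta.

    Rows and columns follow the order A, C, G, T. Each diagonal entry is
    set so that its row sums to zero, as required of a rate matrix.
    """
    q = []
    for i, x in enumerate(BASES):
        row = []
        for y in BASES:
            if x == y:
                row.append(0.0)  # placeholder, filled in below
            elif is_transition(x, y):
                row.append(alpha)
            else:
                row.append(beta)
        row[i] = -sum(row)
        q.append(row)
    return q


# Illustrative rates only; they are not taken from the paper.
Q = k2p_rate_matrix(alpha=2.0, beta=0.5)
for base, row in zip(BASES, Q):
    print(base, row)
```

The purine/pyrimidine grouping in `is_transition` is the same distinction the abstract reports the best-performing eight-parameter models as being able to recognise, though those models are considerably richer than this two-parameter case.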

Revealing missing isoforms encoded in the human genome by integrating genomic, transcriptomic and proteomic data

Zhiqiang Hu, Hamish S. Scott, Guangrong Qin, Guangyong Zheng, Xixia Chu, Lu Xie, David L. Adelson, Bergithe E. Oftedal, Parvalthy Venugopal, Milena Barbic, Christopher N. Hahn, Bing Zhang, Xiaojing Wang, Nan Li, Chaochun Wei
doi: http://dx.doi.org/10.1101/012112

Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing and is much larger than the number of human genes. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to known transcripts, the expression of these novel transcripts was highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected from shotgun proteomics data of 41 breast samples. We also showed that L1 retrotransposons have a more significant impact on the origin of new transcripts/genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread for genes involved in specific biological functions like protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. Finally, the total number of human transcripts with protein-coding potential was estimated to be at least 204,950.