Reproductive workers show queen-like gene expression in an intermediately eusocial insect, the buff-tailed bumble bee Bombus terrestris

Mark Christian Harrison, Robert L Hammond, Eamonn B Mallon
doi: http://dx.doi.org/10.1101/012500

Bumble bees represent a taxon with an intermediate level of eusociality within Hymenoptera. The clear division of reproduction between a single founding queen and the largely sterile workers is characteristic of highly eusocial species, whereas the morphological similarity between the bumble bee queen and the workers is typical of more primitively eusocial hymenopterans. Also, unlike in other highly eusocial hymenopterans, division of labour among worker sub-castes is plastic and not predetermined by morphology or age. We conducted a differential expression analysis based on RNA-seq data from 11 combinations of developmental stage and caste to investigate how a single genome can produce the distinct castes of queens, workers and males in the buff-tailed bumble bee Bombus terrestris. Based on expression patterns, we found males to be the most distinct of all adult castes (2,411 transcripts differentially expressed compared to non-reproductive workers). However, relatively few transcripts were differentially expressed between males and workers during development (larvae: 71, pupae: 162). This indicates the need for more distinct expression patterns to control behaviour and physiology in adults compared to those required to create different morphologies. Among the female castes, the expression of over ten times more transcripts differed significantly between reproductive workers and their non-reproductive sisters than when comparing reproductive workers to the mother queen. This suggests a strong shift towards a more queen-like behaviour and physiology when a worker becomes fertile. This is in contrast to findings for higher eusocial species, in which reproductive workers are more similar to non-reproductive workers than to the queen.

An experimental test of the relationship between melanism and desiccation survival in insects

Subhash Rajpurohit, Lisa Marie Peterson, Andrew Orr, Anthony J. Marlon, Allen G Gibbs
doi: http://dx.doi.org/10.1101/012369

We used experimental evolution to test the "melanism-desiccation" hypothesis, which proposes that dark cuticle in several Drosophila species is an adaptation for increased desiccation tolerance. We selected for dark and light body pigmentation in replicated populations of D. melanogaster and assayed traits related to water balance. We also scored pigmentation and desiccation tolerance in populations selected for desiccation survival. Populations in both selection regimes showed large differences in the traits directly under selection. However, after over 40 generations of pigmentation selection, dark-selected populations were not more desiccation-tolerant than light-selected and control populations, nor did we find significant changes in carbohydrate amounts that could affect desiccation resistance. Body pigmentation of desiccation-selected populations did not differ from control populations after over 140 generations of selection. Our results do not support an important role for melanization in Drosophila water balance.

DensiTree 2: Seeing Trees Through the Forest

Remco Bouckaert, Joseph Heled
doi: http://dx.doi.org/10.1101/012401

Motivation: Phylogenetic analyses such as Bayesian MCMC or bootstrapping result in a collection of trees. Trees are discrete objects and it is generally difficult to get a mental grip on a distribution over trees. Visualisation tools like DensiTree can give good intuition on tree distributions. DensiTree works by drawing all trees in the set transparently, thus highlighting areas where the trees in the set agree. In this way, both uncertainty in clade heights and uncertainty in topology can be visualised. In our experience, a vanilla DensiTree can turn out to be misleading in that it shows too much uncertainty due to wrongly ordering taxa or due to unlucky placement of internal nodes. Results: DensiTree is extended to allow visualisation of meta-data associated with branches such as population size and evolutionary rates. Furthermore, geographic locations of taxa can be shown on a map, making it easy to visually check whether there is some geographic pattern in a phylogeny. Taxa orderings have a large impact on the layout of the tree set, and advances have been made in finding better orderings, resulting in significantly more informative visualisations. We also explored various methods for positioning internal nodes, which can improve the quality of the image. Together, these advances make it easier to comprehend distributions over trees. Availability: DensiTree is freely available from http://compevol.auckland.ac.nz/software/.

Synthesis of phylogeny and taxonomy into a comprehensive tree of life

Karen A Cranston, Open Tree of Life
doi: http://dx.doi.org/10.1101/012260

Reconstructing the phylogenetic relationships that unite all biological lineages (the tree of life) is a grand challenge of biology. However, the paucity of readily available homologous character data across disparately related lineages renders direct phylogenetic inference currently untenable. Our best recourse towards realizing the tree of life is therefore the synthesis of existing collective phylogenetic knowledge available from the wealth of published primary phylogenetic hypotheses, together with taxonomic hierarchy information for unsampled taxa. We combined phylogenetic and taxonomic data to produce a draft tree of life, the Open Tree of Life, containing 2.3 million tips. Realization of this draft tree required the assembly of two resources that should prove valuable to the community: 1) a novel comprehensive global reference taxonomy, and 2) a database of published phylogenetic trees mapped to this common taxonomy. Our open source framework facilitates community comment and contribution, enabling a continuously updatable tree when new phylogenetic and taxonomic data become digitally available. While data coverage and phylogenetic conflict across the Open Tree of Life illuminate significant gaps in both the underlying data available for phylogenetic reconstruction and the publication of trees as digital objects, the tree provides a compelling starting point from which we can continue to improve through community contributions. Having a comprehensive tree of life will fuel fundamental research on the nature of biological diversity, ultimately providing up-to-date phylogenies for downstream applications in comparative biology, ecology, conservation biology, climate change studies, agriculture, and genomics.

SpeedSeq: Ultra-fast personal genome analysis and interpretation

Colby Chiang, Ryan M Layer, Gregory G Faust, Michael R Lindberg, David B Rose, Erik P Garrison, Gabor T Marth, Aaron R Quinlan, Ira M Hall
doi: http://dx.doi.org/10.1101/012179

Comprehensive interpretation of human genome sequencing data is a challenging bioinformatic problem that typically requires weeks of analysis, with extensive hands-on expert involvement. This informatics bottleneck inflates genome sequencing costs, poses a computational burden for large-scale projects, and impedes the adoption of time-critical clinical applications such as personalized cancer profiling and newborn disease diagnosis, where the actionable timeframe can measure in hours or days. We developed SpeedSeq, an open-source genome analysis platform that vastly reduces computing time. SpeedSeq accomplishes read alignment, duplicate removal, variant detection and functional annotation of a 50X human genome in <24 hours, even using one low-cost server. SpeedSeq offers competitive or superior performance to current methods for detecting germline and somatic single nucleotide variants (SNVs), indels, and structural variants (SVs) and includes novel functionality for SV genotyping, SV annotation, fusion gene detection, and rapid identification of actionable mutations. SpeedSeq will help bring timely genome analysis into the clinical realm.

Local and systemic gene expression responses to a white syndrome-like disease in a reef building coral, Acropora hyacinthus.

Rachel M Wright, Galina V Aglyamova, Eli Meyer, Mikhail V Matz
doi: http://dx.doi.org/10.1101/012211

Background: Corals are capable of launching diverse immune defenses at the site of direct contact with pathogens, but the molecular mechanisms of this activity and the colony-wide effects of such stressors remain poorly understood. Here we compared gene expression profiles in eight healthy Acropora hyacinthus colonies against eight colonies exhibiting white syndrome-like symptoms, all collected from a natural reef environment near Palau. Two types of tissues were sampled from diseased corals: visibly affected and apparently healthy tissues. Results: Tag-based RNA-Seq followed by weighted gene co-expression network analysis identified groups of co-regulated differentially expressed genes between all disease states (diseased, ahead of the lesion, and healthy). Most of the differentially expressed genes were found between tissues at the lesions and asymptomatic (healthy and ahead of the lesion) tissues. These genes were related to innate immunity, oxidative stress responses, lipid metabolism, and calcification. Network analysis also revealed groups of genes regulated specifically in the tissues from diseased colonies that were not yet showing obvious symptoms of disease, indicating a systemic response to infection. Conclusions: These observations suggest that tissues ahead of the lesion of disease progression exist in a transitional state between health and lesion appearance. Alternatively, these gene expression profiles capture physiological differences between colonies with varying disease susceptibilities.

Likelihood Estimation with Incomplete Array Variate Observations

Deniz Akdemir
doi: http://dx.doi.org/10.1101/012278

Missing data present an important challenge when dealing with high dimensional data arranged in the form of an array. In this paper, we propose methods for estimating the parameters of the array variate normal probability model from partially observed multi-way data. The methods developed here are useful for missing data imputation and for estimation of mean and covariance parameters for multi-way data. A multi-way semi-parametric mixed effects model that allows separation of multi-way covariance effects is also defined, and an efficient algorithm for estimation based on the spectral decompositions of the covariance parameters is recommended. We demonstrate our methods with simulations and with real life data involving the estimation of genotype and environment interaction effects on possibly correlated traits.

A new hierarchy of phylogenetic models consistent with heterogeneous substitution rates

Michael D. Woodhams, Jesús Fernández-Sánchez, Jeremy G. Sumner
(Submitted on 4 Dec 2014)

When the process underlying DNA substitutions varies across evolutionary history, the standard Markov models underlying standard phylogenetic methods are mathematically inconsistent. The most prominent example is the general time reversible model (GTR) together with some, but not all, of its submodels. To rectify this deficiency, Lie Markov models have been developed as the class of models that are consistent in the face of a changing process of DNA substitutions. Some well-known models in popular use are within this class, but are either overly simplistic (e.g. the Kimura two-parameter model) or overly complex (the general Markov model). On a diverse set of biological data sets, we test a hierarchy of Lie Markov models spanning the full range of parameter richness. Compared against the benchmark of the ever-popular GTR model, we find that as a whole the Lie Markov models perform remarkably well, with the best performing models having eight parameters and the ability to recognise the distinction between purines and pyrimidines.
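As a rough illustration of the purine/pyrimidine distinction mentioned above, the following sketch builds the rate matrix of the Kimura two-parameter model, in which transitions (purine to purine, or pyrimidine to pyrimidine) occur at one rate and transversions at another. The function name and the rate values `alpha` and `beta` are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a Kimura two-parameter (K2P) rate matrix.
# alpha (transition rate) and beta (transversion rate) are illustrative values.
BASES = ["A", "G", "C", "T"]
PURINES = {"A", "G"}  # C and T are pyrimidines


def is_transition(x: str, y: str) -> bool:
    """True if x -> y stays within purines or within pyrimidines."""
    return (x in PURINES) == (y in PURINES)


def k2p_rate_matrix(alpha: float, beta: float) -> list[list[float]]:
    """Return the 4x4 K2P rate matrix with rows and columns ordered as BASES."""
    q = []
    for x in BASES:
        row = []
        for y in BASES:
            if x == y:
                # Diagonal makes each row sum to zero:
                # one transition target and two transversion targets per base.
                row.append(-(alpha + 2.0 * beta))
            elif is_transition(x, y):
                row.append(alpha)
            else:
                row.append(beta)
        q.append(row)
    return q


Q = k2p_rate_matrix(alpha=2.0, beta=0.5)
for base, row in zip(BASES, Q):
    print(base, row)
```

With only two free parameters, this model sits at the simple end of the range the abstract describes; the richer models in the hierarchy allow more distinct off-diagonal rates.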

Revealing missing isoforms encoded in the human genome by integrating genomic, transcriptomic and proteomic data

Zhiqiang Hu, Hamish S. Scott, Guangrong Qin, Guangyong Zheng, Xixia Chu, Lu Xie, David L. Adelson, Bergithe E. Oftedal, Parvalthy Venugopal, Milena Barbic, Christopher N. Hahn, Bing Zhang, Xiaojing Wang, Nan Li, Chaochun Wei
doi: http://dx.doi.org/10.1101/012112

Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing and is much larger than the number of human genes. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to known transcripts, the expression of these novel transcripts was highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected from shotgun proteomics data of 41 breast samples. We also showed that L1 retrotransposons have a more significant impact on the origin of new transcripts/genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread for genes involved in specific biological functions like protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. In the end, the total number of human transcripts with protein-coding potential was estimated to be at least 204,950.

Explicit modeling of ancestry improves polygenic risk scores and BLUP prediction

Chia-Yen Chen, Jiali Han, David J. Hunter, Peter Kraft, Alkes L. Price
doi: http://dx.doi.org/10.1101/012005

Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate the question of how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. However, we hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color, tanning ability and basal cell carcinoma (BCC) in European Americans (sample size from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRS) and Best Linear Unbiased Prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, the R^2 for hair color increased by 66% (0.0456 to 0.0755; p<10^-16), the R^2 for tanning ability increased by 123% (0.0154 to 0.0344; p<10^-16) and the liability-scale R^2 for BCC increased by 68% (0.0138 to 0.0232; p<10^-16) when explicitly modeling ancestry, which prevents ancestry effects from entering into each SNP effect and being over-weighted. Surprisingly, explicitly modeling ancestry produces a similar improvement when using the BLUP approach, which fits all SNPs simultaneously in a single variance component and causes ancestry to be under-weighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
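The percentage gains quoted in the abstract are relative increases in R-squared, and they can be checked from the before and after values it reports. The short sketch below does that arithmetic; the function and variable names are illustrative and not part of the paper.

```python
# Check the relative R-squared increases quoted in the abstract.
# The (before, after) pairs are the PRS values given in the text.
def relative_increase(before: float, after: float) -> float:
    """Percent increase going from `before` to `after`."""
    return (after - before) / before * 100.0


r2_values = {
    "hair color": (0.0456, 0.0755),
    "tanning ability": (0.0154, 0.0344),
    "BCC (liability scale)": (0.0138, 0.0232),
}

increases = {
    trait: round(relative_increase(before, after))
    for trait, (before, after) in r2_values.items()
}

for trait, pct in increases.items():
    print(f"{trait}: +{pct}%")
# Prints +66%, +123% and +68%, matching the figures in the abstract.
```

Note that these are relative gains on small absolute values: the largest absolute R-squared after modeling ancestry is still 0.0755.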