Local and systemic gene expression responses to a white syndrome-like disease in a reef building coral, Acropora hyacinthus.

Rachel M Wright, Galina V Aglyamova, Eli Meyer, Mikhail V Matz
doi: http://dx.doi.org/10.1101/012211

Background: Corals are capable of launching diverse immune defenses at the site of direct contact with pathogens, but the molecular mechanisms of this activity and the colony-wide effects of such stressors remain poorly understood. Here we compared gene expression profiles in eight healthy Acropora hyacinthus colonies against eight colonies exhibiting white syndrome-like symptoms, all collected from a natural reef environment near Palau. Two types of tissue were sampled from diseased corals: visibly affected and apparently healthy tissue.

Results: Tag-based RNA-Seq followed by weighted gene co-expression network analysis identified groups of co-regulated genes that were differentially expressed among all disease states (diseased, ahead of the lesion, and healthy). Most differentially expressed genes distinguished tissue at the lesions from asymptomatic tissue (healthy and ahead of the lesion). These genes were related to innate immunity, oxidative stress responses, lipid metabolism, and calcification. Network analysis also revealed groups of genes regulated specifically in tissues from diseased colonies that did not yet show obvious symptoms of disease, indicating a systemic response to infection.

Conclusions: These observations suggest that tissue ahead of the advancing lesion exists in a transitional state between health and lesion formation. Alternatively, these gene expression profiles may capture physiological differences between colonies with varying disease susceptibilities.
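Weighted gene co-expression network analysis (WGCNA) builds its network from pairwise gene-gene correlations raised to a power. A minimal sketch of the unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^β, and of per-gene connectivity is given below. The random expression matrix and the choice β = 6 are illustrative assumptions rather than the study's data or settings, and the later WGCNA steps (topological overlap, module detection, module-trait correlation) are omitted.

```python
import numpy as np

def soft_threshold_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Unsigned WGCNA-style adjacency from a samples x genes expression matrix.

    beta = 6 is an illustrative assumption; in practice beta is chosen so the
    network approximates scale-free topology.
    """
    # Pearson correlation between genes (columns)
    corr = np.corrcoef(expr, rowvar=False)
    # Soft thresholding: strong correlations stay large, weak ones shrink toward 0
    return np.abs(corr) ** beta

def connectivity(adj: np.ndarray) -> np.ndarray:
    """Whole-network connectivity: sum of each gene's adjacencies, excluding itself."""
    return adj.sum(axis=1) - np.diag(adj)

# Hypothetical data: 16 samples x 50 genes of random expression values
rng = np.random.default_rng(1)
expr = rng.normal(size=(16, 50))
adj = soft_threshold_adjacency(expr)
k = connectivity(adj)
print(adj.shape, k[:5].round(3))
```

Raising correlations to a power, rather than applying a hard cutoff, keeps the network weighted, so genes that are only moderately correlated still contribute a small amount to connectivity.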

Likelihood Estimation with Incomplete Array Variate Observations

Deniz Akdemir
doi: http://dx.doi.org/10.1101/012278

Missing data present an important challenge when dealing with high-dimensional data arranged in the form of an array. In this paper, we propose methods for estimating the parameters of the array variate normal probability model from partially observed multi-way data. The methods developed here are useful for missing-data imputation and for estimating mean and covariance parameters of multi-way data. A multi-way semi-parametric mixed effects model that allows separation of multi-way covariance effects is also defined, and an efficient estimation algorithm based on the spectral decompositions of the covariance parameters is recommended. We demonstrate our methods with simulations and with real-life data involving the estimation of genotype-by-environment interaction effects on possibly correlated traits.
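In the two-way case, an array variate normal model reduces to a matrix normal distribution, vec(X) ~ N(vec(M), V ⊗ U), with row covariance U, column covariance V and vec() stacking columns. The sketch below assumes the mean and both covariance factors are already known and imputes missing cells by their Gaussian conditional expectation given the observed cells. It is not the paper's estimation algorithm, which also estimates these parameters, and the covariance values are made up for illustration.

```python
import numpy as np

def impute_matrix_normal(X, M, U, V):
    """Fill NaN cells of X with E[missing | observed] under a matrix normal model.

    Assumes vec(X) ~ N(vec(M), kron(V, U)), where U (n x n) is the row
    covariance, V (p x p) the column covariance, and vec() stacks columns.
    """
    x = X.flatten(order="F")           # column-stacking vec()
    mu = M.flatten(order="F")
    S = np.kron(V, U)                  # full covariance; only practical for small arrays
    miss = np.isnan(x)
    obs = ~miss
    # Conditional mean: mu_m + S_mo S_oo^{-1} (x_o - mu_o)
    S_oo = S[np.ix_(obs, obs)]
    S_mo = S[np.ix_(miss, obs)]
    x_filled = x.copy()
    x_filled[miss] = mu[miss] + S_mo @ np.linalg.solve(S_oo, x[obs] - mu[obs])
    return x_filled.reshape(X.shape, order="F")

# Hypothetical 3 x 2 example: rows could be genotypes, columns correlated traits
U = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])
V = np.array([[1.0, 0.8],
              [0.8, 1.0]])
M = np.zeros((3, 2))
X = np.array([[1.0, np.nan],
              [0.5, 0.4],
              [np.nan, -0.3]])
print(impute_matrix_normal(X, M, U, V))
```

Building kron(V, U) explicitly scales poorly with array size; the spectral-decomposition approach described in the abstract is aimed at avoiding exactly that cost.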

A new hierarchy of phylogenetic models consistent with heterogeneous substitution rates

Michael D. Woodhams, Jesús Fernández-Sánchez, Jeremy G. Sumner
(Submitted on 4 Dec 2014)

When the process underlying DNA substitutions varies across evolutionary history, the Markov models underlying standard phylogenetic methods are mathematically inconsistent. The most prominent example is the general time-reversible (GTR) model, together with some, but not all, of its submodels. To rectify this deficiency, Lie Markov models have been developed as the class of models that remain consistent in the face of a changing process of DNA substitution. Some well-known models in popular use belong to this class but are either overly simplistic (e.g. the Kimura two-parameter model) or overly complex (the general Markov model). On a diverse set of biological data sets, we test a hierarchy of Lie Markov models spanning the full range of parameter richness. Compared against the benchmark of the ever-popular GTR model, we find that as a whole the Lie Markov models perform remarkably well, with the best-performing models having eight parameters and the ability to recognise the distinction between purines and pyrimidines.
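The consistency property can be checked numerically for the Kimura two-parameter (K2P) model named above. All K2P rate matrices commute, so their Lie bracket [Q1, Q2] = Q1Q2 − Q2Q1 is zero, and substituting under one K2P process and then another gives the same transition matrix as a single K2P process with summed rate parameters. The sketch below orders states A, G, C, T, uses hypothetical rates, and computes matrix exponentials with a small NumPy scaling-and-squaring Taylor series to avoid extra dependencies. It illustrates closure for this one model only and does not implement the paper's hierarchy.

```python
import numpy as np

def k2p_rate_matrix(alpha: float, beta: float) -> np.ndarray:
    """K2P rate matrix with states ordered A, G, C, T (purines first).

    alpha: transition rate (A<->G, C<->T); beta: rate of each transversion.
    """
    d = -(alpha + 2 * beta)            # diagonal makes each row sum to zero
    return np.array([
        [d,     alpha, beta,  beta],
        [alpha, d,     beta,  beta],
        [beta,  beta,  d,     alpha],
        [beta,  beta,  alpha, d],
    ])

def expm_taylor(Q: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential via scaling and squaring plus a truncated Taylor series."""
    norm = np.linalg.norm(Q, ord=1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    A = Q / (2 ** s)                   # scale so the series converges quickly
    result = np.eye(Q.shape[0])
    term = np.eye(Q.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        result = result + term
    for _ in range(s):                 # undo the scaling by repeated squaring
        result = result @ result
    return result

# Two hypothetical epochs with different K2P rates (branch lengths folded into rates)
Q1 = k2p_rate_matrix(alpha=0.3, beta=0.05)
Q2 = k2p_rate_matrix(alpha=0.1, beta=0.2)

bracket = Q1 @ Q2 - Q2 @ Q1                        # Lie bracket [Q1, Q2]
P_two_epochs = expm_taylor(Q1) @ expm_taylor(Q2)   # process changes along the branch
P_single = expm_taylor(Q1 + Q2)                    # one K2P process, summed rates

print(np.abs(bracket).max(), np.allclose(P_two_epochs, P_single))
```

For GTR, by contrast, combining two epochs with different parameters does not in general yield another GTR transition matrix, which is the inconsistency the abstract describes.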

Revealing missing isoforms encoded in the human genome by integrating genomic, transcriptomic and proteomic data

Zhiqiang Hu, Hamish S. Scott, Guangrong Qin, Guangyong Zheng, Xixia Chu, Lu Xie, David L. Adelson, Bergithe E. Oftedal, Parvalthy Venugopal, Milena Barbic, Christopher N. Hahn, Bing Zhang, Xiaojing Wang, Nan Li, Chaochun Wei
doi: http://dx.doi.org/10.1101/012112

Biological and biomedical research relies on a comprehensive understanding of protein-coding transcripts. However, because alternative splicing is so prevalent, the total number of human proteins is still unknown, although it is much larger than the number of human genes. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues and cell lines. PCR followed by MiSeq sequencing validated at least 84.1% of these predicted novel splice sites. In contrast to known transcripts, the expression of these novel transcripts was highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected in shotgun proteomics data from 41 breast samples. We also showed that L1 retrotransposons have a more significant impact on the origin of new transcripts and genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread among genes involved in specific biological functions such as protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. Finally, we estimate that the total number of human transcripts with protein-coding potential is at least 204,950.

Explicit modeling of ancestry improves polygenic risk scores and BLUP prediction

Chia-Yen Chen, Jiali Han, David J. Hunter, Peter Kraft, Alkes L. Price
doi: http://dx.doi.org/10.1101/012005

Polygenic prediction using genome-wide SNPs can provide high prediction accuracy for complex traits. Here, we investigate how to account for genetic ancestry when conducting polygenic prediction. We show that the accuracy of polygenic prediction in structured populations may be partly due to genetic ancestry. We further hypothesized that explicitly modeling ancestry could improve polygenic prediction accuracy. We analyzed three GWAS of hair color, tanning ability and basal cell carcinoma (BCC) in European Americans (sample sizes from 7,440 to 9,822) and considered two widely used polygenic prediction approaches: polygenic risk scores (PRS) and Best Linear Unbiased Prediction (BLUP). We compared polygenic prediction without correction for ancestry to polygenic prediction with ancestry as a separate component in the model. In 10-fold cross-validation using the PRS approach, explicitly modeling ancestry increased the R² for hair color by 66% (0.0456 to 0.0755; p < 10⁻¹⁶), the R² for tanning ability by 123% (0.0154 to 0.0344; p < 10⁻¹⁶) and the liability-scale R² for BCC by 68% (0.0138 to 0.0232; p < 10⁻¹⁶), because it prevents ancestry effects from entering into each SNP effect and being over-weighted. Surprisingly, explicitly modeling ancestry produces a similar improvement with the BLUP approach, which fits all SNPs simultaneously in a single variance component and thereby causes ancestry to be under-weighted. We validate our findings via simulations, which show that the differences in prediction accuracy will increase in magnitude as sample sizes increase. In summary, our results show that explicitly modeling ancestry can be important in both PRS and BLUP prediction.
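To make the PRS comparison concrete, here is a minimal sketch on simulated data of the two variants: a standard PRS weighted by marginal SNP effects, and a PRS whose SNP effects are estimated after regressing out an ancestry covariate (here the top genotype principal component), with that component then entering the prediction model as a separate term. The simulation design, the single-PC ancestry proxy and all parameter values are assumptions for illustration. This is not the authors' pipeline: it uses one train/test split instead of 10-fold cross-validation, computes R² by regressing test-set phenotypes on the predictors, and does not cover BLUP.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 300                       # hypothetical numbers of samples and SNPs

# Simulated continuous ancestry that shifts allele frequencies and the trait
anc = rng.uniform(0, 1, n)
p0 = rng.uniform(0.1, 0.9, m)
p1 = np.clip(p0 + rng.normal(0, 0.2, m), 0.05, 0.95)
freqs = np.outer(1 - anc, p0) + np.outer(anc, p1)
G = rng.binomial(2, freqs).astype(float)          # n x m genotype dosages

beta = np.zeros(m)
causal = rng.choice(m, 30, replace=False)
beta[causal] = rng.normal(0, 0.3, 30)
y = G @ beta + 1.0 * anc + rng.normal(0, 1, n)    # ancestry also shifts the trait

# Ancestry proxy: first principal component of standardized genotypes
Gs = (G - G.mean(0)) / (G.std(0) + 1e-12)
U_, S_, _ = np.linalg.svd(Gs, full_matrices=False)
pc1 = U_[:, 0] * S_[0]

def residualize(A, c):
    """Remove an intercept and covariate c from A (vector or matrix) by OLS."""
    X = np.column_stack([np.ones(len(c)), c])
    coef, *_ = np.linalg.lstsq(X, A, rcond=None)
    return A - X @ coef

def marginal_effects(G, y):
    """Per-SNP simple-regression slopes, used here as PRS weights."""
    Gc = G - G.mean(0)
    yc = y - y.mean()
    return Gc.T @ yc / np.maximum((Gc ** 2).sum(0), 1e-12)

def r_squared(y, X):
    """R^2 of an OLS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid @ resid / ((y - y.mean()) ** 2).sum()

tr, te = np.arange(n) < n // 2, np.arange(n) >= n // 2

# (a) standard PRS: ancestry leaks into the marginal SNP effects
b_std = marginal_effects(G[tr], y[tr])
r2_no_anc = r_squared(y[te], G[te] @ b_std)

# (b) explicit ancestry: adjust SNP effects for PC1, then add PC1 as its own term
b_adj = marginal_effects(residualize(G[tr], pc1[tr]), residualize(y[tr], pc1[tr]))
r2_anc = r_squared(y[te], np.column_stack([G[te] @ b_adj, pc1[te]]))

print(f"R2 without ancestry: {r2_no_anc:.3f}, with ancestry: {r2_anc:.3f}")
```

The design choice mirrored here is the separation of components: ancestry gets its own coefficient instead of being spread across, and re-weighted by, every SNP effect.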

Detecting the anomaly zone in species trees and evidence for a misleading signal in higher-level skink phylogeny (Squamata: Scincidae).

Charles W Linkem, Vladimir N. Minin, Adam D Leache
doi: http://dx.doi.org/10.1101/012096

The anomaly zone presents a major challenge to the accurate resolution of many parts of the Tree of Life. The anomaly zone is defined by the presence of a gene tree topology that is more probable than the topology matching the true species tree. This discrepancy can result from consecutive rapid speciation events in the species tree. As with long-branch attraction, including more data (loci) will only reinforce support for the incorrect species tree. Empirical phylogenetic studies often implement coalescent-based species tree methods to avoid the anomaly zone, but until now these studies have lacked a method for providing direct evidence that the species tree is actually in the anomaly zone. In this study, we use 16 species of lizards in the family Scincidae to investigate whether nodes that are difficult to resolve are located within the anomaly zone. We analyze new phylogenomic data (429 loci), using both concatenation and coalescent-based species tree estimation, to locate conflicting topological signal. We then use the unifying principle of the anomaly zone, together with estimates of ancestral population sizes and species persistence times, to determine whether the observed phylogenetic conflict is a result of the anomaly zone. We identify at least three regions of the Scincidae phylogeny that show demographic signatures consistent with the anomaly zone, and this new information helps reconcile the phylogenetic conflict among previously published studies of these lizards. The anomaly zone presents a real problem in phylogenetics, and our new framework for identifying anomalous relationships will help empiricists allocate their resources appropriately to overcome this challenge.
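A short calculation shows why the anomaly zone requires at least four species. For species tree ((A,B),C) with internal branch length T in coalescent units, lineages A and B fail to coalesce along that branch with probability e^(−T), after which all three topologies are equally likely. The matching gene tree therefore has probability 1 − (2/3)e^(−T) and each discordant tree (1/3)e^(−T), so the matching topology is the most probable for every T > 0; anomalous gene trees first appear with four taxa and two short consecutive internal branches. The sketch simply evaluates these standard coalescent formulas and is not the paper's anomaly-zone test.

```python
import math

def three_taxon_gene_tree_probs(t: float) -> dict:
    """Gene tree topology probabilities for species tree ((A,B),C).

    t: internal branch length in coalescent units (generations / 2N for diploids).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    no_coalescence = math.exp(-t)      # A and B enter the root population uncoalesced
    discordant = no_coalescence / 3.0  # each of the two non-matching topologies
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }

for t in (0.01, 0.1, 1.0):
    probs = three_taxon_gene_tree_probs(t)
    print(t, {k: round(v, 4) for k, v in probs.items()})
```

Very short internal branches push all three probabilities toward 1/3, which is why rapid radiations are hard to resolve even before the anomaly zone is reached.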

The infant airway microbiome in health and disease impacts later asthma development

Shu Mei Teo, Danny Mok, Kym Pham, Merci Kusel, Michael Serralha, Niamh Troy, Barbara J Holt, Belinda J Hales, Michael L Walker, Elysia Hollams, Yury H Bochkov, Kristine Grindle, Sebastian L Johnston, James E Gern, Peter D Sly, Patrick G Holt, Kathryn E Holt, Michael Inouye
doi: http://dx.doi.org/10.1101/012070

The nasopharynx (NP) is a reservoir for microbes associated with acute respiratory illnesses (ARI). The development of asthma is initiated during infancy, driven by airway inflammation associated with infections. Here, we report viral and bacterial community profiling of NP aspirates across a birth cohort, capturing all lower respiratory illnesses during the infants' first year of life. Most infants were initially colonized with Staphylococcus or Corynebacterium before stable colonization with Alloiococcus or Moraxella, with transient incursions of Streptococcus, Moraxella or Haemophilus marking virus-associated ARIs. Our data identify the NP microbiome as a determinant of infection spread to the lower airways, of the severity of accompanying inflammatory symptoms, and of the risk of future asthma development. Early asymptomatic colonization with Streptococcus was a strong predictor of asthma, and antibiotic use disrupted asymptomatic colonization patterns.

The rate and molecular spectrum of spontaneous mutations in the GC-rich multi-chromosome genome of Burkholderia cenocepacia

Marcus M Dillon, Way Sung, Michael Lynch, Vaughn S Cooper
doi: http://dx.doi.org/10.1101/011841
Spontaneous mutations are ultimately essential for evolutionary change and are also the root cause of nearly all disease. However, until recently, both biological and technical barriers have prevented detailed analyses of mutation profiles, constraining our understanding of the mutation process to a few model organisms and leaving major gaps in our understanding of how genome content and structure influence mutation. Here, we present a genome-wide view of the molecular mutation spectrum in Burkholderia cenocepacia, a clinically relevant pathogen with high GC content and multiple chromosomes. We find that B. cenocepacia has low genome-wide mutation rates, with insertion-deletion mutations biased towards deletions, consistent with the idea that deletion pressure reduces prokaryotic genome sizes. Unlike previously assayed organisms, B. cenocepacia exhibits a GC-mutation bias, which suggests that at least some genomes with high GC content may be driven to this point by unusual base-substitution mutation pressure. Notably, we also observed variation in both the rates and spectra of mutations among chromosomes, and a significant elevation of G:C>T:A transversions in late-replicating regions. Thus, although some patterns of mutation appear to be highly conserved across cellular life, others vary between species and even between chromosomes of the same species, potentially influencing the evolution of nucleotide composition and genome architecture.
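The link between a GC-mutation bias and high genomic GC content rests on a standard equilibrium argument: if u is the per-site rate of A:T→G:C-direction mutations and v the per-site rate of G:C→A:T-direction mutations, GC content under mutation pressure alone approaches GC* = u / (u + v). The sketch below computes this from hypothetical mutation counts and site numbers, assuming all counts come from the same lines and generations so those factors cancel; the numbers are invented and are not the paper's estimates.

```python
def equilibrium_gc(at_to_gc: int, gc_to_at: int, at_sites: int, gc_sites: int) -> float:
    """Expected GC fraction at mutational equilibrium, GC* = u / (u + v).

    at_to_gc: substitutions that turn an A:T pair into G:C (A:T>G:C, A:T>C:G).
    gc_to_at: substitutions that turn a G:C pair into A:T (G:C>A:T, G:C>T:A).
    at_sites, gc_sites: numbers of A:T and G:C sites surveyed.
    """
    u = at_to_gc / at_sites   # AT->GC rate per A:T site (shared line/generation factors cancel)
    v = gc_to_at / gc_sites   # GC->AT rate per G:C site
    if u + v == 0:
        raise ValueError("no GC-changing mutations observed")
    return u / (u + v)

# Hypothetical counts and site numbers for a GC-rich genome
print(round(equilibrium_gc(at_to_gc=120, gc_to_at=200, at_sites=2_600_000, gc_sites=5_300_000), 3))
```

Most bacteria studied so far show v > u, which would pull GC* below 0.5; a genome whose GC* exceeds 0.5 is being pushed toward GC by mutation itself, which is the kind of bias the abstract reports.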

Most viewed on Haldane’s Sieve: November 2014

The most viewed posts on Haldane’s Sieve this month were:

A new FST-based method to uncover local adaptation using environmental variables

Pierre de Villemereuil, Oscar E. Gaggiotti
Comments: 18 pages, 5 figures, Supplementary Information at the end of the document
Subjects: Populations and Evolution (q-bio.PE)

Genome-scan methods are used for screening genome-wide patterns of DNA polymorphism to detect signatures of positive selection. There are two main types of methods: (i) “outlier” detection methods based on FST that detect loci with high differentiation compared to the rest of the genome, and (ii) environmental association methods that test the association between allele frequencies and environmental variables. In this article, we present a new FST-based genome scan method, BayeScEnv, which incorporates environmental information in the form of “environmental differentiation”. It is based on the F model but, as opposed to existing approaches, it considers two locus-specific effects: one due to divergent selection and another due to other processes such as differences in mutation rates across loci or background selection. Simulation studies showed that our method has a much lower false positive rate than an existing FST-based method, BayeScan, under a wide range of demographic scenarios. Although it had lower power, it offers a better compromise between power and false positive rate. We apply our method to human and salmon datasets and show that it can be used successfully to study local adaptation. The method was developed in C++ and is available at this http URL
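Outlier scans start from locus-specific differentiation. As a minimal illustration only, assuming biallelic loci and equally weighted populations, Nei's G_ST, FST = (H_T − mean H_S) / H_T with expected heterozygosity H = 2p(1 − p), can be computed per locus from population allele frequencies; loci in the upper tail are the candidates a genome scan examines. This is not the F model used by BayeScan or BayeScEnv, which estimate locus- and population-specific F coefficients in a Bayesian framework, and the frequencies below are made up.

```python
import numpy as np

def fst_per_locus(freqs: np.ndarray) -> np.ndarray:
    """Nei's G_ST for biallelic loci.

    freqs: populations x loci array of reference-allele frequencies,
    with populations weighted equally.
    """
    p_bar = freqs.mean(axis=0)                       # pooled allele frequency per locus
    h_t = 2 * p_bar * (1 - p_bar)                    # total expected heterozygosity
    h_s = (2 * freqs * (1 - freqs)).mean(axis=0)     # mean within-population heterozygosity
    with np.errstate(divide="ignore", invalid="ignore"):
        fst = np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)  # monomorphic loci get 0
    return fst

# Hypothetical frequencies: 3 populations x 4 loci; the last locus is strongly differentiated
freqs = np.array([
    [0.50, 0.20, 0.30, 0.05],
    [0.52, 0.25, 0.35, 0.50],
    [0.48, 0.22, 0.25, 0.95],
])
print(fst_per_locus(freqs).round(3))
```

A raw ranking like this ignores demography and sampling noise, which is the gap that model-based methods such as BayeScan and BayeScEnv aim to close.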