Comparative genomics reveals the origins and diversity of arthropod immune systems

Comparative genomics reveals the origins and diversity of arthropod immune systems
William J Palmer, Francis M Jiggins

While the innate immune system of insects is well-studied, comparatively little is known about how other arthropods defend themselves against infection. We have characterised key immune components in the genomes of five chelicerates, a myriapod and a crustacean. We found clear traces of an ancient origin of innate immunity, with some arthropods having Tolllike receptors and C3-complement factors that are more closely related in sequence or structure to vertebrates than other arthropods. Across the arthropods some components of the immune system, like the Toll signalling pathway, are highly conserved. However, there is also remarkable diversity. The chelicerates apparently lack the Imd signalling pathway and BGRPs–a key class of pathogen recognition receptors. Many genes have large copy number variation across species, and this may sometimes be accompanied by changes in function. For example, peptidoglycan recognition proteins (PGRPs) have frequently lost their catalytic activity and switch between secreted and intracellular forms. There has been extensive duplication of the cellular immune receptor Dscam in several species, which may be an alternative way to generate the high diversity that produced by alternative splicing in insects. Our results provide a detailed analysis of the immune systems of several important groups of animals and lay the foundations for functional work on these groups.

Landscape and evolutionary dynamics of terminal-repeat retrotransposons in miniature (TRIMs) in 48 whole plant genomes

Landscape and evolutionary dynamics of terminal-repeat retrotransposons in miniature (TRIMs) in 48 whole plant genomes
Dongying Gao, Yupeng Li, Brian Abernathy, Scott Jackson

Terminal-repeat retrotransposons in miniature (TRIMs) are structurally similar to long terminal repeat (LTR) retrotransposons except that they are extremely small and difficult to identify. Thus far, only a few TRIMs have been characterized in the euphyllophytes and the evolutionary and biological impacts and transposition mechanism of TRIMs are poorly understood. In this study, we combined de novo and homology-based methods to annotate TRIMs in 48 plant genome sequences, spanning land plants to algae. We found 156 TRIM families, 146 previously undescribed. Notably, we identified the first TRIMs in a lycophyte and non-vascular plants. The majority of the TRIM families were highly conserved and shared within and between plant families. Even though TRIMs contribute only a small fraction of any plant genome, they are enriched in or near genes and may play important roles in gene evolution. TRIMs were frequently organized into tandem arrays we called TA-TRIMs, another unique feature distinguishing them from LTR retrotransposons. Importantly, we identified putative autonomous retrotransposons that may mobilize specific TRIM elements and detected very recent transpositions of a TRIM in O. sativa. Overall, this comprehensive analysis of TRIMs across the entire plant kingdom provides insight into the evolution and conservation of TRIMs and the functional roles they may play in gene evolution.

CNVkit: Copy number detection and visualization for targeted sequencing using off-target reads

CNVkit: Copy number detection and visualization for targeted sequencing using off-target reads
Eric Talevich, A. Hunter Shain, Boris C. Bastian

Germline copy number variants (CNVs) and somatic copy number alterations (SCNAs) are of significant importance in syndromic conditions and cancer. Massive parallel sequencing is increasingly used to infer copy number information from variations in the read depth in sequencing data. However, this approach has limitations in the case of targeted re-sequencing, which leaves gaps in coverage between the regions chosen for enrichment and introduces biases related to the efficiency of target capture and library preparation. We present a method for copy number detection, implemented in the software package CNVkit, that uses both the targeted reads and the nonspecifically captured off-target reads to infer copy number evenly across the genome. This combination achieves both exon-level resolution in targeted regions and sufficient resolution in the larger intronic and intergenic regions to identify copy number changes. In particular, we successfully inferred copy number at equivalent to 100-kilobase resolution genome-wide from a platform targeting as few as 293 genes. After normalizing read counts to a pooled reference, we evaluated and corrected for three sources of bias that explain most of the extraneous variability in the sequencing read depth: GC content, target footprint size and spacing, and repetitive sequences. We compared the performance of CNVkit to copy number changes identified by array comparative genomic hybridization. We packaged the components of CNVkit so that it is straightforward to use and provides visualizations, detailed reporting of significant features, and export options for compatibility with other software. Availability:

RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing

RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing

Vikas Gupta, April Dawn Estrada, Ivory Clabaugh Blakley, Rob Reid, Ketan Patel, Mason D. Meyer, Stig Uggerhoj Andersen, Allan F. Brown, Mary Ann Lila, Ann Loraine

Background: Blueberries are a rich source of antioxidants and other beneficial compounds that can protect against disease. Identifying genes involved in synthesis of bioactive compounds could enable breeding berry varieties with enhanced health benefits. Results: Toward this end, we annotated a draft blueberry genome assembly using RNA-Seq data from five stages of berry fruit development and ripening. Genome-guided assembly of RNA-Seq read alignments combined with output from ab initio gene finders produced around 60,000 gene models, of which more than half were similar to proteins from other species, typically the grape Vitis vinifera. Comparison of gene models to the PlantCyc database of metabolic pathway enzymes identified candidate genes involved in synthesis of bioactive compounds, including bixin, an apocarotenoid with potential disease-fighting properties, and defense-related cyanogenic glycosides, which are toxic. Cyanogenic glycoside (CG) biosynthetic enzymes were highly expressed in green fruit, and a candidate CG detoxification enzyme was up regulated during fruit ripening. Candidate genes for ethylene, anthocyanin, and 400 other biosynthetic pathways were also identified. RNA-Seq expression profiling showed that blueberry growth, maturation, and ripening involve dynamic gene expression changes, including coordinated up and down regulation of metabolic pathway enzymes, cell growth-related genes, and putative transcriptional regulators. Analysis of RNA-seq alignments also identified developmentally regulated alternative splicing, promoter use, and 3′ end formation. Conclusions: We report genome sequence, gene models, functional annotations, and RNA-Seq expression data which provide an important new resource enabling high throughput studies in blueberry. RNA-Seq data are freely available for visualization in Integrated Genome Browser, and analysis code is available from the git repository at

Origins and impacts of new exons

Origins and impacts of new exons
Jason Merkin*, Ping Chen*, Sampsa Hautaniemi, Christopher Burge

Mammalian genes are typically broken into several protein-coding and non-coding exons, but the evolutionary origins and functions of new exons are not well understood. Here, we analyzed patterns of exon gain using deep cDNA sequencing data from several mammals and one bird, identifying thousands of species- and lineage-specific exons. While exons conserved across mammals are mostly protein-coding and constitutively spliced, species-specific exons were mostly located in 5′ untranslated regions and alternatively spliced. New exons most often derived from unique intronic sequence rather than repetitive elements, and were associated with upstream intronic deletions, increased nucleosome occupancy and RNA polymerase II pausing. Surprisingly, exon gain was associated with increased gene expression, but only in tissues where the exon was included, suggesting that splicing enhances steady-state mRNA levels and that changes in splicing represent a major contributor to the evolution of gene expression.

Genome-wide Comparative Analysis Reveals Possible Common Ancestors of NBS Domain Containing Genes in Hybrid Citrus sinensis Genome and Original Citrus clementina Genome

Genome-wide Comparative Analysis Reveals Possible Common Ancestors of NBS Domain Containing Genes in Hybrid Citrus sinensis Genome and Original Citrus clementina Genome

Yunsheng Wang, Lijuan Zhou, Dazhi Li, Amy Lawton-Rauh, Pradip K. Srimani, Liangying Dai, Yongping Duan, Feng Luo

Background Recently available whole genome sequences of three citrus species: one Citrus clementina and two Citrus sinensis genomes have made it possible to understand the features of candidate disease resistance genes with nucleotide-binding sites (NBS) domain in Citrus and how NBS genes differ between hybrid and original Citrus species. Result We identified and re-annotated NBS genes from three citrus genomes and found similar numbers of NBS genes in those citrus genomes. Phylogenetic analysis of all citrus NBS genes across three genomes showed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different groups that contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three citrus genomes. This suggests that NBS genes in three citrus genomes may come from shared ancestral origins. We also mapped the re-sequenced reads of three pomelo and three Mandarin orange genomes onto the Citrus sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genome. The homologous NBS genes in pomelo and mandarin may explain why the NBS genes in their hybrid Citrus sinensis are similar to those in Citrus clementina in this study. Furthermore, sequence variation amongst citrus NBS genes were shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different citrus genomes. Conclusion Our comparative analyses yield valuable insight into the understanding of the structure, evolution and organization of NBS genes in Citrus genomes. There are significantly more NBS genes in Citrus genomes compared to other plant species. NBS genes in hybrid C. sinensis genomes are very similar to those in progenitor C. clementina genome and they may be derived from possible common ancestral gene copies. Furthermore, our comprehensive analysis showed that there are three groups of plant NBS genes while CC-containing NBS genes can be divided into two groups.

Genome sequencing of the perciform fish Larimichthys crocea provides insights into stress adaptation

Genome sequencing of the perciform fish Larimichthys crocea provides insights into stress adaptation

Jingqun Ao, Yinnan Mu, Li-Xin Xiang, DingDing Fan, MingJi Feng, Shicui Zhang, Qiong Shi, Lv-Yun Zhu, Ting Li, Yang Ding, Li Nie, Qiuhua Li, Wei-ren Dong, Liang Jiang, Bing Sun, XinHui Zhang, Mingyu Li, Hai-Qi Zhang, ShangBo Xie, YaBing Zhu, XuanTing Jiang, Xianhui Wang, Pengfei Mu, Wei Chen, Zhen Yue, Zhuo Wang, Jun Wang, Jian-Zhong Shao, Xinhua Chen

The large yellow croaker Larimichthys crocea (L. crocea) is one of the most economically important marine fish in China and East Asian countries. It also exhibits peculiar behavioral and physiological characteristics, especially sensitive to various environmental stresses, such as hypoxia and air exposure. These traits may render L. crocea a good model for investigating the response mechanisms to environmental stress. To understand the molecular and genetic mechanisms underlying the adaptation and response of L. crocea to environmental stress, we sequenced and assembled the genome of L. crocea using a bacterial artificial chromosome and whole-genome shotgun hierarchical strategy. The final genome assembly was 679 Mb, with a contig N50 of 63.11 kb and a scaffold N50 of 1.03 Mb, containing 25,401 protein-coding genes. Gene families underlying adaptive behaviours, such as vision-related crystallins, olfactory receptors, and auditory sense-related genes, were significantly expanded in the genome of L. crocea relative to those of other vertebrates. Transcriptome analyses of the hypoxia-exposed L. crocea brain revealed new aspects of neuro-endocrine-immune/metabolism regulatory networks that may help the fish to avoid cerebral inflammatory injury and maintain energy balance under hypoxia. Proteomics data demonstrate that skin mucus of the air-exposed L. crocea had a complex composition, with an unexpectedly high number of proteins (3,209), suggesting its multiple protective mechanisms involved in antioxidant functions, oxygen transport, immune defence, and osmotic and ionic regulation. Our results provide novel insights into the mechanisms of fish adaptation and response to hypoxia and air exposure.