Rapid host switching in generalist Campylobacter strains erodes the signal for tracing human infections

Bethany L. Dearlove, Alison J. Cody, Ben Pascoe, Guillaume Méric, Daniel J. Wilson, Samuel K. Sheppard
(Submitted on 7 Apr 2015)

Campylobacter jejuni and Campylobacter coli are the biggest causes of bacterial gastroenteritis in the developed world, with human infections typically arising from zoonotic transmission associated with infected meat, especially poultry. Because this organism is not thought to survive well outside of the gut, host-associated populations are genetically isolated to varying degrees. Therefore, the likely origin of most Campylobacter strains can be determined by host-associated variation in the genome. This is instructive for characterizing the source of human infection at the population level. However, some common strains appear to have broad host ranges, hindering source attribution. Whole genome sequencing has the potential to reveal fine-scale genetic structure associated with host specificity within each of these strains.
We found that rates of zoonotic transmission among animal host species in ST-21, ST-45 and ST-828 clonal complexes were so high that the signal of host association is all but obliterated. We attributed 89% of clinical cases to a chicken source, 10% to cattle and 1% to pig. Our results reveal that common strains of C. jejuni and C. coli infectious to humans are adapted to a generalist lifestyle, permitting rapid transmission between different hosts. Furthermore, they show that the weak signal of host association within these complexes presents a challenge for pinpointing the source of clinical infections, underlining the view that whole genome sequencing, powerful though it is, cannot substitute for intensive sampling of suspected transmission reservoirs.

Ultra-large alignments using Phylogeny-aware Profiles

Nam-phuong Nguyen, Siavash Mirarab, Keerthana Kumar, Tandy Warnow
(Submitted on 5 Apr 2015)

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments (MSAs) and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, an MSA method that uses a new machine learning technique – the Ensemble of Hidden Markov Models – that we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at this https URL

Phylogenomic analyses support traditional relationships within Cnidaria

Felipe Zapata, Freya E Goetz, Stephen A Smith, Mark Howison, Stefan Siebert, Samuel Church, Steven M Sanders, Cheryl Lewis Ames, Catherine S McFadden, Scott C France, Marymegan Daly, Allen G Collins, Steven HD Haddock, Casey Dunn, Paulyn Cartwright
doi: http://dx.doi.org/10.1101/017632

Cnidaria, the sister group to Bilateria, is a highly diverse group of animals in terms of morphology, lifecycles, ecology, and development. How this diversity originated and evolved is not well understood because phylogenetic relationships among major cnidarian lineages are unclear, and recent studies present contrasting phylogenetic hypotheses. Here, we use transcriptome data from 15 newly-sequenced species in combination with 26 publicly available genomes and transcriptomes to assess phylogenetic relationships among major cnidarian lineages. Phylogenetic analyses using different partition schemes and models of molecular evolution, as well as topology tests for alternative phylogenetic relationships, support the monophyly of Medusozoa, Anthozoa, Octocorallia, Hydrozoa, and a clade consisting of Staurozoa, Cubozoa, and Scyphozoa. Support for the monophyly of Hexacorallia is weak due to the equivocal position of Ceriantharia. Taken together, these results further resolve deep cnidarian relationships, largely support traditional phylogenetic views on relationships, and provide a historical framework for studying the evolutionary processes involved in one of the most ancient animal radiations.

Genomic prediction of celiac disease targeting HLA-positive individuals

Gad Abraham, Alexia Rohmer, Jason A Tye-Din, Michael Inouye
doi: http://dx.doi.org/10.1101/017608

Background: Genomic prediction aims to leverage genome-wide genetic data towards better disease diagnostics and risk scores. We have previously published a genomic risk score (GRS) for celiac disease (CD), a common and highly heritable autoimmune disease, which differentiates between CD cases and population-based controls at a clinically-relevant predictive level, improving upon other gene-based approaches. HLA risk haplotypes, particularly HLA-DQ2.5, are necessary but not sufficient for CD, with at least one HLA risk haplotype present in up to half of most Caucasian populations. Here, we assess a genomic prediction strategy that specifically targets this common genetic susceptibility subtype, utilizing a supervised learning procedure for CD that leverages known HLA-DQ2.5 risk. Methods: Using L1/L2-regularized support-vector machines trained on large European case-control datasets, we constructed novel CD GRSs specific to individuals with HLA-DQ2.5 risk haplotypes (GRS-DQ2.5) and compared them with the predictive power of the existing CD GRS (GRS14) as well as two haplotype-based approaches, externally validating the results in a North American case-control study. Results: Consistent with previous observations, both the existing GRS14 and the GRS-DQ2.5 had better predictive performance than the HLA haplotype approaches. GRS-DQ2.5 models, based on directly genotyped or imputed markers, achieved similar levels of predictive performance (AUC = 0.718–0.73), which were substantially higher than those obtained from the DQ2.5 zygosity alone (AUC = 0.558), the HLA risk haplotype method (AUC = 0.634), or the generic GRS14 (AUC = 0.679). In a screening model of at-risk individuals, the GRS-DQ2.5 lowered the number of unnecessary follow-up tests for CD across most sensitivity levels. Relative to a baseline implicating all DQ2.5-positive individuals for follow-up, the GRS-DQ2.5 resulted in a net saving of 2.2 unnecessary follow-up tests for each justified test while still capturing 90% of DQ2.5-positive CD cases. Conclusions: Genomic risk scores for CD that target genetically at-risk sub-groups improve predictive performance beyond traditional approaches and may represent a useful strategy for prioritizing individuals at increased risk of disease, thus potentially reducing unnecessary follow-up diagnostic tests.
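For readers comparing the AUC values quoted above, it may help to recall that the AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The following is a minimal sketch of that definition only; the function name and the toy scores are illustrative assumptions and are not taken from the study.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Illustrative helper (not from the paper): counts the fraction of
    case/control pairs in which the case has the higher risk score,
    giving half credit to ties.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy risk scores, invented purely for illustration.
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.7, 0.3]
toy_auc = auc(toy_cases, toy_controls)  # 5 of the 6 pairs are ranked correctly
```

Under this reading, an AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance.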

Testing for ancient selection using cross-population allele frequency differentiation

Fernando Racimo
doi: http://dx.doi.org/10.1101/017566

A powerful way to detect selection in a population is by modeling local allele frequency changes in a particular region of the genome under scenarios of selection and neutrality, and finding which model is most compatible with the data. Chen et al. (2010) developed a composite likelihood method called XP-CLR that uses an outgroup population to detect departures from neutrality which could be compatible with hard or soft sweeps, at linked sites near a beneficial allele. However, this method is most sensitive to recent selection and may miss selective events that happened a long time ago. To overcome this, we developed an extension of XP-CLR that jointly models the behavior of a selected allele in a three-population tree. Our method – called 3P-CLR – outperforms XP-CLR when testing for selection that occurred before two populations split from each other, and can distinguish between those events and events that occurred specifically in each of the populations after the split. We applied our new test to population genomic data from the 1000 Genomes Project, to search for selective sweeps that occurred before the split of Africans and Eurasians, but after their split from Neanderthals, and that could have presumably led to the fixation of modern-human-specific phenotypes. We also searched for sweep events that occurred in East Asians, Europeans and the ancestors of both populations, after their split from Africans.

Genome of octoploid plant maca (Lepidium meyenii) illuminates genomic basis for high altitude adaptation in the central Andes

Jun Sheng, Wei Chen, Yang Dong, Liangsheng Zhang, Jing Zhang, Yang Tian, Liang Yan, Guanghui Zhang, Xiao Wang, Yan Zeng, Jiajin Zhang, Xiao Ma, Yuntao Tan, Ni Long, Yangzi Wang, Yujin Ma, Yu Xue, Shumei Hao, Shengchao Yang, Wen Wang
doi: http://dx.doi.org/10.1101/017590

Maca (Lepidium meyenii Walp, 2n = 8x = 64), a member of the Brassicaceae family, is an economically important Andean plant cultivated at 4000-4500 meters in the central sierra of Peru. Considering that the rapid uplift of the central Andes occurred 5 to 10 million years ago (Mya), an evolutionary question arises as to how plants like maca acquired high-altitude adaptation within a short geological period. Here, we report a high-quality genome assembly of maca, in which two closely spaced maca-specific whole genome duplications (WGDs, ~6.7 Mya) were identified. Comparative genomics between maca and closely related Brassicaceae species revealed expansions of maca genes and gene families involved in abiotic stress response, hormone signaling pathways and secondary metabolite biosynthesis via WGDs. Retention and subsequent evolution of many duplicated genes may account for the morphological and physiological changes (i.e. small leaf shape and loss of vernalization) in maca for the high-altitude environment. Additionally, some duplicated maca genes under positive selection were identified with functions in morphological adaptation (i.e. MYB59) and development (i.e. GDPD5 and HDA9). Collectively, the octoploid maca genome sheds light on the important roles of WGDs in plant high-altitude adaptation in the Andes.

Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis

Yuzhen Ye, Haixu Tang
(Submitted on 6 Apr 2015)

Metagenomics research has accelerated the studies of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (studies of the transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for providing additional insights into functional and regulatory characteristics of the microbial communities. Current metatranscriptomics projects are often carried out without matched metagenomic datasets (of the same microbial communities). For the projects that produce both metatranscriptomic and metagenomic datasets, their analyses are often not integrated. Metagenome assemblies are far from perfect, which partially explains why they are not used for the analysis of metatranscriptomic datasets. Here we report a read mapping algorithm for mapping short reads onto a de Bruijn graph of assemblies. A hash table of junction k-mers (k-mers spanning branching structures in the de Bruijn graph) is used to facilitate fast mapping of reads to the graph. We developed an application of this mapping algorithm: a reference-based approach to metatranscriptome assembly using graphs of metagenome assembly as the reference. Our results show that this new approach (called TAG) helps to assemble substantially more transcripts that otherwise would have been missed or truncated because of the fragmented nature of the reference metagenome. TAG was implemented in C++ and has been tested extensively on the Linux platform. It is available for download as open source at this http URL
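The idea of a hash table of junction k-mers can be illustrated with a toy sketch. This is not TAG's implementation (which is in C++ and is not described in detail in the abstract). It assumes a simple de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers, and it treats a k-mer as a junction k-mer when its edge leaves a node with more than one successor or enters a node with more than one predecessor. All function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all k-mers of seq in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(sequences, k):
    """Build successor/predecessor maps over (k-1)-mer nodes (toy graph)."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for seq in sequences:
        for kmer in kmers(seq, k):
            u, v = kmer[:-1], kmer[1:]
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred


def junction_kmers(succ, pred):
    """Hash table: junction k-mer -> (source node, target node).

    Assumed definition: an edge is a junction if its source node
    branches out or its target node merges several paths.
    """
    table = {}
    for u, targets in succ.items():
        for v in targets:
            if len(targets) > 1 or len(pred.get(v, ())) > 1:
                table[u + v[-1]] = (u, v)
    return table


def anchor_read(read, table, k):
    """List (offset, k-mer) pairs where the read hits a junction k-mer."""
    return [(i, km) for i, km in enumerate(kmers(read, k)) if km in table]


# Two toy contigs that share a prefix and then diverge after "GT".
succ, pred = build_graph(["ACGTA", "ACGTC"], k=3)
table = junction_kmers(succ, pred)
hits = anchor_read("CGTC", table, k=3)
```

In this toy example the read is anchored at the branch it follows, so a constant-time dictionary lookup per k-mer is enough to decide which path through the graph the read supports.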

Denisovan Ancestry in East Eurasian and Native American Populations.

Pengfei Qin, Mark Stoneking
doi: http://dx.doi.org/10.1101/017475

Although initial studies suggested that Denisovan ancestry was found only in modern human populations from island Southeast Asia and Oceania, more recent studies have suggested that Denisovan ancestry may be more widespread. However, the geographic extent of Denisovan ancestry has not been determined, and moreover the relationship between the Denisovan ancestry in Oceania and that elsewhere has not been studied. Here we analyze genome-wide SNP data from 2493 individuals from 221 worldwide populations, and show that there is a widespread signal of a very low level of Denisovan ancestry across Eastern Eurasian and Native American (EE/NA) populations. We also verify a higher level of Denisovan ancestry in Oceania than that in EE/NA; the Denisovan ancestry in Oceania is correlated with the amount of New Guinea ancestry, but not the amount of Australian ancestry, indicating that recent gene flow from New Guinea likely accounts for signals of Denisovan ancestry across Oceania. However, Denisovan ancestry in EE/NA populations is equally correlated with their New Guinea or their Australian ancestry, suggesting a common source for the Denisovan ancestry in EE/NA and Oceanian populations. Our results suggest that Denisovan ancestry in EE/NA is derived either from common ancestry with, or gene flow from, the common ancestor of New Guineans and Australians, indicating a more complex history involving East Eurasians and Oceanians than previously suspected.

Mycobacterial infection induces a specific human innate immune response

John D Blischak, Ludovic Tailleux, Amy Mitrano, Luis B Barreiro, Yoav Gilad
doi: http://dx.doi.org/10.1101/017483

The innate immune system provides the first response to pathogen infection and orchestrates the activation of the adaptive immune system. Though a large component of the innate immune response is common to all infections, pathogen-specific responses have been documented as well. The innate immune response is thought to be especially critical for fighting infection with Mycobacterium tuberculosis (MTB), the causative agent of tuberculosis (TB). While TB can be deadly, only 5-10% of individuals infected with MTB develop active disease. The risk for disease susceptibility is, at least partly, heritable. Studies of inter-individual variation in the innate immune response to MTB infection may therefore shed light on the genetic basis for variation in susceptibility to TB. Yet, to date, we still do not know which properties of the innate immune response are specific to MTB infection and which represent a general response to pathogen infection. To begin addressing this gap, we infected macrophages with eight different bacteria, including different MTB strains and related mycobacteria, and studied the transcriptional response to infection. Although the ensuing gene regulatory responses were largely consistent across the bacterial infection treatments, we were able to identify a novel subset of genes whose regulation was affected specifically by infection with mycobacteria. Genetic variants that are associated with regulatory differences in these genes should be considered candidate loci for explaining inter-individual susceptibility to TB.

Whole Genome Regulatory Variant Evaluation for Transcription Factor Binding

Haoyang Zeng, Tatsunori Hashimoto, Daniel D. Kang, David K. Gifford
doi: http://dx.doi.org/10.1101/017392

Contemporary approaches to predicting single nucleotide polymorphisms (SNPs) that alter transcription factor binding rely upon the sequence affinity of a transcription factor as represented by its canonical motif. WAVE (Whole-genome regulAtory Variants Evaluation) is a novel method for predicting more general regulatory variants that affect transcription factor binding, including those that fall outside of the canonical motif. WAVE learns a k-mer based generative model of transcription factor binding from ChIP-seq data and scores variants using its generative binding model. The k-mers learned by WAVE capture more sequence features in transcription factor binding than a motif-based approach alone, including both a transcription factor’s canonical motif and associated co-factor motifs. WAVE significantly outperforms motif-based methods in predicting SNPs associated with allele-specific binding.
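To make the general idea of k-mer based variant scoring concrete, here is a deliberately simplified sketch. It is not WAVE's generative model, whose details are not given in the abstract. It assumes a hypothetical dictionary of per-k-mer weights (standing in for whatever a model learns from ChIP-seq data) and scores a SNP as the change in summed weight over every k-mer window that overlaps the variant position.

```python
def window_score(seq, pos, k, weights):
    """Sum the weights of all k-mers in seq that overlap position pos.

    `weights` is a hypothetical k-mer -> weight table; k-mers that are
    absent from it contribute zero.
    """
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(start, end + 1))


def variant_effect(seq, pos, alt, k, weights):
    """Score of the alternate allele minus score of the reference allele."""
    alt_seq = seq[:pos] + alt + seq[pos + 1:]
    return window_score(alt_seq, pos, k, weights) - window_score(seq, pos, k, weights)


# Toy weights and sequence, invented purely for illustration.
toy_weights = {"ACG": 1.0, "CGT": 2.0, "AGG": 0.5}
effect = variant_effect("AACGTT", pos=2, alt="G", k=3, weights=toy_weights)
```

Because every overlapping window contributes, a variant can change the score even when it lies outside a single canonical motif, for example by disrupting a k-mer that belongs to a co-factor motif.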