Phylogenetic analysis supports a link between DUF1220 domain number and primate brain expansion

Fabian Zimmer, Stephen H Montgomery
doi: http://dx.doi.org/10.1101/018077

The expansion of DUF1220 domain copy number during human evolution is a dramatic example of rapid and repeated domain duplication. However, the phenotypic relevance of DUF1220 dosage is unknown. Although patterns of expression, homology and disease associations suggest a role in cortical development, this hypothesis has not been robustly tested using phylogenetic methods. Here, we estimate DUF1220 domain counts across 12 primate genomes using a nucleotide Hidden Markov Model. We then test a series of hypotheses designed to examine the potential evolutionary significance of DUF1220 copy number expansion. Our results suggest a robust association with brain size, and more specifically neocortex volume. In contradiction to previous hypotheses we find a strong association with postnatal brain development, but not with prenatal brain development. Our results provide further evidence of a conserved association between specific loci and brain size across primates, suggesting human brain evolution occurred through a continuation of existing processes.

The African wolf is a missing link in the wolf-like canid phylogeny

Eli K. Rueness, Pål Trosvik, Anagaw Atickem, Claudio Sillero-Zubiri, Emiliano Trucchi
doi: http://dx.doi.org/10.1101/017996

Here we present the first genomic data for the African wolf (Canis aureus lupaster) and conclusively demonstrate that it is a unique taxon and not a hybrid between other canids. These animals are commonly misclassified as golden jackals (Canis aureus) and have never been included in any large-scale studies of canid diversity and biogeography, or in investigations of the early stages of dog domestication. Applying massive Restriction Site Associated DNA (RAD) sequencing, 110481 polymorphic sites across the genome of 7 individuals of African wolf were aligned and compared with other wolf-like canids (golden jackal, Holarctic grey wolf, Ethiopian wolf, side-striped jackal and domestic dog). Analyses of this extensive sequence dataset (ca. 8.5Mb) show conclusively that the African wolves represent a distinct taxon more closely related to the Holarctic grey wolf than to the golden jackal. Our results strongly indicate that the distribution of the golden jackal needs to be re-evaluated and point towards alternative hypotheses for the evolution of the rare and endemic Ethiopian wolf (Canis simensis). Furthermore, the extension of the grey wolf phylogeny and distribution opens new possible scenarios for the timing and location of dog domestication.

Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes

John O Woods, Matthew Z Tien, Edward M Marcotte
doi: http://dx.doi.org/10.1101/017947

Conserved genetic programs often predate the homologous structures and phenotypes to which they give rise; eyes, for example, have evolved several dozen times, but their development seems to involve a common set of conserved genes. Recently, the concept of orthologous phenotypes (or phenologs) offered a quantitative way to describe this property. Phenologs are phenotypes or diseases from separate species that share an unexpectedly large set of their associated gene orthologs. It has been shown that the phenotype pairs which make up a phenolog are mutually predictive in terms of the genes involved. Recently, we demonstrated the ranking of gene–phenotype association predictions using multiple phenologs from an array of species. In this work, we demonstrate a computational method which provides a more targeted view of the conserved pathways which give rise to diseases. Our approach involves the generation of synthetic pseudo-phenotypes made up of Boolean combinations (union, intersection, and difference) of the gene sets for phenotypes from our database. We search for diseases that overlap significantly with these Boolean phenotypes, and find a number of highly predictive combinations. While set unions produce less specific predictions (as expected), intersection and difference-based combinations appear to offer insights into extremely specific aspects of target diseases. For example, breast cancer is predicted by zebrafish methylmercury response minus metal ion response, with predictions MT-COI, JUN, SOD2, GADD45B, and BAX all involved in the pro-apoptotic response to reactive oxygen species, thought to be a key player in cancer. We also demonstrate predictions from Arabidopsis Boolean phenotypes for increased brown adipose tissue in mouse (salt stress response’s intersection with sucrose stimulus response); and for human myopathy (red light response minus water deprivation response).
We demonstrate the ranking of predictions for human holoprosencephaly from the set intersections between each pair of a variety of closely-related zebrafish phenotypes. Our results suggest that Boolean phenolog combinations may provide a more informed insight into the conserved pathways underlying diseases than either regular phenologs or the naïve Bayes approach.
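
As a rough illustration of the Boolean-combination idea described in this abstract, the following minimal sketch builds union, intersection, and difference pseudo-phenotypes from two gene sets and scores their overlap with a disease gene set. It assumes that all genes have already been mapped to shared ortholog identifiers and that overlap significance is measured with a hypergeometric tail probability; the abstract does not say which test the authors use. The function names and the toy identifiers (g1, g2, ...) are hypothetical.

```python
from math import comb


def boolean_phenotypes(a, b):
    """Build the three Boolean pseudo-phenotypes from two gene sets."""
    return {"union": a | b, "intersection": a & b, "difference": a - b}


def hypergeom_sf(k, N, K, n):
    """P(overlap >= k) when n genes are drawn from N, of which K are disease genes.

    Assumption: a hypergeometric tail is used here as the overlap score;
    the source abstract does not name the statistic.
    """
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


# Toy example with hypothetical ortholog identifiers.
universe = {f"g{i}" for i in range(1, 21)}            # 20 orthologs in total
phenotype_a = {"g1", "g2", "g3", "g4", "g5", "g6"}    # e.g. one model-organism phenotype
phenotype_b = {"g5", "g6", "g7", "g8"}                # e.g. a related phenotype
disease = {"g1", "g2", "g3", "g9"}                    # disease-associated orthologs

for name, genes in boolean_phenotypes(phenotype_a, phenotype_b).items():
    overlap = len(genes & disease)
    p = hypergeom_sf(overlap, len(universe), len(disease), len(genes))
    print(f"{name}: size={len(genes)} overlap={overlap} p={p:.4f}")
```

In this toy case the difference set (phenotype A minus phenotype B) is smaller than the union but retains the same overlap with the disease set, which is the kind of gain in specificity the abstract attributes to intersection- and difference-based combinations.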

Bayesian Modeling of Epigenetic Variation in Multiple Human Cell Types

Yu Zhang, Feng Yue, Ross C. Hardison
doi: http://dx.doi.org/10.1101/018028

With high-throughput sequencing data generated for multiple epigenetic features in many cell types, a chief challenge is to explain the dynamics in multiple epigenomes that lead to differential regulation and phenotypes. We introduce a Bayesian framework for jointly annotating multiple epigenomes and detecting differential regulation among multiple cell types. Our method, IDEAS (integrative and discriminative epigenome annotation system), achieves superior power by modeling both position and cell type specific epigenetic activities. Using ENCODE data sets in 6 cell types, we identified epigenetic variation strongly associated with differential gene expression. The detected regions are significantly enriched in disease genetic variants with much stronger enrichment scores than achievable by existing methods, and the enriched phenotypes are highly relevant to the corresponding cell types. IDEAS is a powerful tool for integrative epigenome annotation and detection of variation, which could be of important utility in elucidating the interplay between genetics, gene regulation and diseases.

Methods for distinguishing between protein-coding and long noncoding RNAs and the elusive biological purpose of translation of long noncoding RNAs

Gali Housman, Igor Ulitsky
doi: http://dx.doi.org/10.1101/017889

Long noncoding RNAs (lncRNAs) are a diverse class of RNAs with increasingly appreciated functions in vertebrates, yet much of their biology remains poorly understood. In particular, it is unclear to what extent the current catalog of over 10,000 distinct annotated lncRNAs is indeed devoid of genes coding for proteins. Here we review the available computational and experimental schemes for distinguishing between coding and noncoding transcripts and assess the conclusions from their recent genome-wide applications. We conclude that the model most consistent with available data is that a large number of mammalian lncRNAs undergo translation, but only a very small minority of such translation events result in stable and functional peptides. The outcome of the majority of the translation events and their potential biological purposes remain an intriguing topic for future investigation.

Predicting Carriers of Ongoing Selective Sweeps Without Knowledge of the Favored Allele

Roy Ronen, Glenn Tesler, Ali Akbari, Shay Zakov, Noah A Rosenberg, Vineet Bafna

Methods for detecting the genomic signatures of natural selection have been heavily studied, and they have been successful in identifying many selective sweeps. For most of these sweeps, the favored allele remains unknown, making it difficult to distinguish carriers of the sweep from non-carriers. In an ongoing selective sweep, carriers of the favored allele are likely to contain a future most recent common ancestor. Therefore, identifying them may prove useful in predicting the evolutionary trajectory — for example, in contexts involving drug-resistant pathogen strains or cancer subclones. The main contribution of this paper is the development and analysis of a new statistic, the Haplotype Allele Frequency (HAF) score. The HAF score, assigned to individual haplotypes in a sample, naturally captures many of the properties shared by haplotypes carrying a favored allele. We provide a theoretical framework for computing expected HAF scores under different evolutionary scenarios, and we validate the theoretical predictions with simulations. As an application of HAF score computations, we develop an algorithm (PreCIOSS: Predicting Carriers of Ongoing Selective Sweeps) to identify carriers of the favored allele in selective sweeps, and we demonstrate its power on simulations of both hard and soft sweeps, as well as on data from well-known sweeps in human populations.
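
The abstract does not give the formula for the HAF score, so the sketch below rests on an assumption: that a haplotype's score is the sum, over the sites where it carries the derived allele, of the number of haplotypes in the sample that carry the derived allele at that site. The function name and the toy matrix are hypothetical and only illustrate how such a per-haplotype score could be computed.

```python
def haf_scores(haplotypes):
    """Compute one score per haplotype from a 0/1 matrix (1 = derived allele).

    Assumed definition (not stated in the abstract): for each haplotype,
    sum the sample-wide derived-allele counts over the sites where that
    haplotype carries the derived allele.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    # Number of haplotypes carrying the derived allele at each site.
    derived_counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    return [
        sum(derived_counts[j] for j in range(n_sites) if h[j])
        for h in haplotypes
    ]


# Toy sample: three haplotypes, three segregating sites.
sample = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
print(haf_scores(sample))  # [3, 2, 1]
```

Under this assumed definition, haplotypes that share many common derived alleles receive higher scores, which fits the abstract's description of a statistic that captures properties shared by haplotypes carrying a favored allele.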

Analysis of allele-specific expression reveals cis-regulatory changes associated with a recent mating system shift and floral adaptation in Capsella

Kim A Steige, Johan Reimegård, Daniel Koenig, Douglas G Scofield, Tanja Slotte
doi: http://dx.doi.org/10.1101/017749

Cis-regulatory changes have long been suggested to contribute to organismal adaptation. While cis-regulatory changes can now be identified on a transcriptome-wide scale, in most cases the adaptive significance and mechanistic basis of rapid cis-regulatory divergence remain unclear. Here, we have characterized cis-regulatory changes associated with recent adaptive floral evolution in the selfing plant Capsella rubella, which diverged from the outcrosser Capsella grandiflora less than 200 kya. We assessed allele-specific expression (ASE) in leaves and flower buds at a total of 18,452 genes in three interspecific F1 C. grandiflora x C. rubella hybrids. After accounting for technical variation and read-mapping biases using genomic reads, we estimate that an average of 44% of these genes show evidence of ASE; however, only 6% show strong allelic expression biases. Flower buds, but not leaves, show an enrichment of genes with ASE in genomic regions responsible for phenotypic divergence between C. rubella and C. grandiflora. We further detected an excess of heterozygous transposable element (TE) insertions in the vicinity of genes with ASE, and TE insertions targeted by uniquely mapping 24-nt small RNAs were associated with reduced allelic expression of nearby genes. Our results suggest that cis-regulatory changes have been important for recent adaptive floral evolution in Capsella and that differences in TE dynamics between selfing and outcrossing species could be an important mechanism underlying rapid regulatory divergence.

Rapid host switching in generalist Campylobacter strains erodes the signal for tracing human infections

Bethany L. Dearlove, Alison J. Cody, Ben Pascoe, Guillaume Méric, Daniel J. Wilson, Samuel K. Sheppard
(Submitted on 7 Apr 2015)

Campylobacter jejuni and Campylobacter coli are the biggest causes of bacterial gastroenteritis in the developed world, with human infections typically arising from zoonotic transmission associated with infected meat, especially poultry. Because this organism is not thought to survive well outside of the gut, host associated populations are genetically isolated to varying degrees. Therefore the likely origin of most Campylobacter strains can be determined by host-associated variation in the genome. This is instructive for characterizing the source of human infection at the population level. However, some common strains appear to have broad host ranges, hindering source attribution. Whole genome sequencing has the potential to reveal fine-scale genetic structure associated with host specificity within each of these strains.
We found that rates of zoonotic transmission among animal host species in ST-21, ST-45 and ST-828 clonal complexes were so high that the signal of host association is all but obliterated. We attributed 89% of clinical cases to a chicken source, 10% to cattle and 1% to pig. Our results reveal that common strains of C. jejuni and C. coli infectious to humans are adapted to a generalist lifestyle, permitting rapid transmission between different hosts. Furthermore, they show that the weak signal of host association within these complexes presents a challenge for pinpointing the source of clinical infections, underlining the view that whole genome sequencing, powerful though it is, cannot substitute for intensive sampling of suspected transmission reservoirs.

Adaptive evolution of anti-viral siRNAi genes in bumblebees

Sophie Helbing, Michael Lattorff
doi: http://dx.doi.org/10.1101/017681

The high density of frequently interacting and closely related individuals in social insects enhances pathogen transmission and establishment within colonies. Group-mediated behavior supporting immune defenses tends to decrease selection acting on immune genes. Along with low effective population sizes, this will result in relaxed constraint and rapid evolution of genes of the immune system. Here we show that sociality is the main driver of selection in antiviral siRNAi genes in social bumblebees compared to their socially parasitic cuckoo bumblebees that lack a worker caste. RNAi genes show frequent positive selection at the codon level, additionally supported by the occurrence of parallel evolution, and their evolutionary rate is linked to their pathway-specific position, with genes directly interacting with viruses showing the highest rates of molecular evolution. We suggest that higher pathogen load in social insects indeed drives adaptive evolution of immune genes, if not compensated by behavior.

Ultra-large alignments using Phylogeny-aware Profiles

Nam-phuong Nguyen, Siavash Mirarab, Keerthana Kumar, Tandy Warnow
(Submitted on 5 Apr 2015)

Many biological questions, including the estimation of deep evolutionary histories and the detection of remote homology between protein sequences, rely upon multiple sequence alignments (MSAs) and phylogenetic trees of large datasets. However, accurate large-scale multiple sequence alignment is very difficult, especially when the dataset contains fragmentary sequences. We present UPP, an MSA method that uses a new machine learning technique – the Ensemble of Hidden Markov Models – that we propose here. UPP produces highly accurate alignments for both nucleotide and amino acid sequences, even on ultra-large datasets or datasets containing fragmentary sequences. UPP is available at this https URL