The epigenome of evolving Drosophila neo-sex chromosomes: dosage compensation and heterochromatin formation

Qi Zhou, Christopher E. Ellison, Vera B. Kaiser, Artyom A. Alekseyenko, Andrey A. Gorchakov, Doris Bachtrog
(Submitted on 26 Sep 2013)

Drosophila Y chromosomes are composed entirely of silent heterochromatin, while male X chromosomes have highly accessible chromatin and are hypertranscribed due to dosage compensation. Here, we dissect the molecular mechanisms and functional pressures driving heterochromatin formation and dosage compensation of the recently formed neo-sex chromosomes of Drosophila miranda. We show that the onset of heterochromatin formation on the neo-Y is triggered by an accumulation of repetitive DNA. The neo-X has evolved partial dosage compensation and we find that diverse mutational paths have been utilized to establish several dozen novel binding consensus motifs for the dosage compensation complex on the neo-X, including simple point mutations at pre-binding sites, insertion and deletion mutations, microsatellite expansions, or tandem amplification of weak binding sites. Spreading of these silencing or activating chromatin modifications to adjacent regions results in massive mis-expression of neo-sex linked genes, and little correspondence between functionality of genes and their silencing on the neo-Y or dosage compensation on the neo-X. Intriguingly, the genomic regions being targeted by the dosage compensation complex on the neo-X and those becoming heterochromatic on the neo-Y show little overlap, possibly reflecting different propensities along the ancestral chromosome to adopt active or repressive chromatin configurations. Our findings have broad implications for current models of sex chromosome evolution, and demonstrate how mechanistic constraints can limit evolutionary adaptations. Our study also highlights how evolution can follow predictable genetic trajectories, by repeatedly acquiring the same 21-bp consensus motif for recruitment of the dosage compensation complex, yet utilizing a diverse array of random mutational changes to attain the same phenotypic outcome.

Lineage specific reductions in genome size in salamanders are associated with increased rates of mutation

John Herrick, Bianca Sclavi
(Submitted on 4 Aug 2013)

Very low levels of genetic diversity have been reported in vertebrates with large genomes, notably salamanders and lungfish [1-3]. Interpreting differences in heterozygosity, which reflects genetic diversity in a population, is complicated because levels of heterozygosity vary widely between conspecific populations, and correlate with many different physiological and demographic variables such as body size and effective population size. Here we return to the question of genetic variability in salamanders, and report on the relationship between evolutionary rates and genome sizes in five different salamander families. We found that rates of evolution are exceptionally low in salamanders as a group. Evolutionary rates are as low as those reported for cartilaginous fish, which have the slowest rates recorded so far in vertebrates [4]. We also found that, independent of life history, salamanders with the smallest genomes (14 pg) are evolving at rates two to three times faster than salamanders with the largest genomes (>50 pg). After accounting for evolutionary duration, we conclude that speciation events in salamanders are associated with contractions in genome size and concomitant increases in mutation and diversification rates.

Synteny in Bacterial Genomes: Inference, Organization and Evolution

Ivan Junier, Olivier Rivoire
(Submitted on 16 Jul 2013)

Genes are not located randomly along genomes. Synteny, the conservation of their relative positions in genomes of different species, reflects fundamental constraints on natural evolution. We present approaches to infer pairs of co-localized genes from multiple genomes, describe their organization, and study their evolutionary history. In bacterial genomes, we thus identify synteny units, or “syntons”, which are clusters of proximal genes that encompass and extend operons. The size distribution of these syntons divides them into large syntons, which correspond to fundamental macro-molecular complexes of bacteria, and smaller ones, which display a remarkable exponential distribution of sizes. This distribution is “universal” in two respects: it holds for vastly different genomes, and for functionally distinct genes. Similar statistical laws have been reported previously in studies of bacterial genomes, and generally attributed to purifying selection or neutral processes. Here, we perform a new analysis based on the concept of parsimony, and find that the prevailing evolutionary mechanism behind the formation of small syntons is a selective process of gene aggregation. Altogether, our results imply a common evolutionary process that selectively shapes the organization and diversity of bacterial genomes.

Complete sequence representation across human X and Y centromeric regions

Karen E. Hayden, Yulia Newton, Miten Jain, Nicolas Altemose, Huntington F. Willard, Jim Kent
(Submitted on 28 Jun 2013)

The human genome remains incomplete, with multi-megabase sized gaps representing the endogenous centromeres and other heterochromatic regions. These regions are commonly enriched with long arrays of near-identical tandem repeats, known as satellite DNAs, that offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies, and as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we present a locally ordered assembly across two haploid human satellite arrays on chromosomes X and Y, resulting in an initial linear representation of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence, we evaluate sites within the arrays for short-read mappability and chromosome specificity. As satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 372 individuals from distinct human populations. In doing so, we identify two ancient satellite array variants in both X and Y centromeres as determined by array length and sequence composition. This study provides an initial linear representation and comprehensive sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.

The complex hybrid origins of the root knot nematodes revealed through comparative genomics

David H Lunt, Sujai Kumar, Georgios Koutsovoulos, Mark L Blaxter
(Submitted on 26 Jun 2013)

Meloidogyne root knot nematodes (RKN) can infect most of the world’s agricultural crop species and are among the most important of all plant pathogens. As yet, however, we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to originate by interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and used a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species M. hapla was used to trace the evolutionary history of these species’ genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization, and their extreme polyphagy and agricultural success may be related to this hybridization, producing transgressive variation on which natural selection acts. Studying RKN variation via individual marker loci may fail due to the species’ convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally.

Clusters of microRNAs emerge by new hairpins in existing transcripts

Antonio Marco, Maria Ninova, Matthew Ronshaugen, Sam Griffiths-Jones
(Submitted on 9 Apr 2013)

Genetic linkage may result in the expression of multiple products from a single polycistronic transcript, under the control of a single promoter. In animals, protein-coding polycistronic transcripts are rare. However, microRNAs are frequently clustered in the genomes of animals and plants, and these clusters are often transcribed as a single unit. The evolution of microRNA clusters has been the subject of much speculation, and a selective advantage of clusters of functionally related microRNAs is often proposed. However, the origin of microRNA clusters has not so far been systematically explored. Here we study the evolution of all microRNA clusters in Drosophila melanogaster, and suggest a number of models for their emergence. We observed that a majority of microRNA clusters arose by the de novo formation of new microRNA-like hairpins in existing microRNA transcripts. Some clusters also emerged by tandem duplication of a single microRNA. Comparative genomics shows that these clusters, once formed, are unlikely to split or undergo rearrangements. We did not find any instances of clusters appearing by rearrangement of pre-existing microRNA genes. We propose a model for microRNA cluster origin and evolution in which selection over one of the microRNAs in the cluster interferes with the evolution of the other tightly linked microRNAs. Our analysis suggests that the evolutionary study of microRNAs and other small RNAs must consider and account for linkage associations.

An algebraic framework to sample the rearrangement histories of a cancer metagenome with double cut and join, duplication and deletion events

Daniel R. Zerbino, Benedict Paten, Glenn Hickey, David Haussler
(Submitted on 22 Mar 2013)

Algorithms to study structural variants (SV) in whole genome sequencing (WGS) cancer datasets are currently unable to sample the entire space of rearrangements while allowing for copy number variations (CNV). In addition, rearrangement theory has up to now focused on fully assembled genomes, not on fragmentary observations on mixed genome populations. This affects the applicability of current methods to actual cancer datasets, which are produced from short read sequencing of a heterogeneous population of cells. We show how basic linear algebra can be used to describe and sample the set of possible sequences of SVs, extending the double cut and join (DCJ) model into the analysis of metagenomes. We also describe a functional pipeline which was run on simulated as well as experimental cancer datasets.

Major changes in the core developmental pathways of nematodes: Romanomermis culicivorax reveals the derived status of the Caenorhabditis elegans model

Philipp H. Schiffer, Michael Kroiher, Christopher Kraus, Georgios D. Koutsovoulos, Sujai Kumar, Julia I. R. Camps, Ndifon A. Nsah, Dominik Stappert, Krystalynne Morris, Peter Heger, Janine Altmüller, Peter Frommolt, Peter Nürnberg, W. Kelley Thomas, Mark L. Blaxter, Einhard Schierenberg
(Submitted on 17 Mar 2013)

Background: Despite its status as a model organism, the development of Caenorhabditis elegans is not necessarily archetypical for nematodes. The phylum Nematoda is divided into the Chromadorea (includes C. elegans) and the Enoplea. Compared to C. elegans, enoplean nematodes have very different patterns of cell division and determination. Embryogenesis of the enoplean Romanomermis culicivorax has been studied in great detail, but the genetic circuitry underpinning development in this species is unknown. Results: We created a draft genome of R. culicivorax and compared its developmental gene content with that of two nematodes, C. elegans and Trichinella spiralis (another enoplean), and a representative arthropod Tribolium castaneum. This genome evidence shows that R. culicivorax retains components of the conserved metazoan developmental toolkit lost in C. elegans. T. spiralis has independently lost even more of the toolkit than has C. elegans. However, the C. elegans toolkit is not simply depauperate, as many genes essential for embryogenesis in C. elegans are unique to this lineage, or have only extremely divergent homologues in R. culicivorax and T. spiralis. These data imply fundamental differences in the genetic programmes for early cell specification, inductive interactions, vulva formation and sex determination. Conclusions: Thus nematodes, despite their apparent phylum-wide morphological conservatism, have evolved major differences in the molecular logic of their development. R. culicivorax serves as a tractable, contrasting model to C. elegans for understanding how divergent genomic and thus regulatory backgrounds can generate a conserved phenotype. The availability of the draft genome will promote use of R. culicivorax as a research model.

A Unifying Parsimony Model of Genome Evolution

Benedict Paten, Daniel R. Zerbino, Glenn Hickey, David Haussler
(Submitted on 9 Mar 2013)

The study of molecular evolution rests on the classical fields of population genetics and systematics, but the increasing availability of DNA sequence data has broadened the field in the last decades, leading to new theories and methodologies. This includes parsimony and maximum likelihood methods of phylogenetic tree estimation, the theory of genome rearrangements, and the coalescent model with recombination. These all interact in the study of genome evolution, yet to date they have only been pursued in isolation. We present the first unified parsimony framework for the study of genome evolutionary histories that includes all of these aspects, proposing a graphical data structure called a history graph that is intended to form a practical basis for analysis. We define tractable upper and lower bound parsimony cost functions on history graphs that incorporate both substitutions and rearrangements. We demonstrate that these bounds become tight for a special unambiguous type of history graph called an ancestral variation graph (AVG), which captures in its combinatorial structure the operations required in an evolutionary history. For an input history graph G, we demonstrate that there exists a finite set of interpretations of G that contains all minimal (lacking extraneous elements) and most parsimonious AVG interpretations of G. We define a partial order over this set and an associated set of sampling moves that can be used to explore these DNA histories. These results generalise and conceptually simplify the problem so that we can sample evolutionary histories using parsimony cost functions that account for all substitutions and rearrangements in the presence of duplications.

A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

John A. Capra, Melissa J. Hubisz, Dennis Kostka, Katherine S. Pollard, Adam Siepel
(Submitted on 9 Mar 2013)

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts ~1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. We also find some evidence that they contribute to the fixation of deleterious alleles, including an enrichment for disease-associated polymorphisms. These tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages; they supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.
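
The abstract above notes that gBGC has so far been studied mainly through summary statistics from genomic comparisons. As a rough illustration of one such statistic, the sketch below counts weak-to-strong (A/T to G/C) and strong-to-weak (G/C to A/T) differences between two aligned sequences. This is not the phastBias model; the function name `count_ws_sw` and the toy sequences are hypothetical, and the sketch assumes the first sequence can be treated as the ancestral state.

```python
# Minimal sketch of a gBGC-style summary statistic (not the phastBias model).
# Assumption: `ancestral` and `derived` are pre-aligned strings of equal length.

WEAK = {"A", "T"}    # bases forming weak (two hydrogen bond) pairs
STRONG = {"G", "C"}  # bases forming strong (three hydrogen bond) pairs


def count_ws_sw(ancestral, derived):
    """Return (weak_to_strong, strong_to_weak) substitution counts."""
    if len(ancestral) != len(derived):
        raise ValueError("aligned sequences must have equal length")
    ws = sw = 0
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a in WEAK and d in STRONG:
            ws += 1
        elif a in STRONG and d in WEAK:
            sw += 1
        # Gaps, ambiguous bases, and W->W or S->S changes are ignored.
    return ws, sw


# Toy aligned pair (made-up data, for illustration only).
ws, sw = count_ws_sw("ATGCATGCAA", "GTGCACGTAG")
print(ws, sw)
```

An excess of weak-to-strong over strong-to-weak changes in a region is consistent with a fixation preference for G and C alleles, although on its own it does not separate gBGC from selection or mutational bias.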