On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al.
Claudiu I Bandea
In a recent article entitled "On the immortality of television sets: “function” in the human genome according to the evolution-free gospel of ENCODE", Graur et al. dismantle ENCODE's evidence and conclusion that 80% of the human genome is functional. However, the article by Graur et al. contains questionable assumptions and statements. Primarily, the authors limit their evaluation of DNA's biological functions to informational roles, sidestepping putative non-informational functions. Here, I bring forward an old hypothesis on the evolution of genome size and on the role of so-called junk DNA (jDNA), which might explain the C-value enigma. According to this hypothesis, jDNA functions as a defense mechanism against insertion mutagenesis by endogenous and exogenous inserting elements such as retroviruses, thereby protecting informational DNA sequences from inactivation or alteration of their expression. Notably, this model couples the mechanisms and the selective forces responsible for the origin of jDNA with its putative protective biological function, which represents a classic case of fighting fire with fire. One of the key tenets of this theory is that in humans and many other species, jDNA serves as a protective mechanism against insertional oncogenic transformation. As an adaptive defense mechanism, the amount of protective DNA varies from one species to another based on the rate of its origin, insertional mutagenesis activity, and evolutionary constraints on genome size.
The genomic landscape of meiotic crossovers and gene conversions in Arabidopsis thaliana
Erik Wijnker, Geo Velikkakam James, Jia Ding, Frank Becker, Jonas R. Klasen, Vimal Rawat, Beth A. Rowan, Daniel F. de Jong, C. Bastiaan de Snoo, Luis Zapata, Bruno Huettel, Hans de Jong, Stephan Ossowski, Detlef Weigel, Maarten Koornneef, Joost J.B. Keurentjes, Korbinian Schneeberger
(Submitted on 13 Nov 2013)
Knowledge of the exact distribution of meiotic crossovers (COs) and gene conversions (GCs) is essential for understanding many aspects of population genetics and evolution, from haplotype structure and long-distance genetic linkage to the generation of new allelic variants of genes. To this end, we resequenced the four products of 13 meiotic tetrads along with 10 doubled haploids derived from Arabidopsis thaliana hybrids. GC detection through short reads has previously been confounded by genomic rearrangements. Rigid filtering for misaligned reads allowed GC identification at high accuracy and revealed an ~80-kb transposition, which undergoes copy-number changes mediated by meiotic recombination. Non-crossover-associated GCs were extremely rare, most likely due to their short average length of ~25-50 bp, which is significantly shorter than the length of CO-associated GCs. Overall, recombination preferentially targeted non-methylated nucleosome-free regions at gene promoters, which showed significant enrichment of two sequence motifs.
Sequencing and characterisation of rearrangements in three S. pastorianus strains reveals the presence of chimeric genes and gives evidence of breakpoint reuse
Sarah K. Hewitt, Ian Donaldson, Simon C. Lovell, Daniela Delneri
(Submitted on 8 Nov 2013)
Gross chromosomal rearrangements have the potential to be evolutionarily advantageous to an adapting organism. The generation of a hybrid species increases opportunity for recombination by bringing together two homologous genomes. We sought to define the location of genomic rearrangements in three strains of Saccharomyces pastorianus, a natural lager-brewing yeast hybrid of Saccharomyces cerevisiae and Saccharomyces eubayanus, using whole genome shotgun sequencing. Each strain of S. pastorianus has lost species-specific portions of its genome and has undergone extensive recombination, producing chimeric chromosomes. We predicted 30 breakpoints that we confirmed at the single nucleotide level by designing species-specific primers that flank each breakpoint, and then sequencing the PCR product. These rearrangements are the result of recombination between areas of homology between the two subgenomes, rather than repetitive elements such as transposons or tRNAs. Interestingly, 28/30 S. cerevisiae-S. eubayanus recombination breakpoints are located within genic regions, generating chimeric genes. Furthermore, we show evidence for the reuse of two breakpoints, located in HSP82 and KEM1, in strains of proposed independent origin.
Comparative Assembly Hubs: Web Accessible Browsers for Comparative Genomics
Ngan Nguyen, Glenn Hickey, Brian J. Raney, Joel Armstrong, Hiram Clawson, Ann Zweig, Jim Kent, David Haussler, Benedict Paten
(Submitted on 5 Nov 2013)
We introduce a pipeline to easily generate collections of web accessible UCSC genome browsers interrelated by an alignment. Using the alignment, all annotations and the alignment itself can be efficiently viewed with reference to any genome in the collection, symmetrically. A new, intelligently scaled alignment display makes it simple to view all changes between the genomes at all levels of resolution, from substitutions to complex structural rearrangements, including duplications.
Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication
Carlos Nossa, Paul Havlak, Jia-Xing Yue, Jie Lv, Kim Vincent, H Jane Brockmann, Nicholas H Putnam
(Submitted on 28 Sep 2013)
Horseshoe crabs are marine arthropods with a fossil record extending back approximately 450 million years. They exhibit remarkable morphological stability over their long evolutionary history, retaining a number of ancestral arthropod traits, and are often cited as examples of “living fossils.” As arthropods, they belong to the Ecdysozoa, an ancient super-phylum whose sequenced genomes (including insects and nematodes) have thus far shown more divergence from the ancestral pattern of eumetazoan genome organization than cnidarians, deuterostomes, and lophotrochozoans. However, much of ecdysozoan diversity remains unrepresented in comparative genomic analyses. Here we use a new strategy of combined de novo assembly and genetic mapping to examine the chromosome-scale genome organization of the Atlantic horseshoe crab Limulus polyphemus. We constructed a genetic linkage map of this 2.7 Gbp genome by sequencing the nuclear DNA of 34 wild-collected, full-sibling embryos and their parents at a mean redundancy of 1.1x per sample. The map includes 84,307 sequence markers and 5,775 candidate conserved protein coding genes. Comparison to other metazoan genomes shows that the L. polyphemus genome preserves ancestral bilaterian linkage groups, and that a common ancestor of modern horseshoe crabs underwent one or more ancient whole genome duplications (WGDs) ~300 MYA, followed by extensive chromosome fusion.
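To give a feel for the sequencing effort implied by the figures in this abstract, the arithmetic can be sketched as below. This is only a back-of-the-envelope illustration, not the authors' method: it assumes that "their parents" means two individuals, and the function names are hypothetical.

```python
# Rough sequencing-effort arithmetic from the figures quoted in the abstract.
# Assumption (not stated in the abstract): "their parents" means 2 individuals.

GENOME_SIZE_BP = 2.7e9   # genome size quoted in the abstract (2.7 Gbp)
N_EMBRYOS = 34           # full-sibling embryos quoted in the abstract
N_PARENTS = 2            # ASSUMPTION: one mother and one father
MEAN_DEPTH = 1.1         # mean redundancy per sample quoted in the abstract


def pooled_depth(n_samples: int, depth_per_sample: float) -> float:
    """Combined redundancy if reads from all samples were pooled."""
    return n_samples * depth_per_sample


def total_bases(genome_size_bp: float, n_samples: int, depth_per_sample: float) -> float:
    """Approximate number of bases sequenced across all samples."""
    return genome_size_bp * pooled_depth(n_samples, depth_per_sample)


if __name__ == "__main__":
    n = N_EMBRYOS + N_PARENTS
    print(f"samples: {n}")
    print(f"pooled depth: {pooled_depth(n, MEAN_DEPTH):.1f}x")
    print(f"total sequence: {total_bases(GENOME_SIZE_BP, n, MEAN_DEPTH) / 1e9:.1f} Gbp")
```

Under these assumptions the pooled redundancy is roughly 40x, even though each individual sample is covered only about once.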
The epigenome of evolving Drosophila neo-sex chromosomes: dosage compensation and heterochromatin formation
Qi Zhou, Christopher E. Ellison, Vera B. Kaiser, Artyom A. Alekseyenko, Andrey A. Gorchakov, Doris Bachtrog
(Submitted on 26 Sep 2013)
Drosophila Y chromosomes are composed entirely of silent heterochromatin, while male X chromosomes have highly accessible chromatin and are hypertranscribed due to dosage compensation. Here, we dissect the molecular mechanisms and functional pressures driving heterochromatin formation and dosage compensation of the recently formed neo-sex chromosomes of Drosophila miranda. We show that the onset of heterochromatin formation on the neo-Y is triggered by an accumulation of repetitive DNA. The neo-X has evolved partial dosage compensation and we find that diverse mutational paths have been utilized to establish several dozen novel binding consensus motifs for the dosage compensation complex on the neo-X, including simple point mutations at pre-binding sites, insertion and deletion mutations, microsatellite expansions, or tandem amplification of weak binding sites. Spreading of these silencing or activating chromatin modifications to adjacent regions results in massive mis-expression of neo-sex linked genes, and little correspondence between functionality of genes and their silencing on the neo-Y or dosage compensation on the neo-X. Intriguingly, the genomic regions being targeted by the dosage compensation complex on the neo-X and those becoming heterochromatic on the neo-Y show little overlap, possibly reflecting different propensities along the ancestral chromosome to adopt active or repressive chromatin configurations. Our findings have broad implications for current models of sex chromosome evolution, and demonstrate how mechanistic constraints can limit evolutionary adaptations. Our study also highlights how evolution can follow predictable genetic trajectories, by repeatedly acquiring the same 21-bp consensus motif for recruitment of the dosage compensation complex, yet utilizing a diverse array of random mutational changes to attain the same phenotypic outcome.
Lineage specific reductions in genome size in salamanders are associated with increased rates of mutation
John Herrick, Bianca Sclavi
(Submitted on 4 Aug 2013)
Very low levels of genetic diversity have been reported in vertebrates with large genomes, notably salamanders and lungfish [1-3]. Interpreting differences in heterozygosity, which reflects genetic diversity in a population, is complicated because levels of heterozygosity vary widely between conspecific populations, and correlate with many different physiological and demographic variables such as body size and effective population size. Here we return to the question of genetic variability in salamanders, and report on the relationship between evolutionary rates and genome sizes in five different salamander families. We found that rates of evolution are exceptionally low in salamanders as a group. Evolutionary rates are as low as those reported for cartilaginous fish, which have the slowest rates recorded so far in vertebrates. We also found that, independent of life history, salamanders with the smallest genomes (14 pg) are evolving at rates two to three times faster than salamanders with the largest genomes (>50 pg). After accounting for evolutionary duration, we conclude that speciation events in salamanders are associated with contractions in genome size and concomitant increases in mutation and diversification rates.
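The genome sizes above are given in picograms of DNA. As a rough aid for comparing them with the base-pair figures used elsewhere in this listing, the conversion can be sketched as follows. It assumes the commonly used factor of about 0.978 × 10^9 bp per picogram, which is not taken from the abstract, and the helper name is hypothetical.

```python
# Convert genome size from picograms (pg) of DNA to gigabase pairs (Gbp).
# ASSUMPTION: 1 pg of DNA corresponds to about 0.978e9 bp, a widely used
# conversion factor that is not stated in the abstract itself.

BP_PER_PG = 0.978e9


def pg_to_gbp(picograms: float) -> float:
    """Approximate genome size in Gbp for a DNA mass given in pg."""
    return picograms * BP_PER_PG / 1e9


if __name__ == "__main__":
    for pg in (14, 50):  # smallest and (lower bound of) largest genomes quoted
        print(f"{pg} pg is about {pg_to_gbp(pg):.1f} Gbp")
```

With this factor, the 14 pg genomes correspond to about 13.7 Gbp, and the >50 pg genomes to more than about 48.9 Gbp.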
Synteny in Bacterial Genomes: Inference, Organization and Evolution
Ivan Junier, Olivier Rivoire
(Submitted on 16 Jul 2013)
Genes are not located randomly along genomes. Synteny, the conservation of their relative positions in genomes of different species, reflects fundamental constraints on natural evolution. We present approaches to infer pairs of co-localized genes from multiple genomes, describe their organization, and study their evolutionary history. In bacterial genomes, we thus identify synteny units, or “syntons”, which are clusters of proximal genes that encompass and extend operons. The size distribution of these syntons divides them into large syntons, which correspond to fundamental macro-molecular complexes of bacteria, and smaller ones, which display a remarkable exponential distribution of sizes. This distribution is “universal” in two respects: it holds for vastly different genomes, and for functionally distinct genes. Similar statistical laws have been reported previously in studies of bacterial genomes, and generally attributed to purifying selection or neutral processes. Here, we perform a new analysis based on the concept of parsimony, and find that the prevailing evolutionary mechanism behind the formation of small syntons is a selective process of gene aggregation. Altogether, our results imply a common evolutionary process that selectively shapes the organization and diversity of bacterial genomes.
Complete sequence representation across human X and Y centromeric regions
Karen E. Hayden, Yulia Newton, Miten Jain, Nicolas Altemose, Huntington F. Willard, Jim Kent
(Submitted on 28 Jun 2013)
The human genome remains incomplete, with multi-megabase sized gaps representing the endogenous centromeres and other heterochromatic regions. These regions are commonly enriched with long arrays of near-identical tandem repeats, known as satellite DNAs, that offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies, and as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we present a locally ordered assembly across two haploid human satellite arrays on chromosomes X and Y, resulting in an initial linear representation of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence, we evaluate sites within the arrays for short-read mappability and chromosome specificity. As satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 372 individuals from distinct human populations. In doing so, we identify two ancient satellite array variants in both X and Y centromeres as determined by array length and sequence composition. This study provides an initial linear representation and comprehensive sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.
The complex hybrid origins of the root knot nematodes revealed through comparative genomics
David H Lunt, Sujai Kumar, Georgios Koutsovoulos, Mark L Blaxter
(Submitted on 26 Jun 2013)
Meloidogyne root knot nematodes (RKN) can infect most of the world’s agricultural crop species and are among the most important of all plant pathogens. As yet, however, we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to originate by interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and used a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species M. hapla was used to trace the evolutionary history of these species’ genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization, and their extreme polyphagy and agricultural success may be related to this hybridization, producing transgressive variation on which natural selection acts. Studying RKN variation via individual marker loci may fail due to the species’ convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally.