Long-term survival of duplicate genes despite absence of subfunctionalized expression.
Xun Lan, Jonathan K Pritchard
Gene duplication is a fundamental process in genome evolution. However, young duplicates are frequently degraded into pseudogenes by loss-of-function mutations. One standard model proposes that the main path for duplicate genes to avoid mutational destruction is by rapidly evolving subfunctionalized expression profiles. We examined this hypothesis using RNA-seq data from 46 human tissues. Surprisingly, we find that sub- or neofunctionalization of expression evolves very slowly, and is rare among duplications that arose within the placental mammals. Most mammalian duplicates are located in tandem and have highly correlated expression profiles, likely due to shared regulation, thus impeding subfunctionalization. Moreover, we also find that a large fraction of duplicate gene pairs exhibit a striking asymmetric pattern in which one gene has consistently higher expression. These asymmetrically expressed duplicates (AEDs) may persist for tens of millions of years, even though the lower-expressed copies tend to evolve under reduced selective constraint and are associated with fewer human diseases than their duplicate partners. We suggest that dosage-sharing of expression, rather than subfunctionalization, is more likely to be the initial factor enabling survival of duplicate gene pairs.
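The asymmetric pattern described above, in which one copy is consistently the higher-expressed one across tissues, can be illustrated with a minimal sketch. The function name `is_asymmetric` and the 0.9 threshold are hypothetical choices made for illustration; the abstract does not state the criterion the authors used to call AEDs.

```python
def is_asymmetric(expr_a, expr_b, min_fraction=0.9):
    """Return True if one copy is higher-expressed in most tissues.

    expr_a, expr_b: expression values for the two duplicates, one value
    per tissue, in the same tissue order.
    min_fraction: hypothetical cutoff (an assumption, not from the paper)
    for the share of tissues in which the same copy must be higher.
    """
    if len(expr_a) != len(expr_b) or len(expr_a) == 0:
        raise ValueError("need two non-empty profiles of equal length")
    n = len(expr_a)
    # Count tissues where each copy is strictly higher; ties count for neither.
    a_higher = sum(1 for a, b in zip(expr_a, expr_b) if a > b)
    b_higher = sum(1 for a, b in zip(expr_a, expr_b) if b > a)
    return max(a_higher, b_higher) / n >= min_fraction
```

A pair whose copies alternate as the higher-expressed gene from tissue to tissue would fail this test, which is the kind of profile expected under subfunctionalized expression rather than dosage sharing.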
Entire genome transcription across evolutionary time exposes non-coding DNA to de novo gene emergence
Rafik Neme, Diethard Tautz
Even in the best studied mammalian genomes, less than 5% of the total genome length is annotated as exonic. However, deep sequencing analysis in humans has shown that around 40% of the genome may be covered by poly-adenylated non-coding transcripts occurring at low levels. Their functional significance is unclear, and there has been a dispute whether they should be considered as noise of the transcriptional machinery. We propose that if such transcripts show some evolutionary stability they will serve as substrates for de novo gene evolution, i.e. gene emergence out of non-coding DNA. Here, we characterize the phylogenetic turnover of low-level poly-adenylated transcripts in a comprehensive sampling of populations, sub-species and species of the genus Mus, spanning a phylogenetic distance of about 10 Myr. We find evidence for more evolutionarily stable gains of transcription than losses among closely related taxa, balanced by a loss of older transcripts across the whole phylogeny. We show that adding taxa increases the genomic transcript coverage and that no major transcript-free islands exist over time. This suggests that the entire genome can be transcribed into poly-adenylated RNA when viewed at an evolutionary time scale. Thus, any part of the “non-coding” genome can become subject to evolutionary functionalization via de novo gene evolution.
Recombining without hotspots: A comprehensive evolutionary portrait of recombination in two closely related species of Drosophila
Caiti Smukowski Heil, Chris Ellison, Matthew Dubin, Mohamed Noor
Meiotic recombination rate varies across the genome within and between individuals, populations, and species in virtually all taxa studied. In almost every species, this variation takes the form of discrete recombination hotspots, determined in metazoans by a protein called PRDM9. Hotspots and their determinants have a profound effect on the genomic landscape, and share certain features that extend across the tree of life. Drosophila, in contrast, are anomalous in their absence of hotspots, PRDM9, and other species-specific differences in the determination of recombination. To better understand the evolution of meiosis and general patterns of recombination across diverse taxa, we present what may be the most comprehensive portrait of recombination to date, combining contemporary recombination estimates from each of two sister species along with historic estimates of recombination using linkage-disequilibrium-based approaches derived from sequence data from both species. Using Drosophila pseudoobscura and Drosophila miranda as a model system, we compare recombination rate between species at multiple scales, and we replicate the pattern seen in human-chimpanzee comparisons that recombination rate is conserved at broad scales and more divergent at finer scales. We also find evidence of a species-wide recombination modifier, resulting in both a present and historic genome-wide elevation of recombination rates in D. miranda, and identify broad-scale effects on recombination from the presence of an inter-species inversion. Finally, we reveal an unprecedented view of the distribution of recombination in D. pseudoobscura, illustrating patterns of linked selection and where recombination is taking place. Overall, by combining these estimation approaches, we highlight key similarities and differences in recombination between Drosophila and other organisms.
Transcriptome Differences between Alternative Sex Determining Genotypes in the House Fly, Musca domestica
Richard P Meisel, Jeffrey G Scott, Andrew G Clark
Sex determination evolves rapidly, often because of turnover of the genes at the top of the pathway. The house fly, Musca domestica, has a multifactorial sex determination system, allowing us to identify the selective forces responsible for the evolutionary turnover of sex determination in action. There is a male determining factor, M, on the Y chromosome (YM), which is probably the ancestral state. An M factor on the third chromosome (IIIM) has reached high frequencies in multiple populations across the world, but the evolutionary forces responsible for the invasion of IIIM are not resolved. To test if the IIIM chromosome invaded because of sex-specific selection pressures, we used mRNA sequencing to determine if isogenic males that differ only in the presence of the YM or IIIM chromosome have different gene expression profiles. We find that more genes are differentially expressed between YM and IIIM males in testis than head, and that genes with male-biased expression are most likely to be differentially expressed between YM and IIIM males. This suggests that male phenotypes, especially those related to male fertility, are more likely to be affected by the male-determining chromosome, supporting the hypothesis that sex-specific selection acts on alleles linked to the male-determining locus driving evolutionary turnover in the sex determination pathway. We additionally find that IIIM males have a “masculinized” gene expression profile, suggesting that the IIIM chromosome has accumulated an excess of male-beneficial alleles because of its male-limited transmission.
The interplay between DNA methylation and sequence divergence in recent human evolution
Irene Hernando-Herraez, Holger Heyn, Marcos Fernandez-Callejo, Enrique Vidal, Hugo Fernandez-Bellon, Javier Prado-Martinez, Andrew J Sharp, Manel Esteller, Tomas Marques-Bonet
DNA methylation is a key regulatory mechanism in mammalian genomes. Despite the increasing knowledge about this epigenetic modification, the understanding of human epigenome evolution is in its infancy. We used whole genome bisulfite sequencing to study DNA methylation and nucleotide divergence between human and great apes. We identified 360 and 210 differentially hypo- and hypermethylated regions (DMRs) in humans compared to non-human primates and estimated that 20% and 36% of these regions, respectively, were detectable throughout several human tissues. Human DMRs were enriched for specific histone modifications and, contrary to expectations, the majority were located distal to transcription start sites, highlighting the importance of regions outside the direct regulatory context. We also found a significant excess of endogenous retrovirus elements in human-specific hypomethylated regions, suggesting their association with local epigenetic changes. We also report for the first time a close interplay between inter-species genetic and epigenetic variation in regions of incomplete lineage sorting, transcription factor binding sites and human differentially hypermethylated regions. Specifically, we observed an excess of human-specific substitutions in transcription factor binding sites located within human DMRs, suggesting that alteration of regulatory motifs underlies some human-specific methylation patterns. We also found that the acquisition of DNA hypermethylation in the human lineage is frequently coupled with rapid evolution at the nucleotide level in the neighborhood of these CpG sites. Taken together, our results reveal new insights into the mechanistic basis of human-specific DNA methylation patterns and the interpretation of inter-species non-coding variation.
Chromosome-scale shotgun assembly using an in vitro method for long-range linkage
Nicholas H. Putnam, Brendan O’Connell, Jonathan C. Stites, Brandon J. Rice, Andrew Fields, Paul D. Hartley, Charles W. Sugnet, David Haussler, Daniel S. Rokhsar, Richard E. Green
Subjects: Genomics (q-bio.GN); Biomolecules (q-bio.BM)
Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem. These data dramatically increase the scaffold contiguity of assemblies and provide haplotype phasing information. Here, we describe a simpler approach (“Chicago”) based on in vitro reconstituted chromatin. We generated two Chicago datasets with human DNA and used a new software pipeline (“HiRise”) to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 30 Mb. We also demonstrated the utility of Chicago for improving existing assemblies by re-assembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kb to 10 Mb. Our method uses established molecular biology procedures and can be used to analyze any genome, as it requires only about 5 micrograms of DNA as the starting material.
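The scaffold N50 figures quoted above (30 Mb for human, 508 kb to 10 Mb for alligator) use the standard N50 statistic: the length L such that scaffolds of length L or longer contain at least half of the assembled bases. A minimal sketch of that calculation follows; the function name `scaffold_n50` is illustrative and is not part of HiRise.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length of the scaffold at which the running total,
    taken from longest to shortest, first reaches half of the
    total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running to total to stay in integer arithmetic.
        if 2 * running >= total:
            return length
```

Because N50 depends only on the length distribution, joining scaffolds raises it even when no new sequence is added, which is why scaffolding with long-range linkage data can move it so sharply.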
Linkage Disequilibrium and Inversion-Typing of the Drosophila melanogaster Genome Reference Panel
David Houle, Eladio J. Marquez
We calculated the linkage disequilibrium between all pairs of variants in the Drosophila Genome Reference Panel, and make available the list of all highly correlated SNPs for use in association studies. Seventy-three percent of variant SNPs are correlated at r² > 0.5 with at least one other SNP, and the mean number of correlated SNPs per variant over the whole genome is 64.9. Disequilibrium between distant SNPs is also common when minor allele frequency (MAF) is low: 24% of SNPs with MAF < 0.1 are highly correlated with SNPs more than 100 kb distant. While SNPs within regions with polymorphic inversions are highly correlated with somewhat larger numbers of SNPs, and these correlated SNPs are on average farther away, the probability that a SNP in such regions is highly correlated with at least one other SNP is very similar to SNPs outside inversions. Previous karyotyping of the DGRP lines has been inconsistent, and we used LD and genotype to investigate these discrepancies. When previous studies agreed on inversion karyotype, our analysis was almost perfectly concordant with those assignments. In discordant cases, and for inversion heterozygotes, our results suggest errors in two previous analyses, or discordance between genotype and karyotype. Heterozygosities of chromosome arms are in many cases surprisingly highly correlated, suggesting strong epistatic selection during the inbreeding and maintenance of the DGRP lines.
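The r² statistic behind the 0.5 threshold above is the squared correlation between alleles at two sites. A minimal sketch is given below, assuming each inbred line is coded as 0 or 1 at each SNP (one allele per line, no heterozygotes or missing calls); this simplification and the function name `ld_r2` are assumptions for illustration and not the authors' pipeline.

```python
def ld_r2(site_a, site_b):
    """Squared allelic correlation (r^2) between two biallelic SNPs.

    site_a, site_b: 0/1 allele codes, one entry per line, in the same
    line order. Assumes fully homozygous lines with no missing data.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("need two non-empty genotype vectors of equal length")
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n  # 1-1 haplotype frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    return d * d / denom
```

With low minor allele frequency the denominator is small, so a handful of lines sharing rare alleles at two distant sites is enough to produce a high r², consistent with the long-range correlations reported for SNPs with MAF < 0.1.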