Chromosome-scale shotgun assembly using an in vitro method for long-range linkage

Nicholas H. Putnam, Brendan O’Connell, Jonathan C. Stites, Brandon J. Rice, Andrew Fields, Paul D. Hartley, Charles W. Sugnet, David Haussler, Daniel S. Rokhsar, Richard E. Green
Subjects: Genomics (q-bio.GN); Biomolecules (q-bio.BM)

Long-range and highly accurate de novo assembly from short-read data is one of the most pressing challenges in genomics. Recently, it has been shown that read pairs generated by proximity ligation of DNA in chromatin of living tissue can address this problem. These data dramatically increase the scaffold contiguity of assemblies and provide haplotype phasing information. Here, we describe a simpler approach (“Chicago”) based on in vitro reconstituted chromatin. We generated two Chicago datasets with human DNA and used a new software pipeline (“HiRise”) to construct a highly accurate de novo assembly and scaffolding of a human genome with scaffold N50 of 30 Mb. We also demonstrated the utility of Chicago for improving existing assemblies by re-assembling and scaffolding the genome of the American alligator. With a single library and one lane of Illumina HiSeq sequencing, we increased the scaffold N50 of the American alligator from 508 kb to 10 Mb. Our method uses established molecular biology procedures and can be used to analyze any genome, as it requires only about 5 micrograms of DNA as the starting material.
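For readers unfamiliar with the scaffold N50 statistic quoted above, here is a minimal sketch of how it is usually computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not part of the HiRise pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units).
scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(scaffolds))
```

Because N50 depends only on the length distribution, joining scaffolds raises it even when no new sequence is added, which is why it is the natural metric for a scaffolding method.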

Linkage Disequilibrium and Inversion-Typing of the Drosophila melanogaster Genome Reference Panel

David Houle, Eladio J. Marquez
doi: http://dx.doi.org/10.1101/014936

We calculated the linkage disequilibrium between all pairs of variants in the Drosophila Genome Reference Panel, and make available the list of all highly correlated SNPs for use in association studies. Seventy-three percent of variant SNPs are correlated at r² > 0.5 with at least one other SNP, and the mean number of correlated SNPs per variant over the whole genome is 64.9. Disequilibrium between distant SNPs is also common when minor allele frequency (MAF) is low: 24% of SNPs with MAF < 0.1 are highly correlated with SNPs more than 100 kb distant. While SNPs within regions with polymorphic inversions are highly correlated with somewhat larger numbers of SNPs, and these correlated SNPs are on average farther away, the probability that a SNP in such regions is highly correlated with at least one other SNP is very similar to SNPs outside inversions. Previous karyotyping of the DGRP lines has been inconsistent, and we used LD and genotype to investigate these discrepancies. When previous studies agreed on inversion karyotype, our analysis was almost perfectly concordant with those assignments. In discordant cases, and for inversion heterozygotes, our results suggest errors in two previous analyses, or discordance between genotype and karyotype. Heterozygosities of chromosome arms are in many cases surprisingly highly correlated, suggesting strong epistatic selection during the inbreeding and maintenance of the DGRP lines.
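As a rough illustration of the r² statistic used above, the sketch below computes pairwise linkage disequilibrium between two biallelic SNPs. It assumes each inbred line is homozygous and coded 0 or 1 at each SNP; the function name `ld_r2` and the toy genotypes are hypothetical and are not taken from the authors' analysis.

```python
def ld_r2(snp_a, snp_b):
    """Squared correlation (r^2) between two biallelic SNPs.

    Each argument is a list of 0/1 allele codes, one entry per line,
    with the two lists in the same line order.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("SNP vectors must be non-empty and equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n                                # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n                                # frequency of allele 1 at SNP B
    p_ab = sum(x * y for x, y in zip(snp_a, snp_b)) / n # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denom


# Toy genotypes for four lines (hypothetical).
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly correlated pair
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # independent pair
```

Under this coding, a SNP pair would count as "highly correlated" in the abstract's sense when the returned value exceeds 0.5.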

Evolution of selenophosphate synthetases: emergence and relocation of function through independent duplications and recurrent subfunctionalization

Marco Mariotti, Didac Santesmasses, Salvador Capella-Gutierrez, Andrea Mateo, Carme Arnan, Rory Johnson, Salvatore D’Aniello, Sun Hee Yim, Vadim N Gladyshev, Florenci Serras, Montserrat Corominas, Toni Gabaldon, Roderic Guigo
doi: http://dx.doi.org/10.1101/014928

SPS catalyzes the synthesis of selenophosphate, the selenium donor for the synthesis of the amino acid selenocysteine (Sec), incorporated in selenoproteins in response to the UGA codon. SPS is unique among proteins of the selenoprotein biosynthesis machinery in that it is, in many species, a selenoprotein itself, although, as in all selenoproteins, Sec is often replaced by cysteine (Cys). In metazoan genomes we found, however, SPS genes with lineage-specific substitutions other than Sec or Cys. Our results show that these non-Sec, non-Cys SPS genes originated through a number of independent gene duplications of diverse molecular origin from an ancestral selenoprotein SPS gene. Although of independent origin, complementation assays in fly mutants show that these genes share a common function, which most likely emerged in the ancestral metazoan gene. This function appears to be unrelated to selenophosphate synthesis, since all genomes encoding selenoproteins contain Sec or Cys SPS genes (SPS2), but those containing only non-Sec, non-Cys SPS genes (SPS1) do not encode selenoproteins. Thus, in SPS genes, through parallel duplications and subsequent convergent subfunctionalization, two functions initially carried by a single gene are recurrently segregated at two different loci. RNA structures enhancing the readthrough of the Sec-UGA codon in SPS genes, which may be traced back to prokaryotes, played a key role in this process. The SPS evolutionary history in metazoans constitutes a remarkable example of the emergence and evolution of gene function. We have been able to trace this history with unusual detail thanks to the singular feature of SPS genes, wherein the amino acid at a single site determines protein function, and, ultimately, the evolutionary fate of an entire class of genes.

Evolution of Conditional Cooperativity Between HOXA11 and FOXO1 Through Allosteric Regulation

Mauris C. Nnamani, Soumya Ganguly, Vincent J. Lynch, Laura S. Mizoue, Yingchun Tong, Heather Darling, Monika Fuxreiter, Jens Meiler, Gunter P. Wagner
doi: http://dx.doi.org/10.1101/014381

Transcription factors (TFs) play multiple roles in different cells and stages of development. Given this multitude of functional roles it has been assumed that TFs are evolutionarily highly constrained. Here we investigate the molecular mechanisms for the origin of a derived functional interaction between two TFs that play a key role in mammalian pregnancy, HOXA11 and FOXO1. We have previously shown that the regulatory role of HOXA11 in mammalian endometrial stromal cells requires an interaction with FOXO1, and that the physical interaction between these proteins evolved long before their functional cooperativity. Through a combination of functional, biochemical, and structural approaches, we demonstrate that the derived functional cooperativity between HOXA11 and FOXO1 is due to derived allosteric regulation of HOXA11 by FOXO1. This study shows that TF function can evolve through changes affecting the functional output of a pre-existing protein complex.

Geometric constraints dominate the antigenic evolution of influenza H3N2 hemagglutinin

Austin G Meyer, Claus O Wilke
doi: http://dx.doi.org/10.1101/014183

We have carried out a comprehensive analysis of the determinants of human influenza A H3 hemagglutinin evolution, considering three distinct predictors of evolutionary variation at individual sites: solvent accessibility (as a proxy for protein fold stability and/or conservation), experimental epitope sites (as a proxy for host immune bias), and proximity to the receptor-binding region (as a proxy for protein function). We have found that these three predictors individually explain approximately 15% of the variation in site-wise dN/dS. However, the solvent accessibility and proximity predictors seem largely independent of each other, while the epitope sites are not. In combination, solvent accessibility and proximity explain 32% of the variation in dN/dS. Incorporating experimental epitope sites into the model adds only an additional 2 percentage points. We have also found that the historical H3 epitope sites, which date back to the 1980s and 1990s, show only weak overlap with the latest experimental epitope data, and we have defined a novel set of four epitope groups which are experimentally supported and cluster in 3D space. Finally, sites with dN/dS > 1, i.e., the sites most likely driving seasonal immune escape, are not correctly predicted by either historical or experimental epitope sites, but only by proximity to the receptor-binding region. In summary, proximity to the receptor-binding region, rather than host immune bias, seems to be the primary determinant of H3 immune-escape evolution.

Rates of karyotypic evolution in Estrildid finches differ between island and continental clades

Daniel M Hooper, Trevor D Price
doi: http://dx.doi.org/10.1101/013987

Reasons why chromosomal rearrangements spread to fixation and frequently distinguish related taxa remain poorly understood. We used cytological descriptions of karyotype to identify large pericentric inversions between species of Estrildid finches (family Estrildidae) and a time-dated phylogeny to assess the genomic, geographic, and phylogenetic context of karyotype evolution in this group. Inversions between finch species fixed at an average rate of one every 2.26 My. Inversions were twice as likely to fix on the sex chromosomes compared to the autosomes, possibly a result of their repeat density, and inversion fixation rate for all chromosomes scales with range size. Alternative mutagenic input explanations are not supported, as the number of inversions on a chromosome does not correlate with its length or map size. Inversions have fixed 3.3× faster in three continental clades than in two island chain clades, and fixation rate correlates with both range size and the number of sympatric species pairs. These results point to adaptation as the dominant mechanism driving fixation and suggest a role for gene flow in karyotype divergence. A review shows that the rapid karyotype evolution observed in the Estrildid finches appears to be more general across birds, and by implication other understudied taxa.

Ecological patterns of genome size variation in salamanders

Bianca Sclavi, John Herrick
Comments: 19 Pages, 4 figures, 1 supplementary figure
Subjects: Genomics (q-bio.GN); Populations and Evolution (q-bio.PE)

Salamanders (Urodela) have among the largest vertebrate genomes, ranging in size from 10 to over 80 pg. The Urodela are divided into ten extant families, each with a characteristic range in genome size. Although changes in genome size often occur randomly and in the absence of selection pressure, non-random patterns of genome size variation are evident among specific vertebrate lineages. Here we report that genome size in salamander families varies inversely with species richness and other ecological factors: clades that began radiating earlier (older crown age) tend to have smaller genomes, higher levels of diversity and larger geographical ranges. These observations support the hypothesis that urodele families with larger genomes either have a lower propensity to diversify or are more vulnerable to extinction than families with smaller genomes.