Sequencing and characterisation of rearrangements in three S. pastorianus strains reveals the presence of chimeric genes and gives evidence of breakpoint reuse

Sarah K. Hewitt, Ian Donaldson, Simon C. Lovell, Daniela Delneri
(Submitted on 8 Nov 2013)

Gross chromosomal rearrangements have the potential to be evolutionarily advantageous to an adapting organism. The generation of a hybrid species increases the opportunity for recombination by bringing together two homologous genomes. We sought to define the location of genomic rearrangements in three strains of Saccharomyces pastorianus, a natural lager-brewing yeast hybrid of Saccharomyces cerevisiae and Saccharomyces eubayanus, using whole-genome shotgun sequencing. Each strain of S. pastorianus has lost species-specific portions of its genome and has undergone extensive recombination, producing chimeric chromosomes. We predicted 30 breakpoints, which we confirmed at single-nucleotide resolution by designing species-specific primers flanking each breakpoint and sequencing the PCR products. These rearrangements result from recombination between homologous regions of the two subgenomes, rather than between repetitive elements such as transposons or tRNAs. Interestingly, 28 of the 30 S. cerevisiae/S. eubayanus recombination breakpoints lie within genic regions, generating chimeric genes. Furthermore, we show evidence for the reuse of two breakpoints, located in HSP82 and KEM1, in strains of proposed independent origin.

The hemagglutinin mutation E391K of pandemic 2009 influenza revisited

Jan P. Radomski, Piotr Płoński, Włodzimierz Zagórski-Ostoja
(Submitted on 8 Nov 2013)

Phylogenetic analyses based on small to moderately sized sets of sequence data lead to overestimation of mutation rates in influenza hemagglutinin (HA) by at least an order of magnitude. Two major underlying reasons are incomplete lineage sorting and the possible absence of some key ancestors from the analyzed sequence set. Additionally, during neighbor-joining tree reconstruction each mutation is considered equally important, regardless of its nature. Here we have implemented a heuristic method that optimizes site-dependent factors, weighting 1st, 2nd, and 3rd codon position mutations differently, which allows incorrectly attributed sub-clades to be disentangled. Least-squares regression of the frequency distribution of all mutations observed along all nucleotide positions, on a partially disentangled tree for a large set of 3243 unique HA sequences, was performed both for all mutations and for non-equivalent amino acid mutations; in both cases it showed almost flat gradients, with a very slight downward slope towards the 3′-end positions. The mean mutation rates per sequence per year were 3.83 × 10^-4 for all mutations and 9.64 × 10^-5 for the non-equivalent ones.
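The codon-position weighting described above can be illustrated with a small sketch. It assumes two aligned coding sequences in the same reading frame and uses placeholder weights; the heuristic in the paper optimizes these factors, and the function name and default values here are hypothetical.

```python
def weighted_codon_distance(seq_a: str, seq_b: str,
                            weights: tuple = (1.0, 1.0, 1.0)) -> float:
    """Sum of mismatches between two aligned coding sequences, each mismatch
    weighted by its codon position (1st, 2nd or 3rd).

    The weights are hypothetical placeholders, not values from the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    total = 0.0
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        # Skip gap columns; i % 3 == 0 is the 1st codon position.
        if a != b and "-" not in (a, b):
            total += weights[i % 3]
    return total


# Two 3rd-position differences, down-weighted to 0.2 each.
print(weighted_codon_distance("ATGGCA", "ATAGCT", weights=(1.0, 1.0, 0.2)))
```

With equal weights the function reduces to a plain Hamming distance (ignoring gaps); down-weighting the 3rd position, where many changes are synonymous, reduces the influence of those mutations on a distance-based tree such as neighbor joining.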

Comparative Assembly Hubs: Web Accessible Browsers for Comparative Genomics

Ngan Nguyen, Glenn Hickey, Brian J. Raney, Joel Armstrong, Hiram Clawson, Ann Zweig, Jim Kent, David Haussler, Benedict Paten
(Submitted on 5 Nov 2013)

We introduce a pipeline to easily generate collections of web-accessible UCSC genome browsers interrelated by an alignment. Using the alignment, all annotations and the alignment itself can be efficiently viewed with reference to any genome in the collection, symmetrically. A new, intelligently scaled alignment display makes it simple to view all changes between the genomes at all levels of resolution, from substitutions to complex structural rearrangements, including duplications.

Reducing pervasive false positive identical-by-descent segments detected by large-scale pedigree analysis

Eric Y. Durand, Nicholas Eriksson, Cory Y. McLean
(Submitted on 5 Nov 2013)

Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, but the accuracy of IBD detection in non-simulated data is largely unknown. Using 25,432 genotyped European individuals, and exploiting known familial relationships in 2,952 father-mother-child trios contained therein, we identify a false positive rate of over 67% for short (2-4 centiMorgan) segments. We introduce a novel, computationally efficient, haplotype-based metric that enables accurate IBD detection on population-scale datasets.
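Trios can expose false positives because a child's two haplotypes are inherited from the parents: a segment the child shares IBD with a third individual should, in principle, also be detected between that individual and at least one parent. The sketch below turns that reasoning into a simple rate under stated assumptions: segments are (chromosome, start cM, end cM) tuples, both parents' segments with the same third individual are pooled, and the `min_fraction` overlap threshold is a hypothetical parameter. It is not the authors' pipeline or their proposed haplotype-based metric.

```python
from typing import List, Tuple

# (chromosome, start_cM, end_cM); an assumed, simplified segment format.
Segment = Tuple[str, float, float]


def overlap_cm(a: Segment, b: Segment) -> float:
    """Length in cM of the overlap between two segments (0 if disjoint)."""
    if a[0] != b[0]:
        return 0.0
    return max(0.0, min(a[2], b[2]) - max(a[1], b[1]))


def trio_false_positive_rate(child_segs: List[Segment],
                             parent_segs: List[Segment],
                             min_fraction: float = 0.5) -> float:
    """Fraction of child-vs-X segments with no parent-vs-X segment covering
    at least `min_fraction` (hypothetical threshold) of their length."""
    if not child_segs:
        return 0.0
    unsupported = 0
    for seg in child_segs:
        length = seg[2] - seg[1]
        best = max((overlap_cm(seg, p) for p in parent_segs), default=0.0)
        if best < min_fraction * length:
            unsupported += 1  # no parental support: likely false positive
    return unsupported / len(child_segs)


child = [("1", 10.0, 13.0), ("1", 50.0, 52.0), ("2", 5.0, 8.0)]
parents = [("1", 9.5, 13.5), ("2", 20.0, 25.0)]
print(trio_false_positive_rate(child, parents))
```

Restricting `child_segs` to segments of 2-4 cM would give a rate comparable in spirit to the short-segment figure quoted above.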

The inference of gene trees with species trees

Gergely J. Szöllosi, Eric Tannier, Vincent Daubin, Bastien Boussau
(Submitted on 4 Nov 2013)

Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or are horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, horizontal transfer, or incomplete lineage sorting. Some of them consider several types of events together, but none currently considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.

SMASH: A Benchmarking Toolkit for Variant Calling

Ameet Talwalkar, Jesse Liptrap, Julie Newcomb, Christopher Hartl, Jonathan Terhorst, Kristal Curtis, Ma’ayan Bresler, Yun S. Song, Michael I. Jordan, David Patterson
(Submitted on 31 Oct 2013)

Motivation: Computational methods are essential to extract actionable information from raw sequencing data, and thus to fulfill the promise of next-generation sequencing technology. Unfortunately, computational tools developed to call variants from human sequencing data disagree on many of their predictions, and current methods for evaluating accuracy and computational performance are ad hoc and incomplete. Agreement on benchmarking variant calling methods would stimulate the development of genomic processing tools and facilitate communication among researchers.
Results: We propose SMASH, a benchmarking toolkit and methodology for evaluating variant calling algorithms. We generate synthetic datasets, organize and interpret a wide range of existing benchmarking data for real genomes, and propose a set of accuracy and computational performance metrics for evaluating variant calling methods on these benchmarking data. Moreover, we illustrate the utility of SMASH by evaluating the performance of several leading single nucleotide polymorphism (SNP), indel, and structural variant calling algorithms.
Availability: We provide free and open access online to the SMASH toolkit, along with detailed documentation, at smash.cs.berkeley.edu.
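Accuracy metrics of the kind proposed for SMASH are typically built from true positive, false positive and false negative counts against a validated call set. The following sketch computes precision, recall and F1 for variants keyed by (chromosome, position, ref, alt). Exact matching is a simplifying assumption, since indels and structural variants can be written in several equivalent ways, and the function is illustrative rather than part of the SMASH toolkit.

```python
from typing import Dict, Iterable, Tuple

# (chromosome, position, ref allele, alt allele); exact-match key (assumption).
Variant = Tuple[str, int, str, str]


def call_accuracy(truth: Iterable[Variant],
                  calls: Iterable[Variant]) -> Dict[str, float]:
    """Precision, recall and F1 of a call set against a truth set."""
    truth_set, call_set = set(truth), set(calls)
    tp = len(truth_set & call_set)   # called and true
    fp = len(call_set - truth_set)   # called but not true
    fn = len(truth_set - call_set)   # true but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


truth = {("20", 100, "A", "G"), ("20", 200, "C", "T"), ("20", 300, "G", "GA")}
calls = {("20", 100, "A", "G"), ("20", 300, "G", "GA"), ("20", 400, "T", "C")}
print(call_accuracy(truth, calls))
```

Reporting precision and recall separately, rather than a single error count, makes it visible whether two callers disagree because one misses variants or because one over-calls them.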

Most viewed on Haldane’s Sieve: October 2013

The most viewed preprints on Haldane’s Sieve this month were:

An HMM-based Comparative Genomic Framework for Detecting Introgression in Eukaryotes

Kevin J. Liu, Jingxuan Dai, Kathy Truong, Ying Song, Michael H. Kohn, Luay Nakhleh
(Submitted on 30 Oct 2013)

One outcome of interspecific hybridization and the subsequent effects of evolutionary forces is introgression, the integration of genetic material from one species into the genome of an individual in another species. The evolution of several groups of eukaryotic species has involved hybridization, and cases of adaptation through introgression have already been established. In this work, we report on a new comparative genomic framework for detecting introgression in genomes, called PhyloNet-HMM. It combines phylogenetic networks, which capture reticulate evolutionary relationships among genomes, with hidden Markov models (HMMs), which capture dependencies within genomes. A novel aspect of our work is that it also accounts for incomplete lineage sorting and dependence across loci.
Application of our model to variation data from chromosome 7 in the mouse (Mus musculus domesticus) genome detects a recently reported adaptive introgression event involving the rodent poison resistance gene Vkorc1, in addition to other newly detected introgression regions. Based on our analysis, we estimate that about 12% of all sites within chromosome 7 are of introgressive origin (covering about 18 Mbp of chromosome 7 and over 300 genes). Further, our model detects no introgression in two negative control data sets. Our work provides a powerful framework for systematic analysis of introgression while simultaneously accounting for dependence across sites, point mutations, recombination, and ancestral polymorphism.
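The HMM component of a framework like PhyloNet-HMM assigns each site a hidden state, and the most probable state path along a chromosome can be recovered with the Viterbi algorithm. This is a generic two-state Viterbi sketch in log space; the states (0 = species tree, 1 = introgressed) and all probabilities are hypothetical, whereas a phylogenetic HMM would derive its emissions from likelihoods of the site data under alternative trees.

```python
import math
from typing import List, Sequence


def viterbi(log_emit: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Most probable hidden-state path of an HMM, computed in log space.

    log_emit[t][s]: log P(observation at site t | state s)
    log_trans[p][s]: log P(state s at t+1 | state p at t)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back: List[List[int]] = []  # back-pointers for sites 1..T-1
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical per-site probabilities: (P | species tree, P | introgressed).
emissions = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.8),
             (0.1, 0.9), (0.3, 0.7), (0.95, 0.05)]
log_emit = [[math.log(a), math.log(b)] for a, b in emissions]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.6), math.log(0.4)]
print(viterbi(log_emit, log_trans, log_init))
```

Runs of state 1 in the decoded path mark candidate introgressed regions; the high self-transition probability (0.9 here) is what makes neighboring sites tend to share a state, capturing dependence across sites.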

Global patterns of sex-biased migrations in humans

Chuan-Chao Wang, Li Jin, Hui Li
(Submitted on 29 Oct 2013)

A series of studies have revealed that the among-population component of genetic variation is higher for the paternal Y chromosome than for the maternal mitochondrial DNA (mtDNA), which indicates sex-biased migration in human populations. However, this pattern might also reflect an ascertainment bias due to nonrandom sampling of SNPs. To eliminate this possible bias, we used whole Y chromosome and mtDNA sequence data from 491 individuals in the 1000 Genomes Project Phase I to address the sex-biased migration dispute. We found that genetic differentiation between populations was higher for the Y chromosome than for mtDNA at global scales. The female migration rate might be three times higher than the male migration rate, assuming that the effective population sizes of males and females are the same.
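The three-fold figure can be related to F_ST through a standard approximation. Under Wright's island model for a haploid, uniparentally inherited locus, F_ST ≈ 1/(1 + 2Nm), so Nm = (1/F_ST - 1)/2; with equal female and male effective sizes, the ratio of migration rates is the ratio of the two Nm values (the constant factor cancels). The sketch below assumes this model, which is not necessarily the estimator used by the authors, and the input values are illustrative, not the paper's estimates.

```python
def female_to_male_migration_ratio(fst_mtdna: float, fst_y: float) -> float:
    """Ratio m_f / m_m under Wright's island model for haploid loci,
    F_ST ~= 1 / (1 + 2 N m), assuming equal female and male effective sizes.
    """
    for f in (fst_mtdna, fst_y):
        if not 0.0 < f < 1.0:
            raise ValueError("F_ST must lie strictly between 0 and 1")
    nm_female = (1.0 / fst_mtdna - 1.0) / 2.0  # N_f * m_f from mtDNA
    nm_male = (1.0 / fst_y - 1.0) / 2.0        # N_m * m_m from the Y chromosome
    return nm_female / nm_male


# Illustrative values only: lower differentiation for mtDNA than for Y.
print(female_to_male_migration_ratio(fst_mtdna=0.1, fst_y=0.25))
```

A higher F_ST for the Y chromosome than for mtDNA therefore translates into a ratio above 1, i.e. more female than male migration, but only under the equal-effective-size assumption stated in the abstract.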

Can we predict the mutation rate at the single nucleotide scale in the human genome?

Adam Eyre-Walker, Ying Chen
(Submitted on 29 Oct 2013)

It has recently been claimed that it is possible to predict the rate of de novo mutation at each site in the human genome with almost perfect accuracy (Michaelson et al. (2012) Cell, 151, 1431-1442). We show that this claim is unwarranted. By considering the correlation between the rate of de novo mutation and the predictions from the model of Michaelson et al., we show that there could be substantial unexplained variance in the mutation rate. We also demonstrate that the model of Michaelson et al. fails to capture a major component of the variation in the mutation rate: variation that is local but not associated with simple context.
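The point about unexplained variance rests on a standard identity: for a linear relationship between predicted and observed rates, the fraction of variance left unexplained is 1 - r², so even r = 0.9 leaves 19% unexplained. The sketch below computes this quantity; it illustrates the generic statistic only and does not reproduce the authors' analysis.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unexplained_variance_fraction(observed: Sequence[float],
                                  predicted: Sequence[float]) -> float:
    """1 - r^2: share of variance in `observed` not captured by a linear
    fit on `predicted`."""
    r = pearson_r(observed, predicted)
    return 1.0 - r * r


# A moderate correlation (r = 0.5) leaves 75% of the variance unexplained.
print(unexplained_variance_fraction([1, 2, 3], [1, 3, 2]))
```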