Single-cell RNA-Sequencing (scRNA-Seq) has become the most widely used high-throughput method for transcription profiling of individual cells. Systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies. Surprisingly, these issues have received minimal attention in published studies based on scRNA-Seq technology. We examined data from five published studies and found that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we found that the proportion of genes reported as expressed explains a substantial part of observed variability and that this quantity varies systematically across experimental batches. Furthermore, we found that the implemented experimental designs confounded outcomes of interest with batch effects, a design that can bring into question some of the conclusions of these studies. Finally, we propose a simple experimental design that can ameliorate the effect that these systematic errors have on downstream results.
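As a rough illustration of the quantity discussed above, the following sketch computes the proportion of genes detected in each cell and averages it per batch. The function names, the zero-count threshold for "expressed", and the toy matrix are assumptions made for this example; they are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def detection_rate(cell_counts):
    """Proportion of genes with a nonzero count in one cell.

    Assumption: a gene is 'reported as expressed' when its count is > 0.
    """
    return sum(1 for c in cell_counts if c > 0) / len(cell_counts)


def detection_rate_by_batch(counts, batches):
    """Mean per-cell detection rate within each batch.

    counts  : list of cells, each a list of per-gene counts
    batches : list of batch labels, one per cell
    """
    grouped = defaultdict(list)
    for cell, batch in zip(counts, batches):
        grouped[batch].append(detection_rate(cell))
    return {batch: mean(rates) for batch, rates in grouped.items()}


# Hypothetical toy data: 4 cells x 4 genes, two batches.
toy_counts = [
    [5, 0, 2, 1],
    [3, 0, 1, 4],
    [0, 0, 7, 0],
    [0, 2, 0, 0],
]
toy_batches = ["A", "A", "B", "B"]

print(detection_rate_by_batch(toy_counts, toy_batches))
```

A large gap between batches in this summary would be one sign that detection rate, rather than biology, may be driving differences between groups of cells.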
Yearly Archives: 2015
Fitness-valley crossing with generalized parent-offspring transmission
Simple and ubiquitous gene interactions create rugged fitness landscapes composed of coadapted gene complexes separated by “valleys” of low fitness. Crossing such fitness valleys allows a population to escape suboptimal local fitness peaks to become better adapted. This is the premise of Sewall Wright’s shifting balance process. Here we generalize the theory of fitness-valley crossing in the two-locus, biallelic case by allowing bias in parent-offspring transmission. This generalization extends the existing mathematical framework to genetic systems with segregation distortion and uniparental inheritance. Our results are also flexible enough to provide insight into shifts between alternate stable states in cultural systems with “transmission valleys”. Using a semi-deterministic analysis and a stochastic diffusion approximation, we focus on the limiting step in valley crossing: the first appearance of the genotype on the new fitness peak whose lineage will eventually fix. We then apply our results to specific cases of segregation distortion, uniparental inheritance, and cultural transmission. Segregation distortion favouring mutant alleles facilitates crossing most when recombination and mutation are rare, i.e., scenarios where crossing is otherwise unlikely. Interactions with more mutable genes (e.g., uniparentally inherited cytoplasmic elements) substantially reduce crossing times. Despite component traits being passed on poorly in the previous cultural background, small advantages in the transmission of a new combination of cultural traits can greatly facilitate a cultural transition. While peak shifts are unlikely under many of the common assumptions of population genetic theory, relaxing some of these assumptions can promote fitness-valley crossing.
Hybridization capture using RAD probes (hyRAD), a new tool for performing genomic analyses on museum collection specimens.
In recent years, many protocols aimed at reproducibly sequencing reduced-genome subsets in non-model organisms have been published. Among them, RAD-sequencing is one of the most widely used. It relies on digesting DNA with specific restriction enzymes and performing size selection on the resulting fragments. Despite its utility, this method is of limited use with degraded DNA samples, such as those isolated from museum specimens, as these are less likely to harbor fragments long enough to comprise two restriction sites, which are needed both for ligation of the required technical sequences and for size selection of the resulting fragments. In addition, RAD-sequencing proves to be a suboptimal technique when applied to an evolutionary scale larger than the intra-specific level, as polymorphisms in the restriction sites cause loci dropout. Here, we address both of these limitations with a novel method called hybridization RAD (hyRAD). In this method, biotinylated RAD fragments, covering a random fraction of the genome, are used as baits for capturing homologous fragments from samples processed through a classical genomic shotgun sequencing protocol. This simple and cost-effective approach allows sequencing orthologous sequences even from highly degraded DNA samples, opening new avenues of research in the field of museum genomics. Not relying on the presence of restriction sites, it improves among-sample loci coverage and can be applied to broader phylogenetic scales. In a trial study, hyRAD allowed us to obtain a large set of orthologous loci from fresh and museum samples from a non-model butterfly species, with over 10,000 single nucleotide polymorphisms present in all eight analyzed specimens, including 58-year-old museum samples.
Genome wide estimates of mutation rates and spectrum in Schizosaccharomyces pombe indicate CpG sites are highly mutagenic despite the absence of DNA methylation
We accumulated mutations for 1952 generations in 79 initially identical, haploid lines of the fission yeast Schizosaccharomyces pombe and then performed whole-genome sequencing to determine the mutation rates and spectrum. We captured 696 spontaneous mutations across the 79 mutation accumulation lines. We compared the mutation spectrum and rate to another model ascomycetous yeast, the budding yeast Saccharomyces cerevisiae. While the two organisms are approximately 600 million years diverged from each other, they share similar life histories, genome size and genomic G/C content. We found that Sc. pombe and S. cerevisiae have similar mutation rates, contrary to what was expected given Sc. pombe’s smaller reported effective population size. Sc. pombe also exhibits a strong insertion bias in comparison to S. cerevisiae. Intriguingly, we observed an increased mutation rate at cytosine nucleotides, specifically CpG nucleotides, which is also seen in S. cerevisiae. However, the absence of methylation in Sc. pombe and the pattern of mutation at these sites, primarily C→A as opposed to C→T, strongly suggest that the increased mutation rate is not caused by deamination of methylated cytosines. This result implies that the high mutability of CpG dinucleotides in other species may be caused in part by a mechanism other than methylation.
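The counts quoted in the abstract allow a back-of-the-envelope per-genome rate. The sketch below assumes that every line was propagated for the full 1952 generations and that all 696 mutations are pooled regardless of type; the function name is hypothetical. A per-base rate would additionally require the number of callable sites, which the abstract does not give.

```python
def per_genome_mutation_rate(n_mutations, n_lines, n_generations):
    """Mutations per genome per generation, pooled over all lines.

    Assumes each line contributes the same number of generations.
    """
    return n_mutations / (n_lines * n_generations)


# Figures taken from the abstract: 696 mutations, 79 lines, 1952 generations.
rate = per_genome_mutation_rate(696, 79, 1952)
print(f"{rate:.5f} mutations per genome per generation")
```

With these inputs the denominator is 79 × 1952 = 154,208 line-generations, giving roughly 0.0045 mutations per genome per generation.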
Implications of simplified linkage equilibrium SNP simulation
A recent paper published in PNAS (Golan et al. 2014) reported that residual maximum likelihood (REML) seriously underestimated the genetic variance explained by genome-wide single nucleotide polymorphisms when using a case-control design. It was concluded that Haseman–Elston regression (denoted as PCGC in their paper) should be used instead of REML. Their conclusions were based on results from simplified linkage equilibrium SNP simulation (SLES), which the authors acknowledged may be unrealistic. We found that their simulation, SLES, unrealistically inflated the correlation between the eigenvectors of the genomic relationship matrix and disease status to values that are rarely observed in real data analyses. With a more realistic simulation that the authors failed to carry out (as they noted in their paper), we showed that there was no such inflated correlation between the eigenvectors of the genomic relationship matrix and disease status. Because REML uses the eigensystem of the covariance structure, the inflated correlation artefactually constrained its estimates. We compared SNP-heritabilities from SLES and a more realistic simulation, showing that there was a substantial difference between the REML estimates from the two simulation strategies. Finally, we showed that there was no difference between REML and PCGC in real data analyses. This pattern from real data results differed strikingly from the pattern in the simulation study of Golan et al. One needs to be cautious of results drawn from SLES.
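For readers unfamiliar with Haseman–Elston regression, here is a minimal sketch of its basic cross-product form: the products of standardized phenotypes for each pair of individuals are regressed on the corresponding off-diagonal entries of the genomic relationship matrix, and the slope is read as the heritability estimate. This is an illustration under simplifying assumptions, not the PCGC procedure of Golan et al.: it omits the case-control ascertainment correction and covariate handling, and the function name and input layout are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def he_regression(phenotypes, grm):
    """Slope of z_i * z_j on grm[i][j] over all pairs i < j.

    phenotypes : list of trait values, one per individual
    grm        : square genomic relationship matrix (list of lists)
    Returns the ordinary least-squares slope (fitted with an intercept),
    which estimates heritability in this simplified setting.
    """
    mu, sd = mean(phenotypes), pstdev(phenotypes)
    z = [(p - mu) / sd for p in phenotypes]  # standardized phenotypes

    pairs = list(combinations(range(len(z)), 2))
    x = [grm[i][j] for i, j in pairs]   # pairwise relatedness
    y = [z[i] * z[j] for i, j in pairs]  # phenotype cross-products

    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx
```

Unlike REML, this estimator uses only pairwise moments and does not work through the eigensystem of the covariance structure, which is the property the abstract points to when explaining why the inflated eigenvector correlation affects REML specifically.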
DNA-metabarcoding uncovers the diversity of soil-inhabiting fungi in the tropical island of Puerto Rico
The site-frequency spectrum associated with Xi-coalescents
The mysterious orphans of Mycoplasmataceae
Phylogenetic community structure metrics and null models: a review with new methods and software