In recent years, many protocols aimed at reproducibly sequencing reduced-genome subsets in non-model organisms have been published. Among them, RAD-sequencing is one of the most widely used. It relies on digesting DNA with specific restriction enzymes and performing size selection on the resulting fragments. Despite its utility, this method is of limited use with degraded DNA samples, such as those isolated from museum specimens, as these are less likely to harbor fragments long enough to contain the two restriction sites needed for ligating the required technical sequences, or to allow size selection of the resulting fragments. In addition, RAD-sequencing proves to be a suboptimal technique when applied at an evolutionary scale larger than the intra-specific level, as polymorphisms in the restriction sites cause loci dropout. Here, we address both of these limitations with a novel method called hybridization RAD (hyRAD). In this method, biotinylated RAD fragments, covering a random fraction of the genome, are used as baits for capturing homologous fragments from samples processed through a classical genomic shotgun sequencing protocol. This simple and cost-effective approach allows orthologous sequences to be sequenced even from highly degraded DNA samples, opening new avenues of research in the field of museum genomics. Because it does not rely on the presence of restriction sites, it improves among-sample loci coverage and can be applied to broader phylogenetic scales. In a trial study, hyRAD allowed us to obtain a large set of orthologous loci from fresh and museum samples of a non-model butterfly species, with over 10,000 single nucleotide polymorphisms present in all eight analyzed specimens, including 58-year-old museum samples.
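The digestion and size-selection step that classical RAD-sequencing relies on can be illustrated with a small in silico sketch. This is not part of the hyRAD protocol itself: the function names `digest` and `size_select`, the EcoRI-like recognition site, the cut offset, and the size window are all assumptions chosen for illustration.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the position within the site where the cut falls
    (hypothetical example: 1 for a G^AATTC-style cut).
    Returns the fragments in order; joined together they restore the input.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length lies within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with two recognition sites (illustrative only).
toy = "ACGT" + "GAATTC" + "A" * 10 + "GAATTC" + "TTT"
pieces = digest(toy, "GAATTC", 1)
# Only internal fragments are flanked by two restriction sites.
internal = pieces[1:-1]
selected = size_select(internal, 10, 20)
```

In this toy picture, a heavily fragmented template rarely contains two sites on the same molecule, so `internal` tends to be empty, which is the limitation the abstract describes for degraded museum DNA.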
SFS_CODE: More Efficient and Flexible Forward Simulations
SUMMARY: Modern implementations of forward population genetic simulations are efficient and flexible, enabling the exploration of complex models that may otherwise be intractable. Here we describe an updated version of SFS_CODE, which has increased efficiency and includes many novel features. Among these features are an arbitrary model of dominance, the ability to simulate partial and soft selective sweeps, and the ability to track the trajectories of mutations and/or ancestries across multiple populations under complex models that are not possible under a coalescent framework. We also release sfs_coder, a Python wrapper to SFS_CODE that allows the user to easily generate command lines for common models of demography, selection, and human genome structure, as well as to parse SFS_CODE output and simulate phenotypes from it. Availability and Implementation: Our open source software is written in C and Python, and is available under the GNU General Public License at http://sfscode.sourceforge.net. Contact: ryan.hernandez@ucsf.edu Supplementary information: Detailed usage information is available from the project website at http://sfscode.sourceforge.net.
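To make the idea of a forward simulation with dominance concrete, here is a minimal single-locus Wright-Fisher sketch. It does not use SFS_CODE or sfs_coder and does not reflect their interfaces or internals. The function names, the fitness parametrization (1 + s, 1 + hs, 1), and the seeded binomial sampling are assumptions made for illustration only.

```python
import random


def expected_freq(p, s, h):
    """Allele frequency after selection, before drift.

    Assumed genotype fitnesses: AA = 1 + s, Aa = 1 + h*s, aa = 1,
    where h is the dominance coefficient.
    """
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 + s, 1.0 + h * s, 1.0
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def wright_fisher(p0, s, h, pop_size, generations, seed=0):
    """Simulate forward in time; returns the frequency in each generation."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n_copies = 2 * pop_size    # diploid population
    freqs = [p0]
    for _ in range(generations):
        p_sel = expected_freq(freqs[-1], s, h)
        # Binomial sampling of allele copies models genetic drift.
        count = sum(rng.random() < p_sel for _ in range(n_copies))
        freqs.append(count / n_copies)
    return freqs


# Hypothetical parameters: a partially dominant beneficial allele.
traj = wright_fisher(p0=0.05, s=0.02, h=0.3, pop_size=500, generations=100)
```

Recording the whole trajectory rather than only the final state mirrors, in a very reduced form, the kind of mutation tracking the summary mentions.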
Teaser: Individualized benchmarking and optimization of read mapping results for NGS data
Fitness-valley crossing with generalized parent-offspring transmission
Simple and ubiquitous gene interactions create rugged fitness landscapes composed of coadapted gene complexes separated by “valleys” of low fitness. Crossing such fitness valleys allows a population to escape suboptimal local fitness peaks to become better adapted. This is the premise of Sewall Wright’s shifting balance process. Here we generalize the theory of fitness-valley crossing in the two-locus, biallelic case by allowing bias in parent-offspring transmission. This generalization extends the existing mathematical framework to genetic systems with segregation distortion and uniparental inheritance. Our results are also flexible enough to provide insight into shifts between alternate stable states in cultural systems with “transmission valleys”. Using a semi-deterministic analysis and a stochastic diffusion approximation, we focus on the limiting step in valley crossing: the first appearance of the genotype on the new fitness peak whose lineage will eventually fix. We then apply our results to specific cases of segregation distortion, uniparental inheritance, and cultural transmission. Segregation distortion favouring mutant alleles facilitates crossing most when recombination and mutation are rare, i.e., scenarios where crossing is otherwise unlikely. Interactions with more mutable genes (e.g., uniparental inherited cytoplasmic elements) substantially reduce crossing times. Despite component traits being passed on poorly in the previous cultural background, small advantages in the transmission of a new combination of cultural traits can greatly facilitate a cultural transition. While peak shifts are unlikely under many of the common assumptions of population genetic theory, relaxing some of these assumptions can promote fitness-valley crossing.
Statistical Inference of a Convergent Antibody Repertoire Response to Influenza Vaccine
Background: Vaccines dramatically affect an individual’s adaptive immune system, and thus provide an excellent means to study human immunity. Upon vaccination, the B cells that express antibodies (Abs) that happen to bind the vaccine are stimulated to proliferate and undergo mutagenesis at their Ab locus. This process may alter the composition of B cell lineages within an individual, which are known collectively as the antibody repertoire (AbR). Antibodies are also highly expressed in whole blood, potentially enabling unbiased RNA sequencing technologies to query this diversity. Less is known about the diversity of AbR responses across individuals to a given vaccine and if individuals tend to yield a similar response to the same antigenic stimulus. Methods: Here we implement a bioinformatic pipeline that extracts the AbR information from a time-series RNA-seq dataset of 5 patients who were administered a seasonal trivalent influenza vaccine (TIV). We harness the detailed time-series nature of this dataset and use methods based in functional data analysis (FDA) to identify the B cell lineages that respond to the vaccine. We then design and implement rigorous statistical tests in order to ask whether or not these patients exhibit a convergent AbR response to the same TIV. Results: We find that high-resolution time-series data can be used to help identify the Ab lineages that respond to an antigenic stimulus, and that this response can exhibit a convergent nature across patients inoculated with the same vaccine. However, correlations in AbR diversity among individuals prior to inoculation can confound inference of a convergent signal unless it is taken into account. Conclusions: We developed a framework to identify the elements of an AbR that respond to an antigen. 
This information could be used to understand the diversity of immune responses in different individuals, as well as to gauge the effectiveness of the immune response to a given stimulus within an individual. We also present a framework for testing a convergent hypothesis between AbRs, a hypothesis that is more difficult to test than previously appreciated. Our discovery of a convergent signal suggests that similar epitopes do select for antibodies with similar sequence characteristics.
Allele-specific expression reveals interactions between genetic variation and environment
On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data
Single-cell RNA-Sequencing (scRNA-Seq) has become the most widely used high-throughput method for transcription profiling of individual cells. Systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies. Surprisingly, these issues have received minimal attention in published studies based on scRNA-Seq technology. We examined data from five published studies and found that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we found that the proportion of genes reported as expressed explains a substantial part of observed variability and that this quantity varies systematically across experimental batches. Furthermore, we found that the implemented experimental designs confounded outcomes of interest with batch effects, a design that can bring into question some of the conclusions of these studies. Finally, we propose a simple experimental design that can ameliorate the effect that these systematic errors have on downstream results.
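The quantity highlighted above, the proportion of genes reported as expressed in each cell, is simple to compute from a count matrix. The sketch below is illustrative and is not the authors' analysis code: the function names are hypothetical, and treating any nonzero count as "expressed" is an assumption.

```python
from collections import defaultdict


def detection_rate(counts):
    """Fraction of genes with a nonzero count in a single cell."""
    return sum(1 for c in counts if c > 0) / len(counts)


def mean_detection_by_batch(cells, batches):
    """Average the per-cell detection rate within each batch.

    `cells` is a list of per-cell count vectors and `batches` gives the
    batch label of each cell, in the same order.
    """
    grouped = defaultdict(list)
    for counts, batch in zip(cells, batches):
        grouped[batch].append(detection_rate(counts))
    return {b: sum(rates) / len(rates) for b, rates in grouped.items()}


# Toy data: four cells, four genes, two batches (made-up numbers).
toy_cells = [[0, 1, 2, 0], [3, 0, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
toy_batches = ["a", "a", "b", "b"]
by_batch = mean_detection_by_batch(toy_cells, toy_batches)
```

If batch labels coincide with the biological groups being compared, a difference in `by_batch` cannot be separated from a biological difference, which is the confounding the abstract warns about.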
Shift and adapt: the costs and benefits of karyotype variations
Variation is the spice of life or, in the case of evolution, variation is the necessary material on which selection can act to enable adaptation. Karyotypic variation in ploidy (the number of homologous chromosome sets) and aneuploidy (imbalance in the number of chromosomes) are fundamentally different than other types of genomic variants. Karyotypic variation emerges through different molecular mechanisms than other mutational events, and unlike mutations that alter the genome at the base pair level, rapid reversion to the wild type chromosome number is often possible. Although karyotypic variation has long been noted and discussed by biologists, interest in the importance of karyotypic variants in evolutionary processes has spiked in recent years, and much remains to be discovered about how karyotypic variants are produced and subsequently selected.
Protein homeostasis imposes a barrier on functional integration of horizontally transferred genes in bacteria
Further genetic diversification in multiple tumors and an evolutionary perspective on therapeutics
The genetic diversity within a single tumor can be extremely large, possibly with mutations at all coding sites (Ling et al. 2015). In this study, we analyzed 12 cases of multiple hepatocellular carcinoma (HCC) tumors by sequencing and genotyping several samples from each case. In 10 cases, tumors are clonally related by a process of cell migration and colonization. They permit a detailed analysis of the evolutionary forces (mutation, migration, drift and natural selection) that influence the genetic diversity both within and between tumors. In 23 inter-tumor comparisons, the descendant tumor usually shows a higher growth rate than the parent tumor. In contrast, neutral diversity dominates within-tumor observations such that adaptively growing clones are rarely found. The apparent adaptive evolution between tumors can be explained by the inherent bias for detecting larger tumors that have a growth advantage. Beyond these tumors are a far larger number of clones which, growing at a neutral rate and too small to see, can nevertheless be verified by molecular means. Given that the estimated genetic diversity is often very large, therapeutic strategies need to take into account the pre-existence of many drug-resistance mutations. Importantly, these mutations are expected to be in the very low frequency range in the primary tumors (and to become frequent in the relapses, as is indeed reported (1-3)). In conclusion, tumors may often harbor a very large number of mutations in the very low frequency range. This duality provides both a challenge and an opportunity for designing strategies against drug resistance (4-8).