Local and sex-specific biases in crossover vs. noncrossover outcomes at meiotic recombination hotspots in mouse

Esther de Boer, Maria Jasin, Scott Keeney
doi: http://dx.doi.org/10.1101/022830

Meiotic recombination initiated by programmed double-strand breaks (DSBs) yields two types of interhomolog recombination products, crossovers and noncrossovers, but what determines whether a DSB will yield a crossover or noncrossover is not understood. In this study we analyze the influence of sex and chromosomal location on mammalian recombination outcomes by constructing fine-scale recombination maps in both males and females at two mouse hotspots located in different regions of the same chromosome. These include the most comprehensive maps of recombination hotspots in oocytes to date. One hotspot, located centrally on chromosome 1, behaved similarly in male and female meiosis: crossovers and noncrossovers formed at comparable levels and ratios in both sexes. In contrast, at a distal hotspot crossovers were recovered only in males even though noncrossovers were obtained at similar frequencies in both sexes. These findings reveal an example of extreme sex-specific bias in recombination outcome. We further find that estimates of relative DSB levels are surprisingly poor predictors of relative crossover frequencies between hotspots in males. Our results demonstrate that the outcome of mammalian meiotic recombination can be biased, that this bias can vary depending on location and cellular context, and that DSB frequency is not the only determinant of crossover frequency.

A Profile-Based Method for Measuring the Impact of Genetic Variation

Nicole E Wheeler, Lars Barquist, Fatemeh Ashari Ghomi, Robert A Kingsley, Paul P Gardner
doi: http://dx.doi.org/10.1101/022616

Advances in our ability to generate genome sequence data have increased the need for fast, effective approaches to assessing the functional significance of genetic variation. Traditionally, this has been done by identifying single nucleotide polymorphisms within populations and calculating derived statistics, such as dN/dS, to prioritize candidates. However, these methods commonly ignore the differential selective pressure acting at different positions within a given protein sequence and the effect of insertions and deletions (indels). We present a profile-based method for predicting whether a protein sequence variant is likely to have functionally diverged from close relatives, which takes into account differences in residue conservation and indel rates within a sequence. We assess the performance of the method, and apply it to the identification of functionally significant genetic variation between bacterial genomes. We demonstrate that this method is a highly sensitive measure of functional potential, which can improve our understanding of the evolution of proteins and organisms. An implementation can be found at https://github.com/UCanCompBio/deltaBS.
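The toy example below illustrates the profile idea. It assumes an ungapped alignment of close relatives and a uniform amino-acid background. Each alignment column becomes a set of log-odds scores, so a substitution at a conserved column lowers a sequence's score more than one at a variable column. This is a minimal sketch, not the implementation at the linked repository. It ignores indels, which the described method also takes into account, and `build_profile`, `score` and `delta_score` are hypothetical names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Per-column log2-odds scores from an ungapped alignment (toy profile).

    Assumes every sequence has the same length and uses only the 20
    standard residues; the background is uniform over those residues.
    """
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, sequence):
    """Sum of per-column log-odds scores for an aligned sequence."""
    return sum(column[aa] for column, aa in zip(profile, sequence))


def delta_score(profile, reference, variant):
    """Score lost by the variant relative to the reference (positive = worse fit)."""
    return score(profile, reference) - score(profile, variant)


# Hypothetical alignment of close relatives.
alignment = ["MKTAY", "MKTAF", "MKSAY", "MKTAY"]
profile = build_profile(alignment)
conserved_hit = delta_score(profile, "MKTAY", "MRTAY")  # K->R at a fully conserved column
variable_hit = delta_score(profile, "MKTAY", "MKTAF")   # Y->F at a variable column
print(round(conserved_hit, 2), round(variable_hit, 2))
```

Here the K->R change at a fully conserved column costs log2(5), about 2.32 bits. The Y->F change at a column that already varies costs 1 bit.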

Tools and techniques for computational reproducibility

Stephen R Piccolo, Adam B Lee, Michael B Frampton
doi: http://dx.doi.org/10.1101/022707

When reporting research findings, scientists document the steps they followed so that others can verify and build upon the research. When those steps have been described in sufficient detail that others can retrace the steps and obtain similar results, the research is said to be reproducible. Computers play a vital role in many research disciplines and present both opportunities and challenges for reproducibility. Computers can be programmed to execute analysis tasks, and those programs can be repeated and shared with others. Due to the deterministic nature of most computer programs, the same analysis tasks, applied to the same data, will often produce the same outputs. However, in practice, computational findings often cannot be reproduced, due to complexities in how software is packaged, installed, and executed, and due to limitations in how scientists document analysis steps. Many tools and techniques are available to help overcome these challenges. Here we describe six such strategies. With a broad scientific audience in mind, we describe strengths and limitations of each approach, as well as circumstances under which each might be applied. No single strategy is sufficient for every scenario; thus we emphasize that it is often useful to combine approaches.

Tanglegrams: a reduction tool for mathematical phylogenetics

Frederick A Matsen IV, Sara Billey, Arnold Kas, Matjaž Konvalinka
(Submitted on 16 Jul 2015)

Many discrete mathematics problems in phylogenetics are defined in terms of the relative labeling of pairs of leaf-labeled trees. These relative labelings are naturally formalized as tanglegrams, which have previously been an object of study in coevolutionary analysis. Although there has been considerable work on planar drawings of tanglegrams, they had not been fully explored as combinatorial objects until recently. In this paper, we describe how many discrete mathematical questions on trees “factor” through a problem on tanglegrams, and how understanding that factoring can simplify analysis. Depending on the problem, it may be useful to consider an unordered version of tanglegrams, and/or their unrooted counterparts. For all of these definitions, we show how the isomorphism types of tanglegrams can be understood in terms of double cosets of the symmetric group, and we investigate their automorphisms. Understanding tanglegrams better will isolate the distinct problems on leaf-labeled pairs of trees and reveal natural symmetries of spaces associated with such problems.
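The brute-force sketch below gives a small computational check of the double-coset view. It assumes that tanglegrams on a fixed ordered pair of rooted binary tree shapes T and S correspond to double cosets Aut(T)\S_n/Aut(S), with each automorphism group acting on the leaves. The sketch enumerates S_n directly, so it only suits very small n. `generate_group` and `count_double_cosets` are hypothetical helpers, and the leaf automorphism groups are supplied by hand as generators.

```python
from itertools import permutations


def compose(p, q):
    """Permutation composition p∘q, with permutations stored as image tuples."""
    return tuple(p[i] for i in q)


def generate_group(generators, n):
    """Close a set of generating permutations of {0..n-1} into a group."""
    identity = tuple(range(n))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def count_double_cosets(left, right, n):
    """Number of double cosets left \\ S_n / right, by brute-force enumeration."""
    seen = set()
    count = 0
    for sigma in permutations(range(n)):
        if sigma in seen:
            continue
        count += 1
        for a in left:
            a_sigma = compose(a, sigma)
            for b in right:
                seen.add(compose(a_sigma, b))
    return count


# Leaf automorphism groups of the two rooted binary shapes on 4 leaves.
caterpillar = generate_group([(1, 0, 2, 3)], 4)                          # (((0,1),2),3)
balanced = generate_group([(1, 0, 2, 3), (0, 1, 3, 2), (2, 3, 0, 1)], 4)  # ((0,1),(2,3))

shapes = [caterpillar, balanced]
total = sum(count_double_cosets(t, s, 4) for t in shapes for s in shapes)
print(len(balanced), total)
```

The four ordered shape pairs contribute 7, 2, 2 and 2 double cosets. Their total of 13 agrees with the known number of tanglegrams on four leaves.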

Adaptive variation in human toll-like receptors is contributed by introgression from both Neandertals and Denisovans

Michael Dannemann, Aida M. Andrés, Janet Kelso
doi: http://dx.doi.org/10.1101/022699

Pathogens and the diseases they cause have been among the most important selective forces experienced by humans during their evolutionary history. Although adaptive alleles generally arise by mutation, introgression can also be a valuable source of beneficial alleles. Archaic humans, who lived in Europe and Western Asia for over 200,000 years, were likely well-adapted to the environment and its local pathogens, and it is therefore conceivable that modern humans entering Europe and Western Asia who admixed with them obtained a substantial immune advantage from the introgression of archaic alleles. Here we document a cluster of three toll-like receptors (TLR6-TLR1-TLR10) in modern humans that carries three distinct archaic haplotypes, indicating repeated introgression from archaic humans. Two of these haplotypes are most similar to the Neandertal genome, while the third haplotype is most similar to the Denisovan genome. The toll-like receptors are key components of innate immunity and provide an important first line of immune defense against bacteria, fungi and parasites. The unusually high allele frequencies and unexpected levels of population differentiation indicate that there has been local positive selection on multiple haplotypes at this locus. We show that the introgressed alleles have clear functional effects in modern humans; archaic-like alleles underlie differences in the expression of the TLR genes and are associated with reduced microbial resistance and increased allergic disease in large cohorts. This provides strong evidence for recurrent adaptive introgression at the TLR6-TLR1-TLR10 locus, resulting in differences in disease phenotypes in modern humans.

Improving the Efficiency of Genomic Selection in Chinese Simmental beef cattle

Jiangwei Xia, Yang Wu, Huizhong Fang, Wengang Zhang, Yuxin Song, Lupei Zhang, Xue Gao, Yan Chen, Junya Li, Huijiang Gao
doi: http://dx.doi.org/10.1101/022673

Genomic selection is an accurate and efficient method of estimating genetic merit using high-density, genome-wide single nucleotide polymorphisms (SNPs). In this study, we investigate an approach to increase the efficiency of genomic prediction using genome-wide markers. The approach is a feature selection based on genomic best linear unbiased prediction (GBLUP), a statistical method used to predict breeding values from SNPs for selection in animal and plant breeding. The objective of this study is to choose the kinship matrix for GBLUP. The genomic relationship matrix (G-matrix) uses information from genome-wide dense markers. We compare three kinds of kinship matrices based on different combinations of centring and scaling of marker genotypes, and identify a suitable kinship approach for the resource population of Chinese Simmental beef cattle. SNPs can be used to estimate the kinship matrix and individual inbreeding coefficients more accurately. We therefore constructed a genomic relationship matrix for 1,059 Chinese Simmental beef cattle using 640,000 SNPs, and estimated breeding values from phenotypes for carcass weight and sirloin weight. We evaluated the number of SNPs needed to estimate the genomic relationship matrix accurately in this population. A further aim of this study was to optimize the selection of markers and determine the number of SNPs required to estimate kinship in Chinese Simmental beef cattle. We find that feature selection of GBLUP using Xu’s kinship and Astle and Balding’s kinship performed similarly well, and these were the best-performing methods in our study. Inbreeding and kinship matrices can be estimated with high accuracy using ≥12,000 SNPs in Chinese Simmental beef cattle.

metaCCA: Summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

Anna Cichonska, Juho Rousu, Pekka Marttinen, Antti J Kangas, Pasi Soininen, Terho Lehtimäki, Olli T Raitakari, Marjo-Riitta Järvelin, Veikko Salomaa, Mika Ala-Korpela, Samuli Ripatti, Matti Pirinen
doi: http://dx.doi.org/10.1101/022665

A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analysing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, the modest sample sizes of individual cohorts and the restricted availability of individual-level genotype-phenotype data across cohorts limit the use of multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single study or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows excellent agreement with the pooled individual-level analysis of the original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies.
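Consider the simplest case of one SNP tested against several traits. There, the squared canonical correlation reduces to the squared multiple correlation r^T R_YY^{-1} r, where r holds the SNP-trait correlations and R_YY is the trait correlation matrix. The minimal sketch below assumes the univariate summary statistics have already been converted to correlations on a standardised scale and that R_YY is known. It omits the covariance shrinkage step described above and is not the metaCCA software. `squared_canonical_correlation` is a hypothetical name.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def squared_canonical_correlation(r_xy, r_yy):
    """Squared canonical correlation between one SNP and several traits.

    r_xy: correlations of the SNP with each trait (from standardised summary statistics).
    r_yy: trait-trait correlation matrix (assumed known and well conditioned).
    """
    weights = solve(r_yy, r_xy)
    return sum(r * w for r, w in zip(r_xy, weights))


traits_corr = [[1.0, 0.5], [0.5, 1.0]]  # hypothetical trait correlation matrix
print(squared_canonical_correlation([0.1, -0.1], traits_corr))
print(squared_canonical_correlation([0.1, -0.1], [[1.0, 0.0], [0.0, 1.0]]))
```

In this example the two traits are positively correlated (0.5) and the SNP's effects on them run in opposite directions. The joint signal (0.04) is then twice what the same SNP-trait correlations give for uncorrelated traits (0.02). This is one way in which analysing related traits together can add power.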

Iron Age and Anglo-Saxon genomes from East England reveal British migration history

Stephan Schiffels, Wolfgang Haak, Pirita Paajanen, Bastien Llamas, Elizabeth Popescu, Louise Lou, Rachel Clarke, Alice Lyons, Richard Mortimer, Duncan Sayer, Chris Tyler-Smith, Alan Cooper, Richard Durbin
doi: http://dx.doi.org/10.1101/022723

British population history has been shaped by a series of immigrations and internal movements, including the early Anglo-Saxon migrations following the breakdown of the Roman administration after 410 CE. It remains an open question how these events affected the genetic composition of the current British population. Here, we present whole-genome sequences generated from ten ancient individuals found in archaeological excavations close to Cambridge in the East of England, ranging from 2,300 to 1,200 years before present (Iron Age to Anglo-Saxon period). We use present-day genetic data to characterize the relationship of these ancient individuals to contemporary British and other European populations. By analyzing the distribution of shared rare variants across ancient and modern individuals, we find that today’s British are more similar to the Iron Age individuals than to most of the Anglo-Saxon individuals, and estimate that the contemporary East English population derives 30% of its ancestry from Anglo-Saxon migrations, with a lower fraction in Wales and Scotland. We gain further insight with a new method, rarecoal, which fits a demographic model to the distribution of shared rare variants across a large number of samples, enabling fine-scale analysis of subtle genetic differences and yielding explicit estimates of population sizes and split times. Using rarecoal, we find that the ancestors of the Anglo-Saxon samples are closest to modern Danish and Dutch populations, while the Iron Age samples share ancestors with multiple Northern European populations, including Britain.
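The input to this kind of analysis can be pictured as a histogram of rare-allele configurations. For every site where the derived allele appears at most a few times across all samples, the histogram records how many copies each population carries. The toy sketch below works under that assumption, with hypothetical helper names and made-up counts. It builds only the summary and does not fit rarecoal's demographic model.

```python
from collections import Counter


def rare_allele_configurations(sites, max_total=4):
    """Histogram of per-population derived-allele counts at rare sites.

    sites: one list per site giving the derived-allele count in each population.
    Only sites whose total count lies between 1 and max_total are kept.
    """
    histogram = Counter()
    for counts in sites:
        if 1 <= sum(counts) <= max_total:
            histogram[tuple(counts)] += 1
    return histogram


def pairwise_rare_sharing(histogram, n_pops):
    """Number of rare sites at which both populations i and j carry the allele."""
    shared = [[0] * n_pops for _ in range(n_pops)]
    for config, n_sites in histogram.items():
        carriers = [i for i, c in enumerate(config) if c > 0]
        for i in carriers:
            for j in carriers:
                shared[i][j] += n_sites
    return shared


# Hypothetical counts for three groups, e.g. two ancient samples and a modern panel.
sites = [[1, 0, 1], [0, 2, 0], [1, 0, 1], [5, 3, 2], [0, 0, 0], [0, 1, 2]]
histogram = rare_allele_configurations(sites)
sharing = pairwise_rare_sharing(histogram, 3)
print(histogram)
print(sharing)
```

The diagonal counts the rare sites carried by each group. The off-diagonal entries count the rare sites shared between groups, which is the kind of signal used above to compare ancient and modern individuals.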

CUA: a Flexible and Comprehensive Codon Usage Analyzer

Zhenguo Zhang
doi: http://dx.doi.org/10.1101/022814

Codon usage bias (CUB) is pervasive in genomes. Studying its patterns and causes is fundamental for understanding genome evolution. Rapidly accumulating large-scale RNA and DNA sequence data make it feasible to study CUB in many species. However, existing software is limited in its ability to incorporate these new data resources. I therefore release CUA, software that computes all popular CUB metrics, including CAI, tAI, Fop, and ENC. More importantly, CUA allows users to incorporate user-specific data, such as tRNA abundance and highly expressed genes from the tissues under consideration; this flexibility enables computing CUB metrics for any species with improved accuracy. In sum, CUA eases codon usage studies and establishes a platform for incorporating new metrics in the future. CUA is available at http://search.cpan.org/dist/Bio-CUA/ with help documentation and a tutorial.
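The sketch below computes one of the listed metrics, the codon adaptation index (CAI), in the Sharp and Li style. Relative adaptiveness w is each codon's count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. A gene's CAI is the geometric mean of w over its codons, skipping stop codons and single-codon amino acids (Met, Trp). The sketch assumes the standard genetic code. It also adds a pseudocount of 0.5 so that unseen codons do not give w = 0; conventions for this differ. This is not CUA's Perl interface, and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def codons_of(seq):
    """Split a coding sequence into complete codons (trailing bases are dropped)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w for each codon in multi-codon families, from reference (highly expressed) genes."""
    counts = Counter(c for gene in reference_genes for c in codons_of(gene)
                     if c in GENETIC_CODE)
    families = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        families[aa].append(codon)
    w = {}
    for aa, synonyms in families.items():
        if aa == "*" or len(synonyms) == 1:
            continue  # stop codons and single-codon amino acids are uninformative
        best = max(counts[c] + pseudocount for c in synonyms)
        for c in synonyms:
            w[c] = (counts[c] + pseudocount) / best
    return w


def cai(gene, w):
    """Geometric mean of w over the gene's informative codons (NaN if none)."""
    logs = [math.log(w[c]) for c in codons_of(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


w = relative_adaptiveness(["CTGCTGCTGCTT"])  # hypothetical reference set
print(cai("CTG", w), cai("CTGCTT", w))
```

A gene that uses only the preferred codon of each family scores 1. Less preferred synonymous choices pull the geometric mean down.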

Replaying Evolution to Test the Cause of Extinction of One Ecotype in an Experimentally Evolved Population

Caroline B. Turner, Zachary D. Blount, Richard E. Lenski
doi: http://dx.doi.org/10.1101/022798

In a long-term evolution experiment with Escherichia coli, bacteria in one of twelve populations evolved the ability to consume citrate, a previously unexploited resource in a glucose-limited medium. This innovation led to the frequency-dependent coexistence of citrate-consuming (Cit+) and non-consuming (Cit–) ecotypes, with Cit– bacteria persisting on the exogenously supplied glucose as well as other carbon molecules released by the Cit+ bacteria. After more than 10,000 generations of coexistence, however, the Cit– lineage went extinct; cells with the Cit– phenotype dropped to levels below detection, and the Cit– clade could not be detected by molecular assays based on its unique genotype. We hypothesized that this extinction event was a deterministic outcome of evolutionary change within the population, specifically the appearance of a more-fit Cit+ ecotype that competitively excluded the Cit– ecotype. We tested this hypothesis by re-evolving the population from one frozen sample taken just prior to the extinction and from another sample taken several thousand generations earlier, in each case for 500 generations and with 20-fold replication. To our surprise, the Cit– type did not go extinct in any of these replays, and Cit– cells also persisted in a single replicate that was propagated for 3,000 generations. Even more unexpectedly, we showed that the Cit– ecotype could reinvade the Cit+ population after its extinction. Taken together, these results indicate that the extinction of the Cit– ecotype was not a deterministic outcome driven by competitive exclusion by the Cit+ ecotype. The extinction also cannot be explained by demographic stochasticity, as the population size of the Cit– ecotype should have been many thousands of cells even during the daily transfer events. Instead, we infer that the extinction must have been caused by a rare chance event in which some aspect of the experimental conditions was inadvertently perturbed.