Sex chromosomes play a prominent role in development and evolution and have several characteristic features that distinguish them from autosomes. Across diverse taxa, recombination is typically suppressed at the sex-determining region (SDR) and proportionally elevated in the remainder of the chromosome or pseudoautosomal region (PAR). However, in most model taxa the sex chromosomes are ancient and highly differentiated from autosomes, and thus little is known about recombination dynamics of homomorphic sex chromosomes with incipient sex-determining mechanisms. Here we examine male function (pollen production) and female function (fruit production) in crosses of the dioecious octoploid strawberry Fragaria chiloensis in order to map the small and recently evolved SDR controlling both traits and to examine recombination patterns on the young ZW chromosome. The SDR occurs in a narrow 280 kb window, in which the maternal recombination rate is lower than in the orthologous paternal region and the genome-wide average rate, but within the range of autosomal rate variation. In contrast to the SDR, the ZW recombination rate in the PAR is much higher than the rates of the ZZ or autosomal linkage groups, substantially overcompensating for the SDR rate. By extensively sequencing sections of the SDR vicinity in several crosses and unrelated plants, we show that W-specific divergence is elevated within a portion of the SDR and find only a single SNP to be in high linkage disequilibrium with sex, suggesting that any W-specific haplotype protected from recombination is not large. We hypothesize that selection for recombination suppression within the small SDR may be weak, but that fluctuating sex ratios could favor elevated recombination in the PAR to remove deleterious mutations on the W. Thus these results illuminate the recombination dynamics of a nascent sex chromosome with a modestly diverged SDR, which could be typical of other dioecious plants.
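As a rough illustration of what it means for a single SNP to be in high linkage disequilibrium with sex, the sketch below computes the squared correlation (r²) between two 0/1 indicators: whether a plant carries a given allele and whether it is female. The function name, the 0/1 encoding, and the toy data are assumptions made for illustration only; they are not the authors' pipeline or data.

```python
def r_squared(allele, female):
    """Squared correlation (r^2) between two equal-length 0/1 sequences.

    allele[i] is 1 if plant i carries the allele of interest, else 0.
    female[i] is 1 if plant i is female, else 0.
    """
    if len(allele) != len(female) or len(allele) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(allele)
    p_a = sum(allele) / n                      # frequency of allele carriers
    p_f = sum(female) / n                      # frequency of females
    p_af = sum(1 for a, f in zip(allele, female) if a == 1 and f == 1) / n
    d = p_af - p_a * p_f                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_f * (1 - p_f)
    if denom == 0:
        raise ValueError("r^2 is undefined when either indicator is invariant")
    return d * d / denom


# Toy data (hypothetical): the allele tracks sex perfectly in the first
# case and is unrelated to sex in the second.
perfect = r_squared([1, 1, 0, 0], [1, 1, 0, 0])
unlinked = r_squared([1, 1, 0, 0], [1, 0, 1, 0])
```

A value near 1 indicates an allele that co-occurs almost perfectly with sex, while a value near 0 indicates no association.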
Author Archives: schraib
Inferring chimpanzee Y chromosome history and amplicon diversity from whole genome sequencing
Flowr: Robust and efficient pipelines using a simple language-agnostic approach
Machine learning for metagenomics: methods and tools
Hayssam Soueidan, Macha Nikolski
While genomics is the field devoted to studying the genome of any organism, metagenomics refers to research that focuses on many genomes at the same time, as is typical in some areas of environmental study. Metagenomics requires computational methods for understanding the genetic composition and activities of communities of species so complex that they can only be sampled, never completely characterized.
Machine learning currently offers some of the most computationally efficient tools for building predictive models for classification of biological data. Biological applications cover the entire spectrum of machine learning problems, including supervised learning, unsupervised learning (or clustering), and model construction. Moreover, most biological data, metagenomic data included, are both unbalanced and heterogeneous, and thus exemplify the current challenges of machine learning in the era of Big Data.
The goal of this review is to examine the contribution of machine learning techniques to metagenomics, that is, to answer the question “to what extent does machine learning contribute to the study of microbial communities and environmental samples?” We will first briefly introduce the scientific fundamentals of machine learning. In the following sections we will illustrate how these techniques help answer questions in metagenomic data analysis. We will describe a number of methods and tools to this end, though we will not cover them exhaustively. Finally, we will speculate on possible future directions of this research.
Efficient genome-wide sequencing and low coverage pedigree analysis from non-invasively collected samples
Hard, soft and just right: variations in linked selection and recombination drive genomic divergence during speciation of aspens
Inferring the correlated fitness effects of nonsynonymous mutations at the same site using triallelic population genomics
New thoughts on an old riddle: what determines genetic diversity within and between species?
Shi Huang
The question of what determines genetic diversity both between and within species has long remained unsolved by modern evolutionary theory (MET). However, this has not deterred researchers from using MET to interpret genetic diversity. We here examine the two key experimental observations of genetic diversity made in the 1960s, one between species and the other within a population of a species, that directly contributed to the development of MET. The interpretations of these observations, as well as the assumptions of MET, are widely known to be inadequate. We review the recent progress of an alternative framework, the maximum genetic diversity (MGD) hypothesis, that uses axioms and natural selection to explain the vast majority of genetic diversity as being at an optimum equilibrium that is largely determined by organismal complexity. The MGD hypothesis fully absorbs the proven virtues of MET and considers its assumptions relevant only to a much more limited scope. This new synthesis has accounted for the much-overlooked phenomenon of progression towards higher complexity and, more importantly, has been instrumental in directing productive research into both evolutionary and biomedical problems.
An exact algorithm and efficient importance sampling for computing two-locus likelihoods under variable population size
John A. Kamm, Jeffrey P. Spence, Jeffrey Chan, Yun S. Song
Two-locus sampling probabilities have played a central role in devising an efficient composite likelihood method for estimating fine-scale recombination rates. Due to mathematical and computational challenges, these sampling probabilities are typically computed under the unrealistic assumption of a constant population size, and simulation studies have shown that resulting recombination rate estimates can be severely biased in certain cases of historical population size changes. To alleviate this problem, we develop here two distinct methods to compute the sampling probability for variable population size functions that are piecewise constant. The first is a novel formula that can be evaluated by numerically exponentiating a large but sparse matrix. The second method is importance sampling on genealogies, based on a characterization of the optimal proposal distribution that extends previous results to the variable-size setting. The resulting proposal distribution is highly efficient, with an average effective sample size (ESS) of nearly 98% per sample. Through a simulation study, we show that accounting for population size changes improves inference of recombination rates.
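To make the effective sample size figure concrete, here is a minimal sketch of one common ESS definition for importance sampling, the Kish formula (Σw)² / Σw². The abstract does not state which definition the authors use, so this formula, the function names, and the toy weights are all assumptions for illustration rather than the paper's implementation.

```python
def effective_sample_size(weights):
    """Kish effective sample size of non-negative importance weights."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


def ess_fraction(weights):
    """ESS divided by the number of samples, a value in (0, 1]."""
    return effective_sample_size(weights) / len(weights)


# Toy weights (hypothetical): equal weights use every sample fully,
# while a single dominant weight collapses the ESS to one sample.
uniform = ess_fraction([2.0, 2.0, 2.0, 2.0])
degenerate = ess_fraction([1.0, 0.0, 0.0, 0.0])
```

Under this definition, an ESS fraction close to 1 means the proposal distribution is close to the target, so almost every sampled genealogy contributes usefully to the likelihood estimate.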