Homomorphic ZW Chromosomes in a Wild Strawberry Show Distinctive Recombination Heterogeneity but a Small Sex-Determining Region

Posted on October 23, 2015 by schraib

Jacob Tennessen, Rajanikanth Govindarajulu, Aaron Liston, Tia-Lynn Ashman

bioRxiv doi: http://dx.doi.org/10.1101/029611

Sex chromosomes play a prominent role in development and evolution and have several characteristic features that distinguish them from autosomes. Across diverse taxa, recombination is typically suppressed at the sex-determining region (SDR) and proportionally elevated in the remainder of the chromosome or pseudoautosomal region (PAR). However, in most model taxa the sex chromosomes are ancient and highly differentiated from autosomes, and thus little is known about recombination dynamics of homomorphic sex chromosomes with incipient sex-determining mechanisms. Here we examine male function (pollen production) and female function (fruit production) in crosses of the dioecious octoploid strawberry Fragaria chiloensis in order to map the small and recently evolved SDR controlling both traits and to examine recombination patterns on the young ZW chromosome. The SDR occurs in a narrow 280kb window, in which the maternal recombination rate is lower than in the orthologous paternal region and the genome-wide average rate, but within the range of autosomal rate variation. In contrast to the SDR, the ZW recombination rate in the PAR is much higher than the rates of the ZZ or autosomal linkage groups, substantially overcompensating for the SDR rate. By extensively sequencing sections of the SDR vicinity in several crosses and unrelated plants, we show that W-specific divergence is elevated within a portion of the SDR and find only a single SNP to be in high linkage disequilibrium with sex, suggesting that any W-specific haplotype protected from recombination is not large. We hypothesize that selection for recombination suppression within the small SDR may be weak, but that fluctuating sex ratios could favor elevated recombination in the PAR to remove deleterious mutation on the W. Thus these results illuminate the recombination dynamics of a nascent sex chromosome with a modestly diverged SDR, which could be typical of other dioecious plants.

Inferring chimpanzee Y chromosome history and amplicon diversity from whole genome sequencing

Posted on October 23, 2015 by schraib

Matthew Oetjens, Feichen Shen, Zhengting Zou, Jeffrey Kidd

bioRxiv doi: http://dx.doi.org/10.1101/029702

Due to the lack of recombination, the male-specific region of the Y chromosome (MSY) is a unique resource for tracking the genetic history of populations. The MSY is also enriched for large, nearly identical repetitive regions known as amplicons, which harbor many of the genes essential for spermatogenesis. In humans, sequence diversity on the unique segment of the MSY is greatly reduced compared to the autosomes, an observation consistent with the action of strong selection. Here, we analyze 9 chimpanzee (representing three subspecies: Pan troglodytes schweinfurthii, Pan troglodytes ellioti, and Pan troglodytes verus) and two Pan paniscus male whole-genome sequences to assess Y chromosome nucleotide and ampliconic copy-number diversity across the Pan genus. In total, we identified 23,946 Pan spp. SNVs across 4.2 million callable sites. Comparisons with autosomal, X chromosome, and mitochondrial sequences from the same samples indicate that nucleotide diversity on the chimpanzee MSY is reduced relative to neutral expectations with an equal sex ratio. Additionally, the estimated common chimpanzee Y chromosome TMRCA (0.44 mya [0.31-0.56]) is half the age of the mitochondria TMRCA (0.97 mya [0.65-1.35]), indicating an unequal sex ratio or Y chromosome selection in the common chimpanzee ancestral population. We observe that the copy-number of Y chromosome amplicons is variable amongst chimpanzees and bonobos, and identify several lineage-specific patterns, including variable copy-number of the testis-expressed genes RBMY and DAZ. We detect recurrent switchpoints of copy-number change along the ampliconic tracts across chimpanzee populations, which may be the result of localized genome instability or selective forces.

Flowr: Robust and efficient pipelines using a simple language-agnostic approach

Posted on October 23, 2015 by schraib

Sahil Seth, Samir Amin, Xingzhi Song, Xizeng Mao, Huandong Sun, Andrew Futreal, Jianhua Zhang

bioRxiv doi: http://dx.doi.org/10.1101/029710

Motivation: Bioinformatics analyses have become increasingly intensive computing processes, with lowering costs and increasing numbers of samples. Each laboratory spends time creating and maintaining a set of pipelines, which may not be robust, scalable, or efficient. Further, the existence of different computing environments across institutions hinders both collaboration and the portability of analysis pipelines. Results: Flowr is a robust and scalable framework for designing and deploying computing pipelines in an easy-to-use fashion. It implements a scatter-gather approach using computing clusters, simplifying the concept to the use of five simple terms (in submission and dependency types). Most importantly, it is flexible, such that customizing existing pipelines is easy, and since it works across several computing environments (LSF, SGE, Torque, and SLURM), it is portable. Availability: http://docs.flowr.space
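
For readers unfamiliar with the term, the scatter-gather pattern mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Flowr's API or implementation: Flowr dispatches jobs to a cluster scheduler, whereas this toy uses local threads, and the names `scatter`, `count_gc`, and `scatter_gather` are hypothetical helpers invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(items, n_chunks):
    """Split items into up to n_chunks order-preserving, roughly equal chunks."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


def count_gc(reads):
    """Toy per-chunk job (an assumption): count G and C bases in a list of reads."""
    return sum(read.count("G") + read.count("C") for read in reads)


def scatter_gather(items, worker, n_chunks=4):
    """Run worker on each chunk in parallel, then gather by summing the results."""
    chunks = scatter(items, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(worker, chunks))
    return sum(partials)


reads = ["ACGT", "GGCC", "ATAT", "CGCG", "TTTT"]
total_gc = scatter_gather(reads, count_gc, n_chunks=3)
```

In a real pipeline, the gather step would typically wait on a job dependency rather than on an in-process thread pool.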

Machine learning for metagenomics: methods and tools

Posted on October 23, 2015 by schraib

Hayssam Soueidan, Macha Nikolski

While genomics is the research field concerned with the study of the genome of any organism, metagenomics is the term for research that focuses on many genomes at the same time, as is typical in some areas of environmental study. Metagenomics recognizes the need to develop computational methods that enable understanding the genetic composition and activities of communities of species so complex that they can only be sampled, never completely characterized.
Machine learning currently offers some of the most computationally efficient tools for building predictive models for classification of biological data. Various biological applications cover the entire spectrum of machine learning problems including supervised learning, unsupervised learning (or clustering), and model construction. Moreover, most biological data, and this is the case for metagenomics, are both unbalanced and heterogeneous, thus meeting the current challenges of machine learning in the era of Big Data.
The goal of this review is to examine the contribution of machine learning techniques to metagenomics, that is, to answer the question “to what extent does machine learning contribute to the study of microbial communities and environmental samples?” We will first briefly introduce the scientific fundamentals of machine learning. In the following sections we will illustrate how these techniques are helpful in answering questions of metagenomic data analysis. We will describe a number of methods and tools to this end, though we will not cover them exhaustively. Finally, we will speculate on the possible future directions of this research.

Efficient genome-wide sequencing and low coverage pedigree analysis from non-invasively collected samples

Posted on October 22, 2015 by schraib

Noah Snyder-Mackler, William H Majoros, Michael L Yuan, Amanda O Shaver, Jacob B Gordon, Gisela H Kopp, Stephen A Schlebusch, Jeffrey D Wall, Susan C Alberts, Sayan Mukherjee, Xiang Zhou, Jenny Tung

bioRxiv doi: http://dx.doi.org/10.1101/029520

Research on the genetics of natural populations was revolutionized in the 1990s by methods for genotyping non-invasively collected samples. However, these methods have remained largely unchanged for the past 20 years and lag far behind the genomics era. To close this gap, here we report an optimized laboratory protocol for genome-wide capture of endogenous DNA from non-invasively collected samples, coupled with a novel computational approach to reconstruct pedigree links from the resulting low-coverage data. We validated both methods using fecal samples from 62 wild baboons, including 48 from an independently constructed extended pedigree. We enriched fecal-derived DNA samples up to 40-fold for endogenous baboon DNA, and reconstructed near-perfect pedigree relationships even with extremely low-coverage sequencing. We anticipate that these methods will be broadly applicable to the many research systems for which only non-invasive samples are available. The lab protocol and software ("WHODAD") are freely available at http://www.tung-lab.org/protocols and http://www.xzlab.org/software, respectively.

Hard, soft and just right: variations in linked selection and recombination drive genomic divergence during speciation of aspens

Posted on October 22, 2015 by schraib

Jing Wang, Nathaniel R Street, Douglas G Scofield, Pär Ingvarsson

bioRxiv doi: http://dx.doi.org/10.1101/029561

Despite the global economic and ecological importance of forest trees, the genomic basis of differential adaptation and speciation in tree species is still poorly understood. Populus tremula and P. tremuloides are two of the most widespread tree species in the Northern Hemisphere. Using whole-genome re-sequencing data from 24 P. tremula and 22 P. tremuloides individuals, we find that the two species diverged ~2.2-3.1 million years ago. The approximately allopatric speciation of the two species was likely the result of the severing of the Bering land bridge combined with the onset of dramatic climatic oscillations throughout the Pleistocene. We detected moderate but considerably heterogeneous genomic differentiation between species. Rather than being physically clustered into just a few large, discrete genomic “islands” as may be expected when species diverge in the presence of gene flow, we found that the regions of differentiation were particularly steep, narrowly defined and located in regions with substantially suppressed recombination. It appears that species-specific adaptation, mainly involving standing genetic variation via soft selective sweeps, was likely the predominant proximate cause in generating the differentiation islands between species and not local differences in permeability of gene flow. In addition, we identified multiple signatures of long-term balancing selection predating speciation in regions containing immunity and defense-related genes in both species.

Inferring the correlated fitness effects of nonsynonymous mutations at the same site using triallelic population genomics

Posted on October 22, 2015 by schraib

Aaron P Ragsdale, Alec J Coffman, PingHsun Hsieh, Travis J Struck, Ryan N Gutenkunst

bioRxiv doi: http://dx.doi.org/10.1101/029546

The distribution of mutation fitness effects is central to evolutionary genetics. Typical univariate distributions, however, cannot model the effects of multiple mutations at the same site, so we introduce a model in which mutations at the same site have correlated fitness effects. To infer the strength of that correlation, we developed a diffusion approximation to the triallelic frequency spectrum, which we applied to data from D. melanogaster. We found a moderate correlation between the fitness effects of nonsynonymous mutations at the same codon, suggesting that both mutation identity and location are important for determining fitness effects in proteins. We validated our approach by comparing with biochemical mutational scanning experiments, finding strong quantitative agreement, even between different organisms. We also found that the correlation of mutation fitness effects was not affected by protein solvent exposure or structural disorder. Together, our results suggest that the correlation of fitness effects at the same site is a previously overlooked yet fundamental property of protein evolution.

New thoughts on an old riddle: what determines genetic diversity within and between species?

Posted on October 21, 2015 by schraib

Shi Huang

The question of what determines genetic diversity both between and within species has long remained unsolved by the modern evolutionary theory (MET). However, it has not deterred researchers from producing interpretations of genetic diversity by using MET. We here examine the two key experimental observations of genetic diversity made in the 1960s, one between species and the other within a population of a species, that directly contributed to the development of MET. The interpretations of these observations as well as the assumptions by MET are widely known to be inadequate. We review the recent progress of an alternative framework, the maximum genetic diversity (MGD) hypothesis, that uses axioms and natural selection to explain the vast majority of genetic diversity as being at optimum equilibrium that is largely determined by organismal complexity. The MGD hypothesis fully absorbs the proven virtues of MET and considers its assumptions relevant only to a much more limited scope. This new synthesis has accounted for the much overlooked phenomenon of progression towards higher complexity, and more importantly, been instrumental in directing productive research into both evolutionary and biomedical problems.

An exact algorithm and efficient importance sampling for computing two-locus likelihoods under variable population size

Posted on October 21, 2015 by schraib

John A. Kamm, Jeffrey P. Spence, Jeffrey Chan, Yun S. Song

Two-locus sampling probabilities have played a central role in devising an efficient composite likelihood method for estimating fine-scale recombination rates. Due to mathematical and computational challenges, these sampling probabilities are typically computed under the unrealistic assumption of a constant population size, and simulation studies have shown that resulting recombination rate estimates can be severely biased in certain cases of historical population size changes. To alleviate this problem, we develop here two distinct methods to compute the sampling probability for variable population size functions that are piecewise constant. The first is a novel formula that can be evaluated by numerically exponentiating a large but sparse matrix. The second method is importance sampling on genealogies, based on a characterization of the optimal proposal distribution that extends previous results to the variable-size setting. The resulting proposal distribution is highly efficient, with an average effective sample size (ESS) of nearly 98% per sample. Through a simulation study, we show that accounting for population size changes improves inference of recombination rates.
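
As a point of reference for the ESS figure quoted above, the sketch below computes the standard effective sample size of a set of importance weights, ESS = (Σw)² / Σw², and expresses it as a fraction of the number of samples. It assumes that "98% per sample" refers to this ESS divided by the number of samples; the function names are hypothetical and this is not the authors' code.

```python
def effective_sample_size(weights):
    """Standard (Kish) effective sample size of importance weights."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


def ess_fraction(weights):
    """ESS divided by the number of samples (1.0 means a perfect proposal)."""
    return effective_sample_size(weights) / len(weights)


# Equal weights: every sample counts fully.
uniform = ess_fraction([1.0, 1.0, 1.0, 1.0])

# One dominant weight: the sample behaves like a single draw.
degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])
```

A proposal distribution close to the optimal one yields nearly equal weights, which is why a high ESS fraction indicates an efficient sampler.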

Modes of migration and multilevel selection in evolutionary multiplayer games

Posted on October 20, 2015 by schraib

Yuriy Pichugin, Chaitanya S. Gokhale, Julián Garcia, Arne Traulsen, Paul B. Rainey

bioRxiv doi: http://dx.doi.org/10.1101/029470

The evolution of cooperation in group-structured populations has received much attention, but little is known about the effects of different modes of migration of individuals between groups. Here, we have incorporated four different modes of migration that differ in the degree of coordination among the individuals. For each mode of migration, we identify the set of multiplayer games in which the cooperative strategy has higher fixation probability than defection. The comparison shows that the set of games under which cooperation may evolve generally expands depending upon the degree of coordination among the migrating individuals. Weak altruism can evolve under all modes of individual migration, provided that the benefit to cost ratio is high enough. Strong altruism, however, evolves only if the mode of migration involves coordination of individual actions. Depending upon the migration frequency and degree of coordination among individuals, conditions that allow selection to work at the level of groups can be established.

Haldane's Sieve

Discussing preprints in population and evolutionary genetics

Author Archives: schraib
