The limits of selection under plant domestication

Robin G. Allaby, Dorian Q. Fuller, James L. Kitchen
Subjects: Populations and Evolution (q-bio.PE)

Plant domestication involved a process of selection through human agency of a series of traits collectively termed the domestication syndrome. Current debate concerns the pace at which domesticated plants emerged from cultivated wild populations and how many genes were involved. Here we present simulations that test how many genes could have been involved by considering the cost of selection. We demonstrate that the selection load that can be endured by populations increases with decreasing selection coefficients and greater numbers of loci, down to values of about s = 0.005, causing a driving force that increases the number of loci under selection. As the number of loci under selection increases, an effect of co-selection increases, resulting in individual unlinked loci being fixed more rapidly in out-crossing populations; this represents a second driving force to increase the number of loci under selection. In inbreeding systems, co-selection results in interference and reduced rates of fixation but does not reduce the size of the selection load that can be endured. These driving forces result in an optimum pace of genome evolution in which 50-100 loci are the most that could be under selection in a cultivation regime. Furthermore, the simulations do not preclude the existence of selective sweeps but demonstrate that they come at a cost to the selection load that can be endured, and consequently reduce the capacity of plants to adapt to new environments. This may help explain why selective sweeps have been so rarely detected in genome studies.
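To make the idea of a "cost of selection" concrete, here is a minimal sketch of the textbook single-locus, haploid version of that cost (in the spirit of Haldane's classic argument). It is not the authors' simulation: the function name, the haploid fitness scheme (1 versus 1 - s), and the stopping frequency `p_end` are all assumptions made for illustration.

```python
import math


def cost_of_substitution(s, p0, p_end=0.999):
    """Accumulate selective deaths while a favoured haploid allele rises
    from frequency p0 to p_end (hypothetical, illustrative model).

    Favoured allele has fitness 1, the alternative has fitness 1 - s.
    Returns (total cost, number of generations).
    """
    p = p0
    cost = 0.0
    generations = 0
    while p < p_end:
        q = 1.0 - p
        cost += s * q            # fraction of the population removed this generation
        p = p / (1.0 - s * q)    # allele frequency after selection
        generations += 1
    return cost, generations


if __name__ == "__main__":
    p0 = 0.01
    for s in (0.1, 0.01, 0.005):
        cost, gens = cost_of_substitution(s, p0)
        # For weak selection the total cost approaches ln(1/p0),
        # while the number of generations grows roughly as 1/s.
        print(f"s={s}: cost={cost:.3f} (ln(1/p0)={math.log(1 / p0):.3f}), generations={gens}")
```

In this toy model the total cost stays close to ln(1/p0) regardless of s, while weaker selection spreads that cost over more generations. That is one intuition for why weaker selection lets a population bear a larger total load, though the multi-locus simulations in the paper are considerably richer.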

Mycobiome of the Bat White Nose Syndrome (WNS) Affected Caves and Mines reveals High Diversity of Fungi and Local Adaptation by the Fungal Pathogen Pseudogymnoascus (Geomyces) destructans

Tao Zhang, Tanya R. Victor, Sunanda S. Rajkumar, Xiaojiang Li, Joseph C. Okoniewski, Alan C. Hicks, April D. Davis, Kelly Broussard, Shannon L. LaDeau, Sudha Chaturvedi, Vishnu Chaturvedi
(Submitted on 3 Mar 2014)

Investigations of bat White Nose Syndrome (WNS) have yet to explain how the causative fungus Pseudogymnoascus (Geomyces) destructans (Pd) first appeared in the Northeast and how a single clone has spread rapidly in the US and Canada. We aimed to catalogue Pd and all other fungi (the mycobiome) by culture-dependent (CD) and culture-independent (CI) methods in four mines and two caves from the epicenter of the WNS epizootic. Six hundred sixty-five fungal isolates were obtained by the CD method, including the live recovery of Pd. Seven hundred three nucleotide sequences that met the definition of operational taxonomic units (OTUs) were recovered by CI methods. Most OTUs belonged to unidentified clones deposited in the databases as environmental nucleic acid sequences (ENAS). The core mycobiome of WNS-affected sites comprised 46 species of fungi from 31 genera recovered in culture, and 17 fungal genera and 31 ENAS identified from clone libraries. Fungi such as Arthroderma spp., Geomyces spp., Kernia spp., Mortierella spp., Penicillium spp., and Verticillium spp. were predominant in culture, while Ganoderma spp., Geomyces spp., Mortierella spp., Penicillium spp. and Trichosporon spp. were abundant in clone libraries. Alpha diversity analyses of the CI data revealed that the fungal community was highly diverse. However, the true species diversity remains undetermined due to undersampling. The frequent recovery of Pd indicated that the pathogen has adapted to WNS-afflicted habitats. Further, this study supports the hypothesis that Pd is an introduced species. These findings underscore the need for integrated WNS control measures that target both bats and the fungal pathogen.

Decoding coalescent hidden Markov models in linear time

Kelley Harris, Sara Sheehan, John A. Kamm, Yun S. Song
(Submitted on 4 Mar 2014)

In many areas of computational biology, hidden Markov models (HMMs) have been used to model local genomic features. In particular, coalescent HMMs have been used to infer ancient population sizes, migration rates, divergence times, and other parameters such as mutation and recombination rates. As more loci, sequences, and hidden states are added to the model, however, the runtime of coalescent HMMs can quickly become prohibitive. Here we present a new algorithm for reducing the runtime of coalescent HMMs from quadratic in the number of hidden time states to linear, without making any additional approximations. Our algorithm can be incorporated into various coalescent HMMs, including the popular method PSMC for inferring variable effective population sizes. Here we implement this algorithm to speed up our demographic inference method diCal, which is equivalent to PSMC when applied to a sample of two haplotypes. We demonstrate that the linear-time method can reconstruct a population size change history more accurately than the quadratic-time method, given similar computation resources. We also apply the method to data from the 1000 Genomes project, inferring a high-resolution history of size changes in the European population.
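To show where the quadratic cost in the number of hidden states comes from, here is a minimal sketch of the standard forward algorithm for a generic discrete HMM. It is the ordinary quadratic-time recursion, not the linear-time algorithm of the paper, and the function name and the toy two-state parameters are assumptions chosen only for illustration.

```python
def forward_likelihood(init, trans, emit, obs):
    """Standard HMM forward algorithm (generic, not coalescent-specific).

    init[i]     : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits symbol o
    obs         : list of observed symbol indices
    Returns the likelihood of the observation sequence.
    """
    n_states = len(init)
    # Initialisation: start in each state and emit the first symbol.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    for symbol in obs[1:]:
        # Each of the n_states new entries sums over n_states old entries:
        # this nested loop is the quadratic cost per locus.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][symbol]
            for j in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    # Hypothetical two-state example.
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    emit = [[0.8, 0.2], [0.3, 0.7]]
    print(forward_likelihood(init, trans, emit, [0, 1]))
```

With d hidden states, each step of this recursion does on the order of d^2 multiplications. A linear-time method has to exploit structure in the transition matrix to avoid the inner sum, which is the kind of saving the abstract describes.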

Conditions for the validity of SNP-based heritability estimation

James J Lee, Carson C Chow

The heritability of a trait ($h^2$) is the proportion of its population variance caused by genetic differences, and estimates of this parameter are important for interpreting the results of genome-wide association studies (GWAS). In recent years, researchers have adopted a novel method for estimating a lower bound on heritability directly from GWAS data that uses realized genetic similarities between nominally unrelated individuals. The quantity estimated by this method is purported to be the contribution to heritability that could in principle be recovered from association studies employing the given panel of SNPs ($h^2_\textrm{SNP}$). Thus far the validity of this approach has mostly been tested empirically. Here, we provide a mathematical explication and show that the method should remain a robust means of obtaining $h^2_\textrm{SNP}$ under circumstances wider than those under which it has so far been derived.
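In symbols, the two quantities in this abstract can be written as follows. The variance symbols $\sigma_G^2$ (genetic variance) and $\sigma_P^2$ (total phenotypic variance) are notation assumed here, not taken from the abstract; the inequality restates the abstract's description of the SNP-based quantity as a lower bound.

```latex
h^2 = \frac{\sigma_G^2}{\sigma_P^2},
\qquad
h^2_\textrm{SNP} \le h^2
```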

Most viewed on Haldane’s Sieve: February 2014

The most viewed posts on Haldane’s Sieve last month were:

Local description of phylogenetic group-based models

Marta Casanellas, Jesús Fernández-Sánchez, Mateusz Michałek
(Submitted on 27 Feb 2014)

Motivated by phylogenetics, our aim is to obtain a system of equations that define a phylogenetic variety on an open set containing the biologically meaningful points. In this paper we consider phylogenetic varieties defined via group-based models. For any finite abelian group G, we provide an explicit construction of codim X phylogenetic invariants (polynomial equations) of degree at most |G| that define the variety X on a Zariski open set U. The set U contains all biologically meaningful points when G is the group of the Kimura 3-parameter model. In particular, our main result confirms a conjecture by the third author and, on the set U, a couple of conjectures by Bernd Sturmfels and Seth Sullivant.

DNA methylation modulates transcription factor occupancy chiefly at sites of high intrinsic cell-type variability

Matthew Maurano, Hao Wang, Sam John, Anthony Shafer, Theresa Canfield, Kristen Lee, John A Stamatoyannopoulos

The nuclear genome of every cell harbors millions of unoccupied transcription factor (TF) recognition sequences that harbor methylated cytosines. Although DNA methylation is commonly invoked as a repressive mechanism, the extent to which it actively silences specific TF occupancy sites is unknown. To define the role of DNA methylation in modulating TF binding, we quantified the effect of DNA methyltransferase abrogation on the occupancy patterns of a ubiquitous TF capable of autonomous binding to its target sites in chromatin (CTCF). Here we show that the vast majority of unoccupied, methylated CTCF recognition sequences remain unbound upon depletion of DNA methylation. Rather, methylation-regulated binding is restricted to a small fraction of elements that exhibit high intrinsic variability in CTCF occupancy across cell types. Our results suggest that DNA methylation is not a major groundskeeper of genomic transcription factor occupancy landscapes, but rather a specialized mechanism for stabilizing epigenetically labile sites.

A tug-of-war between driver and passenger mutations in cancer and other adaptive processes

Christopher McFarland, Leonid Mirny, Kirill S. Korolev
(Submitted on 25 Feb 2014)

Cancer progression is an example of a rapid adaptive process where evolving new traits is essential for survival and requires a high mutation rate. Precancerous cells acquire a few key mutations that drive rapid population growth and carcinogenesis. Cancer genomics demonstrates that these few ‘driver’ mutations occur alongside thousands of random ‘passenger’ mutations, a natural consequence of cancer’s elevated mutation rate. Some passengers can be deleterious to cancer cells, yet have been largely ignored in cancer research. In population genetics, however, the accumulation of mildly deleterious mutations has been shown to cause population meltdown. Here we develop a stochastic population model where beneficial drivers engage in a tug-of-war with frequent mildly deleterious passengers. These passengers present a barrier to cancer progression that is described by a critical population size, below which most lesions fail to progress, and a critical mutation rate, above which cancers melt down. We find support for the model in cancer age-incidence and cancer genomics data that also allow us to estimate the fitness advantage of drivers and fitness costs of passengers. We identify two regimes of adaptive evolutionary dynamics and use these regimes to rationalize successes and failures of different treatment strategies. We find that a tumor’s load of deleterious passengers can explain previously paradoxical treatment outcomes and suggest that it could potentially serve as a biomarker of response to mutagenic therapies. The collective deleterious effect of passengers is currently an unexploited therapeutic target. We discuss how their effects might be exacerbated by both current and future therapies.
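The tug-of-war can be illustrated with a toy fitness function in which each driver multiplies fitness by (1 + s_d) and each passenger divides it by (1 + s_p). This multiplicative form, the function names, and the effect sizes used below are assumptions for illustration only; the abstract does not specify the model's functional form or parameter values.

```python
import math


def relative_fitness(n_drivers, n_passengers, s_driver, s_passenger):
    """Toy multiplicative fitness (assumed form, not taken from the abstract):
    drivers raise fitness, passengers lower it."""
    return (1.0 + s_driver) ** n_drivers / (1.0 + s_passenger) ** n_passengers


def passengers_per_driver(s_driver, s_passenger):
    """Number of passengers whose combined cost exactly offsets one driver
    under the multiplicative form above."""
    return math.log(1.0 + s_driver) / math.log(1.0 + s_passenger)


if __name__ == "__main__":
    # Hypothetical effect sizes: one strong driver versus many weak passengers.
    s_d, s_p = 0.1, 0.001
    print(passengers_per_driver(s_d, s_p))
    print(relative_fitness(1, 95, s_d, s_p), relative_fitness(1, 96, s_d, s_p))
```

Under these made-up values, a little under a hundred passengers cancel the benefit of a single driver, which gives a feel for how individually weak but numerous passengers could hold back progression.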

Implications of uniformly distributed, empirically informed priors for phylogeographical model selection: A reply to Hickerson et al

Jamie R. Oaks, Charles W. Linkem, Jeet Sukumaran
(Submitted on 26 Feb 2014)

Biogeographers often seek to explain speciation by geographical phenomena. Establishing that a set of population splitting events occurred at the same time can be a persuasive argument that a set of taxa were affected by the same geographic events. Huang et al. (2011) introduced an approximate Bayesian approach (implemented in the software msBayes) to estimate the probabilities of models in which multiple sets of taxa diverge simultaneously. Oaks et al. (2013) used this model-choice framework to study 22 pairs of vertebrates distributed across the Philippines; they also studied the behavior of the approach using simulations. Oaks et al. (2013) found the model was very sensitive to the prior and had low power to detect variation in divergence times. This was not surprising in light of a rich statistical literature showing that the marginal likelihood of a model is sensitive to vague priors. Because this sensitivity to prior assumptions affects the crucial insights a researcher who employs msBayes seeks to gain, Oaks et al. (2013) recommended that users of the approach carefully assess the robustness of their conclusions to different priors. According to Hickerson et al. (2014), the lack of robustness was due to broad priors leading to inadequate numbers of simulations. They proposed a model-averaging approach using narrow, empirically informed uniform priors. Here, we demonstrate that their approach is dangerous in the sense that the empirically derived priors often exclude the true values of the parameters. We question the value of adopting an empirical-Bayesian stance for this problem, because it can yield misleading model posterior probabilities. The robust approach of conducting analyses under a variety of priors can reveal sensitivity and communicate the assumptions underlying inference. Furthermore, simulations provide insight into the temporal resolution of the method and guide interpretation of results.

Approaching allelic probabilities and Genome-Wide Association Studies from beta distributions

José Santiago García-Cremades, Angel del Río, José A. García, Javier Gayán, Antonio González-Pérez, Agustín Ruiz, O. Sotolongo-Grau, Manuel Ruiz-Marín
(Submitted on 25 Feb 2014)

In this paper we propose a model for the distribution of allelic probabilities for generating populations as reliably as possible. Our objective was to develop a model that allows allelic probabilities to be simulated with different observed truncation and degrees of noise. In addition, we introduce a completely new approach to analyzing a genome-wide association study (GWAS) dataset, starting from a new test of association with a statistical distribution and two effect sizes for each genotype. The new methodological approach was applied to a real data set together with a Monte Carlo experiment that showed the power performance of our new method. Finally, we compared the new method, based on the beta distribution, with the conventional method (based on the Chi-squared distribution) using the Kappa index of agreement and a principal component analysis (PCA). Both analyses found differences between the two approaches in the single nucleotide polymorphisms (SNPs) selected as associated.
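As a rough illustration of simulating allelic probabilities from a beta distribution with truncation, here is a minimal sketch using rejection sampling. It is not the authors' model: the function name, the shape parameters, and the truncation bounds are hypothetical, and the noise component mentioned in the abstract is not modelled.

```python
import random


def sample_truncated_beta(a, b, lo, hi, n, seed=0):
    """Draw n allele frequencies from Beta(a, b), keeping only values in
    [lo, hi] (simple rejection sampling; all parameters are illustrative)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        p = rng.betavariate(a, b)
        if lo <= p <= hi:
            samples.append(p)
    return samples


if __name__ == "__main__":
    # Hypothetical U-shaped frequency spectrum, truncated so that very rare
    # and very common alleles are excluded.
    freqs = sample_truncated_beta(0.5, 0.5, 0.05, 0.95, 10)
    print([round(p, 3) for p in freqs])
```

Rejection sampling is the simplest way to impose the truncation. It becomes slow if the interval captures little of the distribution's mass, in which case sampling through the inverse CDF would be the usual alternative.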