ZRT1 harbors an excess of nonsynonymous polymorphism and shows evidence of balancing selection in Saccharomyces cerevisiae

Elizabeth K. Engle, Justin C. Fay
(Submitted on 1 Dec 2012)

Estimates of the fraction of nucleotide substitutions driven by positive selection vary widely across different species. Accounting for different estimates of positive selection has been difficult, in part because selection on polymorphism within a species is known to obscure a signal of positive selection between species. While methods have been developed to control for the confounding effects of negative selection against deleterious polymorphism, the impact of balancing selection on estimates of positive selection has not been assessed. In Saccharomyces cerevisiae, there is no signal of positive selection within protein coding sequences as the ratio of nonsynonymous to synonymous polymorphism is higher than that of divergence. To investigate the impact of balancing selection on estimates of positive selection we examined five genes with high rates of nonsynonymous polymorphism in S. cerevisiae relative to divergence from S. paradoxus. One of the genes, a high affinity zinc transporter ZRT1, shows an elevated rate of synonymous polymorphism indicative of balancing selection. The high rate of synonymous polymorphism coincides with nonsynonymous divergence between three haplotype groups, which we find to be functionally indistinguishable. We conclude that balancing selection is not likely to be a common cause of genes harboring a large excess of nonsynonymous polymorphism in yeast.
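The comparison of polymorphism and divergence ratios described in the abstract can be illustrated with a small sketch. It assumes the standard McDonald-Kreitman style summary statistics (the neutrality index and the derived estimate of the fraction of adaptive substitutions). The abstract does not say which statistics the authors computed, and the counts below are invented for illustration only.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index: (Pn/Ps) / (Dn/Ds).

    pn, ps: nonsynonymous and synonymous polymorphisms within a species.
    dn, ds: nonsynonymous and synonymous fixed differences between species.
    Values above 1 indicate an excess of nonsynonymous polymorphism
    relative to divergence.
    """
    return (pn / ps) / (dn / ds)


def alpha(pn, ps, dn, ds):
    """Estimated fraction of nonsynonymous substitutions driven by
    positive selection, 1 - NI. Negative values mean no signal of
    positive selection."""
    return 1.0 - neutrality_index(pn, ps, dn, ds)


# Hypothetical counts, not taken from the paper.
pn, ps, dn, ds = 20, 40, 30, 120
print(f"NI = {neutrality_index(pn, ps, dn, ds):.2f}")
print(f"alpha = {alpha(pn, ps, dn, ds):.2f}")
```

With these made-up counts the polymorphism ratio (0.5) exceeds the divergence ratio (0.25), so the neutrality index is above 1 and the estimate of alpha is negative, which is the pattern the abstract reports for S. cerevisiae coding sequences.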

Most viewed on Haldane’s Sieve: November 2012

The most viewed preprints on Haldane’s Sieve in November 2012 were:

The evolution of complex gene regulation by low specificity binding sites

Alexander J. Stewart, Joshua B. Plotkin
(Submitted on 30 Nov 2012)

Transcription factor binding sites vary in their specificity, both within and between species. Binding specificity has a strong impact on the evolution of gene expression, because it determines how easily regulatory interactions are gained and lost. Nevertheless, we have a relatively poor understanding of what evolutionary forces determine the specificity of binding sites. Here we address this question by studying regulatory modules composed of multiple binding sites. Using a population-genetic model, we show that more complex regulatory modules, composed of a greater number of binding sites, must employ binding sites that are individually less specific, compared to less complex regulatory modules. This effect is extremely general, and it holds regardless of the regulatory logic of a module. We attribute this phenomenon to the inability of stabilising selection to maintain highly specific sites in large regulatory modules. Our analysis helps to explain broad empirical trends in the yeast regulatory network: those genes with a greater number of transcriptional regulators feature less specific binding sites, and there is less variance in their specificity, compared to genes with fewer regulators. Likewise, our results also help to explain the well-known trend towards lower specificity in the transcription factor binding sites of higher eukaryotes, which perform complex regulatory tasks, compared to prokaryotes.

Our paper: Bacterial diversity associated with Drosophila in the laboratory and in the natural environment

For our next guest post, Fabian Staubach and Dmitri Petrov write about their paper (along with coauthors), Bacterial diversity associated with Drosophila in the laboratory and in the natural environment, arXived here.

Host-associated bacterial communities are ubiquitous, have a variety of effects on the host phenotype, and play a role in host adaptation to new environments. Some clear examples of such adaptations are known, but generally these are ancient associations between host and symbiont, such as the association between aphids and the obligate symbiotic bacterium Buchnera, which provides the aphid with essential amino acids, or the association between beewolves and Streptomyces, which protects beewolf larvae from fungal infections. We are investigating the potential of bacterial communities to underlie short-term adaptation, using the adaptation of D. melanogaster and D. simulans to different fruit as a study system.

As a first step, we profiled the diversity and composition of bacterial communities associated with Drosophila across multiple species, habitats, and substrates. We amplified and sequenced a region of the bacterial ribosomal DNA from whole-body fly samples using 454 technology. We focused on comparing the bacterial communities of the sibling species D. melanogaster and D. simulans in the lab and in an ecologically and evolutionarily relevant setting: their natural environment. In most cases we were able to study flies from these two species collected by aspiration from the same fruit. We also included nine different species spanning the Drosophila phylogeny to test whether phylogenetic distance and distance between bacterial communities are correlated.

We show that natural bacterial communities associated with Drosophila contain more bacterial taxa than previously thought. Comparison to a mammalian fecal data set reveals that, although mammal-associated bacterial communities are more diverse on average, the diversity of some mammalian fecal samples lies within the range of, or is even lower than, that of the Drosophila samples we analyzed. This finding is interesting because it has been a matter of debate whether organisms with an adaptive immune system can in general accommodate higher bacterial diversity. By comparing the bacterial communities of D. melanogaster and D. simulans collected directly from different natural food substrates, we demonstrate that bacterial communities differ primarily between substrates and only very weakly between fly species.
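To make the two kinds of comparison above concrete (diversity within a sample, and difference between samples), here is a minimal sketch. It assumes the Shannon index as the within-sample measure and Bray-Curtis dissimilarity as the between-sample measure. Both are common choices in community profiling, but the post does not say which measures were used in the paper, and the taxon counts are invented.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples.

    Both arguments are dicts mapping taxon name -> read count.
    0 means identical composition, 1 means no shared taxa.
    """
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 1.0 - 2.0 * shared / total


# Hypothetical read counts per genus for two fly samples (illustration only).
sample_1 = {"Acetobacter": 60, "Gluconobacter": 30, "Enterococcus": 10}
sample_2 = {"Acetobacter": 20, "Gluconobacter": 20, "Providencia": 60}

print(shannon_diversity(list(sample_1.values())))
print(bray_curtis(sample_1, sample_2))
```

Under these assumptions, a claim such as "communities differ primarily between substrates" would correspond to between-substrate dissimilarities being larger than between-species dissimilarities within a substrate.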

We find acetic acid bacteria of the genera Acetobacter and Gluconobacter to be associated with all wild-caught flies, constituting two thirds of all sequences. Acetic acid bacteria oxidize sugars and ethanol to acetic acid and are known to be directly involved in the development of a specific process of decay called ‘sour rot’ on grapes, which causes wine spoilage. There is previous evidence that Drosophila is vital for the dispersal of acetic acid bacteria among rotting fruit: grapes covered with nets in the field do acquire yeasts but no acetic acid bacteria, and acetic acid bacteria thrive on grapes only when flies are present. At the same time, Acetobacter has been shown to promote Drosophila larval growth and shorten development time under certain nutritional conditions. Therefore, we argue that the relationship between Acetobacteraceae and Drosophila is likely mutualistic.

Individual natural fly samples are dominated by bacteria known to be pathogenic in Drosophila, such as Enterococcus and Providencia. These bacteria are known to reach very high cell counts during systemic infections of Drosophila, and we believe that the inclusion of systemically infected flies in these samples is the most likely explanation for the observed pattern. The observation that it is in principle possible to identify candidate pathogens in natural populations using standard, high-throughput microbial community screening techniques opens up opportunities for large-scale epidemiological studies in nature and can help to identify candidate pathogenic bacterial species for further investigation in the laboratory.

In the laboratory, fly-associated bacterial communities are similar irrespective of phylogenetic distance between fly species, suggesting that host genetic factors either play a minor role in shaping the bacterial communities associated with Drosophila or, as suggested by the difference in bacterial communities between D. melanogaster and D. simulans in the wild, require natural conditions to manifest themselves. High variability of Drosophila bacterial communities within and between laboratories is a potential source of experimental noise when studying phenotypic variation. The impact of microbes on Drosophila phenotypes ranges from influencing growth to cold tolerance, and it is hard to imagine traits that are not subject in principle to alteration by microbes.

We hope that our data will serve as a solid foundation for future studies, especially for the growing community of scientists who are interested in the microbial communities associated with Drosophila.

Fabian Staubach and Dmitri Petrov

Identifying a species tree subject to random lateral gene transfer

Mike Steel, Simone Linz, Daniel H. Huson, Michael J. Sanderson
(Submitted on 30 Nov 2012)

A major problem for inferring species trees from gene trees is that evolutionary processes can sometimes favour gene tree topologies that conflict with an underlying species tree. In the case of incomplete lineage sorting, this phenomenon has recently been well studied, and some elegant solutions for species tree reconstruction have been proposed. One particularly simple and statistically consistent estimator of the species tree under incomplete lineage sorting is to combine three-taxon analyses, which are phylogenetically robust to incomplete lineage sorting. In this paper, we consider whether such an approach will also work under lateral gene transfer (LGT). By providing an exact analysis of some cases of this model, we show that there is a zone of inconsistency for triplet-based species tree reconstruction under LGT. However, a triplet-based approach will consistently reconstruct a species tree under models of LGT, provided that the expected number of LGT events is not too high. Our analysis involves a novel connection between the LGT problem and random walks on cyclic graphs. We have implemented a procedure for reconstructing trees subject to LGT or lineage sorting in settings where taxon coverage may be patchy and illustrate its use on two sample data sets.
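The three-taxon idea can be sketched in a few lines. For each set of three taxa, every gene tree that contains all three "votes" for the pair it places as sisters, and the most frequent pair is taken as the estimate for the species tree. This is only an illustrative sketch, not the authors' implementation: the nested-tuple tree encoding, the function names, and the toy gene trees are all assumptions, and the step of assembling the chosen triplets into a full tree is omitted.

```python
from collections import Counter


def leaf_set(tree):
    """Leaves of a rooted tree given as nested tuples with string leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def rooted_triplet(tree, taxa):
    """Return the two taxa (as a frozenset) that are sisters when `tree`
    is restricted to the three `taxa`. Returns None if the tree lacks one
    of the taxa (patchy coverage) or does not resolve the triplet."""
    taxa = frozenset(taxa)
    if not taxa <= leaf_set(tree):
        return None
    node = tree
    while True:
        hits = [leaf_set(child) & taxa for child in node]
        # Descend while a single child still contains all three taxa.
        full = [child for child, hit in zip(node, hits) if len(hit) == 3]
        if full:
            node = full[0]
            continue
        pairs = [hit for hit in hits if len(hit) == 2]
        return pairs[0] if pairs else None


def majority_triplet(gene_trees, taxa):
    """Most frequent sister pair for `taxa` across gene trees.
    Ties are broken by first occurrence; None if no tree is informative."""
    votes = Counter()
    for tree in gene_trees:
        pair = rooted_triplet(tree, taxa)
        if pair is not None:
            votes[pair] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy gene trees (hypothetical): two agree, one conflicts, one lacks taxon C.
gene_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    (("A", "B"), "D"),
]
print(sorted(majority_triplet(gene_trees, ("A", "B", "C"))))
```

Trees missing one of the three taxa are simply skipped, which is one simple way to tolerate patchy taxon coverage. As the abstract notes, majority voting of this kind can be misled when lateral transfers are frequent enough.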