Compensatory evolution and the origins of innovations.

arXiv:1212.2658v1 [q-bio.PE]
by Etienne Rajon, Joanna Masel

Cryptic genetic sequences have attenuated effects on phenotypes. In the classic view, relaxed selection allows cryptic genetic diversity to build up across individuals in a population, providing alleles that may later contribute to adaptation when co-opted – e.g. following a mutation increasing expression from a low, attenuated baseline. This view is described, for example, by the metaphor of the spread of a population across a neutral network in genotype space. As an alternative view, consider the fact that most phenotypic traits are affected by multiple sequences, including cryptic ones. Even in a strictly clonal population, the co-option of cryptic sequences at different loci may have different phenotypic effects and offer the population multiple adaptive possibilities. Here, we model the evolution of quantitative phenotypic characters encoded by cryptic sequences, and compare the relative contributions of genetic diversity and of variation across sites to the phenotypic potential of a population. We show that most of the phenotypic variation accessible through co-option would exist even in populations with no polymorphism. This is made possible by a history of compensatory evolution, whereby the phenotypic effect of a cryptic mutation at one site was balanced by mutations elsewhere in the genome, leading to a diversity of cryptic effect sizes across sites rather than across individuals. Cryptic sequences might accelerate adaptation and facilitate large phenotypic changes even in the absence of genetic diversity, as traditionally defined in terms of alternative alleles.

Detection of selective sweeps in cattle using genome-wide SNP data

Holly R. Ramey, Jared E. Decker, Stephanie D. McKay, Megan M. Rolf, Robert D. Schnabel, Jeremy F. Taylor
(Submitted on 11 Dec 2012)

The domestication and subsequent selection by humans to create breeds of cattle undoubtedly altered the patterning of variation within their genomes. Strong selection to fix advantageous large-effect mutations underlying domesticability, breed characteristics or productivity created selective sweeps in which variation was lost in the chromosomal region flanking the selected allele. Selective sweeps have been identified in the genomes of many species including humans, dogs, horses, and chickens. We attempt to identify regions of the bovine genome that have been subjected to selective sweeps. Two datasets were used for the discovery and validation of selective sweeps via the fixation of alleles at a series of contiguous SNP loci. BovineSNP50 data were used to identify 28 putative sweep regions among 14 cattle breeds. Affymetrix BOS 1 prescreening assay data for five breeds were used to identify 114 regions and validate 5 regions identified using the BovineSNP50 data. Many genes are located within these regions; however, phenotypes that we predict to have historically been under strong selection include horned-polled, coat color, stature, ear morphology, and behavior. The identified selective sweeps represent recent events associated with breed formation rather than ancient events associated with domestication. No sweep regions were shared between indicine and taurine breeds reflecting their divergent selection histories. A primary finding of this study is the sensitivity of results to assay resolution. Despite the bias towards common SNPs in the BovineSNP50 design, false positive sweep regions appear to be common due to the limited resolution of the assay. This assay design bias leads to the detection of breed-specific sweep regions, or regions shared by a small number of breeds, restricting the suite of selected phenotypes detected to primarily those associated with breed characteristics.
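The abstract describes finding sweeps "via the fixation of alleles at a series of contiguous SNP loci". A minimal sketch of that idea follows. It is not the authors' pipeline: the function name `fixed_runs`, the minimum run length `min_snps`, and the frequency cutoff `maf_threshold` are all hypothetical choices made for illustration, and the input is assumed to be per-SNP minor allele frequencies for one breed, ordered along a chromosome.

```python
from typing import List, Tuple


def fixed_runs(minor_allele_freqs: List[float],
               positions: List[int],
               min_snps: int = 5,
               maf_threshold: float = 0.0) -> List[Tuple[int, int, int]]:
    """Return (start_pos, end_pos, n_snps) for every run of contiguous SNPs
    whose minor allele frequency is <= maf_threshold (i.e. fixed or nearly so).

    min_snps and maf_threshold are illustrative assumptions, not values
    taken from the paper.
    """
    runs = []
    run_start = None  # index of the first SNP in the current run, if any
    n = len(minor_allele_freqs)
    for i, maf in enumerate(minor_allele_freqs):
        if maf <= maf_threshold:
            if run_start is None:
                run_start = i
        else:
            # A polymorphic SNP ends the current run.
            if run_start is not None and i - run_start >= min_snps:
                runs.append((positions[run_start], positions[i - 1], i - run_start))
            run_start = None
    # Close a run that extends to the last SNP.
    if run_start is not None and n - run_start >= min_snps:
        runs.append((positions[run_start], positions[n - 1], n - run_start))
    return runs


# Toy data: ten SNPs, five of them fixed in a row.
mafs = [0.30, 0.0, 0.0, 0.0, 0.0, 0.0, 0.12, 0.0, 0.0, 0.25]
pos = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
print(fixed_runs(mafs, pos, min_snps=5))  # [(200, 600, 5)]
```

The sketch also hints at why the abstract stresses assay resolution: with sparse markers, a short run of fixed SNPs can arise by chance, so a lower `min_snps` picks up more candidate regions, and more of them may be false positives.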

Oh sister, where art thou? Indirect fitness benefit could maintain a host defense trait

Tobias Pamminger, Susanne Foitzik, Dirk Metzler, Pleuni S. Pennings
(Submitted on 4 Dec 2012)

Population structure can affect the evolution of parasite virulence and host defense, a hypothesis that has been confirmed by studies focusing on large spatial scales. In contrast, we examine the small-scale population structure of a host species and investigate whether it could explain the evolution of a defense trait against slavemaking ants. Slavemaking ants steal worker brood from host colonies, which will later serve as slaves to rear parasite offspring. The host species Temnothorax longispinosus has evolved an effective post-enslavement defense mechanism: instead of taking care of the slavemaker young, these slaves kill a high proportion of the parasite offspring. Because slaves never reproduce, they were thought to be trapped in an evolutionary dead end without the possibility of evolving such defense traits. Using detailed microsatellite data on a small spatial scale, we demonstrate that slaves can gain indirect fitness benefits by reducing parasite pressure on nearby host colonies, because these are often closely related to the slaves. Our genetic analyses indicate that polydomy, i.e., the occupation of several nest sites by a single colony, is sufficient to explain the elevated relatedness values between slaves and the surrounding host colonies, which may benefit from the slaves’ rebellion behavior.

GWAPP: A Web Application for Genome-wide Association Mapping in A. thaliana

Ümit Seren (1), Bjarni J. Vilhjálmsson (1 and 2), Matthew W. Horton (1 and 3), Dazhe Meng (4), Petar Forai (1), Yu S. Huang (4), Quan Long (1), Vincent Segura (5), Magnus Nordborg (1 and 2) ((1) Gregor Mendel Institute, Austrian Academy of Sciences, (2) Molecular and Computational Biology, University of Southern California, (3) Department of Ecology and Evolution, University of Chicago, (4) Center for Neurobehavioral Genetics, Semel Institute, University of California Los Angeles, (5) INRA, France)
(Submitted on 4 Dec 2012)

Arabidopsis thaliana is an important model organism for understanding the genetics and molecular biology of plants. Its highly selfing nature, together with other important features, such as small size, short generation time, small genome size, and wide geographic distribution, makes it an ideal model organism for understanding natural variation. Genome-wide association studies (GWAS) have proven a useful technique for identifying genetic loci responsible for natural variation in A. thaliana. Previously genotyped accessions (natural inbred lines) can be grown in replicate under different conditions, and phenotyped for different traits. These important features greatly simplify association mapping of traits and allow for systematic dissection of the genetics of natural variation by the entire Arabidopsis community. To facilitate this, we present GWAPP, an interactive web-based application for conducting GWAS in A. thaliana. Traits measured for a subset of 1386 publicly available ecotypes can be uploaded and mapped in just a couple of minutes using an efficient Python implementation of a linear mixed model, as well as other methods. GWAPP features an extensive, interactive, and user-friendly interface that includes interactive Manhattan plots and interactive local and genome-wide LD plots. It facilitates exploratory data analysis by implementing features such as the inclusion of candidate SNPs in the model as cofactors.

The evolution of complex gene regulation by low specificity binding sites

Alexander J. Stewart, Joshua B. Plotkin
(Submitted on 30 Nov 2012)

Transcription factor binding sites vary in their specificity, both within and between species. Binding specificity has a strong impact on the evolution of gene expression, because it determines how easily regulatory interactions are gained and lost. Nevertheless, we have a relatively poor understanding of what evolutionary forces determine the specificity of binding sites. Here we address this question by studying regulatory modules composed of multiple binding sites. Using a population-genetic model, we show that more complex regulatory modules, composed of a greater number of binding sites, must employ binding sites that are individually less specific, compared to less complex regulatory modules. This effect is extremely general, and it holds regardless of the regulatory logic of a module. We attribute this phenomenon to the inability of stabilising selection to maintain highly specific sites in large regulatory modules. Our analysis helps to explain broad empirical trends in the yeast regulatory network: those genes with a greater number of transcriptional regulators feature less specific binding sites, and there is less variance in their specificity, compared to genes with fewer regulators. Likewise, our results also help to explain the well-known trend towards lower specificity in the transcription factor binding sites of higher eukaryotes, which perform complex regulatory tasks, compared to prokaryotes.

Our paper: Bacterial diversity associated with Drosophila in the laboratory and in the natural environment

For our next guest post, Fabian Staubach and Dmitri Petrov write about their paper (along with coauthors) Bacterial diversity associated with Drosophila in the laboratory and in the natural environment, arXived here.

Host-associated bacterial communities are ubiquitous, have a variety of effects on the host phenotype, and play a role in host adaptation to new environments. Some clear examples of such adaptations are known, but generally these are ancient associations between host and symbiont, such as the association between aphids and the obligate symbiotic bacterium Buchnera, which provides the aphid with essential amino acids, or the association between beewolves and Streptomyces, which protects beewolf larvae from fungal infections. We are investigating the potential of bacterial communities to underlie short-term adaptation, using adaptation of D. melanogaster and D. simulans to different fruit as a study system.

As a first step, we profiled the diversity and composition of bacterial communities associated with Drosophila across multiple species, habitats, and substrates. We amplified and sequenced a region of the bacterial ribosomal DNA from whole-body fly samples using 454 technology. We focused on comparing the bacterial communities of the sibling species D. melanogaster and D. simulans in the lab and in an ecologically and evolutionarily relevant setting: their natural environment. In most cases we were able to study flies from these two species collected by aspiration from the same fruit. We also included nine different species spanning the Drosophila phylogeny to test whether phylogenetic distance and distance between bacterial communities are correlated.

We show that natural bacterial communities associated with Drosophila contain more bacterial taxa than previously thought. Comparison to a mammalian fecal data set reveals that although mammal-associated bacterial communities are more diverse on average, the diversity of some mammalian fecal samples lies within the range of, or is even lower than, that of the Drosophila samples we analyzed. This finding is interesting because it has been a matter of debate whether organisms with an adaptive immune system can in general accommodate higher bacterial diversity. By comparing the bacterial communities of D. melanogaster and D. simulans collected directly from different natural food substrates, we demonstrate that bacterial communities differ primarily between substrates and only very weakly among fly species.

We find acetic acid bacteria of the genera Acetobacter and Gluconobacter to be associated with all wild-caught flies, constituting two thirds of all sequences. Acetic acid bacteria oxidize sugars and ethanol to acetic acid and are known to be directly involved in the development of a specific process of decay called ‘sour rot’ on grapes that causes wine spoilage. There is previous evidence that Drosophila is vital for the dispersal of acetic acid bacteria among rotting fruit: grapes covered with nets in the field do acquire yeasts but no acetic acid bacteria, and acetic acid bacteria thrive on grapes only when flies are present. At the same time, Acetobacter has been shown to promote Drosophila larval growth and shorten development time under certain nutritional conditions. Therefore, we argue that the relationship between Acetobacteraceae and Drosophila is likely mutualistic.

Individual natural fly samples are dominated by bacteria known to be pathogenic in Drosophila, such as Enterococcus and Providencia. These bacteria are known to reach very high cell counts during systemic infections of Drosophila, and we believe that the inclusion of systemically infected flies in these samples is the most likely explanation for the observed pattern. The observation that it is possible in principle to identify candidate pathogens in natural populations using standard, high-throughput microbial community screening techniques opens up opportunities for large-scale epidemiological studies in nature and can help to identify candidate pathogenic bacterial species for further investigation in the laboratory.

In the laboratory, fly-associated bacterial communities are similar irrespective of phylogenetic distance between fly species, suggesting that host genetic factors either play a minor role in shaping the bacterial communities associated with Drosophila or, as suggested by the difference in bacterial communities between D. melanogaster and D. simulans in the wild, require natural conditions to manifest themselves. High variability of Drosophila bacterial communities within and between laboratories is a potential source of experimental noise when studying phenotypic variation. The impact of microbes on Drosophila phenotypes ranges from influencing growth to cold tolerance, and it is hard to imagine traits that are not subject in principle to alteration by microbes.

We hope that our data will serve as a solid foundation for future studies, especially for the growing community of scientists interested in the microbial communities associated with Drosophila.

Fabian Staubach and Dmitri Petrov

Identifying a species tree subject to random lateral gene transfer

Mike Steel, Simone Linz, Daniel H. Huson, Michael J. Sanderson
(Submitted on 30 Nov 2012)

A major problem for inferring species trees from gene trees is that evolutionary processes can sometimes favour gene tree topologies that conflict with an underlying species tree. In the case of incomplete lineage sorting, this phenomenon has recently been well-studied, and some elegant solutions for species tree reconstruction have been proposed. One particularly simple and statistically consistent estimator of the species tree under incomplete lineage sorting is to combine three-taxon analyses, which are phylogenetically robust to incomplete lineage sorting. In this paper, we consider whether such an approach will also work under lateral gene transfer (LGT). By providing an exact analysis of some cases of this model, we show that there is a zone of inconsistency for triplet-based species tree reconstruction under LGT. However, a triplet-based approach will consistently reconstruct a species tree under models of LGT, provided that the expected number of LGT transfers is not too high. Our analysis involves a novel connection between the LGT problem and random walks on cyclic graphs. We have implemented a procedure for reconstructing trees subject to LGT or lineage sorting in settings where taxon coverage may be patchy and illustrate its use on two sample data sets.
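The core of a triplet-based approach, choosing for each set of three taxa the rooted topology that most gene trees support, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the nested-tuple tree representation, and the tie-breaking behaviour are all assumptions, and the final step of assembling the winning triplets into a full species tree is omitted.

```python
from collections import Counter
from itertools import combinations


def clusters(tree):
    """Return (leaf set, list of leaf sets of every internal node).

    A tree is a taxon name (str) or a tuple of subtrees; this
    representation is an assumption made for the sketch.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def triplet_of(leaves, tree_clusters, a, b, c):
    """Return (cherry, outgroup) for taxa a, b, c, or None if the tree
    lacks one of them (patchy coverage) or does not resolve the triple."""
    if not {a, b, c} <= leaves:
        return None
    for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
        # xy|z is displayed if some clade holds x and y but not z.
        if any(x in cl and y in cl and z not in cl for cl in tree_clusters):
            return (frozenset((x, y)), z)
    return None


def majority_triplets(gene_trees, taxa):
    """Map each sorted taxon triple to its most frequent rooted triplet.
    Ties are broken by first occurrence, an arbitrary choice."""
    summaries = [clusters(t) for t in gene_trees]
    result = {}
    for a, b, c in combinations(sorted(taxa), 3):
        votes = Counter()
        for leaves, cls in summaries:
            trip = triplet_of(leaves, cls, a, b, c)
            if trip is not None:
                votes[trip] += 1
        if votes:
            result[(a, b, c)] = votes.most_common(1)[0][0]
    return result


gene_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),  # a conflicting gene tree, e.g. after a transfer
]
winners = majority_triplets(gene_trees, ["A", "B", "C", "D"])
print(winners[("A", "B", "C")])  # cherry {A, B} with outgroup C
```

As the abstract notes, majority voting of this kind is only expected to recover the species tree when the expected number of transfers is not too high; in the zone of inconsistency the most frequent triplet can be the wrong one.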

My paper “HIV drug resistance: problems and perspectives”

Our next guest post is by Pleuni Pennings (@pleunipennings) on her paper HIV drug resistance: problems and perspectives, arXived here and cross-posted from her website here.

A few days ago, I submitted a review paper to Infectious Disease Reports. The review is an invited essay for the special issue they are planning around the World AIDS Day (December 1st).

I was pleasantly surprised to see that the author guidelines of Infectious Disease Reports said: “Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.” So, I decided to upload the manuscript to the arXiv.

The essay describes the current situation of drug resistance in HIV. The main conclusion is that, overall, drug resistance is not as big a problem as one may think. Treatments have become very good, which means that the rate of evolution of drug resistance is low. At the same time, many new drugs have become available so that when drug resistance evolves, the patient can be switched to another set of drugs. However, in poor countries, where viral genotyping, viral load monitoring and many new drugs are not available, drug resistance still poses a serious threat to people’s health.

In the essay, I explain that transmitted drug resistance occurs, but at a level that is lower than many would have expected. Roughly 10% of newly infected patients are infected with an HIV strain with at least one major drug-resistance mutation. If the virus is genotyped before treatment is started (as is standard in rich, but not in poor, countries), then treatment success is very high for these patients.

Acquired drug resistance (when resistance evolves during treatment) is more common than transmitted drug resistance, and resistance can evolve even after many years of successful treatment. It can also happen that the virus becomes resistant against multiple drugs. Nowadays, there are many different drugs available, so that even patients with multi-class drug resistance can often be treated successfully, although this is not the case in poor countries, simply because the newer drugs are expensive.

I also describe what is known about resistance due to treatment for the prevention of mother-to-child-transmission (which is a big problem) and resistance due to pre-exposure prophylaxis (which occurs, but is uncommon). I also discuss the issue of low-frequency resistance mutations and their clinical relevance. Throughout the essay, I explain how certain effects are expected or surprising from an evolutionary perspective.

I thank my collaborators Daniel Rosenbloom and Alison Hill (both at Harvard) for useful comments on an earlier version of the manuscript.

Pleuni Pennings

HIV drug resistance: problems and perspectives

Pleuni S Pennings
(Submitted on 25 Nov 2012)

Many HIV patients now have access to combination antiretroviral treatment (ART). At the end of 2011, more than eight million people were receiving antiretroviral therapy in low-income and middle-income countries. ART generally works well in keeping the virus suppressed and the patient healthy. However, treatment only works as long as the virus is not resistant against the drugs used. In the last decades HIV treatments have become better and better at slowing down the evolution of drug resistance, so that some patients are treated for many years without having any resistance problems. However, for some patients, especially in low-income countries, drug resistance is still a serious threat to their health. This essay will review what is known about transmitted and acquired drug resistance, multi-class drug resistance, resistance to newer drugs, resistance due to treatment for the prevention of mother-to-child transmission, the role of minority variants (low-frequency drug-resistance mutations), and resistance due to pre-exposure prophylaxis.

Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed

Laurent Jacob, Johann Gagnon-Bartsch, Terence P. Speed
(Submitted on 18 Nov 2012)

When dealing with large-scale gene expression studies, observations are commonly contaminated by unwanted variation factors such as platforms or batches. Not taking this unwanted variation into account when analyzing the data can lead to spurious associations and to missing important signals. When the analysis is unsupervised, e.g., when the goal is to cluster the samples or to build a corrected version of the dataset – as opposed to the study of an observed factor of interest – taking unwanted variation into account can become a difficult task. The unwanted variation factors may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data or build estimators for unsupervised problems. The proposed methods are then evaluated on three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections.
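To make the role of negative control genes concrete, here is a deliberately naive sketch: estimate a single unwanted factor from genes assumed to carry no signal of interest, then regress it out of every gene. This is not one of the estimators proposed in the paper. The function names, the restriction to one factor, the power-iteration solver, and the toy data are all assumptions made for illustration.

```python
import math


def first_factor(control_matrix, n_iter=100):
    """Leading left singular vector of a samples x control-genes matrix,
    found by power iteration on the samples x samples Gram matrix.
    Assumes the control genes are not all zero."""
    n = len(control_matrix)
    gram = [[sum(a * b for a, b in zip(control_matrix[i], control_matrix[j]))
             for j in range(n)] for i in range(n)]
    w = [1.0 + 0.1 * i for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        w = [sum(gram[i][j] * w[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w


def remove_factor(expression, w):
    """Regress each gene (column of a samples x genes matrix) on w and
    return the residuals. No centering is applied in this sketch."""
    n_samples, n_genes = len(expression), len(expression[0])
    ww = sum(x * x for x in w)
    corrected = [row[:] for row in expression]
    for g in range(n_genes):
        beta = sum(w[i] * expression[i][g] for i in range(n_samples)) / ww
        for i in range(n_samples):
            corrected[i][g] -= beta * w[i]
    return corrected


# Toy data: four samples, a batch effect, and a signal orthogonal to it.
batch = [1.0, 1.0, -1.0, -1.0]
signal = [1.0, -1.0, 1.0, -1.0]
controls = [[2.0 * b, 0.5 * b] for b in batch]          # batch only
genes = [[3.0 * s + 1.5 * b] for s, b in zip(signal, batch)]

w_hat = first_factor(controls)
corrected = remove_factor(genes, w_hat)
print([round(row[0], 6) for row in corrected])  # close to 3 * signal
```

The toy signal is constructed to be orthogonal to the batch effect, so the regression leaves it intact. When the two are correlated, this naive correction removes part of the signal as well, which is the difficulty the abstract points to.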