Membrane environment imposes unique selection pressures on transmembrane domains of G protein-coupled receptors

Stephanie J. Spielman, Claus O. Wilke
(Submitted on 25 Nov 2012)

We have investigated the influence of the plasma membrane environment on the molecular evolution of G protein-coupled receptors (GPCRs), the largest receptor family in Metazoa. In particular, we have analyzed the site-specific rate variation across the two primary structural partitions, transmembrane (TM) and extramembrane (EM), of these membrane proteins. We find that transmembrane domains evolve more slowly than do extramembrane domains, though TM domains display increased rate heterogeneity relative to their EM counterparts. Although the majority of residues across GPCRs experience strong to weak purifying selection, many GPCRs experience positive selection at both TM and EM residues, albeit with a slight bias towards the EM. Further, a subset of GPCRs, chemosensory receptors (including olfactory and taste receptors), exhibit increased rates of evolution relative to other GPCRs, an effect which is more pronounced in their TM spans. Although it has been previously suggested that the low evolutionary rate of TM domains is caused by their high percentage of buried residues, we show that their attenuated rate seems to stem from the strong biophysical constraints of the membrane itself, or from functional requirements.

The genetic architecture of adaptations to high altitude in Ethiopia

Gorka Alkorta-Aranburu, Cynthia M. Beall, David B. Witonsky, Amha Gebremedhin, Jonathan K. Pritchard, Anna Di Rienzo
(Submitted on 13 Nov 2012)

Although hypoxia is a major stress on physiological processes, several human populations have survived for millennia at high altitudes, suggesting that they have adapted to hypoxic conditions. This hypothesis was recently corroborated by studies of Tibetan highlanders, which showed that polymorphisms in candidate genes show signatures of natural selection as well as well-replicated association signals for variation in hemoglobin levels. We extended genomic analysis to two Ethiopian ethnic groups: Amhara and Oromo. For each ethnic group, we sampled low and high altitude residents, thus allowing genetic and phenotypic comparisons across altitudes and across ethnic groups. Genome-wide SNP genotype data were collected in these samples by using Illumina arrays. We find that variants associated with hemoglobin variation among Tibetans or other variants at the same loci do not influence the trait in Ethiopians. However, in the Amhara, SNP rs10803083 is associated with hemoglobin levels at genome-wide levels of significance. No significant genotype association was observed for oxygen saturation levels in either ethnic group. Approaches based on allele frequency divergence did not detect outliers in candidate hypoxia genes, but the most differentiated variants between high- and lowlanders have a clear role in pathogen defense. Interestingly, a significant excess of allele frequency divergence was consistently detected for genes involved in cell cycle control, DNA damage and repair, thus pointing to new pathways for high altitude adaptations. Finally, a comparison of CpG methylation levels between high- and lowlanders found several significant signals at individual genes in the Oromo.

The effect of multiple paternity on genetic diversity during and after colonisation

M. Rafajlovic, A. Eriksson, A. Rimark, S. H. Saltin, G. Charrier, M. Panova, C. André, K. Johannesson, B. Mehlig
(Submitted on 5 Nov 2012)

In metapopulations, genetic variation of local populations is influenced by the genetic content of the founders, and of migrants following establishment. We analyse the effect of multiple paternity on genetic diversity using a model in which the highly promiscuous marine snail Littorina saxatilis expands from a mainland to colonise initially empty islands of an archipelago. Migrant females carry a large number of eggs fertilised by 1 – 10 mates. We quantify the genetic diversity of the population in terms of its heterozygosity: initially during the transient colonisation process, and at long times when the population has reached an equilibrium state with migration. During colonisation, multiple paternity increases the heterozygosity by 10 – 300 % in comparison with the case of single paternity. The equilibrium state, by contrast, is less strongly affected: multiple paternity gives rise to 10 – 50 % higher heterozygosity compared with single paternity. Further we find that far from the mainland, new mutations spreading from the mainland cause bursts of high genetic diversity separated by long periods of low diversity. This effect is boosted by multiple paternity. We conclude that multiple paternity facilitates colonisation and maintenance of small populations, whether or not this is the main cause for the evolution of extreme promiscuity in Littorina saxatilis.

Genomic mutation rates that neutralize adaptive evolution and natural selection

Philip Gerrish, Alexandre Colato, Paul Sniegowski
(Submitted on 5 Nov 2012)

When mutation rates are low, natural selection remains effective, and increasing the mutation rate can give rise to an increase in adaptation rate. When mutation rates are high to begin with, however, increasing the mutation rate may have a detrimental effect because of the overwhelming presence of deleterious mutations. Indeed, if mutation rates are high enough: 1) adaptation rate can become negative despite the continued availability of adaptive and/or compensatory mutations, or 2) natural selection may be disabled because adaptive and/or compensatory mutations, whether established or newly arising, are eroded by excessive mutation and decline in frequency. We apply these two criteria to a standard model of asexual adaptive evolution and derive mathematical expressions (some new, some old in new guise) delineating the mutation rates under which either adaptive evolution or natural selection is neutralized. The expressions are simple and require no a priori knowledge of organism- and/or environment-specific parameters. Our discussion connects these results to each other and to previous theory, showing convergence or equivalence of the different results in most cases.

Response to Horizontal gene transfer may explain variation in θs

Inigo Martincorena, Nicholas M. Luscombe
(Submitted on 5 Nov 2012)

In a short article submitted to arXiv [1], Maddamsetti et al. argue that the variation in the neutral mutation rate among genes in Escherichia coli that we recently reported [2] might be explained by horizontal gene transfer (HGT). To support their argument they present a reanalysis of synonymous diversity in 10 E. coli strains together with an analysis of a collection of 1,069 synonymous mutations found in repair-deficient strains in a long-term in vitro evolution experiment. Here we respond to this communication. Briefly, we explain that HGT was carefully accounted for in our study by multiple independent phylogenetic and population genetic approaches, and we show that there is no new evidence of HGT affecting our results. We also argue that caution must be exercised when comparing mutations from repair-deficient strains to data from wild-type strains, as these conditions are dominated by different mutational processes. Finally, we reanalyse Maddamsetti’s collection of mutations from a long-term in vitro experiment and we report preliminary evidence of non-random variation of the mutation rate in these repair-deficient strains.

Our paper: The McDonald-Kreitman Test and its Extensions under Frequent Adaptation: Problems and Solutions

For our next guest post Philipp Messer and Dmitri Petrov write about their paper
The McDonald-Kreitman Test and its Extensions under Frequent Adaptation: Problems and Solutions, arXived here

The McDonald-Kreitman (MK) test is the basis of most modern approaches to measure the rate of adaptation from population genomic data. This test was used to argue that in some organisms, such as Drosophila, the rate of adaptation is surprisingly high. However, the MK test, and in fact most of the current machinery of population genetics, relies on the assumption that adaptation is rare so that the effects of selective sweeps on linked variation can be neglected. We test this assumption using a powerful forward simulation and show that the MK test is severely biased even when the rate of adaptation is only moderate. The biases arise from the complex linkage effects between slightly deleterious and strongly advantageous mutations. In order to deal with these biases, we suggest a new robust approach based on a simple asymptotic extension of the MK test.
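For readers who have not met the quantity at stake, here is a minimal sketch of the classical MK estimate of the proportion of adaptive substitutions, α = 1 − (Ds·Pn)/(Dn·Ps). This is the standard textbook estimator, not the asymptotic extension proposed in the paper; the function name `mk_alpha` and the counts are hypothetical and chosen only for illustration.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classical MK estimate of the fraction of adaptive nonsynonymous substitutions.

    dn, ds: nonsynonymous / synonymous fixed differences (divergence)
    pn, ps: nonsynonymous / synonymous polymorphisms
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when dn or ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only (not data from the paper).
alpha = mk_alpha(dn=60, ds=100, pn=30, ps=100)
print(alpha)
```

Because slightly deleterious mutations that are still segregating inflate pn, an estimate of this kind can be pulled downward, which is one reason the simple estimator is sensitive to the effects discussed here.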

We further show that already under very moderate amounts of adaptation, linkage effects from recurrent selective sweeps can profoundly affect key population genetic parameters, such as the fixation probabilities of deleterious mutations and the frequency distributions of polymorphisms. In synonymous polymorphism data, these linkage effects leave signatures that can easily be mistaken for the signatures of recent, severe population expansion.

The bigger claim of our paper is that the effects of linked selection cannot be simply swept under the rug by introducing effective parameters, such as effective population size or effective strength of selection, and then using these effective parameters in formulae derived from the diffusion approximation under the assumption of free recombination. Given that most of our estimates of the key evolutionary parameters are still obtained from methods based on this paradigm, we argue that it is crucial to verify whether they are robust to linkage effects.

Philipp Messer and Dmitri Petrov

Inference of Admixture Parameters in Human Populations Using Weighted Linkage Disequilibrium

Po-Ru Loh, Mark Lipson, Nick Patterson, Priya Moorjani, Joseph K Pickrell, David Reich, Bonnie Berger
(Submitted on 1 Nov 2012)

Long-range migrations and the resulting admixture between populations have been an important force shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture that previous formal tests cannot. We further show that we can discover phylogenetic relationships between populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the computation. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.
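The exponential decay mentioned in the abstract can be illustrated with a minimal sketch. It assumes a single pulse of admixture, so that LD at genetic distance d (in Morgans) behaves like A·exp(−n·d), with n the number of generations since admixture. The function name, the log-linear fit and the synthetic curve are all assumptions made for illustration; this is not ALDER's implementation, whose weighted statistic and fitting procedure are described in the paper.

```python
import math


def fit_admixture_date(distances_morgans, ld_values):
    """Log-linear least-squares fit of LD(d) = A * exp(-n * d).

    Returns (n, A), where n is the decay rate, interpreted here as the
    number of generations since admixture under a single-pulse assumption.
    """
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]  # requires strictly positive LD values
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic, noise-free curve: amplitude 0.02, admixture 50 generations ago.
distances = [0.005 * i for i in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
curve = [0.02 * math.exp(-50 * d) for d in distances]
n_hat, amp_hat = fit_admixture_date(distances, curve)
print(round(n_hat, 6), round(amp_hat, 6))
```

With real data the curve is noisy and can cross zero, so a nonlinear fit on the untransformed values would be the safer choice; the log-linear version is used here only to keep the sketch dependency-free.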

Using haplotype differentiation among hierarchically structured populations for the detection of selection signatures

Marìa Inès Fariello, Simon Boitard, Hugo Naya, Magali SanCristobal, Bertrand Servin
(Submitted on 29 Oct 2012)

The detection of molecular signatures of selection is one of the major concerns of modern population genetics. A widely used strategy in this context is to compare samples from several populations, and to look for genomic regions with outstanding genetic differentiation between these populations. Genetic differentiation is generally based on allele frequency differences between populations, which are measured by Fst or related statistics. Here we introduce a new statistic, denoted hapFLK, which focuses instead on the differences of haplotype frequencies between populations. In contrast to most existing statistics, hapFLK accounts for the hierarchical structure of the sampled populations. Using computer simulations, we show that each of these two features – the use of haplotype information and of the hierarchical structure of populations – significantly improves the detection power of selected loci, and that combining them in the hapFLK statistic provides even greater power. We also show that hapFLK is robust with respect to bottlenecks and migration and improves over existing approaches in many situations. Finally, we apply hapFLK to a set of six sheep breeds from Northern Europe, and identify seven regions under selection, which include already reported regions but also several new ones. We propose a method to help identifying the population(s) under selection in a detected region, which reveals that in many of these regions selection most likely occurred in more than one population. Furthermore, several of the detected regions correspond to incomplete sweeps, where the favourable haplotype is only at intermediate frequency in the population(s) under selection.
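As a point of reference for the allele-frequency approach that the abstract contrasts with hapFLK, the following minimal sketch computes a heterozygosity-based Fst, (H_T − H_S)/H_T, for a single biallelic locus with equally weighted populations. It is a generic textbook form, not the hapFLK statistic, which uses haplotype frequencies and the hierarchical structure of populations; the function name and the frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic locus, equal population weights.

    freqs: allele frequency of the same allele in each sampled population.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # locus is monomorphic across all populations
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    return (h_t - h_s) / h_t


# Hypothetical frequencies in two populations, for illustration only.
print(fst_biallelic([0.2, 0.8]))
```

A genome scan in this style would compute the value locus by locus and look for outliers; it treats all populations as equally related, which is the simplification that accounting for hierarchical structure is meant to remove.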

Horizontal gene transfer may explain variation in θs

Rohan Maddamsetti, Philip J. Hatcher, Stéphane Cruveiller, Claudine Médigue, Jeffrey E. Barrick, Richard E. Lenski
(Submitted on 28 Sep 2012)

Martincorena et al. estimated synonymous diversity (θs = 2Nμ) across 2,930 orthologous gene alignments from 34 Escherichia coli genomes, and found substantial variation among genes in the density of synonymous polymorphisms. They argue that this pattern reflects variation in the mutation rate per nucleotide (μ) among genes. However, the effective population size (N) is not necessarily constant across the genome. In particular, different genes may have different histories of horizontal gene transfer (HGT), whereas Martincorena et al. used a model with random recombination to calculate θs. They did filter alignments in an effort to minimize the effects of HGT, but we doubt that any procedure can completely eliminate HGT among closely related genomes, such as E. coli living in the complex gut community.
Here we show that there is no significant variation among genes in rates of synonymous substitutions in a long-term evolution experiment with E. coli and that the per-gene rates are not correlated with θs estimates from genome comparisons. However, there is a significant association between θs and HGT events. Together, these findings imply that θs variation reflects different histories of HGT, not local optimization of mutation rates to reduce the risk of deleterious mutations as proposed by Martincorena et al.

Our paper: An age-of-allele test of neutrality for transposable element insertions not at equilibrium

[This author post is by Justin Blumenstiel and Casey Bergman on An age-of-allele test of neutrality for transposable element insertions not at equilibrium, available from the arXiv here]

Studies over the past several decades in Drosophila melanogaster have demonstrated that TE insertion alleles in natural populations tend to segregate at low frequency, particularly in regions of the genome that have a high recombination rate where natural selection is most effective. These results have largely supported a model where natural selection acts to remove deleterious TE insertions from the genome. The prevailing model of why TE insertions are deleterious is that they lead to chromosomal aberrations that occur when dispersed, non-allelic repeated sequences cross over with one another. This model is known as the ectopic recombination model, and it has an important feature: since each new insertion has the potential to recombine with all the other copies in the genome, fitness will go down faster and faster with each new copy. This yields a stable equilibrium in TE copy number.
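To see why an accelerating fitness cost gives a stable equilibrium, consider a deterministic toy recursion in which each copy transposes at net rate u and pays a selective cost that grows in proportion to copy number, t·n. The change in mean copy number per generation is then n·(u − t·n), which is zero at n = u/t. The functional form and the parameter values below are assumptions made for illustration; they are not taken from the paper.

```python
def mean_copy_number_trajectory(n0, u, t, generations):
    """Iterate delta_n = n * (u - t * n) and return the trajectory of mean copy number.

    u: net transposition rate per copy per generation (assumed constant)
    t: strength of synergistic selection; the per-copy cost grows as t * n
    """
    n = n0
    trajectory = [n]
    for _ in range(generations):
        n = n + n * (u - t * n)
        trajectory.append(n)
    return trajectory


u, t = 1e-3, 2e-5      # hypothetical values
equilibrium = u / t    # copy number at which transposition balances selection
low = mean_copy_number_trajectory(5.0, u, t, 20000)
high = mean_copy_number_trajectory(200.0, u, t, 20000)
print(equilibrium, low[-1], high[-1])
```

Starting below or above u/t, the trajectory converges to the same value, which is what "stable" means here. If the per-copy cost were constant instead, the recursion would have no interior equilibrium and copy number would either grow without bound or decay to zero.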

But are TEs at equilibrium in natural populations? Genome sequencing studies have shown that the rate of TE proliferation can vary widely over time and any given TE family may demonstrate non-equilibrium “boom and bust” behavior. How do we reconcile studies that assume equilibrium with the fact that we know TE dynamics are not at equilibrium? To deal with this problem, I began developing this model out of a class project with John Wakeley while I was a graduate student over a decade ago. This model arose out of some work I published in 2002 with Hartl and Lozovsky on the age structure of non-LTR elements in D. melanogaster. I wrote this model up for my Ph.D. thesis and presented a preliminary version in a paper with Neafsey and Hartl in 2004, but it sat on the back burner until I reviewed a paper by Bergman and Bensasson in 2007 that showed many TE families in D. melanogaster have recently inserted in the genome and may not be at equilibrium.

Shortly after their paper came out I contacted Casey with the model from my thesis and we decided to push this idea forward as a collaboration, which has taken a few years to come to fruition (both of us being busy with other projects and starting our labs). Things started to really move ahead when Miaomiao He in Casey’s lab generated a crucial data set that could be specifically applied to the model: strain-specific presence/absence data for a very large number of TE insertions ascertained from the D. melanogaster genome sequence. After a few more years with it on simmer, working out several kinks in the meantime (e.g. incorporating host demography, trying many different methods for estimating the posterior distribution of TE ages), Casey and I finally wrapped it up just as Haldane’s Sieve is starting to hit its stride. I expect that all my papers in the future will be pre-released on arXiv.

I could speak at length on the specific results, but I would just be saying what is already in the abstract. So, I would like to bring up three points for potential conversation.

First, what does it mean for TEs to be at transposition-selection balance when we know different TE families show a signature of “boom and bust” in genome sequences? There may be one way to reconcile this apparent problem. Any particular TE family may in fact not be at transposition-selection balance. For example, the P element, which invaded Drosophila melanogaster only a few decades ago, is hardly at transposition-selection balance. Therefore, one must be careful in using insertion frequencies for P elements to describe general TE dynamics. However, by integrating over all TE families in the genome, one may in fact reach an approximation that might be reasonable for assuming equilibrium transposition-selection balance. But one must be careful of something I call “family ascertainment bias”. Sometimes the most recently activated TEs are the ones easiest to discover and annotate because these ones are easily cloned from insertion mutations or are most frequent in genome sequences.

Second, in this paper, we derive the probability distribution for each individual TE insertion frequency based on its age. We demonstrate that this provides a method for identifying TE insertions that are either positively or negatively selected. In the case where we show allele frequencies are less than expected (i.e. predicted to be negatively selected), many of these are copies that have zero substitutions. In principle, all of these could have inserted one generation before the reference strain was collected for genome sequencing. The inference that selection is acting against these TEs implicitly assumes either: 1) this wasn’t the case for many of these insertions, and the posterior distribution of ages is a good representation of the true age distribution, or 2) it may have been the case, but natural selection has already acted to remove slightly older TEs from the population, therefore making them absent from the genome sequence.

Third, when putting the finishing touches on our analysis of TE insertion data in North America, we ran up against the issue that nobody has yet published an explicit demographic scenario for North American populations of D. melanogaster, similar to those that have been developed by Wolfgang Stephan’s Lab and others for European and African populations. We found one paper by Yukilevich et al (2010) from John True’s Lab that generated similar findings to the demography of European populations, which is consistent with the idea that North American populations of D. melanogaster are mainly derived from European ancestors. However, Yukilevich et al (2010) didn’t explicitly model the admixture with African populations, which is known to occur in North American populations as shown by Caracristi and Schlötterer in 2003. We were surprised that an explicit admixture scenario has not been published yet, especially since this is crucial for interpreting the data from population genomic projects like the Drosophila Genetic Reference Panel. This should be an important line of work for someone to pursue (if it isn’t being done already), and if anyone has information about a demographic model for North American populations of D. melanogaster, we’d be keen to know more so we can see if it might improve our analysis.

Justin and Casey