Neanderthals had our de novo genes.

John Stewart Taylor

In 2009 Knowles and McLysaght reported the discovery of three human genes derived from non-coding DNA. They provided evidence that these genes, CLLU1, C22orf45, and DNAH10OS, were transcribed and translated; they identified orthologous non-coding DNA in chimpanzee (Pan troglodytes) and macaque (Macaca mulatta); and for each gene they located the critical “enabler” mutations that extended the open reading frames (ORFs), allowing the production of a protein. These genes had no BLASTp hits in any other genome and were considered to be novel human genes, possibly responsible for human-specific traits. Since the discovery of these genes, new high-quality Denisovan and Neanderthal genomes have been reported. I used these resources in an effort to determine whether or not CLLU1, C22orf45, and DNAH10OS were truly human-specific.
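The idea of an ORF-extending mutation can be illustrated with a minimal sketch. The sequences below are invented toy strings, not the real gene sequences, and the function name `orf_length_codons` is a hypothetical helper: it simply counts codons before the first in-frame stop, so a single substitution that removes a premature stop lengthens the ORF.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_codons(seq, start=0):
    """Count codons from `start` up to (not including) the first in-frame stop."""
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


# Toy "ancestral" sequence (hypothetical): ATG GCA TGA CCT GAA GTT TAA
# The third codon, TGA, is a premature in-frame stop.
ancestral = "ATGGCATGACCTGAAGTTTAA"

# Toy "derived" sequence: one substitution (A -> G at index 8) turns TGA into TGG,
# so the reading frame now runs on to the final TAA.
derived = ancestral[:8] + "G" + ancestral[9:]

print(orf_length_codons(ancestral), orf_length_codons(derived))
```

Comparing the same position in an archaic genome with the human and chimpanzee sequences is, in spirit, a check of which side of such a substitution the archaic sequence falls on.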

Genetic drift suppresses bacterial conjugation in spatially structured populations

Peter D. Freese, Kirill S. Korolev, Jose I. Jimenez, Irene A. Chen
(Submitted on 24 Feb 2014)

Conjugation is the primary mechanism of horizontal gene transfer that spreads antibiotic resistance among bacteria. Although conjugation normally occurs in surface-associated growth (e.g., biofilms), it has been traditionally studied in well-mixed liquid cultures lacking spatial structure, which is known to affect many evolutionary and ecological processes. Here we visualize spatial patterns of gene transfer mediated by F plasmid conjugation in a colony of Escherichia coli growing on solid agar, and we develop a quantitative understanding by spatial extension of traditional mass-action models. We found that spatial structure suppresses conjugation in surface-associated growth because strong genetic drift leads to spatial isolation of donor and recipient cells, restricting conjugation to rare boundaries between donor and recipient strains. These results suggest that ecological strategies, such as enforcement of spatial structure and enhancement of genetic drift, could complement molecular strategies in slowing the spread of antibiotic resistance genes.
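For readers unfamiliar with the well-mixed baseline, here is a rough sketch of a mass-action conjugation model, assuming no growth and a single transfer-rate constant. The function name, parameter names, and the Euler time-stepping are illustrative choices, not the authors' spatial model: recipients become transconjugants at a rate proportional to the product of plasmid-bearing and plasmid-free densities.

```python
def simulate_conjugation(donors, recipients, gamma, dt, steps):
    """Euler integration of a no-growth mass-action conjugation model.

    Plasmid-bearing cells (donors + transconjugants) convert recipients
    at rate gamma * (donors + transconjugants) * recipients.
    """
    transconjugants = 0.0
    for _ in range(steps):
        converted = gamma * (donors + transconjugants) * recipients * dt
        converted = min(converted, recipients)  # cannot convert more cells than exist
        recipients -= converted
        transconjugants += converted
    return donors, recipients, transconjugants


d, r, t = simulate_conjugation(donors=0.5, recipients=0.5, gamma=1.0, dt=0.01, steps=1000)
print(d, r, t)
```

In a well-mixed model like this every donor can meet every recipient, which is exactly the assumption that breaks down when drift segregates the two strains into separate sectors.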

An Improved Approximate-Bayesian Model-choice Method for Estimating Shared Evolutionary History

Jamie R. Oaks
(Submitted on 25 Feb 2014)

To understand the processes that generate biodiversity, it is important to account for large-scale processes that affect the evolutionary history of groups of co-distributed populations of organisms. Such events predict temporally clustered divergence times, a pattern that can be estimated using genetic data from co-distributed species. I introduce a new approximate-Bayesian method for comparative phylogeographical model-choice that estimates the temporal distribution of divergences across taxa from multi-locus DNA sequence data. The model is an extension of that implemented in msBayes. By reparameterizing the model, introducing more flexible priors on demographic and divergence-time parameters, and implementing a non-parametric Dirichlet-process prior over divergence models, I improved the robustness, accuracy, and power of the method for estimating shared evolutionary history across taxa. The results demonstrate that the improved performance of the new method is due to (1) more appropriate priors on divergence-time and demographic parameters that avoid prohibitively small marginal likelihoods for models with more divergence events, and (2) the Dirichlet-process providing a flexible prior on divergence histories that does not strongly disfavor models with intermediate numbers of divergence events. The new method yields more robust estimates of posterior uncertainty, and thus greatly reduces the tendency of the model to incorrectly estimate biogeographically interesting models with strong support.
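A Dirichlet-process prior over divergence models can be pictured as a prior over ways of partitioning taxa among shared divergence events. The sketch below draws such partitions with the Chinese restaurant process, a standard construction of the Dirichlet process; the function names and the concentration value are illustrative assumptions, and this is not the author's implementation.

```python
import random


def sample_divergence_partition(n_taxa, alpha, rng):
    """Assign each taxon to a divergence event via the Chinese restaurant process.

    A taxon joins an existing event with probability proportional to the number
    of taxa already in it, or founds a new event with probability proportional
    to the concentration parameter alpha.
    """
    assignments = []
    counts = []  # number of taxa currently assigned to each event
    for _ in range(n_taxa):
        weights = counts + [alpha]
        event = rng.choices(range(len(weights)), weights=weights)[0]
        if event == len(counts):
            counts.append(1)
        else:
            counts[event] += 1
        assignments.append(event)
    return assignments


def expected_number_of_events(n_taxa, alpha):
    """Prior mean number of divergence events under the process."""
    return sum(alpha / (alpha + i) for i in range(n_taxa))


rng = random.Random(1)
print(sample_divergence_partition(10, alpha=1.5, rng=rng))
print(expected_number_of_events(10, alpha=1.5))
```

Small values of alpha favor few shared events and large values favor many independent ones, so the concentration parameter (or a prior on it) controls how much weight models with intermediate numbers of events receive.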

Genetic drift opposes mutualism during spatial population expansion

Melanie JI Muller, Beverly I Neugeboren, David R Nelson, Andrew W Murray
(Submitted on 24 Feb 2014)

Mutualistic interactions benefit both partners, promoting coexistence and genetic diversity. Spatial structure can promote cooperation, but spatial expansions may also make it hard for mutualistic partners to stay together, since genetic drift at the expansion front creates regions of low genetic and species diversity. To explore the antagonism between mutualism and genetic drift, we grew cross-feeding strains of the budding yeast S. cerevisiae on agar surfaces as a model for mutualists undergoing spatial expansions. By supplying varying amounts of the exchanged nutrients, we tuned strength and symmetry of the mutualistic interaction. Strong mutualism suppresses genetic demixing during spatial expansions and thereby maintains diversity, but weak or asymmetric mutualism is overwhelmed by genetic drift even when mutualism is still beneficial, slowing growth and reducing diversity. Theoretical modeling using experimentally measured parameters predicts the size of demixed regions and how strong mutualism must be to survive a spatial expansion.

Strong selective sweeps associated with ampliconic regions in great ape X chromosomes

Kiwoong Nam, Kasper Munch, Asger Hobolth, Julien Y. Dutheil, Krishna Veeramah, August Woerner, Michael F. Hammer, Great Ape Genome Diversity Project, Thomas Mailund, Mikkel H. Schierup
(Submitted on 24 Feb 2014)

The unique inheritance pattern of X chromosomes makes them preferential targets of adaptive evolution. We here investigate natural selection on the X chromosome in all species of great apes. We find that diversity is more strongly reduced around genes on the X compared with autosomes, and that a higher proportion of substitutions results from positive selection. Strikingly, the X exhibits several megabase-long regions where diversity is reduced more than fivefold. These regions overlap significantly among species, and have a higher singleton proportion, population differentiation, and nonsynonymous to synonymous substitution ratio. We rule out background selection and soft selective sweeps as explanations for these observations, and conclude that several strong selective sweeps have occurred independently in similar regions in several species. Since these regions are strongly associated with ampliconic sequences, we propose that intra-genomic conflict between the X and the Y chromosomes is a major driver of X chromosome evolution.

Author post: Genome scans for detecting footprints of local adaptation using a Bayesian factor model

This guest post is by Michael Blum, Eric Bazin, and Nicolas Duforet-Frebourg on their preprint Genome scans for detecting footprints of local adaptation using a Bayesian factor model, available from the arXiv here.

Finding genomic regions subject to local adaptation is a central part of population genomics, which is based on genotyping numerous molecular markers and looking for outlier loci. The most common approaches use measures of genetic differentiation such as Fst. Many software packages implement genome scans based on statistics related to Fst (BayeScan, DetSel, FDist2, Lositan), and they contribute to the popularity of this approach in population genomics.
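As a reminder of what an Fst-style statistic measures, here is a minimal sketch for a single biallelic locus, assuming equally weighted populations and using the simple heterozygosity-based definition (H_T - H_S) / H_T. The function name and the example frequencies are illustrative; the packages listed above use more elaborate estimators and models.

```python
def fst_biallelic(freqs):
    """Fst for one biallelic locus from per-population allele frequencies.

    Uses (H_T - H_S) / H_T with equal population weights, where H_T is the
    expected heterozygosity at the pooled frequency and H_S is the mean
    within-population expected heterozygosity.
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies at three loci in two populations.
loci = {"locus_1": [0.5, 0.5], "locus_2": [0.2, 0.8], "locus_3": [0.0, 1.0]}
for name, freqs in loci.items():
    print(name, fst_biallelic(freqs))
```

An outlier scan then asks which loci have Fst values that are too extreme under a neutral model of population structure, which is where the modeling assumptions discussed next come in.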

However, several statistical and computational problems may arise with approaches based on Fst or related measures. The first problem arises because methods related to Fst assume the so-called F-model, which corresponds to a particular covariance structure for gene frequencies among populations (Bierne et al. 2013). When spatial structure departs from the assumption of the F-model, it can generate many false positives. A second potential problem concerns the computational burden of some Bayesian approaches, which can become an obstacle with large numbers of SNPs. The last problem is that individuals must be grouped into populations in advance, whereas working at the scale of individuals is desirable because it avoids defining populations.

Using a Bayesian factor model, we address the three aforementioned problems. Factor models capture population structure by inferring latent variables called factors. Factor models have already been proposed to ascertain population structure (Engelhardt and Stephens 2010). Here we extend the factor-model framework to identify outlier loci in addition to ascertaining population structure. Our approach is not the first to account for deviations from the assumptions of the F-model (Bonhomme et al. 2010, Günther and Coop 2013), but, in contrast to the previous approaches, it does not require populations to be defined. Using simulations, we show that the factor model can achieve a 2-fold or greater reduction in false discovery rate compared to the Fst-related approaches. We also analyze the HGDP human dataset to provide an example of how factor models can be used to detect local adaptation with a large number of SNPs. The Bayesian factor model is implemented in the PCAdapt software, and we would be happy to answer comments or questions regarding the software.
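To give a feel for how a factor can flag a locus without predefined populations, here is a rough PCA-style sketch, not the Bayesian model implemented in PCAdapt. It extracts the leading factor of a centered genotype matrix by power iteration and reports each locus's loading on that factor; the genotype matrix is an invented toy example, and the function names are hypothetical.

```python
def center_columns(genotypes):
    """Subtract the per-locus mean from each column of an individuals x loci matrix."""
    n = len(genotypes)
    n_loci = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(n_loci)]
    return [[row[j] - means[j] for j in range(n_loci)] for row in genotypes]


def first_factor_loadings(genotypes, n_iter=200):
    """Loadings of each locus on the leading factor, found by power iteration.

    Assumes the matrix is not constant, so the normalisation never divides by zero.
    """
    x = center_columns(genotypes)
    n, n_loci = len(x), len(x[0])
    scores = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(n_iter):
        loadings = [sum(scores[i] * x[i][j] for i in range(n)) for j in range(n_loci)]
        scores = [sum(x[i][j] * loadings[j] for j in range(n_loci)) for i in range(n)]
        norm = sum(s * s for s in scores) ** 0.5
        scores = [s / norm for s in scores]
    return [sum(scores[i] * x[i][j] for i in range(n)) for j in range(n_loci)]


# Toy genotypes (0/1/2 allele counts) for six individuals at five loci.
# Individuals are NOT labelled by population; locus 0 separates the first
# three from the last three, the other loci do not.
toy_genotypes = [
    [0, 1, 0, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 2, 1],
    [2, 1, 1, 1, 1],
    [2, 1, 0, 2, 2],
    [2, 0, 1, 1, 1],
]

loadings = first_factor_loadings(toy_genotypes)
top_locus = max(range(len(loadings)), key=lambda j: abs(loadings[j]))
print(top_locus, loadings)
```

In real data the leading factors mostly reflect genome-wide structure shared by many loci, and the candidates are the loci whose loadings are atypically large relative to that background.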

To explain why the factor model generates fewer false discoveries, we can introduce the notions of mechanistic and phenomenological models. Mechanistic models aim to mimic the biological processes that are thought to have given rise to the data, whereas phenomenological models seek only to best describe the data using a statistical model. In the spectrum between mechanistic and phenomenological models, the F-model would stand close to mechanistic models whereas factor models would be closer to the phenomenological ones. Mechanistic models are appealing because they provide quantitative measures that can be related to biologically meaningful parameters. For instance, the parameters of the F-model measure genetic drift, which can be related to migration rates, divergence times or population sizes. By contrast, phenomenological models work with mathematical abstractions such as latent factors that can be difficult to interpret biologically. The downside of mechanistic models is that violating the modeling assumptions can invalidate the proposed framework and generate many false discoveries in the context of selection scans. The F-model assumes a particular covariance matrix between populations, which is found with star-like population trees, for instance. However, more complex models of population structure can arise for various reasons, including non-instantaneous divergence or isolation-by-distance, and they will violate the mechanistic assumptions and make phenomenological models preferable.

Michael Blum, Eric Bazin, and Nicolas Duforet-Frebourg

Genetic Analysis of Transformed Phenotypes

Nicolo Fusi, Christoph Lippert, Neil D. Lawrence, Oliver Stegle
(Submitted on 21 Feb 2014)

Linear mixed models (LMMs) are a powerful and established tool for studying the genetics of phenotypic variation. A limiting assumption of LMMs is that the phenotype is Gaussian distributed under the model, a requirement that rarely holds in practice. Since violations of this assumption can lead to false conclusions and losses in power, it’s common practice to pre-process the phenotypic values, for instance by applying logarithmic transformations. Unfortunately, these are not appropriate in every situation, and choosing a “good” transformation is in general challenging and subjective. Here, we present an extension of the LMM that estimates an optimal transformation from the data. We show in extensive simulations and real data from human, mouse and yeast that application of these optimal transformations leads to increased power in genome-wide association studies and higher accuracy in heritability estimates and phenotype predictions.
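The general idea of letting the data choose a transformation can be sketched with the classical Box-Cox family, used here as a simple stand-in and not as the authors' method. The code picks the Box-Cox exponent on a grid by maximizing the Gaussian profile log-likelihood; it assumes strictly positive phenotype values, and the function names and toy data are illustrative.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def box_cox_loglik(values, lam):
    """Gaussian profile log-likelihood of the transformed data (up to a constant)."""
    z = box_cox(values, lam)
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * n * math.log(var) + jacobian


def best_lambda(values, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))


# Toy phenotype that is symmetric on the log scale, so the log transform
# (lambda = 0) should be preferred on this grid.
phenotype = [math.exp(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(best_lambda(phenotype, [-1.0, -0.5, 0.0, 0.5, 1.0]))
```

A one-parameter family like this is far less flexible than a transformation learned jointly with the mixed model, but it shows why a fixed choice such as the logarithm is only sometimes the right one.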

Extensive translation of small ORFs revealed by polysomal ribo-Seq

Julie L Aspden, Ying Chen Eyre-Walker, Rose J. Phillips, Michele Brocard, Unum Amin, Juan Couso

Thousands of small Open Reading Frames (smORFs) encoding small peptides of fewer than 100 amino acids exist in our genomes. Examples of functional smORFs have been characterised in a few species but the actual number of translated smORFs, and their molecular, functional and evolutionary features are not known. Here we present a genome-wide assessment of smORF translation by ribosomal profiling of polysomal fractions. This ‘polysomal ribo-Seq’ suggests that smORFs are translated at the same level and in the same relative numbers (80%) as normal proteins. The smORF peptides appear widely conserved, show activity in cells, and display a putative amino acid signature. These findings reinforce the idea that smORFs are an abundant and fundamental genome component, displaying features usually attributed to canonical proteins, including high translation levels, biological function, amino acid sequence specificity and cross-species conservation.

Genome scans for detecting footprints of local adaptation using a Bayesian factor model

N. Duforet-Frebourg, E. Bazin, M.G.B. Blum
(Submitted on 21 Feb 2014)

A central part of population genomics consists of finding genomic regions implicated in local adaptation. Population genomic analyses are based on genotyping numerous molecular markers and looking for outlier loci in terms of patterns of genetic differentiation. One of the most common approaches to selection scans is based on statistics that measure population differentiation, such as FST. However, there are important caveats with approaches related to FST because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here we implement a more flexible individual-based approach based on Bayesian factor models. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. Factor models are strongly related to principal components analysis (PCA) and they model population structure with latent variables called factors. The hierarchical factor model considers that outlier loci are atypically explained by one of the factors. In a model of population divergence, we show that it can achieve a 2-fold or greater reduction in false discovery rate compared to the software BayeScan or compared to an FST approach. We show that our software can handle large SNP datasets by analyzing the HGDP SNP dataset. The Bayesian factor model is implemented in the command-line PCAdapt software.

Author post: Hierarchical Bayesian model of population structure reveals convergent adaptation to high altitude in human populations

This guest post is by Matthieu Foll and Laurent Excoffier on their preprint (with co-authors) Hierarchical Bayesian model of population structure reveals convergent adaptation to high altitude in human populations, arXived here.

Background

Since the seminal paper of Lewontin and Krakauer (1973), Fst-based genome scan methods have had to struggle with the confounding effect of population structure. These methods became very popular with the FDIST software implemented by Beaumont and Nichols (1996), which was based on an island model. At that time it was proposed that the island model was robust to different demographic scenarios (recent divergence and growth, isolation by distance, or heterogeneous levels of gene flow between populations). A Bayesian version of this model (generally called the F-model), in which populations can receive unequal numbers of migrants, was then proposed (Beaumont and Balding 2004; Foll and Gaggiotti 2008) and implemented in the BayeScan software (http://cmpg.unibe.ch/software/BayeScan/), which is now quite widely used.

However, all these models assume that migrant genes originate from a unique and common migrant pool. We started to realize that this assumption could lead to a massive number of false positives when we tried to analyze the HGDP data, where this assumption was clearly not supported. To overcome this problem, we proposed an extension of Beaumont and Nichols’s (1996) method based on a hierarchical island model (Excoffier et al. 2009) in which populations were assigned to different groups or regions. An island model was assumed in each group, and the groups themselves were assumed to follow an island model. This new method was then implemented in Arlequin (Excoffier and Lischer 2010).

Note that alternative ways to deal with complex genetic structure have also been proposed (Coop et al. 2010; Bonhomme et al. 2010; Fariello et al. 2013; Günther and Coop 2013), but the main message people took from these papers was that methods aiming to identify loci under selection can be quite sensitive to hidden (or unaccounted-for) population structure and should be used with caution. Hermisson (2009) even rather provocatively asked: “Who believes in whole-genome scans for selection?”

One radical way to deal with the problem of complex genetic structure is to reduce the number of sampled populations to just two (Vitalis et al. 2001). This leads to a GWAS-like strategy where people sample two populations living in contrasting environments (playing the role of cases and controls in GWAS) with potentially different selection pressures. However, other problems occur when doing so: (i) having only two populations leads to a reduction in power, (ii) related to this first point, one generally needs to sample a larger number of individuals to have sufficient power, (iii) the comparison of results obtained from different pairs of populations can be problematic, especially when one is interested in detecting convergent selection by looking at the overlap in lists of candidate genes. In the last few years, studies comparing pairs of populations living in different environments have accumulated. Typically, each pairwise comparison produced a set of candidate loci, and people often used some informal criterion to identify “repeated outliers” based on the number of times they were identified in the different tests performed (see Nosil et al. 2008 or Paris et al. 2010 for example).
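The informal "repeated outliers" criterion amounts to counting how often each locus appears across separate pairwise scans. A minimal sketch, with hypothetical locus identifiers and an arbitrary threshold, might look like this:

```python
from collections import Counter


def repeated_outliers(candidate_lists, min_hits):
    """Loci that appear as candidates in at least `min_hits` pairwise comparisons."""
    hits = Counter(
        locus for candidates in candidate_lists for locus in set(candidates)
    )
    return sorted(locus for locus, n in hits.items() if n >= min_hits)


# Hypothetical candidate lists from three separate pairwise scans.
pairwise_candidates = [
    ["snp1", "snp2"],
    ["snp2", "snp3"],
    ["snp2", "snp1"],
]
print(repeated_outliers(pairwise_candidates, min_hits=2))
```

The weakness is visible in the code: the threshold is arbitrary, and the count ignores how strong the evidence was in each separate test, which is part of the motivation for modeling all populations jointly.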

A new hierarchical F-model

We started to think that the introduction of a Bayesian F-model dealing with a hierarchical population structure could solve some of these problems. We therefore introduced a hierarchical F-model where populations are assigned to different groups. In each group the genetic structure is modeled with a classical F-model, and the groups themselves are modeled with a higher-level F-model. One advantage is that the Beaumont and Balding (2004) decomposition of Fst as population- and locus-specific effects can be done in each group separately as well as between groups. This allows the identification of selection at different levels: within a specific group of populations, or at a higher level (among groups). Here again, an interesting question is to identify loci responding similarly to selection in several groups. In order to look at that particular case, we explicitly included a convergent selection model where, at any given locus, all groups share the same locus-specific effect. Posterior probabilities of all possible models of selection are then evaluated using a Reversible Jump MCMC algorithm.
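The decomposition mentioned above can be sketched numerically. In the Beaumont and Balding (2004) formulation, the logit of Fst is written as the sum of a locus-specific and a population-specific effect. The effect values, group names, and function names below are hypothetical, and the code only evaluates the link function; it does not perform any inference.

```python
import math


def fst_from_effects(locus_effect, population_effect):
    """Logistic link: logit(Fst) = locus_effect + population_effect."""
    return 1.0 / (1.0 + math.exp(-(locus_effect + population_effect)))


# Hypothetical population-specific effects for two groups of two populations.
population_effects = {"group_A": [-2.0, -1.5], "group_B": [-2.5, -1.0]}


def fst_table(locus_effect_by_group):
    """Fst for every population, given one locus-specific effect per group."""
    return {
        group: [fst_from_effects(locus_effect_by_group[group], beta) for beta in betas]
        for group, betas in population_effects.items()
    }


# Neutral locus: the locus-specific effect is zero in every group.
neutral = fst_table({"group_A": 0.0, "group_B": 0.0})

# Convergent selection: both groups share the same non-zero locus-specific effect.
convergent = fst_table({"group_A": 1.5, "group_B": 1.5})

# Selection in one group only, for contrast.
group_a_only = fst_table({"group_A": 1.5, "group_B": 0.0})

print(neutral, convergent, group_a_only)
```

Seen this way, the competing models of selection differ only in which locus-specific effects are zero, free, or constrained to be equal across groups, which is what the model-jumping MCMC compares.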

Adaptation to high altitude

We applied this new method to the very interesting case of high altitude adaptation in humans. We reanalyzed a published large SNP dataset (Bigham et al. 2010) including two populations living at high altitude in the Andes and in Tibet, as well as two related lowland populations from Central America and East Asia. One of the most striking results we find is that convergent selection is much more common than previously found based on separate analyses in the two continents. We checked with simulations that this was in fact expected: being able to analyze the four populations together is indeed more powerful than performing two separate pairwise tests. In addition to confirming several known candidate genes and biological processes involved in high altitude adaptation, we were able to identify additional new genes and processes under convergent selection. In particular, we were very excited to find two specific biological pathways that could have evolved to counter the toxic levels of fatty acids and the neuronal excitotoxicity induced by hypoxia in both continents. Interestingly, several genes included in these pathways had been identified in high altitude Ethiopians (Scheinfeldt et al. 2012; Alkorta-Aranburu et al. 2012; Huerta-Sánchez et al. 2013), suggesting that these pathways could represent a striking example of convergent adaptation in three continents.

Conclusion

Our hierarchical F-model appears very flexible and can cope with a variety of sampling strategies to identify adaptation. Whereas we have considered only two groups of two populations in our paper, it is worth noting that our method can handle more than two groups and more than two populations per group. An alternative sampling scheme to detect selection could, for instance, contrast several genetically related high altitude populations with several related lowland populations (see e.g. Pagani et al. 2011). Our method could also deal with such a sampling scheme, but this time, one would focus on the decomposition of the genetic differentiation between the groups (i.e. Fct). In summary, our approach allows the simultaneous analysis of populations living in contrasting environments in several geographic regions. It can be used to specifically test for convergent adaptation, and this approach is more powerful than previous methods contrasting pairs of populations separately.

Matthieu Foll and Laurent Excoffier

References

Alkorta-Aranburu, G., C. M. Beall, D. B. Witonsky, A. Gebremedhin, et al., 2012 The genetic architecture of adaptations to high altitude in Ethiopia. PLoS Genet 8: e1003110.

Beaumont, M. A., and R. A. Nichols, 1996 Evaluating Loci for Use in the Genetic Analysis of Population Structure. Proc Biol Sci 263: 1619-1626.

Beaumont, M. A., and D. J. Balding, 2004 Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol 13: 969-980.

Bigham, A., M. Bauchet, D. Pinto, X. Y. Mao, et al., 2010 Identifying Signatures of Natural Selection in Tibetan and Andean Populations Using Dense Genome Scan Data. PLoS Genet 6: e1001116.

Bonhomme, M., C. Chevalet, B. Servin, S. Boitard, et al., 2010 Detecting selection in population trees: the Lewontin and Krakauer test extended. Genetics 186: 241-262.

Coop, G., D. Witonsky, A. Di Rienzo, and J. K. Pritchard, 2010 Using environmental correlations to identify loci underlying local adaptation. Genetics 185: 1411-1423.

Excoffier, L., T. Hofer, and M. Foll, 2009 Detecting loci under selection in a hierarchically structured population. Heredity (Edinb) 103: 285-298.

Excoffier, L., and H. E. Lischer, 2010 Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10: 564-567.

Fariello, M. I., S. Boitard, H. Naya, M. SanCristobal, and B. Servin, 2013 Detecting signatures of selection through haplotype differentiation among hierarchically structured populations. Genetics 193: 929-941.

Foll, M., and O. Gaggiotti, 2008 A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180: 977-993.

Günther, T., and G. Coop, 2013 Robust identification of local adaptation from allele frequencies. Genetics 195: 205-220.

Hermisson, J., 2009 Who believes in whole-genome scans for selection? Heredity (Edinb) 103: 283-284.

Huerta-Sánchez, E., M. DeGiorgio, L. Pagani, A. Tarekegn, et al., 2013 Genetic signatures reveal high-altitude adaptation in a set of Ethiopian populations. Mol Biol Evol 30: 1877-1888.

Lewontin, R. C., and J. Krakauer, 1973 Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74: 175-195.

Nosil, P., S. P. Egan, and D. J. Funk, 2008 Heterogeneous genomic differentiation between walking-stick ecotypes: “isolation by adaptation” and multiple roles for divergent selection. Evolution 62: 316-336.

Pagani, L., Q. Ayub, D. G. MacArthur, Y. Xue, et al., 2011 High altitude adaptation in Daghestani populations from the Caucasus. Hum Genet 131: 423-433.

Paris, M., S. Boyer, A. Bonin, A. Collado, et al., 2010 Genome scan in the mosquito Aedes rusticus: population structure and detection of positive selection after insecticide treatment. Mol Ecol 19: 325-337.

Scheinfeldt, L. B., S. Soi, S. Thompson, A. Ranciaro, et al., 2012 Genetic adaptation to high altitude in the Ethiopian highlands. Genome Biol 13: R1.

Vitalis, R., K. Dawson, and P. Boursot, 2001 Interpretation of variation across marker loci as evidence of selection. Genetics 158: 1811-1823.