Extensive translation of small ORFs revealed by polysomal ribo-Seq

Julie L Aspden, Ying Chen Eyre-Walker, Rose J. Phillips, Michele Brocard, Unum Amin, Juan Couso

Thousands of small Open Reading Frames (smORFs) encoding small peptides of fewer than 100 amino acids exist in our genomes. Examples of functional smORFs have been characterised in a few species but the actual number of translated smORFs, and their molecular, functional and evolutionary features are not known. Here we present a genome-wide assessment of smORF translation by ribosomal profiling of polysomal fractions. This ‘polysomal ribo-Seq’ suggests that smORFs are translated at the same level and in the same relative numbers (80%) as normal proteins. The smORF peptides appear widely conserved, show activity in cells, and display a putative amino acid signature. These findings reinforce the idea that smORFs are an abundant and fundamental genome component, displaying features usually attributed to canonical proteins, including high translation levels, biological function, amino acid sequence specificity and cross-species conservation.

Author post: Hierarchical Bayesian model of population structure reveals convergent adaptation to high altitude in human populations

This guest post is by Matthieu Foll and Laurent Excoffier on their preprint (with co-authors) Hierarchical Bayesian model of population structure reveals convergent adaptation to high altitude in human populations, arXived here.

Background

Since the seminal paper of Lewontin and Krakauer (1973), Fst-based genome scan methods have had to struggle with the confounding effect of population structure. These methods became very popular with the FDIST software implemented by Beaumont and Nichols (1996), which was based on an island model. At that time it was proposed that the island model was robust to different demographic scenarios (recent divergence and growth, isolation by distance or heterogeneous levels of gene flow between populations). A Bayesian version of this model (generally called the F-model), in which populations can receive unequal numbers of migrants, was then proposed (Beaumont and Balding 2004; Foll and Gaggiotti 2008) and implemented in the BayeScan software (http://cmpg.unibe.ch/software/BayeScan/), which is now quite widely used.

However, all these models assume that migrant genes originate from a unique and common migrant pool. We started to realize that this assumption could lead to a large number of false positives when we tried to analyze the HGDP data, where this assumption was clearly not supported. To overcome this problem, we proposed an extension of Beaumont and Nichols’s (1996) method based on a hierarchical island model (Excoffier et al. 2009) in which populations were assigned to different groups or regions. An island model was assumed in each group, and the groups themselves were assumed to follow an island model. This new method was then implemented in Arlequin (Excoffier and Lischer 2010).

Note that alternative ways to deal with complex genetic structure have also been proposed (Coop et al. 2010; Bonhomme et al. 2010; Fariello et al. 2013; Günther and Coop 2013), but the main message people took from these papers was that methods aiming at identifying loci under selection can be quite sensitive to hidden (or unaccounted-for) population structure and should be used with caution. Hermisson (2009) even rather provocatively asked: “Who believes in whole-genome scans for selection?”

One radical way to deal with the problem of complex genetic structure is to reduce the number of sampled populations to just two (Vitalis et al. 2001). This leads to a GWAS-like strategy where people sample two populations living in contrasting environments (playing the role of cases and controls in GWAS) with potentially different selection pressures. However, other problems occur when doing so: (i) having only two populations leads to a reduction in power, (ii) related to this first point, one generally needs to sample a larger number of individuals to have sufficient power, (iii) the comparison of results obtained from different pairs of populations can be problematic, especially when one is interested in detecting convergent selection by looking at the overlap in lists of candidate genes. In the last few years, studies comparing pairs of populations living in different environments have accumulated. Typically, each pairwise comparison produced a set of candidate loci, and people often used some informal criterion to identify “repeated outliers” based on the number of times they were identified in the different tests performed (see Nosil et al. 2008 or Paris et al. 2010 for example).
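The informal "repeated outliers" criterion described above can be sketched in a few lines. This is only an illustration: the locus names, the three pairwise comparisons and the threshold of two hits are all hypothetical, and the cited studies each used their own criteria.

```python
from collections import Counter

# Hypothetical outlier lists, one set of candidate loci per pairwise comparison.
pairwise_outliers = {
    "pair_A": {"locus_03", "locus_17", "locus_42"},
    "pair_B": {"locus_17", "locus_42", "locus_58"},
    "pair_C": {"locus_42", "locus_77"},
}

def repeated_outliers(outliers_by_pair, min_hits=2):
    """Return loci flagged as outliers in at least `min_hits` pairwise tests."""
    counts = Counter(
        locus for loci in outliers_by_pair.values() for locus in loci
    )
    return {locus: n for locus, n in counts.items() if n >= min_hits}

repeated = repeated_outliers(pairwise_outliers)
```

Note that the choice of `min_hits` is arbitrary and carries no statistical meaning, which is part of the problem with comparing separate pairwise tests.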

A new hierarchical F-model

We started to think that the introduction of a Bayesian F-model dealing with a hierarchical population structure could solve some of these problems. We therefore introduced a hierarchical F-model where populations are assigned to different groups. In each group the genetic structure is modeled with a classical F-model, and the groups themselves are modeled with a higher-level F-model. One advantage is that the Beaumont and Balding (2004) decomposition of Fst as population- and locus-specific effects can be done in each group separately as well as between groups. This allows the identification of selection at different levels: within a specific group of populations, or at a higher level (among groups). Here again, an interesting question is to identify loci responding similarly to selection in several groups. In order to look at that particular case, we explicitly included a convergent selection model where at any given locus all groups share the same locus-specific effect. Posterior probabilities of all possible models of selection are then evaluated using a Reversible Jump MCMC algorithm.
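To make the decomposition concrete: in the Beaumont and Balding (2004) F-model, Fst for a given locus and population is written on the logit scale as the sum of a locus-specific effect (alpha) and a population-specific effect (beta). The sketch below applies that decomposition within each of two groups and compares a neutral locus, a locus selected in one group only, and a convergent locus sharing the same alpha in both groups. All numerical values and labels are hypothetical, and the sketch leaves out the between-group level and the Reversible Jump MCMC entirely.

```python
import math

def fst(alpha, beta):
    """Inverse logit of (alpha + beta): the F-model decomposition of Fst."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta)))

# Hypothetical population-specific effects (beta), two groups of two populations.
BETA = {
    "America": {"highland": -2.0, "lowland": -2.5},
    "Asia": {"highland": -2.2, "lowland": -2.4},
}

def locus_fst(alpha_by_group):
    """Fst per population for one locus, given a locus effect for each group."""
    return {
        group: {pop: fst(alpha_by_group[group], b) for pop, b in pops.items()}
        for group, pops in BETA.items()
    }

# alpha = 0 everywhere: a neutral locus, Fst driven by population effects only.
neutral = locus_fst({"America": 0.0, "Asia": 0.0})
# A positive alpha in one group only: group-specific selection.
group_specific = locus_fst({"America": 1.5, "Asia": 0.0})
# The same positive alpha shared by both groups: convergent selection.
convergent = locus_fst({"America": 1.5, "Asia": 1.5})
```

A positive alpha raises Fst above the neutral expectation for the populations of the group it applies to; under the convergent model a single alpha does this in every group at once.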

Adaptation to high altitude

We applied this new method to the very interesting case of high altitude adaptation in humans. We reanalyzed a published large SNP dataset (Bigham et al. 2010) including two populations living at high altitude in the Andes and in Tibet, as well as two related lowland populations from Central America and East Asia. One of the most striking results we find is that convergent selection is much more common than previously found based on separate analyses in the two continents. We checked with simulations that this was in fact expected: being able to analyze the four populations together is indeed more powerful than performing two separate pairwise tests. In addition to confirming several known candidate genes and biological processes involved in high altitude adaptation, we were able to identify additional new genes and processes under convergent selection. In particular, we were very excited to find two specific biological pathways that could have evolved to counter the toxic levels of fatty acids and the neuronal excitotoxicity induced by hypoxia in both continents. Interestingly, several genes included in these pathways had been identified in high altitude Ethiopians (Scheinfeldt et al. 2012; Alkorta-Aranburu et al. 2012; Huerta-Sánchez et al. 2013), suggesting that these pathways could represent a striking example of convergent adaptation in three continents.

Conclusion

Our hierarchical F-model appears very flexible and can cope with a variety of sampling strategies to identify adaptation. Whereas we have considered only two groups of two populations in our paper, it is worth noting that our method can handle more than two groups and more than two populations per group. An alternative sampling scheme to detect selection could for instance be to contrast several genetically related high altitude populations with several related lowland populations (see e.g. Pagani et al. 2011). Our method could also deal with such a sampling scheme, but this time, one would focus on the decomposition of the genetic differentiation between the groups (i.e. Fct). In summary, our approach allows the simultaneous analysis of populations living in contrasting environments in several geographic regions. It can be used to specifically test for convergent adaptation, and this approach is more powerful than previous methods contrasting pairs of populations separately.

Matthieu Foll and Laurent Excoffier

References

Alkorta-Aranburu, G., C. M. Beall, D. B. Witonsky, A. Gebremedhin, et al., 2012 The genetic architecture of adaptations to high altitude in Ethiopia. PLoS Genet 8: e1003110.

Beaumont, M. A., and R. A. Nichols, 1996 Evaluating Loci for Use in the Genetic Analysis of Population Structure. Proc Biol Sci 263: 1619-1626.

Beaumont, M. A., and D. J. Balding, 2004 Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol 13: 969-980.

Bigham, A., M. Bauchet, D. Pinto, X. Y. Mao, et al., 2010 Identifying Signatures of Natural Selection in Tibetan and Andean Populations Using Dense Genome Scan Data. PLoS Genet 6: e1001116.

Bonhomme, M., C. Chevalet, B. Servin, S. Boitard, et al., 2010 Detecting selection in population trees: the Lewontin and Krakauer test extended. Genetics 186: 241-262.

Coop, G., D. Witonsky, A. Di Rienzo, and J. K. Pritchard, 2010 Using environmental correlations to identify loci underlying local adaptation. Genetics 185: 1411-1423.

Excoffier, L., T. Hofer, and M. Foll, 2009 Detecting loci under selection in a hierarchically structured population. Heredity (Edinb) 103: 285-298.

Excoffier, L., and H. E. Lischer, 2010 Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10: 564-567.

Fariello, M. I., S. Boitard, H. Naya, M. SanCristobal, and B. Servin, 2013 Detecting signatures of selection through haplotype differentiation among hierarchically structured populations. Genetics 193: 929-941.

Foll, M., and O. Gaggiotti, 2008 A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective. Genetics 180: 977-993.

Günther, T., and G. Coop, 2013 Robust identification of local adaptation from allele frequencies. Genetics 195: 205-220.

Hermisson, J., 2009 Who believes in whole-genome scans for selection? Heredity (Edinb) 103: 283-284.

Huerta-Sánchez, E., M. DeGiorgio, L. Pagani, A. Tarekegn, et al., 2013 Genetic signatures reveal high-altitude adaptation in a set of Ethiopian populations. Mol Biol Evol 30: 1877-1888.

Lewontin, R. C., and J. Krakauer, 1973 Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74: 175-195.

Nosil, P., S. P. Egan, and D. J. Funk, 2008 Heterogeneous genomic differentiation between walking-stick ecotypes: “isolation by adaptation” and multiple roles for divergent selection. Evolution 62: 316-336.

Pagani, L., Q. Ayub, D. G. Macarthur, Y. Xue, et al., 2011 High altitude adaptation in Daghestani populations from the Caucasus. Hum Genet 131: 423-433.

Paris, M., S. Boyer, A. Bonin, A. Collado, et al., 2010 Genome scan in the mosquito Aedes rusticus: population structure and detection of positive selection after insecticide treatment. Mol Ecol 19: 325-337.

Scheinfeldt, L. B., S. Soi, S. Thompson, A. Ranciaro, et al., 2012 Genetic adaptation to high altitude in the Ethiopian highlands. Genome Biol 13: R1.

Vitalis, R., K. Dawson, and P. Boursot, 2001 Interpretation of variation across marker loci as evidence of selection. Genetics 158: 1811-1823.

Estimating the evolution of human life history traits in age-structured populations

Ryan Baldini

I propose a method that estimates the selection response of all vital rates in an age-structured population. I assume that vital rates are determined by the additive genetic contributions of many loci. The method uses all relatedness information in the sample to inform its estimates of genetic parameters, via an MCMC Bayesian framework. One can use the results to estimate the selection response of any life history trait that is a function of the vital rates, including the age at first reproduction, total lifetime fertility, survival to adulthood, and others. This method closely ties the empirical analysis of life history evolution to dynamically complete models of natural selection, and therefore enjoys some theoretical advantages over other methods. I demonstrate the method on a simulated model of evolution with two age classes. Finally I discuss how the method can be extended to more complicated cases.

Mapping eQTL networks with mixed graphical Markov models

Inma Tur, Alberto Roverato, Robert Castelo
(Submitted on 19 Feb 2014 (v1), last revised 29 Oct 2014 (this version, v5))

Expression quantitative trait loci (eQTL) mapping constitutes a challenging problem due to, among other reasons, the high-dimensional multivariate nature of gene-expression traits. Next to the expression heterogeneity produced by confounding factors and other sources of unwanted variation, indirect effects spread throughout genes as a result of genetic, molecular and environmental perturbations. From a multivariate perspective one would like to adjust for the effect of all of these factors to end up with a network of direct associations connecting the path from genotype to phenotype. In this paper we approach this challenge with mixed graphical Markov models, higher-order conditional independences and q-order correlation graphs. These models show that additive genetic effects propagate through the network as a function of gene-gene correlations. Our estimation of the eQTL network underlying a well-studied yeast data set leads to a sparse structure with more direct genetic and regulatory associations that enable a straightforward comparison of the genetic control of gene expression across chromosomes. Interestingly, it also reveals that eQTLs explain most of the expression variability of network hub genes.

Migration and interaction in a contact zone: mtDNA variation among Bantu-speakers in southern Africa

Chiara Barbieri, Mário Vicente, Sandra Oliveira, Koen Bostoen, Jorge Rocha, Mark Stoneking, Brigitte Pakendorf

Bantu speech communities expanded over large parts of sub-Saharan Africa within the last 4000-5000 years, reaching different parts of southern Africa 1200-2000 years ago. The Bantu languages subdivide into several major branches, with languages belonging to the Eastern and Western Bantu branches spreading over large parts of Central, Eastern, and Southern Africa. There is still debate whether this linguistic divide is correlated with a genetic distinction between Eastern and Western Bantu speakers. During their expansion, Bantu speakers would have come into contact with diverse local populations, such as the Khoisan hunter-gatherers and pastoralists of southern Africa, with whom they may have intermarried. In this study, we analyze complete mtDNA genome sequences from over 900 Bantu-speaking individuals from Angola, Zambia, Namibia, and Botswana to investigate the demographic processes at play during the last stages of the Bantu expansion. Our results show that most of these Bantu-speaking populations are genetically very homogeneous, with no genetic division between speakers of Eastern and Western Bantu languages. Most of the mtDNA diversity in our dataset is due to different degrees of admixture with autochthonous populations. Only the pastoralist Himba and Herero stand out due to high frequencies of particular L3f and L3d lineages; the latter are also found in the neighboring Damara, who speak a Khoisan language and were foragers and small-stock herders. In contrast, the close cultural and linguistic relatives of the Herero and Himba, the Kuvale, are genetically similar to other Bantu-speakers. Nevertheless, as demonstrated by resampling tests, the genetic divergence of Herero, Himba, and Kuvale is compatible with a common shared ancestry with high levels of drift and differential female admixture with local pre-Bantu populations.

Hierarchical Bayesian model of population structure reveals convergent adaptation to high altitude in human populations

Matthieu Foll, Oscar E. Gaggiotti, Josephine T. Daub, Laurent Excoffier
(Submitted on 18 Feb 2014)

Detecting genes involved in local adaptation is challenging and of fundamental importance in evolutionary, quantitative, and medical genetics. To this aim, a standard strategy is to perform genome scans in populations of different origins and environments, looking for genomic regions of high differentiation. Because shared population history or population sub-structure may lead to an excess of false positives, analyses are often done on multiple pairs of populations, which leads to i) a global loss of power as compared to a global analysis, and ii) the need for multiple-testing corrections. In order to alleviate these problems, we introduce a new hierarchical Bayesian method to detect markers under selection that can deal with complex demographic histories, where sampled populations share part of their history. Simulations show that our approach is both more powerful and less prone to false positive loci than approaches based on separate analyses of pairs of populations or those ignoring existing complex structures. In addition, our method can identify selection occurring at different levels (i.e. population or region-specific adaptation), as well as convergent selection in different regions. We apply our approach to the analysis of a large SNP dataset from low- and high-altitude human populations from America and Asia. The simultaneous analysis of these two geographic areas allows us to identify several new candidate genome regions for altitudinal selection, and we show that convergent evolution among continents has been quite common. In addition to identifying several genes and biological processes involved in high altitude adaptation, we identify two specific biological pathways that could have evolved in both continents to counter toxic effects induced by hypoxia.

Tracing evolutionary links between species

Mike Steel
(Submitted on 16 Feb 2014)

The idea that all life on earth traces back to a common beginning dates back at least to Charles Darwin’s Origin of Species. Ever since, biologists have tried to piece together parts of this ‘tree of life’ based on what we can observe today: fossils, and the evolutionary signal that is present in the genomes and phenotypes of different organisms. Mathematics has played a key role in helping transform genetic data into phylogenetic (evolutionary) trees and networks. Here, I will explain some of the central concepts and basic results in phylogenetics, which benefit from several branches of mathematics, including combinatorics, probability and algebra.

Evolutionary rates for multivariate traits: the role of selection and genetic variation

William Pitchers, Jason B. Wolf, Tom Tregenza, John Hunt, Ian Dworkin

A fundamental question in evolutionary biology is the relative importance of selection and genetic architecture in determining evolutionary rates. Adaptive evolution can be described by the multivariate breeders’ equation (Δz = Gβ), which predicts evolutionary change for a suite of phenotypic traits (Δz) as a product of directional selection acting on them (β) and the genetic variance-covariance matrix for those traits (G). Despite being empirically challenging to estimate, there are enough published estimates of G and β to allow for synthesis of general patterns across species. We use published estimates to test the hypotheses that there are systematic differences in the rate of evolution among trait types, and that these differences are in part due to genetic architecture. We find evidence that sexually selected traits exhibit faster rates of evolution compared to life-history or morphological traits. This difference does not appear to be related to stronger selection on sexually selected traits. Using numerous proposed approaches to quantifying the shape, size and structure of G we examine how these parameters relate to one another, and how they vary among taxonomic and trait groupings. Despite considerable variation, they do not explain the observed differences in evolutionary rates.
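As a small worked instance of the multivariate breeders’ equation Δz = Gβ quoted above, the sketch below computes the predicted response for two traits. The entries of G and β are made-up numbers chosen for illustration, not estimates from the paper.

```python
def selection_response(G, beta):
    """Predicted change in trait means: delta_z = G @ beta (matrix-vector product)."""
    return [sum(g * b for g, b in zip(row, beta)) for row in G]

# Hypothetical genetic variance-covariance matrix for two traits.
G = [[1.0, 0.5],
     [0.5, 2.0]]

# Hypothetical selection gradients: direct selection on trait 1 only.
beta = [0.2, 0.0]

delta_z = selection_response(G, beta)
```

With these values the second trait is predicted to change even though it is not under direct selection, because of its genetic covariance with the first trait. This is how G can shape evolutionary rates independently of the strength of selection.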

A Novel Approach for Multi-Domain and Multi-Gene Family Identification Provides Insights into Evolutionary Dynamics of Disease Resistance Genes in Core Eudicot Plants

Johannes A. Hofberger, Beifei Zhou, Haibao Tang, Jonathan DG Jones, M. Eric Schranz

Recent advances in DNA sequencing techniques resulted in more than forty sequenced plant genomes representing a diverse set of taxa of agricultural, energy, medicinal and ecological importance. However, gene family curation is often only inferred from DNA sequence homology and lacks insights into evolutionary processes contributing to gene family dynamics. In a comparative genomics framework, we integrated multiple lines of evidence provided by gene synteny, sequence homology and protein-based Hidden Markov Modelling to extract homologous super-clusters composed of multi-domain resistance (R)-proteins of the NB-LRR type (for NUCLEOTIDE BINDING/LEUCINE-RICH REPEATS), that are involved in plant innate immunity. To assess the diversity of R-proteins within and between species, we screened twelve eudicot plant genomes including six major crops and found a total of 2,363 NB-LRR genes. Our curated R-proteins set shows a 50% average for tandem duplicates and a 22% fraction of gene copies retained from ancient polyploidy events (ohnologs). We provide evidence for strong positive selection acting on all identified genes and show significant differences in molecular evolution rates (Ka/Ks-ratio) among tandem- (mean=1.59), ohnolog (mean=1.36) and singleton (mean=1.22) R-gene duplicates. To foster the process of gene-edited plant breeding, we report species-specific presence/absence of all 140 NB-LRR genes present in the model plant Arabidopsis and describe four distinct clusters of NB-LRR “gatekeeper” loci sharing syntelogs across all analyzed genomes. In summary, we designed and implemented an easy-to-follow computational framework for super-gene family identification, and provide the most curated set of NB-LRR genes whose genetic versatility among twelve lineages can underpin crop improvement.

Cell specific eQTL analysis without sorting cells

Harm-Jan Westra, Danny Arends, Tõnu Esko, Marjolein J. Peters, Claudia Schurmann, Katharina Schramm, Johannes Kettunen, Hanieh Yaghootkar, Benjamin Fairfax, Anand Kumar Andiappan, Yang Li, Jingyuan Fu, Juha Karjalainen, Mathieu Platteel, Marijn Visschedijk, Rinse Weersma, Silva Kasela, Lili Milani, Liina Tserel, Pärt Peterson, Eva Reinmaa, Albert Hofman, André G. Uitterlinden, Fernando Rivadeneira, Georg Homuth, Astrid Petersmann, Roberto Lorbeer, Holger Prokisch, Thomas Meitinger, Christian Herder, Michael Roden, Harald Grallert, Samuli Ripatti, Markus Perola, Andrew R. Wood, David Melzer, Luigi Ferrucci, Andrew B. Singleton, Dena G. Hernandez, Julian C. Knight, Rossella Melchiotti, Bernett Lee, Michael Poidinger, Francesca Zolezzi, Anis Larbi, De Yun Wang, Leonard H. van den Berg, Jan H. Veldink, Olaf Rotzschke, Seiko Makino, Timothy Frayling, Veikko Salomaa, Konstantin Strauch, Uwe Völker, Joyce B.J. van Meurs, Andres Metspalu, Cisca Wijmenga, Ritsert C. Jansen, Lude Franke

Expression quantitative trait locus (eQTL) mapping on tissue, organ or whole organism data can detect associations that are generic across cell types. We describe a new method to focus upon specific cell types without first needing to sort cells. We applied the method to whole blood data from 5,683 samples and demonstrate that SNPs associated with Crohn’s disease preferentially affect gene expression within neutrophils.