Individual taste differences were first reported in the first half of the 20th century, but the primary reasons for these differences have remained uncertain. Much of the taste variation among different mammalian species can be explained by pseudogenization of taste receptors. In this study, by analyzing 14 ethnically diverse populations, we investigated whether the most recent disruptions of taste receptor genes segregate with their intact forms. Our results revealed an unprecedented prevalence of segregating loss-of-function (LoF) taste receptor variants, identifying one of the most pronounced cases of functional population diversity in the human genome. LoF variant frequency was considerably higher than the overall mutation rate, and many humans harbored varying numbers of critical mutations. In particular, molecular evolutionary rates of sour and bitter receptors were far higher in humans than those of sweet, salty, and umami receptors compared with other carnivorous mammals, although not all of the taste receptor genes were identified. Many LoF variants are population-specific, some of which arose even after the population differentiation, but not before divergence of modern and archaic (Neanderthal and Denisovan) humans. Based on these findings, we conclude that modern humans might have been losing their taste receptor genes because of high-frequency LoF taste receptor variants. Finally, I demonstrated genetic testing of taste receptors from a personal exome sequence.
Aditi Gupta, Christoph Adami
(Submitted on 12 Aug 2014)
The human immuno-deficiency virus sub-type 1 (HIV-1) is evolving to keep up with a changing fitness landscape, due to the various drugs introduced to stop the virus’s replication. As the virus adapts, the information the virus encodes about its environment must change, and this change is reflected in the amino-acid composition of proteins, as well as changes in viral RNAs, binding sites, and splice sites. Information can also be encoded in the interaction between residues in a single protein as well as across proteins, leading to a change in the epistatic patterns that can affect how the virus can change in the future. Measuring epistasis usually requires fitness measurements that are difficult to obtain in high-throughput. Here we show that epistasis can be inferred from the pair-wise information between residues, and study how epistasis and information have changed over the long-term. Using HIV-1 protease sequence data from public databases covering the years 1998-2006 (from both treated and untreated subjects), we show that drug treatment has increased the protease’s per-site entropies on average. At the same time, the sum of mutual entropies across all pairs of residues within the protease shows a significant increase over the years, indicating an increase in epistasis in response to treatment, a trend not seen within sequences from untreated subjects. Our findings suggest that information theory can be an important tool to study long-term trends in the evolution of macromolecules.
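The two quantities named in this abstract, per-site entropy and the sum of mutual entropies over all residue pairs, can be illustrated with a minimal sketch. It assumes an alignment of equal-length sequences, measures entropy in bits, and applies no finite-sample bias correction; the function names and the toy alignment are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(col_i, col_j):
    """I(i;j) = H(i) + H(j) - H(i,j) for two alignment columns."""
    joint = list(zip(col_i, col_j))  # paired symbols, one pair per sequence
    return column_entropy(col_i) + column_entropy(col_j) - column_entropy(joint)


def entropy_summary(sequences):
    """Return (mean per-site entropy, sum of mutual information over all site pairs)."""
    columns = list(zip(*sequences))  # transpose: one tuple per site
    per_site = [column_entropy(col) for col in columns]
    total_mi = sum(
        mutual_information(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    )
    return sum(per_site) / len(per_site), total_mi


# Toy alignment (made up): two perfectly co-varying sites.
toy_alignment = ["AC", "AC", "GT", "GT"]
mean_h, total_mi = entropy_summary(toy_alignment)
```

Comparing these two summaries between sequence sets from different years, or between treated and untreated subjects, would mirror the kind of trend the abstract describes, though real data would need care with sample sizes and alignment gaps.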
V genes in primates from whole genome shotgun data
David N Olivieri, Francisco Gambon-Deza
The adaptive immune system uses V genes for antigen recognition. The evolutionary diversification and selection processes within and across species and orders are poorly understood. Here, we studied the amino acid (AA) sequences obtained from translated in-frame V exons of immunoglobulins (IG) and T cell receptors (TR) from 16 primate species whose genomes have been sequenced. Multi-species comparative analysis supports the hypothesis that V genes in the IG loci undergo birth/death processes, thereby permitting rapid adaptability over evolutionary time. We also show that multiple cladistic groupings exist in the TRA (35 clades) and TRB (25 clades) V gene loci and that each primate species typically contributes at least one V gene to each of these clades. The results demonstrate that IG V genes and TR V genes have quite different evolutionary pathways; multiple duplications can explain the IG loci results, while co-evolutionary pressures can explain the phylogenetic results, as seen in genes of the TR loci. We describe how each of the 35 V gene clades of the TRA locus and 25 clades of the TRB locus must have specific and necessary roles for the viability of the species.
This guest post by Richard Neher discusses his preprint Predicting evolution from the shape of genealogical trees. Richard A. Neher, Colin A. Russell, Boris I. Shraiman. arXived here. This is cross-posted from the Neher lab website.
In this preprint, a collaboration with Colin Russell and Boris Shraiman, we show that it is possible to predict which individual from a population is most closely related to future populations. To this end, we have developed a method that uses the branching pattern of genealogical trees to estimate which part of the tree contains the “fittest” sequences, where fit means rapidly multiplying. Those that multiply rapidly are most likely to take over the population. We demonstrate the power of our method by predicting the evolution of seasonal influenza viruses.
How does it work?
Individuals adapt to a changing environment by accumulating beneficial mutations, while avoiding deleterious mutations. We model this process assuming that there are many such mutations which change fitness in small increments. Using this model, we calculate the probability that an individual that lived in the past at time t leaves n descendants in the present. This distribution depends critically on the fitness of the ancestral individual. We then extend this calculation to the probability of observing a certain branch in a genealogical tree reconstructed from a sample of sequences. A branch in a tree connects an individual A that lived at time tA and had fitness xA with an individual B that lived at a later time tB with fitness xB, as illustrated in the figure. B has descendants in the sample, otherwise the branch would not be part of the tree. Furthermore, all sampled descendants of A are also descendants of B, otherwise the connection between A and B would have branched between tA and tB. We call the mathematical object describing fitness evolution between A and B the “branch propagator” and denote it by g(xB,tB|xA,tA). The joint probability distribution of fitness values of all nodes of the tree is given by a product of branch propagators. We then calculate the expected fitness of each node and use it to rank the sampled sequences. The top ranked sequence is our prediction for the sequence of the progenitor of the future population.
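Written out, the product of branch propagators described above might look as follows. This is a sketch of the notation only: the index over branches, the normalization constant Z (which would absorb any prior on the root), and the integral form of the expected fitness are assumptions made for illustration, not formulas taken from the preprint.

```latex
% Joint distribution of node fitnesses as a product over branches A -> B
P(\{x_i\}) = \frac{1}{Z} \prod_{(A \to B)} g(x_B, t_B \mid x_A, t_A)

% Expected fitness of node i, used to rank the sampled sequences
\langle x_i \rangle = \int x_i \, P(\{x_j\}) \, \prod_j \mathrm{d}x_j
```

Because each factor couples only a parent and its child, a joint distribution of this form can in principle be marginalized by passing messages along the tree rather than integrating over all nodes at once.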
Why do we care?
Being able to predict evolution could have immediate applications. The best example is the seasonal influenza vaccine, which needs to be updated frequently to keep up with the evolving virus. Vaccine strains are chosen among sampled virus strains, and the more closely the chosen strain matches the future influenza virus population, the better the vaccine is going to be. Hence by predicting a likely progenitor of the future, our method could help to improve influenza vaccines. One of our predictions is shown in the figure, with the top ranked sequence marked by a black arrow. Influenza is not the only possible application. Since the algorithm only requires a reconstructed tree as input, it can be applied to other rapidly evolving pathogens or cancer cell populations. In addition to being useful, the ability to predict also implies that the model captures an essential aspect of evolutionary dynamics: influenza evolution depends to a substantial degree, enough to enable prediction, on the accumulation of small-effect mutations.
Comparison to other approaches
Given the importance of good influenza vaccines, there have been a number of previous efforts to anticipate influenza virus evolution, typically based on patterns of molecular evolution from historical data. Along these lines, Luksza and Lässig have recently presented an explicit fitness model for influenza virus evolution that rewards mutations at positions known to convey antigenic novelty and penalizes likely deleterious mutations (plus a few other things). Because it uses influenza-specific molecular signatures, this model is complementary to ours, which uses only the tree reconstructed from nucleotide sequences. Interestingly, the two models do more or less equally well, and combining different methods of prediction should result in more reliable results.
Sequence co-evolution gives 3D contacts and structures of protein complexes
Thomas A. Hopf, Charlotta P.I. Schärfe, João P.G.L.M. Rodrigues, Anna G. Green, Chris Sander, Alexandre M.J.J. Bonvin, Debora S. Marks
High-throughput experiments in bacteria and eukaryotic cells have identified tens of thousands of interactions between proteins. This genome-wide view of the protein interaction universe is coarse-grained, whilst fine-grained detail of macro-molecular interactions critically depends on lower throughput, labor-intensive experiments. Computational approaches using measures of residue co-evolution across proteins show promise, but have been limited to specific interactions. Here we present a new generalized method showing that patterns of evolutionary sequence changes across proteins reflect residues that are close in space, with sufficient accuracy to determine the three-dimensional structure of the protein complexes. We demonstrate that the inferred evolutionary coupling scores accurately predict inter-protein residue interactions and can distinguish between interacting and non-interacting proteins. To illustrate the utility of the method, we predict co-evolved contacts between 50 E. coli complexes (of unknown structure), including the unknown 3D interactions between subunits of ATP synthase and find results consistent with detailed experimental data. We expect that the method can be generalized to genome-wide interaction predictions at residue resolution.
This guest post is by Rebekah Rogers (@evolscientist) on her paper with coauthors “Tandem duplications and the limits of natural selection in Drosophila yakuba and Drosophila simulans” arXived here.
Tandem duplications are widely recognized as a source of genetic novelty. Duplication of gene sequences can result in adaptive evolution through the development of novel functions or specialization in subsets of ancestral functions when ‘spare parts’ are relieved of evolutionary constraints. Additionally, tandem duplications have the potential to create entirely novel gene structures through chimeric gene formation and recruitment of formerly non-coding sequence. Here, we survey the limits of standing variation for tandem duplications in natural populations of D. yakuba and D. simulans, estimate the upper bound of mutation rates, and explore their role in rapid evolution.
Tandem duplicates on the X chromosome in D. simulans show an excess of high frequency variants consistent with adaptive evolution through tandem duplication. Furthermore, we identify an overrepresentation of genes involved in rapidly evolving phenotypes such as chorion development and oogenesis, drug and toxin metabolism, chitin cuticle formation, chemosensory processes, lipases and endopeptidases expressed in male reproduction, as well as immune response to pathogens in both D. yakuba and D. simulans. The enrichment of such rapidly evolving functional classes points to a role for tandem duplicates in Red Queen dynamics and responses to strong selective pressures.
In spite of the observed concordance across functional classes, we observe few duplicated genes that are shared across species, indicating that parallel recruitment of tandem duplications is rare. The span of duplicates in the population is quite limited, and we estimate that less than 15% of the genome is represented among the tandem duplications segregating in the entire population for the species. Moreover, many duplicates are present at low frequency and will have difficulty escaping the forces of drift during selective sweeps. This very limited standing variation combined with low mutation rates for tandem duplications results in severe limitations in the substrate of genetic novelty that is available for adaptation.
Thus, the limits of standing variation and the rate of new mutations are expected to play a vital role in defining evolutionary trajectories and the ability of organisms to adapt in the event of gross environmental change. Given the limited substrate of genetic novelty, we expect that if adaptation is dependent upon gene duplications, suboptimal outcomes in adaptive walks will be common, long wait times will occur for new phenotypic changes, and many multicellular eukaryotes will display limited ability to adapt to rapidly changing environments.
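The span estimate mentioned in this post (the share of the genome represented among segregating tandem duplications) is, at its simplest, a coverage calculation over possibly overlapping intervals. The sketch below shows one way such a fraction could be computed. It assumes half-open [start, end) coordinates on a single sequence; the function name and the toy coordinates are hypothetical, and this is not the pipeline used in the paper.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by the union of half-open [start, end) intervals."""
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before opening a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy coordinates (made up): the first two duplications overlap.
toy_duplications = [(100, 300), (250, 400), (900, 1000)]
fraction = covered_fraction(toy_duplications, genome_length=10_000)
```

Merging before summing matters because duplications called in different sampled strains often overlap, and counting each one separately would overstate the span.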
All modern approaches to molecular phylogenetics require a quantitative model for how genes evolve. Unfortunately, existing evolutionary models do not realistically represent the site-heterogeneous selection that governs actual sequence change. Attempts to remedy this problem have involved augmenting these models with a burgeoning number of free parameters. Here I demonstrate an alternative: experimental determination of a parameter-free evolutionary model via mutagenesis, functional selection, and deep sequencing. Using this strategy, I create an evolutionary model for influenza nucleoprotein that describes the gene phylogeny far better than existing models with dozens or even hundreds of free parameters. High-throughput experimental strategies such as the one employed here provide fundamentally new information that has the potential to transform the sensitivity of phylogenetic analyses.