A network approach to analyzing highly recombinant malaria parasite genes

Daniel B. Larremore, Aaron Clauset, Caroline O. Buckee
(Submitted on 23 Aug 2013)

The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences.
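The network-construction step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the threshold `k` (standing in for the "significant length") and the toy sequences are hypothetical, the input is assumed to be already restricted to a single HVR, and community detection with a stochastic block model is not shown. The sketch relies on the fact that two sequences share an exact match of length at least `k` exactly when they share some substring of length `k` (a k-mer).

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_match_network(seqs: dict[str, str], k: int) -> set[tuple[str, str]]:
    """Edges between sequences that share an exact match of length >= k.

    A common substring of length >= k always contains a common k-mer,
    so testing k-mer overlap is equivalent to testing for such a match.
    """
    kmer_sets = {name: kmers(s, k) for name, s in seqs.items()}
    edges = set()
    for a, b in combinations(sorted(seqs), 2):
        if kmer_sets[a] & kmer_sets[b]:  # non-empty intersection -> link a and b
            edges.add((a, b))
    return edges


# Hypothetical toy HVR sequences (one-letter amino-acid codes).
toy = {"s1": "ACDEFGHIK", "s2": "XXDEFGHYY", "s3": "MNPQRSTVW"}
print(shared_match_network(toy, k=5))  # {('s1', 's2')}
```

Comparing every pair is quadratic in the number of sequences; for large sets, an index from each k-mer to the sequences containing it avoids examining pairs that share nothing.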

Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model

Denise Kühnert, Tanja Stadler, Timothy G. Vaughan, Alexei J. Drummond
(Submitted on 23 Aug 2013)

Evolution of RNA viruses such as HIV, hepatitis C virus and influenza virus occurs so rapidly that the viruses’ genomes contain information on past ecological dynamics. The interaction of ecological and evolutionary processes demands their joint analysis. Here we adapt a birth-death-sampling model, which allows for serially sampled data and rate changes over time, to estimate epidemiological parameters of the underlying population dynamics in terms of a compartmental susceptible-infected-removed (SIR) model. The resulting phylodynamic method enables the joint estimation of epidemiological parameters and phylogenetic history. In contrast to standard coalescent-based approaches, this method provides separate information on the incidence and prevalence of infections. Detailed information on the interaction of host population dynamics and evolutionary history can inform decisions on how to contain or entirely avoid disease outbreaks.
We apply our Birth-Death SIR method (BDSIR) to five human immunodeficiency virus type 1 clusters sampled in the United Kingdom (UK) between 1999 and 2003. The estimated basic reproduction ratio ranges from 1.9 to 3.2 among the clusters. Our results imply that these local epidemics arose from the introduction of infected individuals into populations of between 900 and 3000 effectively susceptible individuals, albeit with wide margins of uncertainty. All clusters show a decline in the growth rate of the local epidemic in the mid or late 1990s. The effective reproduction ratio of cluster 1 drops below one around 1994, with the local epidemic having almost run its course by the end of the sampled period. For the other four clusters the effective reproduction ratio also decreases over time, but stays above one. The method is implemented as a BEAST2 package.
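To make the relationship between the basic and effective reproduction ratios concrete, here is a deterministic SIR sketch in Python. It is not BDSIR, which is a stochastic birth-death-sampling model fitted to phylogenies in BEAST2; it only shows how, with a mass-action transmission rate `beta` and removal rate `gamma`, the effective reproduction ratio R_e(t) = R0 · S(t)/S(0) declines as susceptibles are depleted, and how prevalence peaks when R_e crosses one. All parameter values are hypothetical and are not estimates for the UK clusters.

```python
def sir_effective_r(R0: float, S0: float, I0: float, gamma: float,
                    t_max: float, dt: float = 0.1):
    """Euler integration of a deterministic SIR model.

    Returns a list of (t, S, I, R_e) tuples, where R_e = R0 * S / S0.
    """
    beta = R0 * gamma / S0  # chosen so that R_e at time 0 equals R0
    S, I = float(S0), float(I0)
    out = []
    for step in range(int(round(t_max / dt)) + 1):
        out.append((step * dt, S, I, R0 * S / S0))
        new_infections = beta * S * I * dt
        removals = gamma * I * dt
        S -= new_infections
        I += new_infections - removals
    return out


# Hypothetical values: R0 = 2, about 1000 susceptibles,
# mean infectious period of 10 (arbitrary) time units.
traj = sir_effective_r(R0=2.0, S0=999, I0=1, gamma=0.1, t_max=400)
t_cross = next(t for t, _, _, r_e in traj if r_e < 1)
print(f"R_e falls below 1 at t = {t_cross:.1f}; final R_e = {traj[-1][3]:.2f}")
```

Because each step changes I by (beta·S - gamma)·I·dt, infections grow exactly while R_e > 1, which is why a local epidemic can keep expanding (R_e above one) even while its growth rate declines.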

Our paper: Inferring HIV escape rates from multi-locus genotype data

This guest post is by Richard Neher on his paper with Taylor Kessinger and Alan Perelson: Kessinger et al. Inferring HIV escape rates from multi-locus genotype data. arXived here.
This is cross posted from the Neher lab website.

We have a new preprint on the arXiv (here on Haldane’s sieve). This work is the result of a collaboration between us and Alan Perelson (LANL), and explores methods to estimate parameters of the HIV-immune system interaction from time-resolved sequence data. The focus of this paper is on early infection, which is dominated by a few rapid substitutions that fix because they prevent or reduce recognition of infected cells by the immune system via cytotoxic T-lymphocytes (CTLs). CTL escape is one of the fastest instances of evolution I have come across: 4-6 mutations spread within a few weeks. It happens in most HIV infections and is partly predictable from the HLA genotype of the infected person. These substitutions are so rapid that clonal interference has to be modeled. Our method fits a reduced model of clonal interference to the typically very sparse data and thereby estimates the selection coefficients, aka escape rates.
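For intuition about what an escape rate is, consider the simplest single-locus case. If an escape mutation with selection coefficient ε sweeps deterministically from initial frequency x0, its frequency follows x(t) = x0·e^(εt) / (1 - x0 + x0·e^(εt)), so logit x(t) = logit x0 + εt is a straight line with slope ε. The Python sketch below fits that slope by least squares. It is the single-locus baseline, not the multi-locus clonal-interference model of the paper, and it assumes every observed frequency lies strictly between 0 and 1, which sparse real data often violate (a pseudocount would then be needed). The escape rate and time points are hypothetical.

```python
import math


def logit(x: float) -> float:
    return math.log(x / (1.0 - x))


def single_locus_escape_rate(times: list[float], freqs: list[float]) -> float:
    """Least-squares slope of logit(frequency) against time.

    Assumes 0 < f < 1 for every observation.
    """
    ys = [logit(f) for f in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var


# Synthetic, noise-free sweep with a hypothetical escape rate of 0.3 per day.
eps_true, x0 = 0.3, 0.01
days = [0, 5, 10, 15, 20]
freqs = [x0 * math.exp(eps_true * t) / (1 - x0 + x0 * math.exp(eps_true * t))
         for t in days]
print(round(single_locus_escape_rate(days, freqs), 6))  # 0.3
```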

Why do we want to know these numbers?
The number of viruses in the blood of an infected person peaks 2-3 weeks after infection and thereafter drops by 2-3 orders of magnitude. This drop is partly due to a response by the adaptive immune system. However, it has proved difficult to attribute this drop to specific parts of the immune response. The rate at which a given mutation sweeps through the population tells us about the pressure exerted by the T-cell clones that target the epitope containing this mutation.

How do we do it?
Early in infection, the viral population is large and selection is strong. Under these conditions, recombination is of minor importance, since most double, triple (and higher-order) mutants are produced more efficiently by recurrent mutation than by recombination. This implies that mutations accumulate sequentially, each arising on a background on which all previous mutations are already present. The time at which a novel mutation arises is therefore tightly constrained by the trajectory of the preceding genotype. These constraints partially regularize the fitting problem, making multi-locus fitting more robust than single-locus fitting.
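The sequential-accumulation picture can be illustrated with a deterministic Python sketch. Under the conditions described here (no recombination, strong selection), genotype k carries escape mutations 1..k, grows with advantage ε1 + ... + εk over the wild type, and is produced from genotype k-1 by mutation at rate μ, so its appearance is tied to the trajectory of its parent. The escape rates, mutation rate and simplifications (no back mutation, no loss term for the parent genotype, and no stochastic establishment, which the paper does model) are hypothetical choices for illustration only.

```python
def sequential_escape(eps: list[float], mu: float, t_max: float,
                      dt: float = 0.01) -> list[list[float]]:
    """Frequencies of escape mutations 1..K over time (Euler integration).

    Genotype k (k = 0..K) carries mutations 1..k; its growth advantage is
    eps[0] + ... + eps[k-1] relative to the wild type (k = 0).
    """
    K = len(eps)
    fitness = [0.0]
    for e in eps:
        fitness.append(fitness[-1] + e)
    w = [1.0] + [0.0] * K  # start with wild-type virus only
    traj = []
    for _ in range(int(round(t_max / dt)) + 1):
        total = sum(w)
        x = [wk / total for wk in w]
        # Mutation i is present in every genotype k >= i (nested backgrounds).
        traj.append([sum(x[i:]) for i in range(1, K + 1)])
        # Growth of each genotype plus mutational influx from genotype k - 1.
        w = [w[k] + dt * (fitness[k] * w[k] + (mu * w[k - 1] if k > 0 else 0.0))
             for k in range(K + 1)]
        total = sum(w)
        w = [wk / total for wk in w]  # rescale to avoid overflow (linear system)
    return traj


dt = 0.01
traj = sequential_escape(eps=[0.5, 0.3], mu=1e-5, t_max=100, dt=dt)
# Time at which each escape mutation first reaches 50% frequency.
half = [next(i for i, f in enumerate(traj) if f[j] >= 0.5) * dt for j in range(2)]
print(half)
```

Because later mutations can only arise on the background of earlier ones, the frequency of mutation 2 never exceeds that of mutation 1, which is the kind of constraint that regularizes multi-locus fitting.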

What do we learn about evolution in general?
In addition to the intrinsic interest in the HIV/CTL interaction, CTL escape is an ideal setting to study rapidly evolving populations. This evolution happens in its “natural” habitat, and both the selective pressure and the functional consequences of the observed molecular changes can be quantified via immunological data, protein structure, and replication assays. In addition, we have ample cross-sectional data (HIV sequences from many different patients) that allow us to look at the prevalence of escape mutations and potential compensatory mutations. None of this is done in this paper, but HIV/immune-system coevolution is a fascinating showcase of rapid evolution.

Inferring HIV escape rates from multi-locus genotype data

Taylor A. Kessinger, Alan S. Perelson, Richard A. Neher
(Submitted on 6 Aug 2013)

Cytotoxic T-lymphocytes (CTLs) recognize viral protein fragments displayed by major histocompatibility complex (MHC) molecules on the surface of virally infected cells and generate an anti-viral response that can kill the infected cells. Virus variants whose protein fragments are not efficiently presented on infected cells or whose fragments are presented but not recognized by CTLs therefore have a competitive advantage and spread rapidly through the population. We present a method that allows a more robust estimation of these escape rates from serially sampled sequence data. The proposed method accounts for competition between multiple escapes by explicitly modeling the accumulation of escape mutations and the stochastic effects of rare multiple mutants. Applying our method to serially sampled HIV sequence data, we estimate rates of HIV escape that are substantially larger than those previously reported. The method can be extended to complex escapes that require compensatory mutations. We expect our method to be applicable in other contexts such as cancer evolution where time series data is also available.

The genome of the medieval Black Death agent (extended abstract)

Ashok Rajaraman, Eric Tannier, Cedric Chauve
(Submitted on 29 Jul 2013)

The genome of a 650-year-old Yersinia pestis bacterium, responsible for the medieval Black Death, was recently sequenced and assembled into 2,105 contigs from the main chromosome. According to the point mutation record, the medieval bacterium could be an ancestor of most extant Yersinia pestis strains, which opens the way to reconstructing the organization of these contigs using a comparative approach. We show that recent computational paleogenomics methods, which aim to reconstruct the organization of ancestral genomes from the comparison of extant genomes, can be used to correct, order and complete the contig set of the Black Death agent genome, providing a full chromosome sequence, at the nucleotide scale, of this ancient bacterium. This sequence suggests that a burst of mobile element insertions predated the Black Death, leading to exceptional genome plasticity and an increased rearrangement rate.
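Once adjacencies between contig extremities have been inferred, turning them into an ordered, oriented sequence of contigs is a simple graph walk. The Python sketch below is a toy illustration of that final step only, not the authors' paleogenomics method: it assumes each contig extremity ("h" for head, "t" for tail) takes part in at most one adjacency, and it handles both linear chains and circular ones (bacterial chromosomes such as that of Y. pestis are circular). The contig names and adjacencies are hypothetical.

```python
def other_end(ext: tuple[str, str]) -> tuple[str, str]:
    """The opposite extremity of the same contig ("h" <-> "t")."""
    name, end = ext
    return (name, "t" if end == "h" else "h")


def order_contigs(contigs, adjacencies):
    """Chain contigs into ordered, oriented scaffolds.

    adjacencies: pairs of extremities, e.g. (("A", "t"), ("B", "h")) means
    the tail of A is joined to the head of B. Returns lists such as
    ["A+", "B+", "C-"], where "-" marks a contig read in reverse.
    """
    partner = {}
    for a, b in adjacencies:
        if a == b or a in partner or b in partner:
            raise ValueError(f"conflicting adjacency {a}-{b}")
        partner[a] = b
        partner[b] = a

    placed = set()
    chains = []
    for name in contigs:
        if name in placed:
            continue
        # Walk leftwards from this contig's head to find the start of its chain.
        start = (name, "h")
        while start in partner:
            candidate = other_end(partner[start])
            if candidate[0] == name:  # came back round: the chain is circular
                start = (name, "h")
                break
            start = candidate
        # Walk rightwards, entering each contig through one extremity.
        chain = []
        left = start
        while True:
            cname, end = left
            chain.append(cname + ("+" if end == "h" else "-"))
            placed.add(cname)
            right = other_end(left)
            if right not in partner or partner[right][0] in placed:
                break  # linear end reached, or circle closed
            left = partner[right]
        chains.append(chain)
    return chains


# Hypothetical contigs and inferred adjacencies.
adj = [(("A", "t"), ("B", "h")), (("B", "t"), ("C", "t"))]
print(order_contigs(["A", "B", "C"], adj))  # [['A+', 'B+', 'C-']]
```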

Speed of adaptation and genomic signatures in arms race and trench warfare models of host-parasite coevolution

Aurelien Tellier, Stefany Moreno-Gámez, Wolfgang Stephan
(Submitted on 25 Jul 2013)

Host and parasite population genomic data are increasingly used to discover novel major genes underlying coevolution, assuming that natural selection generates two distinguishable polymorphism patterns: selective sweeps and balancing selection. These genomic signatures would result from two coevolutionary dynamics: trench warfare, with fast cycles of allele frequencies, and the arms race, with slow recurrent fixation of alleles. However, few genes for coevolution have yet been found in hosts using genome scans for selection. To address this issue, we build a gene-for-gene model with genetic drift and mutation, and integrate coalescent simulations to study observable genomic signatures at host and parasite loci. Contrary to conventional wisdom, we show that coevolutionary cycles are not faster under the trench warfare model than under the arms race, except for large population sizes and high coevolutionary costs. Based on the generated SNP frequencies, the expected balancing selection signature under trench warfare dynamics appears to be observable only in parasite sequences, in a limited range of parameters: if effective population sizes are sufficiently large (>1000) and if selection has been acting for a long time (>4N generations). On the other hand, the typical signature of arms race dynamics, i.e. selective sweeps, can be detected in parasite and, to a lesser extent, in host populations even if coevolution is recent. We suggest studying signatures of coevolution via population genomics of parasites rather than hosts, and caution against inferring coevolutionary dynamics based on the speed of coevolution.
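Sweep and balancing-selection signatures are typically read off the frequencies of segregating sites. One widely used summary is Tajima's D, which compares the mean pairwise diversity π with the Watterson estimate S/a1: an excess of rare variants (as after a selective sweep) gives D < 0, and an excess of intermediate-frequency variants (as under balancing selection) gives D > 0. The abstract does not say which statistics the study used, so this Python sketch of the standard formula is illustrative only, and the toy alignments are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for equal-length aligned sequences.

    Requires n >= 4 sequences (for n <= 3 the variance term is zero)
    and at least one segregating site.
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if S == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


sweep_like = ["AAA", "TAA", "ATA", "AAT"]     # every variant is a singleton
balanced_like = ["AAA", "AAA", "TTT", "TTT"]  # variants at intermediate frequency
print(tajimas_d(sweep_like), tajimas_d(balanced_like))
```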

Dynamic Transcript Profiling of Candida albicans Infection in Zebrafish: a Pathogen-Host Interaction Study

Yan Yu Chen, Chun-Cheih Chao, Fu-Chen Liu, Po-Chen Hsu, Hsueh-Fen Chen, Shih-Chi Peng, Yung-Jen Chuang, Chung-Yu Lan, Wen-Ping Hsieh, David Shan Hill Wong
(Submitted on 14 Jun 2013)

Candida albicans is responsible for a number of life-threatening infections and causes considerable morbidity and mortality in immunocompromised patients. Previous studies of C. albicans pathogenesis have suggested that several steps must occur before virulent infection, including early adhesion, invasion, and late tissue damage. However, the mechanism that triggers the transformation of C. albicans from the yeast to the hyphal form during infection has yet to be fully elucidated. This study used a systems biology approach to investigate C. albicans infection in zebrafish. The surviving fish were sampled at different post-infection time points to obtain time-lapsed, genome-wide transcriptomic data from both organisms, accompanied by synchronized histological analyses. Principal component analysis (PCA) was used to analyze significant variations in the dynamic gene expression profiles of both C. albicans and zebrafish. The results categorized C. albicans infection into three progressive phases: adhesion, invasion, and damage. These findings were strongly supported by the corresponding histological analysis. Furthermore, the dynamic interspecies transcript profiling revealed that C. albicans activated its filament formation during invasion and its iron-scavenging functions during the damage phase, whereas zebrafish ceased its iron homeostasis function following massive hemorrhage during the later stages of infection. Most of the immune-related genes were expressed as the infection progressed from invasion to the damage phase. Such global, inter-species evidence of virulence-immune and iron competition dynamics during C. albicans infection could be crucial to understanding and controlling fungal pathogenesis.
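The PCA step can be sketched without external libraries: center a time-points × genes expression matrix, form the gene-by-gene covariance matrix, and extract the leading principal component by power iteration; projecting each time point onto it then separates samples driven by different expression programs. This is a textbook PCA sketch, not the study's pipeline, and the normalization, gene filtering and toy expression values are hypothetical. For transcriptome-scale data a library SVD would be used rather than a dense covariance matrix.

```python
import math
import random


def first_principal_component(rows: list[list[float]], iters: int = 500,
                              seed: int = 0) -> tuple[list[float], list[float]]:
    """Leading PC loadings and per-sample scores via power iteration.

    rows: one list of expression values per sample (time point).
    The overall sign of a principal component is arbitrary.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]  # centered data
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1)
            for b in range(p)] for a in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # random positive start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    scores = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores


# Hypothetical expression of three genes at four time points:
# two "early" samples and two "late" samples with a different program.
expr = [[1.0, 0.0, 5.0], [1.1, 0.1, 5.2], [5.0, 4.0, 1.0], [5.2, 4.1, 0.9]]
loadings, scores = first_principal_component(expr)
print([round(s, 2) for s in scores])
```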