Inferring HIV escape rates from multi-locus genotype data

Taylor A. Kessinger, Alan S. Perelson, Richard A. Neher
(Submitted on 6 Aug 2013)

Cytotoxic T-lymphocytes (CTLs) recognize viral protein fragments displayed by major histocompatibility complex (MHC) molecules on the surface of virally infected cells and generate an anti-viral response that can kill the infected cells. Virus variants whose protein fragments are not efficiently presented on infected cells or whose fragments are presented but not recognized by CTLs therefore have a competitive advantage and spread rapidly through the population. We present a method that allows a more robust estimation of these escape rates from serially sampled sequence data. The proposed method accounts for competition between multiple escapes by explicitly modeling the accumulation of escape mutations and the stochastic effects of rare multiple mutants. Applying our method to serially sampled HIV sequence data, we estimate rates of HIV escape that are substantially larger than those previously reported. The method can be extended to complex escapes that require compensatory mutations. We expect our method to be applicable in other contexts such as cancer evolution where time series data is also available.
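
For orientation, the quantity being estimated can be illustrated with a single-locus logistic model, in which one escape variant grows at a constant rate. This is a minimal sketch under that simplifying assumption, not the multi-locus method the abstract describes; the function names and the numbers in the usage lines are hypothetical.

```python
import math


def logit(x):
    """Log-odds of a frequency x in (0, 1)."""
    return math.log(x / (1.0 - x))


def logistic_frequency(x0, rate, t):
    """Frequency at time t of a variant that starts at frequency x0 and
    grows logistically with constant escape rate `rate` (per unit time)."""
    growth = x0 * math.exp(rate * t)
    return growth / (1.0 - x0 + growth)


def escape_rate_two_points(x1, t1, x2, t2):
    """Single-locus estimate of the escape rate from two sampled
    frequencies: the slope of the log-odds between the two time points."""
    return (logit(x2) - logit(x1)) / (t2 - t1)


# Hypothetical example: a variant at 1% that escapes at rate 0.2 per day.
later_frequency = logistic_frequency(0.01, 0.2, 30.0)
estimated_rate = escape_rate_two_points(0.01, 0.0, later_frequency, 30.0)
```

Under this model the log-odds of the variant frequency grow linearly in time, so two samples suffice in principle. The sketch ignores competition between escapes at different epitopes, which is the effect the abstract says the proposed method accounts for.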

Macro-evolutionary models and coalescent point processes: The shape and probability of reconstructed phylogenies

Amaury Lambert, Tanja Stadler
(Submitted on 6 Aug 2013)

Forward-time models of diversification (i.e., speciation and extinction) produce phylogenetic trees that grow “vertically” as time goes by. Pruning the extinct lineages out of such trees leads to natural models for reconstructed trees (i.e., phylogenies of extant species). Alternatively, reconstructed trees can be modelled by coalescent point processes (CPP), where trees grow “horizontally” by the sequential addition of vertical edges. Each new edge starts at some random speciation time and ends at the present time; speciation times are drawn from the same distribution independently. CPP lead to extremely fast computation of tree likelihoods and simulation of reconstructed trees. Their topology always follows the uniform distribution on ranked tree shapes (URT). We characterize which forward-time models lead to URT reconstructed trees and among these, which lead to CPP reconstructed trees. We show that for any “asymmetric” diversification model in which speciation rates only depend on time and extinction rates only depend on time and on a non-heritable trait (e.g., age), the reconstructed tree is CPP, even if extant species are incompletely sampled. If rates additionally depend on the number of species, the reconstructed tree is (only) URT (but not CPP). We characterize the common distribution of speciation times in the CPP description, and discuss incomplete species sampling as well as three special model cases in detail: 1) extinction rate does not depend on a trait; 2) rates do not depend on time; 3) mass extinctions may happen additionally at certain points in the past.
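
The "horizontal" construction can be sketched in a few lines: draw one node depth per pair of adjacent tips, independently from a common distribution, and read off the divergence time of any two tips as the largest depth lying between them. The sketch below conditions on a fixed number of tips and uses an exponential depth distribution purely as a placeholder; both choices, and all function names, are assumptions for illustration rather than the distributions derived in the paper.

```python
import random


def sample_cpp_depths(n_tips, draw_depth, rng):
    """Draw the n_tips - 1 node depths of a coalescent point process.
    depths[k] is the speciation time (measured back from the present)
    separating tip k from tip k + 1; draws are i.i.d."""
    return [draw_depth(rng) for _ in range(n_tips - 1)]


def pairwise_divergence_times(depths):
    """Divergence time of tips i < j is the maximum of depths[i..j-1]."""
    n_tips = len(depths) + 1
    times = [[0.0] * n_tips for _ in range(n_tips)]
    for i in range(n_tips):
        running_max = 0.0
        for j in range(i + 1, n_tips):
            running_max = max(running_max, depths[j - 1])
            times[i][j] = times[j][i] = running_max
    return times


# Placeholder depth distribution (hypothetical): exponential with mean 1.
rng = random.Random(0)
depths = sample_cpp_depths(5, lambda r: r.expovariate(1.0), rng)
div_times = pairwise_divergence_times(depths)
```

Because each tree requires only one independent draw per tip, simulation is linear in the number of tips, which is consistent with the fast simulation the abstract attributes to CPP.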

Bayesian genome assembly and assessment by Markov Chain Monte Carlo sampling

Mark Howison, Felipe Zapata, Erika J. Edwards, Casey W. Dunn
(Submitted on 6 Aug 2013)

Most genome assemblers provide a point estimate of the true genome sequence, chosen from among many alternative hypotheses that are supported by the data. We present a Markov Chain Monte Carlo approach to sequence assembly that instead generates a distribution of assembly hypotheses with quantified probabilities. This statistically explicit Bayesian approach to assembly allows the investigator to evaluate alternative assembly hypotheses in a unified framework and propagate uncertainty about genome assembly to downstream analyses. We implement this approach in a prototype assembler and illustrate its application to the genome of the bacteriophage $\Phi$X174.

Proceedings of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)

Aaron Darling, Jens Stoye
(Submitted on 6 Aug 2013)

These are the proceedings of the 13th Workshop on Algorithms in Bioinformatics, WABI2013, which was held September 2-4, 2013, in Sophia Antipolis, France. All manuscripts were peer reviewed by the WABI2013 program committee and external reviewers.

Lineage specific reductions in genome size in salamanders are associated with increased rates of mutation

John Herrick, Bianca Sclavi
(Submitted on 4 Aug 2013)

Very low levels of genetic diversity have been reported in vertebrates with large genomes, notably salamanders and lungfish [1-3]. Interpreting differences in heterozygosity, which reflects genetic diversity in a population, is complicated because levels of heterozygosity vary widely between conspecific populations, and correlate with many different physiological and demographic variables such as body size and effective population size. Here we return to the question of genetic variability in salamanders, and report on the relationship between evolutionary rates and genome sizes in five different salamander families. We found that rates of evolution are exceptionally low in salamanders as a group. Evolutionary rates are as low as those reported for cartilaginous fish, which have the slowest rates recorded so far in vertebrates [4]. We also found that, independent of life history, salamanders with the smallest genomes (14 pg) are evolving at rates two to three times faster than salamanders with the largest genomes (>50 pg). After accounting for evolutionary duration, we conclude that speciation events in salamanders are associated with contractions in genome size and concomitant increases in mutation and diversification rates.

Effect of linkage on the equilibrium frequency of deleterious mutations

Sona John, Kavita Jain
(Submitted on 5 Aug 2013)

We study the evolution of an asexual population of binary sequences of finite length in which both deleterious and reverse mutations can occur. Such a model has been used to understand the prevalence of preferred codons due to selection, mutation and drift, and proposed as a possible mechanism for halting the irreversible degeneration of asexual populations due to Muller’s ratchet. Using an analytical argument and numerical simulations, we study the dependence of the equilibrium fraction of deleterious mutations on various population genetic parameters. In contrast to the one-locus theory, where the fraction of disadvantageous mutations decreases exponentially fast with increasing population size, we find that in the multilocus model, it decreases to zero exponentially for very large populations but approaches a constant logarithmically for smaller populations. The weak dependence on the population size may explain the similar levels of codon bias seen in populations of different sizes.

The pattern and distribution of deleterious mutations in maize

Sofiane Mezmouk, Jeffrey Ross-Ibarra
(Submitted on 2 Aug 2013)

Most non-synonymous mutations are thought to be deleterious because of their effect on protein sequence. These polymorphisms are expected to be removed or kept at low frequency by the action of natural selection, and rare deleterious variants have been implicated as a possible explanation for the “missing heritability” seen in many studies of complex traits. Nonetheless, the effect of positive selection on linked sites or drift in small or inbred populations may also impact the evolution of deleterious alleles. Here, we made use of genome-wide genotyping data to characterize deleterious variants in a large panel of maize inbred lines. We show that, in spite of small effective population sizes and inbreeding, most putatively deleterious SNPs are indeed at low frequencies within individual genetic groups. We find that genes showing associations with a number of complex traits are enriched for deleterious variants. Together these data are consistent with the dominance model of heterosis, in which complementation of numerous low-frequency, weakly deleterious variants contributes to hybrid vigor.

Population subdivision with migration can facilitate evolution on rugged fitness landscapes

Anne-Florence Bitbol, David J. Schwab
(Submitted on 1 Aug 2013)

We show that subdivision of an asexual population into demes connected by migration significantly accelerates the crossing of fitness valleys and plateaus over a wide parameter range, both with respect to the non-subdivided population and with respect to a single deme. We predict the existence of a parameter range where valley or plateau crossing by the metapopulation is as fast as that of the fastest deme, and we verify this prediction using stochastic simulations. Finally, we extend our work to the case of a large population connected by migration to one or several smaller islands.

Distortion of genealogical properties when the sample is very large

Anand Bhaskar, Andrew G. Clark, Yun S. Song
(Submitted on 1 Aug 2013)

Study sample sizes in human genetics are growing rapidly, and in due course it will become routine to analyze samples with hundreds of thousands if not millions of individuals. In addition to posing computational challenges, such large sample sizes call for carefully re-examining the theoretical foundation underlying commonly-used analytical tools. Here, we study the accuracy of the coalescent, a central model for studying the ancestry of a sample of individuals. The coalescent arises as a limit of a large class of random mating models and it is an accurate approximation to the original model provided that the population size is sufficiently larger than the sample size. We develop a method for performing exact computation in the discrete-time Wright-Fisher (DTWF) model and compare several key genealogical quantities of interest with the coalescent predictions. For realistic demographic scenarios, we find that there are a significant number of multiple- and simultaneous-merger events under the DTWF model, which are absent in the coalescent by construction. Furthermore, for large sample sizes, there are noticeable differences in the expected number of rare variants between the coalescent and the DTWF model. To balance the tradeoff between accuracy and computational efficiency, we propose a hybrid algorithm that utilizes the DTWF model for the recent past and the coalescent for the more distant past. Our results demonstrate that the hybrid method with only a handful of generations of the DTWF model leads to a frequency spectrum that is quite close to the prediction of the full DTWF model.
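
The kind of discrepancy described above can be seen in a single generation of a Wright-Fisher model. The sketch below assumes a haploid population of constant size N in which each of n sampled lineages picks its parent uniformly at random, and computes the exact probabilities of no merger, of exactly one pairwise merger, and of anything else (a multiple or simultaneous merger, which the coalescent excludes). The function names and the sizes in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def dtwf_no_merger(n, N):
    """Probability that n lineages choose n distinct parents among N."""
    p = 1.0
    for i in range(1, n):
        p *= 1.0 - i / N
    return p


def dtwf_single_pair_merger(n, N):
    """Probability that exactly one pair of lineages shares a parent and
    all other lineages have distinct parents (n - 1 parents in total)."""
    p = math.comb(n, 2) / N
    for i in range(1, n - 1):
        p *= 1.0 - i / N
    return p


def dtwf_multiple_or_simultaneous(n, N):
    """Probability of any event the coalescent does not allow: three or
    more lineages sharing a parent, or several mergers in one generation."""
    return 1.0 - dtwf_no_merger(n, N) - dtwf_single_pair_merger(n, N)


# Hypothetical sizes: a small and a large sample from the same population.
small_sample = dtwf_multiple_or_simultaneous(10, 10_000)
large_sample = dtwf_multiple_or_simultaneous(1_000, 10_000)
```

When n is much smaller than N the last probability is negligible, which is the regime in which the coalescent is an accurate approximation; as n approaches N it stops being negligible.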

Maximum likelihood evidence for Neandertal admixture in Eurasian populations from three genomes

Konrad Lohse, Laurent A.F. Frantz
(Submitted on 31 Jul 2013)

Although there has been much interest in estimating divergence and admixture from genomic data, it has proven difficult to distinguish gene flow after divergence from alternative histories involving structure in the ancestral population. The lack of a formal test to distinguish these scenarios has sparked recent controversy about the possibility of interbreeding between Neandertals and modern humans in Eurasia. We derive the probability of mutational configurations in non-recombining sequence blocks under alternative histories of divergence with admixture and ancestral structure. Dividing the genome into short blocks makes it possible to compute maximum likelihood estimates of parameters under both models. We apply this method to triplets of human and Neandertal genomes and quantify the relative support for models of long-term population structure in the ancestral African population and admixture from Neandertals into Eurasian populations after their expansion out of Africa. Our analysis allows us, for the first time, to formally reject a history of ancestral population structure and instead reveals strong support for admixture from Neandertals into Eurasian populations at a higher rate (3.4%-7.9%) than suggested previously.