Integrating influenza antigenic dynamics with molecular evolution

Trevor Bedford, Marc A. Suchard, Philippe Lemey, Gytis Dudas, Victoria Gregory, Alan J. Hay, John W. McCauley, Colin A. Russell, Derek J. Smith, Andrew Rambaut
(Submitted on 12 Apr 2013)

Influenza viruses undergo continual antigenic evolution allowing mutant viruses to evade immunity acquired by the host population to previous virus strains. Antigenic phenotype is often assessed through pairwise measurement of cross-reactivity between influenza strains using the hemagglutination inhibition (HI) assay. Here, we extend previous approaches to antigenic cartography, which seeks to place strains on an antigenic map, such that distances on this map best recapitulate titers observed across multiple HI assays. In our model, we simultaneously characterize antigenic and genetic evolution by including an evolutionary model in which antigenic location diffuses over a shared virus phylogeny. Using HI data for four lineages of influenza, encompassing influenza A subtypes H3N2 and H1N1, and influenza B lineages Victoria and Yamagata, we determine average rates of antigenic drift for each lineage, as well as year-to-year variability in the rate of drift. Through comparison with epidemiological data, we demonstrate a year-to-year correlation between drift and incidence and present evidence that antigenic drift mediates interference between influenza lineages. We investigate the selective underpinnings for differing antigenic dynamics across lineages and show that A/H3N2 benefits from both a higher influx of new antigenic mutations and also from more efficient conversion of antigenic variation into fixed differences. This work does much to elucidate the antigenic dynamics of influenza lineages, but also allows for substantial future advances in investigating the dynamics of influenza and other antigenically-variable pathogens by providing a model that intimately combines molecular and antigenic evolution.

Identifiability of a Coalescent-based Population Tree Model

Arindam RoyChoudhury
(Submitted on 12 Apr 2013)

Identifiability of evolutionary tree models has been a recent topic of discussion and some models have been shown to be non-identifiable. A coalescent-based rooted population tree model, originally proposed by Nielsen et al. 1998 [2], has been used by many authors in the last few years and is a simple tool to accurately model the changes in allele frequencies in the tree. However, the identifiability of this model has never been proven. Here we prove this model to be identifiable by showing that the model parameters can be expressed as functions of the probability distributions of subsamples. This is a step toward proving the consistency of the maximum likelihood estimator of the population tree based on this model.

The influence of relatives on the efficiency and error rate of familial searching

Rori V. Rohlfs, Erin Murphy, Yun S. Song, Montgomery Slatkin
(Submitted on 10 Apr 2013)

We investigate the consequences of adopting the criteria used by the state of California, as described by Myers et al. (2011), for conducting familial searches. We carried out a simulation study of randomly generated profiles of related and unrelated individuals with 13-locus CODIS genotypes and YFiler Y-chromosome haplotypes, on which the Myers protocol for relative identification was carried out. For Y-chromosome sharing first degree relatives, the Myers protocol has a high probability (80 – 99%) of identifying their relationship. For unrelated individuals, there is a low probability that an unrelated person in the database will be identified as a first-degree relative. For more distant Y-haplotype sharing relatives (half-siblings, first cousins, half-first cousins or second cousins) there is a substantial probability that the more distant relative will be incorrectly identified as a first-degree relative. For example, there is a 3 – 18% probability that a first cousin will be identified as a full sibling, with the probability depending on the population background. Although the California familial search policy is likely to identify a first degree relative if his profile is in the database, and it poses little risk of falsely identifying an unrelated individual in a database as a first-degree relative, there is a substantial risk of falsely identifying a more distant Y-haplotype sharing relative in the database as a first-degree relative, with the consequence that their immediate family may become the target for further investigation. This risk falls disproportionately on those ethnic groups that are currently overrepresented in state and federal databases.

YHap: software for probabilistic assignment of Y haplogroups from population re-sequencing data

Fan Zhang, Ruoyan Chen, Dongbing Liu, Xiaotian Yao, Guoqing Li, Yabin Jin, Chang Yu, Yingrui Li, Lachlan Coin
(Submitted on 11 Apr 2013)

Y haplogroup analyses are an important component of genealogical reconstruction, population genetic analyses, medical genetics and forensics. These fields are increasingly moving towards use of low-coverage, high-throughput sequencing. However, there is as yet no software available for using sequence data to assign Y haplogroups probabilistically, such that the posterior probability of assignment fully reflects the information present in the data, and borrows information across all samples sequenced from a population. YHap addresses this problem.

The Maintenance of Sex: Ronald Fisher meets the Red Queen

David Green, Chris Mason
(Submitted on 10 Apr 2013)

Sex in higher diploids carries a two-fold cost of males that should reduce its fitness relative to cloning and result in extinction. Instead, sex is widespread and it is clonal species that face early obsolescence. One possible reason is that sex is an adaptation to resist parasites. We use computer simulations of finite populations to model a Red Queen in which a parasitic haploid mounts a negative frequency-dependent attack on a diploid host. Both host and parasite populations generate novel alleles by mutation and have access to large allele spaces. Sex outcompetes cloning by two overlapping mechanisms. First, sexual diploids adopt advantageous homozygous mutations more rapidly than clonal diploids under conditions of lag load. This rate advantage can offset the lesser fecundity of sex. Second, a relative advantage to sex emerges under host mutation rates that are fast enough to retain fitness in a rapidly mutating parasite environment and increase host polymorphism and polyclonality. Polyclonal populations disproportionately experience interference with selection at high mutation rates, both between and within loci, slowing clonal population adaptation to a changing parasite environment and reducing clonal population fitness relative to sex. This effect increases markedly with the number of loci under independent selection. Rates of parasite mutation exist that not only allow sex to survive despite the two-fold cost of males but which enable sexual and clonal populations to have equal fitness and co-exist. Since all higher organisms carry parasitic loads, the model is of general applicability.

Clusters of microRNAs emerge by new hairpins in existing transcripts

Antonio Marco, Maria Ninova, Matthew Ronshaugen, Sam Griffiths-Jones
(Submitted on 9 Apr 2013)

Genetic linkage may result in the expression of multiple products from a single polycistronic transcript, under the control of a single promoter. In animals, protein-coding polycistronic transcripts are rare. However, microRNAs are frequently clustered in the genomes of animals and plants, and these clusters are often transcribed as a single unit. The evolution of microRNA clusters has been the subject of much speculation, and a selective advantage of clusters of functionally related microRNAs is often proposed. However, the origin of microRNA clusters has not so far been systematically explored. Here we study the evolution of all microRNA clusters in Drosophila melanogaster, and suggest a number of models for their emergence. We observed that a majority of microRNA clusters arose by the de novo formation of new microRNA-like hairpins in existing microRNA transcripts. Some clusters also emerged by tandem duplication of a single microRNA. Comparative genomics shows that these clusters, once formed, are unlikely to split or undergo rearrangements. We did not find any instances of clusters appearing by rearrangement of pre-existing microRNA genes. We propose a model for microRNA cluster origin and evolution in which selection over one of the microRNAs in the cluster interferes with the evolution of the other tightly linked microRNAs. Our analysis suggests that the evolutionary study of microRNAs and other small RNAs must consider and account for linkage associations.

Change in Recessive Lethal Alleles Frequency in Inbred Populations

Arindam RoyChoudhury
(Submitted on 10 Apr 2013)

In a population practicing consanguineous marriage, rare recessive lethal alleles (RRLA) have higher chances of affecting phenotypes. As inbreeding causes more homozygosity and subsequently more deaths, the loss of individuals with RRLA decreases the frequency of these alleles. Although this phenomenon is well studied in general, here some hitherto unstudied cases are presented. An analytical formula for the RRLA frequency is presented for an infinite monoecious population practicing several different types of inbreeding. In finite dioecious populations, it is found that more severe inbreeding leads to quicker RRLA losses, making the upcoming generations healthier. A population of size 10,000 practicing 30% half-sib marriages loses more than 95% of its RRLA in 100 generations; a population practicing 30% cousin marriages loses about 75% of its RRLA. Our findings also suggest that given enough resources to grow, a small inbred population will be able to rebound while losing the RRLA.
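
To make the qualitative claim concrete (more inbreeding purges a recessive lethal faster), here is a minimal sketch in Python. It is not the paper's formula: it uses the standard textbook recursion for an infinite population with a single fixed inbreeding coefficient F, which is a simplifying assumption, and the function names are hypothetical.

```python
def next_lethal_frequency(q, F):
    """One generation of selection against a fully recessive lethal allele.

    q: frequency of the lethal allele before selection.
    F: inbreeding coefficient, assumed constant (a simplification;
       the paper models specific mating schemes instead).
    """
    p = 1.0 - q
    lethal_homozygotes = q * q + F * p * q   # these zygotes die
    carriers = 2.0 * p * q * (1.0 - F)       # heterozygotes survive
    # Each surviving carrier passes on the lethal allele in half of its gametes.
    return 0.5 * carriers / (1.0 - lethal_homozygotes)


def frequency_after(q0, F, generations):
    """Iterate the recursion for a given number of generations."""
    q = q0
    for _ in range(generations):
        q = next_lethal_frequency(q, F)
    return q


if __name__ == "__main__":
    for F in (0.0, 0.0625, 0.125):
        print(F, frequency_after(0.01, F, 100))
```

With F = 0 the recursion reduces to the familiar q/(1 + q) per generation; any positive F exposes more homozygotes to selection and so lowers the frequency faster. The specific percentages quoted in the abstract come from the paper's own finite-population models, not from this sketch.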

Our paper: The causal meaning of Fisher’s average effect

This guest post is by James Lee on his paper with Carson Chow, “The causal meaning of Fisher’s average effect,” arXived here.

Early in graduate school, I took it upon myself to read Reinhard Bürger’s excellent treatise The Mathematical Theory of Selection, Recombination, and Mutation. Here I encountered the concepts of “average excess” and “average effect,” which were defined (rather unclearly to the casual reader) by Ronald Fisher in his presentation of the Fundamental Theorem of Natural Selection. Finding some of the distinctions made between these two concepts rather confusing, I directed some questions about them to the Yahoo quantitative genetics group. A respondent told me to consult Falconer (1985), which would “make things as clear as mud.”

My school did not have electronic access to Genetics Research at the time, so I did things the old-fashioned way and got my hands on a bound copy of the journal volume containing Falconer’s article. This masterpiece of exposition impressed me so much that I copied it down by hand; since the paper was at the end of the bound volume, the librarian was not able to scan it for me.

Falconer set out four distinct concepts that at various times have been put forth as definitions of the average excess, average effect, or both:

(A) Divide the population into two groups, one containing all A1A1 homozygotes and half of the heterozygotes, the other containing all A2A2 homozygotes and half of the heterozygotes. Take the difference between the conditional mean phenotypes of these two groups.

(B) Choose gametes bearing A1 and A2 at random. Measure the phenotypes of the mature organisms to which these gametes ultimately give rise. Take the difference between the conditional mean phenotypes of the A1 and A2 gametes.

(C) Regress the phenotype on the count (0, 1, or 2) of an arbitrarily chosen allele (A1 or A2). Take the regression coefficient of gene count.

(D) Take the average change in phenotype resulting from experimentally “zapping” one allele into the other, as if by mutation, in a zygote immediately after fertilization but before the onset of any developmental events.

Implicitly assuming that genotypes and environments are independent, Falconer then showed that all four concepts are equivalent under random mating. Now suppose that mating is not random. Then (A) and (B) are still equal and correspond to what Fisher called the average excess. The numerical value of this quantity is generally not equal to either (C) or (D), and in turn (C) and (D) are generally not equal to each other. Falconer concluded that (C) was what Fisher really meant by the average effect.
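
A small numerical check may help here. The following Python sketch computes definition (A) (the average excess) and definition (C) (the regression coefficient on gene count) for one biallelic locus. It rests on assumptions that are mine and not Falconer's: phenotypes are replaced by fixed genotypic values, departure from random mating is summarized by a single inbreeding coefficient F, and the function names are hypothetical. Definition (D) is not computed, since it requires a model of the intervention.

```python
def genotype_freqs(p, F):
    """Frequencies of A1A1, A1A2, A2A2 given allele frequency p of A1
    and inbreeding coefficient F (F = 0 is random mating)."""
    q = 1.0 - p
    return (p * p + F * p * q, 2.0 * p * q * (1.0 - F), q * q + F * p * q)


def average_excess(p, F, g11, g12, g22):
    """Definition (A): difference between the mean of the group holding all
    A1A1 plus half the heterozygotes and the mean of the group holding all
    A2A2 plus half the heterozygotes."""
    P11, P12, P22 = genotype_freqs(p, F)
    mean_a1 = (P11 * g11 + 0.5 * P12 * g12) / (P11 + 0.5 * P12)
    mean_a2 = (P22 * g22 + 0.5 * P12 * g12) / (P22 + 0.5 * P12)
    return mean_a1 - mean_a2


def regression_effect(p, F, g11, g12, g22):
    """Definition (C): slope of genotypic value on the count of A1 alleles."""
    freqs = genotype_freqs(p, F)
    counts = (2.0, 1.0, 0.0)
    values = (g11, g12, g22)
    mean_x = sum(f * x for f, x in zip(freqs, counts))
    mean_g = sum(f * g for f, g in zip(freqs, values))
    cov = sum(f * (x - mean_x) * (g - mean_g)
              for f, x, g in zip(freqs, counts, values))
    var = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, counts))
    return cov / var


if __name__ == "__main__":
    # Illustrative values with dominance; any numbers would do.
    args = (0.3, 1.0, 0.8, 0.0)  # p, g11, g12, g22
    for F in (0.0, 0.25):
        p, g11, g12, g22 = args
        print(F, average_excess(p, F, g11, g12, g22),
              regression_effect(p, F, g11, g12, g22))
```

Under these assumptions the two quantities coincide when F = 0 and differ by a factor of (1 + F) otherwise, which is one simple way to see (A) and (C) part company once mating is not random.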

This conclusion disturbed me a great deal. As any GWAS researcher knows, the (partial) regression of phenotype on gene count does not necessarily pick out any biologically meaningful quantity if genotypes and environments are dependent (“population stratification”). The fundamental issue here is that (C) is merely a statistical definition, appealing only to passive observations of a static population, whereas (D) is a causal definition turning on the result of a hypothetical experimental intervention. I no longer remember whether I had read Pearl (2009) by this point, but regardless my Spider Sense was unambiguously telling me that (D) was deeper and more meaningful than (C). Furthermore, if Fisher was not the one who coined the slogan “correlation is not causation,” he was certainly one of its first and most vocal promoters. How could Fisher, who invented randomization in experimental design, have preferred a correlational definition over a causal one when setting forth one of the key concepts in his evolutionary theory? Could it be because of the difficulty in translating (D) from words into mathematical symbols without something like Pearl’s do operator, which was not available in Fisher’s time?

This paradox continued to bother me over the next several years. Soon after my daughter was born, I indulged one of those wild impulses that strike the sleepless: I emailed my questions regarding this matter to Anthony W. F. Edwards, the last student of the great Fisher himself. Anthony very generously sent me some of his unpublished work and also his correspondence with Falconer about the very article that had spurred my thoughts. This correspondence spanned a period of more than 20 years, and it provided a very poignant portrait of Douglas Falconer as a scientist (Hill and Mackay, 2004). I did not immediately find the answers to my questions in the materials that Anthony sent to me, but they set me on the path toward finding the answers. These are presented in the paper, which will shortly appear in Genetics Research.

It turns out that Fisher’s average effect must be given a causal interpretation after all. For the detailed story of the reconciliation between (C) and (D), you will have to read the paper, written in collaboration with my supervisor Carson Chow. I am particularly pleased with our proof that the frequency-weighted mean of the (experimental) average effects at any locus is equal to zero. In most texts this relation is extrinsically applied to the multilocus case without any motivation except that it holds automatically for the (regression) average effects in the case of a single locus. The fact that this identity, which otherwise is an arbitrary constraint, can be derived from a definition positing the experimental replacement of a homologous gene is rather striking evidence for the importance of a causal interpretation.

Our investigation unexpectedly turned up many connections to other parts of population genetics. I like to think that in the pages of our paper one can hear many masters of population and quantitative genetics–Hardy, Fisher, Wright, Kimura, Falconer, Price, Ewens, Lessard–engaging in a deep conversation.

There are some issues raised in the paper that I am still contemplating. First, there is a complication when one considers randomly sampling a zygote and experimentally changing its genotype to the one whose value needs to be known; such an experiment inevitably changes the frequencies of the genotypes, and for theoretical reasons any ensuing frequency-dependent changes in the phenotypic means of the genotypes need to be excluded. I believe that one way to do this properly is by partitioning the effects of the experiment according to Wright’s path analysis–which would be rather ironic given the well-known antagonism between Wright and Fisher. Second, in the multilocus case it might be possible to mathematically describe special subsets of possible gene substitutions defining a given average effect that satisfy the property that all changes in Hardy-Weinberg and linkage disequilibria are “small.” We look forward to future work (by ourselves?) on these questions.

Note: The bibliography gives the name of the journal in which Falconer (1985) appears as Genetical Research. This is the same journal as Genetics Research; the name was changed about ten years ago.

Detecting the structure of haplotypes, local ancestry and excessive local European ancestry in Mexicans

Yongtao Guan
(Submitted on 5 Apr 2013)

We present a two-layer hidden Markov model to detect the structure of haplotypes for unrelated individuals. This allows modeling two scales of linkage disequilibrium (one within a group of haplotypes and one between groups), thereby taking advantage of rich haplotype information to infer local ancestry for admixed individuals. Our method outperforms competing state-of-the-art methods, particularly for regions of small ancestral track lengths. Applying our method to Mexican samples in HapMap3, we found five coding regions, ranging from 0.3 to 1.3 megabases (Mb) in length, that exhibit excessive European ancestry (average dosage > 1.6). A particularly interesting region of 1.1 Mb (with average dosage 1.95) is located on Chromosome 2p23 and harbors two genes, PXDN and MYT1L, both of which are associated with autism and schizophrenia. In light of the low prevalence of autism in Hispanics, this region warrants special attention. We confirmed our findings using Mexican samples from the 1000 Genomes Project. A software package implementing methods described in the paper is freely available at this http URL.

The causal meaning of Fisher’s average effect

James J. Lee, Carson C. Chow
(Submitted on 6 Apr 2013)

In order to formulate the Fundamental Theorem of Natural Selection, Fisher defined the average excess and average effect of a gene substitution. Finding these notions to be somewhat opaque, some authors have recommended reformulating Fisher’s ideas in terms of covariance and regression, which are classical concepts of statistics. We argue that Fisher intended his two averages to express a distinction between correlation and causation. On this view the average effect is a specific weighted average of the actual phenotypic changes that result from physically changing the allelic states of homologous genes. We show that the statistical and causal conceptions of the average effect, perceived as inconsistent by Falconer, can be reconciled if certain relationships between the genotype frequencies and non-additive residuals are conserved. There are certain theory-internal considerations favoring Fisher’s original formulation in terms of causality; for example, the frequency-weighted mean of the average effects equaling zero at each locus becomes a derivable consequence rather than an arbitrary constraint. More broadly, Fisher’s distinction between correlation and causation is of critical importance to gene-trait mapping studies and the foundations of evolutionary biology.