# Our paper: The causal meaning of Fisher’s average effect

This guest post is by James Lee on his paper with Carson Chow, “The causal meaning of Fisher’s average effect,” arXived here.

Early in graduate school, I took it upon myself to read Reinhard Burger’s excellent treatise The Mathematical Theory of Selection, Recombination, and Mutation. Here I encountered the concepts of “average excess” and “average effect,” which were defined (rather unclearly to the casual reader) by Ronald Fisher in his presentation of the Fundamental Theorem of Natural Selection. Finding some of the distinctions made between these two concepts rather confusing, I directed some questions about them to the Yahoo quantitative genetics group. A respondent told me to consult Falconer (1985), which would “make things as clear as mud.”

My school did not have electronic access to Genetics Research at the time, so I did things the old-fashioned way and got my hands on a bound copy of the journal volume containing Falconer’s article. This masterpiece of exposition impressed me so much that I copied it down by hand; since the paper was at the end of the bound volume, the librarian was not able to scan it for me.

Falconer set out four distinct concepts that at various times have been put forth as definitions of the average excess, average effect, or both:

(A) Divide the population into two groups, one containing all A1A1 homozygotes and half of the heterozygotes, the other containing all A2A2 homozygotes and half of the heterozygotes. Take the difference between the conditional mean phenotypes of these two groups.

(B) Choose gametes bearing A1 and A2 at random. Measure the phenotypes of the mature organisms to which these gametes ultimately give rise. Take the difference between the conditional mean phenotypes of the A1 and A2 gametes.

(C) Regress the phenotype on the count (0, 1, or 2) of an arbitrarily chosen allele (A1 or A2). Take the regression coefficient of gene count.

(D) Take the average change in phenotype resulting from experimentally “zapping” one allele into the other, as if by mutation, in a zygote immediately after fertilization but before the onset of any developmental events.

Implicitly assuming that genotypes and environments are independent, Falconer then showed that all four concepts are equivalent under random mating. Now suppose that mating is not random. Then (A) and (B) are still equal and correspond to what Fisher called the average excess. The numerical value of this quantity is generally not equal to either (C) or (D), and in turn (C) and (D) are generally not equal to each other. Falconer concluded that (C) was what Fisher really meant by the average effect.
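For readers who like to check such claims numerically, here is a minimal sketch comparing (A) and (C) at a single locus with two alleles. It is an illustration under stated assumptions, not code from Falconer or from our paper: the genotypic means in `VALUES` are made up, departure from random mating is modeled with a single inbreeding coefficient `F` (a hypothetical parametrization chosen for convenience), and genotypes and environments are taken to be independent, so that each genotype's mean phenotype is a fixed number. Definition (D) is not computed, since it requires a model of the intervention itself.

```python
def genotype_freqs(p, F=0.0):
    """Frequencies of (A1A1, A1A2, A2A2) for A1 frequency p and inbreeding coefficient F.

    F = 0 gives Hardy-Weinberg proportions (random mating).
    """
    q = 1.0 - p
    return (p * p + F * p * q, 2.0 * p * q * (1.0 - F), q * q + F * p * q)


def average_excess(freqs, values):
    """Concept (A): difference between the mean phenotypes of the two groups."""
    P11, P12, P22 = freqs
    G11, G12, G22 = values
    p = P11 + 0.5 * P12  # size of the group holding A1A1 plus half the heterozygotes
    q = P22 + 0.5 * P12  # size of the group holding A2A2 plus half the heterozygotes
    mean_group1 = (P11 * G11 + 0.5 * P12 * G12) / p
    mean_group2 = (P22 * G22 + 0.5 * P12 * G12) / q
    return mean_group1 - mean_group2


def regression_effect(freqs, values):
    """Concept (C): least-squares slope of phenotype on the count of A1 (2, 1, 0)."""
    counts = (2.0, 1.0, 0.0)
    mean_x = sum(f * x for f, x in zip(freqs, counts))
    mean_g = sum(f * g for f, g in zip(freqs, values))
    cov = sum(f * (x - mean_x) * (g - mean_g) for f, x, g in zip(freqs, counts, values))
    var = sum(f * (x - mean_x) ** 2 for f, x in zip(freqs, counts))
    return cov / var


# Hypothetical genotypic means for A1A1, A1A2, A2A2 (with some dominance).
VALUES = (1.0, 0.8, 0.0)

for F in (0.0, 0.25):
    freqs = genotype_freqs(0.3, F)
    print(F, average_excess(freqs, VALUES), regression_effect(freqs, VALUES))
```

With `F = 0` the two numbers agree; with `F` different from zero they differ, and in this particular parametrization the average excess works out to `(1 + F)` times the regression coefficient.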

This conclusion disturbed me a great deal. As any GWAS researcher knows, the (partial) regression of phenotype on gene count does not necessarily pick out any biologically meaningful quantity if genotypes and environments are dependent (“population stratification”). The fundamental issue here is that (C) is merely a statistical definition, appealing only to passive observations of a static population, whereas (D) is a causal definition turning on the result of a hypothetical experimental intervention. I no longer remember whether I had read Pearl (2009) by this point, but regardless my Spider Sense was unambiguously telling me that (D) was deeper and more meaningful than (C). Furthermore, if Fisher was not the one who coined the slogan “correlation is not causation,” he was certainly one of its first and most vocal promoters. How could Fisher, who invented randomization in experimental design, have preferred a correlational definition over a causal one when setting forth one of the key concepts in his evolutionary theory? Could it be because of the difficulty in translating (D) from words into mathematical symbols without something like Pearl’s do operator, which was not available in Fisher’s time?

This paradox continued to bother me over the next several years. Soon after my daughter was born, I indulged one of those wild impulses that strike the sleepless: I emailed my questions regarding this matter to Anthony W. F. Edwards, the last student of the great Fisher himself. Anthony very generously sent me some of his unpublished work and also his correspondence with Falconer about the very article that had spurred my thoughts. This correspondence spanned a period of more than 20 years, and it provided a very poignant portrait of Douglas Falconer as a scientist (Hill and Mackay, 2004). I did not immediately find the answers to my questions in the materials that Anthony sent to me, but they set me on the path toward finding the answers. These are presented in the paper, which will shortly appear in Genetics Research.

It turns out that Fisher’s average effect must be given a causal interpretation after all. For the detailed story of the reconciliation between (C) and (D), you will have to read the paper, written in collaboration with my supervisor Carson Chow. I am particularly pleased with our proof that the frequency-weighted mean of the (experimental) average effects at any locus is equal to zero. In most texts this relation is extrinsically applied to the multilocus case without any motivation except that it holds automatically for the (regression) average effects in the case of a single locus. The fact that this identity, which otherwise is an arbitrary constraint, can be derived from a definition positing the experimental replacement of a homologous gene is rather striking evidence for the importance of a causal interpretation.
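To see why the identity holds automatically for the regression average effects at a single locus, here is a short worked version. This is only the textbook single-locus case, not the proof in the paper, and the symbols are my own shorthand for this post: p and q are the frequencies of A1 and A2, X_1 and X_2 are the counts of each allele in a genotype, β is the regression coefficient of concept (C), and α_1, α_2 are the per-allele average effects.

$$
\hat{G} - \bar{G} = \beta\,(X_1 - 2p), \qquad X_1 + X_2 = 2
$$

$$
\alpha_1 = q\beta,\quad \alpha_2 = -p\beta
\;\Longrightarrow\;
\alpha_1 X_1 + \alpha_2 X_2 = \beta\,(X_1 - 2p),
\qquad
p\,\alpha_1 + q\,\alpha_2 = pq\beta - pq\beta = 0
$$

The first line is just the fitted regression line centered at the population mean, since the mean count of A1 is 2p. The second line splits the single slope into two per-allele effects whose difference is β, and the zero frequency-weighted mean drops out of the centering.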

Our investigation unexpectedly turned up many connections to other parts of population genetics. I like to think that in the pages of our paper one can hear many masters of population and quantitative genetics–Hardy, Fisher, Wright, Kimura, Falconer, Price, Ewens, Lessard–engaging in a deep conversation.

There are some issues raised in the paper that I am still contemplating. First, there is a complication when one considers randomly sampling a zygote and experimentally changing its genotype to the one whose value needs to be known; such an experiment inevitably changes the frequencies of the genotypes, and for theoretical reasons any ensuing frequency-dependent changes in the phenotypic means of the genotypes need to be excluded. I believe that one way to do this properly is to partition the effects of the experiment according to Wright’s path analysis, which would be rather ironic given the well-known antagonism between Wright and Fisher. Second, in the multilocus case it might be possible to mathematically describe special subsets of possible gene substitutions defining a given average effect that satisfy the property that all changes in Hardy-Weinberg and linkage disequilibria are “small.” We look forward to future work (by ourselves?) on these questions.

Note: The bibliography gives the name of the journal in which Falconer (1985) appears as Genetical Research. This is the same journal as Genetics Research; the name was changed about ten years ago.