Change in Recessive Lethal Alleles Frequency in Inbred Populations

Arindam RoyChoudhury
(Submitted on 10 Apr 2013)

In a population practicing consanguineous marriage, rare recessive lethal alleles (RRLA) have higher chances of affecting phenotypes. As inbreeding causes more homozygosity and subsequently more deaths, the loss of individuals with RRLA decreases the frequency of these alleles. Although this phenomenon is well studied in general, here some hitherto unstudied cases are presented. An analytical formula for the RRLA frequency is presented for infinite monoecious populations practicing several different types of inbreeding. In finite dioecious populations, it is found that more severe inbreeding leads to quicker RRLA losses, making the upcoming generations healthier. A population of size 10,000 practicing 30% half-sib marriages loses more than 95% of its RRLA in 100 generations; a population practicing 30% cousin marriages loses about 75% of its RRLA. Our findings also suggest that, given enough resources to grow, a small inbred population will be able to rebound while losing the RRLA.
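A minimal sketch of the underlying mechanism, assuming an infinite population, a single recessive lethal allele at frequency q, and zygote genotype frequencies given by Wright's inbreeding coefficient F. This is the textbook one-generation recursion, not the paper's analytical formula, and it ignores drift, separate sexes, and the accumulation of inbreeding over generations. As a further simplifying assumption, 30% half-sib (F = 1/8) or first-cousin (F = 1/16) marriages mixed with random mating are represented by a constant population-wide F of 0.3 × 1/8 or 0.3 × 1/16. Algebraically the update reduces to q' = q(1 − F) / (1 + q(1 − F)).

```python
def next_lethal_freq(q, F):
    """One generation of selection against a recessive lethal allele.

    q: frequency of the lethal allele a; F: inbreeding coefficient of zygotes.
    """
    p = 1.0 - q
    het = 2.0 * p * q * (1.0 - F)     # Aa zygotes under Wright's model
    lethal_hom = q * q + F * p * q    # aa zygotes, which die before breeding
    survivors = 1.0 - lethal_hom
    # Among survivors only heterozygotes carry a, one copy each
    return (het / 2.0) / survivors


def simulate(q0, F, generations):
    """Trajectory of the lethal-allele frequency, assuming F stays constant."""
    traj = [q0]
    for _ in range(generations):
        traj.append(next_lethal_freq(traj[-1], F))
    return traj


if __name__ == "__main__":
    q0 = 0.01  # hypothetical starting frequency
    # Hypothetical population-wide F: 30% of matings consanguineous, rest random
    for label, F in [("random mating", 0.0),
                     ("30% first-cousin", 0.3 / 16),
                     ("30% half-sib", 0.3 / 8)]:
        q100 = simulate(q0, F, 100)[-1]
        print(f"{label:>17}: {100 * (1 - q100 / q0):.1f}% of lethal alleles lost")
```

With F = 0 this recovers the classical decline q_t = q_0 / (1 + t q_0); any positive F makes the decline faster, matching the direction of the result above. The percentages this toy model prints are not comparable to the paper's finite-population figures.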

Our paper: The causal meaning of Fisher’s average effect

This guest post is by James Lee on his paper with Carson Chow, “The causal meaning of Fisher’s average effect”, available on arXiv.

Early in graduate school, I took it upon myself to read Reinhard Bürger’s excellent treatise The Mathematical Theory of Selection, Recombination, and Mutation. Here I encountered the concepts of “average excess” and “average effect,” which were defined (rather unclearly to the casual reader) by Ronald Fisher in his presentation of the Fundamental Theorem of Natural Selection. Finding some of the distinctions made between these two concepts rather confusing, I directed some questions about them to the Yahoo quantitative genetics group. A respondent told me to consult Falconer (1985), which would “make things as clear as mud.”

My school did not have electronic access to Genetics Research at the time, so I did things the old-fashioned way and got my hands on a bound copy of the journal volume containing Falconer’s article. This masterpiece of exposition impressed me so much that I copied it down by hand; since the paper was at the end of the bound volume, the librarian was not able to scan it for me.

Falconer set out four distinct concepts that at various times have been put forth as definitions of the average excess, average effect, or both:

(A) Divide the population into two groups, one containing all A1A1 homozygotes and half of the heterozygotes, the other containing all A2A2 homozygotes and half of the heterozygotes. Take the difference between the conditional mean phenotypes of these two groups.

(B) Choose gametes bearing A1 and A2 at random. Measure the phenotypes of the mature organisms to which these gametes ultimately give rise. Take the difference between the conditional mean phenotypes of the A1 and A2 gametes.

(C) Regress the phenotype on the count (0, 1, or 2) of an arbitrarily chosen allele (A1 or A2). Take the regression coefficient of gene count.

(D) Take the average change in phenotype resulting from experimentally “zapping” one allele into the other, as if by mutation, in a zygote immediately after fertilization but before the onset of any developmental events.

Implicitly assuming that genotypes and environments are independent, Falconer then showed that all four concepts are equivalent under random mating. Now suppose that mating is not random. Then (A) and (B) are still equal and correspond to what Fisher called the average excess. The numerical value of this quantity is generally not equal to either (C) or (D), and in turn (C) and (D) are generally not equal to each other. Falconer concluded that (C) was what Fisher really meant by the average effect.
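Under explicit assumptions, the gap between (A) and (C) can be seen numerically. The sketch below uses hypothetical genotypic values with partial dominance, assumes genotypes and environments are independent, and represents non-random mating by genotype frequencies with Wright's inbreeding coefficient F. A short calculation gives (A) = cov(x, G) / (2pq) and (C) = cov(x, G) / (2pq(1 + F)), where x is the count of A1, so the two coincide only when F = 0. Concept (D) is not modelled here.

```python
def genotype_freqs(p, F):
    """Genotype frequencies keyed by the number of A1 alleles (Wright's F model)."""
    q = 1.0 - p
    return {2: p * p + F * p * q, 1: 2.0 * p * q * (1.0 - F), 0: q * q + F * p * q}


def concept_a(freqs, values):
    """(A): mean of {A1A1 + half of A1A2} minus mean of {A2A2 + half of A1A2}."""
    p = freqs[2] + freqs[1] / 2.0
    q = freqs[0] + freqs[1] / 2.0
    mean_a1_group = (freqs[2] * values[2] + freqs[1] / 2.0 * values[1]) / p
    mean_a2_group = (freqs[0] * values[0] + freqs[1] / 2.0 * values[1]) / q
    return mean_a1_group - mean_a2_group


def concept_c(freqs, values):
    """(C): least-squares slope of genotypic value on A1 count."""
    mean_x = sum(x * f for x, f in freqs.items())
    mean_g = sum(values[x] * f for x, f in freqs.items())
    cov = sum(f * (x - mean_x) * (values[x] - mean_g) for x, f in freqs.items())
    var = sum(f * (x - mean_x) ** 2 for x, f in freqs.items())
    return cov / var


if __name__ == "__main__":
    values = {2: 1.0, 1: 0.8, 0: 0.0}  # hypothetical genotypic values
    for F in (0.0, 0.25):
        freqs = genotype_freqs(0.3, F)
        a, c = concept_a(freqs, values), concept_c(freqs, values)
        print(f"F={F}: (A)={a:.4f}  (C)={c:.4f}  ratio={a / c:.4f}")
```

Under these assumptions the printed ratio equals 1 + F, so (A) and (C) agree only in the random-mating case.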

This conclusion disturbed me a great deal. As any GWAS researcher knows, the (partial) regression of phenotype on gene count does not necessarily pick out any biologically meaningful quantity if genotypes and environments are dependent (“population stratification”). The fundamental issue here is that (C) is merely a statistical definition, appealing only to passive observations of a static population, whereas (D) is a causal definition turning on the result of a hypothetical experimental intervention. I no longer remember whether I had read Pearl (2009) by this point, but regardless my Spider Sense was unambiguously telling me that (D) was deeper and more meaningful than (C). Furthermore, if Fisher was not the one who coined the slogan “correlation is not causation,” he was certainly one of its first and most vocal promoters. How could Fisher, who invented randomization in experimental design, have preferred a correlational definition over a causal one when setting forth one of the key concepts in his evolutionary theory? Could it be because of the difficulty of translating (D) from words into mathematical symbols without something like Pearl’s do operator, which was not available in Fisher’s time?
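A toy calculation, with all numbers hypothetical, shows how stratification separates (C) from (D). Two equally sized subpopulations differ in allele frequency and in mean environment, each is in Hardy-Weinberg proportions, and the phenotype is assumed additive: y = βx + e, with x the count of A1. By construction, zapping one A2 into A1 changes the phenotype by exactly β, whereas the pooled regression slope also picks up the covariance between gene count and environment. Expected values are computed exactly rather than simulated.

```python
def pooled_regression_slope(subpops, beta):
    """Slope of phenotype on A1 count in the pooled population.

    subpops: list of (weight, p, env_mean) with Hardy-Weinberg genotypes in each.
    Assumed phenotype model: y = beta * x + env_mean (plus mean-zero noise, which
    does not affect the slope). The causal substitution effect is beta.
    """
    cells = []  # (probability, A1 count, expected phenotype)
    for weight, p, env in subpops:
        q = 1.0 - p
        for x, freq in ((2, p * p), (1, 2.0 * p * q), (0, q * q)):
            cells.append((weight * freq, x, beta * x + env))
    mean_x = sum(pr * x for pr, x, _ in cells)
    mean_y = sum(pr * y for pr, _, y in cells)
    cov = sum(pr * (x - mean_x) * (y - mean_y) for pr, x, y in cells)
    var = sum(pr * (x - mean_x) ** 2 for pr, x, _ in cells)
    return cov / var


if __name__ == "__main__":
    beta = 0.5
    stratified = [(0.5, 0.2, 0.0), (0.5, 0.8, 1.0)]
    print("causal effect of a substitution:", beta)
    print("pooled regression slope:", round(pooled_regression_slope(stratified, beta), 4))
```

Setting both environmental means equal removes the confounding, and the regression slope then returns β exactly.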

This paradox continued to bother me over the next several years. Soon after my daughter was born, I indulged one of those wild impulses that strike the sleepless: I emailed my questions regarding this matter to Anthony W. F. Edwards, the last student of the great Fisher himself. Anthony very generously sent me some of his unpublished work and also his correspondence with Falconer about the very article that had spurred my thoughts. This correspondence spanned a period of more than 20 years, and it provided a very poignant portrait of Douglas Falconer as a scientist (Hill and Mackay, 2004). I did not immediately find the answers to my questions in the materials that Anthony sent to me, but they set me on the path toward finding the answers. These are presented in the paper, which will shortly appear in Genetics Research.

It turns out that Fisher’s average effect must be given a causal interpretation after all. For the detailed story of the reconciliation between (C) and (D), you will have to read the paper, written in collaboration with my supervisor Carson Chow. I am particularly pleased with our proof that the frequency-weighted mean of the (experimental) average effects at any locus is equal to zero. In most texts this relation is extrinsically applied to the multilocus case without any motivation except that it holds automatically for the (regression) average effects in the case of a single locus. The fact that this identity, which otherwise is an arbitrary constraint, can be derived from a definition positing the experimental replacement of a homologous gene is rather striking evidence for the importance of a causal interpretation.
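In the simplest case the identity can be checked by hand. For a single locus under random mating, with α the average effect of substituting A1 for A2 and p, q the allele frequencies, the standard allele-specific (regression) average effects give the following. This is the textbook regression case, not the paper's proof for experimental substitutions:

```latex
\alpha_1 = q\,\alpha, \qquad \alpha_2 = -p\,\alpha
\quad\Longrightarrow\quad
p\,\alpha_1 + q\,\alpha_2 = pq\,\alpha - qp\,\alpha = 0,
\qquad \alpha_1 - \alpha_2 = (q + p)\,\alpha = \alpha .
```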

Our investigation unexpectedly turned up many connections to other parts of population genetics. I like to think that in the pages of our paper one can hear many masters of population and quantitative genetics–Hardy, Fisher, Wright, Kimura, Falconer, Price, Ewens, Lessard–engaging in a deep conversation.

There are some issues raised in the paper that I am still contemplating. First, there is a complication when one considers randomly sampling a zygote and experimentally changing its genotype to the one whose value needs to be known; such an experiment inevitably changes the frequencies of the genotypes, and for theoretical reasons any ensuing frequency-dependent changes in the phenotypic means of the genotypes need to be excluded. I believe that one way to do this properly is by partitioning the effects of the experiment according to Wright’s path analysis, which would be rather ironic given the well-known antagonism between Wright and Fisher. Second, in the multilocus case it might be possible to mathematically describe special subsets of possible gene substitutions defining a given average effect that satisfy the property that all changes in Hardy-Weinberg and linkage disequilibria are “small.” We look forward to future work (by ourselves?) on these questions.

Note: The bibliography gives the name of the journal in which Falconer (1985) appears as Genetical Research. This is the same journal as Genetics Research; the name was changed about ten years ago.

Detecting the structure of haplotypes, local ancestry and excessive local European ancestry in Mexicans

Yongtao Guan
(Submitted on 5 Apr 2013)

We present a two-layer hidden Markov model to detect the structure of haplotypes for unrelated individuals. This allows modeling two scales of linkage disequilibrium (one within a group of haplotypes and one between groups), thereby taking advantage of rich haplotype information to infer local ancestry for admixed individuals. Our method outperforms competing state-of-the-art methods, particularly for regions of small ancestral tract lengths. Applying our method to Mexican samples in HapMap3, we found five coding regions, ranging from 0.3 to 1.3 megabases (Mb) in length, that exhibit excessive European ancestry (average dosage > 1.6). A particularly interesting region of 1.1 Mb (with average dosage 1.95) is located on chromosome 2p23 and harbors two genes, PXDN and MYT1L, both of which are associated with autism and schizophrenia. In light of the low prevalence of autism in Hispanics, this region warrants special attention. We confirmed our findings using Mexican samples from the 1000 Genomes Project. A software package implementing the methods described in the paper is freely available at this http URL.
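For readers new to HMM-based local ancestry inference, here is a sketch of the single-layer idea that a two-layer model builds on: hidden states are ancestries, alleles are emitted with ancestry-specific reference frequencies, and transitions allow ancestry switches between adjacent SNPs. Scaled forward-backward then gives the posterior ancestry at each SNP. This is not Guan's model, which additionally captures haplotype structure within ancestries; the reference frequencies, switch probability, and function names are hypothetical. Summing one ancestry's posterior over an individual's two phased haplotypes gives an expected dosage between 0 and 2, a per-individual analogue of the average dosage reported above.

```python
def ancestry_posterior(haplotype, ref_freqs, switch_prob):
    """Posterior P(ancestry = k) at each SNP for one phased haplotype.

    haplotype:   list of 0/1 alleles along a chromosome
    ref_freqs:   ref_freqs[k][s] = frequency of allele 1 at SNP s in ancestral
                 population k (hypothetical inputs, strictly inside (0, 1))
    switch_prob: probability of an ancestry switch between adjacent SNPs
    """
    n, K = len(haplotype), len(ref_freqs)

    def emit(k, s):
        f = ref_freqs[k][s]
        return f if haplotype[s] == 1 else 1.0 - f

    def trans(i, j):
        return 1.0 - switch_prob if i == j else switch_prob / (K - 1)

    def normalise(v):
        z = sum(v)
        return [x / z for x in v]

    # Forward pass from a uniform prior, rescaled at every SNP to avoid underflow
    fwd = [normalise([emit(k, 0) / K for k in range(K)])]
    for s in range(1, n):
        prev = fwd[-1]
        fwd.append(normalise([emit(j, s) * sum(prev[i] * trans(i, j) for i in range(K))
                              for j in range(K)]))

    # Backward pass, also rescaled; per-SNP scale factors cancel in the posterior
    bwd = [[1.0] * K for _ in range(n)]
    for s in range(n - 2, -1, -1):
        nxt = bwd[s + 1]
        bwd[s] = normalise([sum(trans(i, j) * emit(j, s + 1) * nxt[j] for j in range(K))
                            for i in range(K)])

    return [normalise([fwd[s][k] * bwd[s][k] for k in range(K)]) for s in range(n)]


def ancestry_dosage(hap_a, hap_b, ref_freqs, switch_prob, k):
    """Expected copies (0 to 2) of ancestry k at each SNP for a diploid individual."""
    post_a = ancestry_posterior(hap_a, ref_freqs, switch_prob)
    post_b = ancestry_posterior(hap_b, ref_freqs, switch_prob)
    return [pa[k] + pb[k] for pa, pb in zip(post_a, post_b)]
```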

The causal meaning of Fisher’s average effect

James J. Lee, Carson C. Chow
(Submitted on 6 Apr 2013)

In order to formulate the Fundamental Theorem of Natural Selection, Fisher defined the average excess and average effect of a gene substitution. Finding these notions to be somewhat opaque, some authors have recommended reformulating Fisher’s ideas in terms of covariance and regression, which are classical concepts of statistics. We argue that Fisher intended his two averages to express a distinction between correlation and causation. On this view the average effect is a specific weighted average of the actual phenotypic changes that result from physically changing the allelic states of homologous genes. We show that the statistical and causal conceptions of the average effect, perceived as inconsistent by Falconer, can be reconciled if certain relationships between the genotype frequencies and non-additive residuals are conserved. There are certain theory-internal considerations favoring Fisher’s original formulation in terms of causality; for example, the frequency-weighted mean of the average effects equaling zero at each locus becomes a derivable consequence rather than an arbitrary constraint. More broadly, Fisher’s distinction between correlation and causation is of critical importance to gene-trait mapping studies and the foundations of evolutionary biology.

Our paper: The effects of transcription factor competition on gene regulation

This guest post is by Radu Zabet on his papers “The effects of transcription factor competition on gene regulation” and “The influence of transcription factor competition on the relationship between occupancy and affinity”

Transcription factors (TFs) find their genomic target sites by a combination of three-dimensional diffusion and one-dimensional translocation on the DNA. We previously developed the stochastic simulation framework GRiP (http://logic.sysbiol.cam.ac.uk/grip/) that allows the realistic representation of the target finding process. The following two papers show our application of GRiP to address a few interesting phenomena:

The effects of transcription factor competition on gene regulation
arXiv:1303.6793

The binding of site-specific TFs to their genomic target sites controls the transcription rate of the target genes. In this manuscript, we discuss the influence of TF abundance on the arrival time of TFs on their target sites as well as the time they stay bound to the DNA. We investigate the TF search process using stochastic simulations and find that molecular crowding on the DNA always leads to longer times required by TF molecules to locate their target sites, as well as to lower occupancy. There is also an “emergent property” in cases where many molecules compete in a kind of molecular traffic jam on the DNA. This newly identified noise component may contribute to transcriptional noise by affecting both the size of the fluctuations and the distribution of the arrival times (unimodal or bimodal).

The influence of transcription factor competition on the relationship between occupancy and affinity
arXiv:1303.6869

This manuscript deals with the discrepancy between the “predicted occupancy” of a binding site by a TF, computed on the basis of, say, a PWM, and the “measured occupancy” obtained when we simulate the system with our GRiP framework. Again, we show that absolute TF abundances play an important role in gene expression, and we also provide a compelling case where selecting “the highest peaks” from a ChIP experiment may not necessarily identify the highest-affinity binding sites. Our results show that for medium- and high-affinity sites, TF competition does not play a significant role in genomic occupancy except when the abundance of the TF is significantly increased, or when the PWM displays relatively low information content. Nevertheless, for medium- and low-affinity sites, an increase in TF abundance (for both cognate and non-cognate molecules) leads to an increase in occupancy at several sites.
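As a point of contrast with the stochastic simulations, a minimal equilibrium sketch (not GRiP) shows how a limited pool of molecules couples the occupancies of different sites. A fixed total amount of TF is shared among sites with hypothetical association constants K_i; the free amount c solves total = c + Σ cK_i / (1 + cK_i), and each site's occupancy is cK_i / (1 + cK_i). Units are arbitrary, and kinetic effects such as sliding, crowding and traffic jams are absent by design.

```python
def free_tf_amount(total_tf, affinities, iterations=200):
    """Solve total_tf = c + sum_i c*K_i / (1 + c*K_i) for the free TF amount c.

    The left-hand side increases monotonically in c, so bisection on [0, total_tf]
    converges. Units are arbitrary but must be consistent (K_i in 1/units of c).
    """
    def excess(c):
        return c + sum(c * k / (1.0 + c * k) for k in affinities) - total_tf

    lo, hi = 0.0, total_tf  # excess(lo) <= 0 <= excess(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def site_occupancies(total_tf, affinities):
    """Fractional occupancy of each site at equilibrium."""
    c = free_tf_amount(total_tf, affinities)
    return [c * k / (1.0 + c * k) for k in affinities]


if __name__ == "__main__":
    affinities = [100.0, 10.0, 1.0, 0.1]  # hypothetical sites, high to low affinity
    for total in (1.0, 5.0, 50.0):
        occ = site_occupancies(total, affinities)
        print(f"total TF = {total:>5}: " + ", ".join(f"{o:.3f}" for o in occ))
```

In this example the weakest site's occupancy rises far more than the strongest site's as total abundance increases, because the strong sites are already close to saturation.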

Our paper: Improving transcriptome assembly through error correction of high-throughput sequence reads

This guest post is by Matt MacManes on his preprint with Michael Eisen, “Improving transcriptome assembly through error correction of high-throughput sequence reads”, available on arXiv. This is cross-posted from his blog.

I am writing this blog post in support of a paper that I have just submitted to arXiv: Improving transcriptome assembly through error correction of high-throughput sequence reads. My goal is not to talk about the nuts and bolts of the paper so much as it is to ramble about its motivation and the writing process.

First, a little bit about me, as this is my first paper with my postdoctoral advisor, Mike Eisen. In short, I am an evolutionary biologist by training, having done my PhD on the relationship between mating systems and immunogenes in wild rodents. My postdoc work focuses on adaptation to desert life in rodents: I work on Peromyscus rodents in the Southern California deserts, combining field work and genomics. My overarching goals include the ability to operate in multiple domains (genomics, field biology, evolutionary biology) to better understand basic questions: the links between genotype and phenotype, adaptation, etc. OK, enough; on to the paper.

Abstract:

The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. An accurate reference is critically important, as all downstream steps, including estimating transcript abundance, are critically dependent on it. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on, and while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated dataset that applying error correction to sequencing reads has significant positive effects on assembly accuracy, reducing assembly error by nearly 50%, and therefore should be applied to all datasets.

For the past couple of years, I have had an interest in better understanding the dynamics of de novo transcriptome assembly. I had mostly selfish/practical reasons for wanting to understand: a large amount of my work depends on getting these assemblies ‘right’. It quickly became evident that much of the computational research is directed at assembly itself, and very little at the pre- and post-assembly processes. We know these things are important, but often an understanding of their effects is lacking…

How error correction of sequencing reads affects assembly accuracy is one of the specific ideas I’ve been thinking about for the past several months. The idea of simulating RNAseq reads, applying various error corrections, and then understanding their effects is logical, so much so that I was really surprised that it had not been done before. So off I went.
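The simulate-then-correct idea can be sketched with a deliberately simple k-mer-spectrum corrector. This is a generic technique, not any of the correctors benchmarked in the paper (such as the one used for the AllPathsLG-corrected assembly): k-mers seen fewer than `min_count` times are treated as weak, and a read is fixed by the first single-base substitution that makes all of its k-mers solid. The function names, k, coverage, and thresholds are illustrative assumptions.

```python
import random
from collections import Counter


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def simulate_reads(transcript, read_len, n_reads, error_rate, rng):
    """Sample reads uniformly from a transcript, adding random substitution errors."""
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(transcript) - read_len + 1)
        read = list(transcript[start:start + read_len])
        for i, base in enumerate(read):
            if rng.random() < error_rate:
                read[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(read))
    return reads


def correct_read(read, counts, k, min_count):
    """Return the read if all its k-mers are solid, else try single substitutions."""
    def solid(seq):
        return all(counts[km] >= min_count for km in kmers(seq, k))

    if solid(read):
        return read
    for i, original in enumerate(read):
        for base in "ACGT":
            if base != original:
                candidate = read[:i] + base + read[i + 1:]
                if solid(candidate):
                    return candidate
    return read  # not fixable with one substitution: leave unchanged


if __name__ == "__main__":
    rng = random.Random(1)
    transcript = "".join(rng.choice("ACGT") for _ in range(500))
    reads = simulate_reads(transcript, 50, 400, 0.005, rng)
    counts = Counter(km for r in reads for km in kmers(r, 15))
    corrected = [correct_read(r, counts, 15, 3) for r in reads]
    changed = sum(a != b for a, b in zip(reads, corrected))
    print(f"{changed} of {len(reads)} reads modified")
```

Real correctors also handle indels, quality scores, and the uneven coverage typical of RNAseq, where low-expression transcripts can make genuine k-mers look weak.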

I wrote this paper over the course of a couple of weeks. It is a short and simple paper, and was quite easy to write. Of note, about 75% of the paper was written on the playground in the UC Berkeley University Village, while (loosely) supervising my 2 youngest daughters. How is that for work-life balance!

The read data will be available on Figshare, and I owe thanks to those guys for lifting the upload limit: the read file is 2.6Gb with .bz2 compression, so not huge, but not small either. The winning (AllPathsLG-corrected) assembly is there as well.

This type of work is inspired, in a very real sense, by C. Titus Brown, who is quickly becoming the go-to guy for understanding the nuts and bolts of genome assembly (and also got tenure based on his Klout score, HA!). His post and paper on The challenges of mRNAseq analysis are the type of stuff that I aspire to…

Anyway, I’d be really interested in hearing what you all think of the paper, so read, enjoy, comment, and get to error correcting those reads!

Improving transcriptome assembly through error correction of high-throughput sequence reads

Matthew D MacManes, Michael B Eisen
(Submitted on 3 Apr 2013)

The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally intensive procedure, the de novo construction of a reference transcriptome, must be completed as a prerequisite to further analyses. An accurate reference is critically important, as all downstream steps, including estimating transcript abundance, are critically dependent on it. Though a substantial amount of research has been done on assembly, only recently have the pre-assembly procedures been studied in detail. Specifically, several stand-alone error correction modules have been reported on, and while they have been shown to be effective in reducing errors at the level of sequencing reads, how error correction impacts assembly accuracy is largely unknown. Here, we show via use of a simulated dataset that applying error correction to sequencing reads has significant positive effects on assembly accuracy, reducing assembly error by nearly 50%, and therefore should be applied to all datasets.

Concurrent and Accurate RNA Sequencing on Multicore Platforms

Héctor Martínez (1), Joaquín Tárraga (2), Ignacio Medina (2), Sergio Barrachina (1), Maribel Castillo (1), Joaquín Dopazo (2), Enrique S. Quintana-Ortí (1) ((1) Dpto. de Ingeniería y Ciencia de los Computadores, Universidad Jaume I, Castellón, Spain, (2) Computational Genomics Institute, Centro de Investigación Príncipe Felipe, Valencia, Spain)
(Submitted on 2 Apr 2013)

In this paper we introduce a novel parallel pipeline for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, named HPG-Aligner, leverages the speed of the Burrows-Wheeler Transform to map a large number of RNA fragments (reads) rapidly, as well as the accuracy of the Smith-Waterman algorithm, which is employed to deal with conflictive reads. The aligner is complemented with a careful strategy to detect splice junctions based on the division of RNA reads into short segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing useful information for the successful alignment of the complete reads.
Experimental results on platforms with AMD and Intel multicore processors report the remarkable parallel performance of HPG-Aligner on short and long RNA reads, which excels in both execution time and sensitivity when compared with a state-of-the-art aligner such as TopHat 2, built on top of Bowtie and Bowtie 2.
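For reference, the Smith-Waterman local alignment score mentioned in the abstract can be computed with the classic dynamic-programming recurrence. This sketch uses a linear gap penalty and hypothetical scoring parameters, returns only the best score (no traceback), and is a plain single-threaded version rather than an optimised implementation of the kind a production aligner uses.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

For example, `smith_waterman_score("TTACGTTT", "ACGT")` returns 8 with the default parameters: the read aligns locally as four matches of 2 each.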

Low-virulence Strains of Toxoplasma gondii Result in Permanent Loss of Innate Fear of Cats in Mice, Even after Parasite Clearance

Wendy Marie Ingram, Leeanne M Goodrich, Ellen A Robey, Michael B Eisen
(Submitted on 1 Apr 2013)

Toxoplasma gondii chronic infection in rodent secondary hosts has been reported to lead to a loss of innate, hard-wired fear toward cats, its primary host. However, the generality of this response across T. gondii strains and the underlying mechanism for this pathogen-mediated behavioral change remain unknown. To begin exploring these questions, we evaluated the effects of infection with isolates from the three major North American clonal lineages of T. gondii. Using an hour-long open field activity assay optimized for this purpose, we measured mouse aversion toward predator and non-predator urines. We show that loss of innate aversion to cat urine is a general trait caused by infection with all three major clonal lineages of parasite. Surprisingly, we found that infection with an attenuated Type I parasite results in sustained loss of fear at times post infection when neither parasite nor ongoing brain inflammation was detectable. This suggests that T. gondii-mediated interruption of mouse innate aversion to cats may occur during early acute infection in a permanent manner, not requiring persistence of parasite cysts or continuing brain inflammation.

A new approach to estimate directional genetic differentiation and asymmetric migration patterns

Lisa Sundqvist, Martin Zackrisson, David Kleinhans
(Submitted on 30 Mar 2013)

In the field of population genetics, measures of genetic differentiation are widely used to gather information on the structure and the amount of gene flow between populations. These indirect measures are based on a number of simplifying assumptions, for instance equal population size and symmetric migration. Structured populations with asymmetric migration patterns frequently occur in nature, and information about directional gene flow would be of great interest here. Nevertheless, current measures of genetic differentiation cannot be used in such systems without violating the assumptions. To get information on asymmetric migration patterns from genetic data, rather complex models using maximum likelihood or Bayesian approaches generally need to be applied. In such models a large number of parameters are estimated simultaneously, which involves complex optimization algorithms. We here introduce a new approach that intends to fill the gap between the complex approaches and the symmetric measures of genetic differentiation. Our approach makes it possible to calculate a directional component of genetic differentiation at low computational effort using any of the classical measures of genetic differentiation. The approach is based on defining a pool of migrants for any pair of populations and calculating measures of genetic differentiation between the populations and the respective pools. The directional measures of genetic differentiation can further be used to calculate asymmetric migration. The procedure is demonstrated with a simulated data set with a known migration pattern. A comparison of the estimation results with the migration pattern used for simulation suggests that our method captures relevant properties of migration patterns even at low migration frequencies and with few marker loci.
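The approach is designed to work with any classical measure of differentiation. As a reference point, here is Nei's G_ST for a single locus, computed from the allele-frequency vectors of equally weighted populations. How the paper constructs the pool of migrants for a pair of populations is not reproduced here; in the approach described above, a function like this would be evaluated between each population and its respective pool.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one allele-frequency vector."""
    return 1.0 - sum(f * f for f in freqs)


def nei_gst(*populations):
    """Nei's G_ST = (H_T - H_S) / H_T for one locus, populations weighted equally."""
    n_pops = len(populations)
    n_alleles = len(populations[0])
    mean_freqs = [sum(pop[a] for pop in populations) / n_pops for a in range(n_alleles)]
    h_s = sum(expected_heterozygosity(pop) for pop in populations) / n_pops
    h_t = expected_heterozygosity(mean_freqs)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t
```

Identical populations give 0, and two populations fixed for different alleles give 1; multilocus versions typically average H_S and H_T over loci before taking the ratio.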