A novel spectral method for inferring general selection from time series genetic data

Matthias Steinrücken, Anand Bhaskar, Yun S. Song
(Submitted on 3 Oct 2013)

Recently there has been growing interest in using time series genetic variation data, either from experimental evolution studies or ancient DNA samples, to make inference about evolutionary processes. While such temporal data can facilitate identifying genomic regions under selective pressure and estimating associated fitness parameters, it is a challenging problem to compute the likelihood of the underlying selection model given DNA samples obtained at several time points. Here, we develop an efficient algorithm to tackle this challenge. The key methodological advance in our work is the development of a novel spectral method to analytically and efficiently integrate over all trajectories of the population allele frequency between consecutive time points. This advance circumvents the limitations of existing methods which require fine-tuning the discretization of the allele frequency space to approximate certain integrals using numerical schemes. Furthermore, our method is flexible enough to handle general diploid models of selection where the heterozygote and homozygote fitness parameters can take any values, while previous methods focused on only a few restricted models of selection. We demonstrate the utility of our method on simulated data and apply the method to analyze time series ancient DNA data from genetic loci (ASIP and MC1R) associated with coat coloration in horses. In contrast to the conclusions of previous studies which considered only a few special selection schemes, our exploration of the full fitness parameter space reveals that balancing selection (in the form of heterozygote advantage) may have been acting on these loci.
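
To make "general diploid selection" concrete, here is a minimal sketch of the standard deterministic (infinite-population) recursion for an allele A with arbitrary genotype fitnesses. It is not the spectral method described in the abstract, which handles genetic drift and sampled data; it only illustrates how heterozygote advantage holds an allele at an intermediate frequency. The function names and the fitness values are illustrative assumptions.

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """One generation of deterministic diploid selection on allele A at frequency p."""
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # Frequency of A after selection: marginal fitness of A over mean fitness.
    return p * (p * w_AA + q * w_Aa) / mean_fitness


def trajectory(p0, generations, w_AA, w_Aa, w_aa):
    """Allele frequency of A over time, starting from p0 (no drift, no mutation)."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], w_AA, w_Aa, w_aa))
    return freqs


def interior_equilibrium(w_AA, w_Aa, w_aa):
    """Polymorphic equilibrium of A, where both alleles have equal marginal fitness."""
    return (w_Aa - w_aa) / (2 * w_Aa - w_AA - w_aa)


# Hypothetical fitnesses with heterozygote advantage (w_Aa is the largest).
fitness = dict(w_AA=0.9, w_Aa=1.0, w_aa=0.8)
rare_start = trajectory(0.05, 500, **fitness)
common_start = trajectory(0.95, 500, **fitness)
print(rare_start[-1], common_start[-1], interior_equilibrium(**fitness))
```

With these fitnesses both starting points approach the same interior frequency, which is the signature of balancing selection that the authors report for the horse coat-color loci.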

Some mathematical tools for the Lenski experiment

Bernard Ycart (LJK), Agnès Hamon (LJK), Joël Gaffé (LAPM), Dominique Schneider (LAPM)
(Submitted on 2 Oct 2013)

The Lenski experiment is a long-term daily reproduction of Escherichia coli that has evidenced phenotypic and genetic evolution over the years. Some mathematical models that could be useful in understanding the results of that experiment are reviewed here: stochastic and deterministic growth, mutation appearance and fixation, and competition of species.

Waste Not, Want Not: Why Rarefying Microbiome Data is Inadmissible

Paul J. McMurdie, Susan Holmes
(Submitted on 1 Oct 2013)

The interpretation of count data originating from the current generation of DNA sequencing platforms requires special attention. In particular, the per-sample library sizes often vary by orders of magnitude within the same sequencing run, and the counts are overdispersed relative to a simple Poisson model. These challenges can be addressed using an appropriate mixture model that simultaneously accounts for library size differences and biological variability. This approach is already well-characterized and implemented for RNA-Seq data in R packages such as edgeR and DESeq.
We use statistical theory, extensive simulations, and empirical data to show that variance stabilizing normalization using a mixture model like the negative binomial is appropriate for microbiome count data. In simulations detecting differential abundance, normalization procedures based on a Gamma-Poisson mixture model provided systematic improvement in performance over crude proportions or rarefied counts — both of which led to a high rate of false positives. In simulations evaluating clustering accuracy, we found that the rarefying procedure discarded samples that were nevertheless accurately clustered by alternative methods, and that the choice of minimum library size threshold was critical in some settings, but with an optimum that is unknown in practice. Techniques that use variance stabilizing transformations by modeling microbiome count data with a mixture distribution, such as those implemented in edgeR and DESeq, substantially improved upon techniques that attempt to normalize by rarefying or crude proportions. Based on these results and well-established statistical theory, we advocate that investigators avoid rarefying altogether. We have provided microbiome-specific extensions to these tools in the R package, phyloseq.
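
For readers unfamiliar with the procedure being criticized, the sketch below shows what rarefying does to a small count table: every sample is subsampled without replacement to a common depth, and samples below that depth are dropped. This is a plain Python illustration, not the phyloseq implementation; the sample names, counts, and depth are made up for the example.

```python
import random


def rarefy(counts, depth, rng):
    """Subsample taxon counts without replacement to exactly `depth` reads.

    Returns None when the sample has fewer than `depth` reads, because
    rarefying discards such samples entirely.
    """
    if sum(counts) < depth:
        return None
    # Expand counts into one entry per read, labeled by taxon index.
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    kept = rng.sample(reads, depth)
    rarefied_counts = [0] * len(counts)
    for taxon in kept:
        rarefied_counts[taxon] += 1
    return rarefied_counts


# Hypothetical library sizes spanning two orders of magnitude.
samples = {
    "s1": [500, 300, 200],     # 1,000 reads
    "s2": [40, 30, 10],        # 80 reads, below the chosen depth
    "s3": [5000, 4000, 1000],  # 10,000 reads
}
depth = 100
rng = random.Random(0)  # fixed seed so the subsample is reproducible

rarefied = {name: rarefy(counts, depth, rng) for name, counts in samples.items()}
total_reads = sum(sum(counts) for counts in samples.values())
kept_reads = sum(sum(r) for r in rarefied.values() if r is not None)
discarded_fraction = 1 - kept_reads / total_reads
print(rarefied, discarded_fraction)
```

In this toy table one sample is lost outright and the large majority of reads are thrown away, which is the "waste" the title refers to.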

Identical inferences about correlated evolution arise from ancestral state reconstruction and independent contrasts

Michael G. Elliot
(Submitted on 30 Sep 2013)

Inferences about the evolution of continuous traits based on reconstruction of ancestral states have often been considered more error-prone than analysis of independent contrasts. Here we show that both methods in fact yield identical estimators for the correlation coefficient and regression gradient of correlated traits, indicating that reconstructed ancestral states are a valid source of information about correlated evolution. We show that the independent contrast associated with a pair of sibling nodes on a phylogenetic tree can be expressed in terms of the maximum likelihood ancestral state function at those nodes and their common parent. This expression gives rise to novel formulae for independent contrasts for any model of evolution admitting of a local likelihood function. We thus derive new formulae for independent contrasts applicable to traits evolving under directional drift, and use simulated data to show that these directional contrasts provide better estimates of evolutionary model parameters than standard independent contrasts, when traits in fact evolve with a directional tendency.
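
As background, the sketch below shows one step of the standard independent-contrasts calculation under Brownian motion for a pair of sibling nodes: the standardized contrast, the branch-length-weighted estimate at their parent, and the extra length added to the parent's branch. It illustrates the classical procedure only, not the new directional-drift formulae derived in the paper; the function name and the trait values are assumptions made for the example.

```python
import math


def contrast_and_parent(x1, v1, x2, v2):
    """One pruning step for sibling nodes with trait values x1, x2.

    v1 and v2 are the branch lengths (variances) leading to each sibling.
    Returns the standardized contrast, the weighted estimate of the parent's
    trait value, and the length to add to the branch above the parent.
    """
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Each sibling is weighted by the inverse of its branch length.
    parent_value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    extra_branch = v1 * v2 / (v1 + v2)
    return contrast, parent_value, extra_branch


# Hypothetical sibling tips with equal and then unequal branch lengths.
equal = contrast_and_parent(4.0, 1.0, 2.0, 1.0)
unequal = contrast_and_parent(4.0, 1.0, 2.0, 3.0)
print(equal, unequal)
```

With unequal branches the parent estimate sits closer to the sibling on the shorter branch, which is the link between contrasts and reconstructed ancestral values that the abstract builds on.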

Most viewed on Haldane’s Sieve: September 2013

The most viewed preprints on Haldane’s Sieve this month were:

Characterizing the infection-induced transcriptome of Nasonia vitripennis reveals a preponderance of taxonomically-restricted immune genes

Timothy B. Sackton, John H. Werren, Andrew G. Clark
(Submitted on 23 Sep 2013)

The innate immune system in insects consists of a conserved core signaling network and rapidly diversifying effector and recognition components, often containing a high proportion of taxonomically-restricted genes. In the absence of functional annotation, genes encoding immune system proteins can thus be difficult to identify, as homology-based approaches generally cannot detect lineage-specific genes. Here, we use RNA-seq to compare the uninfected and infection-induced transcriptome in the parasitoid wasp Nasonia vitripennis to identify genes regulated by infection. We identify 183 genes significantly up-regulated by infection and 61 genes significantly down-regulated by infection. We also produce a new homology-based immune catalog in N. vitripennis, and show that most infection-induced genes are not assigned an immune function from homology alone, suggesting the potential for substantial novel immune components in less-well-studied systems. Finally, we show that a high proportion of these novel induced genes are taxonomically-restricted, highlighting the rapid evolution of immune gene content. The combination of functional annotation using RNA-seq and homology-based annotation provides a robust method to characterize the innate immune response across a wide variety of insects, and reveals significant novel features of the Nasonia immune response.

The effect of paternal age on offspring intelligence and personality when controlling for paternal trait level

Ruben C. Arslan, Lars Penke, Wendy Johnson, William G. Iacono, Matt McGue
(Submitted on 18 Sep 2013)

Paternal age at conception has been found to predict the number of new genetic mutations. We examined the effect of father’s age at birth on offspring intelligence, head circumference and personality traits. Using the Minnesota Twin Family Study sample we tested paternal age effects while controlling for parents’ trait levels measured with the same precision as offspring’s. From evolutionary genetic considerations we predicted a negative effect of paternal age on offspring intelligence, but not on other traits. Controlling for parental IQ had the effect of turning a positive zero-order association negative. We found paternal age effects on offspring IQ and MPQ Absorption, but they were not robustly significant, nor replicable with additional covariates. No other noteworthy effects were found. Parents’ intelligence and personality correlated with their ages at twin birth, which may have obscured a small negative effect of advanced paternal age (< 1% of variance explained) on intelligence. We discuss future avenues for studies of paternal age effects and suggest that stronger research designs are needed to rule out confounding factors involving birth order and the Flynn effect.

A Survey on Migration-Selection Models in Population Genetics

Reinhard Bürger
(Submitted on 10 Sep 2013)

This survey focuses on the most important aspects of the mathematical theory of population genetic models of selection and migration between discrete niches. Such models are most appropriate if the dispersal distance is short compared to the scale at which the environment changes, or if the habitat is fragmented. The general goal of such models is to study the influence of population subdivision and gene flow among subpopulations on the amount and pattern of genetic variation maintained. Only deterministic models are treated. Because space is discrete, they are formulated in terms of systems of nonlinear difference or differential equations. A central topic is the exploration of the equilibrium and stability structure under various assumptions on the patterns of selection and migration. Another important, closely related topic concerns conditions (necessary or sufficient) for fully polymorphic (internal) equilibria. First, the theory of one-locus models with two or multiple alleles is laid out. Then, mostly very recent developments on multilocus models are presented. Finally, as an application, the analysis and results of an explicit two-locus model emerging from speciation theory are highlighted.

A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity

M. Cyrus Maher, Ryan D. Hernandez
(Submitted on 9 Sep 2013)

Ortholog detection (OD) is a critical step for comparative genomic analysis of protein-coding sequences. There is a range of methods available for OD. However, relative performance varies by application, stymying attempts to identify a single best method. In this paper, we present a novel tool, MOSAIC, which is capable of integrating the entire swath of OD methods. We analyze the results of applying MOSAIC over four methodologically diverse OD methods. Relative to component and competing methods, we demonstrate large gains in the number of detected orthologs while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality.

Inferring selective constraint and recent gain and loss of function from population genomic data

Daniel R. Schrider, Andrew D. Kern
(Submitted on 10 Sep 2013)

The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. However, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific function in the genome. Using only allele frequency information from the complete low coverage 1000 Genomes Project dataset in conjunction with a support vector machine trained on known functional and non-functional portions of the genome, we are able to identify functional portions of the genome with extremely high accuracy (~88%). Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain of function along the human lineage include a novel isoform of a killer cell immunoglobulin-like receptor, while loss of function candidates include many members of a gene cluster involved in shaping the complexity of synaptic connections in the brain. Finally, we show that the majority of the genome is currently unconstrained by natural selection, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods.