Chimeric protein complexes in hybrid species generate novel evolutionary phenotypes
Elzbieta M. Piatkowska, David Knight, Daniela Delneri
(Submitted on 19 Sep 2012)
Hybridization between species is an important mechanism for the origin of novel lineages and for adaptation to new environments. Increased allelic variation and modification of the transcriptional network are the two recognized forces currently deemed responsible for the phenotypic properties seen in hybrids. However, since the majority of biological functions in a cell are carried out by protein complexes, inter-specific protein assemblies represent another important source of natural variation upon which evolutionary forces can act. Here we studied the composition of six protein complexes in two different Saccharomyces “sensu stricto” hybrids, to understand whether chimeric interactions can be freely formed in the cell in spite of species-specific co-evolutionary forces, and whether the different types of complexes cause a change in hybrid fitness. The protein assemblies were isolated from the hybrids via affinity chromatography and identified via mass spectrometry. We found evidence of spontaneous chimericity for four of the six protein assemblies tested, and we showed that different types of complexes can cause a variety of phenotypes in selected environments. In the case of the TRP2/TRP3 complex, chimeric formation resulted in a fitness advantage for the hybrid in an environment lacking tryptophan, while only one type of parental combination of the MBF complex could confer viability to the hybrid under respiratory conditions. This study provides empirical evidence that chimeric protein complexes can freely assemble in cells and reveals a new mechanism for generating phenotypic novelty and plasticity in hybrids, complementing the genomic innovation that results from gene duplication. The ability to exchange orthologous members also has important implications for the adaptation and subsequent genome evolution of hybrids, in terms of patterns of gene loss.
Single-crossover recombination and ancestral recombination trees.
by Ellen Baake, Ute von Wangenheim
We consider the Wright-Fisher model for a population of $N$ individuals, each identified with a sequence of a finite number of sites, and single-crossover recombination between them. We trace back the ancestry of single individuals from the present population. In the $N \to \infty$ limit without rescaling of parameters or time, this ancestral process is described by a random tree, whose branching events correspond to the splitting of the sequence due to recombination. With the help of a decomposition of the trees into subtrees and an inclusion-exclusion principle, we find a closed-form expression for the probabilities of the topologies of the ancestral trees. At the same time, these probabilities lead to an explicit solution of the deterministic single-crossover equation. The latter is a discrete-time dynamical system that emerges from the Wright-Fisher model via a law of large numbers and has been waiting for a solution for many decades.
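The deterministic single-crossover equation mentioned above can at least be iterated numerically. The Python sketch below assumes alleles stored as tuples (one entry per site), a crossover probability r[a] for each link a between sites a and a+1 (summing to at most 1), and the usual form of the recursion: with probability 1 - sum(r) an offspring copies a single parent, and with probability r[a] it joins sites 0..a from one randomly drawn parent with the remaining sites from another. It only iterates the recursion; it does not reproduce the closed-form solution derived in the paper, and the function names are hypothetical.

```python
def marginal(omega, sites):
    """Marginal distribution of omega on the sites listed in `sites` (a tuple of indices)."""
    out = {}
    for seq, p in omega.items():
        key = tuple(seq[i] for i in sites)
        out[key] = out.get(key, 0.0) + p
    return out


def single_crossover_step(omega, r):
    """One generation of the (assumed) deterministic single-crossover recursion.

    omega: dict mapping a type (tuple of alleles, one per site) to its frequency.
    r: list with r[a] = probability of a crossover in link a, between sites a and a+1.
    """
    n = len(next(iter(omega)))
    if len(r) != n - 1 or sum(r) > 1.0:
        raise ValueError("need one crossover probability per link, summing to at most 1")
    # No crossover: the offspring copies a single parent.
    new = {seq: (1.0 - sum(r)) * p for seq, p in omega.items()}
    for a, ra in enumerate(r):
        head = marginal(omega, tuple(range(a + 1)))      # sites 0..a from one parent
        tail = marginal(omega, tuple(range(a + 1, n)))   # sites a+1..n-1 from another
        for h, ph in head.items():
            for t, pt in tail.items():
                new[h + t] = new.get(h + t, 0.0) + ra * ph * pt
    return new
```

For two sites this reproduces the classical result that linkage disequilibrium shrinks by the factor 1 - r per generation. A dictionary keeps only the types that actually occur, which is convenient for small examples but scales poorly with the number of sites.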
A semi-automatic method to guide the choice of ridge parameter in ridge regression
Erika Cule, Maria De Iorio
(Submitted on 3 May 2012)
We consider the application of a popular penalised regression method, ridge regression, to very high-dimensional data with many more covariates than observations. Our motivation is the problem of out-of-sample prediction, and the setting is high-density genotype data from a genome-wide association or resequencing study. Ridge regression has previously been shown to offer improved prediction performance compared with other penalised regression methods. One problem with ridge regression is the choice of an appropriate parameter for controlling the amount of shrinkage of the coefficient estimates. Here we propose a method for choosing the ridge parameter based on controlling the variance of the predicted observations in the model.
Using simulated data, we demonstrate that our method outperforms subset selection based on univariate tests of association, as well as another penalised regression method, HyperLasso regression, in terms of prediction error. We extend our approach to regression problems in which the outcomes are binary (representing cases and controls, as is typically the setting for genome-wide association studies) and demonstrate the method on a real data example consisting of case-control and genotype data on bipolar disorder, taken from the Wellcome Trust Case Control Consortium and the Genetic Association Information Network.
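One way to make "controlling the variance of the predicted observations" concrete: if Var(y) = σ²I, the ridge fitted values ŷ = Hy have total variance σ² Σ_j (d_j²/(d_j² + λ))², where d_j are the singular values of X, and this shrinks monotonically as λ grows. The Python sketch below (numpy assumed) computes that factor and picks the smallest λ on a grid whose factor falls below a target. The target-based rule and all helper names are hypothetical illustrations, not the authors' criterion.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients via the thin SVD X = U diag(d) V^T (works when p >> n)."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))


def fitted_variance_factor(X, lam):
    """tr(H H^T) for the ridge hat matrix H: total Var(y_hat) / sigma^2 when Var(y) = sigma^2 I."""
    d = np.linalg.svd(X, compute_uv=False)
    return float(np.sum((d**2 / (d**2 + lam)) ** 2))


def choose_lambda(X, target, grid):
    """Hypothetical rule: smallest positive lambda in `grid` whose variance factor <= target."""
    for lam in sorted(grid):
        if fitted_variance_factor(X, lam) <= target:
            return lam
    return max(grid)
```

The SVD is computed on the n × p matrix, so its cost is governed by the number of observations, which suits genotype data where p is far larger than n.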
LMM-Lasso: A Lasso Multi-Marker Mixed Model for Association Mapping with Population Structure Correction
Barbara Rakitsch, Christoph Lippert, Oliver Stegle, Karsten Borgwardt
(Submitted on 30 May 2012)
Exploring the genetic basis of heritable traits remains one of the central challenges in biomedical research. In simple cases, single polymorphic loci explain a significant fraction of the phenotypic variability. However, many traits of interest appear instead to be subject to multifactorial control by groups of genetic loci. Accurate detection of such multivariate associations is nontrivial and often hindered by limited power. At the same time, confounding influences such as population structure cause spurious association signals that result in false-positive findings if they are not accounted for in the model. Here, we propose LMM-Lasso, a mixed model that allows for both multi-locus mapping and correction for confounding effects. Our approach is simple and free of tuning parameters, effectively controls for population structure, and scales to genome-wide datasets. We show its practical use in genome-wide association studies and linkage mapping through retrospective analyses. In data from Arabidopsis thaliana and mouse, our method finds a genetic cause for significantly greater fractions of phenotypic variation in 91% of the phenotypes considered. At the same time, our model dissects this variability into components that result from individual SNP effects and from population structure. In addition to this increase in genetic heritability, the enrichment of known candidate genes suggests that the associations retrieved by LMM-Lasso are more likely to be genuine.
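A common way to combine a mixed model with the lasso, sketched here in Python with numpy under stated assumptions: if y = Xβ + g + e with g ~ N(0, σ_g² K) and e ~ N(0, σ_e² I), then Cov(y) ∝ K + δI with δ = σ_e²/σ_g². Rotating by the eigenvectors of K and rescaling by 1/sqrt(s + δ) makes the noise independent, after which an ordinary lasso (here plain coordinate descent) can run on the transformed data. The sketch assumes δ is already known and fixes the lasso penalty λ by hand; how the paper estimates δ and avoids tuning parameters is not reproduced, and the helper names are hypothetical.

```python
import numpy as np


def decorrelate(X, y, K, delta):
    """Rotate and rescale so that Cov(y) proportional to K + delta*I becomes proportional to I."""
    s, U = np.linalg.eigh(K)
    w = 1.0 / np.sqrt(s + delta)           # delta > 0 keeps every factor finite
    return w[:, None] * (U.T @ X), w * (U.T @ y)


def soft(z, t):
    """Scalar soft-thresholding."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                            # current residual y - X b
    col_sq = (X**2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new)     # keep the residual in sync with b
            b[j] = new
    return b
```

Because the kinship matrix enters only through one eigendecomposition, the lasso step itself sees independent noise and does not need to model relatedness.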
Kernel Approximate Bayesian Computation for Population Genetic Inferences
Shigeki Nakagome, Kenji Fukumizu, Shuhei Mano
(Submitted on 15 May 2012)
As genomic data accumulate, Bayesian inferences can be applied to estimate evolutionary parameters. However, the complexity of stochastic models used in population genetics makes it difficult to derive the likelihoods needed for Bayesian inferences. Approximate Bayesian Computation (ABC) is an alternative approach for obtaining Bayesian inferences without likelihoods. ABC is a rejection-based method that applies a tolerance of dissimilarity between sets of summary statistics from observed and simulated data. ABC gives an exact sampler from the posterior density in the limit of zero tolerance. However, the choices for summary statistics and metrics of dissimilarity are ambiguous, and acceptance rates decrease with an increasing number of summary statistics. Therefore, it is difficult to maintain estimator consistency using ABC. In this study, we apply the kernel Bayes’ rule proposed by Fukumizu et al. (2011) to ABC. We report that kernel ABC (i) avoids the need for tolerance, (ii) upholds the consistency of estimators, and (iii) is tractable for a large number of summary statistics. We demonstrate these advantages by comparing kernel ABC with conventional ABC for population genetic inferences.
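Kernel ABC replaces the accept/reject step with weights. A minimal Python sketch (numpy assumed), following our reading of the kernel Bayes' rule applied to ABC: given n simulated parameters θ_i with summary statistics s_i and observed statistics s_obs, the weights are w = (G + nεI)^{-1} k(s_obs), where G is the Gaussian-kernel Gram matrix of the s_i, and the posterior mean of θ is estimated by Σ w_i θ_i. The Gaussian kernel, the bandwidth, the regularization ε and the function names are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A (m x d) and B (k x d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * bandwidth**2))


def kernel_abc_weights(sims, s_obs, bandwidth, eps):
    """w = (G + n*eps*I)^{-1} k(s_obs): one weight per simulated parameter draw."""
    n = sims.shape[0]
    G = gaussian_gram(sims, sims, bandwidth)
    k = gaussian_gram(sims, s_obs[None, :], bandwidth)[:, 0]
    return np.linalg.solve(G + n * eps * np.eye(n), k)


# Toy usage: theta ~ Uniform(-3, 3), one summary statistic = theta + Gaussian noise.
rng = np.random.default_rng(4)
theta = rng.uniform(-3.0, 3.0, size=400)
sims = (theta + 0.2 * rng.standard_normal(400))[:, None]
w = kernel_abc_weights(sims, np.array([1.0]), bandwidth=0.3, eps=1e-3)
posterior_mean = w @ theta
```

No tolerance is needed: every simulation contributes through its weight, and some weights may be negative. Adding summary statistics only changes the dimension of the kernel inputs, not the number of accepted draws.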
Structured Input-Output Lasso, with Application to eQTL Mapping, and a Thresholding Algorithm for Fast Estimation
Seunghak Lee, Eric P. Xing
(Submitted on 9 May 2012)
We consider the problem of learning a high-dimensional multi-task regression model under sparsity constraints induced by grouping structures on the input covariates and on the outputs. This problem is primarily motivated by expression quantitative trait locus (eQTL) mapping, whose goal is to discover genetic variations in the genome (inputs) that influence the expression levels of multiple co-expressed genes (outputs), either epistatically, pleiotropically, or both. A structured input-output lasso (SIOL) model, based on an intricate l1/l2-norm penalty over the regression coefficient matrix, is employed to enable discovery of complex sparse input/output relationships, and a highly efficient new optimization algorithm called hierarchical group thresholding (HiGT) is developed to solve the resulting non-differentiable, non-separable, and ultra-high-dimensional optimization problem. We show on both simulated data and a yeast eQTL dataset that our model leads to significantly better recovery of the structured sparse relationships between inputs and outputs, and that our algorithm significantly outperforms other optimization techniques under the same model. Additionally, we propose a novel approach for efficiently and effectively detecting input interactions by exploiting prior knowledge available from biological experiments.
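Penalties of the l1/l2 type are typically handled with thresholding operators. As a hedged illustration of the building blocks only (not the HiGT algorithm itself, whose hierarchical ordering is not described here), the Python sketch below implements elementwise soft-thresholding, group thresholding (the proximal operator of t·||v||₂), and their composition, which gives the proximal operator of t1·||v||₁ + t2·||v||₂ on a single group. The function names are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Elementwise proximal operator of t * ||v||_1."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def group_threshold(v, t):
    """Proximal operator of t * ||v||_2: zero the whole group or shrink its norm by t."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]


def sparse_group_prox(v, t1, t2):
    """Proximal operator of t1*||v||_1 + t2*||v||_2 on one group: l1 step, then group step."""
    return group_threshold(soft_threshold(v, t1), t2)
```

Group thresholding either removes an entire block of coefficients or keeps it with a shrunken norm; in an eQTL setting such a block might collect the coefficients linking a group of SNPs to a cluster of genes. That all-or-nothing behaviour is what yields structured sparsity.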
An efficient group test for genetic markers that handles confounding. (arXiv:1205.0793v1 [q-bio.GN])
by Jennifer Listgarten, Christoph Lippert, David Heckerman
Approaches for testing groups of variants for association with complex traits are becoming critical. Groups typically consist of a set of rare or common variants within a gene, but could also be variants within a pathway or any other set. These tests aggregate weak signals within a group, allow interplay among variants to be captured, and reduce the multiple-hypothesis-testing burden. Unfortunately, these approaches do not address confounding by, for example, family relatedness and population structure, a problem that is becoming more important as larger data sets are used to increase power. We introduce a new approach for group tests that can handle confounding, based on Bayesian linear regression, which is equivalent to the linear mixed model. The approach uses two sets of covariates (equivalently, two random effects), one to capture the group association signal and one to capture confounding. We also introduce a computational speedup for the two-random-effects model that makes this approach feasible even for extremely large cohorts, where it otherwise would not be. Application of our approach to richly structured GAW14 data, comprising over eight ethnicities and many related family members, demonstrates that our method successfully corrects for population structure, while application to WTCCC Crohn’s disease and hypertension data demonstrates that it recovers genes not recoverable by univariate analysis while still correcting for confounding structure.
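A minimal Python sketch (numpy assumed) of a two-random-effects likelihood-ratio test in the spirit described above: the covariance of a centered phenotype y is modeled as σ²(h_g K_g + h_c K_c + (1 - h_g - h_c) I), with K_g built from the variants in the tested group and K_c from covariates that capture confounding. The overall scale σ² is profiled out and (h_g, h_c) are maximized over a coarse grid, with the null fixing h_g = 0. The grid search, the kernel normalization and the function names are assumptions; the paper's computational speedup is not reproduced.

```python
import numpy as np


def profile_loglik(y, V):
    """Gaussian log-likelihood of y ~ N(0, sigma2 * V) with sigma2 profiled out."""
    n = y.shape[0]
    L = np.linalg.cholesky(V)
    z = np.linalg.solve(L, y)               # y^T V^{-1} y = z^T z
    sigma2 = z @ z / n
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (n * np.log(2.0 * np.pi * sigma2) + logdet + n)


def two_effect_lrt(y, G, C, grid=tuple(i / 10 for i in range(10))):
    """LRT statistic for the group effect with V = hg*Kg + hc*Kc + (1 - hg - hc)*I."""
    n = y.shape[0]
    Kg = G @ G.T / G.shape[1]               # group-variant kernel
    Kc = C @ C.T / C.shape[1]               # confounding kernel
    I = np.eye(n)
    best_null = best_alt = -np.inf
    for hc in grid:
        for hg in grid:
            noise = 1.0 - hg - hc
            if noise < 0.05:                # keep V safely positive definite
                continue
            ll = profile_loglik(y, hg * Kg + hc * Kc + noise * I)
            best_alt = max(best_alt, ll)
            if hg == 0.0:                   # null model: confounding only
                best_null = max(best_null, ll)
    return 2.0 * (best_alt - best_null)
```

Because the alternative grid contains every null grid point, the statistic is never negative. Its null distribution for variance components is nonstandard, so calibrating a p-value is a separate step not shown here.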
Landscape genomic tests for associations between loci and environmental gradients
Eric Frichot (1), Sean Schoville (1), Guillaume Bouchard (2), Olivier François (1) ((1) UJF, CNRS, TIMC-IMAG, FRANCE, (2) Xerox Research Center Europe, France)
(Submitted on 15 May 2012)
Adaptation to local environments often occurs through natural selection acting on a large number of alleles, each having a weak phenotypic effect. One way to detect those alleles is to identify genetic polymorphisms that exhibit high correlation with some environmental gradient or with the variables used as proxies for ecological pressures. Here we propose an integrated framework based on population genetics, ecological modeling and machine learning techniques for screening genomes for signatures of local adaptation. We implemented fast algorithms using a hierarchical Bayesian mixed model based on a variant of principal component analysis in which residual population structure is introduced via unobserved, or latent, factors. Our algorithms can detect correlations between environmental and genetic variation while simultaneously inferring the background levels of population structure. We provide evidence that latent factor models efficiently estimate random effects due to population history and isolation-by-distance mechanisms when computing gene-environment correlations, and that they decrease the number of false-positive associations in genome scans for selection. We applied these models to plant and human genetic data and detected several genes with functions related to multicellular organ development that exhibit unusual correlations with climatic gradients.
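The latent-factor idea can be illustrated with a simplified, non-Bayesian least-squares analogue (an assumption; the paper uses a hierarchical Bayesian model). The Python sketch below (numpy assumed) models a centered genotype matrix Y (individuals × loci) as X B + W + noise, where X holds environmental variables, B their effects on each locus, and W a rank-k matrix standing in for population structure. It alternates between regressing Y - W on X and taking the best rank-k approximation of Y - X B via the SVD. The function name and the alternating scheme are hypothetical.

```python
import numpy as np


def fit_latent_factor_model(Y, X, k, n_iter=50):
    """Alternating least squares for Y ~ X @ B + W with rank(W) <= k.

    Y: n x L centered genotype matrix; X: n x d centered environmental variables.
    """
    W = np.zeros_like(Y)
    for _ in range(n_iter):
        # Environmental effects given the latent structure: least-squares regression.
        B, *_ = np.linalg.lstsq(X, Y - W, rcond=None)
        # Latent structure given the effects: best rank-k approximation of the residual.
        U, s, Vt = np.linalg.svd(Y - X @ B, full_matrices=False)
        W = (U[:, :k] * s[:k]) @ Vt[:k]
    return B, W
```

Each step exactly minimizes the squared error over one block of unknowns, so the fit can never be worse than plain regression on X alone. Loci whose entries in B stay large after W absorbs the background structure are the ones a scan of this kind would flag.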
Emergence of clones in sexual populations
Richard A. Neher, Marija Vucelja, Marc Mézard, Boris I. Shraiman
(Submitted on 9 May 2012 (v1), last revised 21 Jul 2012 (this version, v2))
In sexual populations, recombination reshuffles genetic variation and produces novel combinations of existing alleles, while selection amplifies the fittest genotypes in the population. If recombination is more rapid than selection, populations consist of a diverse mixture of many genotypes, as is observed in many populations. In the opposite regime, which is realized, for example, in facultatively sexual populations that outcross in only a fraction of reproductive cycles, selection can amplify individual genotypes into large clones. Such clones emerge when the fitness advantage of some genotypes is large enough that they grow to a significant fraction of the population despite being broken down by recombination. The occurrence of this “clonal condensation” depends, in addition to the outcrossing rate, on the heritability of fitness. Clonal condensation leads to a strong genetic heterogeneity of the population that is not adequately described by traditional population genetics measures such as linkage disequilibrium. Here we point out the similarity between clonal condensation and the freezing transition in the Random Energy Model of spin glasses. Guided by this analogy, we explicitly calculate the probability, Y, that two individuals are genetically identical as a function of the key parameters of the model. While Y is the analog of the spin-glass order parameter, it is also closely related to the rate of coalescence in population genetics: two individuals that are part of the same clone have a recent common ancestor.
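In the Random Energy Model analogy, genotypes play the role of states and their frequencies follow Boltzmann-like weights. A toy Python sketch, assuming the frequency of genotype i is w_i ∝ exp(β F_i) for fitness values F_i, with β a hypothetical stand-in for the strength of selection relative to outcrossing, computes Y = Σ w_i², the probability that two randomly drawn individuals share a genotype. This mapping and the function name identity_probability are illustrative assumptions; the paper's explicit formula for Y in terms of the model parameters is not reproduced here.

```python
import math
import random


def identity_probability(fitness, beta):
    """Y = sum_i w_i**2 with w_i proportional to exp(beta * F_i) (toy REM-style weights)."""
    m = max(beta * f for f in fitness)                 # shift exponents for numerical stability
    weights = [math.exp(beta * f - m) for f in fitness]
    z = sum(weights)
    return sum((w / z) ** 2 for w in weights)


# Usage: random Gaussian fitness values, as in the Random Energy Model.
random.seed(0)
F = [random.gauss(0.0, 1.0) for _ in range(1000)]
curve = [identity_probability(F, b) for b in (0.0, 1.0, 2.0, 5.0, 10.0)]
```

Y equals 1/N when all N genotypes are equally frequent (β = 0) and approaches 1 as a single genotype takes over, mirroring the condensed, clone-dominated phase.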