Detection of correlation between genotypes and environmental variables. A fast computational approach for genomewide studies

Gilles Guillot
(Submitted on 5 Jun 2012)

Genomic regions displaying outstanding correlation with some environmental variables are likely to be under selection, and this is the rationale of recent methods for identifying selected loci and retrieving functional information about them. To be efficient, such methods need to be able to disentangle the potential effect of environmental variables from the confounding effect of population history. For the routine analysis of genomewide data-sets, one also needs fast inference and model selection algorithms. We describe a method based on an explicit spatial model that builds on the theoretical and computational framework developed by Rue et al. (2009) and Lindgren et al. (2011). The method allows one to quantify correlation between genotypes and environmental variables and to rank loci accordingly. It works for SNP and AFLP data obtained either at the individual or at the population level. We provide R scripts with detailed comments that can be used readily for the analysis of real data without specific prior knowledge of the R language.

Lineage-specific transposons drove massive gene expression recruitments during the evolution of pregnancy in mammals

Vincent J. Lynch, Mauris Nnamani, Kathryn J. Brayer, Deena Emera, Joel O. Wertheim, Sergei L. Kosakovsky Pond, Frank Grützner, Stefan Bauersachs, Alexander Graf, Aurélie Kapusta, Cédric Feschotte, Günter P. Wagner
(Submitted on 22 Aug 2012)

A major challenge in biology is explaining how novel characters originate; however, the molecular mechanisms that underlie the emergence of evolutionary innovations are unclear. Here we show that while gene expression in the uterus evolves at a slow and relatively constant rate, it has been punctuated by periods of rapid change associated with the recruitment of thousands of genes into uterine expression during the evolution of pregnancy in mammals. We found that numerous genes and signaling pathways essential for the establishment of pregnancy and maternal-fetal communication evolved uterine expression in mammals. Remarkably, the majority of genes recruited into endometrial expression have cis-regulatory elements derived from lineage-specific transposons, suggesting that bursts of transposition facilitate adaptation and speciation through genomic and regulatory reorganization.

Blood ties: ABO is a trans-species polymorphism in primates

Laure Ségurel, Emma E. Thompson, Timothée Flutre, Jessica Lovstad, Aarti Venkat, Susan W. Margulis, Jill Moyse, Steve Ross, Kathryn Gamble, Guy Sella, Carole Ober, Molly Przeworski
(Submitted on 22 Aug 2012)

The ABO histo-blood group, the critical determinant of transfusion incompatibility, was the first genetic polymorphism discovered in humans. Remarkably, ABO antigens are also polymorphic in many other primates, with the same two amino acid changes responsible for A and B specificity in all species sequenced to date. Whether this recurrence of A and B antigens is the result of an ancient polymorphism maintained across species or due to numerous, more recent instances of convergent evolution has been debated for decades, with a current consensus in support of convergent evolution. We show instead that genetic variation data in humans and gibbons as well as in Old World Monkeys are inconsistent with a model of convergent evolution and support the hypothesis of an ancient, multi-allelic polymorphism of which some alleles are shared by descent among species. These results demonstrate that the ABO polymorphism is a trans-species polymorphism among distantly related species and has remained under balancing selection for tens of millions of years. To date, it is the only such example in Hominoids and Old World Monkeys outside of the Major Histocompatibility Complex.

Our paper: The Genomic Signature of Crop-Wild Introgression in Maize

Our inaugural author post is by Matt Hufford and Jeff Ross-Ibarra [@lab_ri] on their paper:
The Genomic Signature of Crop-Wild Introgression in Maize ArXived here.

Evolutionary biologists have long been fascinated by introgressive hybridization. Numerous examples in which introgression has played an important evolutionary role are known, but genetic characterization has typically focused on only a handful of loci.

We took advantage of the recent development of inexpensive genotyping to address a long-standing question of introgression in maize evolution. Maize was domesticated in the warm low elevations of southwest Mexico, and likely colonized the highlands of central Mexico only thousands of years later. Maize is frequently cultivated in sympatry with its wild relatives the teosintes and is known to hybridize with them. Hybridization is especially common in the highlands, where maize and teosinte share several derived morphological features thought to be adaptive to high elevation.

We set out to discover the genomic extent of introgression in highland maize and teosinte populations and the degree to which this has been adaptive. We genotyped 9 sympatric population pairs of maize and teosinte at ~39,000 SNPs. We used two different algorithms (in the software STRUCTURE and HAPMIX) to model chromosomes as mosaics of maize and teosinte, and characterized regions of putative introgression. Surprisingly, we found shared regions of introgression across many populations and primarily only from teosinte into maize. To test whether this introgression may have facilitated maize adaptation to the highlands, we conducted a growth chamber experiment that revealed significant differences in putatively adaptive morphological traits between maize populations with and without introgression.

We submitted the paper to arXiv because this is a fast-moving area for empirical evolutionary genomics and we hoped to start the dialogue early on how to move forward with our results. We’d like feedback on the paper and specifically the following questions:

Are there recent advances in modeling admixture and introgression that we should apply?

Are our main findings surprising considering the putative history of maize diffusion?

Matt Hufford and Jeff Ross-Ibarra

The variance of identity-by-descent sharing in the Wright-Fisher model

Shai Carmi, Pier Francesco Palamara, Vladimir Vacic, Todd Lencz, Ariel Darvasi, Itsik Pe’er
(Submitted on 21 Jun 2012)

Widespread sharing of long, identical-by-descent (IBD) genetic segments is a hallmark of populations that have experienced a recent bottleneck. The detection of these IBD segments is now feasible, enabling a wide range of applications from phasing and imputation to demographic inference. Here, we study the distribution of IBD sharing in the Wright-Fisher model. Using coalescent theory, we calculate the mean and variance of the total sharing between arbitrary pairs of individuals. We then study the cohort-averaged sharing: the average total sharing between one individual and the rest of the cohort. We find that for large cohorts, the cohort-averaged sharing is distributed approximately normally. Surprisingly, the variance of this distribution remains large even for large cohorts, implying the existence of “hyper-sharing” individuals. The presence of such individuals bears important consequences for the design of sequencing studies, since, if they are selected for whole-genome sequencing, a larger fraction of the cohort can be subsequently imputed. We calculate the expected gain in power of imputation by IBD, and subsequently, in power to detect an association, when individuals are either randomly selected or are specifically the hyper-sharing individuals. Finally, we study the distribution of pairwise sharing and cohort-averaged sharing in the Ashkenazi Jewish population.
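
The cohort-averaged sharing defined in the abstract can be illustrated with a small Python sketch. It is not the authors' code: the function name, the matrix layout, and the sharing values are made up for illustration, and picking the individual with the largest average is only one plausible reading of "hyper-sharing".

```python
def cohort_averaged_sharing(pairwise):
    """For each individual, average its total IBD sharing with everyone else.

    `pairwise` is assumed to be a symmetric matrix (list of lists) whose
    entry [i][j] is the total sharing between individuals i and j; the
    diagonal is ignored.
    """
    n = len(pairwise)
    if n < 2:
        raise ValueError("need at least two individuals")
    return [
        sum(row[j] for j in range(n) if j != i) / (n - 1)
        for i, row in enumerate(pairwise)
    ]


# Hypothetical total sharing for four individuals (arbitrary units).
pairwise = [
    [0.0, 10.0, 20.0, 30.0],
    [10.0, 0.0, 4.0, 6.0],
    [20.0, 4.0, 0.0, 2.0],
    [30.0, 6.0, 2.0, 0.0],
]

averaged = cohort_averaged_sharing(pairwise)
# Index of the individual with the largest cohort-averaged sharing.
hyper = max(range(len(averaged)), key=averaged.__getitem__)
```

In this toy cohort the first individual shares far more with everyone else than the others do, which is the kind of individual the abstract suggests prioritizing for sequencing.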

Integrated analysis of variants and pathways in genome-wide association studies using polygenic models of disease

Peter Carbonetto, Matthew Stephens
(Submitted on 21 Aug 2012)

Many common diseases are highly polygenic, modulated by a large number of genetic factors with small effects on susceptibility to disease. These small effects are difficult to map reliably in genetic association studies. To address this problem, researchers have developed methods that aggregate information over sets of related genes, such as biological pathways, to identify gene sets that are enriched for genetic variants associated with disease. However, these methods fail to answer a key question: which genes and genetic variants are associated with disease risk? We develop a method based on sparse multiple regression that simultaneously identifies enriched pathways, and prioritizes the variants within these pathways, to locate additional variants associated with disease susceptibility. A central feature of our approach is an estimate of the strength of enrichment, which yields a coherent way to prioritize variants in enriched pathways. We illustrate the benefits of our approach in a genome-wide association study of Crohn’s disease with ~440,000 genetic variants genotyped for ~4700 study subjects. We obtain strong support for enrichment of IL-12, IL-23 and other cytokine signaling pathways. Furthermore, prioritizing variants in these enriched pathways yields support for additional disease-association variants, all of which have been independently reported in other case-control studies for Crohn’s disease.

Approximate Bayesian computation via empirical likelihood

K. L. Mengersen (QUT, Brisbane), P. Pudlo (Universite Montpellier 2), C. P. Robert (Universite Paris-Dauphine)
(Submitted on 25 May 2012)

Approximate Bayesian computation (ABC) has now become an essential tool for the analysis of complex stochastic models when the likelihood function is unavailable. The well-established statistical method of empirical likelihood, however, provides another route to such settings that bypasses simulations from the model and the choices of the ABC parameters (summary statistics, distance, tolerance), while being provably convergent in the number of observations. Furthermore, avoiding model simulations leads to significant time savings in complex models, such as those used in population genetics. The ABCel algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard and quantile distributions, and time series and population genetics models.
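
For readers unfamiliar with the three ABC choices the abstract mentions (summary statistics, distance, tolerance), here is a minimal Python sketch of standard rejection ABC, the simulation-based approach that ABCel is meant to bypass. It is not the ABCel algorithm. The Gaussian toy model, the uniform prior, and all names and parameter values are assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulate, summary,
                  tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary is close to the observed one.

    The distance here is the absolute difference between scalar summaries.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                # draw a candidate parameter
        s_sim = summary(simulate(theta, rng))     # simulate and summarize
        if abs(s_sim - s_obs) <= tolerance:       # distance vs. tolerance
            accepted.append(theta)
    return accepted


rng = random.Random(0)
# Toy "observed" data: 50 draws from a normal with (unknown) mean 2.
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    tolerance=0.2,
    n_draws=2000,
    rng=rng,
)
```

Every candidate costs one full model simulation, which is the expense the abstract says is avoided by replacing simulation with an empirical likelihood.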

Population genomics of the Wolbachia endosymbiont in Drosophila melanogaster

Mark F. Richardson, Lucy A. Weinert, John J. Welch, Raquel S. Linheiro, Michael M. Magwire, Francis M. Jiggins, Casey M. Bergman
(Submitted on 25 May 2012 (v1), last revised 2 Aug 2012 (this version, v2))

Wolbachia are maternally-inherited symbiotic bacteria commonly found in arthropods, which are able to manipulate the reproduction of their host in order to maximise their transmission. Here we use whole genome resequencing data from 290 lines of Drosophila melanogaster from North America, Europe and Africa to predict Wolbachia infection status, estimate cytoplasmic genome copy number, and reconstruct Wolbachia and mtDNA genome sequences. Complete Wolbachia and mitochondrial genomes show congruent phylogenies, consistent with strict vertical transmission through the maternal cytoplasm and recurrent loss of Wolbachia in multiple populations. Bayesian phylogenetic analysis reveals that the most recent common ancestor of all Wolbachia and mitochondrial genomes in D. melanogaster dates to around 8,000 years ago. We find evidence for a recent incomplete global replacement of ancestral Wolbachia and mtDNA lineages, which is likely to be one of several similar incomplete replacement events that have occurred since the out-of-Africa migration that allowed D. melanogaster to colonize worldwide habitats.

Single–crossover recombination and ancestral recombination trees.

by Ellen Baake, Ute von Wangenheim

We consider the Wright-Fisher model for a population of $N$ individuals, each identified with a sequence of a finite number of sites, and single-crossover recombination between them. We trace back the ancestry of single individuals from the present population. In the $N \to \infty$ limit without rescaling of parameters or time, this ancestral process is described by a random tree, whose branching events correspond to the splitting of the sequence due to recombination. With the help of a decomposition of the trees into subtrees and an inclusion-exclusion principle, we find a closed-form expression for the probabilities of the topologies of the ancestral trees. At the same time, these probabilities lead to an explicit solution of the deterministic single-crossover equation. The latter is a discrete-time dynamical system that emerges from the Wright-Fisher model via a law of large numbers and has been waiting for a solution for many decades.
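
The backward-in-time picture in the abstract can be sketched as a small simulation. This is a simplified reading of the model, not the authors' construction: it assumes that in the infinite-population limit ancestral lines never rejoin, that at most one crossover happens per individual per generation, and that a segment is cut at link `i` (between sites `i` and `i + 1`) with probability `rho[i]`. The function name and the rates are hypothetical.

```python
import random


def trace_ancestry(n_sites, rho, generations, rng):
    """Follow one individual's sequence backward under single crossover.

    Returns the ancestral segments as (first_site, last_site) pairs; each
    segment sits in a different ancestor after `generations` steps.
    """
    if len(rho) != n_sites - 1 or sum(rho) > 1.0 + 1e-12:
        raise ValueError("need one rate per link, summing to at most 1")
    segments = [(0, n_sites - 1)]
    for _ in range(generations):
        next_segments = []
        for first, last in segments:
            u = rng.random()
            cumulative = 0.0
            cut = None
            # Only links inside the segment can split it.
            for link in range(first, last):
                cumulative += rho[link]
                if u < cumulative:
                    cut = link
                    break
            if cut is None:
                next_segments.append((first, last))
            else:
                # A branching event: the two parts go to different parents.
                next_segments.extend([(first, cut), (cut + 1, last)])
        segments = next_segments
    return segments


rng = random.Random(1)
segments = trace_ancestry(5, [0.1, 0.2, 0.1, 0.3], 20, rng)
```

Recording the order of the cuts, rather than only the final segments, would give the tree topology whose probabilities the paper derives in closed form.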

Generative Probabilistic Model for Detecting Selection on Dispersed Genomic Elements from Polymorphism and Divergence

Ilan Gronau, Leonardo Arbiza, Adam Siepel
(Submitted on 29 Sep 2011 (v1), last revised 13 Aug 2012 (this version, v3))

We present a new probabilistic method for measuring the influence of natural selection on a collection of short elements scattered across a genome based on observed patterns of polymorphism and divergence. This is a challenging task for various reasons, including variation across loci in mutation rates and genealogical backgrounds, and the influence of demography on patterns of polymorphism. In addition, accounting for the combined effects of different modes of selection is known to be a serious challenge for tests of selection that use patterns of polymorphism and divergence. Our method addresses these challenges by contrasting patterns of polymorphism and divergence in the elements of interest with those in flanking neutral sites. While this general approach is common to several existing tests of selection, our method improves substantially on these methods by making use of a full generative probabilistic model, directly accommodating weak negative selection, allowing information from many short elements to be combined in a statistically rigorous manner, and integrating phylogenetic information from multiple outgroup species with genome-wide population genetic data. Our model is able to account for weak negative, strong negative, and strong positive selection, by making a small set of simple assumptions on their separate effects on polymorphism and divergence. We implemented an expectation maximization algorithm for inference under this model and applied it to simulated and real data. Using simulations, we show that our inference procedure effectively disentangles the different modes of selection and provides accurate estimates of the parameters of interest that are robust to demography. We demonstrate an application of our methods to real data by analyzing several collections of human transcription factor binding sites identified using recently generated genome-wide ChIP-seq data.