Functional analysis and co-evolutionary model of chromatin and DNA methylation networks in embryonic stem cells

Enrique Carrillo de Santa Pau, Juliane Perner, David Juan, Simone Marsili, David Ochoa, Ho-Ryun Chung, Daniel Rico, Martin Vingron, Alfonso Valencia
doi: http://dx.doi.org/10.1101/008821
We have analyzed publicly available epigenomic data from mouse embryonic stem cells (ESCs), combining diverse next-generation sequencing (NGS) studies (139 experiments from 30 datasets with a total of 77 epigenomic features) into a homogeneous dataset comprising various cytosine modifications (5mC, 5hmC and 5fC), histone marks and chromatin-related proteins (CrPs). We applied a set of newly developed statistical analysis methods with the goal of understanding the associations between chromatin states, detecting co-occurrence of DNA-protein binding and epigenetic modification events, and detecting coevolution of core CrPs. The resulting networks reveal the complex relations between cytosine modifications and protein complexes and their dependence on defined ESC chromatin contexts. A detailed analysis allows us to detect proteins associated with particular chromatin states whose functions are related to the different cytosine modifications, i.e. RYBP with 5fC and 5hmC, NIPBL with 5hmC and OGT with 5hmC. Moreover, a co-evolutionary analysis suggests a central role of the Cohesin complex in the evolution of the epigenomic network, as well as strong co-evolutionary links between proteins that co-locate in the ESC epigenome with DNA methylation (MBD2 and CBX3) and hydroxymethylation (TET1 and KDM2A). In summary, the new application of computational methodologies reveals the complex network of relations between cytosine modifications and epigenomic players that is essential in shaping the molecular state of ESCs.

Generation of a Panel of Induced Pluripotent Stem Cells From Chimpanzees: a Resource for Comparative Functional Genomics

Irene Gallego Romero, Bryan J Pavlovic, Irene Hernando-Herraez, Nicholas E Banovich, Courtney L Kagan, Jonathan E Burnett, Constance H Huang, Amy Mitrano, Claudia I Chavarria, Inbar F Ben-Nun, Yingchun Li, Karen Sabatini, Trevor Leonardo, Mana Parast, Tomas Marques-Bonet, Louise C Laurent, Jeanne F Loring, Yoav Gilad
doi: http://dx.doi.org/10.1101/008862

Comparative genomics studies in primates are extremely restricted because we only have access to a few types of cell lines from non-human apes and to a limited collection of frozen tissues. In order to gain better insight into regulatory processes that underlie variation in complex phenotypes, we must have access to faithful model systems for a wide range of tissues and cell types. To facilitate this, we have generated a panel of 7 fully characterized chimpanzee (Pan troglodytes) induced pluripotent stem cell (iPSC) lines derived from fibroblasts of healthy donors. All lines appear to be free of integration from exogenous reprogramming vectors, can be maintained using standard iPSC culture techniques, and have proliferative and differentiation potential similar to human and mouse lines. To begin demonstrating the utility of comparative iPSC panels, we collected RNA sequencing data and methylation profiles from the chimpanzee iPSCs and their corresponding fibroblast precursors, as well as from 7 human iPSCs and their precursors, which were of multiple cell type and population origins. Overall, we observed much less regulatory variation within species in the iPSCs than in the somatic precursors, indicating that the reprogramming process has erased many of the differences observed between somatic cells of different origins. We identified 4,918 differentially expressed genes and 3,598 differentially methylated regions between iPSCs of the two species, many of which are novel inter-species differences that were not observed between the somatic cells of the two species. Our panel will help realise the potential of iPSCs in primate studies, and in combination with genomic technologies, transform studies of comparative evolution.

Mixed Model with Correction for Case-Control Ascertainment Increases Association Power

Tristan Hayeck, Noah Zaitlen, Po-Ru Loh, Bjarni Vilhjalmsson, Samuela Pollack, Alexander Gusev, Jian Yang, Guo-Bo Chen, Michael E. Goddard, Peter M. Visscher, Nick Patterson, Alkes Price
doi: http://dx.doi.org/10.1101/008755

We introduce a Liability Threshold Mixed Linear Model (LTMLM) association statistic for ascertained case-control studies that increases power vs. existing mixed model methods, with a well-controlled false-positive rate. Recent work has shown that existing mixed model methods suffer a loss in power under case-control ascertainment, but no solution has been proposed. Here, we solve this problem using a chi-square score statistic computed from posterior mean liabilities (PML) under the liability threshold model. Each individual’s PML is conditional not only on that individual’s case-control status, but also on every individual’s case-control status and on the genetic relationship matrix obtained from the data. The PML are estimated using a multivariate Gibbs sampler, with the liability-scale phenotypic covariance matrix based on the genetic relationship matrix (GRM) and a heritability parameter estimated via Haseman-Elston regression on case-control phenotypes followed by transformation to liability scale. In simulations of unrelated individuals, the LTMLM statistic was correctly calibrated and achieved higher power than existing mixed model methods in all scenarios tested, with the magnitude of the improvement depending on sample size and severity of case-control ascertainment. In a WTCCC2 multiple sclerosis data set with >10,000 samples, LTMLM was correctly calibrated and attained a 4.1% improvement (P=0.007) in chi-square statistics (vs. existing mixed model methods) at 75 known associated SNPs, consistent with simulations. Larger increases in power are expected at larger sample sizes. In conclusion, an increase in power over existing mixed model methods is available for ascertained case-control studies of diseases with low prevalence.
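
As a rough illustration of the posterior mean liability idea, the sketch below computes the PML for a single unrelated individual under the liability threshold model, conditioning only on that individual's own case-control status. This is a deliberate simplification: the PML described in the abstract also conditions on every other individual's status and on the GRM via a multivariate Gibbs sampler, which is not reproduced here. The function name `univariate_pml` and the prevalence value are illustrative assumptions, not taken from the paper.

```python
from statistics import NormalDist


def univariate_pml(is_case: bool, prevalence: float) -> float:
    """Mean of a standard normal liability truncated at the disease threshold.

    Assumes liability ~ N(0, 1) and that cases are exactly the individuals
    whose liability exceeds t = Phi^{-1}(1 - prevalence). Relatedness and the
    status of other individuals are ignored (hypothetical simplification).
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    density_at_threshold = std_normal.pdf(threshold)
    if is_case:
        # E[l | l > t] = phi(t) / K
        return density_at_threshold / prevalence
    # E[l | l <= t] = -phi(t) / (1 - K)
    return -density_at_threshold / (1.0 - prevalence)


K = 0.01  # assumed population prevalence, for illustration only
case_pml = univariate_pml(True, K)
control_pml = univariate_pml(False, K)
```

In this simplified setting the prevalence-weighted average of the two values is zero, as expected for a liability with mean zero in the population; cases of a rare disease receive a large positive liability while controls sit just below zero.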

Cross-population Meta-analysis of eQTLs: Fine Mapping and Functional Study

Xiaoquan Wen, Francesca Luca, Roger Pique-Regi
doi: http://dx.doi.org/10.1101/008797

Mapping expression quantitative trait loci (eQTLs) has proven to be a powerful tool for uncovering the genetic underpinnings of many complex traits at the molecular level. In this paper, we present an integrative analysis approach that leverages eQTL data collected from multiple population groups. In particular, our approach effectively identifies multiple independent cis-eQTL signals that are consistently present across populations, accounting for population heterogeneity in allele frequencies and linkage disequilibrium patterns. Furthermore, by integrating genomic annotations, our analysis framework enables high-resolution functional analysis of eQTLs. We applied our statistical approach to analyze the GEUVADIS data consisting of samples from five population groups. From this analysis, we concluded that i) joint analysis across population groups greatly improves the power of eQTL discovery and the resolution of fine mapping of causal eQTLs; ii) many genes harbor multiple independent eQTLs in their cis regions; and iii) genetic variants that disrupt transcription factor binding are significantly enriched in eQTLs (p-value = 4.93 × 10^-22).

The projection of a test genome onto a reference population and applications to humans and archaic hominins

Melinda A Yang, Montgomery Slatkin
doi: http://dx.doi.org/10.1101/008805

We introduce a method for comparing a test genome with numerous genomes from a reference population. Sites in the test genome are given a weight w that depends on the allele frequency x in the reference population. The projection of the test genome onto the reference population is the average weight for each x, w(x). The weight is assigned in such a way that if the test genome is a random sample from the reference population, w(x)=1. Using analytic theory, numerical analysis, and simulations, we show how the projection depends on the time of population splitting, the history of admixture and changes in past population size. The projection is sensitive to small amounts of past admixture, the direction of admixture and admixture from a population not sampled (a ghost population). We compute the projection of several human and two archaic genomes onto three reference populations from the 1000 Genomes Project, Europeans (CEU), Han Chinese (CHB) and Yoruba (YRI), and discuss the consistency of our analysis with previously published results for European and Yoruba demographic history. Including higher amounts of admixture between Europeans and Yoruba soon after their separation and low amounts of admixture more recently can resolve discrepancies between the projections and demographic inferences from some previous studies.
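
The normalization w(x)=1 for a randomly sampled test genome can be made concrete with a small sketch. The abstract does not state the weight explicitly, so the code below assumes one weighting that satisfies the stated property: a diploid test genome carrying g derived alleles at a site with reference derived-allele frequency x receives weight g/(2x). The function names and the toy genotype data are hypothetical.

```python
from collections import defaultdict


def site_weight(derived_count: int, ref_freq: float) -> float:
    """Assumed weight for one site: g / (2x), with g in {0, 1, 2}."""
    if derived_count not in (0, 1, 2):
        raise ValueError("derived_count must be 0, 1 or 2 for a diploid genome")
    if not 0.0 < ref_freq <= 1.0:
        raise ValueError("ref_freq must lie in (0, 1]")
    return derived_count / (2.0 * ref_freq)


def projection(test_genotypes, ref_freqs):
    """Average weight of the test genome for each reference frequency x."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, x in zip(test_genotypes, ref_freqs):
        totals[x] += site_weight(g, x)
        counts[x] += 1
    return {x: totals[x] / counts[x] for x in sorted(totals)}


def expected_weight_random_sample(x: float) -> float:
    """Expected weight when the test genome is drawn from the reference
    population itself (genotypes in Hardy-Weinberg proportions)."""
    genotype_probs = {0: (1 - x) ** 2, 1: 2 * x * (1 - x), 2: x ** 2}
    return sum(prob * site_weight(g, x) for g, prob in genotype_probs.items())


# Toy data: four sites, three at x = 0.5 and one at x = 0.25.
toy_projection = projection([0, 1, 2, 1], [0.5, 0.5, 0.5, 0.25])
```

Under this assumed weighting the expectation is (1-x) + x = 1 for every x, so departures of the observed projection from 1 reflect demographic differences between the test genome and the reference population rather than the weighting itself.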

Highly epistatic genetic architecture of root length in Arabidopsis thaliana

Jennifer Lachowiec, Xia Shen, Christine Queitsch, Örjan Carlborg
doi: http://dx.doi.org/10.1101/008789

Efforts to identify loci underlying complex traits generally assume that most genetic variance is additive. This is despite the fact that non-additive genetic effects, such as epistatic interactions and developmental noise, are also likely to make important contributions to the phenotypic variability. Analyses beyond additivity require additional care in the design and collection of data, and introduce significant analytical and computational challenges in the statistical analyses. Here, we have conducted a study that overcomes these challenges by focusing on a model complex trait that allows precise phenotyping across many replicates and by applying advanced analytical tools capable of capturing epistatic interactions. Specifically, we examined the genetic determinants of Arabidopsis thaliana root length, considering both trait mean and variance. Analysis of narrow- and broad-sense heritability of mean root length identified a large contribution of non-additive variation and a low contribution of additive variation. Also, no loci were found to contribute to mean root length using a standard additive-model-based genome-wide association analysis (GWAS). We could, however, identify one locus regulating developmental noise and seven loci contributing to root length through epistatic interactions, and four of these were also experimentally confirmed. The candidate locus associated with root length variance contains a candidate gene that, when mutated, appears to decrease developmental noise. This is particularly interesting as most other known noise regulators in multicellular organisms increase noise when mutated. The mutant analysis of candidate genes within the seven epistatic loci implicated four genes in root development, including three without previously described root phenotypes.
In summary, we identify several novel genes affecting root development, demonstrate the benefits of advanced analytical tools to study the genetic determinants of complex traits, and show that epistatic interactions can be a major determinant of complex traits in A. thaliana.

The general recombination equation in continuous time and its solution

Ellen Baake, Michael Baake, Majid Salamat
(Submitted on 4 Sep 2014)

The process of recombination in population genetics, in its deterministic limit, leads to a nonlinear ODE in the Banach space of finite measures on a locally compact product space. It has an embedding into a larger family of nonlinear ODEs that permits a systematic analysis with lattice-theoretic methods for general partitions of finite sets. We discuss this type of system, reduce it to an equivalent finite-dimensional nonlinear problem, and solve the latter recursively for generic sets of parameters. We also briefly discuss the singular cases, and how to extend the solution to this situation.
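
To give a feel for the kind of system involved, the sketch below treats only the simplest special case: two sites with a single recombination rate, where the haplotype distribution p evolves as dp/dt = rho (R(p) - p) and R(p) is the product of the one-site marginals. Because the marginals are conserved, this case has the elementary closed form p(t) = e^(-rho t) p(0) + (1 - e^(-rho t)) R(p(0)). This is not the general lattice-theoretic solution described in the abstract; the function names, the rate and the initial distribution are all assumptions made for illustration.

```python
import math


def marginal_product(p):
    """Two-site recombinator: product of the one-site marginals.

    p[i][j] is the frequency of the haplotype with allele i at site 1 and
    allele j at site 2; p is assumed to sum to 1.
    """
    row = [sum(r) for r in p]
    col = [sum(p[i][j] for i in range(len(p))) for j in range(len(p[0]))]
    return [[row[i] * col[j] for j in range(len(col))] for i in range(len(row))]


def closed_form(p0, rho, t):
    """Exact solution for the two-site case (marginals are conserved)."""
    decay = math.exp(-rho * t)
    r = marginal_product(p0)
    return [[decay * p0[i][j] + (1.0 - decay) * r[i][j]
             for j in range(len(p0[0]))] for i in range(len(p0))]


def euler(p0, rho, t, steps=20000):
    """Explicit Euler integration of dp/dt = rho * (R(p) - p)."""
    dt = t / steps
    p = [row[:] for row in p0]
    for _ in range(steps):
        r = marginal_product(p)
        p = [[p[i][j] + dt * rho * (r[i][j] - p[i][j])
              for j in range(len(p[0]))] for i in range(len(p))]
    return p


p0 = [[0.4, 0.1], [0.1, 0.4]]  # toy initial haplotype frequencies
rho, t = 0.5, 2.0              # assumed recombination rate and time
exact = closed_form(p0, rho, t)
approx = euler(p0, rho, t)
```

In this two-site case the linkage disequilibrium p[0][0] minus the product of the corresponding marginals decays as e^(-rho t), which the numerical and closed-form results both reproduce. With more than two sites the nonlinearity no longer collapses this way, which is where the recursive solution mentioned in the abstract comes in.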

Tracing the genetic origin of Europe’s first farmers reveals insights into their social organization

Anna Szécsényi-Nagy, Guido Brandt, Victoria Keerl, János Jakucs, Wolfgang Haak, Sabine Möller-Rieker, Kitti Köhler, Balázs Mende, Marc Fecher, Krisztián Oross, Tibor Paluch, Anett Osztás, Viktória Kiss, György Pálfi, Erika Molnár, Katalin Sebők, András Czene, Tibor Paluch, Mario Šlaus, Mario Novak, Nives Pećina-Šlaus, Brigitta Ősz, Vanda Voicsek, Krisztina Somogyi, Gábor Tóth, Bernd Kromer, Eszter Bánffy, Kurt Alt

Farming was established in Central Europe by the Linearbandkeramik culture (LBK), a well-investigated archaeological horizon, which emerged in the Carpathian Basin, in today’s Hungary. However, the genetic background of the LBK genesis has not yet been revealed. Here we present 9 Y chromosomal and 84 mitochondrial DNA profiles from Mesolithic, Neolithic Starčevo and LBK sites (7th/6th millennium BC) from the Carpathian Basin and south-eastern Europe. We detect genetic continuity of both maternal and paternal elements during the initial spread of agriculture, and confirm the substantial genetic impact of early farming south-eastern European and Carpathian Basin cultures on Central European populations of the 6th-4th millennium BC. Our comprehensive Y chromosomal and mitochondrial DNA population genetic analyses demonstrate a clear affinity of the early farmers to the modern Near East and Caucasus, tracing the expansion from that region through south-eastern Europe and the Carpathian Basin into Central Europe. Our results also reveal contrasting patterns for male and female genetic diversity in the European Neolithic, suggesting a patrilineal descent system and patrilocal residential rules among the early farmers.

Conservation of expression regulation throughout the animal kingdom

Michael Kuhn, Andreas Beyer
doi: http://dx.doi.org/10.1101/007252

Gene expression programs have been found to be highly conserved between closely related species, especially when comparing the same tissue types between species. Such analysis is, however, much more challenging over larger evolutionary distances when complementary tissues cannot readily be defined. Here, we present the first cross-species mapping of tissue-specific and developmental gene expression patterns across a wide range of animals, including many non-model species. Importantly, our approach does not require the definition of homologous tissues. In our survey of 32 datasets across 23 species, we detected conserved expression programs on all taxonomic levels, both within animals and between the animals and their closest unicellular relatives, the choanoflagellates. We found that the rate of change in tissue expression patterns is a property of gene families. Subsequently, we used the conservation of expression programs as a means to identify neofunctionalization of gene duplication products. We found 1206 duplication events where one of the two genes kept the expression program of the original gene, whereas the other copy adopted a novel expression program. We corroborated such potential neofunctionalizations using independent network information: the duplication product with the more conserved expression pattern shared more interaction partners with the non-duplicated reference gene than the more divergent duplication product. Our findings open new avenues of study for the comparison and transfer of knowledge between different species.

Looking down in the ancestral selection graph: A probabilistic approach to the common ancestor type distribution

Ute Lenz, Sandra Kluth, Ellen Baake, Anton Wakolbinger
(Submitted on 2 Sep 2014)

In a (two-type) Wright-Fisher diffusion with directional selection and two-way mutation, let x denote today’s frequency of the beneficial type, and given x, let h(x) be the probability that, among all individuals of today’s population, the individual whose progeny will eventually take over in the population is of the beneficial type. Fearnhead [Fearnhead, P., 2002. The common ancestor at a nonneutral locus. J. Appl. Probab. 39, 38-54] and Taylor [Taylor, J. E., 2007. The common ancestor process for a Wright-Fisher diffusion. Electron. J. Probab. 12, 808-847] obtained a series representation for h(x). We develop a construction that contains elements of both the ancestral selection graph and the lookdown construction and includes pruning of certain lines upon mutation. Besides interest in its own right, this construction allows a transparent derivation of the series coefficients of h(x) and gives them a probabilistic meaning.