Whole Genome Regulatory Variant Evaluation for Transcription Factor Binding

Haoyang Zeng , Tatsunori Hashimoto , Daniel D. Kang , David K. Gifford
doi: http://dx.doi.org/10.1101/017392

Contemporary approaches to predict single nucleotide polymorphisms (SNPs) that alter transcription factor binding rely upon the sequence affinity of a transcription factor as represented by its canonical motif. WAVE (Whole-genome regulAtory Variants Evaluation) is a novel method for predicting more general regulatory variants that affect transcription factor binding, including those that fall outside of the canonical motif. WAVE learns a k-mer based generative model of transcription factor binding from ChIP-seq data and scores variants using its generative binding model. The k-mers learned by WAVE capture more of the sequence features involved in transcription factor binding than a motif-based approach alone, including both a transcription factor’s canonical motif and associated co-factor motifs. WAVE significantly outperforms motif-based methods in predicting SNPs associated with allele-specific binding.
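To make the idea of k-mer based variant scoring concrete, here is a minimal sketch that scores a SNP by how much it changes the summed k-mer binding evidence in a sequence window. It assumes a hypothetical table of per-k-mer weights; WAVE's actual generative model is learned from ChIP-seq data and is not reproduced here, and the function names are illustrative only.

```python
from typing import Dict

def kmer_score(seq: str, weights: Dict[str, float], k: int) -> float:
    """Sum hypothetical per-k-mer binding weights over every k-mer in seq."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))

def variant_effect(ref_window: str, pos: int, alt: str,
                   weights: Dict[str, float], k: int) -> float:
    """Score change (alt minus ref) for a SNP at index pos of ref_window.

    Only k-mers overlapping the SNP differ between alleles, so scoring the
    whole window for both alleles yields the effect of those k-mers alone.
    """
    if not 0 <= pos < len(ref_window):
        raise ValueError("SNP position lies outside the window")
    alt_window = ref_window[:pos] + alt + ref_window[pos + 1:]
    return kmer_score(alt_window, weights, k) - kmer_score(ref_window, weights, k)

# Hypothetical weights: "ACG" strongly supports binding, "ATG" weakly.
weights = {"ACG": 2.0, "ATG": 0.5}
print(variant_effect("TACGT", 2, "T", weights, 3))  # C>T at index 2
```

Because a k-mer model can assign weight to any short sequence, a variant can change the score even when it falls outside a canonical motif, which is the property the abstract highlights.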

Abundant contribution of short tandem repeats to gene expression variation in humans

Melissa Gymrek , Thomas Willems , Haoyang Zeng , Barak Markus , Mark J Daly , Alkes L Price , Jonathan Pritchard , Yaniv Erlich
doi: http://dx.doi.org/10.1101/017459

Expression quantitative trait loci (eQTLs) are a key tool to dissect cellular processes mediating complex diseases. However, little is known about the role of repetitive elements as eQTLs. We report a genome-wide survey of the contribution of Short Tandem Repeats (STRs), one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from linked SNPs and indels and found that eSTRs contribute 10%-15% of the cis-heritability mediated by all common variants. Functional genomic analyses showed that eSTRs are enriched in conserved regions, co-localize with regulatory elements, and are predicted to modulate histone modifications. Our results show that eSTRs provide a novel set of regulatory variants and highlight the contribution of repeats to the genetic architecture of quantitative human traits.
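As a minimal sketch of what testing a single STR as an eQTL can look like, the function below regresses expression on each individual's mean STR allele length with ordinary least squares. Using the mean of the two allele lengths as the genotype, and the name `estr_slope`, are assumptions for illustration; real eQTL pipelines typically also normalize expression and adjust for covariates, which this sketch omits.

```python
from statistics import mean
from typing import List, Tuple

def estr_slope(genotypes: List[Tuple[int, int]], expression: List[float]) -> float:
    """OLS slope of expression on mean STR allele length (one STR, one gene).

    genotypes: per-individual (allele1_length, allele2_length) in repeat units.
    expression: per-individual expression values, in the same order.
    """
    x = [(a1 + a2) / 2 for a1, a2 in genotypes]
    mx, my = mean(x), mean(expression)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, expression))
    return sxy / sxx

genotypes = [(9, 11), (11, 11), (12, 12), (13, 13)]
expression = [1.0, 1.5, 2.0, 2.5]
print(estr_slope(genotypes, expression))  # expression change per repeat unit
```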

Rate of Adaptive Evolution under Blending Inheritance

Alan R. Rogers
(Submitted on 1 Apr 2015)

In a population of size N, adaptive evolution is 2N times faster under Mendelian inheritance than under the 19th-century theory of blending inheritance.

Neanderthal Genomics Suggests a Pleistocene Time Frame for the First Epidemiologic Transition

Charlotte Jane Houldcroft , Simon Underdown
doi: http://dx.doi.org/10.1101/017343

High-quality Altai Neanderthal and Denisovan genomes are revealing which regions of archaic hominin DNA have persisted in the modern human genome. A number of these regions are associated with response to infection and immunity, with a suggestion that derived Neanderthal alleles found in modern Europeans and East Asians may be associated with autoimmunity. Independent sources of DNA-based evidence allow a re-evaluation of the nature and timing of the first epidemiologic transition. By combining skeletal, archaeological and genetic evidence we question whether the first epidemiologic transition in Eurasia was as tightly tied to the onset of the Holocene as has previously been assumed. There is clear evidence to suggest that this transition began before the appearance of agriculture and occurred over a timescale of tens of thousands of years. The transfer of pathogens between human species may also have played a role in the extinction of the Neanderthals.

XWAS: a software toolset for genetic data analysis and association studies of the X chromosome

Feng Gao , Diana Chang , Arjun Biddanda , Li Ma , Yingjie Guo , Zilu Zhou , Alon Keinan
doi: http://dx.doi.org/10.1101/009795

XWAS is a new software toolset for analyzing the X chromosome in association studies and related studies. The X chromosome plays an important role in human disease, especially in diseases with sexually dimorphic characteristics. Its analysis requires special attention because of its unique inheritance pattern, which leads to analytical complications that have resulted in the majority of genome-wide association studies (GWAS) either not considering X or mishandling it with GWAS toolsets that were designed for non-sex chromosomes. Hence, XWAS fills the need for tools that are specially designed for analysis of X. Following extensive, stringent, and X-specific quality control, XWAS offers an array of statistical tests of association, including: (1) the standard test between a SNP (single nucleotide polymorphism) and disease risk, including after first stratifying individuals by sex, (2) a test for a differential effect of a SNP on disease between males and females, (3) motivated by X-inactivation, a test for higher variance of a trait in heterozygous females as compared to homozygous females, and (4) for all tests, a version that allows for combining evidence across all SNPs in a whole gene. We applied the toolset's analysis pipeline to 16 GWAS datasets of immune-related disorders and to 7 risk factors of coronary artery disease, and discovered several new X-linked genetic associations. XWAS will provide the tools and incentive for others to incorporate the X chromosome into GWAS, hence enabling discoveries of novel loci implicated in many diseases and in their sexual dimorphism.
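Test (2), a differential effect between males and females, can be illustrated with a common z-test that compares effect estimates from separate male-only and female-only analyses. This is a sketch assuming the two estimates are independent and approximately normal; XWAS's own implementation may differ, and `sex_difference_test` is a hypothetical name.

```python
import math
from typing import Tuple

def sex_difference_test(beta_m: float, se_m: float,
                        beta_f: float, se_f: float) -> Tuple[float, float]:
    """Two-sided z-test for a difference between male and female SNP effects.

    beta_*: effect estimates (e.g. log odds ratios) from sex-stratified analyses.
    se_*:   their standard errors; the two analyses are assumed independent.
    """
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

z, p = sex_difference_test(beta_m=0.3, se_m=0.1, beta_f=0.0, se_f=0.1)
print(round(z, 3), round(p, 4))
```

Stratifying first keeps the sex-specific effects separate, so a SNP whose effect is present in only one sex is not averaged away in a pooled analysis.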

Most viewed on Haldane’s Sieve: March 2015

The most viewed posts on Haldane’s Sieve this month were:

Simple genetic models for autism spectrum disorder

Swagatam Mukhopadhyay , Michael Wigler , Dan Levy
doi: http://dx.doi.org/10.1101/017301

To explore the interplay between new mutation, transmission, and gender bias in genetic disease requires formal quantitative modeling. Autism spectrum disorders offer an ideal case: they are genetic in origin, complex, and show a gender bias. The high reproductive costs of autism ensure that most strongly associated genetic mutations are short-lived, and indeed the disease exhibits both transmitted and de novo components. There is a large body of both epidemiologic and genomic data that greatly constrain the genetic mechanisms that may contribute to the disorder. We develop a computational framework that assumes classes of additive variants, each member of a class having equal effect. We restrict our initial exploration to single class models, each having three parameters. Only one model matches epidemiological data. It also independently matches the incidence of de novo mutation in simplex families, the gender bias in unaffected siblings in simplex populations, and rates of mutation in target genes. This model makes strong and as yet not fully tested predictions, namely that females are the primary carriers in cases of genetic transmission, and that the incidence of de novo mutation in target genes for families at high risk for autism is not especially elevated. In its simplicity, this model does not account for MZ twin concordance or the distorted gender bias of high functioning children with ASD, and does not accommodate all the known mechanisms contributing to ASD. We point to the next steps in applying the same computational framework to explore more complex models.

Genetic variability under the seed bank coalescent

Jochen Blath , Bjarki Eldon , Adrian Casanova , Noemi Kurt , Maite Wilke-Berenguer
doi: http://dx.doi.org/10.1101/017244

We analyse patterns of genetic variability of populations in the presence of a large seed bank with the help of a new coalescent structure called the seed bank coalescent. This ancestral process appears naturally as the scaling limit of the genealogy of large populations that sustain seed banks, if the seed bank size and individual dormancy times are of the same order as the active population. Mutations appear as Poisson processes on the active lineages and, potentially at a reduced rate, also on the dormant lineages. The presence of ‘dormant’ lineages leads to qualitatively altered times to the most recent common ancestor and non-classical patterns of genetic diversity. To illustrate this, we provide a Wright-Fisher model with a seed bank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seed bank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to the most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seed bank, are compared. The effect of a seed bank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seed bank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect the presence of a large seed bank in genetic data.
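The effect of dormancy on the time to the most recent common ancestor (TMRCA) can be seen in a small simulation of the block-counting process. This sketch assumes the following rates from a state with n active and m dormant lineages: active pairs coalesce at rate n(n-1)/2, each active lineage becomes dormant at rate c, and each dormant lineage reactivates at rate cK; dormant lineages cannot coalesce. The parameter names and the function `seed_bank_tmrca` are illustrative assumptions, not code from the paper.

```python
import random

def seed_bank_tmrca(n_active: int, c: float, K: float, rng: random.Random) -> float:
    """Simulate one TMRCA under the assumed seed bank coalescent rates.

    State is (n active, m dormant) lineages; all sampled lineages start active.
    """
    if c > 0 and K <= 0:
        raise ValueError("K must be positive so dormant lineages can reactivate")
    n, m, t = n_active, 0, 0.0
    while n + m > 1:
        r_coal = n * (n - 1) / 2   # two active lineages merge
        r_sleep = c * n            # an active lineage enters the seed bank
        r_wake = c * K * m         # a dormant lineage becomes active again
        total = r_coal + r_sleep + r_wake
        t += rng.expovariate(total)  # waiting time to the next event
        u = rng.random() * total     # choose which event happened
        if u < r_coal:
            n -= 1
        elif u < r_coal + r_sleep:
            n, m = n - 1, m + 1
        else:
            n, m = n + 1, m - 1
    return t

rng = random.Random(1)
reps = 20000
for c, K in [(0.0, 1.0), (1.0, 1.0)]:
    avg = sum(seed_bank_tmrca(2, c, K, rng) for _ in range(reps)) / reps
    print(f"c={c}, K={K}: mean TMRCA for two lineages ~ {avg:.2f}")
```

With c = 0 this reduces to Kingman's coalescent, where two lineages have an expected TMRCA of 1. Under the assumed rates, a first-step calculation for two initially active lineages gives an expected TMRCA of (1 + 1/K)^2, so K = 1 quadruples it, consistent with the abstract's point that dormancy qualitatively alters coalescence times.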

Network analysis of genome-wide selective constraint reveals a gene network active in early fetal brain intolerant of mutation

Jinmyung Choi , Parisa Shooshtari , Kaitlin E Samocha , Mark J Daly , Chris Cotsapas
doi: http://dx.doi.org/10.1101/017277

Using robust, integrated analysis of multiple genomic datasets, we show that genes depleted for non-synonymous de novo mutations form a subnetwork of 72 members under strong selective constraint. We further show this subnetwork is preferentially expressed in the early development of the human hippocampus and is enriched for genes mutated in neurological, but not other, Mendelian disorders. We thus conclude that carefully orchestrated developmental processes are under strong constraint in early brain development, and perturbations caused by mutation have adverse outcomes subject to strong purifying selection. Our findings demonstrate that selective forces can act on groups of genes involved in the same process, supporting the notion that adaptation can act coordinately on multiple genes. Our approach provides a statistically robust, interpretable way to identify the tissues and developmental times where groups of disease genes are active. Our findings highlight the importance of considering the interactions between genes when analyzing genome-wide sequence data.

RiboDiff: Detecting Changes of Translation Efficiency from Ribosome Footprints

Yi Zhong , Theofanis Karaletsos , Philipp Drewe , Vipin Thankam T Sreedharan , Kamini Singh , Hans-Guido Wendel , Gunnar Rätsch
doi: http://dx.doi.org/10.1101/017111

Motivation: Deep sequencing based ribosome footprint profiling can provide novel insights into the regulatory mechanisms of protein translation. However, the observed ribosome profile is fundamentally confounded by transcriptional activity. In order to decipher principles of translation regulation, tools that can reliably detect changes in translation efficiency in case-control studies are needed. Results: We present a statistical framework and analysis tool, RiboDiff, to detect genes with changes in translation efficiency across experimental treatments. RiboDiff uses generalized linear models to estimate the over-dispersion of RNA-Seq and ribosome profiling measurements separately, and performs a statistical test for differential translation efficiency using both mRNA abundance and ribosome occupancy. Availability: Source code and documentation are available at http://github.com/ratschlab/ribodiff. Supplementary Material can be found at http://bioweb.me/ribo.
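The quantity RiboDiff tests, translation efficiency (TE), is ribosome occupancy relative to mRNA abundance. The sketch below computes only a naive, library-size-normalized TE and its log2 fold change between two conditions; it is an illustration under that assumption, not RiboDiff's method, which fits generalized linear models with separately estimated dispersions for RNA-Seq and ribosome profiling counts and performs a statistical test. The function names are hypothetical.

```python
import math
from typing import Tuple

# (ribo_count, rna_count, ribo_library_size, rna_library_size) for one gene
Sample = Tuple[float, float, float, float]

def translation_efficiency(ribo: float, rna: float,
                           ribo_lib: float, rna_lib: float) -> float:
    """Naive TE: normalized ribosome footprints per normalized mRNA read."""
    return (ribo / ribo_lib) / (rna / rna_lib)

def log2_te_fold_change(control: Sample, treatment: Sample) -> float:
    """log2(TE_treatment / TE_control) for a single gene."""
    return math.log2(translation_efficiency(*treatment) / translation_efficiency(*control))

control = (100, 200, 1e6, 1e6)    # TE = 0.5
treatment = (400, 200, 2e6, 1e6)  # TE = 1.0
print(log2_te_fold_change(control, treatment))
```

Dividing by mRNA abundance is what separates a change in translation from a change in transcription: in this example footprints quadruple while the footprint library doubles and mRNA stays constant, so the point estimate is a doubling of TE.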