SumVg: Total heritability explained by all variants in genome-wide association studies based on summary statistics with standard error estimates
Hon-Cheong SO , Pak C. SHAM
doi: http://dx.doi.org/10.1101/016857

Genome-wide association studies (GWAS) have become increasingly popular, and a key question is how much heritability can be explained by all variants in a GWAS. We previously proposed an approach to answer this question, based on recovering the “true” z-statistics from a set of observed z-statistics. Only summary statistics are required. However, methods for standard error (SE) estimation have not been available, which limits the interpretation of the results. In this study we developed resampling-based approaches to estimate the SE. We found that delete-d-jackknife and parametric bootstrap approaches provide good estimates of the SE. Methods to compute the sum of heritability explained and the corresponding SE are implemented in the R package SumVg, available at https://sites.google.com/site/honcheongso/software/var-totalvg
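The delete-d jackknife mentioned in the abstract can be illustrated with a minimal generic sketch. It is not the SumVg implementation: the function name `delete_d_jackknife_se`, its parameters, and the use of the sample mean as the estimator are all assumptions made for illustration. It uses the standard delete-d variance formula, (n - d) / (d * S) times the sum of squared deviations of the S subset estimates, with randomly sampled subsets standing in for the full enumeration.

```python
import math
import random
import statistics


def delete_d_jackknife_se(data, estimator, d, n_subsets=500, seed=0):
    """Approximate delete-d jackknife standard error of `estimator`.

    Hypothetical helper for illustration only. Instead of enumerating all
    C(n, d) subsets, `n_subsets` random subsets of size d are deleted.
    """
    n = len(data)
    if not 0 < d < n:
        raise ValueError("d must satisfy 0 < d < len(data)")
    rng = random.Random(seed)

    estimates = []
    for _ in range(n_subsets):
        dropped = set(rng.sample(range(n), d))  # indices to delete
        kept = [x for i, x in enumerate(data) if i not in dropped]
        estimates.append(estimator(kept))

    mean_estimate = sum(estimates) / n_subsets
    sum_sq = sum((e - mean_estimate) ** 2 for e in estimates)
    # Delete-d jackknife variance: (n - d) / (d * S) * sum of squared deviations.
    return math.sqrt((n - d) / (d * n_subsets) * sum_sq)


if __name__ == "__main__":
    demo_rng = random.Random(42)
    sample = [demo_rng.gauss(0.0, 1.0) for _ in range(200)]
    jackknife_se = delete_d_jackknife_se(sample, statistics.fmean, d=20)
    classical_se = statistics.stdev(sample) / math.sqrt(len(sample))
    print(jackknife_se, classical_se)
```

For the sample mean the jackknife SE should land close to the classical s / sqrt(n), which makes the mean a convenient sanity check before substituting a more complex estimator.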

The advent of genome-wide association studies for bacteria
Peter E Chen , B Jesse Shapiro
doi: http://dx.doi.org/10.1101/016873

Significant advances in sequencing technologies and genome-wide association studies (GWAS) have yielded substantial insight into the genetic architecture of human phenotypes. In recent years, the application of this approach in bacteria has begun to reveal the genetic basis of bacterial host preference, antibiotic resistance, and virulence. Here, we consider relevant differences between bacterial and human genome dynamics, apply GWAS to a global sample of Mycobacterium tuberculosis genomes to highlight the impacts of linkage disequilibrium, population stratification, and natural selection, and finally compare traditional GWAS against phyC, a contrasting method of mapping genotype to phenotype based upon evolutionary convergence. We discuss strengths and weaknesses of both methods, and make suggestions for factors to be considered in future bacterial GWAS.

Cline coupling and uncoupling in a stickleback hybrid zone
Tim Vines , Anne Dalziel , Arianne Albert , Thor Veen , Patricia Schulte , Dolph Schluter
doi: http://dx.doi.org/10.1101/016832

Strong ecological selection on a genetic locus can maintain allele frequency differences between populations in different environments, even in the face of hybridization. When alleles at divergent loci come into tight linkage disequilibria, selection acts on them as a unit and can significantly reduce gene flow. For populations interbreeding across a hybrid zone, linkage disequilibria between loci can force clines to share the same slopes and centers. However, strong ecological selection on a locus can push its cline away from the others, reducing linkage disequilibria and weakening the barrier to gene flow. We looked for this ‘cline uncoupling’ effect in a hybrid zone between stream-resident and anadromous sticklebacks at two genes known to be under divergent natural selection (Eda and ATP1a1) and five morphological traits that repeatedly evolve in freshwater stickleback. We used 10 anonymous SNPs to characterize the shape of the zone. We found that the clines at Eda, ATP1a1, and four morphological traits were concordant and coincident, suggesting that direct selection on each is outweighed by the indirect selection generated by linkage disequilibria. Interestingly, the cline for pectoral fin length was much steeper and displaced 200 m downstream, and two anonymous SNPs also had steep clines.
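To make the terms "concordant" (same width) and "coincident" (same center) concrete, here is a small sketch using the common tanh cline shape, in which width is defined as the inverse of the maximum slope. The abstract does not say which cline model the authors fitted, so the functional form, the function names, and every numeric value except the 200 m displacement are assumptions for illustration.

```python
import math


def cline_frequency(x, center, width):
    """Expected allele frequency at position x under a tanh cline.

    Assumed model: p(x) = 0.5 * (1 + tanh(2 * (x - center) / width)),
    so the maximum slope (at x == center) equals 1 / width.
    """
    return 0.5 * (1.0 + math.tanh(2.0 * (x - center) / width))


def slope_at_center(center, width, step=1e-6):
    """Central-difference estimate of the cline slope at its center."""
    upper = cline_frequency(center + step, center, width)
    lower = cline_frequency(center - step, center, width)
    return (upper - lower) / (2.0 * step)


if __name__ == "__main__":
    # Hypothetical parameters in metres along the stream.
    coupled = {"center": 1000.0, "width": 400.0}
    uncoupled = {"center": 1200.0, "width": 100.0}  # displaced 200 m, steeper
    for position in (800.0, 1000.0, 1200.0, 1400.0):
        print(
            position,
            round(cline_frequency(position, **coupled), 3),
            round(cline_frequency(position, **uncoupled), 3),
        )
```

Two clines with identical `center` and `width` trace the same curve, while a displaced, narrower cline changes frequency over a much shorter stretch of the transect.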

Exploring functional variation affecting ceRNA regulation in humans
Mulin Jun Li , Jiexing Wu , Peng Jiang , Wei Li , Yun Zhu , Daniel Fernandez , Russell J. H. Ryan , Yiwen Chen , Junwen Wang , Jun S. Liu , X. Shirley Liu
doi: http://dx.doi.org/10.1101/016865

MicroRNA (miRNA) sponges have been shown to function as competing endogenous RNAs (ceRNAs) that regulate the expression of other miRNA targets in the network by sequestering available miRNAs. In the first systematic investigation of genome-wide genetic effects on ceRNA regulation, we applied multivariate response regression to 462 Geuvadis RNA-seq data sets from multiple human populations and identified widespread genetic variation associated with ceRNA competition. We showed that SNPs in gene 3’UTRs at miRNA seed binding regions can simultaneously change gene expression in both cis and trans through the ceRNA mechanism. We termed these loci endogenous miRNA sponge expression quantitative trait loci, or “emsQTLs”, and found that a large number of them were unexplored in conventional eQTL mapping. We found that many emsQTLs are undergoing recent positive selection in different human populations. Using GWAS results, we found that emsQTLs are significantly enriched in trait- and disease-associated loci. Functional prediction and prioritization extend our understanding of the causality of emsQTL alleles in disease pathways. We illustrated that an emsQTL can synchronously regulate the expression of a tumor suppressor and an oncogene through ceRNA competition in angiogenesis. Together these results provide a distinct catalog and characterization of functional noncoding regulatory variants that control ceRNA crosstalk.

Transcriptome Differences between Alternative Sex Determining Genotypes in the House Fly, Musca domestica
Richard P Meisel , Jeffrey G Scott , Andrew G Clark
doi: http://dx.doi.org/10.1101/016774

Sex determination evolves rapidly, often because of turnover of the genes at the top of the pathway. The house fly, Musca domestica, has a multifactorial sex determination system, allowing us to identify the selective forces responsible for the evolutionary turnover of sex determination in action. There is a male determining factor, M, on the Y chromosome (YM), which is probably the ancestral state. An M factor on the third chromosome (IIIM) has reached high frequencies in multiple populations across the world, but the evolutionary forces responsible for the invasion of IIIM are not resolved. To test if the IIIM chromosome invaded because of sex-specific selection pressures, we used mRNA sequencing to determine if isogenic males that differ only in the presence of the YM or IIIM chromosome have different gene expression profiles. We find that more genes are differentially expressed between YM and IIIM males in testis than in head, and that genes with male-biased expression are most likely to be differentially expressed between YM and IIIM males. This suggests that male phenotypes, especially those related to male fertility, are more likely to be affected by the male-determining chromosome, supporting the hypothesis that sex-specific selection acts on alleles linked to the male-determining locus, driving evolutionary turnover in the sex determination pathway. We additionally find that IIIM males have a “masculinized” gene expression profile, suggesting that the IIIM chromosome has accumulated an excess of male-beneficial alleles because of its male-limited transmission.

Breaking through evolutionary constraint by environmental fluctuations
Marjon GJ de Vos , Alexandre Dawid , Vanda Sunderlikova , Sander J Tans
doi: http://dx.doi.org/10.1101/016790

Epistatic interactions can frustrate and shape evolutionary change. Indeed, phenotypes may fail to evolve because essential mutations can only be selected positively if fixed simultaneously. How environmental variability affects such constraints is poorly understood. Here we studied genetic constraints in fixed and fluctuating environments, using the Escherichia coli lac operon as a model system for genotype-environment interactions. The data indicated an apparent paradox: in different fixed environments, mutational trajectories became trapped at sub-optima where no further improvements were possible, while repeated switching between these same environments allowed unconstrained adaptation by continuous improvements. Pervasive cross-environmental trade-offs transformed peaks into valleys upon environmental change, thus enabling escape from entrapment. This study shows that environmental variability can lift genetic constraint, and that trade-offs not only impede but can also facilitate adaptive evolution.

The fate of a mutation in a fluctuating environment
Ivana Cvijovic , Benjamin H. Good , Elizabeth R. Jerison , Michael M. Desai
doi: http://dx.doi.org/10.1101/016709

Natural environments are never truly constant, but the evolutionary implications of temporally varying selection pressures remain poorly understood. Here we investigate how the fate of a new mutation in a variable environment depends on the dynamics of environmental fluctuations and on the selective pressures in each condition. We find that even when a mutation experiences many environmental epochs before fixing or going extinct, its fate is not necessarily determined by its time-averaged selective effect. Instead, environmental variability reduces the efficiency of selection across a broad parameter regime, rendering selection unable to distinguish between mutations that are substantially beneficial and substantially deleterious on average. Temporal fluctuations can also dramatically increase fixation probabilities, often making the details of these fluctuations more important than the average selection pressures acting on each new mutation. For example, mutations that result in a tradeoff between conditions but are strongly deleterious on average can nevertheless be more likely to fix than mutations that are always neutral or beneficial. These effects can have important implications for patterns of molecular evolution in variable environments, and they suggest that it may often be difficult for populations to maintain specialist traits, even when their loss leads to a decline in time-averaged fitness.
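The question posed in this abstract, how often a new mutation fixes when selection changes over time, can be explored with a toy haploid Wright-Fisher simulation. This is a minimal sketch under stated assumptions, not the authors' model or analysis: the function `fixation_probability`, its parameters, the deterministic cycling through epochs, and the choice to start every trial at the beginning of the first epoch are all hypothetical simplifications.

```python
import random


def fixation_probability(pop_size, s_by_epoch, epoch_length,
                         n_trials=2000, seed=1):
    """Estimate the fixation probability of a single new mutant.

    Assumed model: haploid Wright-Fisher population of constant size
    `pop_size`; the mutant's selection coefficient cycles through
    `s_by_epoch`, switching every `epoch_length` generations.
    Each coefficient must be greater than -1.
    """
    rng = random.Random(seed)
    n_fixed = 0
    for _ in range(n_trials):
        count = 1        # one mutant copy at the start
        generation = 0
        while 0 < count < pop_size:
            epoch = (generation // epoch_length) % len(s_by_epoch)
            s = s_by_epoch[epoch]
            freq = count / pop_size
            # Mutant frequency after selection, before drift.
            p_selected = freq * (1.0 + s) / (1.0 + s * freq)
            # Binomial sampling of the next generation (drift).
            count = sum(rng.random() < p_selected for _ in range(pop_size))
            generation += 1
        if count == pop_size:
            n_fixed += 1
    return n_fixed / n_trials


if __name__ == "__main__":
    n = 50
    print("neutral:", fixation_probability(n, [0.0], 1))
    print("constant benefit:", fixation_probability(n, [0.1], 1))
    print("trade-off:", fixation_probability(n, [0.3, -0.3], 5))
```

The neutral case should recover a value near 1 / pop_size, which gives a baseline against which fluctuating scenarios can be compared. Population size and trial count are kept small so that the pure-Python loop runs quickly.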

Dimensionality and the statistical power of multivariate genome-wide association studies
Eladio J. Marquez , David Houle
doi: http://dx.doi.org/10.1101/016592

Mutations virtually always have pleiotropic effects, yet most genome-wide association studies (GWAS) analyze effects one trait at a time. In order to investigate the performance of a multivariate approach to GWAS, we simulated scenarios where variation in a d-dimensional phenotype space was caused by a known subset of SNPs. Multivariate analyses of variance were then carried out on k traits, where k could be less than, greater than or equal to d. Our results show that power is maximized and false discovery rate (FDR) minimized when the number of traits analyzed, k, matches the true dimensionality of the phenotype being analyzed, d. When true dimensionality is high, the power of a single univariate analysis can be an order of magnitude less than the k=d case, even when the single trait with the largest genetic variance is chosen for analysis. When traits are added to a study in order of their independent genetic variation, the gains in power from increasing k up to d are much larger than the loss in power when k exceeds d. Simulations that explicitly model linkage disequilibrium (LD) indicate that when SNPs in disequilibrium are subjected to multivariate analysis, the magnitude of the apparent effect induced onto null SNPs by SNPs carrying a true effect weakens as k approaches d, such that the rank of P-values among a set of correlated SNPs becomes an increasingly reliable predictor of true positives. Multivariate GWAS outperform univariate ones under a wide range of conditions, and should become the standard in studies of the inheritance of complex phenotypes.

Analysis of whole mitogenomes from ancient samples
Gloria G. Fortes, Johanna L.A. Paijmans
(Submitted on 17 Mar 2015)

Ancient mitochondrial DNA has been used in a wide variety of palaeontological and archaeological studies, ranging from population dynamics of extinct species to patterns of domestication. Most of these studies have traditionally been based on the analysis of short fragments from the mitochondrial control region, analysed using PCR coupled with Sanger sequencing. With the introduction of high-throughput sequencing, as well as new enrichment technologies, the recovery of full mitochondrial genomes (mitogenomes) from ancient specimens has become significantly less complicated. Here we present a protocol to build ancient extracts into Illumina high-throughput sequencing libraries, followed by Agilent array-based capture to enrich for the desired mitogenome. Both steps are based on previously published protocols, with several improvements aimed at increasing the recovery of short DNA fragments while keeping cost and effort low. This protocol was designed for enrichment of mitochondrial DNA in ancient or degraded samples. However, it can easily be adapted for building libraries for shotgun sequencing of whole genomes, or for enrichment of other genomic regions.

Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions
Alicia Schep , Jason D Buenrostro , Sarah K Denny , Katja Schwartz , Gavin Sherlock , William J Greenleaf
doi: http://dx.doi.org/10.1101/016642

Transcription factors canonically bind nucleosome-free DNA, making the positioning of nucleosomes within regulatory regions crucial to the regulation of gene expression. We observe a highly structured pattern of DNA fragment lengths and positions generated by the assay of transposase accessible chromatin (ATAC-seq) around nucleosomes in S. cerevisiae, and use this distinctive two-dimensional nucleosomal “fingerprint” as the basis for a new nucleosome-positioning algorithm called NucleoATAC. We show that NucleoATAC can identify the rotational and translational positions of nucleosomes with up to base pair resolution and provide quantitative measures of nucleosome occupancy in S. cerevisiae, S. pombe, and human cells. We demonstrate application of NucleoATAC to a number of outstanding problems in chromatin biology, including analysis of sequence features underlying nucleosome positioning, promoter chromatin architecture across species, identification of transient changes in nucleosome occupancy and positioning during a dynamic cellular response, and integrated analysis of nucleosome occupancy and transcription factor binding.