Pollen-specific genes accumulate more deleterious mutations than sporophytic genes under relaxed purifying selection in Arabidopsis thaliana.

Mark Christian Harrison , Eamonn B Mallon , Dave Twell , Robert L Hammond
doi: http://dx.doi.org/10.1101/016626

The strength of purifying selection varies among loci and leads to differing frequencies of deleterious alleles within genomes. Selection is generally stronger for highly and broadly expressed genes but can be less efficient for diploid-expressed deleterious alleles if heterozygous. In plants, expression level, tissue specificity and ploidy level differ between pollen-specific and sporophyte-specific genes. This may explain why the reported strength and direction of the relationship between selection and the specificity of a gene to either pollen or sporophytic tissues varies between studies and species. In this study, we investigate the individual effects of expression level and tissue specificity on selection efficacy within pollen genes and sporophytic genes of Arabidopsis thaliana. Due to high homozygosity levels caused by selfing, masking is expected to play a lesser role. We find that expression level and tissue specificity independently influence selection in A. thaliana. Furthermore, contrary to expectations, pollen genes are evolving faster due to relaxed purifying selection and have accumulated a higher frequency of deleterious alleles. This suggests that high homozygosity levels resulting from high selfing rates reduce the effects of pollen competition and masking in A. thaliana, so that the high tissue specificity and expression noise of pollen genes are leading to lower selection efficacy compared to sporophyte genes.

Sex chromosome dosage compensation in Heliconius butterflies: global yet still incomplete?

James R Walters , Thomas J Hardcastle , Chris Jiggins
doi: http://dx.doi.org/10.1101/016675

The evolution of heterogametic sex chromosomes is often – but not always – accompanied by the evolution of dosage compensating mechanisms that mitigate the impact of sex-specific gene dosage on levels of gene expression. One emerging view of this process is that such mechanisms may only evolve in male-heterogametic (XY) species but not in female-heterogametic (ZW) species, which will consequently exhibit “incomplete” sex chromosome dosage compensation. However, some recent results from moths suggest that Lepidoptera (moths and butterflies) may prove to be an exception to this prediction. Here we report an analysis of sex chromosome dosage compensation in Heliconius butterflies, sampling multiple individuals for several different adult tissues (head, abdomen, leg, mouth, and antennae). Methodologically, we introduce a novel application of linear mixed-effects models to assess dosage compensation, offering a unified statistical framework that can estimate effects specific to chromosome, to sex, and their interactions (i.e., a dosage effect). Our results show substantially reduced Z-linked expression relative to autosomes in both sexes, as previously observed in bombycoid moths. This observation is consistent with an increasing body of evidence that at least some species of moths and butterflies possess an epigenetic sex chromosome dosage compensating mechanism that operates by reducing Z chromosome expression in males. However, this mechanism appears to be imperfect in Heliconius, resulting in a modest dosage effect that produces an average 5-20% male-bias on the Z chromosome, depending on the tissue. Strong sex chromosome dosage effects have been previously reported in a pyralid moth. Thus our results reflect a mixture of previous patterns reported for Lepidoptera and bisect the emerging view that female-heterogametic ZW taxa have incomplete dosage compensation because they lack a chromosome-wide epigenetic mechanism mediating sex chromosome dosage compensation. In the case of Heliconius, sex chromosome dosage effects persist apparently despite such a mechanism. We also analyze chromosomal distributions of sex-biased genes and show an excess of male-biased and a dearth of female-biased genes on the Z chromosome relative to autosomes, consistent with predictions of sexually antagonistic evolution.

Introgression obscures and reveals historical relationships among the American live oaks

Deren Eaton , Antonio Gonzalez-Rodriguez , Andrew Hipp , Jeannine Cavender-Bares
doi: http://dx.doi.org/10.1101/016238

Introgressive hybridization challenges the concepts we use to define species and our ability to infer their evolutionary relationships. Methods for inferring historical introgression from the genomes of extant species are now widely used; however, few guidelines have been articulated for how best to interpret their results. Because these tests are inherently comparative, we show that they are sensitive to the effects of missing data (unsampled species) and to non-independence (hierarchical relationships among species). We demonstrate this using genomic RAD data sampled from populations across the geographic ranges of all extant species in the American live oaks (Quercus series Virentes), a group notorious for hybridization. By considering all species in the clade, and their phylogenetic relationships, we were able to distinguish true hybridizing lineages from those that falsely appear admixed due to phylogenetic structure among hybridizing relatives. Six of seven species show evidence of admixture, often with multiple other species, but which can be explained by hybrid introgression among few related lineages where they occur in close proximity. We identify the Cuban oak as a highly admixed lineage and use an information-theoretic model comparison approach to test alternative scenarios for its origin. Hybrid speciation is a poor fit compared to a model in which a population from Central America colonized Cuba and received subsequent gene flow from Florida. The live oaks form a continuous ring-like distribution around the Gulf of Mexico, connected in Cuba, across which they could effectively exchange alleles. However, introgression appears to remain localized to areas of sympatry, suggesting that oak species boundaries and their geographic ranges have remained relatively stable over evolutionary time.
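
The abstract does not name the introgression tests it evaluates. As an illustration of why such tests are "inherently comparative", here is a minimal sketch of one widely used example, the four-taxon ABBA-BABA D-statistic. The choice of this test, the function name and the toy sequences are all assumptions made for illustration and are not taken from the paper.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Four-taxon D-statistic from aligned sequences (one haploid sequence per taxon).

    Assumes the tree (((P1, P2), P3), outgroup). The outgroup allele is treated
    as ancestral (A) and any other allele as derived (B).
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the derived allele
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical toy alignment: two ABBA sites and one BABA site.
d = d_statistic("AAGTAC", "AGGTCA", "AGATCC", "AAATAA")
```

Because the statistic depends on which lineages are placed at P1, P2 and P3, an unsampled or closely related hybridizing lineage can change the result, which is the sensitivity the abstract describes.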

Two variance component model improves genetic prediction in family data sets

George Tucker , Po-Ru Loh , Iona M MacLeod , Ben J Hayes , Michael E Goddard , Bonnie Berger , Alkes L Price
doi: http://dx.doi.org/10.1101/016618

Genetic prediction based on either identity by state (IBS) sharing or pedigree information has been investigated extensively using Best Linear Unbiased Prediction (BLUP) methods. However, methods to combine IBS sharing and pedigree information for genetic prediction in humans have not been explored. We introduce a two variance component model for genetic prediction: one component for IBS sharing and one for approximate pedigree structure, both estimated using genetic markers. In simulations using real genotypes from CARe and FHS family cohorts, we demonstrate that the two variance component model achieves gains in prediction r2 over standard BLUP at current sample sizes, and we project based on simulations that these gains will continue to hold at larger sample sizes. Accordingly, in analyses of four quantitative phenotypes from CARe and two quantitative phenotypes from FHS, the two variance component model significantly improves prediction r2 in each case, with up to a 16% relative improvement. We also find that standard mixed model association tests can produce inflated test statistics in datasets with related individuals, whereas the two variance component model corrects for inflation.

Tools and best practices for allelic expression analysis

Stephane E Castel , Ami Levy-Moonshine , Pejman Mohammadi , Eric Banks , Tuuli Lappalainen
doi: http://dx.doi.org/10.1101/016097

Allelic expression (AE) analysis has become an important tool for integrating genome and transcriptome data to characterize various biological phenomena such as cis-regulatory variation and nonsense-mediated decay. In this paper, we systematically analyze the properties of AE read count data and technical sources of error, such as low-quality or double-counted RNA-seq reads, genotyping errors, allelic mapping bias, technical covariates due to sample preparation and sequencing, and variation in total read depth. We provide guidelines for correcting and filtering for such errors, and show that the resulting AE data has extremely low technical noise. Finally, we introduce novel software for high-throughput production of AE data from RNA-sequencing data, implemented in the GATK framework. These improved tools and best practices for AE analysis yield higher quality AE data by reducing technical bias. This provides a practical framework for wider adoption of AE analysis by the genomics community.
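
To make "AE read count data" concrete, the following is a minimal sketch of how allelic imbalance at one heterozygous site might be summarized from reference and alternate read counts. It is not the GATK tool described in the abstract. The exact two-sided binomial test, the function name and the example counts are assumptions, and none of the error sources listed above (mapping bias, genotyping error and so on) are corrected for here.

```python
from math import comb


def allelic_imbalance(ref_reads, alt_reads, expected_ref=0.5):
    """Return (reference ratio, two-sided exact binomial p-value) for one site.

    expected_ref is the reference-allele fraction expected under no imbalance;
    0.5 is an idealized default that ignores mapping bias.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("site has no reads")
    ref_ratio = ref_reads / n
    # Binomial probability of every possible reference read count.
    pmf = [
        comb(n, k) * expected_ref**k * (1 - expected_ref) ** (n - k)
        for k in range(n + 1)
    ]
    observed = pmf[ref_reads]
    # Two-sided p-value: total probability of outcomes no more likely than the observed one.
    p_value = min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-7)))
    return ref_ratio, p_value


# Hypothetical site with 8 reference reads and 2 alternate reads.
ratio, p = allelic_imbalance(8, 2)
```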

Bacterial Infection Remodels the DNA Methylation Landscape of Human Dendritic Cells

Alain Pacis , Ludovic Tailleux , John Lambourne , Vania Yotova , Anne Dumaine , Anne Danckaert , Francesca Luca , Jean-Christophe Grenier , Kasper Hansen , Brigitte Gicquel , Miao Yu , Athma Pai , Jenny Tung , Chuan He , Tomi Pastinen , Roger Pique-Regi , Yoav Gilad , Luis Barreiro
doi: http://dx.doi.org/10.1101/016022

DNA methylation is thought to be robust to environmental perturbations on a short time scale. Here, we challenge that view by demonstrating that the infection of human dendritic cells (DCs) with a pathogenic bacterium is associated with rapid changes in methylation at thousands of loci. Infection-induced changes in methylation occur primarily at distal enhancer elements, including those associated with the activation of key immune transcription factors and genes involved in the crosstalk between DCs and adaptive immunity. Active demethylation is associated with extensive epigenetic remodeling and is strongly predictive of changes in the expression levels of nearby genes. Collectively, our observations show that rapid changes in methylation play a previously unappreciated role in regulating the transcriptional response of DCs to infection.

Efficient computation of the joint sample frequency spectra for multiple populations

John A. Kamm, Jonathan Terhorst, Yun S. Song
(Submitted on 3 Mar 2015)

A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic which describes the distribution of mutant alleles at a polymorphic site in a sample of DNA sequences. In particular, recently there has been growing interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. Although much methodological progress has been made, existing SFS-based inference methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable efficient computation of the expected joint SFS for multiple populations related by a complex demographic model with arbitrary population size histories (including piecewise exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study involving tens of populations, we demonstrate our improvements to numerical stability and computational complexity.
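
For readers unfamiliar with the statistic, here is a minimal sketch of tabulating an observed joint SFS from sampled haplotypes. It is not momi and does not compute the expected joint SFS under a demographic model, which is the paper's contribution. The function name, the 0/1 (ancestral/derived) encoding and the toy data are assumptions.

```python
from collections import Counter


def joint_sfs(haplotypes_by_pop):
    """Tabulate the observed joint SFS.

    haplotypes_by_pop maps a population name to a list of haplotypes, each a
    list of 0/1 alleles (0 = ancestral, 1 = derived) over the same sites.
    Returns a Counter keyed by per-population derived-allele counts, with
    populations taken in sorted name order.
    """
    pops = sorted(haplotypes_by_pop)
    n_sites = len(haplotypes_by_pop[pops[0]][0])
    total_haplotypes = sum(len(haplotypes_by_pop[p]) for p in pops)
    sfs = Counter()
    for site in range(n_sites):
        counts = tuple(
            sum(hap[site] for hap in haplotypes_by_pop[p]) for p in pops
        )
        # Keep polymorphic sites only: skip sites where the whole sample is
        # ancestral or the whole sample is derived.
        if 0 < sum(counts) < total_haplotypes:
            sfs[counts] += 1
    return sfs


# Hypothetical toy sample: 3 haplotypes in pop1, 2 in pop2, 4 sites.
toy = {
    "pop1": [[0, 1, 0, 1], [0, 1, 1, 1], [0, 0, 0, 1]],
    "pop2": [[0, 0, 1, 1], [0, 1, 1, 1]],
}
sfs = joint_sfs(toy)
```

The number of possible entries grows with the product of the per-population sample sizes plus one, which gives some intuition for why computation becomes costly with many populations and large samples.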

The interplay between DNA methylation and sequence divergence in recent human evolution

Irene Hernando-Herraez , Holger Heyn , Marcos Fernandez-Callejo , Enrique Vidal , Hugo Fernandez-Bellon , Javier Prado-Martinez , Andrew J Sharp , Manel Esteller , Tomas Marques-Bonet
doi: http://dx.doi.org/10.1101/015966

DNA methylation is a key regulatory mechanism in mammalian genomes. Despite the increasing knowledge about this epigenetic modification, the understanding of human epigenome evolution is in its infancy. We used whole genome bisulfite sequencing to study DNA methylation and nucleotide divergence between human and great apes. We identified 360 and 210 differentially hypo- and hypermethylated regions (DMRs) in humans compared to non-human primates and estimated that 20% and 36% of these regions, respectively, were detectable throughout several human tissues. Human DMRs were enriched for specific histone modifications and, contrary to expectations, the majority were located distal to transcription start sites, highlighting the importance of regions outside the direct regulatory context. We also found a significant excess of endogenous retrovirus elements in human-specific hypomethylated regions, suggesting their association with local epigenetic changes. We also report for the first time a close interplay between inter-species genetic and epigenetic variation in regions of incomplete lineage sorting, transcription factor binding sites and human differentially hypermethylated regions. Specifically, we observed an excess of human-specific substitutions in transcription factor binding sites located within human DMRs, suggesting that alteration of regulatory motifs underlies some human-specific methylation patterns. We also found that the acquisition of DNA hypermethylation in the human lineage is frequently coupled with a rapid evolution at nucleotide level in the neighborhood of these CpG sites. Taken together, our results reveal new insights into the mechanistic basis of human-specific DNA methylation patterns and the interpretation of inter-species non-coding variation.

Tandem repeat variation in human and great ape populations and its impact on gene expression divergence

Tugce Bilgin Sonay , Tiago Carvalho , Mark Robinson , Maja Greminger , Michael Krützen , David Comas , Gareth Highnam , David Mittelman , Andrew Sharp , Tomas Marques-Bonet , Andreas Wagner
doi: http://dx.doi.org/10.1101/015784

Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly, and are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics, and has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation have been scarce due to the technical difficulties derived from short-read technology. Here, we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, and their impact on gene expression evolution. We found that populations and species diversity patterns can be efficiently captured with short TRs (repeat unit length 1-5 base pairs) with potential applications in conservation genetics. We also examined the potential evolutionary role of TRs in gene expression differences between humans and primates by using 30,275 larger TRs (repeat unit length 2-50 base pairs). About one third of the 13,035 one-to-one orthologous genes contained TRs within 5 kilobase pairs of their transcription start site, and had higher expression divergence than genes without such TRs. The same observation held for genes with repeats in their 3′ untranslated region, in introns, and in exons. Using our polymorphism data for the shortest TRs, we found that genes with polymorphic repeats in their promoters showed higher expression divergence in humans and chimpanzees compared to genes with fixed or no TRs in the promoters. Our findings highlight the potential contribution of TRs to recent human evolution through gene regulation.

Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Bjarni Vilhjalmsson , Jian Yang , Hilary Kiyo Finucane , Alexander Gusev , Sara Lindstrom , Stephan Ripke , Giulio Genovese , Po-Ru Loh , Gaurav Bhatia , Ron Do , Tristian Hayeck , Hong-Hee Won , Schizophrenia Working Group of the Psychiatric Genomics Consortium , the Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study , Sekar Kathiresan , Michele Pato , Carlos Pato , Rulla Tamimi , Eli Stahl , Noah Zaitlen , Bogdan Pasaniuc , Mikkel Schierup , Phillip De Jager , Nikolaos Patsopoulos , Steven A McCarroll , Mark Daly , Shaun Purcell , Daniel Chasman , Benjamin Neale , Mike Goddard , Peter M Visscher , Peter Kraft , Nick J Patterson , Alkes L Price
doi: http://dx.doi.org/10.1101/015859

Polygenic risk scores have shown great promise in predicting complex disease risk, and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves LD-pruning markers and applying a P-value threshold to association statistics, but this discards information and may reduce predictive accuracy. We introduce a new method, LDpred, which infers the posterior mean causal effect size of each marker using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the pruning/thresholding approach, particularly at large sample sizes. Accordingly, prediction R2 increased from 20.1% to 25.3% in a large schizophrenia data set and from 9.8% to 12.0% in a large multiple sclerosis data set. A similar relative improvement in accuracy was observed for three additional large disease data sets and when predicting in non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
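
As a point of reference for the "standard approach" that the abstract contrasts with LDpred, here is a minimal sketch of a P-value-thresholded risk score for one individual. It is not LDpred and it omits the LD-pruning step, so it assumes the markers passed in are already approximately independent. The function name, default threshold and toy values are assumptions.

```python
def threshold_risk_score(genotypes, effects, p_values, p_threshold=0.05):
    """Sum effect size times risk-allele count over markers passing the threshold.

    genotypes: risk-allele counts (0, 1 or 2) for one individual, one per marker.
    effects: per-marker effect size estimates from association statistics.
    p_values: per-marker association P-values.
    """
    return sum(
        beta * allele_count
        for allele_count, beta, p in zip(genotypes, effects, p_values)
        if p < p_threshold
    )


# Hypothetical toy data for four markers; the second marker fails the threshold.
genotypes = [0, 1, 2, 1]
effects = [0.10, -0.05, 0.20, 0.30]
p_values = [0.001, 0.20, 0.01, 0.04]
score = threshold_risk_score(genotypes, effects, p_values)
```

Markers that fail the threshold contribute nothing to the score, which illustrates the information loss the abstract attributes to pruning and thresholding.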