The interplay between DNA methylation and sequence divergence in recent human evolution

Irene Hernando-Herraez , Holger Heyn , Marcos Fernandez-Callejo , Enrique Vidal , Hugo Fernandez-Bellon , Javier Prado-Martinez , Andrew J Sharp , Manel Esteller , Tomas Marques-Bonet

DNA methylation is a key regulatory mechanism in mammalian genomes. Despite increasing knowledge about this epigenetic modification, the understanding of human epigenome evolution is in its infancy. We used whole-genome bisulfite sequencing to study DNA methylation and nucleotide divergence between humans and great apes. We identified 360 hypomethylated and 210 hypermethylated differentially methylated regions (DMRs) in humans compared to non-human primates and estimated that 20% and 36% of these regions, respectively, were detectable across several human tissues. Human DMRs were enriched for specific histone modifications and, contrary to expectations, the majority were located distal to transcription start sites, highlighting the importance of regions outside the direct regulatory context. We also found a significant excess of endogenous retrovirus elements in human-specific hypomethylated regions, suggesting their association with local epigenetic changes. We report for the first time a close interplay between inter-species genetic and epigenetic variation in regions of incomplete lineage sorting, transcription factor binding sites and human differentially hypermethylated regions. Specifically, we observed an excess of human-specific substitutions in transcription factor binding sites located within human DMRs, suggesting that alteration of regulatory motifs underlies some human-specific methylation patterns. We also found that the acquisition of DNA hypermethylation in the human lineage is frequently coupled with rapid evolution at the nucleotide level in the neighborhood of these CpG sites. Taken together, our results reveal new insights into the mechanistic basis of human-specific DNA methylation patterns and the interpretation of inter-species non-coding variation.

Tandem repeat variation in human and great ape populations and its impact on gene expression divergence

Tugce Bilgin Sonay , Tiago Carvalho , Mark Robinson , Maja Greminger , Michael Krützen , David Comas , Gareth Highnam , David Mittelman , Andrew Sharp , Tomas Marques-Bonet , Andreas Wagner

Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly, and are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics, and has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation have been scarce because of the technical difficulties posed by short-read technology. Here, we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, and their impact on gene expression evolution. We found that population and species diversity patterns can be efficiently captured with short TRs (repeat unit length 1-5 base pairs), with potential applications in conservation genetics. We also examined the potential evolutionary role of TRs in gene expression differences between humans and other primates by using 30,275 larger TRs (repeat unit length 2-50 base pairs). About one third of the 13,035 one-to-one orthologous genes contained TRs within 5 kilobase pairs of their transcription start site, and had higher expression divergence than genes without such TRs. The same observation held for genes with repeats in their 3′ untranslated region, in introns, and in exons. Using our polymorphism data for the shortest TRs, we found that genes with polymorphic repeats in their promoters showed higher expression divergence in humans and chimpanzees than genes with fixed or no TRs in their promoters. Our findings highlight the potential contribution of TRs to recent human evolution through gene regulation.

Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Bjarni Vilhjalmsson , Jian Yang , Hilary Kiyo Finucane , Alexander Gusev , Sara Lindstrom , Stephan Ripke , Giulio Genovese , Po-Ru Loh , Gaurav Bhatia , Ron Do , Tristian Hayeck , Hong-Hee Won , Schizophrenia Working Group of the Psychiatric Genomics Consortium , the Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study , Sekar Kathiresan , Michele Pato , Carlos Pato , Rulla Tamimi , Eli Stahl , Noah Zaitlen , Bogdan Pasaniuc , Mikkel Schierup , Phillip De Jager , Nikolaos Patsopoulos , Steven A McCarroll , Mark Daly , Shaun Purcell , Daniel Chasman , Benjamin Neale , Mike Goddard , Peter M Visscher , Peter Kraft , Nick J Patterson , Alkes L Price

Polygenic risk scores have shown great promise in predicting complex disease risk, and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves LD-pruning markers and applying a P-value threshold to association statistics, but this discards information and may reduce predictive accuracy. We introduce a new method, LDpred, which infers the posterior mean causal effect size of each marker using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the pruning/thresholding approach, particularly at large sample sizes. Accordingly, prediction R^2 increased from 20.1% to 25.3% in a large schizophrenia data set and from 9.8% to 12.0% in a large multiple sclerosis data set. A similar relative improvement in accuracy was observed for three additional large disease data sets and when predicting in non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
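
The difference between thresholding marginal effects and reweighting them with LD information can be illustrated with a small sketch. This is not the LDpred software. It assumes an infinitesimal prior, under which the posterior mean effects are approximately the solution of (M / (N h^2) I + D) beta = beta_marginal, where D is the LD matrix, M the number of markers, N the training sample size and h^2 the heritability. All function names and toy values below are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def thresholded_weights(beta_marginal, p_values, p_threshold):
    """Keep marginal effects that pass the P-value threshold, zero the rest."""
    return [b if p <= p_threshold else 0.0
            for b, p in zip(beta_marginal, p_values)]


def infinitesimal_ld_weights(beta_marginal, ld_matrix, n_samples, heritability):
    """Approximate posterior mean effects under an assumed infinitesimal prior."""
    m = len(beta_marginal)
    ridge = m / (n_samples * heritability)
    system = [[ld_matrix[i][j] + (ridge if i == j else 0.0) for j in range(m)]
              for i in range(m)]
    return solve_linear(system, beta_marginal)


# Toy data (hypothetical): marker 0 is causal, marker 1 only tags it (r = 0.9).
ld = [[1.0, 0.9], [0.9, 1.0]]
beta_marginal = [0.10, 0.09]
p_values = [1e-9, 1e-8]

pt_weights = thresholded_weights(beta_marginal, p_values, 5e-8)
ld_weights = infinitesimal_ld_weights(beta_marginal, ld,
                                      n_samples=1_000_000, heritability=0.5)
```

In this toy case thresholding alone keeps both markers and so counts the same signal twice, and pruning would avoid that only by discarding one marker outright. The LD-aware weights instead assign nearly all of the effect to marker 0 and close to zero to the tagging marker.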

Selection and explosive growth may hamper the performance of rare variant association tests

Lawrence H. Uricchio , John S. Witte , Ryan D. Hernandez

Much recent debate has focused on the role of rare variants in complex phenotypes. However, it is well known that rare alleles can only contribute a substantial proportion of the phenotypic variance when they have much larger effect sizes than common variants, which is most easily explained by natural selection constraining trait-altering alleles to low frequency. It is also plausible that demographic events will influence the genetic architecture of complex traits. Unfortunately, most rare variant association tests do not explicitly model natural selection or non-equilibrium demography. Here, we develop a novel evolutionary model of complex traits. We perform numerical calculations and simulate phenotypes under this model using inferred human demographic and selection parameters. We show that rare variants only contribute substantially to complex traits under very strong assumptions about the relationship between effect size and selection strength. We then assess the performance of state-of-the-art rare variant tests using our simulations across a broad range of model parameters. Counterintuitively, we find that statistical power is lowest when rare variants make the greatest contribution to the additive variance, and that power is substantially lower under our model than previously studied models. While many empirical studies have attempted to identify causal loci using rare variant association methods, few have reported novel associations. Some authors have interpreted this to mean that rare variants contribute little to heritability, but our results show that an alternative explanation is that rare variant tests have less power than previously estimated.

Mutation rate estimation for 15 autosomal STR loci in a large population from Mainland China
Zhuo Zhao , Hua Wang , Jie Zhang , Zhi-Peng Liu , Ming Liu , Yuan Zhang , Li Sun , Hui Zhang

Short tandem repeats (STRs) are well known as powerful genetic markers and are widely used in studying human population genetics. Compared with conventional genetic markers, STRs have a higher mutation rate. Additionally, the mutations of STR loci do not lead to genetic inconsistencies between the genotypes of parents and children; therefore, the analysis of STR mutation is more suited to assessing mutation at the population level. In this study, we focused on 15 autosomal STR loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818, FGA). DNA samples from a total of 42416 unrelated healthy individuals (19037 trios) from the population of Mainland China, collected between Jan 2012 and May 2014, were successfully investigated. We determined the allele frequencies, paternal mutation rates, maternal mutation rates and average mutation rates for the 15 STR loci. Furthermore, we investigated the relationship between paternal age, maternal age, pregnancy time, area and average mutation rate. We found that the paternal mutation rate is higher than the maternal mutation rate, and that the paternal, maternal, and average mutation rates are positively correlated with paternal age, maternal age and pregnancy time, respectively. Additionally, the average mutation rates of coastal areas are higher than those of inland areas. Overall, these results suggest that the 15 autosomal STR loci can provide highly informative polymorphic data for population genetic assessment in Mainland China, and they confirm and extend the application of STR analysis in population genetics.
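
As a rough illustration of how a per-locus mutation rate of this kind is commonly computed, the sketch below divides the number of observed mutations by the number of meioses and attaches a Wilson score interval. It is not the authors' procedure. Only the figure of 19037 trios comes from the abstract; the mutation count is a made-up placeholder, and treating each trio as one paternal meiosis is an assumption.

```python
import math


def mutation_rate_with_ci(n_mutations, n_meioses, z=1.96):
    """Return (rate, lower, upper) using a Wilson score interval.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if n_meioses <= 0 or not 0 <= n_mutations <= n_meioses:
        raise ValueError("need 0 <= n_mutations <= n_meioses and n_meioses > 0")
    rate = n_mutations / n_meioses
    denom = 1 + z * z / n_meioses
    centre = (rate + z * z / (2 * n_meioses)) / denom
    half_width = z * math.sqrt(
        rate * (1 - rate) / n_meioses + z * z / (4 * n_meioses ** 2)
    ) / denom
    return rate, centre - half_width, centre + half_width


# Hypothetical example: 30 paternal mutations at one locus across 19037 trios.
rate, lower, upper = mutation_rate_with_ci(n_mutations=30, n_meioses=19037)
```

The Wilson interval is used here because mutation counts per locus are small relative to the number of meioses, where the simple normal approximation can produce a lower bound below zero.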

Recent evolution in Rattus norvegicus is shaped by declining effective population size

Eva E Deinum , Daniel L Halligan , Rob W Ness , Yao-Hua Zhang , Lin Cong , Jian-Xu Zhang , Peter D Keightley

The brown rat, Rattus norvegicus, is both a notorious pest and a frequently used model in biomedical research. By analysing genome sequences of 12 wild-caught brown rats from their ancestral range in NE China, along with the sequence of a black rat, R. rattus, we investigate the selective and demographic forces shaping variation in the genome. We estimate that the recent effective population size (N_e) of this species is 1.24 x 10^5, based on silent site diversity. We compare patterns of diversity in these genomes with patterns in multiple genome sequences of the house mouse (Mus musculus castaneus), which has a much larger N_e. This reveals an important role for variation in the strength of genetic drift in mammalian genome evolution. Using a Pairwise Sequentially Markovian Coalescent (PSMC) analysis of demographic history, we infer that there has been a recent population size bottleneck in wild rats, which we date to approximately 20,000 years ago. Consistent with this, wild rat populations have experienced an increased flux of mildly deleterious mutations, which segregate at higher frequencies in protein-coding genes and conserved noncoding elements (CNEs). This leads to negative estimates of the rate of adaptive evolution (alpha) in proteins and CNEs, a result which we discuss in relation to the strongly positive estimates observed in wild house mice. As a consequence of the population bottleneck, wild rats also show a markedly slower decay of linkage disequilibrium with physical distance than wild house mice.
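
An estimate of N_e from silent site diversity usually rests on the neutral expectation pi = 4 N_e mu for a diploid population, where pi is nucleotide diversity at silent sites and mu is the per-site, per-generation mutation rate. The sketch below simply inverts that relation. The abstract does not state the inputs used, so the values of pi and mu here are hypothetical placeholders, not the study's figures.

```python
def effective_population_size(silent_site_diversity, mutation_rate):
    """Invert pi = 4 * N_e * mu (diploid, neutral, equilibrium assumptions)."""
    if silent_site_diversity < 0 or mutation_rate <= 0:
        raise ValueError("diversity must be >= 0 and mutation rate > 0")
    return silent_site_diversity / (4.0 * mutation_rate)


# Hypothetical inputs for illustration only.
pi_silent = 0.002   # assumed nucleotide diversity at silent sites
mu = 4e-9           # assumed mutation rate per site per generation
n_e = effective_population_size(pi_silent, mu)
```

Because the estimate scales inversely with mu, any uncertainty in the assumed mutation rate carries over proportionally to N_e.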

Speciation in Heliconius Butterflies: Minimal Contact Followed by Millions of Generations of Hybridisation

Simon Henry Martin , Anders Eriksson , Krzysztof M. Kozak , Andrea Manica , Chris D. Jiggins

Documenting the full extent of gene flow during speciation poses a challenge, as species ranges change over time and current rates of hybridisation might not reflect historical trends. Theoretical work has emphasized the potential for speciation in the face of ongoing hybridisation, and the genetic mechanisms that might facilitate this process. However, elucidating how the rate of gene flow between species may have changed over time has proved difficult. Here we use Approximate Bayesian Computation (ABC) to fit a model of speciation between the Neotropical butterflies Heliconius melpomene and Heliconius cydno. These species are ecologically divergent, rarely hybridise and display female hybrid sterility. Nevertheless, previous genomic studies suggest pervasive gene flow between them, extending deep into their past, and potentially throughout the speciation process. By modelling the rates of gene flow during early and later stages of speciation, we find that these species have been hybridising for hundreds of thousands of years, but have not done so continuously since their initial divergence. Instead, it appears that gene flow was rare or absent for as long as a million years in the early stages of speciation. Therefore, by dissecting the timing of gene flow between these species, we are able to reject a scenario of purely sympatric speciation in the face of continuous gene flow. We suggest that the period of minimal contact early in speciation may have allowed for the accumulation of genomic changes that later enabled these species to remain distinct despite a dramatic increase in the rate of hybridisation.
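
The general idea of ABC can be shown with its simplest form, rejection sampling: draw a parameter from the prior, simulate data under it, and keep the draw only if a summary of the simulated data is close to the observed summary. The sketch below applies this to a toy binomial model, not to the speciation model of the study. The simulator, prior, tolerance and observed value are all hypothetical.

```python
import random


def abc_rejection(observed, simulate, sample_prior, distance,
                  tolerance, n_draws, rng):
    """Return the prior draws whose simulated summary lies within tolerance."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if distance(simulate(theta, rng), observed) <= tolerance:
            accepted.append(theta)
    return accepted


def simulate_count(rate, rng, n_trials=100):
    """Toy simulator: number of successes in n_trials Bernoulli(rate) trials."""
    return sum(rng.random() < rate for _ in range(n_trials))


rng = random.Random(1)  # fixed seed so the run is repeatable
posterior_sample = abc_rejection(
    observed=30,                          # hypothetical observed summary
    simulate=simulate_count,
    sample_prior=lambda r: r.random(),    # Uniform(0, 1) prior on the rate
    distance=lambda a, b: abs(a - b),
    tolerance=2,
    n_draws=20000,
    rng=rng,
)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

The accepted draws approximate the posterior distribution of the rate. A smaller tolerance gives a closer approximation at the cost of accepting fewer draws, which is the central trade-off when ABC is applied to models whose likelihood cannot be written down.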