Conflations of short IBD blocks can bias inferred length of IBD
Charleston W.K. Chiang, Peter Ralph, John Novembre
Comments: 12 figures, 1 table
Subjects: Populations and Evolution (q-bio.PE)
Identity-by-descent (IBD) is a fundamental concept in genetics with many applications. Often, segments between two haplotypes are said to be IBD if they are inherited from a recent shared common ancestor without intervening recombination. Long IBD blocks (> 1cM) can be efficiently detected by a number of computer programs using high-density SNP array data from a population sample. However, all programs detect IBD based on contiguous segments of identity-by-state, so an inferred block can in fact be a conflation of smaller, nearby IBD blocks. We quantified this effect using coalescent simulations, finding that nearly 40% of inferred blocks 1-2cM long are false conflations of two or more shorter blocks, under demographic scenarios typical for modern humans. This biases the inferred IBD block length distribution, and so can affect downstream inferences. We observed this conflation effect universally across different IBD detection programs and human demographic histories, and found inference of segments longer than 2cM to be much more reliable (less than 5% conflation rate). We then present and analyze a novel estimator of the de novo mutation rate using IBD blocks, and demonstrate that the biased length distribution of the IBD segments due to conflation can strongly affect this estimator if the conflation is not modeled. Thus, the conflation effect should be carefully considered, especially as methods to detect shorter IBD blocks using sequencing data are being developed.
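The conflation effect described in this abstract can be illustrated with a toy model. The sketch below is not the authors' coalescent simulation: it assumes a hypothetical detector that cannot resolve non-IBD gaps shorter than `max_gap_cm`, and the function name, threshold, and block coordinates are all made up for illustration.

```python
def conflate_blocks(blocks, max_gap_cm):
    """Merge true IBD blocks (start_cM, end_cM) separated by a short gap.

    Toy model of a detector that cannot resolve non-IBD gaps of at most
    max_gap_cm. Returns (start, end, n_true_blocks) tuples.
    """
    merged = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] <= max_gap_cm:
            # Gap too short to resolve: extend the previous inferred block.
            prev_start, prev_end, n = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), n + 1)
        else:
            merged.append((start, end, 1))
    return merged


# Hypothetical true blocks on one chromosome, positions in cM.
true_blocks = [(0.0, 0.6), (0.7, 1.4), (5.0, 7.5)]
inferred = conflate_blocks(true_blocks, max_gap_cm=0.2)
for start, end, n in inferred:
    print(f"{start:.1f}-{end:.1f} cM, length {end - start:.1f} cM, {n} true block(s)")
```

In this toy input the two blocks shorter than 1cM are reported as a single 1.4cM block, which is the kind of length bias the abstract describes for the 1-2cM range.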
Second-generation PLINK: rising to the challenge of larger and richer datasets
Christopher C. Chang, Carson C. Chow, Laurent C.A.M. Tellier, Shashaank Vattikuti, Shaun M. Purcell, James J. Lee
Comments: 2 figures, 1 additional file
Subjects: Genomics (q-bio.GN); Computation (stat.CO)
PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for even faster and more scalable implementations of key functions. In addition, GWAS and population-genetic data now frequently contain probabilistic calls, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1’s primary data format.
To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher’s exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. This will be followed by PLINK 2.0, which will introduce (a) a new data format capable of efficiently representing probabilities, phase, and multiallelic variants, and (b) extensions of many functions to account for the new types of information.
The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
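As a rough illustration of what bit-level parallelism means in this context, the sketch below packs one flag per sample into an integer so that a single AND followed by a population count replaces a per-sample loop. This is not PLINK's code or its data format: the one-bit-per-sample encoding, the helper names, and the carrier data are assumptions made for the example.

```python
def pack_bits(flags):
    """Pack a list of 0/1 flags into one integer, with bit i holding sample i."""
    word = 0
    for i, flag in enumerate(flags):
        if flag:
            word |= 1 << i
    return word


def popcount(word):
    """Number of set bits in a non-negative integer."""
    return bin(word).count("1")


# Hypothetical carrier status of 8 samples at two variants.
carriers_a = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])
carriers_b = pack_bits([1, 1, 0, 1, 0, 0, 1, 1])

# One AND (or OR) plus one popcount handles all samples at once.
both = popcount(carriers_a & carriers_b)
either = popcount(carriers_a | carriers_b)
print(both, either)
```

In a compiled language the same idea operates on 64-bit machine words with a hardware popcount instruction, so one instruction covers 64 samples.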
An extended reply to Mendez et al.: The ‘extremely ancient’ chromosome that still isn’t
Eran Elhaik, Tatiana V. Tatarinova, Anatole A. Klyosov, Dan Graur
(Submitted on 15 Oct 2014)
Earlier this year, we published a scathing critique of a paper by Mendez et al. (2013) in which the claim was made that a Y chromosome was 237,000-581,000 years old. Elhaik et al. (2014) also attacked a popular article in Scientific American by the senior author of Mendez et al. (2013), whose title was “Sex with other human species might have been the secret of Homo sapiens’s [sic] success” (Hammer 2013). Five of the 11 authors of Mendez et al. (2013) have now written a “rebuttal,” and we were allowed to reply.
Unfortunately, our reply was censored for being “too sarcastic and inflamed.” References were removed, meanings were castrated, and a dedication in the Acknowledgments was deleted. Now that the so-called rebuttal by 45% of the authors of Mendez et al. (2013) has been published together with our vasectomized reply, we decided to make public our entire reply to the so-called “rebuttal.” In fact, we go one step further, and publish a version of the reply that has not even been self-censored.
Recent evolution of the mutation rate and spectrum in Europeans
As humans dispersed out of Africa, they adapted to new environmental challenges including changes in exposure to mutagenic solar radiation. This raises the possibility that different populations experienced different selective pressures affecting genome integrity. Prior work has uncovered divergent selection in tropical versus temperate latitudes on eQTLs that regulate the DNA damage response, as well as evidence that the human mutation rate per year has changed at least 2-fold since we shared a common ancestor with chimpanzees. Here, I present evidence that the rate of a particular mutation type has recently increased in the European lineage, rising in frequency by 50% during the 30,000–50,000 years since Europeans diverged from Asians. A comparison of single nucleotide polymorphisms (SNPs) private to Africa, Asia, and Europe in the 1000 Genomes data reveals that private European variation is enriched for the transition 5’-TCC-3’→5’-TTC-3’. Although it is not clear whether UV played a causal role in changing the European mutational spectrum, 5’-TCC-3’→5’-TTC-3’ is known to be the most common somatic mutation present in melanoma skin cancers, as well as the mutation most frequently induced in vitro by UV. Regardless of its causality, this change indicates that DNA replication fidelity has not remained stable even since the origin of modern humans and might have changed numerous times during our recent evolutionary history.
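The comparison described here rests on classifying each SNP by its trinucleotide context. The sketch below shows one way such a tally could be made. It is not the author's pipeline: the function names and SNP list are hypothetical, and folding strand-equivalent types so that the central ancestral base is a pyrimidine (C or T) is a convention assumed for the example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutation_type(context, alt):
    """Label a SNP by trinucleotide context, e.g. 'TCC>TTC'.

    context: three ancestral bases centred on the SNP.
    alt: derived base at the central position.
    Assumed convention: contexts with a central purine are folded onto
    the opposite strand, so GGA>GAA is counted as TCC>TTC.
    """
    derived = context[0] + alt + context[2]
    if context[1] in "AG":
        context, derived = revcomp(context), revcomp(derived)
    return f"{context}>{derived}"


# Hypothetical private SNPs as (ancestral context, derived base) pairs.
snps = [("TCC", "T"), ("GGA", "A"), ("ACG", "T"), ("TCC", "T")]
spectrum = Counter(mutation_type(ctx, alt) for ctx, alt in snps)
fraction_tcc = spectrum["TCC>TTC"] / sum(spectrum.values())
print(spectrum, fraction_tcc)
```

Comparing such fractions between sets of SNPs private to different populations is the kind of contrast the abstract refers to; a real analysis would also need ancestral-state calls and a test for significance.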
Massive bursts of transposable element activity in Drosophila
Robert Kofler, Viola Nolte, Christian Schlötterer
The evolutionary dynamics of transposable element (TE) insertions have been of continued interest since TE activity has important implications for genome evolution and adaptation. Here, we infer the transposition dynamics of TEs by comparing their abundance in natural D. melanogaster and D. simulans populations. Sequencing pools of more than 550 South African flies to at least 320-fold coverage, we determined the genome-wide TE insertion frequencies in both species. We show that 46 (49%) TE families in D. melanogaster and 44 (47%) in D. simulans experienced a recent burst of activity. The bursts of activity affected different TE families in the two species. While in D. melanogaster retrotransposons predominated, DNA transposons showed higher activity levels in D. simulans. We propose that the observed TE dynamics are the outcome of the demographic history of the two species, with habitat expansion triggering a period of rapid evolution.
Quantification of GC-biased gene conversion in the human genome
Sylvain Glemin, Peter F Arndt, Philipp W Messer, Dmitri Petrov, Nicolas Galtier, Laurent Duret
Many lines of evidence indicate GC-biased gene conversion (gBGC) has a major impact on the evolution of mammalian genomes. However, up to now, this process had not been properly quantified. In principle, the strength of gBGC can be measured from the analysis of derived allele frequency (DAF) spectra. However, this approach is sensitive to a number of confounding factors. In particular, we show by simulations that the inference is pervasively affected by polymorphism polarization errors, especially at hypermutable sites, and spatial heterogeneity in gBGC strength. Here we propose a new method to quantify gBGC from DAF spectra, incorporating polarization errors and taking spatial heterogeneity into account. This method is very general in that it does not require any prior knowledge about the source of polarization errors and also provides information about mutation patterns. We apply this approach to human polymorphism data from the 1000 Genomes Project. We show that the strength of gBGC does not differ between hypermutable CpG sites and non-CpG sites, suggesting that in humans gBGC is not caused by the base-excision repair machinery. We further find that the impact of gBGC is concentrated primarily within recombination hotspots: genome-wide, the strength of gBGC is in the nearly neutral area, but 2% of the human genome is subject to strong gBGC, with population-scaled gBGC coefficients above 5. Given that the location of recombination hotspots evolves very rapidly, our analysis predicts that in the long term, a large fraction of the genome is affected by short episodes of strong gBGC.
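To see why polarization errors matter for inference from DAF spectra, consider the simplest possible error model. The sketch below is not the authors' method: it assumes a single, uniform error rate at which ancestral and derived states are swapped, and the function name and spectrum are hypothetical.

```python
def apply_polarization_error(sfs, error_rate):
    """Return the DAF spectrum expected when some sites are mis-polarized.

    sfs[i] is the number of sites whose derived allele is seen i + 1 times
    in a sample of len(sfs) + 1 chromosomes. Swapping ancestral and derived
    states moves a site from count k to count n - k, which is the mirrored
    index in this list.
    """
    last = len(sfs) - 1
    return [
        (1 - error_rate) * sfs[i] + error_rate * sfs[last - i]
        for i in range(len(sfs))
    ]


# Hypothetical spectrum for 6 chromosomes, with most variants rare.
true_sfs = [100, 50, 33, 25, 20]
observed = apply_polarization_error(true_sfs, error_rate=0.05)
print(observed)
```

Because rare variants are the most numerous class, even a small error rate moves a noticeable number of sites into the high-frequency end of the spectrum. An excess of high-frequency derived alleles is also what a fixation bias produces, which is why the abstract stresses modelling these errors.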
Fitting the Balding-Nichols model to forensic databases
Rori Rohlfs, Vitor R.C. Aguiar, Kirk E. Lohmueller, Amanda M. Castro, Alessandro C.S. Ferreira, Vanessa C.O. Almeida, Iuri D. Louro, Rasmus Nielsen
Large forensic databases provide an opportunity to compare observed empirical rates of genotype matching with those expected under forensic genetic models. A number of researchers have taken advantage of this opportunity to validate some forensic genetic approaches, particularly to ensure that estimated rates of genotype matching between unrelated individuals are indeed slight overestimates of those observed. However, these studies have also revealed systematic error trends in genotype probability estimates. In this analysis, we investigate these error trends and show how they result from inappropriate implementation of the Balding-Nichols model in the context of database-wide matching. Specifically, we show that in addition to accounting for increased allelic matching between individuals with recent shared ancestry, studies must account for relatively decreased allelic matching between individuals with more ancient shared ancestry.
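For readers unfamiliar with the model, the sketch below computes the conditional genotype match probabilities usually associated with the Balding-Nichols model: the probability that a second individual from the same subpopulation carries a genotype, given that one individual has already been observed with it. It is not the authors' database-wide analysis; the function names and the allele frequencies and coancestry values are illustrative assumptions.

```python
def bn_match_homozygote(p, theta):
    """P(second person is AA | first person is AA) with coancestry theta."""
    num = (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p)
    return num / ((1 + theta) * (1 + 2 * theta))


def bn_match_heterozygote(p_i, p_j, theta):
    """P(second person is AiAj | first person is AiAj) with coancestry theta."""
    num = 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j)
    return num / ((1 + theta) * (1 + 2 * theta))


# Hypothetical allele frequencies at one locus.
p_i, p_j = 0.1, 0.2
for theta in (0.0, 0.01, 0.03):
    hom = bn_match_homozygote(p_i, theta)
    het = bn_match_heterozygote(p_i, p_j, theta)
    print(f"theta={theta}: homozygote {hom:.5f}, heterozygote {het:.5f}")
```

With theta set to 0 the expressions reduce to the Hardy-Weinberg proportions p² and 2·p_i·p_j, and a positive theta raises both match probabilities. This upward adjustment captures only increased matching from recent shared ancestry, which is the half of the picture the abstract says is insufficient on its own.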