Justus M Kebschull, Anthony M Zador
PCR allows the exponential and sequence specific amplification of DNA, even from minute starting quantities. Today, PCR is at the core of the most successful DNA sequencing technologies and is a fundamental step in preparing DNA samples for high throughput sequencing. Despite its importance, we have little comprehensive understanding of the biases and errors that PCR introduces into pools of DNA molecules. Understanding PCRs imperfections and their impact on the amplification of different sequences in a complex mixture is particularly important for a proper understanding of high-throughput sequencing data. We examined the effects of bias, stochasticity, template switches and polymerase errors introduced during PCR on sequence representation in next-generation sequencing libraries. Using Illumina sequencing results of a pool of diverse PCR amplicons with a defined structure, we searched for signatures of each process. We further developed quantitative models for each process and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. PCR errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results will have particular relevance to single cell sequencing, in which sequences are represented by only one or a few molecules.