The distribution and impact of common copy-number variation in the genome of the domesticated apple, Malus x domestica Borkh.
James Boocock, David Chagné, Tony R Merriman, Mik Black
Background: Copy-number variation (CNV) is a common feature of eukaryotic genomes, and a growing body of evidence suggests that genes affected by CNV are enriched in processes associated with environmental responses. Here we use next-generation sequencing (NGS) data to detect copy-number variable regions (CNVRs) within the Malus x domestica genome and to examine their distribution and impact.

Methods: CNVRs were detected using NGS data derived from 30 accessions of M. x domestica, analysed with the read-depth method as implemented in the CNVrd2 software. To improve the reliability of our results, we developed a quality control and analysis procedure that involved checking for organelle DNA, not applying repeat masking, and determining CNVR identity using a permutation testing procedure.

Results: Overall, we identified 876 CNVRs, which spanned 3.5% of the apple genome. To verify that the detected CNVRs were not artefacts, we analysed the B-allele frequencies (BAFs) within a SNP array dataset derived from a screening of 185 individual apple accessions and found that the CNVRs were enriched for SNPs with aberrant BAFs (P < 1e-13, Fisher's exact test). Putative CNVRs overlapped 845 gene models and were enriched for resistance (R) genes (P < 1e-22, Fisher's exact test). Of note is a cluster of resistance genes on chromosome 2, near a region containing multiple major gene loci conferring resistance to apple scab.

Conclusion: We present the first analysis and catalogue of CNVRs in the M. x domestica genome. The enrichment of CNVRs for R genes and their overlap with gene loci of agricultural significance draw attention to a largely unexplored form of genetic variation in apple. This research will underpin further investigation of the role that CNV plays within the apple genome.
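The enrichment results above rest on Fisher's exact test applied to a 2x2 table (for example, gene models inside versus outside CNVRs, split by whether they are R genes). The following is a minimal sketch of a one-sided enrichment test in Python. The function name `fisher_enrichment_p`, its argument names, and the toy counts are hypothetical illustrations, not the authors' code or data, and the abstract does not state whether the reported tests were one- or two-sided.

```python
from math import comb


def fisher_enrichment_p(in_hit, in_miss, out_hit, out_miss):
    """One-sided Fisher's exact test for enrichment.

    Arguments are the four cells of a 2x2 table (hypothetical naming):
      in_hit   - items inside the regions that carry the feature
      in_miss  - items inside the regions that lack the feature
      out_hit  - items outside the regions that carry the feature
      out_miss - items outside the regions that lack the feature
    Returns P(X >= in_hit) under the hypergeometric null.
    """
    n_in = in_hit + in_miss            # total items inside the regions
    n_hit = in_hit + out_hit           # total items carrying the feature
    total = n_in + out_hit + out_miss  # all items
    # Sum hypergeometric probabilities for every table at least as extreme.
    upper = min(n_in, n_hit)
    tail = sum(
        comb(n_hit, k) * comb(total - n_hit, n_in - k)
        for k in range(in_hit, upper + 1)
    )
    return tail / comb(total, n_in)


if __name__ == "__main__":
    # Toy counts chosen only to illustrate the call; not data from the study.
    p = fisher_enrichment_p(3, 1, 1, 3)
    print(p)  # 17/70, about 0.243
```

Exact integer arithmetic via `math.comb` keeps the sketch free of third-party dependencies. For genome-scale counts, an established implementation such as `scipy.stats.fisher_exact` with `alternative="greater"` would be the more practical choice.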