Gargi Dayama, Sarah B Emery, Jeffrey M Kidd, Ryan E Mills
The transfer of mitochondrial genetic material into the nuclear genomes of eukaryotes is a well-established phenomenon. Many studies over the past decade have utilized reference genome sequences of numerous species to characterize the prevalence of nuclear mitochondrial insertions and their contribution to human diseases. The recent advancement of high-throughput sequencing technologies has enabled the interrogation of genomic variation at a much finer scale, and now allows for an exploration into the diversity of polymorphic nuclear mitochondrial insertions (NumtS) in human populations. We have developed an approach to discover and genotype previously unknown Numt insertions using whole-genome, paired-end sequencing data. We have applied this method to almost a thousand individuals in twenty populations from the 1000 Genomes Project and other data sets and identified 138 novel sites of Numt insertions, extending our current knowledge of existing Numt locations in the human genome by almost 20%. Most of the newly identified NumtS were found in fewer than 1% of the samples we examined, suggesting that they occur infrequently in nature or have been rapidly removed by purifying selection. We find that recent Numt insertions are derived from throughout the mitochondrial genome, including the D-loop, and have integration biases consistent with previous studies on older, fixed NumtS in the reference genome. We have further determined the complete inserted sequence for a subset of these events to define their age and origin of insertion as well as their potential impact on studies of mitochondrial heteroplasmy.