Deep genome sequencing and variation analysis of 13 inbred mouse strains defines candidate phenotypic alleles, private variation, and homozygous truncating mutations
Background: The Mouse Genomes Project is an ongoing collaborative effort to sequence the genomes of the common laboratory mouse strains. In 2011, the initial analysis of sequence variation across 17 strains found 56.7M unique SNPs and 8.8M indels. We carried out deep sequencing of 13 additional inbred strains (BUB/BnJ, C57BL/10J, C57BR/cdJ, C58/J, DBA/1J, I/LnJ, KK/HiJ, MOLF/EiJ, NZB/B1NJ, NZW/LacJ, RF/J, SEA/GnJ and ST/bJ), cataloging molecular variation within and across the strains. These strains include important models for immune response, leukemia, age-related hearing loss and rheumatoid arthritis. We now have several examples of fully sequenced, closely related strains that are divergent for several disease phenotypes.
Results: Approximately 27.4M unique SNPs and 5M indels were identified across these strains compared to the C57BL/6J reference genome (GRCm38). The amount of variation cataloged in inbred laboratory mouse genomes has thereby increased to 71M SNPs and 12M indels. We investigate the genetic basis of highly penetrant cancer susceptibility in RF/J, finding private novel missense mutations in DNA damage repair genes and in genes highly associated with cancer. We use two highly related strains (DBA/1J and DBA/2J) to investigate the genetic basis of collagen-induced arthritis susceptibility.
Conclusion: This paper significantly expands the catalog of fully sequenced laboratory mouse strains, which now contains several examples of highly genetically similar strains with divergent phenotypes. We show how studying private missense mutations can lead to insights into the genetic mechanism underlying a highly penetrant phenotype.