Application of compressed sensing to genome wide association studies and genomic selection
Shashaank Vattikuti, James J. Lee, Stephen D. H. Hsu, Carson C. Chow
(Submitted on 8 Oct 2013)
We show that the signal-processing paradigm known as compressed sensing (CS) is applicable to genome-wide association studies (GWAS) and genomic selection (GS). The aim of GWAS is to isolate trait-associated loci, whereas GS attempts to predict the phenotypic values of new individuals on the basis of training data. CS addresses a problem common to both endeavors, namely that the number of genotyped markers often greatly exceeds the sample size. We show using CS methods and theory that all loci of nonzero effect can be identified (selected) using an efficient algorithm, provided that they are sufficiently few in number (sparse) relative to sample size. For heritability h2 = 1, there is a sharp phase transition to complete selection as the sample size is increased. For heritability values less than one, complete selection can still occur although the transition is smoothed. The transition boundary is only weakly dependent on the total number of genotyped markers. The crossing of a transition boundary provides an objective means to determine when true effects are being recovered. For h2 = 0.5, we find that a sample size that is thirty times the number of nonzero loci is sufficient for good recovery.