Julia H Wildschutte, Alayna A Baron, Nicolette M Diroff, Jeffrey M Kidd
Alu insertions have contributed to >11% of the human genome. Roughly 30-35 Alu subfamilies remain actively mobile and are recognized as major drivers of genetic variation and disease. Sophisticated computational methods can identify non-reference insertions from specific signatures in whole genome sequencing (WGS) data, but they rarely report the entire insertion sequence. We build on existing methods and develop an approach that combines Alu detection with de novo assembly of WGS data to reconstruct the full sequence of insertion events. Using this approach, we generate a highly accurate call set of 1,614 completely assembled Alu variants from 53 samples from the Human Genome Diversity Project panel. Experimental validation of 30 sites shows that the method accurately reconstructs the insertion sequence at 100% of the sites tested. We use the reconstructed alternative insertion haplotypes to genotype 1,010 fully assembled insertions, obtaining >99% accuracy. We find evidence of insertion by non-classical mechanisms and observe 5′ truncation in 16% of AluYa5 and AluYb8 insertions. The sites of truncation coincide with stem-loop structures and SRP9/14 binding sites in the Alu RNA, implicating L1 ORF2p pausing in the generation of 5′ truncations.