This guest post is by Mike Harvey on the paper he wrote with coauthors, Tilston-Smith and Harvey et al., “Target capture and massively parallel sequencing of ultraconserved elements (UCEs) for comparative studies at shallow evolutionary time scales”, arXived here.
This paper is a result of work on developing markers and methods for generating genomic data for species without available genomes (I’ll refer to these as “non-model” species). The work is a collaborative effort between some researchers who are really on top of developments in sequencing technologies (and are also a blast to work with) – Travis Glenn at UGA, Brant Faircloth at UCLA, and John McCormack at Occidental – and our lab here at LSU. We think the marker sets we have been developing (ultraconserved elements) and more generally the method we are using (sequence capture) have the potential to make the genomic revolution more accessible to researchers studying the population genetics of diverse non-model organisms.
Although genomic resources for humans and other model systems are increasing rapidly, the bottleneck for those of us working on the population genetics of non-model systems is simply our ability to generate data. Many of us are still struggling to take advantage of the increase in sequencing capacity provided by next-generation platforms. For many projects, sequencing entire genomes is neither feasible (yet) nor necessary, so researchers have focused on finding reasonable methods of subsampling the genome in a repeatable way such that the same subset of genomic regions can be sampled for many individuals. We often have to do this, however, with little to no prior genomic information from our particular study organism.
Most methods for subsampling the genome thus far have involved “random” sampling from across the genome by using restriction enzymes to digest genomic DNA and then sequencing fragments that fall in a particular part of the fragment size distribution. Drawbacks of these methods include (1) that the researcher has no prior knowledge of where in the genome the sequences will come from or what function the genomic region might serve, and (2) that the repeatability of the method, specifically the ability to generate data from the same loci across samples, depends on the conservation of the enzyme cut sites, which often are not conserved at deeper timescales. Sequencing transcriptomes is also a popular method for subsampling the genome, but this simply isn’t an option for those of us working with museum specimens and tissues or old blood samples in which RNA hasn’t been properly preserved.
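To make the second drawback concrete, here is a minimal toy sketch in Python of an in silico digest followed by size selection. It is not code from our paper: the function names, the toy sequences, and the 20 to 30 bp size window are all made up for illustration. The only real-world detail is the EcoRI recognition site (GAATTC, cut after the first G). A single mutation in one cut site is enough to make a locus drop out of the selected size range in the second individual.

```python
def digest(sequence, site, cut_offset=1):
    """Cut `sequence` at every occurrence of `site`.

    `cut_offset` is the position within the site where the cut falls
    (1 for EcoRI, which cuts G^AATTC). Returns the list of fragments.
    """
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls inside the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


ECORI = "GAATTC"

# Hypothetical toy "genomes": individual_b differs from individual_a by a
# single C->T substitution that destroys the middle cut site.
individual_a = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 10 + "GAATTC" + "G" * 40
individual_b = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTT" + "T" * 10 + "GAATTC" + "G" * 40

fragments_a = digest(individual_a, ECORI)
fragments_b = digest(individual_b, ECORI)

# Arbitrary size window chosen for this toy example.
selected_a = size_select(fragments_a, 20, 30)
selected_b = size_select(fragments_b, 20, 30)

print("fragment lengths, individual a:", [len(f) for f in fragments_a])
print("fragment lengths, individual b:", [len(f) for f in fragments_b])
print("loci in size window:", len(selected_a), "vs", len(selected_b))
```

In this toy case individual a yields one fragment in the window and individual b yields none, because the lost cut site merges two fragments into one that is too long. The same logic, scaled up to real genomes and real divergence times, is why locus recovery with restriction-based methods tends to decay between distantly related samples.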
Sequence capture, a molecular technique involving genome enrichment by hybridization to RNA or DNA ‘probes’, is a flexible alternative that allows researchers to subsample whatever portions of the genome they like. The drawback of sequence capture, however, is that you need enough prior genomic information to design the synthetic oligos used as probes. This is not a problem for e.g. exome capture in humans in which the targeted genes are well characterized, but it is a challenge for non-model systems without sequenced genomes.
This is where ultraconserved elements come in. Ultraconserved elements (UCEs) are short genomic regions that are highly conserved across widely divergent species (e.g. all amniotes). Because they are so conserved, UCE sequences can easily be used as probes for sequence capture in diverse non-model organisms, even if the organisms themselves have little or no genomic information available. If you are not working on amniotes or fishes (for which we have already designed probe arrays), all you may need to find UCEs is a couple of genomes from species that diverged from your study organism within the last few hundred million years. Of course, this general approach is not specific to loci that fall into our narrow definition of UCEs; it is limited only by the availability of genomic information that can be used to design probes. As additional genomic information becomes available for a given group, additional loci, including protein-coding regions, can easily be added to capture arrays.
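For readers who want a feel for what "finding UCEs from a couple of genomes" means, here is a deliberately simplified Python sketch. It assumes you already have a multiple alignment of the same region from several taxa, and it simply reports runs of alignment columns that are identical in every sequence. The function name, the toy sequences, and the minimum length of 8 are illustrative assumptions; real UCE identification works on whole-genome alignments, uses much longer length thresholds, and is not necessarily the procedure shown here.

```python
def conserved_runs(aligned, min_len):
    """Return half-open (start, end) column intervals where every aligned
    sequence carries the same base (no gaps) for at least `min_len` columns."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    run_start = None
    # Iterate one position past the end so a run touching the end is closed.
    for i in range(length + 1):
        identical = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if identical:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                runs.append((run_start, i))
            run_start = None
    return runs


# Hypothetical toy alignment of three divergent taxa (made-up sequences).
alignment = [
    "ACGTTGCAAGGCTTAACCGT",  # taxon_a
    "TCGATGCAAGGCTTAGCCGA",  # taxon_b
    "ACGTTGCAAGGCTTATCCGT",  # taxon_c
]

for start, end in conserved_runs(alignment, min_len=8):
    print(start, end, alignment[0][start:end])
```

The conserved core found this way is what a capture probe would be designed against; the variable columns on either side are the toy analogue of the flanking regions that carry the informative variation.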
Our question for this paper: does sequence capture of UCEs work for population genetics?
We have previously used sequence capture of UCEs to address deeper-level phylogenetic questions. We’ve found that at deep timescales, the flanking regions of UCEs contain a large amount of informative variation. The goals of the present study were (1) to see if sufficient information existed in UCEs to enable studies at shallow evolutionary (read "population genetic or phylogeographic") timescales, and (2) to explore some of the analyses that might be possible with population genetic data from non-model organisms. For our study, we sampled two individuals from each of four populations in each of five species of non-model Neotropical birds. We conducted sequence capture using probes designed from 2,386 UCEs shared by amniotes, and we sequenced the resulting libraries on an Illumina HiSeq. We then examined the number of loci recovered and the amount of informative variation in those loci for each of the five species. We also conducted some standard analyses – species tree estimation, demographic modeling, and species delimitation – for each species.
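As a rough illustration of what "the amount of informative variation in those loci" can mean in practice, the Python sketch below counts variable alignment columns per locus across individuals, ignoring gaps and ambiguous bases. This is not our actual pipeline (that code is available at ultraconserved.org); the function name, locus names, and sequences are hypothetical toy data invented for the example.

```python
def variable_sites(alignment):
    """Count alignment columns with more than one unambiguous base.

    Gaps ("-") and ambiguity codes such as "N" are ignored, so a column
    is only counted when at least two different A/C/G/T bases are seen.
    """
    count = 0
    for column in zip(*alignment):
        bases = {base for base in column if base in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count


# Hypothetical per-locus alignments, one row per individual (toy data).
loci = {
    "locus_1": ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGAACGT"],
    "locus_2": ["GGCCTTAA", "GGCCTTAA", "GGCCTTAA", "GGCCNTAA"],
    "locus_3": ["TTGACCAT", "TTGACCGT", "CTGACCGT", "CTGACC-T"],
}

summary = {name: variable_sites(aln) for name, aln in loci.items()}
polymorphic = [name for name, n_sites in summary.items() if n_sites > 0]

print("variable sites per locus:", summary)
print("polymorphic loci:", polymorphic)
```

A per-locus tally like this is only a first look at whether a dataset holds enough variation for downstream analyses; it says nothing about genotype quality or phasing, which a real workflow would also need to handle.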
We were able to recover between 776 and 1,516 UCE regions across the five species, and these contained sufficient variation to conduct population genetic analyses in each species. Species tree estimates, demographic parameters, and species limits mostly corresponded with prior estimates based on morphology or mitochondrial DNA sequences. Confidence intervals around demographic parameter estimates from the UCEs were much narrower than estimates from mitochondrial DNA using similar methods, supporting the idea that larger datasets will allow more precise estimates of species histories.
Pending faster and cheaper methods for sequencing and de novo assembling whole genomes, methods for sampling a subset of the genome will be a practical necessity for population genetic studies in non-model organisms. Sequence capture is both intuitively appealing and practical in that it allows researchers to select a priori the regions of the genome in which they are interested. Ultraconserved elements pair nicely with sequence capture because they allow us to collect data from the same loci shared across a very broad spectrum of organisms (e.g. all amniotes or all fishes). As genomic data for diverse groups accumulate, UCE capture probes will certainly be augmented with additional genomic regions. In the meantime, sequence capture of UCEs has a lot to offer population genetic studies of non-model organisms. See our paper for more information, or visit ultraconserved.org, where our probe sets, protocols, code, and other information are available under open-source licenses (BSD-style and Creative Commons) for anyone to use.