Fitting the Balding-Nichols model to forensic databases

Rori Rohlfs, Vitor R.C. Aguiar, Kirk E. Lohmueller, Amanda M. Castro, Alessandro C.S. Ferreira, Vanessa C.O. Almeida, Iuri D. Louro, Rasmus Nielsen
doi: http://dx.doi.org/10.1101/009969

Large forensic databases provide an opportunity to compare observed empirical rates of genotype matching with those expected under forensic genetic models. A number of researchers have taken advantage of this opportunity to validate some forensic genetic approaches, particularly to ensure that estimated rates of genotype matching between unrelated individuals are indeed slight overestimates of those observed. However, these studies have also revealed systematic error trends in genotype probability estimates. In this analysis, we investigate these error trends and show how they result from inappropriate implementation of the Balding-Nichols model in the context of database-wide matching. Specifically, we show that in addition to accounting for increased allelic matching between individuals with recent shared ancestry, studies must account for relatively decreased allelic matching between individuals with more ancient shared ancestry.
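
For orientation, the sketch below computes the textbook Balding-Nichols conditional match probability for a single profile (the form recommended in the 1996 NRC II report). It is not the authors' database-wide procedure. The function names, the allele frequencies and the θ values are hypothetical, and independence across loci is assumed.

```python
import math
from typing import Optional, Sequence, Tuple


def bn_match_prob(p: float, q: Optional[float], theta: float) -> float:
    """Single-locus conditional match probability under the Balding-Nichols model.

    p, q  : population frequencies of the two alleles (q=None for a homozygote)
    theta : coancestry coefficient (F_ST) assumed between the two individuals
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if q is None:
        # homozygote AA
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denom
    # heterozygote AB
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denom


def profile_match_prob(loci: Sequence[Tuple[float, Optional[float]]], theta: float) -> float:
    """Multiply single-locus values, assuming independence across loci."""
    return math.prod(bn_match_prob(p, q, theta) for p, q in loci)


# Hypothetical two-locus profile: a heterozygote (0.1, 0.2) and a homozygote (0.05)
profile = [(0.1, 0.2), (0.05, None)]
for theta in (0.0, 0.01, 0.03):
    print(theta, profile_match_prob(profile, theta))
```

With θ = 0 the expressions reduce to p² and 2pq. A positive θ typically raises the value for rare alleles, which is the adjustment for recent shared ancestry. On its own it does not capture the relatively decreased matching between individuals with more ancient shared ancestry that the abstract describes.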

The Drosophila Genome Nexus: a population genomic resource of 605 Drosophila melanogaster genomes, including 197 genomes from a single ancestral range population

Justin Lack, Charis Cardeno, Marc Crepeau, William Taylor, Russ Corbett-Detig, Kristian Stevens, Charles H. Langley, John Pool
doi: http://dx.doi.org/10.1101/009886

Hundreds of wild-derived D. melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach, and settled on an assembly strategy that utilizes two alignment programs and incorporates both SNPs and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets (previous DPGP releases and the DGRP freeze 2.0), and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 605 consistently aligned genomes, and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.

Reticulate speciation and adaptive introgression in the Anopheles gambiae species complex

Jacob Crawford, Michelle M. Riehle, Wamdaogo M. Guelbeogo, Awa Gneme, N’fale Sagnon, Kenneth D. Vernick, Rasmus Nielsen, Brian P. Lazzaro
doi: http://dx.doi.org/10.1101/009837

Species complexes are common, especially among insect disease vectors, and understanding how barriers to gene flow among these populations become established or violated is critical for implementing vector-targeting disease control. Anopheles gambiae, the primary vector of human malaria in sub-Saharan Africa, exists as a series of ecologically specialized populations that are phylogenetically nested within a species complex. These populations exhibit varying degrees of reproductive isolation, and some are recognized as distinct subspecies. We have sequenced 32 complete genomes from field-captured individuals of Anopheles gambiae, Anopheles gambiae M form (recently named A. coluzzii), the sister species A. arabiensis, and the recently discovered “GOUNDRY” subgroup of A. gambiae, which is highly susceptible to Plasmodium. Against a backdrop of strong reproductive isolation and adaptive differentiation, we find evidence for adaptive introgression of autosomal regions among species and populations. The X chromosome, however, remains strongly differentiated among all of the subpopulations, pointing to a disproportionately large effect of X chromosome genes in driving speciation among anophelines. Strikingly, we find that autosomal introgression has occurred through contemporary hybridization between A. gambiae and A. arabiensis despite strong divergence (~5× higher than autosomal divergence) and isolation on the X chromosome. We find a large region of the X chromosome that has recently swept to fixation in the GOUNDRY subpopulation, which may be an inversion that serves as a partial barrier to gene flow. We also find that the GOUNDRY population is highly inbred, implying increased philopatry in this population.
Our results show that ecological speciation in this species complex results in genomic mosaicism of divergence and adaptive introgression that creates a reticulate gene pool connecting vector populations across the speciation continuum with important implications for malaria control efforts.

On the prospect of identifying adaptive loci in recently bottlenecked populations

Yu-Ping Poh, Vera S Domingues, Hopi Hoekstra, Jeffrey Jensen
doi: http://dx.doi.org/10.1101/009456

Identifying adaptively important loci in recently bottlenecked populations—be it natural selection acting on a population following the colonization of novel habitats in the wild, or artificial selection during the domestication of a breed—remains a major challenge. Here we report the results of a simulation study examining the performance of available population-genetic tools for identifying genomic regions under selection. To illustrate our findings, we examined the interplay between selection and demography in two species of Peromyscus mice, for which we have independent evidence of selection acting on phenotype as well as functional evidence identifying the underlying genotype. With this unusual information, we tested whether population-genetic-based approaches could have been utilized to identify the adaptive locus. Contrary to published claims, we conclude that the use of the background site frequency spectrum as a null model is largely ineffective in bottlenecked populations. Results are quantified both for site frequency spectrum and linkage disequilibrium-based predictions, and are found to hold true across a large parameter space that encompasses many species and populations currently under study. These results suggest that the genomic footprint left by selection on both new and standing variation in strongly bottlenecked populations will be difficult, if not impossible, to find using current approaches.
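
SFS-based scans of the kind evaluated here start from a tabulation of derived-allele counts per site. The sketch below builds the unfolded site frequency spectrum from 0/1 haplotype strings and, for comparison, the standard neutral constant-size expectation θ/i. It is only an illustration, not the simulation framework used in the study. The function names and the sample are hypothetical, and ancestral states are assumed known.

```python
from typing import List, Sequence


def unfolded_sfs(haplotypes: Sequence[str]) -> List[int]:
    """Unfolded SFS from 0/1 haplotype strings ("1" = derived allele).

    Assumes the ancestral state of every site is known. Element i-1 of the
    result counts sites whose derived allele is carried by exactly i of the
    n sampled chromosomes, for i = 1..n-1.
    """
    n = len(haplotypes)
    counts = [0] * (n + 1)
    for site in zip(*haplotypes):            # iterate over columns (sites)
        counts[sum(a == "1" for a in site)] += 1
    return counts[1:n]                       # drop monomorphic classes 0 and n


def neutral_expected_sfs(theta: float, n: int) -> List[float]:
    """Neutral, constant-size expectation E[xi_i] = theta / i for i = 1..n-1."""
    return [theta / i for i in range(1, n)]


haps = ["1100", "1000", "1010", "0001"]      # hypothetical sample, n = 4
print(unfolded_sfs(haps))                    # -> [3, 0, 1]
print(neutral_expected_sfs(6.0, len(haps)))  # -> [6.0, 3.0, 2.0]
```

A recent bottleneck reshapes the genome-wide spectrum away from θ/i, so scans typically compare a candidate region with the empirical "background" SFS instead. The abstract argues that even this null performs poorly in strongly bottlenecked populations.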

An Approximate Bayesian Computation Approach to Examining the Phylogenetic Relationships among the Four Gibbon Genera using Whole Genome Sequence Data

Krishna Veeramah, August E Woerner, Laurel Johnstone, Ivo Gut, Marta Gut, Tomas Marques-Bonet, Lucia Carbone, Jeff D Wall, Michael F Hammer
doi: http://dx.doi.org/10.1101/009498

Gibbons are believed to have diverged from the larger great apes ~16.8 Mya and today reside in the rainforests of Southeast Asia. Based on their diploid chromosome number, the family Hylobatidae is divided into four genera: Nomascus, Symphalangus, Hoolock and Hylobates. Genetic studies attempting to elucidate the phylogenetic relationships among gibbons using karyotypes, mtDNA, the Y chromosome, and short autosomal sequences have been inconclusive. To examine the relationships among gibbon genera in more depth, we performed second-generation whole genome sequencing to a mean coverage of ~15X in two individuals from each genus. We developed a coalescent-based Approximate Bayesian Computation method, incorporating a model of sequencing error estimated from high-coverage exome validation, to infer the branching order, divergence times, and effective population sizes of gibbon taxa. Although Hoolock and Symphalangus are likely sister taxa, we could not confidently resolve a single bifurcating tree despite the large amount of data analyzed. Our combined results support the hypothesis that all four gibbon genera diverged at approximately the same time. Assuming an autosomal mutation rate of 1 × 10⁻⁹/site/year, this speciation process occurred ~5 Mya, during a period in the Early Pliocene characterized by climatic shifts and fragmentation of the Sunda shelf forests. Whole genome sequencing of additional individuals will be vital for inferring the extent of gene flow among species after the separation of the gibbon genera.
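
The study's method is a coalescent-based ABC with a sequencing-error model. As a much simpler illustration of the accept/reject logic only, the sketch below runs rejection ABC for a single split time T under a deliberately toy model in which the number of differing sites is Poisson(2μTL). The rate μ = 1 × 10⁻⁹/site/year is the one assumed in the abstract. The number of sites L, the observed count, the prior bounds and all function names are hypothetical, and the model ignores ancestral polymorphism, recombination and sequencing error.

```python
import math
import random

MU = 1e-9          # substitutions/site/year, the rate assumed in the abstract
N_SITES = 1_000    # hypothetical number of compared sites (toy value)
OBSERVED = 10      # hypothetical observed number of differing sites (toy value)


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's Poisson sampler; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def abc_rejection(n_draws: int = 20_000, seed: int = 1) -> list:
    """Rejection ABC for a split time T (years) under the toy Poisson model."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        t = rng.uniform(1e6, 1e7)                        # flat prior on T
        simulated = poisson(rng, 2 * MU * t * N_SITES)   # simulate the summary statistic
        if simulated == OBSERVED:                        # keep draws that reproduce the data exactly
            accepted.append(t)
    return accepted


posterior = abc_rejection()
print(len(posterior), sum(posterior) / len(posterior))
```

Practical ABC analyses compare several summary statistics and accept simulations within a tolerance rather than requiring an exact match, and they simulate data under a full coalescent model instead of a single Poisson count.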

The genetic ancestry of African, Latino, and European Americans across the United States.

Katarzyna Bryc, Eric Durand, J Michael Macpherson, David Reich, Joanna Mountain
doi: http://dx.doi.org/10.1101/009340

Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans brought largely by the Trans-Atlantic slave trade, shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States, and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.

Inference of Gorilla demographic and selective history from whole genome sequence data

Kimberly F. McManus, Joanna L. Kelley, Shiya Song, Krishna Veeramah, August E. Woerner, Laurie S. Stevison, Oliver A. Ryder, , Jeffrey M. Kidd, Jeffrey D. Wall, Carlos D. Bustamante, Michael F. Hammer
doi: http://dx.doi.org/10.1101/009191

While population-level genomic sequence data have been gathered extensively for humans, similar data from our closest living relatives are just beginning to emerge. Examination of genomic variation within great apes offers many opportunities to increase our understanding of the forces that have differentially shaped the evolutionary history of hominid taxa. Here, we expand upon the work of the Great Ape Genome Project by analyzing medium to high coverage whole genome sequences from 14 western lowland gorillas (Gorilla gorilla gorilla), 2 eastern lowland gorillas (G. beringei graueri), and a single Cross River individual (G. gorilla diehli). We infer that the ancestors of western and eastern lowland gorillas diverged from a common ancestor ~261 thousand years ago (kya), and that the ancestors of the Cross River population diverged from the western lowland gorilla lineage ~68 kya. Using a diffusion approximation approach to model the genome-wide site frequency spectrum, we infer a history of western lowland gorillas that includes an ancestral population expansion of ~1.4-fold around ~970 kya and a recent ~5.6-fold contraction in population size ~23 kya. The latter may correspond to a major reduction in African equatorial forests around the Last Glacial Maximum. We also analyze patterns of variation among western lowland gorillas to identify several genomic regions with strong signatures of recent selective sweeps. We find that processes related to taste, pancreatic and saliva secretion, sodium ion transmembrane transport, and cardiac muscle function are overrepresented in genomic regions predicted to have experienced recent positive selection.

Non-crossover gene conversions show strong GC bias and unexpected clustering in humans

Amy Williams, Giulio Genovese, Thomas Dyer, Katherine Truax, Goo Jun, Nick Patterson, Joanne E. Curran, Ravi Duggirala, John Blangero, David Reich, Molly Przeworski
doi: http://dx.doi.org/10.1101/009175

Although the past decade has seen tremendous progress in our understanding of fine-scale recombination, little is known about non-crossover (or “gene conversion”) resolutions. We report the first genome-wide study of non-crossover gene conversion events in humans. Using SNP array data from 94 meioses, we identified 107 sites affected by non-crossover events, of which 51 of the 53 examined were confirmed in sequence data. Our results suggest that a site is involved in a non-crossover event at a rate of 6.7 × 10⁻⁶/bp/generation, consistent with results from sperm-typing studies. Observed non-crossover events show strong allelic bias, with 70% (61–79%) of events transmitting GC alleles (P = 7.9 × 10⁻⁵), and have tract lengths that vary over more than an order of magnitude. Strikingly, in 4 of 15 regions with available resequencing data, multiple (~2–4) distinct non-crossover events cluster within ~20–30 kb. This pattern has not been reported previously in mammals and is inconsistent with canonical models of double-strand break repair.

Characterization of the transcriptome, nucleotide sequence polymorphism, and natural selection in the desert adapted mouse Peromyscus eremicus

Matthew D MacManes, Michael B Eisen
doi: http://dx.doi.org/10.1101/009134

As a direct result of intense heat and aridity, deserts are thought to be among the harshest environments, particularly for their mammalian inhabitants. Given that osmoregulation can be challenging for these animals, with failure resulting in death, strong selection should be observed on genes related to the maintenance of water and solute balance. One such animal, Peromyscus eremicus, is native to the desert regions of the southwest United States and may live its entire life without oral fluid intake. As a first step toward understanding the genetics that underlie this phenotype, we present a characterization of the P. eremicus transcriptome. We assay four tissues (kidney, liver, brain, testes) from a single individual and supplement this with population-level renal transcriptome sequencing from 15 additional animals. We identified a set of transcripts undergoing both purifying and balancing selection based on estimates of Tajima’s D. In addition, we used the branch-site test to identify a transcript, Slc2a9, likely related to desert osmoregulation, that is undergoing enhanced selection in P. eremicus relative to a set of related non-desert rodents.
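
For reference, Tajima's D compares the mean number of pairwise differences (π) with the estimate S/a₁ based on the number of segregating sites (S). The sketch below implements the standard Tajima (1989) formula on aligned haplotype strings. It is not necessarily the implementation used in this study, and the function name and example sequences are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(haplotypes: Sequence[str]) -> float:
    """Tajima's D from aligned, equal-length haplotype strings."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences (the variance terms vanish for n < 4)")
    # S: number of segregating (polymorphic) sites
    s = sum(1 for site in zip(*haplotypes) if len(set(site)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # pi: mean number of pairwise differences over all pairs of sequences
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(["1000", "0100", "0010", "0001"]))  # negative: every site is a singleton
print(tajimas_d(["11", "11", "00", "00"]))          # positive: intermediate-frequency sites
```

An excess of rare variants (negative D) is consistent with purifying selection, and an excess of intermediate-frequency variants (positive D) with balancing selection, although demographic history can produce either pattern.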

Population genomic analysis uncovers African and European admixture in Drosophila melanogaster populations from the southeastern United States and Caribbean Islands

Joyce Y Kao, Asif Zubair, Matthew P Salomon, Sergey V Nuzhdin, Daniel Campo
doi: http://dx.doi.org/10.1101/009092

Genome sequences from North American Drosophila melanogaster populations have become available to the scientific community. Deciphering the population structure underlying these resources is crucial to making the most of them. Accepted models of North American colonization generally hold that several hundred years ago, flies from Europe and Africa were transported to the east coast of the United States and the Caribbean Islands, respectively, and thus that current east coast US and Caribbean populations are an admixture of African and European ancestry. These models have been constructed based on phenotypes and limited genetic data. In our study, we sequenced individual whole genomes of flies from populations in the southeast US and Caribbean Islands and examined these populations in conjunction with population sequences from Winters, CA (USA); Raleigh, NC (USA); Cameroon (Africa); and Montpellier (France) to uncover the underlying population structure of North American populations. We find that west coast US populations are most similar to European populations, likely reflecting a rapid westward expansion following the first settlements in North America. We also find genomic evidence of African and European admixture in east coast US and Caribbean populations, with a clinal pattern of decreasing African ancestry at higher latitudes, further supporting the proposed demographic model in which Caribbean flies were established by African ancestors. Our genomic analysis of Caribbean flies is the first study to expose the source of previously reported novel African alleles found in east coast US populations.