Nonparametric inference of the distribution of fitness effects across functional categories in humans

Fernando Racimo, Joshua G Schraiber

Quantifying the proportion of polymorphic mutations that are deleterious or neutral is of fundamental importance to our understanding of evolution, disease genetics and the maintenance of variation genome-wide. Here, we develop an approximation to the distribution of fitness effects (DFE) of segregating single-nucleotide mutations in humans. Unlike previous methods, we do not assume that synonymous mutations are neutral, or rely on fitting the DFE of new nonsynonymous mutations to a particular parametric probability distribution, which is poorly motivated on a biological level. We rely on a previously developed method that utilizes a variety of published annotations (including conservation scores, protein deleteriousness estimates and regulatory data) to score all mutations in the human genome based on how likely they are to be affected by negative selection, controlling for mutation rate. We map this score to a scale of fitness coefficients via maximum likelihood using diffusion theory and a Poisson random field model. We then use our coefficient mapping to quantify the distribution of all scored single-nucleotide polymorphisms in Yoruba and Europeans. Our method serves to approximate the DFE of any type of segregating mutations, regardless of its genomic consequence, and so allows us to compare the proportion of mutations that are negatively selected or neutral across various genomic categories, including different types of regulatory sites. We observe that the distribution of intergenic polymorphisms is highly leptokurtic, with a strong peak at neutrality, while the distribution of nonsynonymous polymorphisms is bimodal, with a neutral peak and a second peak at s ≈ −10^(−4). Other types of polymorphisms have shapes that fall roughly in between these two.
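To make the Poisson random field step more concrete, here is a minimal sketch of how an expected site frequency spectrum depends on a scaled selection coefficient. It is not the authors' implementation: the function name `expected_sfs`, the convention gamma = 2Ns, the constant-population-size stationary density, and the simple midpoint integration are all assumptions chosen for illustration.

```python
import math

def expected_sfs(n, theta, gamma, steps=20000):
    """Expected number of sites with derived-allele count i = 1..n-1
    in a sample of n chromosomes under a Poisson random field model.

    Assumptions (not taken from the abstract): gamma = 2*N*s is the scaled
    selection coefficient, theta is the scaled mutation rate for the class
    of sites, and the population is at constant size.
    """
    def selection_weight(x):
        # Selection term of the stationary density; tends to (1 - x) as gamma -> 0.
        if abs(gamma) < 1e-8:
            return 1.0 - x
        return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)

    sfs = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) / steps  # midpoint rule on (0, 1)
            # Binomial sampling probability times the density
            # theta * w(x) / (x * (1 - x)); the 1 / (x * (1 - x)) factor
            # is folded into the exponents to avoid dividing by small numbers.
            total += x ** (i - 1) * (1.0 - x) ** (n - i - 1) * selection_weight(x)
        sfs.append(theta * math.comb(n, i) * total / steps)
    return sfs

neutral = expected_sfs(10, 1.0, 0.0)       # entry i-1 is close to theta / i
deleterious = expected_sfs(10, 1.0, -5.0)  # every frequency class is depleted
```

In a likelihood framework of this kind, observed counts in each frequency class would be treated as Poisson draws around these expectations, and gamma would be chosen to maximize the resulting likelihood for each bin of the annotation score.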

Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans

Rebekah L. Rogers, Julie M. Cridland, Ling Shao, Tina T. Hu, Peter Andolfatto, Kevin R. Thornton
(Submitted on 28 Jan 2014)

We have used whole genome paired-end Illumina sequence data to identify tandem duplications in 20 isofemale lines of D. yakuba, and 20 isofemale lines of D. simulans and performed genome wide validation with PacBio long molecule sequencing. We identify 1,415 tandem duplications that are segregating in D. yakuba as well as 975 duplications in D. simulans, indicating greater variation in D. yakuba. Additionally, we observe high rates of secondary deletions at duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites in D. yakuba modified with deletions. These secondary deletions are consistent with the action of the large loop mismatch repair system acting to remove polymorphic tandem duplication, resulting in rapid dynamics of gain and loss in duplicated alleles and a richer substrate of genetic novelty than has been previously reported. Most duplications are present in only single strains, suggesting deleterious impacts are common. However, we do observe signals consistent with adaptive evolution. D. simulans shows an excess of whole gene duplications and an excess of high frequency variants on the X chromosome, consistent with adaptive evolution through duplications on the D. simulans X. We identify 79 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans, as well as 143 cases of recruited non-coding sequence in D. yakuba and 96 in D. simulans, in agreement with rates of chimeric gene origination in D. melanogaster. Together, these results suggest that tandem duplications often result in complex variation beyond whole gene duplications that offers a rich substrate of standing variation that is likely to contribute both to detrimental phenotypes and disease, as well as to adaptive evolutionary change.

SINGLE NUCLEOTIDE POLYMORPHISMS SHED LIGHT ON CORRELATIONS BETWEEN ENVIRONMENTAL VARIABLES AND ADAPTIVE GENETIC DIVERGENCE AMONG POPULATIONS IN ONCORHYNCHUS KETA

Xilin Deng, Philippe Henry

Identifying the genetic and ecological basis of adaptation is of immense importance in evolutionary biology. In our study, we applied a panel of 58 biallelic single nucleotide polymorphisms (SNPs) for the economically and culturally important salmonid Oncorhynchus keta. Samples included 4164 individuals from 43 populations ranging from Coastal Western Alaska to southern British Columbia and northern Washington. Signatures of natural selection were detected by identifying seven outlier loci using two independent approaches: one based on outlier detection and another based on environmental correlations. Evidence of divergent selection at two candidate SNP loci, Oke_RFC2-168 and Oke_MARCKS-362, indicates significant environmental correlations, particularly with the number of frost-free days (NFFD). Important associations found between environmental variables and outlier loci indicate that those environmental variables could be the major driving forces of allele frequency divergence at the candidate loci. NFFD, in particular, may play an important adaptive role in shaping genetic variation in O. keta. Correlations between divergent selection and local environmental variables will help shed light on processes of natural selection and molecular adaptation to local environmental conditions.
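The idea behind an environmental-correlation test can be illustrated with a bare Pearson correlation between per-population allele frequencies at one SNP and an environmental variable such as NFFD. This is only a sketch: the helper `pearson_r` and the numbers below are invented for illustration and are not data or methods from the study, and a real analysis would also need to account for shared population history.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: one allele frequency and one NFFD value per population.
allele_freq = [0.12, 0.20, 0.35, 0.41, 0.58]
frost_free_days = [95, 120, 160, 175, 230]

r = pearson_r(allele_freq, frost_free_days)
print(f"correlation between allele frequency and NFFD: {r:.3f}")
```

A locus whose correlation is extreme compared with the rest of the panel would be a candidate for environmentally driven divergence, subject to the caveat about population structure above.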

Demography and the age of rare variants

Iain Mathieson, Gil McVean
(Submitted on 16 Jan 2014)

Recently, large whole-genome sequencing projects have provided access to much of the rare variation in human populations. This variation is highly informative about population structure and recent demography. In this paper, we show how the age of rare variants can be estimated from patterns of haplotype sharing and how this information can detect and quantify historical relationships between populations. We investigate the distribution of the age of f2 variants in a worldwide sample sequenced by the 1000 Genomes Project, revealing enormous variation across populations. The median age of f2 variants shared within continents is 50 to 160 generations for Europe and Asia, and 170 to 320 generations for Africa. Variants shared between continents are much older with median ages ranging from 320 to 670 generations between Europe and Asia, and 1,000 to 2,400 generations between African and non-African populations. The distribution of the ages of variants shared across populations is informative about their demography, revealing recent bottlenecks, ancient splits, and more modern connections between populations. We see the signature of selection in the observation that functional variants are significantly younger than nonfunctional variants of the same frequency. This approach is relatively insensitive to mutation rate and complements other nonparametric methods for demographic inference.
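As a small illustration of the first step of such an analysis, the sketch below locates f2 variants and tallies which pairs of populations share them. It assumes that "f2" denotes a variant present in exactly two copies in the whole sample; the function name `f2_sharing` and the toy haplotypes are hypothetical, and the age estimation from haplotype sharing described in the abstract is not attempted here.

```python
from collections import Counter

def f2_sharing(haplotypes, populations):
    """Count f2 variants (exactly two derived copies) per pair of populations.

    haplotypes:  list of equal-length 0/1 lists, one per sampled chromosome
                 (1 = derived allele).
    populations: population label for each chromosome, in the same order.
    """
    counts = Counter()
    n_sites = len(haplotypes[0])
    for site in range(n_sites):
        carriers = [h for h, hap in enumerate(haplotypes) if hap[site] == 1]
        if len(carriers) == 2:  # exactly two copies in the whole sample
            pair = tuple(sorted(populations[h] for h in carriers))
            counts[pair] += 1
    return counts

# Toy data: four chromosomes, four sites, two made-up populations.
haps = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 1],
]
pops = ["A", "A", "B", "B"]
sharing = f2_sharing(haps, pops)
# sharing[("A", "A")] == 1, sharing[("A", "B")] == 1, sharing[("B", "B")] == 2
```

Within-population and between-population pairs are kept separate because the abstract reports that variants shared between continents tend to be much older than those shared within them.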

Population genomics of Saccharomyces cerevisiae human isolates: passengers, colonizers, invaders.
Carlotta De Filippo, Monica Di Paola, Irene Stefanini, Lisa Rizzetto, Luisa Berná, Matteo Ramazzotti, Leonardo Dapporto, Damariz Rivero, Ivo G Gut, Marta Gut, Mónica Bayés, Jean-Luc Legras, Roberto Viola, Cristina Massi-Benedetti, Antonella De Luca, Luigina Romani, Paolo Lionetti, Duccio Cavalieri

The quest for the ecological niches of Saccharomyces cerevisiae ranged from wineries to oaks and more recently to the gut of Crabro wasps. Here we propose the role of the human gut in shaping S. cerevisiae evolution, presenting the genetic structure of a previously unknown population of yeasts, associated with Crohn's disease, providing evidence for clonal expansion within the human gut. To understand the role of immune function in the human-yeast interaction we classified strains according to their immunomodulatory properties, discovering a set of genetically homogeneous isolates, capable of inducing anti-inflammatory signals via regulatory T cell proliferation, and on the contrary, a positive association between strain mosaicism and ability to elicit inflammatory, IL-17 driven, immune responses. The approach integrating genomics with immune phenotyping showed selection on genes involved in sporulation and cell wall remodeling as central for the evolution of S. cerevisiae Crohn's strains from passengers to commensals to potential pathogens.

Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences

Sebastian Lippold, Hongyang Xu, Albert Ko, Mingkun Li, Gabriel Renaud, Anne Butthof, Roland Schroeder, Mark Stoneking

To investigate in detail the paternal and maternal demographic histories of humans, we obtained ~500 kb of non-recombining Y chromosome (NRY) sequences and complete mtDNA genome sequences from 623 males from 51 populations in the CEPH Human Genome Diversity Panel (HGDP). Our results: confirm the controversial assertion that genetic differences between human populations on a global scale are bigger for the NRY than for mtDNA; suggest very small ancestral effective population sizes (<100) for the out-of-Africa migration as well as for many human populations; and indicate that the ratio of female effective population size to male effective population size (Nf/Nm) has been greater than one throughout the history of modern humans, and has recently increased due to faster growth in Nf. However, we also find substantial differences in patterns of mtDNA vs. NRY variation in different regional groups; thus, global patterns of variation are not necessarily representative of specific geographic regions.

Happy New Year Homo erectus? More evidence for interbreeding with archaics predating the modern human/Neanderthal split

Peter J. Waddell
(Submitted on 30 Dec 2013)

A range of a priori hypotheses about the evolution of modern and archaic genomes are further evaluated and tested. In addition to the well-known splits/introgressions involving Neanderthal genes into out-of-Africa people, or Denisovan genes into Oceanians, a further series of archaic splits and hypotheses proposed in Waddell et al. (2011) are considered in detail. These include signals of Denisovans with something markedly more archaic and possibly something more archaic into Papuans as well. These are compared and contrasted with some well-advertised introgressions such as Denisovan genes across East Asia, archaic genes into San or non-tree mixing between Oceanians, East Asians and Europeans. The general result is that these less appreciated and surprising archaic splits have just as much or more support in genome sequence data. Further, evaluation confirms the hypothesis that archaic genes are much rarer on modern X chromosomes, and may even be near totally absent, suggesting strong selection against their introgression. Modeling of relative split weights allows an inference of the proportion of the genome the Denisovan seems to have gotten from an older archaic, and the best estimate is around 2%. Using a mix of quantitative and qualitative morphological data and novel phylogenetic methods, robust support is found for multiple distinct middle Pleistocene lineages. Of these, fossil hominids such as SH5, Petralona, and Dali, in particular, look like prime candidates for contributing pre-Neanderthal/Modern archaic genes to Denisovans, while the Jinniu-Shan fossil looks like the best candidate for a close relative of the Denisovan. That the Papuans might have received some truly archaic genes appears a good possibility and they might even be from Homo erectus.

Sequence Capture Versus Restriction Site Associated DNA Sequencing for Phylogeography

Michael G. Harvey, Brian Tilston Smith, Travis C. Glenn, Brant C. Faircloth, Robb T. Brumfield
(Submitted on 22 Dec 2013)

Genomic datasets generated with massively parallel sequencing methods have the potential to propel systematics in new and exciting directions, but selecting appropriate markers and methods is not straightforward. We applied two approaches with particular promise for systematics, restriction site associated DNA sequencing (RAD-Seq) and sequence capture (Seq-cap) of ultraconserved elements (UCEs), to the same set of samples from a non-model, Neotropical bird. We found that both RAD-Seq and Seq-cap produced genomic datasets containing thousands of loci and SNPs and that the inferred population assignments and species trees were concordant between datasets. However, model-based estimates of demographic parameters differed between datasets, particularly when we estimated the parameters using a method based on allele frequency spectra. The differences we observed may result from differences in assembly, alignment, and filtering of sequence data between methods, and our findings suggest that caution is warranted when using allele frequencies to estimate parameters from low-coverage sequencing data. We further explored the differences between methods using simulated Seq-cap- and RAD-Seq-like datasets. Analyses of simulated data suggest that increasing the number of loci from 500 to 5000 increased phylogenetic concordance factors and the accuracy and precision of demographic parameter estimates, but increasing the number of loci past 5000 resulted in minimal gains. Increasing locus length from 64 bp to 500 bp improved phylogenetic concordance factors and minimal gains were observed with loci longer than 500 bp, but locus length did not influence the accuracy and precision of demographic parameter estimates. We discuss our results relative to the diversity of data collection methods available, and we provide advice for harnessing next-generation sequencing for systematics research.

Ancient human genomes suggest three ancestral populations for present-day Europeans

Iosif Lazaridis, Nick Patterson, Alissa Mittnik, Gabriel Renaud, Swapan Mallick, Peter H. Sudmant, Joshua G. Schraiber, Sergi Castellano, Karola Kirsanow, Christos Economou, Ruth Bollongino, Qiaomei Fu, Kirsten Bos, Susanne Nordenfelt, Cesare de Filippo, Kay Prüfer, Susanna Sawyer, Cosimo Posth, Wolfgang Haak, Fredrik Hallgren, Elin Fornander, George Ayodo, Hamza A. Babiker, Elena Balanovska, Oleg Balanovsky, Haim Ben-Ami, Judit Bene, Fouad Berrada, Francesca Brisighelli, George B.J. Busby, Francesco Cali, Mikhail Churnosov, David E.C. Cole, Larissa Damba, Dominique Delsate, George van Driem, Stanislav Dryomov, Sardana A. Fedorova, Michael Francken, Irene Gallego Romero, Marina Gubina, Jean-Michel Guinet, Michael Hammer, Brenna Henn, Tor Helvig, Ugur Hodoglugil, Aashish R. Jha, Rick Kittles, Elza Khusnutdinova, Toomas Kivisild, Vaidutis Kučinskas, Rita Khusainova, Alena Kushniarevich, Leila Laredj, Sergey Litvinov, Robert W. Mahley, Béla Melegh, Ene Metspalu, Joanna Mountain, Thomas Nyambo, Ludmila Osipova, Jüri Parik, Fedor Platonov, Olga L. Posukh, Valentino Romano, Igor Rudan, Ruslan Ruizbakiev, Hovhannes Sahakyan, Antonio Salas, Elena B. Starikovskaya, Ayele Tarekegn, Draga Toncheva, Shahlo Turdikulova, Ingrida Uktveryte, Olga Utevska, Mikhail Voevoda, Joachim Wahl, Pierre Zalloua, Levon Yepiskoposyan, Tatijana Zemunik, Alan Cooper, Cristian Capelli, Mark G. Thomas, Sarah A. Tishkoff, Lalji Singh, Kumarasamy Thangaraj, Richard Villems, David Comas, Rem Sukernik, Mait Metspalu, Matthias Meyer, Evan E. Eichler, Joachim Burger, Montgomery Slatkin, Svante Pääbo, Janet Kelso, David Reich, Johannes Krause

Analysis of ancient DNA can reveal historical events that are difficult to discern through study of present-day individuals. To investigate European population history around the time of the agricultural transition, we sequenced complete genomes from a ~7,500 year old early farmer from the Linearbandkeramik (LBK) culture from Stuttgart in Germany and an ~8,000 year old hunter-gatherer from the Loschbour rock shelter in Luxembourg. We also generated data from seven ~8,000 year old hunter-gatherers from Motala in Sweden. We compared these genomes and published ancient DNA to new data from 2,196 samples from 185 diverse populations to show that at least three ancestral groups contributed to present-day Europeans. The first are Ancient North Eurasians (ANE), who are more closely related to Upper Paleolithic Siberians than to any present-day population. The second are West European Hunter-Gatherers (WHG), related to the Loschbour individual, who contributed to all Europeans but not to Near Easterners. The third are Early European Farmers (EEF), related to the Stuttgart individual, who were mainly of Near Eastern origin but also harbored WHG-related ancestry. We model the deep relationships of these populations and show that ~44% of the ancestry of EEF derived from a basal Eurasian lineage that split prior to the separation of other non-Africans.

Selection signatures in worldwide Sheep populations

Maria-Ines Fariello, Bertrand Servin, Gwenola Tosser-Klopp, Rachelle Rupp, Carole Moreno, Magali San Cristobal, Simon Boitard

The diversity of populations in domestic species offers great opportunities to study genome response to selection. The recently published Sheep HapMap dataset is a great example of characterization of the worldwide genetic diversity in sheep. In this study, we re-analyzed the Sheep HapMap dataset to identify selection signatures in worldwide sheep populations. Compared to previous analyses, we make use of statistical methods that (i) take account of the hierarchical structure of sheep populations, (ii) make use of linkage disequilibrium information and (iii) focus specifically on either recent or older selection signatures. We show that this allows us to pinpoint several new selection signatures in the sheep genome and to distinguish those related to modern breeding objectives from those related to earlier post-domestication constraints. The newly identified regions, together with those previously identified, reveal the extensive genome response to selection on morphology, color and adaptation to new environments.