Successful asexual lineages of the Irish potato Famine pathogen are triploid
Ying Li, Qian Zhou, Kun Qian, Theo van der Lee, Sanwen Huang
doi: http://dx.doi.org/10.1101/024596
The oomycete Phytophthora infestans was the causal agent of the Irish Great Famine and is a recurring threat to global food security. The pathogen can reproduce both sexually and asexually and has the potential to adapt to both abiotic and biotic environments. Although the A1 and A2 mating types coexist in many regions, the vast majority of isolates belong to a few clonal, asexual lineages. Like other oomycetes, P. infestans is thought to be diploid during the vegetative phase of its life cycle, but trisomy has been observed to correlate with virulence and the mating type locus, and polyploidy can occur in some isolates. The frequency of polyploidy in nature and the relationship between ploidy level and sexuality remain unknown. Here, by comparing microsatellite fingerprinting, genome-wide polymorphism, DNA quantity, and chromosome numbers, we discovered that the sexuality of P. infestans isolates correlates with ploidy. The sexual progeny of P. infestans in nature are diploid, whereas the asexual lineages are mostly triploid, including the successful clonal lineages US-1 and 13_A2. This study reveals polyploidization as an extra evolutionary risk posed by this notorious plant destroyer.
Yearly Archives: 2015
S/HIC: Robust identification of soft and hard sweeps using machine learning
Daniel R Schrider, Andrew D Kern
doi: http://dx.doi.org/10.1101/024547
Detecting the targets of adaptive natural selection from whole genome sequencing data is a central problem for population genetics. However, to date most methods have shown sub-optimal performance under realistic demographic scenarios. Moreover, over the past decade there has been a renewed interest in determining the importance of selection from standing variation in adaptation of natural populations, yet very few methods for inferring this model of adaptation at the genome scale have been introduced. Here we introduce a new method, S/HIC, which uses supervised machine learning to precisely infer the location of both hard and soft selective sweeps. We show that S/HIC has unrivaled accuracy for detecting sweeps under demographic histories that are relevant to human populations, and distinguishing sweeps from linked as well as neutrally evolving regions. Moreover we show that S/HIC is uniquely robust among its competitors to model misspecification. Thus even if the true demographic model of a population differs catastrophically from that specified by the user, S/HIC still retains impressive discriminatory power. Finally we apply S/HIC to the case of resequencing data from human chromosome 18 in a European population sample and demonstrate that we can reliably recover selective sweeps that have been identified earlier using less specific and sensitive methods.
Genomic study of the Ket: a Paleo-Eskimo-related ethnic group with significant ancient North Eurasian ancestry
Pavel Flegontov, Piya Changmai, Anastassiya Zidkova, Maria D. Logacheva, Olga Flegontova, Mikhail S. Gelfand, Evgeny S. Gerasimov, Ekaterina V. Khrameeva, Olga P. Konovalova, Tatiana Neretina, Yuri V. Nikolsky, George Starostin, Vita V. Stepanova, Igor V. Travinsky, Martin Tříska, Petr Tříska, Tatiana V. Tatarinova
doi: http://dx.doi.org/10.1101/024554
The Kets, an ethnic group in the Yenisei River basin, Russia, are considered the last nomadic hunter-gatherers of Siberia, and the Ket language has no transparent affiliation with any language family. We investigated connections between the Kets and Siberian and North American populations, with emphasis on the Mal’ta and Paleo-Eskimo ancient genomes, using original data from 46 unrelated samples of Kets and 42 samples of their neighboring ethnic groups (Uralic-speaking Nganasans, Enets, and Selkups). We genotyped over 130,000 autosomal SNPs, determined mitochondrial and Y-chromosomal haplogroups, and performed high-coverage genome sequencing of two Ket individuals. We established that the Kets belong to the cluster of Siberian populations related to Paleo-Eskimos. Unlike other members of this cluster (Nganasans, Ulchi, Yukaghirs, and Evens), Kets and closely related Selkups have a high degree of Mal’ta ancestry. Implications of these findings for the linguistic hypothesis uniting Ket and Na-Dene languages into a language macrofamily are discussed.
Genomic analysis of allele-specific expression in the mouse liver
Ashutosh K Pandey, Robert W Williams
doi: http://dx.doi.org/10.1101/024588
Genetic differences in gene expression contribute significantly to phenotypic diversity and differences in disease susceptibility. In fact, the great majority of causal variants highlighted by genome-wide association are in non-coding regions that modulate expression. In order to quantify the extent of allelic differences in expression, we analyzed liver transcriptomes of isogenic F1 hybrid mice. Allele-specific expression (ASE) effects are pervasive and are detected in over 50% of assayed genes. Genes with strong ASE do not differ from those with no ASE with respect to their length or promoter complexity. However, they have a higher density of sequence variants, higher functional redundancy, and lower evolutionary conservation compared to genes with no ASE. Fifty percent of genes with no ASE are categorized as house-keeping genes. In contrast, the high ASE set may be critical in phenotype canalization. There is significant overlap between genes that exhibit ASE and those that exhibit strong cis expression quantitative trait loci (cis eQTLs) identified using large genetic expression data sets. Eighty percent of genes with cis eQTLs also have strong ASE effects. Conversely, 40% of genes with ASE effects are associated with strong cis eQTLs. Cis-acting variation detected at the protein level is also detected at the transcript level, but the converse is not true. ASE is a highly sensitive and direct method to quantify cis-acting variation in gene expression and complements and extends classic cis eQTL analysis. ASE differences can be combined with coding variants to produce a key resource of functional variants for precision medicine and genome-to-phenome mapping.
Fast and efficient QTL mapper for thousands of molecular phenotypes
Halit Ongen, Alfonso Buil, Andrew Brown, Emmanouil Dermitzakis, Olivier Delaneau
doi: http://dx.doi.org/10.1101/022301
Motivation: In order to discover quantitative trait loci (QTLs), multi-dimensional genomic data sets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing. Results: We have developed FastQTL, a method that implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing. The outcome of permutations is modeled using beta distributions trained from a few permutations, from which adjusted p-values can be estimated at any level of significance with little computational cost. The Geuvadis & GTEx pilot data sets can now be easily analyzed an order of magnitude faster than with previous approaches. Availability: Source code, binaries and comprehensive documentation of FastQTL are freely available to download at http://fastqtl.sourceforge.net/
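The idea of modeling permutation outcomes with a beta distribution can be sketched as follows. This is a minimal illustration and not FastQTL's implementation: the beta parameters are fitted by the method of moments (a simplification swapped in here; the abstract does not describe the fitting procedure), and the beta CDF is evaluated by plain midpoint integration. The function names `fit_beta_moments`, `beta_cdf`, and `adjusted_p_value` are hypothetical.

```python
import math


def fit_beta_moments(values):
    """Fit Beta(a, b) to values in (0, 1) by the method of moments (assumed simplification)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two permutation outcomes")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var <= 0:
        raise ValueError("permutation outcomes have zero variance")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def beta_cdf(x, a, b, steps=20000):
    """Approximate the Beta(a, b) CDF at x with the midpoint rule."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # log of the normalizing constant 1 / B(a, b)
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint, so t is never exactly 0 or 1
        total += math.exp(log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log1p(-t))
    return min(1.0, total * h)


def adjusted_p_value(observed_best_p, permuted_best_ps):
    """Adjusted p-value: beta CDF, fitted on the permutation null, at the observed best p-value."""
    a, b = fit_beta_moments(permuted_best_ps)
    return beta_cdf(observed_best_p, a, b)
```

Because the fitted distribution is continuous, the adjusted p-value is not bounded below by one over the number of permutations, which is what allows a small number of permutations to be used.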
Impact of the X chromosome and sex on regulatory variation
Kimberly R Kukurba, Princy Parsana, Kevin S Smith, Zachary Zappala, David A Knowles, Marie-Julie Favé, Xin Li, Xiaowei Zhu, James B Potash, Myrna M Weissman, Jianxin Shi, Anshul Kundaje, Douglas F Levinson, Philip Awadalla, Sara Mostafavi, Alexis Battle, Stephen B Montgomery
doi: http://dx.doi.org/10.1101/024117
The X chromosome, with its unique mode of inheritance, contributes to differences between the sexes at a molecular level, including sex-specific gene expression and sex-specific impact of genetic variation. We have conducted an analysis of the impact of both sex and the X chromosome on patterns of gene expression identified through transcriptome sequencing of whole blood from 922 individuals. We found that genes on the X chromosome are more likely to have sex-specific expression than autosomal genes. Furthermore, we identified a depletion of regulatory variants on the X chromosome, especially among genes under high selective constraint. In contrast, we discovered an enrichment of sex-specific regulatory variants on the X chromosome. To resolve the molecular mechanisms underlying such effects, we generated and connected sex-specific chromatin accessibility to sex-specific expression and regulatory variation. As sex-specific regulatory variants can inform sex differences in genetic disease prevalence, we have integrated our data with genome-wide association study data for multiple immune traits to identify traits with significant sex biases. Together, our study provides genome-wide insight into how the X chromosome and sex shape human gene regulation and disease.
Genome variation and meiotic recombination in Plasmodium falciparum: insights from deep sequencing of genetic crosses
Alistair Miles, Zamin Iqbal, Paul Vauterin, Richard Pearson, Susana Campino, Michel Theron, Kelda Gould, Daniel Mead, Eleanor Drury, John O’Brien, Valentin Ruano Rubio, Bronwyn MacInnis, Jonathan Mwangi, Upeka Samarakoon, Lisa Ranford-Cartwright, Michael Ferdig, Karen Hayton, Xinzhuan Su, Thomas Wellems, Julian Rayner, Gil McVean, Dominic Kwiatkowski
doi: http://dx.doi.org/10.1101/024182
The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable, and thus difficult to analyse using short read technologies. Here we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first integrated analysis of SNP, INDEL and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that INDELs are exceptionally abundant and the dominant mode of polymorphism within the core genome. We analyse patterns of meiotic recombination, including the relative contribution of crossover and non-crossover events, and we observe several instances of recombination that modify copy number variants associated with drug resistance. We describe a novel web application that allows these data to be explored in detail.
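Using Mendelian error rates as an accuracy indicator can be illustrated with a small sketch. It assumes haploid genotype calls (blood-stage P. falciparum is haploid), so a progeny call is counted as an error when it matches neither parent. The encoding (one integer allele per site, `None` for a missing call) and the function name `mendelian_error_rate` are hypothetical and not taken from the paper.

```python
def mendelian_error_rate(parent_a, parent_b, progeny):
    """Fraction of fully called sites where the progeny allele matches neither parent.

    Each argument is a list of haploid allele calls, one per site, aligned by index.
    Sites with a missing call (None) in any sample are skipped.
    """
    if not (len(parent_a) == len(parent_b) == len(progeny)):
        raise ValueError("all samples must cover the same sites")
    errors = 0
    called_sites = 0
    for a, b, child in zip(parent_a, parent_b, progeny):
        if a is None or b is None or child is None:
            continue
        called_sites += 1
        if child != a and child != b:
            errors += 1
    if called_sites == 0:
        raise ValueError("no site is called in all three samples")
    return errors / called_sites
```

A variant-calling method with a lower rate on the same cross would, under this measure, be producing more consistent genotypes.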
Isolation-By-Distance-and-Time in a stepping-stone model
Nicolas Duforet-Frebourg, Montgomery Slatkin
doi: http://dx.doi.org/10.1101/024133
With the great advances in ancient DNA extraction, population genetics data now include geographically separated individuals from both present and ancient times. However, population genetics theory about the joint effect of space and time has not been thoroughly studied. Based on the classical stepping-stone model, we develop the theory of Isolation by Distance and Time. We derive the correlation of allele frequencies between demes in the case where ancient samples are present in the data, and investigate the impact of edge effects with forward-in-time simulations. We also derive results about coalescent times in circular/toroidal models. As one of the most common ways to investigate population structure is to apply principal component analysis, we evaluate the impact of this theory on plots of principal components. Our results demonstrate that time between samples is a non-negligible factor that requires new attention in population genetics.
Integrative approaches for large-scale transcriptome-wide association studies
Alexander Gusev, Arthur Ko, Huwenbo Shi, Gaurav Bhatia, Wonil Chung, Brenda WJ Penninx, Rick Jansen, Eco JC de Geus, Dorret I Boomsma, Fred A Wright, Patrick F Sullivan, Elina Nikkola, Marcus Alvarez, Mete Civelek, Aldonis J Lusis, Terho Lehtimaki, Emma Raitoharju, Mika Kahonen, Ilkka Seppala, Olli Raitakari, Johanna Kuusisto, Markku Laakso, Alkes L Price, Paivi Pajukanta, Bogdan Pasaniuc
doi: http://dx.doi.org/10.1101/024083
Many genetic variants influence complex traits by modulating gene expression, thus altering the abundance levels of one or multiple proteins. In this work we introduce a powerful strategy that integrates gene expression measurements with large-scale genome-wide association data to identify genes whose cis-regulated expression is associated with complex traits. We use a relatively small reference panel of individuals for which both genetic variation and gene expression have been measured to impute gene expression into large cohorts of individuals and identify expression-trait associations. We extend our methods to allow for indirect imputation of the expression-trait association from summary association statistics of large-scale GWAS [1-3]. We applied our approaches to expression data from blood and adipose tissue measured in ~3,000 individuals overall. We then imputed gene expression into GWAS data from over 900,000 phenotype measurements [4-6] to identify 69 novel genes significantly associated with obesity-related traits (BMI, lipids, and height). Many of the novel genes were associated with relevant phenotypes in the Hybrid Mouse Diversity Panel. Overall our results showcase the power of integrating genotype, gene expression and phenotype to gain insights into the genetic basis of complex traits.
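The summary-statistic version of the approach can be sketched as a weighted sum of per-SNP GWAS z-scores, normalized by the linkage disequilibrium (LD) among the SNPs. This is an illustrative sketch under stated assumptions rather than the authors' code: the expression weights are assumed to have been learned already from a reference panel, the LD matrix is assumed to be a SNP-by-SNP correlation matrix, and the names `summary_twas_z`, `weights`, `snp_z`, and `ld` are hypothetical.

```python
import math


def summary_twas_z(weights, snp_z, ld):
    """Expression-trait z-score from summary statistics: (w . z) / sqrt(w' LD w).

    weights: per-SNP expression prediction weights for one gene (assumed given)
    snp_z:   per-SNP GWAS z-scores for the trait, in the same SNP order
    ld:      SNP-by-SNP correlation matrix as a list of rows
    """
    m = len(weights)
    if len(snp_z) != m or len(ld) != m or any(len(row) != m for row in ld):
        raise ValueError("weights, snp_z and ld must describe the same SNPs")
    numerator = sum(w * z for w, z in zip(weights, snp_z))
    # Variance of the weighted sum under the null, accounting for correlated SNPs
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    if variance <= 0.0:
        raise ValueError("predicted expression has zero variance")
    return numerator / math.sqrt(variance)
```

The denominator matters because nearby SNPs are correlated; ignoring LD would overstate the evidence when several weighted SNPs tag the same signal.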
Cross-species transmission and differential fate of an endogenous retrovirus in three mammal lineages
Xiaoyu Zhuo, Cedric Feschotte
doi: http://dx.doi.org/10.1101/024190
Endogenous retroviruses (ERVs) arise from retroviruses chromosomally integrated in the host germline. ERVs are common in vertebrate genomes and provide a valuable fossil record of past retroviral infections to investigate the biology and evolution of retroviruses over a deep time scale, including cross-species transmission events. Here we took advantage of a catalog of ERVs we recently produced for the bat Myotis lucifugus to seek evidence for infiltration of these retroviruses in other mammalian species (>100) currently represented in the genome sequence database. We provide multiple lines of evidence for the cross-ordinal transmission of a gammaretrovirus endogenized independently in the lineages of vespertilionid bats, felid cats and pangolin ~13-25 million years ago. Following its initial introduction, the ERV amplified extensively in parallel in both bat and cat lineages, generating hundreds of species-specific insertions throughout evolution. However, despite the ERV being derived from the same viral species, phylogenetic and selection analyses suggest that it experienced different amplification dynamics in the two mammalian lineages. In the cat lineage, the ERV appears to have expanded primarily by retrotransposition of a single proviral progenitor that lost infectious capacity shortly after endogenization. In the bat lineage, the ERV followed a more complex path of germline invasion characterized by both retrotransposition and multiple infection events. The results also suggest that some of the bat ERVs have maintained infectious capacity for an extended period of time and may still be infectious today. This study provides one of the most rigorously documented cases of cross-ordinal transmission of a mammalian retrovirus. It also illustrates how the same retrovirus species has transitioned multiple times from an infectious pathogen to a genomic parasite (i.e. retrotransposon), yet experienced different invasion dynamics in different mammalian hosts.