Extraordinarily wide genomic impact of a selective sweep associated with the evolution of sex ratio distorter suppression

Extraordinarily wide genomic impact of a selective sweep associated with the evolution of sex ratio distorter suppression
Emily A Hornett, Bruce Moran, Louise A Reynolds, Sylvain Charlat, Samuel Tazzyman, Nina Wedell, Chris D Jiggins, Gregory Hurst

Symbionts that distort their host?s sex ratio by favouring the production and survival of females are common in arthropods. Their presence produces intense Fisherian selection to return the sex ratio to parity, typified by the rapid spread of host ?suppressor? loci that restore male survival/development. In this study, we investigated the genomic impact of a selective event of this kind in the butterfly Hypolimnas bolina. Through linkage mapping we first identified a genomic region that was necessary for males to survive Wolbachia-induced killing. We then investigated the genomic impact of the rapid spread of suppression that converted the Samoan population of this butterfly from a 100:1 female-biased sex ratio in 2001, to a 1:1 sex ratio by 2006. Models of this process revealed the potential for a chromosome-wide selective sweep. To measure the impact directly, the pattern of genetic variation before and after the episode of selection was compared. Significant changes in allele frequencies were observed over a 25cM region surrounding the suppressor locus, alongside generation of linkage disequilibrium. The presence of novel allelic variants in 2006 suggests that the suppressor was introduced via immigration rather than through de novo mutation. In addition, further sampling in 2010 indicated that many of the introduced variants were lost or had reduced in frequency since 2006. We hypothesise that this loss may have resulted from a period of purifying selection – removing deleterious material that introgressed during the initial sweep. Our observations of the impact of suppression of sex ratio distorting activity reveal an extraordinarily wide genomic imprint, reflecting its status as one of the strongest selective forces in nature.

inPHAP: Interactive visualization of genotype and phased haplotype data

inPHAP: Interactive visualization of genotype and phased haplotype data
Günter Jäger, Alexander Peltzer, Kay Nieselt
Comments: BioVis 2014 conference
Subjects: Graphics (cs.GR); Genomics (q-bio.GN)

Background: To understand individual genomes it is necessary to look at the variations that lead to changes in phenotype and possibly to disease. However, genotype information alone is often not sufficient and additional knowledge regarding the phase of the variation is needed to make correct interpretations. Interactive visualizations, that allow the user to explore the data in various ways, can be of great assistance in the process of making well informed decisions. But, currently there is a lack for visualizations that are able to deal with phased haplotype data. Results: We present inPHAP, an interactive visualization tool for genotype and phased haplotype data. inPHAP features a variety of interaction possibilities such as zooming, sorting, filtering and aggregation of rows in order to explore patterns hidden in large genetic data sets. As a proof of concept, we apply inPHAP to the phased haplotype data set of Phase 1 of the 1000 Genomes Project. Thereby, inPHAP’s ability to show genetic variations on the population as well as on the individuals level is demonstrated for several disease related loci. Conclusions: As of today, inPHAP is the only visual analytical tool that allows the user to explore unphased and phased haplotype data interactively. Due to its highly scalable design, inPHAP can be applied to large datasets with up to 100 GB of data, enabling users to visualize even large scale input data. inPHAP closes the gap between common visualization tools for unphased genotype data and introduces several new features, such as the visualization of phased data.

V genes in primates from whole genome shotgun data

V genes in primates from whole genome shotgun data
David N Olivieri, Francisco Gambon-Deza

The adaptive immune system uses V genes for antigen recognition. The evolutionary diversification and selection processes within and across species and orders are poorly understood. Here, we studied the amino acid (AA) sequences obtained of translated in-frame V exons of immunoglobulins (IG) and T cell receptors (TR) from 16 primate species whose genomes have been sequenced. Multi-species comparative analysis supports the hypothesis that V genes in the IG loci undergo birth/death processes, thereby permitting rapid adaptability over evolutionary time. We also show that multiple cladistic groupings exist in the TRA (35 clades) and TRB (25 clades) V gene loci and that each primate species typically contributes at least one V gene to each of these clade. The results demonstrate that IG V genes and TR V genes have quite different evolutionary pathways; multiple duplications can explain the IG loci results, while co-evolutionary pressures can explain the phylogenetic results, as seen in genes of the TR loci. We describe how each of the 35 V genes clades of the TRA locus and 25 clades of the TRB locus must have specific and necessary roles for the viability of the species.

Improved genome inference in the MHC using a population reference graph

Improved genome inference in the MHC using a population reference graph
Alexander Dilthey, Charles J Cox, Zamin Iqbal, Matthew R Nelson, Gil McVean

In humans and many other species, while much is known about the extent and structure of genetic variation, such information is typically not used in assembling novel genomes. Rather, a single reference is used against which to map reads, which can lead to poor characterisation of regions of high sequence or structural diversity. Here, we introduce a population reference graph, which combines multiple reference sequences as well as catalogues of SNPs and short indels. The genomes of novel samples are reconstructed as paths through the graph using an efficient hidden Markov Model, allowing for recombination between different haplotypes and variants. By applying the method to the 4.5Mb extended MHC region on chromosome 6, combining eight assembled haplotypes, sequences of known classical HLA alleles and 87,640 SNP variants from the 1000 Genomes Project, we demonstrate, using simulations, SNP genotyping, short-read and long-read data, how the method improves the accuracy of genome inference. Moreover, the analysis reveals regions where the current set of reference sequences is substantially incomplete, particularly within the Class II region, indicating the need for continued development of reference-quality genome sequences.

Convergent Evolution During Local Adaptation to Patchy Landscapes

Convergent Evolution During Local Adaptation to Patchy Landscapes
Peter L. Ralph, Graham Coop

Species often encounter, and adapt to, many patches of locally similar environmental conditions across their range. Such adaptation can occur through convergent evolution as different alleles arise and spread in different patches, or through the spread of alleles by migration acting to synchronize adaptation across the species. The tension between the two reflects the degree of constraint imposed on evolution by the underlying genetic architecture versus how how effectively selection acts to inhibit the geographic spread of locally adapted alleles. This paper studies a model of the balance between these two routes to adaptation in continuous environments with patchy selection pressures. We address the following questions: How long does it take for a new, locally adapted allele to appear in a patch of habitat where it is favored through new mutation? Or, through migration from another, already adapted patch? Which is more likely to occur, as a function of distance between the patches? How can we tell which has occurred, i.e.\ what population genetic signal is left by the spread of migrant alleles and how long does this signal persist for? To answer these questions we decompose the migration–selection equilibrium surrounding an already adapted patch into families of migrant alleles, in particular treating those rare families that reach new patches as spatial branching processes. This provides a way to understand the role of geographic separation between patches in promoting convergent adaptation and the genomic signals it leaves behind. We illustrate these ideas using the convergent evolution of cryptic coloration in the rock pocket mouse, Chaetodipus intermedius, as an empirical example.

Purifying selection, drift and reversible mutation with arbitrarily high mutation rates


Purifying selection, drift and reversible mutation with arbitrarily high mutation rates

Brian Charlesworth, Kavita Jain
Comments: Supplementary Information available on request
Subjects: Populations and Evolution (q-bio.PE)

Some species exhibit very high levels of DNA sequence variability; there is also evidence for the existence of heritable epigenetic variants that experience state changes at a much higher rate than sequence variants. In both cases, the resulting high diversity levels within a population (hyperdiversity) mean that standard population genetics methods are not trustworthy. We analyze a population genetics model that incorporates purifying selection, reversible mutations and genetic drift, assuming a stationary population size. We derive analytical results for both population parameters and sample statistics, and discuss their implications for studies of natural genetic and epigenetic variation. In particular, we find that (1) many more intermediate frequency variants are expected than under standard models, even with moderately strong purifying selection (2) rates of evolution under purifying selection may be close to, or even exceed, neutral rates. These findings are related to empirical studies of sequence and epigenetic variation.

Phylogenetics and the human microbiome

Phylogenetics and the human microbiome
Frederick A Matsen IV
Comments: to appear in Systematic Biology
Subjects: Populations and Evolution (q-bio.PE); Genomics (q-bio.GN)

The human microbiome is the ensemble of genes in the microbes that live inside and on the surface of humans. Because microbial sequencing information is now much easier to come by than phenotypic information, there has been an explosion of sequencing and genetic analysis of microbiome samples. Much of the analytical work for these sequences involves phylogenetics, at least indirectly, but methodology has developed in a somewhat different direction than for other applications of phylogenetics. In this paper I review the field and its methods from the perspective of a phylogeneticist, as well as describing current challenges for phylogenetics coming from this type of work.

Bayesian Coalescent Epidemic Inference: Comparison of Stochastic and Deterministic SIR Population Dynamics


Bayesian Coalescent Epidemic Inference: Comparison of Stochastic and Deterministic SIR Population Dynamics

Alex Popinga, Tim Vaughan, Tanja Stadler, Alexei Drummond
Comments: Submitted
Subjects: Populations and Evolution (q-bio.PE)

Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingmans coalescent as well as from birth death branching processes. The development of alternative approaches merits investigation of their characteristics and differences. Here we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent SIR tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the models performance and contrast results obtained with a recently published birth death sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. We also compare results of analyses using published HIV1 sequence data obtained from known UK infection clusters. We show that the coalescent SIR model is effective at estimating epidemiological parameters from data with large fundamental reproductive number R0 and large population size S0. We find that the stochastic variant generally outperforms its deterministic counterpart. However, each of these Bayesian estimators are shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with R0 close to one or with small susceptible populations.

Single haplotype assembly of the human genome from a hydatidiform mole

Single haplotype assembly of the human genome from a hydatidiform mole

Karyn Meltz Steinberg, Valerie K Schneider, Tina A Graves-Lindsay, Robert S Fulton, Richa Agarwala, John Huddleston, Sergey A Shiryayev, Aleksandr Morgulis, Urvashi Surti, Wesley C Warren, Deanna M Church, Evan E Eichler, Richard K Wilson

An accurate and complete reference human genome sequence assembly is essential for accurately interpreting individual genomes and associating sequence variation with disease phenotypes. While the current reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can help overcome these problems, even the longest available reads do not resolve all regions of the human genome. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones, an optical map, and 100X whole genome shotgun (WGS) sequence coverage using short (Illumina) read pairs. We used the WGS sequence and the GRCh37 reference assembly to create a sequence assembly of the CHM1 genome. We subsequently incorporated 382 finished CHORI-17 BAC clone sequences to generate a second draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene and repeat content show this assembly to be of excellent quality and contiguity, and comparisons to ClinVar and the NHGRI GWAS catalog show that the CHM1 genome does not harbor an excess of deleterious alleles. However, comparison to assembly-independent resources, such as BAC clone end sequences and long reads generated by a different sequencing technology (PacBio), indicate misassembled regions. The great majority of these regions is enriched for structural variation and segmental duplication, and can be resolved in the future by sequencing BAC clone tiling paths. This publicly available first generation assembly will be integrated into the Genome Reference Consortium (GRC) curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.

Sequencing of the human IG light chain loci from a hydatidiform mole BAC library reveals locus-specific signatures of genetic diversity

Sequencing of the human IG light chain loci from a hydatidiform mole BAC library reveals locus-specific signatures of genetic diversity

Corey T Watson, Karyn Meltz Steinberg, Tina A Graves-Lindsay, Rene L Warren, Maika Malig, Jacqueline E Schein, Richard K Wilson, Rob Holt, Evan Eichler, Felix Breden

Germline variation at immunoglobulin gene (IG) loci is critical for pathogen-mediated immunity, but establishing complete reference sequences in these regions is problematic because of segmental duplications and somatically rearranged source DNA. We sequenced BAC clones from the essentially haploid hydatidiform mole, CHM1, across the light chain IG loci, kappa (IGK) and lambda (IGL), creating single haplotype representations of these regions. The IGL haplotype is 1.25Mb of contiguous sequence with four novel V gene and one novel C gene alleles and an 11.9kbp insertion. The IGK haplotype consists of two 644kbp proximal and 466kbp distal contigs separated by a gap also present in the reference genome sequence. Our effort added an additional 49kbp of unique sequence extending into this gap. The IGK haplotype contains six novel V gene and one novel J gene alleles and a 16.7kbp region with increased sequence identity between the two IGK contigs, exhibiting signatures of interlocus gene conversion. Our data facilitated the first comparison of nucleotide diversity between the light and IG heavy (IGH) chain haplotypes within a single genome, revealing a three to six fold enrichment in the IGH locus, supporting the theory that the heavy chain may be more important in determining antigenic specificity.