The complex hybrid origins of the root knot nematodes revealed through comparative genomics

Posted on June 27, 2013 by Joe Pickrell

The complex hybrid origins of the root knot nematodes revealed through comparative genomics
David H Lunt, Sujai Kumar, Georgios Koutsovoulos, Mark L Blaxter
(Submitted on 26 Jun 2013)

Meloidogyne root knot nematodes (RKN) can infect most of the world’s agricultural crop species and are among the most important of all plant pathogens. As yet, however, we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to originate by interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and used a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species M. hapla was used to trace the evolutionary history of these species’ genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization, and their extreme polyphagy and agricultural success may be related to this hybridization, producing transgressive variation on which natural selection acts. Studying RKN variation via individual marker loci may fail due to the species’ convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally.

The impact of population demography and selection on the genetic architecture of complex traits

Posted on June 25, 2013 by Joe Pickrell

The impact of population demography and selection on the genetic architecture of complex traits
Kirk E. Lohmueller
(Submitted on 21 Jun 2013)

Studies of thousands of individuals have found genetic evidence for dramatic population growth in recent human history. These studies have also documented high numbers of amino acid changing polymorphisms that are likely evolutionarily important and may be of medical relevance. Here I use population genetic models to demonstrate how the recent population growth has directly led to the accumulation of deleterious amino acid changing polymorphisms. I show that recent growth increases the proportion of nonsynonymous SNPs and that the average mutation is more deleterious in an expanding population than in a non-expanded population. However, population growth does not affect the genetic load of the population. Additionally, I investigate the consequences of recent population growth on the architecture of complex traits. If a mutation’s effect on disease status is correlated with its effect on fitness, then rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, recent growth can increase the expected number of causal variants for a disease. Such heterogeneity will likely reduce the power of commonly used rare variant association tests. Finally, recent population growth also reduces the causal allele frequency in cases at single mutations, which could decrease the power of single-marker association tests. These findings suggest careful consideration of recent population history will be essential for designing optimal association studies for low-frequency and rare variants.

Native climate uniformly influences temperature-dependent growth rate in Drosophila embryos

Posted on June 25, 2013 by Joe Pickrell

Native climate uniformly influences temperature-dependent growth rate in Drosophila embryos
Steven G. Kuntz, Michael B. Eisen
(Submitted on 22 Jun 2013)

It is well known that temperature affects both the timing and outcome of animal development, and there is considerable evidence that species have adapted so that their embryos develop appropriately in the climates in which they live. There have, however, been relatively few studies comparing development in related species with different optimal developmental temperatures. To determine the species-specific impact of temperature on the rate, order, and proportionality of major stages of embryonic development, we used time-lapse imaging to track the developmental progress of embryos in 11 Drosophila species at seven precisely maintained temperatures between 17.5C and 32.5C, and used a combination of automated and manual annotation to determine the timing of 34 milestones during embryogenesis. Developmental timing is highly temperature-dependent in all species. Tropical species, including cosmopolitan species of tropical origin like D. melanogaster, accelerate development with increasing temperature up to 27.5C, above which growth slowing from heat-stress becomes increasingly significant. D. mojavensis, a sub-tropical fly, exhibits an amplified slow-down with lower temperatures, while D. virilis, a temperate fly, exhibits slower growth than tropical species at all temperatures. The alpine species D. persimilis and D. pseudoobscura grow as rapidly as tropical flies at cooler temperatures, but exhibit diminished acceleration above 22.5C and have drastically slowed development by 30C. Though the fractional developmental time of major events is affected by heat-shock, developmental stages are otherwise uniformly affected by temperature, independent of species. Our results suggest that climate has a major effect on developmental timing and comparisons should be performed based on developmental stage rather than time.

Genome-wide inference of ancestral recombination graphs

Posted on June 24, 2013 by Joe Pickrell

Genome-wide inference of ancestral recombination graphs
Matthew D. Rasmussen, Adam Siepel
(Submitted on 21 Jun 2013)

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of all coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are extremely computationally intensive, depend on fairly crude approximations, or are limited to small numbers of samples. As a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to be applied on the scale of dozens of complete human genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call “threading”. Using techniques based on hidden Markov models, this threading operation can be performed exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated applications of these threading operations result in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG, for twenty or more sequences generated under realistic parameters for human populations. We also report initial results from applications of ARGweaver to high-coverage individual human genome sequences from Complete Genomics. Work is in progress on further applications of these methods to genome-wide sequence data.

Conservation of nuclear SSR loci reveals high affinity of Quercus infectoria ssp. veneris A. Kern (Fagaceae) to section Robur

Posted on June 24, 2013 by Joe Pickrell

Conservation of nuclear SSR loci reveals high affinity of Quercus infectoria ssp. veneris A. Kern (Fagaceae) to section Robur
Charalambos Neophytou, Aikaterini Dounavi, Filippos A. Aravanopoulos
(Submitted on 21 Jun 2013)

Conservation of 16 nuclear microsatellite loci, originally developed for Quercus macrocarpa (section Albae), Q. petraea, Q. robur (section Robur) and Q. myrsinifolia (subgenus Cyclobalanopsis), was tested in a Q. infectoria ssp. veneris population from Cyprus. All loci could be amplified successfully and displayed allele size and diversity patterns that match those of oak species belonging to the section Robur. In at least one case, limited amplification and high levels of homozygosity support the occurrence of ‘null alleles’, caused by a possible mutation in the highly conserved primer areas, thus hindering PCR. The sampled population exhibited high levels of diversity despite the very limited distribution of this species in Cyprus and extended population fragmentation. Allele sizes of Q. infectoria at locus QpZAG9 partially match those of Q. alnifolia and Q. coccifera from neighboring populations. However, sequencing showed homoplasy, excluding a case of interspecific introgression with the latter, phylogenetically remote species. Q. infectoria ssp. veneris sequences at this locus were concordant with those of other species of section Robur, while sequences of Quercus alnifolia and Quercus coccifera were almost identical to Q. cerris.

The equilibrium allele frequency distribution for a population with reproductive skew

Posted on June 21, 2013 by Joe Pickrell

The equilibrium allele frequency distribution for a population with reproductive skew
Ricky Der, Joshua B. Plotkin
(Submitted on 20 Jun 2013)

We study the population genetics of two neutral alleles under reversible mutation in the Λ-processes, a population model that features a skewed offspring distribution. We describe the shape of the equilibrium allele frequency distribution as a function of the model parameters. We show that the mutation rates can be uniquely identified from the equilibrium distribution, but that the form of the offspring distribution itself cannot be uniquely identified. We also introduce an infinite-sites version of the Λ-process, and we use it to study how reproductive skew influences standing genetic diversity in a population. We derive asymptotic formulae for the expected number of segregating sites as a function of sample size. We find that the Wright-Fisher model minimizes the equilibrium genetic diversity, for a given mutation rate and variance effective population size, compared to all other Λ-processes.

Efficient Two-Stage Group Testing Algorithms for Genetic Screening

Posted on June 20, 2013 by Joe Pickrell

Efficient Two-Stage Group Testing Algorithms for Genetic Screening
Michael Huber
(Submitted on 19 Jun 2013)

Efficient two-stage group testing algorithms that are particularly suited for rapid and less-expensive DNA library screening and other large scale biological group testing efforts are investigated in this paper. The main focus is on novel combinatorial constructions in order to minimize the number of individual tests at the second stage of a two-stage disjunctive testing procedure. Building on recent work by Levenshtein (2003) and Tonchev (2008), several new infinite classes of such combinatorial designs are presented.
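
As a rough illustration of what a two-stage disjunctive procedure does, here is a minimal Python sketch. It assumes simple disjoint pools rather than the combinatorial designs constructed in the paper, and the function name, clone labels and pool size are hypothetical choices made for the example.

```python
from typing import List, Set, Tuple


def two_stage_screen(items: List[str], positives: Set[str],
                     pool_size: int) -> Tuple[Set[str], int]:
    """Return (detected positives, number of tests used).

    Stage 1 tests disjoint pools; under the disjunctive model a pool is
    positive if it contains at least one positive item. Stage 2 tests
    every member of each positive pool individually.
    NOTE: disjoint pooling is an assumption of this sketch, not the
    design used in the paper.
    """
    pools = [items[i:i + pool_size] for i in range(0, len(items), pool_size)]
    tests = len(pools)  # stage 1: one test per pool
    detected: Set[str] = set()
    for pool in pools:
        if any(item in positives for item in pool):
            tests += len(pool)  # stage 2: retest each member of the pool
            detected.update(item for item in pool if item in positives)
    return detected, tests


# Hypothetical library of 100 clones with two true positives.
clones = [f"clone{i}" for i in range(100)]
found, n_tests = two_stage_screen(clones, {"clone7", "clone42"}, pool_size=10)
print(sorted(found), n_tests)
```

With these made-up inputs the screen uses 30 tests (10 pools plus 20 individual retests) instead of 100 individual tests. Reducing the second-stage count further is the part that depends on the choice of design.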

Reconstructing Native American Migrations from Whole-genome and Whole-exome Data

Posted on June 19, 2013 by cooplab

Reconstructing Native American Migrations from Whole-genome and Whole-exome Data
Simon Gravel, Fouad Zakharia, Jake K Byrnes, Marina Muzzio, Andres Moreno-Estrada, Juan L. Rodriguez-Flores, Eimear E. Kenny, Christopher R. Gignoux, Brian K. Maples, Wilfried Guiblet, Julie Dutil, Karla Sandoval, Gabriel Bedoya, The 1000 Genomes Project, Taras K Oleksyk, Andres Ruiz-Linares, Esteban G Burchard, Juan Carlos Martinez-Cruzado, Carlos D. Bustamante
(Submitted on 17 Jun 2013)

There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR appears most closely related to Equatorial-Tucanoan-speaking populations, supporting a Southern America ancestry of the Taino people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a three-population demographic model. The ancestral populations to the three groups likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL ancestors split 12.2 kya, with a subsequent split of the ancestors to CLM and PUR 11.7 kya. The model also features a Mexican population of 62,000, a Colombian population of 8,700, and a Puerto Rican population of 1,900. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest size and the earlier migration from Europe.

Differential meta-analysis of RNA-seq data from multiple studies

Posted on June 18, 2013 by cooplab

Differential meta-analysis of RNA-seq data from multiple studies
Andrea Rau (GABI), Guillemette Marot (INRIA Lille – Nord Europe, CERIM), Florence Jaffrézic (GABI)
(Submitted on 16 Jun 2013)

High-throughput sequencing is now regularly used for studies of the transcriptome (RNA-seq), particularly for comparisons among experimental conditions. For the time being, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential expression. As their cost continues to decrease, it is likely that additional follow-up studies will be conducted to re-address the same biological question. We demonstrate how p-value combination techniques previously used for microarray meta-analyses can be used for the differential analysis of RNA-seq data from multiple related studies. These techniques are compared to a negative binomial generalized linear model (GLM) including a fixed study effect on simulated data and real data on human melanoma cell lines. The GLM with fixed study effect performed well for low inter-study variation and small numbers of studies, but was outperformed by the meta-analysis methods for moderate to large inter-study variability and larger numbers of studies. To conclude, the p-value combination techniques illustrated here are a valuable tool to perform differential meta-analyses of RNA-seq data by appropriately accounting for biological and technical variability within studies as well as additional study-specific effects. An R package metaRNASeq is available on the R Forge.
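
For readers unfamiliar with p-value combination, the following Python sketch shows two standard techniques, Fisher's method and the weighted inverse normal (Stouffer) method, applied to the per-study p-values of a single gene. This is an illustration under assumptions: the abstract does not say which combination rules are used, the function names are hypothetical, and nothing here reproduces the metaRNASeq package.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under H0.

    Assumes independent p-values strictly between 0 and 1.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def inverse_normal_combine(pvalues, weights=None):
    """Weighted inverse normal combination of one-sided p-values.

    Weights are hypothetical per-study weights (for example, based on
    replicate numbers); equal weights are used if none are given.
    """
    weights = weights or [1.0] * len(pvalues)
    nd = NormalDist()
    z = sum(w * nd.inv_cdf(1.0 - p) for w, p in zip(weights, pvalues))
    z /= math.sqrt(sum(w * w for w in weights))
    return 1.0 - nd.cdf(z)


# Made-up p-values for one gene across three studies.
pvals = [0.04, 0.10, 0.20]
print(fisher_combine(pvals), inverse_normal_combine(pvals))
```

Both functions return a single combined p-value per gene, which would then go through the usual multiple-testing correction across genes.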

Our paper: Sashimi plots: Quantitative visualization of RNA sequencing read alignments

Posted on June 18, 2013 by cooplab

This is a guest post by Yarden Katz [@yardenkatz] on his paper (along with coauthors): Katz et al., Sashimi plots: Quantitative visualization of RNA sequencing read alignments, arXived here.

A first draft of our paper Sashimi plots: Quantitative visualization of RNA sequencing read alignments is now available. Sashimi plots are a simple visualization of RNA sequencing data, intended to make it easier to detect differentially spliced exons across multiple RNA-Seq samples. In a Sashimi plot, RNA-Seq reads are summarized as read densities, and junction reads are collapsed into arcs whose width is proportional to the number of reads spanning the exons connected by the arc. See the paper for examples.
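
The summary described above can be sketched in a few lines of Python. This is not the sashimi_plot code: the read representation (a list of half-open aligned blocks per read), the function names and the linear width scaling are all assumptions made for illustration.

```python
from collections import Counter


def summarize_reads(reads, region_start, region_end):
    """Summarize alignments over the half-open region [region_start, region_end).

    Each read is a list of aligned blocks (start, end), half-open. A read
    with more than one block is treated as a junction read spanning the
    gap between consecutive blocks.
    """
    density = [0] * (region_end - region_start)
    junctions = Counter()
    for blocks in reads:
        for start, end in blocks:
            for pos in range(max(start, region_start), min(end, region_end)):
                density[pos - region_start] += 1
        for (_, donor), (acceptor, _) in zip(blocks, blocks[1:]):
            junctions[(donor, acceptor)] += 1
    return density, junctions


def arc_widths(junctions, max_width=5.0):
    """Scale arc line widths linearly with junction read counts."""
    top = max(junctions.values(), default=0)
    if not top:
        return {}
    return {junction: max_width * n / top for junction, n in junctions.items()}


# Made-up alignments: one exonic read and three junction reads.
reads = [
    [(0, 10)],
    [(5, 10), (20, 25)],
    [(6, 10), (20, 26)],
    [(8, 10), (30, 38)],
]
density, junctions = summarize_reads(reads, 0, 40)
widths = arc_widths(junctions)
print(junctions, widths)
```

The `density` list would be drawn as the filled read-density track and each entry of `widths` as one arc between the two coordinates of its key, for example with matplotlib.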

We call it a Sashimi plot in part because of the impeccable resemblance of bumpy RNA-Seq read densities in exons to small pieces of Sashimi, and also because we tried to keep the plots as close to the “raw” data as possible. While Sashimi plots can display estimates of isoform abundance levels from programs like MISO, the goal here was to summarize the read alignments as they are, without further processing or inference, so that conclusions from probabilistic models can be visually verified.

The original Sashimi plot program is a command line utility, written in Python with the matplotlib library, that makes customizable Sashimi plots. Recently, the IGV genome browser team implemented a version of Sashimi plots in their browser (see installation instructions). This allows Sashimi plots to be made dynamically for any genomic region of interest, at a resolution set by the zoom in/out features of the browser. The plot can be made for all or a subset of the tracks loaded, and the scales can be adjusted by the user as in the main IGV window. Both the static, Python-based version of Sashimi plots and the dynamic version within IGV are available and actively maintained, and code bases for both are available on GitHub.

Sashimi plots still have important limitations. First, the junction arcs can get messy for genes with many alternative isoforms. This can be partially addressed by looking at simplified event annotations (e.g. ones containing only two isoforms, or a handful of isoforms, as in these annotations) rather than making plots for the full set of isoforms of a gene. The second limitation is that sometimes subtle differences are not readily seen from junction arc widths. We’re considering alternative representations (such as circle area or diameter) for quantitatively representing junction read counts.

The paper is meant primarily as an advertisement for the software. We hope that other members of the RNA processing/sequencing community will find this useful and come up with their own variants of these plots.

Relevant links:

The Sashimi plot manual is here: http://genes.mit.edu/burgelab/miso/docs/sashimi.html
GitHub repository for IGV/IGV-Sashimi: IGV at GitHub
GitHub repository for Python, static Sashimi plots: Sashimi plot at GitHub

Haldane's Sieve

Discussing preprints in population and evolutionary genetics

Yearly Archives: 2013
