Bound to succeed: Transcription factor binding site prediction and its contribution to understanding virulence and environmental adaptation in bacterial plant pathogens

Surya Saha, Magdalen Lindeberg
(Submitted on 26 Jun 2013)

Bacterial plant pathogens rely on a battalion of transcription factors to fine-tune their response to changing environmental conditions and marshal the genetic resources required for successful pathogenesis. Prediction of transcription factor binding sites represents an important tool for elucidating regulatory networks, and has been conducted in multiple genera of plant pathogenic bacteria for the purpose of better understanding mechanisms of survival and pathogenesis. The major categories of transcription factor binding sites that have been characterized are reviewed here with emphasis on in silico methods used for site identification and challenges therein, their applicability to different types of sequence datasets, and insights into mechanisms of virulence and survival that have been gained through binding site mapping. An improved strategy for establishing E value cutoffs when using existing models to screen uncharacterized genomes is also discussed.
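As a rough illustration of the in silico site identification the abstract refers to, the sketch below builds a log-odds position weight matrix from a few aligned sites and scans a sequence for windows scoring above a cutoff. The example sites, the sequence, the pseudocount, the uniform background, and the threshold are all hypothetical choices made for illustration. The sketch does not compute E values and does not reproduce the cutoff strategy proposed by the authors.

```python
import math

def log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    matrix = []
    for pos in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, score) for every window whose summed score meets the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[k][sequence[start + k]] for k in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical aligned binding sites and a toy sequence (not from the paper).
example_sites = ["TTGACA", "TTGACA", "TTGACT", "TTGCCA"]
pwm = log_odds_matrix(example_sites)
hits = scan("GGCCTTGACAGGCC", pwm, threshold=5.0)
```

In practice the threshold is the hard part: a cutoff tuned on one genome can be too strict or too permissive on another, which is the problem the abstract's E value discussion addresses.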


A Bayesian approach to inferring the phylogenetic structure of communities from metagenomic data

John O’Brien, Xavier Didelot, Zamin Iqbal, Lucas Amenga-Etego, Bartu Ahiska, Daniel Falush
(Submitted on 26 Jun 2013)

Metagenomics provides a powerful new tool set for investigating evolutionary interactions with the environment. However, an absence of model-based statistical methods means that researchers are often not able to make full use of this complex information. We present a Bayesian method for inferring the phylogenetic relationship among related organisms found within metagenomic samples. Our approach exploits variation in the frequency of taxa among samples to simultaneously infer each lineage haplotype, the phylogenetic tree connecting them, and their frequency within each sample. Applications of the algorithm to simulated data show that our method can recover a substantial fraction of the phylogenetic structure even in the presence of strong mixing among samples. We provide examples of the method applied to data from green sulfur bacteria recovered from an Antarctic lake, plastids from mixed Plasmodium falciparum infections, and virulent Neisseria meningitidis samples.

The complex hybrid origins of the root knot nematodes revealed through comparative genomics

David H Lunt, Sujai Kumar, Georgios Koutsovoulos, Mark L Blaxter
(Submitted on 26 Jun 2013)

Meloidogyne root knot nematodes (RKN) can infect most of the world’s agricultural crop species and are among the most important of all plant pathogens. As yet, however, we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to originate by interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and used a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species M. hapla was used to trace the evolutionary history of these species’ genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization, and their extreme polyphagy and agricultural success may be related to this hybridization, producing transgressive variation on which natural selection acts. Studying RKN variation via individual marker loci may fail due to the species’ convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally.

The impact of population demography and selection on the genetic architecture of complex traits

Kirk E. Lohmueller
(Submitted on 21 Jun 2013)

Studies of thousands of individuals have found genetic evidence for dramatic population growth in recent human history. These studies have also documented high numbers of amino acid changing polymorphisms that are likely evolutionarily important and may be of medical relevance. Here I use population genetic models to demonstrate how the recent population growth has directly led to the accumulation of deleterious amino acid changing polymorphisms. I show that recent growth increases the proportion of nonsynonymous SNPs and that the average mutation is more deleterious in an expanding population than in a non-expanded population. However, population growth does not affect the genetic load of the population. Additionally, I investigate the consequences of recent population growth on the architecture of complex traits. If a mutation’s effect on disease status is correlated with its effect on fitness, then rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, recent growth can increase the expected number of causal variants for a disease. Such heterogeneity will likely reduce the power of commonly used rare variant association tests. Finally, recent population growth also reduces the causal allele frequency in cases at single mutations, which could decrease the power of single-marker association tests. These findings suggest careful consideration of recent population history will be essential for designing optimal association studies for low-frequency and rare variants.

Native climate uniformly influences temperature-dependent growth rate in Drosophila embryos

Steven G. Kuntz, Michael B. Eisen
(Submitted on 22 Jun 2013)

It is well known that temperature affects both the timing and outcome of animal development, and there is considerable evidence that species have adapted so that their embryos develop appropriately in the climates in which they live. There have, however, been relatively few studies comparing development in related species with different optimal developmental temperatures. To determine the species-specific impact of temperature on the rate, order, and proportionality of major stages of embryonic development, we used time-lapse imaging to track the developmental progress of embryos in 11 Drosophila species at seven precisely maintained temperatures between 17.5 °C and 32.5 °C, and used a combination of automated and manual annotation to determine the timing of 34 milestones during embryogenesis. Developmental timing is highly temperature-dependent in all species. Tropical species, including cosmopolitan species of tropical origin like D. melanogaster, accelerate development with increasing temperature up to 27.5 °C, above which growth slowing from heat stress becomes increasingly significant. D. mojavensis, a sub-tropical fly, exhibits an amplified slow-down with lower temperatures, while D. virilis, a temperate fly, exhibits slower growth than tropical species at all temperatures. The alpine species D. persimilis and D. pseudoobscura grow as rapidly as tropical flies at cooler temperatures, but exhibit diminished acceleration above 22.5 °C and have drastically slowed development by 30 °C. Though the fractional developmental time of major events is affected by heat shock, developmental stages are otherwise uniformly affected by temperature, independent of species. Our results suggest that climate has a major effect on developmental timing and comparisons should be performed based on developmental stage rather than time.

Genome-wide inference of ancestral recombination graphs

Matthew D. Rasmussen, Adam Siepel
(Submitted on 21 Jun 2013)

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the “ancestral recombination graph” (ARG), a complete record of all coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are extremely computationally intensive, depend on fairly crude approximations, or are limited to small numbers of samples. As a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to be applied on the scale of dozens of complete human genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call “threading”. Using techniques based on hidden Markov models, this threading operation can be performed exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated applications of these threading operations result in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG, for twenty or more sequences generated under realistic parameters for human populations. We also report initial results from applications of ARGweaver to high-coverage individual human genome sequences from Complete Genomics. Work is in progress on further applications of these methods to genome-wide sequence data.

Conservation of nuclear SSR loci reveals high affinity of Quercus infectoria ssp. veneris A. Kern (Fagaceae) to section Robur

Charalambos Neophytou, Aikaterini Dounavi, Filippos A. Aravanopoulos
(Submitted on 21 Jun 2013)

Conservation of 16 nuclear microsatellite loci, originally developed for Quercus macrocarpa (section Albae), Q. petraea, Q. robur (section Robur) and Q. myrsinifolia (subgenus Cyclobalanopsis), was tested in a Q. infectoria ssp. veneris population from Cyprus. All loci could be amplified successfully and displayed allele size and diversity patterns that match those of oak species belonging to the section Robur. In at least one case, limited amplification and high levels of homozygosity support the occurrence of ‘null alleles’, caused by a possible mutation in the highly conserved primer areas, thus hindering PCR. The sampled population exhibited high levels of diversity despite the very limited distribution of this species in Cyprus and extended population fragmentation. Allele sizes of Q. infectoria at locus QpZAG9 partially match those of Q. alnifolia and Q. coccifera from neighboring populations. However, sequencing showed homoplasy, excluding a case of interspecific introgression with the latter, phylogenetically remote species. Q. infectoria ssp. veneris sequences at this locus were concordant with those of other species of section Robur, while sequences of Q. alnifolia and Q. coccifera were almost identical to those of Q. cerris.

The equilibrium allele frequency distribution for a population with reproductive skew

Ricky Der, Joshua B. Plotkin
(Submitted on 20 Jun 2013)

We study the population genetics of two neutral alleles under reversible mutation in the \Lambda-processes, a population model that features a skewed offspring distribution. We describe the shape of the equilibrium allele frequency distribution as a function of the model parameters. We show that the mutation rates can be uniquely identified from the equilibrium distribution, but that the form of the offspring distribution itself cannot be uniquely identified. We also introduce an infinite-sites version of the \Lambda-process, and we use it to study how reproductive skew influences standing genetic diversity in a population. We derive asymptotic formulae for the expected number of segregating sites as a function of sample size. We find that the Wright-Fisher model minimizes the equilibrium genetic diversity, for a given mutation rate and variance effective population size, compared to all other \Lambda-processes.
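To make the notion of an equilibrium allele frequency distribution concrete, here is a minimal sketch for the Wright-Fisher model only, which the abstract names as one particular \Lambda-process. It builds the exact transition matrix for a small haploid population with reversible mutation and iterates it to approximate the stationary distribution. The population size, the mutation rates u and v, and the iteration count are illustrative assumptions. The sketch does not cover general \Lambda-processes or the authors' derivations.

```python
from math import comb

def wright_fisher_equilibrium(n, u, v, iterations=500):
    """Approximate the stationary distribution of the count of allele A.

    n: number of haploid individuals
    u: per-generation mutation probability A -> a
    v: per-generation mutation probability a -> A
    """
    # transition[i][j] = P(j copies of A next generation | i copies now)
    transition = []
    for i in range(n + 1):
        x = i / n
        p = x * (1 - u) + (1 - x) * v  # frequency of A after mutation
        transition.append(
            [comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)]
        )
    # Start from a uniform distribution and apply the chain repeatedly.
    dist = [1.0 / (n + 1)] * (n + 1)
    for _ in range(iterations):
        dist = [
            sum(dist[i] * transition[i][j] for i in range(n + 1))
            for j in range(n + 1)
        ]
    return dist

dist = wright_fisher_equilibrium(20, u=0.02, v=0.06)
mean_freq = sum(j / 20 * prob for j, prob in enumerate(dist))
```

For this model the mean equilibrium frequency of A is v / (u + v), so the mean carries information about the ratio of the mutation rates, while the shape of the distribution depends on their magnitudes relative to the population size.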

Efficient Two-Stage Group Testing Algorithms for Genetic Screening

Michael Huber
(Submitted on 19 Jun 2013)

Efficient two-stage group testing algorithms that are particularly suited for rapid and less-expensive DNA library screening and other large scale biological group testing efforts are investigated in this paper. The main focus is on novel combinatorial constructions in order to minimize the number of individual tests at the second stage of a two-stage disjunctive testing procedure. Building on recent work by Levenshtein (2003) and Tonchev (2008), several new infinite classes of such combinatorial designs are presented.
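The two-stage disjunctive procedure can be sketched as follows: a pool tests positive if it contains at least one positive item, every item that appears in a negative pool is cleared, and the remaining candidates are tested individually in the second stage. The row-and-column grid used to form the pools here is a simple stand-in chosen for illustration, not one of the combinatorial designs constructed in the paper, and the function names and the positive set are hypothetical.

```python
def grid_pools(rows, cols):
    """Pool items laid out on a rows x cols grid by row and by column."""
    row_pools = [{r * cols + c for c in range(cols)} for r in range(rows)]
    col_pools = [{r * cols + c for r in range(rows)} for c in range(cols)]
    return row_pools + col_pools

def two_stage_group_test(n_items, pools, is_positive):
    """Run a two-stage disjunctive group test.

    is_positive(item) stands in for a laboratory test on a single item.
    Returns (confirmed positives, stage-one test count, stage-two test count).
    """
    # Stage 1: a pool is positive if any member is positive.
    cleared = set()
    for pool in pools:
        if not any(is_positive(item) for item in pool):
            cleared |= pool  # every item in a negative pool is negative
    candidates = [item for item in range(n_items) if item not in cleared]
    # Stage 2: test each remaining candidate individually.
    confirmed = [item for item in candidates if is_positive(item)]
    return confirmed, len(pools), len(candidates)

# Hypothetical example: 16 clones on a 4 x 4 grid, two of them positive.
positives = {5, 14}
found, stage_one_tests, stage_two_tests = two_stage_group_test(
    16, grid_pools(4, 4), lambda item: item in positives
)
```

With this toy grid, 8 pooled tests leave 4 candidates, so 12 tests replace 16 individual ones. Better pool designs shrink the second stage further, which is the quantity the abstract says the new constructions aim to minimize.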

Reconstructing Native American Migrations from Whole-genome and Whole-exome Data

Simon Gravel, Fouad Zakharia, Jake K Byrnes, Marina Muzzio, Andres Moreno-Estrada, Juan L. Rodriguez-Flores, Eimear E. Kenny, Christopher R. Gignoux, Brian K. Maples, Wilfried Guiblet, Julie Dutil, Karla Sandoval, Gabriel Bedoya, The 1000 Genomes Project, Taras K Oleksyk, Andres Ruiz-Linares, Esteban G Burchard, Juan Carlos Martinez-Cruzado, Carlos D. Bustamante
(Submitted on 17 Jun 2013)

There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low-coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR appears most closely related to Equatorial-Tucanoan-speaking populations, supporting a South American ancestry of the Taino people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a three-population demographic model. The ancestral populations to the three groups likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL ancestors split 12.2 kya, with a subsequent split of the ancestors to CLM and PUR 11.7 kya. The model also features a Mexican population of 62,000, a Colombian population of 8,700, and a Puerto Rican population of 1,900. Modeling identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest size and the earlier migration from Europe.