Genomic and gene-expression comparisons among phage-resistant type-IV pilus mutants of Pseudomonas syringae pathovar phaseolicola

Posted on September 2, 2015 by schraib

Mark J Sistrom, Derek Park, Heath O’Brien, Zheng Wang, David Guttman, Jeffrey P. Townsend, Paul Turner

bioRxiv doi: http://dx.doi.org/10.1101/025106

Pseudomonas syringae pv. phaseolicola (Pph) is a significant bacterial pathogen of agricultural crops, and phage φ6 and other members of the dsRNA virus family Cystoviridae undergo lytic (virulent) infection of Pph, using the type IV pilus as the initial site of cellular attachment. Despite the popularity of Pph/phage φ6 as a model system in evolutionary biology, Pph resistance to phage φ6 remains poorly characterized. To investigate differences between phage φ6-resistant Pph strains, we examined genomic and gene expression variation among three bacterial genotypes that differ in the number of type IV pili expressed per cell: ordinary (wild-type), non-piliated, and super-piliated. Genome sequencing of non-piliated and super-piliated Pph identified few mutations that separate these genotypes from wild-type Pph, and none present in genes known to be directly involved in type IV pilus expression. Expression analysis revealed that 81.1% of GO terms up-regulated in the non-piliated strain were down-regulated in the super-piliated strain. This differential expression is particularly prevalent in genes associated with respiration, specifically genes in the tricarboxylic acid (TCA) cycle, aerobic respiration, and acetyl-CoA metabolism. Genes in the TCA pathway appear to be generally up-regulated in non-piliated Pph and down-regulated in super-piliated Pph. As pilus retraction is mediated by an ATP motor, loss of retraction ability might lead to a lower energy draw on the bacterial cell, leading to a different energy balance than wild type. The lower metabolic rate of the super-piliated strain is potentially a result of its loss of ability to retract.

Phylogenetic community structure metrics and null models: a review with new methods and software

Posted on September 2, 2015 by schraib

Eliot T Miller, Damien R Farine, Christopher H Trisos

bioRxiv doi: http://dx.doi.org/10.1101/025726

Competitive exclusion and habitat filtering are believed to have an important influence on the assembly of ecological communities, but ecologists and evolutionary biologists have not reached a consensus on how to quantify patterns that would reveal the action of these processes. No fewer than 22 phylogenetic community structure metrics and nine null models can be combined, providing 198 approaches to test for such patterns. Choosing statistically appropriate approaches is currently a daunting task. First, given random community assembly, we assessed similarities among metrics and among null models in their behavior across communities varying in species richness. Second, we developed spatially explicit, individual-based simulations where communities were assembled either at random, by competitive exclusion or by habitat filtering. Third, we quantified the performance (type I and II error rates) of all 198 approaches against each of the three assembly processes. Many metrics and null models are functionally equivalent, more than halving the number of unique approaches. Moreover, an even smaller subset of metric and null model combinations is suitable for testing community assembly patterns. Metrics like mean pairwise phylogenetic distance and phylogenetic diversity were better able to detect simulated community assembly patterns than metrics like phylogenetic abundance evenness. A null model that simulates regional dispersal pressure on the community of interest outperformed all others. We introduce a flexible new R package, metricTester, to facilitate robust analyses of method performance. The package is programmed in parallel to readily accommodate integration of new row-wise matrix calculations (metrics) and matrix-wise randomizations (null models) to generate expectations and quantify error rates of proposed methods.
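To make the idea of pairing a metric with a null model concrete, here is a minimal Python sketch (not the metricTester package, which is written in R). It computes mean pairwise phylogenetic distance (MPD) for a community and standardizes it against random draws of the same richness from a species pool. The equal-probability draw is only one simple null model chosen for illustration, and the function names, toy species and distances are all hypothetical.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def mpd(community, dist):
    """Mean pairwise phylogenetic distance among the species in a community."""
    pairs = combinations(sorted(community), 2)
    return mean(dist[frozenset(pair)] for pair in pairs)


def ses_mpd(community, pool, dist, n_iter=999, seed=0):
    """Standardized effect size of MPD against equal-probability draws
    of the same richness from the species pool (an assumed, simple null)."""
    rng = random.Random(seed)
    observed = mpd(community, dist)
    null = [mpd(rng.sample(pool, len(community)), dist) for _ in range(n_iter)]
    return (observed - mean(null)) / pstdev(null)


# Hypothetical toy pool: A and B are close relatives, as are C and D,
# while the two pairs are distantly related to each other.
pool = ["A", "B", "C", "D"]
dist = {
    frozenset(pair): d
    for pair, d in {
        ("A", "B"): 2.0, ("C", "D"): 2.0,
        ("A", "C"): 10.0, ("A", "D"): 10.0,
        ("B", "C"): 10.0, ("B", "D"): 10.0,
    }.items()
}

clustered = ses_mpd(["A", "B"], pool, dist)  # negative: phylogenetically clustered
```

A negative standardized effect size indicates that co-occurring species are more closely related than expected under the null, which is the kind of pattern usually read as habitat filtering.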

Construction of relatedness matrices using genotyping-by-sequencing data

Posted on September 2, 2015 by schraib

Ken G Dodds, John C McEwan, Rudiger Brauning, Rayna M Anderson, Tracey C van Stijn, Theodor Kristjánsson, Shannon M Clarke

bioRxiv doi: http://dx.doi.org/10.1101/025379

Background: Genotyping-by-sequencing (GBS) is becoming an attractive alternative to array-based methods for genotyping individuals for a large number of single nucleotide polymorphisms (SNPs). Costs can be lowered by reducing the mean sequencing depth, but this results in genotype calls of lower quality. A common analysis strategy is to filter SNPs to just those with sufficient depth, thereby greatly reducing the number of SNPs available. We investigate methods for estimating relatedness using GBS data, including results of low depth, using theoretical calculation, simulation and application to a real data set. Results: We show that unbiased estimates of relatedness can be obtained by using only those SNPs with genotype calls in both individuals. The expected value of this estimator is independent of the SNP depth in each individual, under a model of genotype calling that includes the special case of the two alleles being read at random. In contrast, the estimator of self-relatedness does depend on the SNP depth, and we give a modification that provides unbiased estimates of self-relatedness. We refer to these methods of estimation as kinship using GBS with depth adjustment (KGD). The estimators can be calculated using matrix methods, which allow efficient computation. Simulation results were consistent with the methods being unbiased, and suggest that the optimal sequencing depth is around 2-4 for relatedness between individuals and 5-10 for self-relatedness. Application to a real data set revealed that some SNP filtering may still be necessary, for the exclusion of SNPs which did not behave in a Mendelian fashion. A simple graphical method (a ‘fin plot’) is given to illustrate this issue and to guide filtering parameters. Conclusion: We provide a method which gives unbiased estimates of relatedness, based on SNPs assayed by GBS, which accounts for the depth (including zero depth) of the genotype calls. This allows GBS to be applied at read depths which can be chosen to optimise the information obtained. SNPs with excess heterozygosity, often due to (partial) polyploidy or other duplications, can be filtered based on a simple graphical method.
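The core idea of restricting the calculation to SNPs called in both individuals can be sketched in a few lines of Python. This is an illustration only: the VanRaden-style form of the estimator, the function name and the toy genotypes are assumptions, not the authors' KGD code, and the depth adjustment for self-relatedness is not shown.

```python
def relatedness(geno_i, geno_j, freqs):
    """Relatedness between two individuals from SNPs called in both.

    geno_i, geno_j: allele counts (0, 1 or 2) per SNP, or None where the
        SNP has zero depth (no genotype call) in that individual.
    freqs: allele frequency of the counted allele at each SNP.
    Assumes a VanRaden-style estimator (an assumption for illustration).
    """
    numerator = 0.0
    denominator = 0.0
    for gi, gj, p in zip(geno_i, geno_j, freqs):
        if gi is None or gj is None:
            continue  # skip SNPs missing in either individual
        numerator += (gi - 2 * p) * (gj - 2 * p)
        denominator += 2 * p * (1 - p)
    if denominator == 0:
        raise ValueError("no informative SNPs called in both individuals")
    return numerator / denominator


# Hypothetical toy data: four SNPs, each with allele frequency 0.5.
freqs = [0.5, 0.5, 0.5, 0.5]
ind_a = [2, 0, None, 1]
ind_b = [2, 0, 1, None]
ind_c = [0, 2, 1, 1]

r_ab = relatedness(ind_a, ind_b, freqs)  # uses SNPs 1 and 2 only
r_ac = relatedness(ind_a, ind_c, freqs)  # uses SNPs 1, 2 and 4
```

Because both the numerator and the denominator are summed over the same co-called SNPs, each pair of individuals is scaled by its own set of usable markers rather than by a fixed, depth-filtered panel.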

Population genomics of the Anthropocene: urbanization reduces the evolutionary potential of small mammal populations

Posted on September 2, 2015 by schraib

Jason Munshi-South, Christine P Zolnik, Stephen E Harris

bioRxiv doi: http://dx.doi.org/10.1101/025007

Urbanization results in pervasive habitat fragmentation and reduces standing genetic variation through genetic drift. Loss of genome-wide variation may ultimately reduce the evolutionary potential of animal populations experiencing rapidly changing conditions. In this study, we examined genome-wide variation among 23 white-footed mouse (Peromyscus leucopus) populations sampled along an urbanization gradient in the New York City metropolitan area. Genome-wide variation was estimated as a proxy for evolutionary potential using more than 10,000 SNP markers generated by ddRAD-Seq. We found that genome-wide variation is inversely related to urbanization as measured by percent impervious surface cover, and to a lesser extent, human population density. We also report that urbanization results in enhanced genome-wide differentiation between populations in cities. There was no pattern of isolation by distance among these populations, but an isolation by resistance model based on impervious surface significantly explained patterns of genetic differentiation. Isolation by environment modeling also indicated that urban populations deviate much more strongly from global allele frequencies than suburban or rural populations. This study is the first to examine evolutionary potential along an urban-to-rural gradient and quantify urbanization as a driver of population genomics patterns.

The mysterious orphans of Mycoplasmataceae

Posted on September 2, 2015 by schraib

Tatiana Tatarinova, Inna Lysnyansky, Yuri Nikolsky, Alexander Bolshoy

bioRxiv doi: http://dx.doi.org/10.1101/025700

Background: The length of a protein sequence is largely determined by its function, i.e. each functional group is associated with an optimal size. However, comparative genomics has revealed that protein length may be affected by additional factors. In 2002 it was shown that in the bacterium Escherichia coli and the archaeon Archaeoglobus fulgidus, protein sequences with no homologs are, on average, shorter than those with homologs [1]. Most experts now agree that the length distributions are distinctly different between protein sequences with and without homologs in bacterial and archaeal genomes. In this study, we examine this postulate through a comprehensive analysis of all annotated prokaryotic genomes, focusing on certain exceptions. Results: We compared the length distributions of “having homologs proteins” (HHPs) and “non-having homologs proteins” (orphans or ORFans) in all currently annotated completely sequenced prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs are, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFan distributions, with the mean length of ORFans larger than the mean length of HHPs. These genera constitute over 80% of atypical genomes. Conclusions: We confirmed on a ubiquitous set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have very distinctive distributions of ORFan lengths. We offer several possible biological explanations of this phenomenon.

Phylogeographic Inference Using Approximate Likelihoods

Posted on September 2, 2015 by schraib

Brian C O’Meara, Nathan D Jackson, Ariadna E Morales-Garcia, Bryan C Carstens

bioRxiv doi: http://dx.doi.org/10.1101/025353

The demographic history of most species is complex, with multiple evolutionary processes combining to shape the observed patterns of genetic diversity. To infer this history, the discipline of phylogeography has (to date) used models that simplify the historical demography of the focal organism, for example by assuming or ignoring ongoing gene flow between populations or by requiring a priori specification of divergence history. Since no single model incorporates every possible evolutionary process, researchers rely on intuition to choose the models that they use to analyze their data. Here, we develop an approach to circumvent this reliance on intuition. PHRAPL allows users to calculate the probability of a large number of demographic histories given their data, enabling them to identify the optimal model and produce accurate parameter estimates for a given system. Using PHRAPL, we reanalyze data from 19 recent phylogeographic investigations. Results indicate that the optimal models for most datasets parameterize both gene flow and population divergence, and suggest that species tree methods (which do not consider gene flow) are overly simplistic for most phylogeographic systems. These results highlight the importance of phylogeographic model selection, and reinforce the role of phylogeography as a bridge between population genetics and phylogenetics.

A simple approach for maximizing the overlap of phylogenetic and comparative data

Posted on September 2, 2015 by schraib

Matthew W. Pennell, Richard G. FitzJohn, William K. Cornwell

bioRxiv doi: http://dx.doi.org/10.1101/024992

Biologists are increasingly using curated, public data sets to conduct phylogenetic comparative analyses. Unfortunately, there is often a mismatch between species for which there is phylogenetic data and those for which other data is available. As a result, researchers are commonly forced to either drop species from analyses entirely or else impute the missing data. Here we outline a simple solution to increase the overlap while avoiding the potential biases introduced by imputing data. If some external topological or taxonomic information is available, this can be used to maximize the overlap between the data and the phylogeny. We develop an algorithm that replaces a species lacking data with a species that has data. This swap can be made because for those two species, all phylogenetic relationships are exactly equivalent. We have implemented our method in a new R package phyndr, which will allow researchers to apply our algorithm to empirical data sets. It is relatively efficient such that taxon swaps can be quickly computed, even for large trees. To facilitate the use of taxonomic knowledge we created a separate data package taxonlookup; it contains a curated, versioned taxonomic lookup for land plants and is interoperable with phyndr. Emerging online databases and statistical advances are making it possible for researchers to investigate evolutionary questions at unprecedented scales. However, in this effort species mismatch among data sources will increasingly be a problem; evolutionary informatics tools, such as phyndr and taxonlookup, can help alleviate this issue.
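The swap logic can be illustrated with a much-simplified, genus-level Python sketch. This is not the phyndr algorithm or its API: it assumes every genus is monophyletic, and it only swaps a tip when that tip is the sole representative of its genus in the tree, so that any congener has equivalent relationships to all other tips. The function name and the toy species lists are hypothetical.

```python
def swap_tips(tree_tips, species_with_data, genus_of):
    """Relabel tree tips so that more of them have trait data.

    Returns a dict mapping each tip to itself (it already has data),
    to a congener with data (a safe swap), or to None (no safe swap).
    Assumes every genus is monophyletic (a simplifying assumption).
    """
    tips_per_genus = {}
    for tip in tree_tips:
        tips_per_genus.setdefault(genus_of[tip], []).append(tip)

    relabel = {}
    for tip in tree_tips:
        if tip in species_with_data:
            relabel[tip] = tip
            continue
        genus = genus_of[tip]
        if len(tips_per_genus[genus]) != 1:
            # Other congeners are in the tree, so a swap could change
            # relationships within the genus; leave this tip unmatched.
            relabel[tip] = None
            continue
        candidates = sorted(
            s for s in species_with_data if genus_of.get(s) == genus
        )
        relabel[tip] = candidates[0] if candidates else None
    return relabel


# Hypothetical example: genus is taken from the first part of each name.
tips = ["Quercus_alba", "Acer_rubrum", "Pinus_taeda", "Pinus_strobus"]
data = {"Quercus_rubra", "Acer_rubrum", "Pinus_taeda"}
genus_of = {name: name.split("_")[0] for name in set(tips) | data}

result = swap_tips(tips, data, genus_of)
```

In this toy case the tip Quercus_alba, which lacks data, is relabeled as Quercus_rubra, raising the overlap between tree and data from two species to three without imputing anything.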

The site-frequency spectrum associated with Xi-coalescents

Posted on September 2, 2015 by schraib

Jochen Blath, Mathias C Cronjager, Bjarki Eldon, Matthias Hammer

bioRxiv doi: http://dx.doi.org/10.1101/025684

We give recursions for the expected site-frequency spectrum associated with Xi-coalescents, that is, exchangeable coalescents that admit simultaneous multiple mergers of ancestral lineages. Xi-coalescents arise, for example, in association with population models of skewed offspring distributions with diploidy, recurrent advantageous mutations, or strong bottlenecks. In contrast, Lambda-coalescents admit multiple mergers of lineages, but at most one such merger each time. Xi-coalescents, as well as Lambda-coalescents, can predict an excess of singletons, compared to the Kingman coalescent. We compare estimates of coalescent parameters when Xi-coalescent models are applied to data obtained from Lambda-coalescents, and vice versa. In general, Xi-coalescents predict fewer singletons than corresponding Lambda-coalescents, but a higher count of mutations of ‘size’ larger than singletons. We analyse unfolded site-frequency spectra obtained for nuclear loci of the diploid Atlantic cod, and obtain different coalescent parameter estimates than previously obtained with Lambda-coalescents. Our results provide new inference tools, and suggest that for nuclear population genetic data from diploid or polyploid highly fecund populations that may have skewed offspring distributions, one should apply Xi-coalescents rather than Lambda-coalescents.
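For readers unfamiliar with the statistic, the following Python sketch shows how an unfolded site-frequency spectrum is tabulated from haplotypes and gives the standard Kingman-coalescent baseline, under which the expected number of sites carrying i derived copies is θ/i. The Xi-coalescent recursions from the preprint are not reproduced here; the function names and the toy haplotypes are hypothetical.

```python
from collections import Counter


def unfolded_sfs(haplotypes):
    """Unfolded site-frequency spectrum from 0/1 haplotypes (1 = derived).

    Entry i-1 of the result is the number of sites at which exactly i of
    the n sampled haplotypes carry the derived allele, for i = 1..n-1.
    Sites with 0 or n derived copies are not segregating and are ignored.
    """
    n = len(haplotypes)
    counts = Counter(sum(site) for site in zip(*haplotypes))
    return [counts.get(i, 0) for i in range(1, n)]


def kingman_expected_sfs(n, theta):
    """Expected unfolded SFS under the Kingman coalescent: theta / i."""
    return [theta / i for i in range(1, n)]


# Hypothetical toy sample: four haplotypes typed at five sites.
haplotypes = [
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

observed = unfolded_sfs(haplotypes)        # two singletons, one doubleton
expected = kingman_expected_sfs(4, 6.0)
```

An excess of singletons relative to the θ/i baseline is the kind of signal that multiple-merger models such as Lambda- and Xi-coalescents can produce.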

Fast principal components analysis reveals independent evolution of ADH1B gene in Europe and East Asia

Posted on September 2, 2015 by schraib

Kevin J Galinsky, Gaurav Bhatia, Po-Ru Loh, Stoyan Georgiev, Sayan Mukherjee, Nick J Patterson, Alkes L Price

bioRxiv doi: http://dx.doi.org/10.1101/018143

Searching for genetic variants with unusual differentiation between subpopulations is an established approach for identifying signals of natural selection. However, existing methods generally require discrete subpopulations. We introduce a method that infers selection using principal components (PCs) by identifying variants whose differentiation along top PCs is significantly greater than the null distribution of genetic drift. To enable the application of this method to large data sets, we developed the FastPCA software, which employs recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using the PC-based test for natural selection, we replicate previously known selected loci and identify three new genome-wide significant signals of selection, including selection in Europeans at the ADH1B gene. The derived allele of the coding variant rs1229984 has previously been associated with a decreased risk of alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents. We also detect new selection signals at IGFBP3 and IGH, which have also previously been associated with human disease.

Using Ancient Samples in Projection Analysis

Posted on September 2, 2015 by schraib

Melinda A Yang, Montgomery Slatkin

bioRxiv doi: http://dx.doi.org/10.1101/025015

Projection analysis is a useful tool for understanding the relationship of two populations. It compares a test genome to a set of genomes from a reference population. The projection’s shape depends on the historical relationship of the test genome’s population to the reference population. Here, we explore the effects on the projection when ancient samples are included in the analysis. First, we conduct a series of simulations in which the ancient sample is directly ancestral to a present-day population (one-population model) or the ancient sample is ancestral to a sister population that diverged before the time of sampling (two-population model). We find that there are characteristic differences between the projections for the one-population and two-population models, which indicate that the projection can be used to determine whether a test genome is directly ancestral to a present-day population or not. Second, we compute projections for several published ancient genomes. We compare three Neanderthals, the Denisovan and three ancient human genomes to European, Han Chinese and Yoruba reference panels. We insert these seven ancient genomes into a previously constructed demographic model and assess how well the observed projections are recovered.

Haldane's Sieve

Discussing preprints in population and evolutionary genetics
