Population genomics of the Anthropocene: urbanization reduces the evolutionary potential of small mammal populations

Posted on September 2, 2015 by schraib

Jason Munshi-South, Christine P Zolnik, Stephen E Harris

bioRxiv doi: http://dx.doi.org/10.1101/025007

Urbanization results in pervasive habitat fragmentation and reduces standing genetic variation through genetic drift. Loss of genome-wide variation may ultimately reduce the evolutionary potential of animal populations experiencing rapidly changing conditions. In this study, we examined genome-wide variation among 23 white-footed mouse (Peromyscus leucopus) populations sampled along an urbanization gradient in the New York City metropolitan area. Genome-wide variation was estimated as a proxy for evolutionary potential using more than 10,000 SNP markers generated by ddRAD-Seq. We found that genome-wide variation is inversely related to urbanization as measured by percent impervious surface cover, and to a lesser extent, human population density. We also report that urbanization results in enhanced genome-wide differentiation between populations in cities. There was no pattern of isolation by distance among these populations, but an isolation by resistance model based on impervious surface significantly explained patterns of genetic differentiation. Isolation by environment modeling also indicated that urban populations deviate much more strongly from global allele frequencies than suburban or rural populations. This study is the first to examine evolutionary potential along an urban-to-rural gradient and quantify urbanization as a driver of population genomics patterns.

The mysterious orphans of Mycoplasmataceae

Posted on September 2, 2015 by schraib

Tatiana Tatarinova, Inna Lysnyansky, Yuri Nikolsky, Alexander Bolshoy

bioRxiv doi: http://dx.doi.org/10.1101/025700

Background: The length of a protein sequence is largely determined by its function, i.e. each functional group is associated with an optimal size. However, comparative genomics has revealed that protein length may be affected by additional factors. In 2002 it was shown that in the bacterium Escherichia coli and the archaeon Archaeoglobus fulgidus, protein sequences with no homologs are, on average, shorter than those with homologs [1]. Most experts now agree that the length distributions are distinctly different between protein sequences with and without homologs in bacterial and archaeal genomes. In this study, we examine this postulate through a comprehensive analysis of all annotated prokaryotic genomes, focusing on certain exceptions. Results: We compared the length distributions of “having homologs proteins” (HHPs) and “non-having homologs proteins” (orphans or ORFans) in all currently annotated, completely sequenced prokaryotic genomes. As expected, the HHPs and ORFans have strikingly different length distributions in almost all genomes. As previously established, the HHPs are, on average, longer than the ORFans, and the length distributions for the ORFans have a relatively narrow peak, in contrast to the HHPs, whose lengths spread over a wider range of values. However, about thirty genomes do not obey these rules. Practically all genomes of Mycoplasma and Ureaplasma have atypical ORFan distributions, with the mean length of ORFans larger than the mean length of HHPs. These genera constitute over 80% of atypical genomes. Conclusions: We confirmed on a comprehensive set of genomes the previous observation that HHPs and ORFans have different gene length distributions. We also showed that Mycoplasmataceae genomes have very distinctive distributions of ORFan lengths. We offer several possible biological explanations of this phenomenon.

Phylogeographic Inference Using Approximate Likelihoods

Posted on September 2, 2015 by schraib

Brian C O’Meara, Nathan D Jackson, Ariadna E Morales-Garcia, Bryan C Carstens

bioRxiv doi: http://dx.doi.org/10.1101/025353

The demographic history of most species is complex, with multiple evolutionary processes combining to shape the observed patterns of genetic diversity. To infer this history, the discipline of phylogeography has (to date) used models that simplify the historical demography of the focal organism, for example by assuming or ignoring ongoing gene flow between populations or by requiring a priori specification of divergence history. Since no single model incorporates every possible evolutionary process, researchers rely on intuition to choose the models that they use to analyze their data. Here, we develop an approach to circumvent this reliance on intuition. PHRAPL allows users to calculate the probability of a large number of demographic histories given their data, enabling them to identify the optimal model and produce accurate parameter estimates for a given system. Using PHRAPL, we reanalyze data from 19 recent phylogeographic investigations. Results indicate that the optimal models for most datasets parameterize both gene flow and population divergence, and suggest that species tree methods (which do not consider gene flow) are overly simplistic for most phylogeographic systems. These results highlight the importance of phylogeographic model selection, and reinforce the role of phylogeography as a bridge between population genetics and phylogenetics.

A simple approach for maximizing the overlap of phylogenetic and comparative data

Posted on September 2, 2015 by schraib

Matthew W. Pennell, Richard G. FitzJohn, William K. Cornwell

bioRxiv doi: http://dx.doi.org/10.1101/024992

Biologists are increasingly using curated, public data sets to conduct phylogenetic comparative analyses. Unfortunately, there is often a mismatch between species for which there is phylogenetic data and those for which other data is available. As a result, researchers are commonly forced to either drop species from analyses entirely or else impute the missing data. Here we outline a simple solution to increase the overlap while avoiding the potential biases introduced by imputing data. If some external topological or taxonomic information is available, this can be used to maximize the overlap between the data and the phylogeny. We develop an algorithm that replaces a species lacking data with a species that has data. This swap can be made because for those two species, all phylogenetic relationships are exactly equivalent. We have implemented our method in a new R package phyndr, which will allow researchers to apply our algorithm to empirical data sets. It is efficient enough that taxon swaps can be computed quickly, even for large trees. To facilitate the use of taxonomic knowledge we created a separate data package taxonlookup; it contains a curated, versioned taxonomic lookup for land plants and is interoperable with phyndr. Emerging online databases and statistical advances are making it possible for researchers to investigate evolutionary questions at unprecedented scales. However, in this effort species mismatch among data sources will increasingly be a problem; evolutionary informatics tools, such as phyndr and taxonlookup, can help alleviate this issue.
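The swap idea in this abstract can be illustrated with a minimal Python sketch. It is not the phyndr implementation: the function name `swap_tips` is hypothetical, the external information is assumed to be genus-level taxonomy read off binomial names of the form `Genus_species`, and a swap is only attempted when a genus is represented by exactly one tip in the tree (so that, assuming the genus is monophyletic, any congener has the same relationships to every other tip).

```python
from collections import defaultdict


def genus_of(name):
    # Assumption: names are binomials written as "Genus_species".
    return name.split("_")[0]


def swap_tips(tree_tips, data_species):
    """Return {tip_without_data: congener_with_data} for safe swaps.

    A swap is proposed only when the tip's genus appears once in the
    tree, the tip itself has no data, and some congener that is not
    already in the tree does have data.
    """
    tip_set = set(tree_tips)
    with_data = set(data_species)

    # Group tree tips by genus to find genera represented by one tip.
    tips_by_genus = defaultdict(list)
    for tip in tree_tips:
        tips_by_genus[genus_of(tip)].append(tip)

    # Species that have data but are absent from the tree, by genus.
    candidates = defaultdict(list)
    for species in sorted(with_data - tip_set):
        candidates[genus_of(species)].append(species)

    swaps = {}
    for genus, tips in tips_by_genus.items():
        if len(tips) != 1 or tips[0] in with_data:
            continue  # ambiguous genus, or the tip already has data
        options = candidates.get(genus, [])
        if options:
            swaps[tips[0]] = options[0]  # first candidate alphabetically
    return swaps


tips = ["Quercus_alba", "Acer_rubrum", "Pinus_taeda"]
data = ["Quercus_robur", "Acer_rubrum"]
print(swap_tips(tips, data))
```

In this toy case Quercus_alba is replaced by Quercus_robur, Acer_rubrum is left alone because it already has data, and Pinus_taeda stays unmatched because no congener has data.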

The site-frequency spectrum associated with Xi-coalescents

Posted on September 2, 2015 by schraib

Jochen Blath, Mathias C Cronjager, Bjarki Eldon, Matthias Hammer

bioRxiv doi: http://dx.doi.org/10.1101/025684

We give recursions for the expected site-frequency spectrum associated with Xi-coalescents, that is, exchangeable coalescents which admit simultaneous multiple mergers of ancestral lineages. Xi-coalescents arise, for example, in association with population models of skewed offspring distributions with diploidy, recurrent advantageous mutations, or strong bottlenecks. In contrast, Lambda-coalescents admit multiple mergers of lineages, but at most one such merger each time. Xi-coalescents, as well as Lambda-coalescents, can predict an excess of singletons compared to the Kingman coalescent. We compare estimates of coalescent parameters when Xi-coalescent models are applied to data obtained from Lambda-coalescents, and vice versa. In general, Xi-coalescents predict fewer singletons than corresponding Lambda-coalescents, but a higher count of mutations of 'size' larger than singletons. We analyse unfolded site-frequency spectra obtained for nuclear loci of the diploid Atlantic cod, and obtain different coalescent parameter estimates than previously obtained with Lambda-coalescents. Our results provide new inference tools, and suggest that for nuclear population genetic data from diploid or polyploid highly fecund populations that may have skewed offspring distributions, one should apply Xi-coalescents rather than Lambda-coalescents.
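For readers unfamiliar with the quantity being modelled, here is a small Python sketch of the unfolded site-frequency spectrum and of the Kingman baseline that the "excess of singletons" is measured against. It does not implement the Xi-coalescent recursions from the preprint. The function names and the 0/1 input coding (1 = derived allele) are assumptions made for illustration; the Kingman expectation used is the standard result that the expected number of sites with i derived copies is proportional to 1/i.

```python
import numpy as np


def unfolded_sfs(haplotypes):
    """Count sites with i derived copies, for i = 1 .. n-1.

    haplotypes: (n_samples, n_sites) array of 0/1, where 1 marks the
    derived allele. Fixed and absent classes (i = n, i = 0) are dropped.
    """
    h = np.asarray(haplotypes, dtype=int)
    n = h.shape[0]
    derived_counts = h.sum(axis=0)
    return np.bincount(derived_counts, minlength=n + 1)[1:n]


def kingman_expected_proportions(n):
    """Expected SFS proportions under the Kingman coalescent (1/i weights)."""
    weights = 1.0 / np.arange(1, n)
    return weights / weights.sum()


haps = np.array([
    [1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
])
observed = unfolded_sfs(haps)
print(observed)                          # counts for i = 1, 2, 3
print(observed / observed.sum())         # observed proportions
print(kingman_expected_proportions(4))   # Kingman baseline
```

Comparing the first entry of the observed proportions with the first entry of the Kingman baseline is the simplest way to see whether a sample carries more singletons than the Kingman coalescent predicts.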

Fast principal components analysis reveals independent evolution of ADH1B gene in Europe and East Asia

Posted on September 2, 2015 by schraib

Kevin J Galinsky, Gaurav Bhatia, Po-Ru Loh, Stoyan Georgiev, Sayan Mukherjee, Nick J Patterson, Alkes L Price

bioRxiv doi: http://dx.doi.org/10.1101/018143

Searching for genetic variants with unusual differentiation between subpopulations is an established approach for identifying signals of natural selection. However, existing methods generally require discrete subpopulations. We introduce a method that infers selection using principal components (PCs) by identifying variants whose differentiation along top PCs is significantly greater than the null distribution of genetic drift. To enable the application of this method to large data sets, we developed the FastPCA software, which employs recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using the PC-based test for natural selection, we replicate previously known selected loci and identify three new genome-wide significant signals of selection, including selection in Europeans at the ADH1B gene. The derived allele of the coding variant rs1229984 has previously been associated with a decreased risk of alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents. We also detect new selection signals at IGFBP3 and IGH, which have also previously been associated with human disease.
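To give a feel for how top PCs can be approximated without forming the full individuals-by-individuals matrix, here is a minimal Python sketch of a generic randomized SVD (random projection followed by a few power iterations). Whether FastPCA uses exactly this scheme is an assumption; the function name, the oversampling and iteration defaults, and the requirement that the genotype matrix is already centred and scaled are all choices made for this illustration.

```python
import numpy as np


def randomized_top_pcs(X, k, n_oversample=10, n_iter=4, seed=0):
    """Approximate the top-k left singular vectors and values of X.

    X: (n_individuals, n_snps) matrix, assumed already centred/scaled.
    Each pass multiplies X by a thin matrix, so the work per pass grows
    linearly with the number of individuals.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    width = min(k + n_oversample, n, m)

    # Random projection of the SNP space onto `width` directions.
    Y = X @ rng.standard_normal((m, width))

    # Power iterations sharpen the separation between top and bulk
    # singular values; QR keeps the basis numerically stable.
    for _ in range(n_iter):
        Y, _r = np.linalg.qr(Y)
        Y = X @ (X.T @ Y)

    Q, _r = np.linalg.qr(Y)            # orthonormal basis, n x width
    B = Q.T @ X                        # small width x m matrix
    U_small, s, _vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k]


# Toy check against an exact SVD on a low-rank-plus-noise matrix.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 500))
X += 0.01 * rng.standard_normal((200, 500))
pcs, values = randomized_top_pcs(X, k=3)
exact = np.linalg.svd(X, compute_uv=False)[:3]
print(np.max(np.abs(values - exact) / exact))
```

The returned columns are defined only up to sign, so comparisons against another PCA should use absolute correlations between matching components.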

Using Ancient Samples in Projection Analysis

Posted on September 2, 2015 by schraib

Melinda A Yang, Montgomery Slatkin

bioRxiv doi: http://dx.doi.org/10.1101/025015

Projection analysis is a useful tool for understanding the relationship of two populations. It compares a test genome to a set of genomes from a reference population. The projection’s shape depends on the historical relationship of the test genome’s population to the reference population. Here, we explore the effects on the projection when ancient samples are included in the analysis. First, we conduct a series of simulations in which the ancient sample is directly ancestral to a present-day population (one-population model) or the ancient sample is ancestral to a sister population that diverged before the time of sampling (two-population model). We find that there are characteristic differences between the projections for the one-population and two-population models, which indicate that the projection can be used to determine whether or not a test genome is directly ancestral to a present-day population. Second, we compute projections for several published ancient genomes. We compare three Neanderthals, the Denisovan and three ancient human genomes to European, Han Chinese and Yoruba reference panels. We insert these seven ancient genomes into a previously constructed demographic model and assess how well the observed projections are recovered.

DNA-metabarcoding uncovers the diversity of soil-inhabiting fungi in the tropical island of Puerto Rico

Posted on September 2, 2015 by schraib

Hector Urbina, Douglas G. Scofield, Matias Cafaro, Anna Rosling

bioRxiv doi: http://dx.doi.org/10.1101/025668

Soil fungal communities in tropical regions remain poorly understood. In order to increase the knowledge of diversity of soil-inhabiting fungi, we extracted total DNA from top-organic soil collected in seven localities dominated by four major ecosystems in the tropical island of Puerto Rico. In order to comprehensively characterize the fungal community, we PCR-amplified the ITS2 fungal barcode using newly designed degenerate primers and varying annealing temperatures to minimize primer bias. Sequencing results, obtained using Ion Torrent technology, comprised a total of 566,613 sequences after quality filtering. These sequences were clustered into 4,140 molecular operational taxonomic units (MOTUs) after removing low-frequency sequences and rarefaction to account for differences in read depth between samples. Our results demonstrate that soil fungal communities in Puerto Rico are structured by ecosystem. Ascomycota, followed by Basidiomycota, dominates the diversity of fungi in soil. Amongst Ascomycota, the recently described soil-inhabiting class Archaeorhizomycetes was present in all the localities, and taxa in this class were among the most commonly observed MOTUs. The Basidiomycota community was dominated by soil decomposers and ectomycorrhizal fungi, whose distribution was affected by local variation to a greater degree than that of Ascomycota.
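Rarefaction, as mentioned in this abstract, means subsampling every sample down to a common read depth so that richness comparisons are not driven by sequencing effort. A minimal Python sketch follows, assuming a samples-by-MOTUs count table and rarefying to the depth of the shallowest sample; the function name `rarefy_table` and that choice of depth are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def rarefy_table(table, depth=None, seed=0):
    """Subsample each row of a count table to the same number of reads.

    table: (n_samples, n_motus) array of non-negative read counts.
    depth: reads to keep per sample; defaults to the smallest row total.
    Reads are drawn without replacement within each sample.
    """
    counts = np.asarray(table, dtype=np.int64)
    totals = counts.sum(axis=1)
    if depth is None:
        depth = int(totals.min())
    if (totals < depth).any():
        raise ValueError("depth exceeds the read count of some sample")

    rng = np.random.default_rng(seed)
    # Drawing `depth` reads without replacement from a sample is a
    # multivariate hypergeometric draw over its MOTU counts.
    rows = [rng.multivariate_hypergeometric(row, depth) for row in counts]
    return np.vstack(rows)


table = np.array([
    [120, 30, 0, 50],   # 200 reads
    [10, 5, 60, 25],    # 100 reads
    [300, 0, 0, 100],   # 400 reads
])
rarefied = rarefy_table(table)
print(rarefied.sum(axis=1))          # every sample now has 100 reads
print((rarefied > 0).sum(axis=1))    # MOTU richness at equal depth
```

Because the draw is random, rare MOTUs can drop out of deeper samples; fixing the seed keeps the result reproducible.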

Selection against maternal microRNA target sites in maternal transcripts

Posted on September 2, 2015 by schraib

Antonio Marco

bioRxiv doi: http://dx.doi.org/10.1101/012757

In animals, before the zygotic genome is expressed, the egg already contains gene products deposited by the mother. These maternal products are crucial during the initial steps of development. In Drosophila melanogaster a large number of maternal products are found in the oocyte, some of which are indispensable. Many of these products are RNA molecules, such as gene transcripts and ribosomal RNAs. Recently, microRNAs (small RNA gene regulators) have been detected early during development and are important in these initial steps. The presence of some microRNAs in unfertilized eggs has been reported, but whether they have a functional impact in the egg or early embryo has not been explored. I have extracted and sequenced small RNAs from Drosophila unfertilized eggs. The unfertilized egg is rich in small RNAs and contains multiple microRNA products. Maternal microRNAs are often encoded within the introns of maternal genes, suggesting that many maternal microRNAs are the product of transcriptional hitch-hiking. Comparative genomics analyses suggest that maternal transcripts tend to avoid target sites for maternal microRNAs. I also developed a microRNA target mutation model to study the functional impact of polymorphisms at microRNA target sites. The analysis of Drosophila populations suggests that there is selection against maternal microRNA target sites in maternal transcripts. A potential role of the maternal microRNA mir-9c in maternal-to-zygotic transition is also discussed. In conclusion, maternal microRNAs in Drosophila have a functional impact on maternal protein-coding transcripts.

Implications of simplified linkage equilibrium SNP simulation

Posted on September 2, 2015 by schraib

Sang Hong Lee

bioRxiv doi: http://dx.doi.org/10.1101/025619

A recent paper published in PNAS (Golan et al. 2014) reported that residual maximum likelihood (REML) seriously underestimated the genetic variance explained by genome-wide single nucleotide polymorphisms (SNPs) when using a case-control design. It was concluded that Haseman–Elston regression (denoted as PCGC in their paper) should be used instead of REML. Their conclusions were based on results from simplified linkage equilibrium SNP simulation (SLES), which the authors acknowledged may be unrealistic. We found that their simulation, SLES, unrealistically inflated the correlation between the eigenvectors of the genomic relationship matrix and disease status to values that are rarely observed in real data analyses. With a more realistic simulation that the authors failed to carry out (as they noted in their paper), we showed that there was no such inflated correlation between the eigenvectors of the genomic relationship matrix and disease status. Because REML uses the eigensystem of the covariance structure, the inflated correlation artefactually constrained its estimates. We compared SNP-heritabilities from SLES and a more realistic simulation, showing that there was a substantial difference between the REML estimates from the two simulation strategies. Finally, we showed that there was no difference between REML and PCGC in real data analyses. This pattern from real data results differed strikingly from the pattern in the simulation study of Golan et al. One needs to be cautious of results drawn from SLES.
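As background on the regression-based estimator discussed here, the following Python sketch shows plain Haseman-Elston regression for a quantitative trait: products of standardized phenotypes for pairs of individuals are regressed on the corresponding off-diagonal entries of the genomic relationship matrix, and the slope estimates SNP-heritability. This is not PCGC as used by Golan et al., which adds corrections for case-control ascertainment; the function name, the regression through the origin, and the assumption that every SNP is polymorphic are choices made for this illustration.

```python
import numpy as np


def haseman_elston(genotypes, phenotype):
    """Estimate SNP-heritability by Haseman-Elston regression.

    genotypes: (n_individuals, n_snps) allele counts (0/1/2); every SNP
               is assumed polymorphic so it can be standardized.
    phenotype: length-n quantitative trait.
    """
    G = np.asarray(genotypes, dtype=float)
    Z = (G - G.mean(axis=0)) / G.std(axis=0)
    grm = Z @ Z.T / Z.shape[1]          # genomic relationship matrix

    y = np.asarray(phenotype, dtype=float)
    y = (y - y.mean()) / y.std()

    # Use each pair of distinct individuals once (upper triangle).
    upper = np.triu_indices(len(y), k=1)
    relatedness = grm[upper]
    products = np.outer(y, y)[upper]

    # Least-squares slope through the origin.
    return float(relatedness @ products / (relatedness @ relatedness))


# Toy data: independent SNPs and a trait with heritability 0.5.
rng = np.random.default_rng(0)
n, m, h2 = 1000, 500, 0.5
G = rng.binomial(2, 0.5, size=(n, m))
Z = (G - G.mean(axis=0)) / G.std(axis=0)
y = Z @ rng.normal(0, np.sqrt(h2 / m), m) + rng.normal(0, np.sqrt(1 - h2), n)
print(haseman_elston(G, y))
```

Note that the toy data above use independent SNPs, which is itself a simplified linkage-equilibrium simulation of the kind this preprint cautions about; it is used only to check that the estimator recovers the simulated value.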

Haldane's Sieve

Discussing preprints in population and evolutionary genetics


