Urbanization results in pervasive habitat fragmentation and reduces standing genetic variation through genetic drift. Loss of genome-wide variation may ultimately reduce the evolutionary potential of animal populations experiencing rapidly changing conditions. In this study, we examined genome-wide variation among 23 white-footed mouse (Peromyscus leucopus) populations sampled along an urbanization gradient in the New York City metropolitan area. Genome-wide variation was estimated as a proxy for evolutionary potential using more than 10,000 SNP markers generated by ddRAD-Seq. We found that genome-wide variation is inversely related to urbanization as measured by percent impervious surface cover, and to a lesser extent, human population density. We also report that urbanization results in enhanced genome-wide differentiation between populations in cities. There was no pattern of isolation by distance among these populations, but an isolation by resistance model based on impervious surface significantly explained patterns of genetic differentiation. Isolation by environment modeling also indicated that urban populations deviate much more strongly from global allele frequencies than suburban or rural populations. This study is the first to examine evolutionary potential along an urban-to-rural gradient and quantify urbanization as a driver of population genomics patterns.
The mysterious orphans of Mycoplasmataceae
Phylogeographic Inference Using Approximate Likelihoods
The demographic history of most species is complex, with multiple evolutionary processes combining to shape the observed patterns of genetic diversity. To infer this history, the discipline of phylogeography has (to date) used models that simplify the historical demography of the focal organism, for example by assuming or ignoring ongoing gene flow between populations or by requiring a priori specification of divergence history. Since no single model incorporates every possible evolutionary process, researchers rely on intuition to choose the models that they use to analyze their data. Here, we develop an approach to circumvent this reliance on intuition. PHRAPL allows users to calculate the probability of a large number of demographic histories given their data, enabling them to identify the optimal model and produce accurate parameter estimates for a given system. Using PHRAPL, we reanalyze data from 19 recent phylogeographic investigations. Results indicate that the optimal models for most datasets parameterize both gene flow and population divergence, and suggest that species tree methods (which do not consider gene flow) are overly simplistic for most phylogeographic systems. These results highlight the importance of phylogeographic model selection, and reinforce the role of phylogeography as a bridge between population genetics and phylogenetics.
A simple approach for maximizing the overlap of phylogenetic and comparative data
Biologists are increasingly using curated, public data sets to conduct phylogenetic comparative analyses. Unfortunately, there is often a mismatch between the species for which phylogenetic data are available and those for which other data are available. As a result, researchers are commonly forced to either drop species from analyses entirely or else impute the missing data. Here we outline a simple solution to increase the overlap while avoiding the potential biases introduced by imputing data. If some external topological or taxonomic information is available, this can be used to maximize the overlap between the data and the phylogeny. We develop an algorithm that replaces a species lacking data with a species that has data. This swap can be made because, for those two species, all phylogenetic relationships are exactly equivalent. We have implemented our method in a new R package, phyndr, which will allow researchers to apply our algorithm to empirical data sets. It is relatively efficient, such that taxon swaps can be computed quickly, even for large trees. To facilitate the use of taxonomic knowledge we created a separate data package, taxonlookup; it contains a curated, versioned taxonomic lookup for land plants and is interoperable with phyndr. Emerging online databases and statistical advances are making it possible for researchers to investigate evolutionary questions at unprecedented scales. However, in this effort species mismatch among data sources will increasingly be a problem; evolutionary informatics tools, such as phyndr and taxonlookup, can help alleviate this issue.
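The swap idea in the abstract above can be illustrated with a small sketch. This is not the phyndr implementation or its API (phyndr is an R package). It is a simplified Python illustration that assumes genera are monophyletic and only allows a swap when a genus is represented by exactly one tip in the tree, since in that case any congener occupies an equivalent position. The function name `swap_candidates` and the species lists are hypothetical.

```python
from collections import defaultdict


def swap_candidates(tree_tips, data_species, genus_of):
    """Return {tip lacking data: [congeners with data]} for safe swaps.

    Simplifying assumption (not taken from the paper): a swap is allowed
    only when the tip is the sole representative of its genus in the tree
    and the genus is treated as monophyletic.
    """
    tips_by_genus = defaultdict(list)
    for tip in tree_tips:
        tips_by_genus[genus_of[tip]].append(tip)

    data_by_genus = defaultdict(list)
    for species in data_species:
        data_by_genus[genus_of[species]].append(species)

    have_data = set(data_species)
    swaps = {}
    for tip in tree_tips:
        if tip in have_data:
            continue  # already matched, nothing to swap
        genus = genus_of[tip]
        if len(tips_by_genus[genus]) != 1:
            continue  # several congeners in the tree: positions differ
        candidates = sorted(data_by_genus[genus])
        if candidates:
            swaps[tip] = candidates
    return swaps


# Hypothetical example: tips in the tree and species with trait data.
tips = ["Quercus_alba", "Acer_rubrum", "Pinus_taeda", "Pinus_strobus"]
data = ["Quercus_robur", "Acer_rubrum", "Pinus_nigra"]
genus_of = {name: name.split("_")[0] for name in tips + data}

result = swap_candidates(tips, data, genus_of)
print(result)
```

Here only the Quercus tip gains a replacement; the two Pinus tips are left alone because, with more than one congener in the tree, a species outside the tree cannot be assumed equivalent to either of them.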
The site-frequency spectrum associated with Xi-coalescents
Fast principal components analysis reveals independent evolution of ADH1B gene in Europe and East Asia
Searching for genetic variants with unusual differentiation between subpopulations is an established approach for identifying signals of natural selection. However, existing methods generally require discrete subpopulations. We introduce a method that infers selection using principal components (PCs) by identifying variants whose differentiation along top PCs is significantly greater than the null distribution of genetic drift. To enable the application of this method to large data sets, we developed the FastPCA software, which employs recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using the PC-based test for natural selection, we replicate previously known selected loci and identify three new genome-wide significant signals of selection, including selection in Europeans at the ADH1B gene. The derived allele of the coding variant rs1229984 has previously been associated with a decreased risk of alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents. We also detect new selection signals at IGFBP3 and IGH, which have also previously been associated with human disease.
Using Ancient Samples in Projection Analysis
Projection analysis is a useful tool for understanding the relationship of two populations. It compares a test genome to a set of genomes from a reference population. The projection’s shape depends on the historical relationship of the test genome’s population to the reference population. Here, we explore the effects on the projection when ancient samples are included in the analysis. First, we conduct a series of simulations in which the ancient sample is directly ancestral to a present-day population (one-population model) or the ancient sample is ancestral to a sister population that diverged before the time of sampling (two-population model). We find that there are characteristic differences between the projections for the one-population and two-population models, which indicate that the projection can be used to determine whether or not a test genome is directly ancestral to a present-day population. Second, we compute projections for several published ancient genomes. We compare three Neanderthals, the Denisovan and three ancient human genomes to European, Han Chinese and Yoruba reference panels. We insert these seven ancient genomes into a previously constructed demographic model and assess how well the observed projections are recovered.
DNA-metabarcoding uncovers the diversity of soil-inhabiting fungi in the tropical island of Puerto Rico
Selection against maternal microRNA target sites in maternal transcripts
In animals, before the zygotic genome is expressed, the egg already contains gene products deposited by the mother. These maternal products are crucial during the initial steps of development. In Drosophila melanogaster a large number of maternal products are found in the oocyte, some of which are indispensable. Many of these products are RNA molecules, such as gene transcripts and ribosomal RNAs. Recently, microRNAs (small RNA gene regulators) have been detected early during development and are important in these initial steps. The presence of some microRNAs in unfertilized eggs has been reported, but whether they have a functional impact in the egg or early embryo has not been explored. I have extracted and sequenced small RNAs from Drosophila unfertilized eggs. The unfertilized egg is rich in small RNAs and contains multiple microRNA products. Maternal microRNAs are often encoded within the introns of maternal genes, suggesting that many maternal microRNAs are the product of transcriptional hitch-hiking. Comparative genomics analyses suggest that maternal transcripts tend to avoid target sites for maternal microRNAs. I also developed a microRNA target mutation model to study the functional impact of polymorphisms at microRNA target sites. The analysis of Drosophila populations suggests that there is selection against maternal microRNA target sites in maternal transcripts. A potential role of the maternal microRNA mir-9c in the maternal-to-zygotic transition is also discussed. In conclusion, maternal microRNAs in Drosophila have a functional impact on maternal protein-coding transcripts.
Implications of simplified linkage equilibrium SNP simulation
In a recent paper published in PNAS (Golan et al. 2014), residual maximum likelihood (REML) seriously underestimated the genetic variance explained by genome-wide single nucleotide polymorphisms (SNPs) when using a case-control design. It was concluded that Haseman–Elston regression (denoted as PCGC in their paper) should be used instead of REML. Their conclusions were based on results from simplified linkage equilibrium SNP simulation (SLES), which the authors acknowledged may be unrealistic. We found that their simulation, SLES, unrealistically inflated the correlation between the eigenvectors of the genomic relationship matrix and disease status to values that are rarely observed in real data analyses. With a more realistic simulation that the authors failed to carry out (as they noted in their paper), we showed that there was no such inflated correlation between the eigenvectors of the genomic relationship matrix and disease status. Because REML uses the eigensystem of the covariance structure, the inflated correlation artefactually constrained its estimates. We compared SNP-heritabilities from SLES and a more realistic simulation, showing that there was a substantial difference between the REML estimates from the two simulation strategies. Finally, we showed that there was no difference between REML and PCGC in real data analyses. This pattern from real data results differed strikingly from the pattern in the simulation study of Golan et al. One needs to be cautious of results drawn from SLES.
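For readers unfamiliar with Haseman–Elston regression, its simplest form regresses the product of standardized phenotypes for each pair of individuals on that pair's genomic relationship coefficient, and the slope serves as the heritability estimate. The sketch below is a plain through-origin version for a quantitative trait. It is not the ascertainment-corrected PCGC estimator of Golan et al., and the function name and toy inputs are hypothetical, chosen only to check the arithmetic.

```python
def haseman_elston_slope(z, grm):
    """Through-origin Haseman-Elston slope over all pairs i < j.

    z   : phenotypes, assumed already standardized by the caller
    grm : square genomic relationship matrix (list of lists)
    """
    n = len(z)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            numerator += grm[i][j] * z[i] * z[j]
            denominator += grm[i][j] ** 2
    if denominator == 0.0:
        raise ValueError("all off-diagonal relationships are zero")
    return numerator / denominator


# Toy inputs (not realistic data): each phenotype product is built to be
# exactly half of the corresponding relationship coefficient.
z = [0.5, -0.5, 0.5, -0.5]
grm = [[1.0 if i == j else 2.0 * z[i] * z[j] for j in range(4)]
       for i in range(4)]

h2 = haseman_elston_slope(z, grm)
print(h2)
```

Only the off-diagonal entries enter the regression, so the diagonal of the relationship matrix has no effect on the estimate in this form.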