A simple biophysical model predicts more rapid accumulation of hybrid incompatibilities in small populations
Bhavin S. Khatri, Richard A. Goldstein
Comments: 13 pages, 6 figures
Subjects: Populations and Evolution (q-bio.PE)
Speciation is fundamental to the huge diversity of life on Earth. Evidence suggests reproductive isolation arises most commonly in allopatry, with a higher speciation rate in small populations. Current theory does not address this dependence in the important weak mutation regime. Here, we examine a biophysical model of speciation based on the binding of a protein transcription factor to a DNA binding site, and how their co-evolution, proceeding independently in two allopatric lineages on a stabilizing landscape, leads to incompatibilities. Our results give a new prediction for the monomorphic regime of evolution, consistent with data, that smaller populations should develop incompatibilities more quickly. This arises because 1) smaller populations have a greater initial drift load, as there are more sequences that bind poorly than well, so fewer substitutions are needed to reach incompatible regions of phenotype space; and 2) divergence is slower when the population size is larger than the inverse of discrete differences in fitness. Further, we find that longer sequences develop incompatibilities more quickly at small population sizes, but more slowly at large population sizes. The biophysical model thus represents a robust mechanism of rapid reproductive isolation for small populations and large sequences that does not require peak-shifts or positive selection.
Repeatability of evolution on epistatic landscapes
Benedikt Bauer, Chaitanya S Gokhale
Evolution is a dynamic process. The two classical forces of evolution are mutation and selection. Assuming small mutation rates, evolution can be predicted based solely on the fitness differences between phenotypes. Predicting an evolutionary process under varying mutation rates as well as varying fitness is still an open question. Experimental procedures, however, do include these complexities along with fluctuating population sizes and stochastic events such as extinctions. We investigate the mutational path probabilities of systems having epistatic effects on both fitness and mutation rates using a theoretical and computational framework. In contrast to previous models, we do not limit ourselves to the typical strong selection, weak mutation (SSWM) regime or to fixed population sizes. Rather, we allow epistatic interactions to also affect mutation rates. This can lead to qualitatively non-trivial dynamics. Pathways that are negligible in the SSWM regime can overcome fitness valleys and become accessible. This finding has the potential to extend the traditional predictions based on the SSWM foundation and bring us closer to what is observed in experimental systems.
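As a point of reference for the SSWM baseline mentioned above, the following is a minimal sketch of mutational path probabilities on a small biallelic landscape. It is not the authors' framework: it assumes the textbook SSWM approximation in which each step goes to a fitter one-mutant neighbour with probability proportional to its selective gain, it allows forward mutations only, and the two-locus fitness values are hypothetical numbers chosen for illustration.

```python
from itertools import permutations


def mutate(genotype, locus):
    """Return a copy of the genotype with the given locus set to 1."""
    g = list(genotype)
    g[locus] = 1
    return tuple(g)


def sswm_path_probabilities(fitness, n_loci):
    """Probability of each mutation order from all-0 to all-1 under SSWM.

    Assumption: a step to a neighbour is taken with probability
    proportional to its selective gain; deleterious or neutral steps
    have probability zero. Reversions are not modelled.
    """
    paths = {}
    for order in permutations(range(n_loci)):
        genotype = (0,) * n_loci
        prob = 1.0
        for locus in order:
            # Selective gains of every one-step mutant still available.
            gains = {}
            for k in range(n_loci):
                if genotype[k] == 0:
                    s = fitness[mutate(genotype, k)] / fitness[genotype] - 1.0
                    gains[k] = max(s, 0.0)
            total = sum(gains.values())
            if total == 0.0:
                # Local peak: no fitter neighbour, the path is blocked.
                prob = 0.0
                break
            prob *= gains[locus] / total
            genotype = mutate(genotype, locus)
        paths[order] = prob
    return paths


# Hypothetical landscape: locus 1 alone is deleterious (a fitness valley).
EXAMPLE_FITNESS = {(0, 0): 1.0, (1, 0): 1.1, (0, 1): 0.9, (1, 1): 1.3}
```

Calling `sswm_path_probabilities(EXAMPLE_FITNESS, 2)` assigns all probability to the order that mutates locus 0 first, and none to the order that passes through the valley genotype `(0, 1)`. This is the sense in which valley-crossing pathways are negligible under SSWM.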
Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads
Hung-I Harry Chen, Yuanhang Liu, Yi Zou, Zhao Lai, Devanand Sarkar, Yufei Huang, Yidong Chen
Background: RNA sequencing (RNA-seq) is a powerful tool for genome-wide expression profiling of biological samples, with the advantages of high throughput and high resolution. Many algorithms exist for quantifying expression levels and detecting differential gene expression, but none of them takes into account the misaligned reads that are mapped to non-exonic regions. We developed a novel algorithm, XBSeq, in which a statistical model was established on the assumption that observed signals are the convolution of true expression signals and sequencing noise. The reads mapped to non-exonic regions are treated as sequencing noise, which follows a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, the true expression signals, assumed to be governed by the negative binomial distribution, can be delineated, allowing accurate detection of differentially expressed genes. Results: We implemented our novel XBSeq algorithm and evaluated it on a set of simulated expression datasets under different conditions, using a combination of negative binomial and Poisson distributions with parameters derived from real RNA-seq data. We compared the performance of our method with other commonly used differential expression analysis algorithms. We also evaluated the changes in true and false positive rates with variations in biological replicates, differential fold changes, and expression levels in non-exonic regions. We also tested the algorithm on a set of real RNA-seq data, for which the common and different detection results from different algorithms were reported. Conclusions: In this paper, we proposed XBSeq, a novel differential expression analysis algorithm for RNA-seq data that takes non-exonic mapped reads into consideration. When background noise is at baseline level, the performance of XBSeq and DESeq is mostly equivalent. However, our method surpasses DESeq and other algorithms as the number of non-exonic mapped reads increases. Only under very low read count conditions did XBSeq have a slightly higher false discovery rate, which may be improved by adjusting the background noise effect in this situation. Taken together, by considering non-exonic mapped reads, XBSeq can provide accurate expression measurement and thus detect differentially expressed genes even in noisy conditions.
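The signal-plus-noise assumption described above can be illustrated with a small simulation. The sketch below is not the XBSeq implementation: it draws a negative binomial "true" signal and an independent Poisson "noise" signal, adds them to form the observed counts, and then recovers the true mean and variance by simple moment subtraction (for independent variables, means and variances add, and a Poisson variance equals its mean). All function names and parameter values are hypothetical.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson variate (Knuth's method; adequate for modest lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def negative_binomial(rng, mean, dispersion):
    """Gamma-Poisson mixture with variance = mean + dispersion * mean**2."""
    rate = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson(rng, rate)


def simulate(n, true_mean, dispersion, noise_mean, seed=0):
    """Return (observed exonic counts, separately measured noise counts)."""
    rng = random.Random(seed)
    observed = [
        negative_binomial(rng, true_mean, dispersion) + poisson(rng, noise_mean)
        for _ in range(n)
    ]
    # Stand-in for counts measured in non-exonic regions.
    noise = [poisson(rng, noise_mean) for _ in range(n)]
    return observed, noise


def moment_estimates(observed, noise):
    """Estimate true-signal mean and variance by subtracting the noise moments."""
    noise_mean = statistics.fmean(noise)
    true_mean = max(statistics.fmean(observed) - noise_mean, 0.0)
    # Poisson noise: variance equals mean.
    true_var = max(statistics.variance(observed) - noise_mean, 0.0)
    return true_mean, true_var
```

For example, `moment_estimates(*simulate(5000, 50.0, 0.1, 10.0))` should return a mean close to the simulated true mean of 50, even though the observed counts average about 60.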
Diversity of Mycobacterium tuberculosis across evolutionary scales
Mary B O’Neill, Tatum D Mortimer, Caitlin S Pepperell
Tuberculosis (TB) is a global public health emergency. Increasingly drug resistant strains of Mycobacterium tuberculosis (M.tb) continue to emerge and spread, highlighting the adaptability of this pathogen. Most studies of M.tb evolution have relied on ‘between-host’ samples, in which each person with TB is represented by a single M.tb isolate. However, individuals with TB commonly harbor populations of M.tb numbering in the billions. Here, we use analyses of M.tb diversity found within and between hosts to gain insight into the adaptation of this pathogen. We find that the amount of M.tb genetic diversity harbored by individuals with TB is similar to that of global between-host surveys of TB patients. This suggests that M.tb genetic diversity is generated within hosts and then lost as the infection is transmitted. In examining genomic data from M.tb samples within and between hosts with TB, we find that genes involved in the regulation, synthesis, and transportation of immunomodulatory cell envelope lipids appear repeatedly in the extremes of various statistical measures of diversity. Polyketide synthase and Mycobacterial membrane protein Large (mmpL) genes are particularly notable in this regard. In addition, we observe identical mutations emerging across samples from different TB patients. Taken together, our observations suggest that M.tb cell envelope lipids are targets of selection within hosts. These lipids are specific to pathogenic mycobacteria and, in some cases, human-pathogenic mycobacteria. We speculate that rapid adaptation of cell envelope lipids is facilitated by functional redundancy, flexibility in their metabolism, and their roles mediating interactions with the host.
RNA-guided gene drives can efficiently bias inheritance in wild yeast
James E DiCarlo, Alejandro Chavez, Sven L Dietz, Kevin M Esvelt, George M Church
Inheritance-biasing elements known as “gene drives” may be capable of spreading genomic alterations made in laboratory organisms through wild populations. We previously considered the potential for RNA-guided gene drives based on the versatile CRISPR/Cas9 genome editing system to serve as a general method of altering populations. Here we report molecularly contained gene drive constructs in the yeast Saccharomyces cerevisiae that are typically copied at rates above 99% when mated to wild yeast. We successfully targeted both non-essential and essential genes and showed that the inheritance of an unrelated “cargo” gene could be biased by an adjacent drive. Our results demonstrate that RNA-guided gene drives are capable of efficiently biasing inheritance when mated to wild-type organisms over successive generations.
Dissecting phylogenetic signal and accounting for bias in whole-genome data sets: a case study of the Metazoa
Marek L Borowiec, Ernest K Lee, Joanna C Chiu, David C Plachetzki
Transcriptome-enabled phylogenetic analyses have dramatically improved our understanding of metazoan phylogeny in recent years, although several important questions remain. The branching order near the base of the tree is one such outstanding issue. To address this question we assemble a novel data set comprised of 1,080 orthologous loci derived from 36 publicly available genomes and dissect the phylogenetic signal present in each individual partition. The size of this data set allows for a closer look at the potential biases and sources of non-phylogenetic signal. We assessed a range of measures for each data partition including information content, saturation, rate of evolution, long-branch score, and taxon occupancy and explored how each of these characteristics impacts phylogeny estimation. We then used these data to prepare a reduced set of partitions that fit an optimal set of criteria and are amenable to the most appropriate and computationally intensive analyses using site-heterogeneous models of sequence evolution. We also employed several strategies to examine the potential for long-branch attraction to bias our inferences. All of our analyses support Ctenophora as the sister lineage to other Metazoa, although support for this relationship varies among analyses. We find no support for the traditional view uniting the ctenophores and Cnidaria (jellies, anemones, corals, and kin). We also examine phylogenetic placement of myriapods (centipedes and millipedes) and find it more sensitive to the type of analysis and data used. Our study provides a workflow for minimizing systematic bias in whole genome-based phylogenetic analyses.
An annotated consensus genetic map for Pinus taeda L. and extent of linkage disequilibrium in three genotype-phenotype discovery populations
Jared W. Westbrook, Vikram E. Chhatre, Le-Shin Wu, Srikar Chamala, Leandro Gomide Neves, Patricio Muñoz, Pedro J Martínez-García, David B. Neale, Matias Kirst, Keithanne Mockaitis, C. Dana Nelson, Gary F. Peter, John M. Davis, Craig S. Echt
A consensus genetic map for Pinus taeda (loblolly pine) was constructed by merging three previously published maps with a map from a pseudo-backcross between P. taeda and P. elliottii (slash pine). The consensus map positioned 4981 markers via genotyping of 1251 individuals from four pedigrees. It is the densest linkage map for a conifer to date. Average marker spacing was 0.48 centiMorgans and total map length was 2372 centiMorgans. Functional predictions for 4762 markers for expressed sequence tags were improved by alignment to full-length P. taeda transcripts. Alignments to the P. taeda genome mapped 4225 scaffold sequences onto linkage groups. The consensus genetic map was used to compare the extent of genome-wide linkage disequilibrium in an association population of distantly related P. taeda individuals (ADEPT2), a multiple-family pedigree used for genomic selection studies (CCLONES), and a full-sib quantitative trait locus mapping population (BC1). Weak linkage disequilibrium was observed in CCLONES and ADEPT2. The average squared correlation, R², between genotypes at SNPs less than one centiMorgan apart was less than 0.05 in both populations, and R² did not decay substantially with genetic distance. By contrast, strong and extended linkage disequilibrium was observed among BC1 full-sibs, where average R² decayed from 0.8 to less than 0.1 over 53 centiMorgans. The consensus map and analysis of linkage disequilibrium establish a foundation for comparative association and quantitative trait locus mapping between genotype-phenotype discovery populations.
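The linkage disequilibrium statistic used above, the squared correlation between genotypes at two SNPs, can be sketched as follows. This is a generic illustration, not the authors' pipeline: it assumes genotypes are coded as allele counts (0, 1, 2) with no missing data, and the function name is hypothetical.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two genotype vectors.

    Assumes each vector holds allele counts (0, 1 or 2) for the same
    individuals in the same order, with no missing values.
    """
    n = len(snp_a)
    if n != len(snp_b) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 individuals")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        # A monomorphic SNP carries no information about association.
        return float("nan")
    return cov * cov / (var_a * var_b)
```

Averaging this quantity over SNP pairs binned by genetic distance gives a decay curve of the kind summarised above for the BC1 population.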