Bayesian analyses of Yemeni mitochondrial genomes suggest multiple migration events with Africa and Western Eurasia

Deven Nikunj Vyas, Andrew Kitchen, Aida Teresa Miró-Herrans, Laurel Nichole Pearson, Ali Al-Meeri, Connie Jo Mulligan
doi: http://dx.doi.org/10.1101/010629

Anatomically modern humans (AMHs) left Africa ~60,000 years ago, marking the first of multiple dispersal events by AMHs between Africa and the Arabian Peninsula. The southern dispersal route (SDR) out of Africa (OOA) posits that early AMHs crossed the Bab el-Mandeb strait from the Horn of Africa into what is now Yemen and followed the coast of the Indian Ocean into eastern Eurasia. If AMHs followed the SDR and left modern descendants in situ, Yemeni populations should retain old autochthonous mitogenome lineages. Alternatively, if AMHs did not follow the SDR or did not leave modern descendants in the region, only young autochthonous lineages will remain as evidence of more recent dispersals. We sequenced 113 whole mitogenomes from multiple Yemeni regions with a focus on haplogroups M, N, and L3(xM,N) as they are considered markers of the initial OOA migrations. We performed Bayesian evolutionary analyses to generate time-measured phylogenies calibrated by Neanderthal and Denisovan mitogenome sequences in order to determine the age of Yemeni-specific clades in our dataset. Our results indicate that the M1, N1, and L3(xM,N) sequences in Yemen are the product of recent migration from Africa and western Eurasia. Although these data suggest that modern Yemeni mitogenomes are not markers of the original OOA migrants, we hypothesize that recent population dynamics may obscure any genetic signature of an ancient SDR migration.

A general condition for adaptive genetic polymorphism in temporally and spatially heterogeneous environments

Hannes Svardal, Claus Rueffler, Joachim Hermisson
Comments: Accepted for publication in Theoretical Population Biology
Subjects: Populations and Evolution (q-bio.PE)

Both evolution and ecology have long been concerned with the impact of variable environmental conditions on observed levels of genetic diversity within and between species. We model the evolution of a quantitative trait under selection that fluctuates in space and time, and derive an analytical condition for when these fluctuations promote genetic diversification. As an ecological scenario, we use a generalized island model with soft selection within patches, in which we incorporate generation overlap. We allow for arbitrary fluctuations in the environment, including spatio-temporal correlations, and for any functional form of selection on the trait. Using the concepts of invasion fitness and evolutionary branching, we derive a simple and transparent condition for the adaptive evolution and maintenance of genetic diversity. This condition relates the strength of selection within patches to expectations and variances in the environmental conditions across space and time. Our results unify, clarify, and extend a number of previous results on the evolution and maintenance of genetic variation under fluctuating selection. Individual-based simulations show that our results are independent of the details of the genetic architecture and of whether reproduction is clonal or sexual. The onset of increased genetic variance is also predicted accurately in small populations in which alleles can go extinct due to environmental stochasticity.

The developmental transcriptome of contrasting Arctic charr (Salvelinus alpinus) morphs

Jóhannes Gudbrandsson, Ehsan P Ahi, Kalina H Kapralova, Sigrídur R Franzdottir, Bjarni K Kristjánsson, Sophie S Steinhaeuser, Ísak M Jóhannesson, Valerie H Maier, Sigurdur S Snorrason, Zophonías O Jónsson, Arnar Pálsson
doi: http://dx.doi.org/10.1101/011361

Species showing repeated evolution of similar traits can help illuminate the molecular and developmental basis of diverging traits and specific adaptations. Following the last glacial period, dwarfism and specialized bottom feeding morphology evolved rapidly in several landlocked Arctic charr (Salvelinus alpinus) populations in Iceland. In order to study the genetic divergence between small benthic morphs and larger morphs with limnetic morphotype, we conducted an RNA-seq transcriptome analysis of developing charr. We sequenced mRNA from whole embryos at four stages in early development of two stocks with very different morphologies, the small benthic (SB) charr from Lake Thingvallavatn and Holar aquaculture (AC) charr. The data reveal significant differences in expression of several biological pathways during charr development. There is also a difference between SB- and AC-charr in mitochondrial genes involved in energy metabolism and in blood coagulation genes. We confirmed expression differences of five genes in whole embryos with qPCR, including lysozyme and natterin, the latter previously identified as a fish toxin of a lectin family that may be a putative immunopeptide. We verified differential expression of 7 genes in developing heads, and the expression was consistently associated with benthic vs. limnetic charr (studied in 4 morphs in total). Comparison of single nucleotide polymorphism (SNP) frequencies reveals extensive genetic differentiation between the SB- and AC-charr (60 fixed SNPs and around 1300 differing by more than 50% in frequency). In SB-charr the high-frequency derived SNPs are in genes related to translation and oxidative processes. Curiously, several derived SNPs reside in the 12S and 16S mitochondrial ribosomal RNA genes, including a base highly conserved among fishes. The data implicate multiple genes and molecular pathways in divergence of small benthic charr and/or the response of aquaculture charr to domestication. Functional, genetic and population genetic studies on more freshwater and anadromous populations are needed to confirm the specific loci and mutations relating to specific ecological or domestication traits in Arctic charr.
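
The two SNP categories quoted in the abstract (fixed differences, and frequency differences above 50%) can be illustrated with a small Python sketch. The function name, the labels and the example frequencies are hypothetical; they are not taken from the paper's pipeline or data.

```python
def classify_snp(freq_sb: float, freq_ac: float, threshold: float = 0.5) -> str:
    """Label a SNP by the absolute allele-frequency difference between two groups.

    freq_sb and freq_ac are frequencies of the same allele (between 0 and 1)
    in the two groups being compared. All names here are illustrative.
    """
    diff = abs(freq_sb - freq_ac)
    if diff == 1.0:
        # One group carries only this allele, the other only the alternative.
        return "fixed"
    if diff > threshold:
        return "differentiated"
    return "similar"


# Hypothetical allele frequencies, for illustration only.
examples = {"snp_1": (1.0, 0.0), "snp_2": (0.9, 0.2), "snp_3": (0.6, 0.4)}
for name, (sb, ac) in examples.items():
    print(name, classify_snp(sb, ac))
```

In practice, frequencies estimated from RNA-seq reads carry sampling error, so a real analysis would likely add coverage filters before applying thresholds like these.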

A Composite Genome Approach to Identify Phylogenetically Informative Data from Next-Generation Sequencing

Rachel S. Schwartz, Kelly Harkins, Anne C. Stone, Reed A. Cartwright
(Submitted on 16 May 2013 (v1), last revised 12 Nov 2014 (this version, v3))

We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole genome assembly, genome-genome alignment, and annotation. In simulations, SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered phylogenies from multiple datasets that were consistent with previous conflicting estimates of the relationships among mammals. SISRS is open source and freely available at this https URL

Resolving microbial microdiversity with high accuracy full length 16S rRNA Illumina sequencing

Catherine Burke, Aaron E Darling
doi: http://dx.doi.org/10.1101/010967

We describe a method for sequencing full-length 16S rRNA gene amplicons using the high-throughput Illumina MiSeq platform. The resulting sequences have about 100-fold higher accuracy than standard Illumina reads and are chimera filtered using information from a single-molecule dual tagging scheme that boosts the signal available for chimera detection. We demonstrate that the data provide fine-scale phylogenetic resolution not available from Illumina amplicon methods targeting smaller variable regions of the 16S rRNA gene.

Epidemiological and evolutionary analysis of the 2014 Ebola virus outbreak

Marta Łuksza, Trevor Bedford, Michael Lässig
Subjects: Populations and Evolution (q-bio.PE)

The 2014 epidemic of the Ebola virus is governed by a genetically diverse viral population. In the early Sierra Leone outbreak, a recent study has identified new mutations that generate genetically distinct sequence clades. Here we find evidence that major Sierra Leone clades have systematic differences in growth rate and reproduction number. If this growth heterogeneity remains stable, it will generate major shifts in clade frequencies and influence the overall epidemic dynamics on time scales within the current outbreak. Our method is based on simple summary statistics of clade growth, which can be inferred from genealogical trees with an underlying clade-specific birth-death model of the infection dynamics. This method can be used to perform real-time tracking of an evolving epidemic and identify emerging clades of epidemiological or evolutionary significance.
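
To see why a stable difference in growth rate shifts clade frequencies, consider a worked example rather than the paper's inference method: if two clades each grow exponentially and the focal clade's rate exceeds the other's by a constant amount, its frequency follows a logistic curve. The Python sketch below assumes exactly that simplified two-clade setting; the function name and the numbers are illustrative and are not estimates from the outbreak.

```python
import math


def clade_frequency(x0: float, rate_difference: float, t: float) -> float:
    """Frequency of a focal clade at time t under two-clade exponential growth.

    x0 is the initial frequency of the focal clade, and rate_difference is its
    growth rate minus that of the competing clade (per unit time). Both are
    hypothetical inputs for illustration.
    """
    focal = x0 * math.exp(rate_difference * t)  # relative size of focal clade
    other = 1.0 - x0                            # competing clade, used as baseline
    return focal / (focal + other)


# Illustrative values only: a clade starting at 10% with a small growth advantage.
for t in (0, 10, 20, 40):
    print(t, round(clade_frequency(0.1, 0.05, t), 3))
```

With no rate difference the frequency stays at its initial value, while any sustained positive difference eventually drives the focal clade toward high frequency.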

Annotating RNA motifs in sequences and alignments

Paul P Gardner, Hisham Eldai
doi: http://dx.doi.org/10.1101/011197

RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment, and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of an RNA and determining its function. We present a bioinformatic approach to characterise RNA motifs, which are the central building blocks of RNA structure. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterised RNAs. Moreover, we introduce a new profile-based database of RNA motifs – RMfam – and illustrate its application for investigating the evolution and functional characterisation of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam

GC-content evolution in bacterial genomes: the biased gene conversion hypothesis expands.

Florent Lassalle, Séverine Périan, Thomas Bataillon, Xavier Nesme, Laurent Duret, Vincent Daubin
doi: http://dx.doi.org/10.1101/011023

The characterization of functional elements in genomes relies on the identification of the footprints of natural selection. In this quest, taking into account neutral evolutionary processes such as mutation and genetic drift is crucial because these forces can generate patterns that may obscure or mimic signatures of selection. In mammals, and probably in many eukaryotes, another such confounding factor called GC-Biased Gene Conversion (gBGC) has been documented. This mechanism generates patterns identical to what is expected under selection for higher GC-content, specifically in highly recombining genomic regions. Recent results have suggested that a mysterious selective force favouring higher GC-content exists in Bacteria but the possibility that it could be gBGC has been excluded. Here, we show that gBGC is probably at work in most if not all bacterial species. First we find a consistent positive relationship between the GC-content of a gene and evidence of intra-genic recombination throughout a broad spectrum of bacterial clades. Second, we show that the evolutionary force responsible for this pattern is acting independently from selection on codon usage, and could potentially interfere with selection in favor of optimal AU-ending codons. A comparison with data from human populations shows that the intensity of gBGC in Bacteria is comparable to what has been reported in mammals. We propose that gBGC is not restricted to sexual Eukaryotes but also widespread among Bacteria and could therefore be an ancestral feature of cellular organisms. We argue that if gBGC occurs in bacteria, it can account for previously unexplained observations, such as the apparent non-equilibrium of base substitution patterns and the heterogeneity of gene composition within bacterial genomes. Because gBGC produces patterns similar to positive selection, it is essential to take this process into account when studying the evolutionary forces at work in bacterial genomes.

Ancestries of a Recombining Diploid Population

R Sainudiin, B. Thatte and A. Veber, UCDMS Research Report 2014/3, 42 pages, 2014

We derive the exact one-step transition probabilities of the number of lineages that are ancestral to a random sample from the current generation of a bi-parental population that is evolving under the discrete Wright-Fisher model with n diploid individuals. Our model allows for a per-generation recombination probability of r. When r = 1, our model is equivalent to Chang's model [4] for the karyotic pedigree. When r = 0, our model is equivalent to Kingman's discrete coalescent model [16] for the cytoplasmic tree or sub-karyotic tree containing a DNA locus that is free of intra-locus recombination. When 0 < r < 1 our model can be thought to track a sub-karyotic ancestral graph containing a DNA sequence from an autosomal chromosome that has an intra-locus recombination probability r. Thus, our family of models indexed by r ∈ [0, 1] connects Kingman's discrete coalescent to Chang's pedigree in a continuous way as r goes from 0 to 1. For large populations, we also study three properties of the r-specific ancestral process: the time T_n to a most recent common ancestor (MRCA) of the population, the time U_n at which all individuals are either common ancestors to all present day individuals or ancestral to none of them, and the fraction of individuals that are common ancestors at time U_n. These results generalize the three main results in [4]. When we appropriately rescale time and recombination probability by the population size, our model leads to the continuous time Markov chain called the ancestral recombination graph of Hudson [12] and Griffiths [9].
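
For the r = 0 case, the one-step transition probabilities of the discrete coalescent have a standard closed form: if each of i lineages picks a parent uniformly at random among N candidates, the probability of exactly j distinct parents is S(i, j) N(N-1)...(N-j+1) / N^i, where S(i, j) is a Stirling number of the second kind. The Python sketch below computes this with exact fractions. The function names are illustrative, and treating N as the number of gene copies at the locus is an assumption made here, not notation taken from the report.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(i: int, j: int) -> int:
    """Number of ways to partition i labelled items into j non-empty blocks."""
    if i == j:
        return 1
    if j == 0 or j > i:
        return 0
    # Item i either joins one of the j existing blocks or forms a new block.
    return j * stirling2(i - 1, j) + stirling2(i - 1, j - 1)


def falling_factorial(n: int, j: int) -> int:
    """n * (n - 1) * ... * (n - j + 1): ordered choices of j distinct parents."""
    result = 1
    for k in range(j):
        result *= n - k
    return result


def transition_probability(i: int, j: int, n_parents: int) -> Fraction:
    """P(i lineages have exactly j distinct parents among n_parents)."""
    return Fraction(stirling2(i, j) * falling_factorial(n_parents, j),
                    n_parents ** i)


# Illustrative use: one row of the transition matrix for 4 lineages, 20 parents.
row = [transition_probability(4, j, 20) for j in range(1, 5)]
print(row, sum(row))
```

Exact rational arithmetic makes it easy to check that each row of the transition matrix sums to one, which is a useful sanity check before moving to floating point for large N.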

Tackling drug resistant infection outbreaks of global pandemic Escherichia coli ST131 using evolutionary and epidemiological genomics

Tim Downing
(Submitted on 4 Nov 2014)

High-throughput molecular approaches are required to investigate the origin and diffusion of antimicrobial resistance in rapidly radiating pathogen outbreaks. The most frequent cause of human infection is Escherichia coli, which is dominated by ST131, a single pandemic clone. This epidemic subtype possesses an extensive array of virulence elements and tolerates many drugs. Frequent global sweeps of new dominant ST131 varieties necessitate deep genomic scrutiny of their spread, evolution and lateral transfer of drug resistance genes. Phylogenetic methods that decipher past events can predict future patterns of virulence and transmission based on genetic signatures of adaptation and recombination. Antibiotic tolerance is controlled by natural variation in gene expression levels, which can initiate delayed cell growth. This dormancy allows survival despite drug exposure, and yet may only be present in part of the infecting cell population. Consequently, genomic epidemiology needs to explore the scale of phenotypic regulatory control acting on RNA. A multi-faceted approach can comprehensively assess antimicrobial resistance in E. coli ST131 in terms of within-host genetic heterogeneity, regulation of gene expression, and transmission dynamics between hosts to achieve a goal of pre-empting resistance before it emerges by optimising drug treatment protocols.