Analysis and rejection sampling of Wright-Fisher diffusion bridges

Joshua G. Schraiber, Robert C. Griffiths, Steven N. Evans
(Submitted on 14 Jun 2013)

We investigate the properties of a Wright-Fisher diffusion process started from frequency x at time 0 and conditioned to be at frequency y at time T. Such a process is called a bridge. Bridges arise naturally in the analysis of selection acting on standing variation and in the inference of selection from allele frequency time series. We establish a number of results about the distribution of neutral Wright-Fisher bridges and develop a novel rejection sampling scheme for bridges under selection that we use to study their behavior.
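
To make the idea of a bridge concrete, the following is a minimal sketch of the most naive rejection sampler: simulate unconditioned paths and keep only those that end at the target. It uses a discrete neutral Wright-Fisher chain rather than the diffusion, and it is not the scheme developed in the paper. The function names, the population size and the other parameter values are assumptions chosen for illustration.

```python
import random

def wf_step(count, pop_size, rng):
    """One neutral Wright-Fisher generation: binomial resampling of the allele count."""
    p = count / pop_size
    return sum(1 for _ in range(pop_size) if rng.random() < p)

def sample_bridge(start, end, generations, pop_size, rng, max_tries=100_000):
    """Return a path of allele counts from `start` to `end`, or None if no path is accepted."""
    for _ in range(max_tries):
        path = [start]
        for _ in range(generations):
            path.append(wf_step(path[-1], pop_size, rng))
        if path[-1] == end:  # accept only paths that hit the target count
            return path
    return None

rng = random.Random(1)
bridge = sample_bridge(start=5, end=10, generations=8, pop_size=20, rng=rng)
```

The acceptance rate of this approach drops quickly as the population size or the time span grows, which is one reason more careful sampling schemes are of interest.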

Predicting the loss of phylogenetic diversity under non-stationary diversification models

Amaury Lambert, Mike Steel
(Submitted on 12 Jun 2013)

For many taxa, the current high rates of extinction are likely to result in a significant loss of biodiversity. The evolutionary heritage of biodiversity is frequently quantified by a measure called phylogenetic diversity (PD). We predict the loss of PD under a wide class of phylogenetic tree models, where speciation rates and extinction rates may be time-dependent, and assuming independent random species extinctions at the present. We study the loss of PD when $K$ contemporary species are selected uniformly at random from the $N$ extant species as the surviving taxa, while the remaining $N-K$ become extinct. We consider two models of species sampling, the so-called field of bullets model, where each species independently survives the extinction event at the present with probability $p$, and a model for which the number of surviving species is fixed.
We provide explicit formulae for the expected remaining PD in both models, conditional on $N=n$, conditional on $K=k$, or conditional on both events. When $N=n$ is fixed, we show the convergence to an explicit deterministic limit of the ratio of new to initial PD, as $n\to\infty$, both under the field of bullets model, and when $K=k_n$ is fixed and depends on $n$ in such a way that $k_n/n$ converges to $p$. We also prove the convergence of this ratio as $T\to\infty$ in the supercritical, time-homogeneous case, where $N$ simultaneously goes to $\infty$, thereby strengthening previous results of Mooers et al. (2012).
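
For a fixed tree, the field of bullets model admits a simple edge-by-edge calculation, sketched below. It assumes rooted PD (an edge contributes its length if at least one leaf below it survives), so an edge with $m$ descendant leaves survives with probability $1-(1-p)^m$. The tree encoding and the function name are assumptions for illustration; the paper's formulae are for random trees and are not reproduced here.

```python
def expected_pd(tree, p):
    """Expected rooted PD after each leaf survives independently with probability p.

    A tree is (branch_length, children); a leaf has an empty list of children.
    """
    def visit(node):
        length, children = node
        if not children:
            n_leaves, total = 1, 0.0
        else:
            n_leaves, total = 0, 0.0
            for child in children:
                child_leaves, child_total = visit(child)
                n_leaves += child_leaves
                total += child_total
        # The edge above this node is retained iff at least one leaf below survives.
        total += length * (1.0 - (1.0 - p) ** n_leaves)
        return n_leaves, total

    return visit(tree)[1]

# Hypothetical three-leaf tree with a zero-length root edge.
example_tree = (0.0, [(2.0, []), (1.0, [(1.0, []), (1.0, [])])])
remaining = expected_pd(example_tree, 0.5)
```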

The Moran model with selection: Fixation probabilities, ancestral lines, and an alternative particle representation

Sandra Kluth, Ellen Baake
(Submitted on 12 Jun 2013)

We reconsider the Moran model in continuous time with population size $N$, two types, and selection. We introduce a new particle representation, which we call the labelled Moran model, and which has the same empirical type distribution as the original Moran model, provided the initial values are chosen appropriately. In the new model, individuals are labelled $1,2, \dots, N$; neutral resampling events may take place between arbitrary labels, whereas selective events only occur in the direction of increasing labels. Using elementary methods only, we not only recover fixation probabilities, but also obtain detailed insight into the number and nature of the selective events that play a role in the fixation process forward in time.
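
As a point of reference for the fixation probabilities mentioned above, here is a minimal sketch using the standard birth-death formula. It assumes one common parametrization, in which the favoured type reproduces at rate $1+s$ and the other type at rate 1, so that the ratio of down-steps to up-steps is $1/(1+s)$ in every state; the paper's own parametrization may differ. Function names are illustrative.

```python
def fixation_probability(i, pop_size, s):
    """Fixation probability of the favoured type from i copies, via the birth-death sum."""
    ratio = 1.0 / (1.0 + s)  # death rate / birth rate, identical in every state
    weights = [ratio ** k for k in range(pop_size)]
    return sum(weights[:i]) / sum(weights)

def fixation_closed_form(i, pop_size, s):
    """Closed form of the same quantity; reduces to i / pop_size when s == 0."""
    if s == 0:
        return i / pop_size
    r = 1.0 + s
    return (1.0 - r ** (-i)) / (1.0 - r ** (-pop_size))

prob = fixation_probability(1, 10, 0.1)
```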

Efficient Exploration of the Space of Reconciled Gene Trees

Gergely J. Szöllősi, Wojciech Rosikiewicz, Bastien Boussau, Eric Tannier, Vincent Daubin
(Submitted on 10 Jun 2013)

Gene trees record the combination of gene level events, such as duplication, transfer and loss, and species level events, such as speciation and extinction. Gene tree-species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree-species tree reconciliation.
Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of trees. We implement ALE in the context of a reconciliation model, which allows for the duplication, transfer and loss of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood.
We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic topologies, branch lengths and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59% and 46% reductions in the mean numbers of duplications, transfers and losses, respectively.

Upper Rhine Valley: A migration crossroads of middle European oaks

Charalambos Neophytou, Hans-Gerhard Michiels
(Submitted on 10 Jun 2013)

The indigenous oak species (Quercus spp.) of the Upper Rhine Valley have migrated to their current distribution range in the area after the transition to the Holocene interglacial. Since post-glacial recolonization, they have been subjected to ecological changes and human impact. By using chloroplast microsatellite markers (cpSSRs), we provide detailed phylogeographic information and we address the contribution of natural and human-related factors to the current pattern of chloroplast DNA (cpDNA) variation. 626 individual trees from 86 oak stands including all three indigenous oak species of the region were sampled. In order to verify the refugial origin, reference samples from refugial areas and DNA samples from previous studies with known cpDNA haplotypes (chlorotypes) were used. Chlorotypes belonging to three different maternal lineages, corresponding to the three main glacial refugia, were found in the area. These were spatially structured and highly introgressed among species, reflecting past hybridization which involved all three indigenous oak species. Site condition heterogeneity was found among groups of populations which differed in terms of cpDNA variation. This suggests that different biogeographic subregions within the Upper Rhine Valley were colonized during separate post-glacial migration waves. Genetic variation was higher in Quercus robur than in Quercus petraea, which is probably due to more efficient seed dispersal and the more pronounced pioneer character of the former species. Finally, stands of Q. robur established in the last 70 years were significantly more diverse, which can be explained by the improved transportation ability of seeds and seedlings for artificial regeneration of stands during this period.

Evolutionary accessibility of modular fitness landscapes

Benjamin Schmiegelt, Joachim Krug
(Submitted on 8 Jun 2013)

A fitness landscape is a mapping from the space of genetic sequences, which is modeled here as a binary hypercube of dimension $L$, to the real numbers. We consider random models of fitness landscapes, where fitness values are assigned according to some probabilistic rule, and study the statistical properties of pathways to the global fitness maximum along which fitness increases monotonically. Such paths are important for evolution because they are the only ones that are accessible to an adapting population when mutations occur at a low rate. The focus of this work is on the block model introduced by A.S. Perelson and C.A. Macken [Proc. Natl. Acad. Sci. USA 92:9657 (1995)] where the genome is decomposed into disjoint sets of loci (‘modules’) that contribute independently to fitness, and fitness values within blocks are assigned at random. We show that the number of accessible paths can be written as a product of the path numbers within the blocks, which provides a detailed analytic description of the path statistics. The block model can be viewed as a special case of Kauffman’s NK-model, and we compare the analytic results to simulations of the NK-model with different genetic architectures. We find that the mean number of accessible paths is quite similar in the different versions of the model, but the distribution of the path number is qualitatively different in the block model due to its multiplicative structure. A similar statement applies to the number of local fitness maxima in the NK-models, which has been studied extensively in previous works. The overall evolutionary accessibility of the landscape, as quantified by the probability to find at least one accessible path to the global maximum, is dramatically lowered by the modular structure.
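
The notion of an accessible path can be illustrated with a small counting routine. This sketch assumes that genotypes are encoded as bitmasks, that the global maximum is the all-ones genotype, and that paths start at the all-zeros genotype and set one locus per step; these conventions and the function name are assumptions for illustration and are not taken from the paper.

```python
from functools import lru_cache

def count_accessible_paths(fitness, n_loci):
    """Count direct paths from genotype 0 to the all-ones genotype with strictly increasing fitness.

    `fitness` is a sequence indexed by the genotype's integer bitmask.
    """
    target = (1 << n_loci) - 1

    @lru_cache(maxsize=None)
    def paths_from(genotype):
        if genotype == target:
            return 1
        total = 0
        for locus in range(n_loci):
            bit = 1 << locus
            if not genotype & bit:  # locus not yet mutated
                mutant = genotype | bit
                if fitness[mutant] > fitness[genotype]:
                    total += paths_from(mutant)
        return total

    return paths_from(0)

# Two loci with additive effects: both orderings of the mutations are accessible.
n_paths = count_accessible_paths([0.0, 1.0, 2.0, 3.0], 2)
```

Lowering the fitness of a single intermediate genotype below that of the starting genotype blocks every path through it, which is how rugged landscapes reduce accessibility.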

Spin models inferred from patient data faithfully describe HIV fitness landscapes and enable rational vaccine design

Karthik Shekhar, Claire F. Ruberman, Andrew L. Ferguson, John P. Barton, Mehran Kardar, Arup K. Chakraborty
(Submitted on 9 Jun 2013)

Mutational escape from vaccine induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus’ fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine induced immune responses to target mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of non-equilibrium viral evolution driven by patient-specific immune responses, and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.

mendelFix: a Perl script for checking Mendelian errors in high density SNP data of trio designs

Yuri Tani Utsunomiya, Rodrigo Vitorio Alonso, Adriana Santana do Carmo, Francine Campagnari, José Antonio Vinsintin, José Fernando Garcia
(Submitted on 10 Jun 2013)

Here we present mendelFix, a Perl script for checking Mendelian errors in genome-wide SNP data of trio designs. The program takes 12-recoded PLINK PED and MAP files as input, calculates a series of summary statistics for Mendelian errors, sets to missing those offspring genotypes that present Mendelian inconsistencies, and implements a simplistic procedure to infer missing genotypes using parent information. The program can be easily incorporated in any pipeline for family-based SNP data analysis, and is distributed as free software under the GNU General Public License.
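
The core check behind a Mendelian error can be sketched in a few lines. The example below is written in Python and is not mendelFix's own code; it assumes an autosomal biallelic SNP with alleles coded 1 and 2 and missing alleles coded 0, and the function name is hypothetical.

```python
def is_mendelian_consistent(father, mother, child):
    """Check one trio at one autosomal SNP.

    Genotypes are 2-tuples of alleles coded 1 or 2; 0 marks a missing allele.
    Trios with any missing allele are treated as uncheckable and pass.
    """
    if 0 in father + mother + child:
        return True
    first, second = child
    # The child must carry one allele from each parent, in either order.
    return (first in father and second in mother) or (second in father and first in mother)

consistent = is_mendelian_consistent((1, 1), (2, 2), (1, 2))
inconsistent = is_mendelian_consistent((1, 1), (2, 2), (1, 1))
```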

Enhancement of a Novel Method for Mutational Disease Prediction using Bioinformatics Techniques and Backpropagation Algorithm

Ayad Ghany Ismaeel, Anar Auda Ablahad
(Submitted on 7 Jun 2013)

The novel method for mutational disease prediction uses bioinformatics tools and datasets to diagnose malignant mutations, together with a powerful Artificial Neural Network (Backpropagation Network) to classify those malignant mutations that are related to gene(s) (like BRCA1 and BRCA2) causing a disease (breast cancer). That method was limited to handling and analyzing gene sequences in order to extract useful information from them, and it overlooked the environmental factors that play important roles in determining some gene features, which are needed to view a gene's functional parts and its relation to diseases. This paper proposes an enhancement of the novel method as a first step towards diagnosing and predicting disease from mutations. The enhancement introduces several additional features that capture alterations and changes in the environment as well as in genes, compares sequences to gain information about the structure or function of a query sequence, and proposes an optimal and more accurate system for classifying and dealing with a specific disorder using backpropagation with a mean squared error of 0.000000001. Index Terms: Homology sequence, GC content and AT content, Bioinformatics, Backpropagation Network, BLAST, DNA Sequence, Protein Sequence.

Detecting interspecific and geographic differentiation patterns in two interfertile oak species (Quercus petraea (Matt.) Liebl. and Q. robur L.) using small sets of microsatellite markers

Charalambos Neophytou, Filippos A. Aravanopoulos, Siegfried Fink, Aikaterini Dounavi
(Submitted on 10 Jun 2013)

Genetic analysis was carried out in order to provide insights into differentiation among populations of two interfertile oak species, Quercus petraea and Quercus robur. Gene flow between the two species, local adaptations and speciation processes in general, may leave differential molecular signatures across the genome. Three interspecific pairs of natural populations from three ecologically different regions, one in central Europe (SW Germany) and two in the Balkan Peninsula (Greece and Bulgaria) were sampled. Grouping of highly informative SSR loci was made according to the component of variation they express – interspecific or provenance specific. Species and provenance discriminant loci were characterized based on FSTs. Locus specific FSTs were tested for deviation from the neutral expectation both within and between species. Data were then treated separately in a Bayesian analysis of genetic structure. By using three species discriminant loci, high membership probability to inferred species groups was achieved. On the other hand, analysis of genetic structure based on five provenance discriminant loci was correlated with geographic region and revealed shared genetic variation between neighbouring Q. petraea and Q. robur. Small sets of highly variable nuclear SSRs were sufficient to discriminate, either between species or between provenances. Thus, an effective tool is provided for molecular identification of both species and provenances. Furthermore, data suggest that a combination of gene flow and natural selection forms these diversity patterns. Species discriminant loci might represent genome regions affected by directional selection, which maintains species identity. Provenance specific loci might represent genome regions with high interspecific gene flow and common adaptive patterns to local environmental factors.
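
For readers unfamiliar with locus-specific FST, the following sketch computes a basic version, $F_{ST} = (H_T - H_S)/H_T$, from allele frequencies at a single multiallelic locus. The abstract does not state which estimator was used, so this is an assumption: populations are weighted equally and no sample-size correction is applied. The function names and allele labels are illustrative.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for a dict of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())

def fst(populations):
    """Basic F_ST = (H_T - H_S) / H_T over equally weighted populations at one locus."""
    n_pops = len(populations)
    h_s = sum(heterozygosity(pop) for pop in populations) / n_pops
    alleles = set().union(*populations)
    mean_freqs = {a: sum(pop.get(a, 0.0) for pop in populations) / n_pops for a in alleles}
    h_t = heterozygosity(mean_freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Hypothetical allele frequencies at one SSR locus in two populations.
value = fst([{"a": 0.5, "b": 0.5}, {"a": 1.0}])
```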