Interfertile oaks in an island environment. II. Limited hybridization between Quercus alnifolia Poech and Q. coccifera L. in a mixed stand

Charalambos Neophytou, Filippos A. Aravanopoulos, Siegfried Fink, Aikaterini Dounavi
(Submitted on 11 Jun 2013)

Hybridization and introgression between Quercus alnifolia Poech and Q. coccifera L. are studied by analyzing morphological traits and nuclear and chloroplast DNA markers. The study site is a mixed stand in the Troodos Mountains (Cyprus), and the analyzed material includes both adult trees and progenies of specific mother trees. Multivariate analysis of morphological traits shows that the two species can be well distinguished using simple leaf morphometric parameters. Analysis of nuclear and chloroplast microsatellites supports lower genetic diversity in Q. alnifolia than in Q. coccifera and high interspecific differentiation between the two species. Both leaf morphometric and genetic data confirm the intermediacy of the four designated hybrids. Analysis of progeny arrays provides evidence that interspecific crosses are rare. This finding is further supported by limited introgression of chloroplast genomes. Reproductive barriers (e.g. asynchronous phenology, post-zygotic incompatibilities) might account for this result. A genetic assignment analysis of effective pollen clouds indicates directional interspecific gene flow, with Q. alnifolia acting as pollen donor. Differences in flowering phenology and species distribution in the stand may have influenced the direction of gene flow and the genetic differentiation among effective pollen clouds of different mother trees within species.


Efficient Exploration of the Space of Reconciled Gene Trees

Gergely J. Szöllősi, Wojciech Rosikiewicz, Bastien Boussau, Eric Tannier, Vincent Daubin
(Submitted on 10 Jun 2013)

Gene trees record the combination of gene level events, such as duplication, transfer and loss, and species level events, such as speciation and extinction. Gene tree-species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees, the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree-species tree reconciliation.
Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of trees. We implement ALE in the context of a reconciliation model, which allows for the duplication, transfer and loss of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood.
We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic topologies, branch lengths and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes, we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with reductions of 24%, 59% and 46% in the mean numbers of duplications, transfers and losses, respectively.
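ALE works from the clades observed in a sample of gene trees. The amalgamation idea can be illustrated with conditional clade probabilities: each observed split of a clade gets probability freq(split) / freq(clade), and a tree's probability is the product over its internal nodes. The Python sketch below is an illustration under those assumptions, not the ALE implementation. It leaves out the duplication-transfer-loss reconciliation entirely, and all function names are hypothetical. Two sampled trees that disagree inside both three-taxon clades amalgamate into four trees, two of which were never sampled, and their probabilities sum to one.

```python
from collections import Counter
from fractions import Fraction


def leaves(tree):
    """Leaf set of a rooted binary tree given as nested 2-tuples of names."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def record(tree, clade_counts, split_counts):
    """Count every clade in `tree` and the bipartition that resolves it."""
    clade = leaves(tree)
    clade_counts[clade] += 1
    if not isinstance(tree, str):
        split = frozenset([leaves(tree[0]), leaves(tree[1])])
        split_counts[(clade, split)] += 1
        record(tree[0], clade_counts, split_counts)
        record(tree[1], clade_counts, split_counts)


def ccp(tree, clade_counts, split_counts):
    """Conditional clade probability of `tree`: product over internal nodes
    of freq(split) / freq(clade); zero if a clade was never observed."""
    if isinstance(tree, str):
        return Fraction(1)
    clade = leaves(tree)
    if clade_counts[clade] == 0:
        return Fraction(0)
    split = frozenset([leaves(tree[0]), leaves(tree[1])])
    p = Fraction(split_counts[(clade, split)], clade_counts[clade])
    return p * ccp(tree[0], clade_counts, split_counts) * ccp(tree[1], clade_counts, split_counts)


def amalgamation_stats(clade, clade_counts, split_counts):
    """Number of trees amalgamable from observed splits below `clade`,
    and their total conditional clade probability (which should be 1)."""
    if len(clade) == 1:
        return 1, Fraction(1)
    n_trees, mass = 0, Fraction(0)
    for (parent, split), k in split_counts.items():
        if parent != clade:
            continue
        left, right = tuple(split)
        n_left, m_left = amalgamation_stats(left, clade_counts, split_counts)
        n_right, m_right = amalgamation_stats(right, clade_counts, split_counts)
        n_trees += n_left * n_right
        mass += Fraction(k, clade_counts[clade]) * m_left * m_right
    return n_trees, mass


# Two sampled gene trees that disagree inside both three-taxon clades.
sample = [((("A", "B"), "C"), (("D", "E"), "F")),
          (("A", ("B", "C")), ("D", ("E", "F")))]
clade_counts, split_counts = Counter(), Counter()
for t in sample:
    record(t, clade_counts, split_counts)

unsampled = ((("A", "B"), "C"), ("D", ("E", "F")))  # mixes clades of both trees
root = leaves(sample[0])
print(amalgamation_stats(root, clade_counts, split_counts))  # (4, Fraction(1, 1))
print(ccp(unsampled, clade_counts, split_counts))  # 1/4
```

In ALE this amalgamated set is summed over jointly with a reconciliation likelihood, so that the species tree can shift probability among trees that the sequence sample supports only weakly.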

Upper Rhine Valley: A migration crossroads of middle European oaks

Charalambos Neophytou, Hans-Gerhard Michiels
(Submitted on 10 Jun 2013)

The indigenous oak species (Quercus spp.) of the Upper Rhine Valley migrated to their current distribution range in the area after the transition to the Holocene interglacial. Since post-glacial recolonization, they have been subjected to ecological changes and human impact. Using chloroplast microsatellite markers (cpSSRs), we provide detailed phylogeographic information and address the contribution of natural and human-related factors to the current pattern of chloroplast DNA (cpDNA) variation. In total, 626 individual trees from 86 oak stands, including all three indigenous oak species of the region, were sampled. To verify refugial origin, we used reference samples from refugial areas and DNA samples from previous studies with known cpDNA haplotypes (chlorotypes). Chlorotypes belonging to three different maternal lineages, corresponding to the three main glacial refugia, were found in the area. These were spatially structured and highly introgressed among species, reflecting past hybridization that involved all three indigenous oak species. Groups of populations that differed in cpDNA variation also differed in site conditions. This suggests that different biogeographic subregions within the Upper Rhine Valley were colonized during separate post-glacial migration waves. Genetic variation was higher in Quercus robur than in Quercus petraea, probably owing to the more efficient seed dispersal and more pronounced pioneer character of the former species. Finally, stands of Q. robur established in the last 70 years were significantly more diverse, which can be explained by the improved ability to transport seeds and seedlings for artificial regeneration of stands during this period.
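The abstract does not say which diversity statistic underlies the comparison between Q. robur and Q. petraea. One common choice for haploid chloroplast markers is Nei's unbiased haplotype diversity, h = n/(n − 1) · (1 − Σ p_i²), where p_i is the frequency of chlorotype i among n trees. The Python sketch below assumes that statistic and treats each tree's tuple of cpSSR fragment sizes as its chlorotype. The fragment sizes are toy values, not data from the study.

```python
from collections import Counter


def haplotype_diversity(chlorotypes):
    """Nei's unbiased haplotype diversity h = n/(n-1) * (1 - sum_i p_i^2),
    where p_i is the frequency of chlorotype i among n individuals."""
    n = len(chlorotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    counts = Counter(chlorotypes)  # identical allele-size tuples = same chlorotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Toy stand: each tree is a tuple of fragment sizes (bp) at three cpSSR loci.
stand = [(113, 200, 145), (113, 200, 145), (114, 200, 145), (113, 201, 146)]
print(round(haplotype_diversity(stand), 4))  # 0.8333
```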

Evolutionary accessibility of modular fitness landscapes

Benjamin Schmiegelt, Joachim Krug
(Submitted on 8 Jun 2013)

A fitness landscape is a mapping from the space of genetic sequences, which is modeled here as a binary hypercube of dimension $L$, to the real numbers. We consider random models of fitness landscapes, where fitness values are assigned according to some probabilistic rule, and study the statistical properties of pathways to the global fitness maximum along which fitness increases monotonically. Such paths are important for evolution because they are the only ones accessible to an adapting population when mutations occur at a low rate. The focus of this work is on the block model introduced by A.S. Perelson and C.A. Macken [Proc. Natl. Acad. Sci. USA 92:9657 (1995)], where the genome is decomposed into disjoint sets of loci (‘modules’) that contribute independently to fitness, and fitness values within blocks are assigned at random. We show that the number of accessible paths can be written as a product of the path numbers within the blocks, which provides a detailed analytic description of the path statistics. The block model can be viewed as a special case of Kauffman’s NK-model, and we compare the analytic results to simulations of the NK-model with different genetic architectures. We find that the mean number of accessible paths is quite similar across the different versions of the model, but the distribution of the path number is qualitatively different in the block model due to its multiplicative structure. A similar statement applies to the number of local fitness maxima in the NK-models, which has been studied extensively in previous work. The overall evolutionary accessibility of the landscape, as quantified by the probability of finding at least one accessible path to the global maximum, is dramatically lowered by the modular structure.
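In the block model, fitness is a sum of independent block contributions. A mutation in one block is therefore accessible exactly when it raises that block's fitness. A path through the whole genome is then an interleaving of one accessible path per block, so the count factorizes into the product of within-block path numbers times a fixed multinomial factor for the interleavings. The Python sketch below checks this by brute force on a small landscape. It makes three assumptions: paths start at the antipode of the global maximum, they use only mutations towards that maximum, and block fitness values are random distinct integers, which avoids ties. The function names are hypothetical.

```python
import math
import random
from functools import lru_cache


def count_accessible_paths(fitness, n_bits):
    """Brute-force count of shortest paths from the antipode of the global
    maximum to the global maximum along which fitness strictly increases.
    Genotypes are integers whose bits are the L binary loci."""
    g_max = max(range(2 ** n_bits), key=lambda g: fitness[g])
    start = g_max ^ ((1 << n_bits) - 1)  # flip every locus

    @lru_cache(maxsize=None)
    def paths_from(g):
        if g == g_max:
            return 1
        total = 0
        diff = g ^ g_max
        for b in range(n_bits):
            if (diff >> b) & 1:  # only step towards the maximum
                nxt = g ^ (1 << b)
                if fitness[nxt] > fitness[g]:  # accessible step
                    total += paths_from(nxt)
        return total

    return paths_from(start)


def block_model(block_sizes, rng):
    """Additive block model: each block gets its own random fitness table
    (distinct integers, so no ties) and total fitness is the sum over blocks."""
    tables = [rng.sample(range(10_000), 2 ** s) for s in block_sizes]
    n_bits = sum(block_sizes)
    fitness = []
    for g in range(2 ** n_bits):
        total, shift = 0, 0
        for s, table in zip(block_sizes, tables):
            total += table[(g >> shift) & ((1 << s) - 1)]
            shift += s
        fitness.append(total)
    return fitness, tables


def block_prediction(block_sizes, tables):
    """Product of within-block path numbers times the number of ways to
    interleave the block paths (a multinomial coefficient)."""
    interleavings = math.factorial(sum(block_sizes))
    for s in block_sizes:
        interleavings //= math.factorial(s)
    per_block = [count_accessible_paths(t, s) for s, t in zip(block_sizes, tables)]
    return interleavings * math.prod(per_block)


sizes = [2, 3, 2]
fitness, tables = block_model(sizes, random.Random(1))
print(count_accessible_paths(fitness, sum(sizes)), block_prediction(sizes, tables))
```

Because one factor of zero makes the whole product vanish, a single inaccessible block is enough to cut off the global maximum. This is one way to see why modularity lowers overall accessibility.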

Spin models inferred from patient data faithfully describe HIV fitness landscapes and enable rational vaccine design

Karthik Shekhar, Claire F. Ruberman, Andrew L. Ferguson, John P. Barton, Mehran Kardar, Arup K. Chakraborty
(Submitted on 9 Jun 2013)

Mutational escape from vaccine-induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus’ fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine-induced immune responses to target mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of non-equilibrium viral evolution driven by patient-specific immune responses, and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.
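As a rough illustration of what "rank order of fitness" means for a spin model, each strain can be encoded as a binary sequence (1 = mutated relative to consensus). Inferred fields h_i and couplings J_ij then assign it an energy, and lower energy is read as higher fitness. The Python sketch below assumes this binary encoding and sign convention. The field and coupling values are made up and are not inferred from HIV data, and the inference step itself is not shown.

```python
import itertools


def energy(seq, h, J):
    """Ising-type energy of a binary strain (1 = mutated relative to consensus):
    E(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j. Lower energy is read as
    higher inferred fitness (an assumed sign convention)."""
    e = sum(h_i * s_i for h_i, s_i in zip(h, seq))
    for i, j in itertools.combinations(range(len(seq)), 2):
        e += J[i][j] * seq[i] * seq[j]  # only the upper triangle of J is used
    return e


def rank_by_fitness(strains, h, J):
    """Order strains from fittest (lowest energy) to least fit."""
    return sorted(strains, key=lambda s: energy(s, h, J))


# Made-up parameters: site 2 is costly to mutate, while mutations at
# sites 0 and 1 compensate each other (negative coupling).
h = [1.0, 0.5, 2.0]
J = [[0.0, -1.2, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
strains = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (0, 0, 0)]
print(rank_by_fitness(strains, h, J))
# [(0, 0, 0), (1, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
```

The double mutant (1, 1, 0) ranks above either single mutant because of the compensatory coupling. Only the ordering, not the absolute energies, is what the abstract claims the inferred models recover.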

mendelFix: a Perl script for checking Mendelian errors in high density SNP data of trio designs

Yuri Tani Utsunomiya, Rodrigo Vitorio Alonso, Adriana Santana do Carmo, Francine Campagnari, José Antonio Vinsintin, José Fernando Garcia
(Submitted on 10 Jun 2013)

Here we present mendelFix, a Perl script for checking Mendelian errors in genome-wide SNP data of trio designs. The program takes 12-recoded PLINK PED and MAP files (alleles coded as 1 and 2) as input, calculates a series of summary statistics for Mendelian errors, sets offspring genotypes that present Mendelian inconsistencies to missing, and implements a simplistic procedure to infer missing genotypes using parent information. The program can be easily incorporated into any pipeline for family-based SNP data analysis, and is distributed as free software under the GNU General Public License.
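mendelFix itself is a Perl script. The Python sketch below is not its code but illustrates the kind of checks described, under three assumptions. First, each SNP is biallelic and autosomal, coded 1/2 with 0 for missing, as in 12-recoded PED files. Second, an offspring genotype counts as a Mendelian error when it cannot be formed from one allele of each parent. Third, a hypothetical imputation rule fills a missing offspring genotype only when both parents are homozygous, because the child's genotype is then fully determined. All function names are hypothetical.

```python
def is_consistent(child, father, mother):
    """True if the child's unordered genotype can be formed from one allele
    of each parent. Genotypes are allele pairs coded 1/2; 0 means missing."""
    if 0 in child or 0 in father or 0 in mother:
        return True  # untestable when any trio member is missing
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def check_snp(child, father, mother):
    """Return (child genotype after checking, Mendelian error flag)."""
    if not is_consistent(child, father, mother):
        return (0, 0), True  # set the inconsistent offspring genotype to missing
    parents_homozygous = (father[0] == father[1] != 0) and (mother[0] == mother[1] != 0)
    if child == (0, 0) and parents_homozygous:
        # Hypothetical imputation rule: the child's genotype is fully determined.
        return tuple(sorted((father[0], mother[0]))), False
    return child, False


def process_trio(child_snps, father_snps, mother_snps):
    """Check every SNP of one trio; return corrected child genotypes and error count."""
    fixed, n_errors = [], 0
    for c, f, m in zip(child_snps, father_snps, mother_snps):
        g, err = check_snp(c, f, m)
        fixed.append(g)
        n_errors += err
    return fixed, n_errors


child = [(1, 2), (1, 1), (0, 0), (2, 2)]
father = [(1, 1), (2, 2), (1, 1), (1, 2)]
mother = [(2, 2), (2, 2), (2, 2), (1, 2)]
print(process_trio(child, father, mother))
# ([(1, 2), (0, 0), (1, 2), (2, 2)], 1)
```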

Enhancement of a Novel Method for Mutational Disease Prediction using Bioinformatics Techniques and Backpropagation Algorithm

Ayad Ghany Ismaeel, Anar Auda Ablahad
(Submitted on 7 Jun 2013)

The novel method for mutational disease prediction uses bioinformatics tools and datasets to diagnose malignant mutations, together with a powerful artificial neural network (a backpropagation network) to classify the malignant mutations in genes (such as BRCA1 and BRCA2) that cause a disease (breast cancer). That method was not limited to handling, analyzing and processing gene sequences to extract useful information from them; it also took into account environmental factors, which play important roles in determining and calculating some gene features, in order to reveal the genes' functional parts and their relations to diseases. This paper proposes an enhancement of that novel method as a first approach to diagnosing and predicting disease from mutations. The enhancement considers and introduces multiple additional features that reflect alterations and changes in the environment as well as in the genes, compares sequences to gain information about the structure or function of a query sequence, and proposes an optimal, more accurate system for classifying and dealing with a specific disorder using backpropagation with a mean square error of 0.000000001. Index Terms (Homology sequence, GC content and AT content, Bioinformatics, Backpropagation Network, BLAST, DNA Sequence, Protein Sequence)
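GC content and AT content appear among the index terms as sequence features. The Python sketch below computes them from a DNA sequence, as one might when building inputs for a classifier. It is an illustrative assumption, not the paper's feature pipeline, and it skips ambiguous symbols such as N rather than counting them.

```python
def base_composition(seq):
    """Return (GC content, AT content) as fractions of the unambiguous
    A/C/G/T bases in a DNA sequence; other symbols such as N are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / total, at / total


print(base_composition("ATGCGCGA"))  # (0.625, 0.375)
print(base_composition("ATGNNCGA"))  # (0.5, 0.5): the two Ns are skipped
```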

Detecting interspecific and geographic differentiation patterns in two interfertile oak species (Quercus petraea (Matt.) Liebl. and Q. robur L.) using small sets of microsatellite markers

Charalambos Neophytou, Filippos A. Aravanopoulos, Siegfried Fink, Aikaterini Dounavi
(Submitted on 10 Jun 2013)

Genetic analysis was carried out to provide insights into differentiation among populations of two interfertile oak species, Quercus petraea and Quercus robur. Gene flow between the two species, local adaptation and speciation processes in general may leave differential molecular signatures across the genome. Three interspecific pairs of natural populations were sampled from three ecologically different regions, one in central Europe (SW Germany) and two in the Balkan Peninsula (Greece and Bulgaria). Highly informative SSR loci were grouped according to the component of variation they express: interspecific or provenance-specific. Species- and provenance-discriminant loci were characterized based on FSTs. Locus-specific FSTs were tested for deviation from the neutral expectation both within and between species. Data were then treated separately in a Bayesian analysis of genetic structure. Using three species-discriminant loci, high membership probabilities to the inferred species groups were achieved. On the other hand, analysis of genetic structure based on five provenance-discriminant loci was correlated with geographic region and revealed shared genetic variation between neighbouring Q. petraea and Q. robur. Small sets of highly variable nuclear SSRs were sufficient to discriminate either between species or between provenances. Thus, an effective tool is provided for molecular identification of both species and provenances. Furthermore, the data suggest that a combination of gene flow and natural selection shapes these diversity patterns. Species-discriminant loci might represent genome regions affected by directional selection, which maintains species identity. Provenance-specific loci might represent genome regions with high interspecific gene flow and common adaptive patterns to local environmental factors.
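The abstract does not name the FST estimator used to classify loci. As an illustration only, the Python sketch below computes Nei's G_ST = (H_T − H_S)/H_T for a single SSR locus from allele frequencies in each population. It uses equal population weights and no sample-size correction, and the estimator used in the study may well differ (for example, a Weir and Cockerham-type estimator). The allele sizes and frequencies are toy values.

```python
def expected_het(freqs):
    """Expected heterozygosity 1 - sum p^2 for an allele-frequency dict."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst(pop_freqs):
    """Nei's G_ST = (H_T - H_S) / H_T at one locus, with equal population
    weights and no sample-size correction."""
    k = len(pop_freqs)
    alleles = set().union(*pop_freqs)  # all allele sizes seen in any population
    pooled = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    h_s = sum(expected_het(f) for f in pop_freqs) / k  # mean within-population
    h_t = expected_het(pooled)  # total, from pooled frequencies
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Toy frequencies at one SSR locus (allele sizes in bp).
petraea = {180: 0.9, 184: 0.1}
robur = {180: 0.1, 184: 0.9}
print(round(gst([petraea, robur]), 3))  # 0.64
```

Computing such a value per locus, once between species and once between provenances within a species, gives the two axes along which loci could be sorted into species-discriminant and provenance-discriminant groups.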

Our paper: Effect of Genetic Variation in a Drosophila Model of Misfolded Human Proinsulin

This guest post is by Bin He on two preprints, Genetic Complexity in a Drosophila Model of Diabetes-Associated Misfolded Human Proinsulin and Effect of Genetic Variation in a Drosophila Model of Misfolded Human Proinsulin, arXived here and here, respectively. This is a cross-post from Bin’s blog.

Here we describe a pair of papers, both of which Joe posted on this blog in the past month. Because they are intimately connected, we would like to write an additional post explaining the rationale behind them and their major findings.

The central questions in these two papers concern the genetic architecture of complex traits, such as those underlying common human disorders. We took a model organism approach to complement human studies, which are becoming increasingly powerful thanks to successful community collaboration but are still limited in several respects, including mapping resolution and the ability to perform experimental validation.

Another important idea underlying this project is that decanalization of a trait may have caused a release of genetic variation, which subsequently contributed to the disease variability we see today. Accordingly, our fly model of misfolded human proinsulin may be viewed as an external perturbation that, by exhausting the organism’s buffering capacity, reveals normally cryptic genetic variation. Under this view, our model would have general relevance to many human disorders.

To perform this study, we first established a fly model of a disease-associated human mutant proinsulin, which was the subject of our first paper “Genetic Complexity in a Drosophila Model of Diabetes-Associated Misfolded Human Proinsulin”.

We would like to highlight several points. First, regarding the etiology of the disease phenotype in our fly model, we believe it is mainly due to the physical properties of the mutant protein rather than the biological function of human proinsulin. Although Drosophila also has insulin-like proteins, their sequences and functions differ substantially from those of the human homolog. Consistent with this view, when we made a transgenic fly expressing wild-type human proinsulin, we observed that expressing it in the developing eye and other imaginal discs did not cause any visible changes, at either the phenotypic or the transcriptional level. We thus propose that our fly model represents a general class of human diseases associated with unfolded or misfolded proteins.

In the first paper, we also described the phenomenon of variable phenotypic severity when the transgene is placed on different wild-derived genetic backgrounds. A series of experiments ruled out possible confounding factors, such as correlations induced by natural variability in eye size or different levels of transgene expression.

We explored the idea of using natural variation in the fly to identify loci associated with a complex disease trait. We did so by crossing the transgenic line carrying the Mendelian disease mutation to a panel of wild-derived inbred lines, and asked whether the severity of the disease depends on the genetic background. The answer is a definite yes: the phenotype, quantified as eye size, ranged from 10% to 80% of wild type. (The mutant human proinsulin was expressed in the eye disc during development, causing neurodegeneration. We used the eye because it is dispensable under laboratory conditions and its phenotype is easy to measure.) We then conducted a GWAS, which led to the identification of sfl (sulfateless), and genetic tests further implicated the heparan sulfate (HS) biosynthetic pathway. One unique advantage of our system is its ultra-high mapping resolution: we localized the association signal to a ~400 bp LD block within one of the introns of sfl, allowing us to test specific hypotheses about the molecular mechanisms of the associated variants. Pyrosequencing analysis revealed an allele-specific expression difference due to the intronic variation, but also highlighted genetic heterogeneity even within that locus, with additional cis-variants influencing the expression level. Overall, we believe that our fly model system is a powerful complementary approach to the genetic study of complex traits. Its high mapping resolution and rich molecular and genetic toolkits allow faster and more in-depth characterization of disease-associated variation.

Bin Z. He
Kreitman Lab, Dept of Ecology and Evolution, University of Chicago
current address: O’Shea Lab, FAS Center for Systems Biology, Harvard University / HHMI

On the accumulation of deleterious mutations during range expansions

Stephan Peischl, Isabelle Dupanloup, Mark Kirkpatrick, Laurent Excoffier
(Submitted on 7 Jun 2013)

We investigate the effect of spatial range expansions on the evolution of fitness when beneficial and deleterious mutations co-segregate. We perform individual-based simulations of a uniform linear habitat and complement them with analytical approximations for the evolution of mean fitness at the edge of the expansion. We find that deleterious mutations accumulate steadily on the wave front during range expansions, thus creating an expansion load. Reduced fitness due to the expansion load is not restricted to the wave front but occurs over a large proportion of newly colonized habitats. The expansion load can persist and represent a major fraction of the total mutation load thousands of generations after the expansion. Our results extend qualitatively and quantitatively to two-dimensional expansions. The phenomenon of expansion load may explain growing evidence that populations that have recently expanded, including humans, show an excess of deleterious mutations. To test the predictions of our model, we analyze patterns of neutral and non-neutral genetic diversity in humans and find an excellent fit between theory and data.