Upper Rhine Valley: A migration crossroads of middle European oaks

Charalambos Neophytou, Hans-Gerhard Michiels
(Submitted on 10 Jun 2013)

The indigenous oak species (Quercus spp.) of the Upper Rhine Valley have migrated to their current distribution range in the area after the transition to the Holocene interglacial. Since post-glacial recolonization, they have been subjected to ecological changes and human impact. By using chloroplast microsatellite markers (cpSSRs), we provide detailed phylogeographic information and we address the contribution of natural and human-related factors to the current pattern of chloroplast DNA (cpDNA) variation. 626 individual trees from 86 oak stands including all three indigenous oak species of the region were sampled. In order to verify the refugial origin, reference samples from refugial areas and DNA samples from previous studies with known cpDNA haplotypes (chlorotypes) were used. Chlorotypes belonging to three different maternal lineages, corresponding to the three main glacial refugia, were found in the area. These were spatially structured and highly introgressed among species, reflecting past hybridization which involved all three indigenous oak species. Site condition heterogeneity was found among groups of populations which differed in terms of cpDNA variation. This suggests that different biogeographic subregions within the Upper Rhine Valley were colonized during separate post-glacial migration waves. Genetic variation was higher in Quercus robur than in Quercus petraea, which is probably due to more efficient seed dispersal and the more pronounced pioneer character of the former species. Finally, stands of Q. robur established in the last 70 years were significantly more diverse, which can be explained by the improved transportation ability of seeds and seedlings for artificial regeneration of stands during this period.

Evolutionary accessibility of modular fitness landscapes

Benjamin Schmiegelt, Joachim Krug
(Submitted on 8 Jun 2013)

A fitness landscape is a mapping from the space of genetic sequences, which is modeled here as a binary hypercube of dimension $L$, to the real numbers. We consider random models of fitness landscapes, where fitness values are assigned according to some probabilistic rule, and study the statistical properties of pathways to the global fitness maximum along which fitness increases monotonically. Such paths are important for evolution because they are the only ones that are accessible to an adapting population when mutations occur at a low rate. The focus of this work is on the block model introduced by A.S. Perelson and C.A. Macken [Proc. Natl. Acad. Sci. USA 92:9657 (1995)] where the genome is decomposed into disjoint sets of loci (‘modules’) that contribute independently to fitness, and fitness values within blocks are assigned at random. We show that the number of accessible paths can be written as a product of the path numbers within the blocks, which provides a detailed analytic description of the path statistics. The block model can be viewed as a special case of Kauffman’s NK-model, and we compare the analytic results to simulations of the NK-model with different genetic architectures. We find that the mean number of accessible paths in the different versions of the model is quite similar, but the distribution of the path number is qualitatively different in the block model due to its multiplicative structure. A similar statement applies to the number of local fitness maxima in the NK-models, which has been studied extensively in previous works. The overall evolutionary accessibility of the landscape, as quantified by the probability of finding at least one accessible path to the global maximum, is dramatically lowered by the modular structure.
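The product structure mentioned in the abstract can be illustrated with a small brute-force sketch. It is not the authors' code and rests on several assumptions: paths are taken to be direct paths (each locus flipped exactly once) from the antipode of the global maximum to the maximum, block fitness values are drawn as distinct random integers so that comparisons are exact, and all function names are hypothetical. Under these assumptions a step changes only one block, so a path is accessible exactly when its restriction to each block is accessible, and the total count is the product of the per-block counts times the number of ways to interleave the block paths.

```python
import itertools
import math
import random


def count_accessible_paths(fitness, L):
    """Count direct paths from the antipode of the global maximum to the
    maximum along which fitness strictly increases at every step."""
    genotypes = list(itertools.product((0, 1), repeat=L))
    peak = max(genotypes, key=fitness)
    start = tuple(1 - allele for allele in peak)
    count = 0
    # A direct path flips every locus once, so it is a permutation of loci.
    for order in itertools.permutations(range(L)):
        genotype = list(start)
        current = fitness(tuple(genotype))
        for locus in order:
            genotype[locus] = 1 - genotype[locus]
            nxt = fitness(tuple(genotype))
            if nxt <= current:
                break  # fitness did not increase: path is inaccessible
            current = nxt
        else:
            count += 1
    return count


def random_block_landscape(block_sizes, rng):
    """Build an additive block landscape with random fitness inside blocks.
    Distinct integer values are an assumption made to avoid ties."""
    tables = []
    for n in block_sizes:
        states = list(itertools.product((0, 1), repeat=n))
        values = rng.sample(range(10**6), len(states))
        tables.append(dict(zip(states, values)))

    def fitness(genotype):
        total, pos = 0, 0
        for n, table in zip(block_sizes, tables):
            total += table[genotype[pos:pos + n]]
            pos += n
        return total

    return fitness, tables


def factorised_count(tables, block_sizes):
    """Per-block path counts multiplied by the number of interleavings."""
    interleavings = math.factorial(sum(block_sizes))
    product = 1
    for table, n in zip(tables, block_sizes):
        interleavings //= math.factorial(n)
        product *= count_accessible_paths(table.__getitem__, n)
    return interleavings * product


rng = random.Random(0)
sizes = [3, 3]
fitness, tables = random_block_landscape(sizes, rng)
print(count_accessible_paths(fitness, sum(sizes)), factorised_count(tables, sizes))
```

The brute-force enumeration scales as $L!$ and is only meant for very small $L$; it serves to check the factorisation, not to reproduce the paper's analysis.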

Spin models inferred from patient data faithfully describe HIV fitness landscapes and enable rational vaccine design

Karthik Shekhar, Claire F. Ruberman, Andrew L. Ferguson, John P. Barton, Mehran Kardar, Arup K. Chakraborty
(Submitted on 9 Jun 2013)

Mutational escape from vaccine induced immune responses has thwarted the development of a successful vaccine against AIDS, whose causative agent is HIV, a highly mutable virus. Knowing the virus’ fitness as a function of its proteomic sequence can enable rational design of potent vaccines, as this information can focus vaccine induced immune responses to target mutational vulnerabilities of the virus. Spin models have been proposed as a means to infer intrinsic fitness landscapes of HIV proteins from patient-derived viral protein sequences. These sequences are the product of non-equilibrium viral evolution driven by patient-specific immune responses, and are subject to phylogenetic constraints. How can such sequence data allow inference of intrinsic fitness landscapes? We combined computer simulations and variational theory à la Feynman to show that, in most circumstances, spin models inferred from patient-derived viral sequences reflect the correct rank order of the fitness of mutant viral strains. Our findings are relevant for diverse viruses.

mendelFix: a Perl script for checking Mendelian errors in high density SNP data of trio designs

Yuri Tani Utsunomiya, Rodrigo Vitorio Alonso, Adriana Santana do Carmo, Francine Campagnari, José Antonio Vinsintin, José Fernando Garcia
(Submitted on 10 Jun 2013)

Here we present mendelFix, a Perl script for checking Mendelian errors in genome-wide SNP data of trio designs. The program takes 12-recoded PLINK PED and MAP files as input to calculate a series of summary statistics for Mendelian errors, sets missing offspring genotypes that present Mendelian inconsistencies, and implements a simplistic procedure to infer missing genotypes using parent information. The program can be easily incorporated in any pipeline for family-based SNP data analysis, and is distributed as free software under the GNU General Public License.
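The kind of check described above can be sketched in a few lines. This is an illustration only, not mendelFix itself (which is a Perl script whose internals are not given here): the function names are hypothetical, genotypes are assumed to be pairs of 12-recoded alleles ("1" or "2", with "0" for missing), and a missing parent is assumed to be compatible with any allele.

```python
MISSING = "0"


def parental_alleles(genotype):
    """Alleles a parent can transmit; a missing parent may carry either."""
    if MISSING in genotype:
        return {"1", "2"}
    return set(genotype)


def is_consistent(father, mother, child):
    """True if the child can receive one allele from each parent."""
    if MISSING in child:
        return True  # nothing to check for a missing offspring genotype
    f, m = parental_alleles(father), parental_alleles(mother)
    a, b = child
    return (a in f and b in m) or (b in f and a in m)


def mask_mendelian_errors(trios):
    """Set inconsistent offspring genotypes to missing and count the errors.

    `trios` is a list of (father, mother, child) genotypes for one trio,
    one entry per SNP.
    """
    cleaned, errors = [], 0
    for father, mother, child in trios:
        if is_consistent(father, mother, child):
            cleaned.append(child)
        else:
            cleaned.append((MISSING, MISSING))
            errors += 1
    return cleaned, errors


snps = [
    (("1", "1"), ("2", "2"), ("1", "2")),  # consistent
    (("1", "1"), ("1", "1"), ("1", "2")),  # child carries an allele neither parent has
]
print(mask_mendelian_errors(snps))
```

Inferring missing offspring genotypes from the parents, which the abstract also mentions, is left out of this sketch.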

Enhancement of a Novel Method for Mutational Disease Prediction using Bioinformatics Techniques and Backpropagation Algorithm

Ayad Ghany Ismaeel, Anar Auda Ablahad
(Submitted on 7 Jun 2013)

The novel method for mutational disease prediction uses bioinformatics tools and datasets to diagnose malignant mutations, together with a powerful Artificial Neural Network (Backpropagation Network) to classify the malignant mutations related to gene(s) (like BRCA1 and BRCA2) that cause a disease (breast cancer). That method considered only the handling, analysis and treatment of gene sequences to extract useful information from them, and overlooked the environmental factors that play important roles in determining and calculating some gene features in order to view their functional parts and relations to diseases. This paper proposes an enhancement of the novel method as a first step towards diagnosing and predicting disease from mutations. It introduces multiple additional features that capture alterations and changes in the environment as well as in genes, compares sequences to gain information about the structure or function of a query sequence, and proposes an optimal and more accurate system for classifying and dealing with a specific disorder using backpropagation with mean square rate 0.000000001. Index Terms (Homology sequence, GC content and AT content, Bioinformatics, Backpropagation Network, BLAST, DNA Sequence, Protein Sequence)

Detecting interspecific and geographic differentiation patterns in two interfertile oak species (Quercus petraea (Matt.) Liebl. and Q. robur L.) using small sets of microsatellite markers

Charalambos Neophytou, Filippos A. Aravanopoulos, Siegfried Fink, Aikaterini Dounavi
(Submitted on 10 Jun 2013)

Genetic analysis was carried out in order to provide insights into differentiation among populations of two interfertile oak species, Quercus petraea and Quercus robur. Gene flow between the two species, local adaptations and speciation processes in general, may leave differential molecular signatures across the genome. Three interspecific pairs of natural populations from three ecologically different regions, one in central Europe (SW Germany) and two in the Balkan Peninsula (Greece and Bulgaria) were sampled. Grouping of highly informative SSR loci was made according to the component of variation they express – interspecific or provenance specific. Species and provenance discriminant loci were characterized based on FSTs. Locus specific FSTs were tested for deviation from the neutral expectation both within and between species. Data were then treated separately in a Bayesian analysis of genetic structure. By using three species discriminant loci, high membership probability to inferred species groups was achieved. On the other hand, analysis of genetic structure based on five provenance discriminant loci was correlated with geographic region and revealed shared genetic variation between neighbouring Q. petraea and Q. robur. Small sets of highly variable nuclear SSRs were sufficient to discriminate, either between species or between provenances. Thus, an effective tool is provided for molecular identification of both species and provenances. Furthermore, data suggest that a combination of gene flow and natural selection forms these diversity patterns. Species discriminant loci might represent genome regions affected by directional selection, which maintains species identity. Provenance specific loci might represent genome regions with high interspecific gene flow and common adaptive patterns to local environmental factors.
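To make the role of locus-specific FST concrete, the sketch below computes a simple heterozygosity-based differentiation measure (Nei's G_ST form, with equal population weights and no sample-size correction) from allele frequencies at one locus. The abstract does not say which FST estimator the authors used, so this estimator, the function names and the example frequencies are all assumptions for illustration.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one population."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(populations):
    """(H_T - H_S) / H_T for one locus.

    `populations` is a list of dicts mapping allele -> frequency,
    one dict per population; populations are weighted equally.
    """
    alleles = set().union(*populations)
    k = len(populations)
    mean_freqs = {
        a: sum(pop.get(a, 0.0) for pop in populations) / k for a in alleles
    }
    h_t = expected_heterozygosity(mean_freqs)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall: no differentiation
    h_s = sum(expected_heterozygosity(pop) for pop in populations) / k
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies at one SSR locus in two populations.
differentiated = fst([{"a180": 0.9, "a184": 0.1}, {"a180": 0.1, "a184": 0.9}])
shared = fst([{"a180": 0.5, "a184": 0.5}, {"a180": 0.5, "a184": 0.5}])
print(differentiated, shared)
```

A locus with a high value between species but a low value between regions would be a candidate species discriminant locus in the sense used above, and the reverse pattern would suggest a provenance discriminant locus.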

Our paper: Effect of Genetic Variation in a Drosophila Model of Misfolded Human Proinsulin

This guest post is by Bin He on two preprints, Genetic Complexity in a Drosophila Model of Diabetes-Associated Misfolded Human Proinsulin and Effect of Genetic Variation in a Drosophila Model of Misfolded Human Proinsulin, arXived here and here, respectively. This is a cross-post from Bin’s blog.

Here we describe a pair of papers, both of which have been posted by Joe on this blog in the past month. But since they are intimately connected, we would like to write an additional post to explain the rationales behind them and the major findings therein.

The central questions in these two papers concern the genetic architecture of complex traits, such as those in human common disorders. We took a model organism approach in order to complement human studies, which are getting more and more powerful because of the successful community collaboration, but are still limited in several aspects, including mapping resolution and the ability to perform experimental validations.

Another important idea underlying this project is that decanalization of a trait may have caused a release of genetic variation, which subsequently contributed to the disease variability we see today. In this sense, our fly model of misfolded human proinsulin may be viewed as an external perturbation which, by exhausting the organism’s buffering capacity, reveals normally cryptic genetic variation. Under this view, our model has general relevance to many human disorders.

To perform this study, we first established a fly model of a disease-associated human mutant proinsulin, which was the subject of our first paper “Genetic Complexity in a Drosophila Model of Diabetes-Associated Misfolded Human Proinsulin”.

We’d like to bring out several points. First, regarding the etiology of the disease phenotype in our fly model, we believe it is mainly due to the physical properties of the mutant protein, rather than the biological function of human proinsulin. Although Drosophila also has insulin-like proteins, their sequences and functions differ substantially from those of the human homolog. Consistent with this view, when we made a transgenic fly expressing wild-type human proinsulin in the developing eye and other imaginal discs, we observed no visible changes at either the phenotypic or the transcriptional level. We thus propose that our fly model represents a general class of human diseases associated with unfolded or misfolded proteins.

In the first paper, we also described how phenotypic severity varies when the transgene is placed on different wild-derived genetic backgrounds. A series of experiments ruled out possible confounding factors, such as correlations induced by natural variability in eye size, or different levels of transgene expression.

We were exploring the idea of using natural variation in the fly to identify loci underlying a complex disease trait. We did so by crossing the transgenic line carrying the Mendelian disease mutation to a panel of wild-derived inbred lines, and asking whether the severity of the disease depends on the genetic background. The answer is a definite yes: the phenotype, quantified by eye size, ranges from 10% to 80% of wild type. (The mutant human proinsulin was expressed in the eye disc during development, causing neurodegeneration. We used the eye because it is dispensable in lab conditions and the phenotype is easy to measure.) We then conducted a GWAS, which led to the identification of sfl, as described above, and, through genetic tests, also implicated the HS biosynthetic pathway. One unique advantage of our system is its ultra-high mapping resolution: we localized the association signal to a ~400bp LD block within one of the introns of sfl, allowing us to test specific hypotheses about the molecular mechanisms of the associated variants. Pyro-sequencing analysis revealed allele-specific expression differences due to the intronic variation, but also highlighted genetic heterogeneity even within that locus, with additional cis-variants influencing the expression level. Overall, we believe that our fly model system is a powerful complementary approach to the genetic study of complex traits. Its high mapping resolution and rich molecular and genetic toolkits allow faster and more in-depth characterization of disease-associated variation.

Bin Z. He
Kreitman Lab, Dept of Ecology and Evolution, University of Chicago
current address: O’Shea Lab, FAS Center for Systems Biology, Harvard University / HHMI

On the accumulation of deleterious mutations during range expansions

Stephan Peischl, Isabelle Dupanloup, Mark Kirkpatrick, Laurent Excoffier
(Submitted on 7 Jun 2013)

We investigate the effect of spatial range expansions on the evolution of fitness when beneficial and deleterious mutations co-segregate. We perform individual-based simulations of a uniform linear habitat and complement them with analytical approximations for the evolution of mean fitness at the edge of the expansion. We find that deleterious mutations accumulate steadily on the wave front during range expansions, thus creating an expansion load. Reduced fitness due to the expansion load is not restricted to the wave front but occurs over a large proportion of newly colonized habitats. The expansion load can persist and represent a major fraction of the total mutation load thousands of generations after the expansion. Our results extend qualitatively and quantitatively to two-dimensional expansions. The phenomenon of expansion load may explain growing evidence that populations that have recently expanded, including humans, show an excess of deleterious mutations. To test the predictions of our model, we analyze patterns of neutral and non-neutral genetic diversity in humans and find an excellent fit between theory and data.

Low-bandwidth and non-compute intensive remote identification of microbes from raw sequencing reads

Laurent Gautier, Ole Lund
(Submitted on 6 Jun 2013)

Cheap high-throughput DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples.
We propose a novel general approach to the analysis of sequencing data in which the reference genome does not have to be specified. Using a distributed architecture we are able to query a remote server for hints about what the reference might be, transferring a relatively small amount of data, and the hints can be used for more computationally-demanding work.
Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references known to the server. Sequences for the references can be retrieved and used for exhaustive computation on the reads, such as alignment.
To demonstrate this approach we have implemented a web server, indexing tens of thousands of publicly available genomes and genomic regions from various organisms and returning lists of matching hits from query sequencing reads. We have also implemented two clients, one of them running in a web browser, in order to demonstrate that gigabytes of raw sequencing reads of unknown origin could be identified without the need to transfer a very large volume of data, and on modestly powered computing devices.
A web access is available at this http URL. The source code for a python command-line client, a server, and supplementary data is available at this http URL.

SPATA: A Seeding and Patching Algorithm for Hybrid Transcriptome Assembly

Tin Chi Nguyen, Zhiyu Zhao, Dongxiao Zhu
(Submitted on 6 Jun 2013)

Transcriptome assembly from RNA-Seq reads is an active area of bioinformatics research. The ever-declining cost and the increasing depth of RNA-Seq have provided unprecedented opportunities to better identify expressed transcripts. However, the nonlinear transcript structures and the ultra-high throughput of RNA-Seq reads pose significant algorithmic and computational challenges to the existing transcriptome assembly approaches, either reference-guided or de novo. While reference-guided approaches offer good sensitivity, they rely on alignment results of the splice-aware aligners and are thus unsuitable for species with incomplete reference genomes. In contrast, de novo approaches do not depend on the reference genome but face a computationally daunting task derived from the complexity of the graph built for the whole transcriptome. In response to these challenges, we present a hybrid approach to exploit an incomplete reference genome without relying on splice-aware aligners. We have designed a split-and-align procedure to efficiently localize the reads to individual genomic loci, which is followed by an accurate de novo assembly to assemble reads falling into each locus. Using extensive simulation data, we demonstrate a high accuracy and precision in transcriptome reconstruction by comparing to selected transcriptome assembly tools. Our method is implemented in assemblySAM, a GUI software freely available at this http URL.