Summary: Advances in sequencing capacity have led to the generation of unprecedented amounts of genomic data. Processing this data frequently causes I/O bottlenecks, e.g. when a small genomic region is analyzed across a large number of samples. However, the largest I/O burden is often imposed not by the amount of data needed for the analysis but by the index files that help retrieve this data. We have developed chopBAI, a program that chops a BAM index (BAI) file into small pieces. The program outputs a list of BAI files, each indexing a specified genomic interval. The output files are much smaller but remain compatible with existing software tools. We show that preprocessing BAI files with chopBAI can reduce I/O by more than 95 % during the analysis of 10 Kbp genomic regions, eventually enabling the joint analysis of more than 10,000 individuals.

Availability and Implementation: The software is implemented in C++, GPL licensed and available at http://github.com/DecodeGenetics/chopBAI

Contact: birte.kehr@decode.is
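To make the idea of a per-interval index concrete, the Python sketch below selects the parts of one reference's BAI index that can matter for a single interval. It is a hypothetical illustration, not chopBAI's implementation. It uses the hierarchical binning scheme from the SAM/BAM specification (six levels of bins, from 512 Mbp at bin 0 down to 16 kbp) and the 16 kbp linear index; the in-memory layout (`bins` as a dict of chunk lists, `linear` as a list of virtual file offsets) and the name `chop_reference_index` are assumptions, and writing the result back out as a valid BAI file is not shown.

```python
# Hypothetical sketch, not chopBAI's code: restrict one reference's BAI index
# to a single interval. Bin numbering follows the SAM/BAM specification.

# (id of the first bin on the level, log2 of the bin size) for levels 1-5;
# level 0 is the single bin 0 spanning 2**29 bp.
LEVELS = ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14))
LINEAR_SHIFT = 14  # linear-index windows are 2**14 = 16,384 bp


def reg2bins(beg, end):
    """Return the ids of all bins that may overlap [beg, end) (0-based, half-open)."""
    last = end - 1
    bins = [0]
    for first_id, shift in LEVELS:
        bins.extend(range(first_id + (beg >> shift), first_id + (last >> shift) + 1))
    return bins


def chop_reference_index(bins, linear, beg, end):
    """Keep only the index entries relevant to [beg, end).

    bins   -- dict: bin id -> list of (chunk_beg, chunk_end) virtual file offsets
    linear -- list of virtual file offsets, one per 16 kbp window
    Returns (kept_bins, kept_linear); kept_linear is keyed by window number so
    that the original window numbering is preserved.
    """
    wanted = set(reg2bins(beg, end))
    kept_bins = {b: chunks for b, chunks in bins.items() if b in wanted}
    first_win = beg >> LINEAR_SHIFT
    last_win = min((end - 1) >> LINEAR_SHIFT, len(linear) - 1)
    kept_linear = {w: linear[w] for w in range(first_win, last_win + 1)}
    return kept_bins, kept_linear


if __name__ == "__main__":
    # Toy index: bin 4742 covers [999424, 1015808); bin 5000 lies elsewhere.
    toy_bins = {0: [(10, 20)], 4742: [(30, 40)], 5000: [(50, 60)]}
    toy_linear = list(range(100))
    print(reg2bins(1_000_000, 1_010_000))  # [0, 1, 9, 73, 592, 4742]
    print(chop_reference_index(toy_bins, toy_linear, 1_000_000, 1_010_000))
    # ({0: [(10, 20)], 4742: [(30, 40)]}, {61: 61})
```

Because a 10 kbp interval is shorter than the smallest bin, it overlaps at most two bins on each level below the root, so this sketch keeps at most 11 bins and two linear-index windows per interval, regardless of how large the full index is.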
What’s in my pot? Real-time species identification on the MinION
Using genotype data to distinguish pleiotropy from heterogeneity: deciphering coheritability in autoimmune and neuropsychiatric diseases
Common and phylogenetically widespread coding for peptides by bacterial small RNAs
The game of survival: Sexual evolution in dynamic environments
Ruta Mehta, Ioannis Panageas, Georgios Piliouras, Prasad Tetali, Vijay V. Vazirani
Evolution is a complex algorithmic solution to life's most pressing challenge, that of survival. It is a mixture of numerous textbook optimization techniques. Natural selection, the preferential replication of the fittest, encodes the multiplicative weights update algorithm, which in static environments is tantamount to exponential growth for the best solution. Sex can be interpreted as a game between different agents/genes with identical interests, maximizing the fitness of the individual. Mutation forces the exploration of consistently suboptimal solutions. Are all of these mechanisms necessary to ensure survival? And how is it that, despite their contradictory character (e.g., selection versus mutation), they do not cancel each other out? We address these questions by extending classic evolutionary models to allow for a dynamically changing environment. Sexual selection is well suited to static environments, where we show that it converges polynomially fast to monomorphic populations. Mutations make the difference in dynamic environments. Without them, species become extinct because they lack the flexibility to recover quickly from environmental change. On the other hand, we show that with mutation, as long as the environment does not change too fast, long-term survival is possible. Finally, mutation does not cancel the role of selection in static environments: convergence remains guaranteed, and only the level of polymorphism of the equilibria is affected. Our techniques quantify exploration-exploitation tradeoffs in time-evolving non-convex optimization problems, which could be of independent interest.
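The link between selection and the multiplicative weights update can be sketched in a few lines of Python. This is a deliberately simplified, hypothetical single-locus model, not the paper's model (which treats sex as a game between genes): each type's frequency is multiplied by its fitness and renormalized, and mutation is assumed, for illustration, to redistribute a fraction `mutation` of the population uniformly over all types. The fitness values, the 500-generation switch and the helper names are made up.

```python
# Hypothetical single-locus sketch, not the paper's model (which treats sex as a
# game between genes): natural selection as a multiplicative weights update,
# plus uniform mutation.


def selection_step(freqs, fitness, mutation=0.0):
    """One generation: multiply each type's frequency by its fitness and
    renormalize (multiplicative weights / discrete replicator update), then
    redistribute a fraction `mutation` of the population uniformly over types."""
    weighted = [x * f for x, f in zip(freqs, fitness)]
    total = sum(weighted)
    n = len(freqs)
    return [(1.0 - mutation) * w / total + mutation / n for w in weighted]


def generations_to_recover(mutation, static_gens=500, max_gens=10_000):
    """Type 0 is fitter for `static_gens` generations, then the environment flips.
    Return how many generations type 1 needs to become the majority (or None)."""
    freqs = [0.5, 0.5]
    for _ in range(static_gens):
        freqs = selection_step(freqs, [2.0, 1.0], mutation)
    for t in range(1, max_gens + 1):
        freqs = selection_step(freqs, [1.0, 2.0], mutation)
        if freqs[1] > 0.5:
            return t
    return None


if __name__ == "__main__":
    # Static environment, no mutation: the fittest type takes over.
    freqs = [1 / 3] * 3
    for _ in range(200):
        freqs = selection_step(freqs, [1.0, 1.1, 1.2])
    print([round(x, 4) for x in freqs])
    # Changing environment: compare recovery time without and with mutation.
    print(generations_to_recover(0.0), generations_to_recover(0.01))
```

In this toy setting, without mutation the disfavored type is driven down to roughly 2^-500 during the static phase and needs about 500 generations to become the majority again after the switch; with 1 % mutation its frequency never falls below 0.5 %, and it recovers in under ten generations.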
Quantifying Reticulation in Phylogenetic Complexes Using Homology
Kevin Emmett, Raul Rabadan
Reticulate evolutionary processes result in phylogenetic histories that cannot be modeled using a tree topology. Here, we apply methods from topological data analysis to molecular sequence data with reticulations. Using a simple example, we demonstrate the correspondence between nontrivial higher homology and reticulate evolution. We discuss the sensitivity of the standard filtration and show cases where reticulate evolution can go undetected. We introduce an extension of the standard framework and define the median complex as a construction that recovers a signal of the frequency and scale of reticulate evolution by inferring and imputing putative ancestral states. Finally, we apply our methods to two phylogenetic datasets. Our work expands on earlier ideas of using topology to extract important evolutionary features from genomic data.
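The correspondence between higher homology and reticulation can be illustrated with a toy computation. The Python sketch below assumes that the "standard filtration" is a Vietoris-Rips filtration on pairwise Hamming distances and evaluates it at a single scale only; the sequences, function names and scale values are hypothetical, and the authors' median-complex extension is not implemented. Four sequences carrying all four combinations of two biallelic sites (the pattern that, under the infinite-sites assumption, the four-gamete test attributes to recombination) form a loop at scale 1 and give a nonzero first Betti number, whereas sequences that fit a tree do not.

```python
# Hypothetical sketch: first Betti number of a Vietoris-Rips complex built on
# pairwise Hamming distances, at a single scale eps (not a full persistence
# computation, and not the authors' median complex).
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_gf2(vectors):
    """Rank over GF(2) of a set of vectors encoded as integer bitmasks."""
    basis = {}  # leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)


def betti1_rips(seqs, eps):
    """b1 of the Vietoris-Rips complex at scale eps (only its 2-skeleton matters)."""
    n = len(seqs)
    edges = [e for e in combinations(range(n), 2)
             if hamming(seqs[e[0]], seqs[e[1]]) <= eps]
    edge_id = {e: k for k, e in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in edge_id for p in combinations(t, 2))]
    # Boundary of an edge = its two vertices; boundary of a triangle = its three edges.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [sum(1 << edge_id[p] for p in combinations(t, 2)) for t in triangles]
    # b1 = dim(ker d1) - rank(d2) = (#edges - rank d1) - rank d2
    return len(edges) - rank_gf2(d1) - rank_gf2(d2)


if __name__ == "__main__":
    reticulate = ["00", "01", "10", "11"]  # all four haplotypes at two sites
    treelike = ["000", "100", "110", "111"]  # fits a tree; no four-gamete pattern
    print(betti1_rips(reticulate, 1), betti1_rips(reticulate, 2))  # 1 0
    print(betti1_rips(treelike, 1))  # 0
```

At scale 2 every pair of the four reticulate sequences is connected, triangles fill the loop and b1 drops to 0, which is why a persistence computation follows features across all scales rather than at one fixed `eps`.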