Low levels of transposable element activity in Drosophila mauritiana: causes and consequences

Robert Kofler , Christian Schlötterer
doi: http://dx.doi.org/10.1101/018218

Transposable elements (TEs) are major drivers of genomic and phenotypic evolution, yet many questions about their biology remain poorly understood. Here, we compare TE abundance between populations of the two sister species D. mauritiana and D. simulans and relate it to the more distantly related D. melanogaster. The low population frequency of most TE insertions in D. melanogaster and D. simulans has been a key feature of several models of TE evolution. In D. mauritiana, however, the majority of TE insertions are fixed (66%). We attribute this to a lower transposition activity of up to 47 TE families in D. mauritiana, rather than stronger purifying selection. Only three families, including the extensively studied Mariner, may have a higher activity in D. mauritiana. This remarkable difference in TE activity between two recently diverged Drosophila species (≈ 250,000 years) also supports the hypothesis that TE copy numbers in Drosophila may not reflect a stable equilibrium where the rate of TE gains equals the rate of TE losses by negative selection. We propose that the transposition rate heterogeneity results from the contrasting ecology of the two species: the extent of vertical extinction of TE families and horizontal acquisition of active TE copies may be very different between the colonizing D. simulans and the island endemic D. mauritiana. Our findings provide novel insights into the evolution of TEs in Drosophila and suggest that the ecology of the host species could be a major, yet underappreciated, factor governing the evolutionary dynamics of TEs.

When the mean is not enough: Calculating fixation time distributions in birth-death processes

Peter Ashcroft, Arne Traulsen, Tobias Galla
(Submitted on 16 Apr 2015)

Studies of fixation dynamics in Markov processes predominantly focus on the mean time to absorption. This may be inadequate if the distribution is broad and skewed. We compute the distribution of fixation times in one-step birth-death processes with two absorbing states. These are expressed in terms of the spectrum of the process, and we provide different representations as forward-only processes in eigenspace. These allow efficient sampling of fixation time distributions. As an application we study evolutionary game dynamics, where invading mutants can reach fixation or go extinct. We also highlight the median fixation time as a possible analog of mixing times in systems with small mutation rates and no absorbing states, whereas the mean fixation time has no such interpretation.
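To make the point about broad, skewed fixation-time distributions concrete, here is a minimal Python sketch. It is not the spectral method described in the abstract: it simply iterates the master equation of a discrete-time one-step birth-death chain with absorbing states 0 and N. The choice of a Moran process with constant mutant fitness r, and all function names, are assumptions made for illustration.

```python
def moran_rates(i, N, r):
    """Per-step probabilities of i -> i+1 and i -> i-1 in a Moran process
    with i mutants of relative fitness r in a population of size N."""
    total = r * i + (N - i)
    t_plus = (r * i / total) * ((N - i) / N)
    t_minus = ((N - i) / total) * (i / N)
    return t_plus, t_minus


def fixation_time_distribution(N, r, start=1, tol=1e-12, max_steps=1_000_000):
    """Return (phi, pmf): phi is the fixation probability, and pmf[t] is the
    probability of fixing exactly at step t, conditioned on fixation.
    Assumes 0 < start < N."""
    p = [0.0] * (N + 1)          # probability mass on transient states
    p[start] = 1.0
    fixed_at = [0.0]             # fixed_at[t] = mass absorbed at N in step t
    for _ in range(max_steps):
        new = [0.0] * (N + 1)
        flux_fix = 0.0
        for i in range(1, N):
            if p[i] == 0.0:
                continue
            t_plus, t_minus = moran_rates(i, N, r)
            new[i] += p[i] * (1.0 - t_plus - t_minus)
            if i + 1 == N:
                flux_fix += p[i] * t_plus      # absorbed: fixation
            else:
                new[i + 1] += p[i] * t_plus
            if i - 1 > 0:
                new[i - 1] += p[i] * t_minus   # mass reaching 0 is dropped
        fixed_at.append(flux_fix)
        p = new
        if sum(p) < tol:         # almost no transient mass left
            break
    phi = sum(fixed_at)
    return phi, [q / phi for q in fixed_at]


def mean_and_median(pmf):
    """Mean and median (smallest t with CDF >= 0.5) of a pmf over steps."""
    mean = sum(t * q for t, q in enumerate(pmf))
    cum = 0.0
    for t, q in enumerate(pmf):
        cum += q
        if cum >= 0.5:
            return mean, t
    return mean, len(pmf) - 1
```

Calling `fixation_time_distribution(50, 1.0)` and passing the resulting pmf to `mean_and_median` lets one compare the two summary statistics directly for a neutral mutant.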

Fast principal components analysis reveals independent evolution of ADH1B gene in Europe and East Asia

Kevin J Galinsky , Gaurav Bhatia , Po-Ru Loh , Stoyan Georgiev , Sayan Mukherjee , Nick J Patterson , Alkes L Price
doi: http://dx.doi.org/10.1101/018143

Principal components analysis (PCA) is a widely used tool for inferring population structure and correcting confounding in genetic data. We introduce a new algorithm, FastPCA, that leverages recent advances in random matrix theory to accurately approximate top PCs while reducing time and memory cost from quadratic to linear in the number of individuals, a computational improvement of many orders of magnitude. We apply FastPCA to a cohort of 54,734 European Americans, identifying 5 distinct subpopulations spanning the top 4 PCs. Using a new test for natural selection based on population differentiation along these PCs, we replicate previously known selected loci and identify three new signals of selection, including selection in Europeans at the ADH1B gene. The coding variant rs1229984 has previously been associated with alcoholism and shown to be under selection in East Asians; we show that it is a rare example of independent evolution on two continents.

Fulfilling the promise of Mendelian randomization

Joseph Pickrell
doi: http://dx.doi.org/10.1101/018150

Many important questions in medicine involve questions about causality. For example, do low levels of high-density lipoproteins (HDL) cause heart disease? Does high body mass index (BMI) cause type 2 diabetes? Or are these traits simply correlated in the population for other reasons? A popular approach to answering these questions using human genetics is called “Mendelian randomization”. We discuss the prospects and limitations of this approach, and some ways forward.
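As a rough illustration of the idea behind Mendelian randomization, the Python sketch below simulates a genotype G, an unobserved confounder U, an exposure X and an outcome Y, then compares the naive regression slope of Y on X with the Wald ratio cov(G, Y) / cov(G, X). The simulation design, effect sizes and function names are all assumptions chosen for illustration and are not taken from the paper.

```python
import random


def simulate(n, beta, seed=1):
    """Simulate genotype g, exposure x and outcome y with a hidden confounder.
    beta is the true causal effect of x on y (hypothetical toy model)."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # 0, 1 or 2 alleles
        u = rng.gauss(0.0, 1.0)                           # unobserved confounder
        xi = 0.5 * gi + u + rng.gauss(0.0, 1.0)           # genotype shifts exposure
        yi = beta * xi + 2.0 * u + rng.gauss(0.0, 1.0)    # confounder shifts outcome
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Naive regression slope of y on x; biased here by the confounder."""
    return cov(x, y) / cov(x, x)


def wald_ratio(g, x, y):
    """Instrumental-variable estimate using the genotype as the instrument."""
    return cov(g, y) / cov(g, x)
```

With `g, x, y = simulate(50000, 0.3)`, `ols_slope(x, y)` is pulled away from the true effect by the confounder, while `wald_ratio(g, x, y)` should land near it. The sketch bakes in the standard assumption that the genotype affects the outcome only through the exposure; when that fails, the ratio is biased too.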

Is there such a thing as Landscape Genetics?

Rodney J Dyer
doi: http://dx.doi.org/10.1101/018192

For a scientific discipline to be interdisciplinary it must satisfy two conditions: it must consist of contributions from at least two existing disciplines, and it must be able to provide insights, through this interaction, that neither progenitor discipline could address. In this paper, I examine the complete body of peer-reviewed literature self-identified as landscape genetics using the statistical approaches of text mining and natural language processing. The goal here is to quantify the kinds of questions being addressed in landscape genetic studies, the ways in which questions are evaluated mechanistically, and how they are differentiated from the progenitor disciplines of landscape ecology and population genetics. I then circumscribe the main factions within published landscape genetic papers, examining the extent to which emergent questions are being addressed and highlighting a deep bifurcation between existing individual- and population-based approaches. I close by providing some suggestions on where theoretical and analytical work is needed if landscape genetics is to serve as a real bridge connecting evolution and ecology sensu lato.

The design and analysis of binary variable traits in common garden genetic experiments of highly fecund species to assess heritability

Sarah W Davies , Samuel Scarpino , Thanapat Pongwarin , James Scott , Mikhail V Matz
doi: http://dx.doi.org/10.1101/018044

Many biologically important traits are binomially distributed, with their key phenotypes being presence or absence. Despite their prevalence, estimating the heritability of binomial traits presents both experimental and statistical challenges. Here we develop both an empirical and computational methodology for estimating the narrow-sense heritability of binary traits for highly fecund species. Our experimental approach controls for undesirable culturing effects, while minimizing culture numbers, increasing feasibility in the field. Our statistical approach accounts for known issues with model-selection by using a permutation test to calculate significance values and includes both fitting and power calculation methods. We illustrate our methodology by estimating the narrow-sense heritability for larval settlement, a key life-history trait, in the reef-building coral Orbicella faveolata. The experimental, statistical and computational methods, along with all of the data from this study, were deployed in the R package multiDimBio.
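The permutation-test idea mentioned in the abstract can be sketched generically in Python. This is not the procedure implemented in multiDimBio: the test statistic (variance of per-family settlement proportions), the data layout and the function names are assumptions chosen to keep the example small.

```python
import random


def make_data(settled_counts, family_size):
    """Build per-larva family labels and binary outcomes (1 = settled)
    from hypothetical per-family counts of settled larvae."""
    labels, outcomes = [], []
    for fam, k in enumerate(settled_counts):
        labels += [fam] * family_size
        outcomes += [1] * k + [0] * (family_size - k)
    return labels, outcomes


def family_variance(labels, outcomes, families):
    """Variance across families of the proportion of larvae that settled."""
    props = []
    for fam in families:
        vals = [o for l, o in zip(labels, outcomes) if l == fam]
        props.append(sum(vals) / len(vals))
    m = sum(props) / len(props)
    return sum((p - m) ** 2 for p in props) / len(props)


def permutation_p_value(labels, outcomes, n_perm=999, seed=0):
    """Shuffle outcomes across larvae to break any family structure and
    count how often the permuted statistic reaches the observed one."""
    rng = random.Random(seed)
    families = sorted(set(labels))
    observed = family_variance(labels, outcomes, families)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if family_variance(labels, shuffled, families) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

For example, `permutation_p_value(*make_data([45, 40, 10, 5, 25, 25], 50))` tests whether six families of 50 larvae differ in settlement more than expected by chance. A small p-value signals among-family variation, which is a prerequisite for heritability but not an estimate of it.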

A pooling-based approach to mapping genetic variants associated with DNA methylation

Irene Miriam Kaplow , Julia L MacIsaac , Sarah M Mah , Lisa M McEwen , Michael S Kobor , Hunter B Fraser
doi: http://dx.doi.org/10.1101/013649

DNA methylation is an epigenetic modification that plays a key role in gene regulation. Previous studies have investigated its genetic basis by mapping genetic variants that are associated with DNA methylation at specific sites, but these have been limited to microarrays that cover less than 2% of the genome and cannot account for allele-specific methylation (ASM). Other studies have performed whole-genome bisulfite sequencing on a few individuals, but these lack statistical power to identify variants associated with DNA methylation. We present a novel approach in which bisulfite-treated DNA from many individuals is sequenced together in a single pool, resulting in a truly genome-wide map of DNA methylation. Compared to methods that do not account for ASM, our approach increases statistical power to detect associations while sharply reducing cost, effort, and experimental variability. As a proof of concept, we generated deep sequencing data from a pool of 60 human cell lines; we evaluated almost twice as many CpGs as the largest microarray studies and identified over 2,000 genetic variants associated with DNA methylation. We found that these variants are highly enriched for associations with chromatin accessibility and CTCF binding but are less likely to be associated with traits indirectly linked to DNA, such as gene expression and disease phenotypes. In summary, our approach allows genome-wide mapping of genetic variants associated with DNA methylation in any tissue of any species, without the need for individual-level genotype or methylation data.

A Statistical Framework to Predict Functional Non-Coding Regions in the Human Genome Through Integrated Analysis of Annotation Data

Qiongshi Lu , Yiming Hu , Jiehuan Sun , Yuwei Cheng , Kei-Hoi Cheung , Hongyu Zhao
doi: http://dx.doi.org/10.1101/018093

Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. Its ability to predict functional regions, together with its generalizable statistical framework, makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu

Natural selection defines the cellular complexity

Han Chen , Xionglei He
doi: http://dx.doi.org/10.1101/018069

Current biology is perplexed by the lack of a theoretical framework for understanding the organization principles of the molecular system within a cell. Here we first studied growth rate, one of the seemingly most complex cellular traits, using functional data of yeast single-gene deletion mutants. We observed nearly one thousand expression informative genes (EIGs) whose expression levels are linearly correlated with the trait within an unprecedentedly large functional space. A simple model considering six EIG-formed protein modules revealed a variety of novel mechanistic insights, and also explained ~50% of the variance of cell growth rates measured by the Bar-seq technique for over 400 yeast mutants (Pearson’s R = 0.69), a performance comparable to the microarray-based (R = 0.77) or colony-size-based (R = 0.66) experimental approach. We then applied the same strategy to 501 morphological traits of the yeast and achieved successes in most fitness-coupled traits, each with hundreds of trait-specific EIGs. Surprisingly, no EIG was found for most fitness-uncoupled traits, indicating that they are controlled by super-complex epistases that allow no simple expression-trait correlation. Thus, EIGs are recruited exclusively by natural selection, which builds a rather simple functional architecture for fitness-coupled traits, and the endless complexity of a cell lies primarily in its fitness-uncoupled features.

Capturing heterotachy through multi-gamma site models

Remco Bouckaert , Peter Lockhart
doi: http://dx.doi.org/10.1101/018101

Most methods for performing a phylogenetic analysis based on sequence alignments of gene data assume that the mechanism of evolution is constant through time. It is recognised that some sites do evolve somewhat faster than others, and this can be captured using a (gamma) rate heterogeneity model. Further, some species have shorter replication times than others, and this results in faster rates of substitution in some lineages. This feature of lineage-specific rate variation can be captured, to some extent, by using relaxed clock models. However, it is also clear that there are additional poorly characterised features of sequence data that can sometimes lead to extreme differences in lineage-specific rates. This variation is poorly captured by constant time reversible substitution models. The significance of extreme lineage-specific rate differences is that they lead both to errors in reconstructing evolutionary relationships and to biased estimates for the age of ancestral nodes. We propose a new model that allows gamma rate heterogeneity to change on branches, thus offering a more realistic model of sequence evolution. It adds negligible computational cost to likelihood calculations. We illustrate its effectiveness with an example of green algae and land-plants. For many real world data sets, we find a much better fit with multi-gamma site models as well as substantial differences in ancestral node date estimates.