Ancestry specific association mapping in admixed populations

Line Skotte, Thorfinn Sand S Korneliussen, Ida Moltke, Anders Albrechtsen
doi: http://dx.doi.org/10.1101/014001

As recently demonstrated in several genetic association studies, historically small and isolated populations can offer increased statistical power due to extended linkage disequilibrium and increased genetic drift over many generations. However, many such populations, like the Greenlandic Inuit population, have recently experienced substantial admixture with other populations, which can complicate the association studies. One important complication is that most current methods for performing association testing are based on the assumption that the effect of the tested genetic marker is the same regardless of ancestry. This is a reasonable assumption for a causal variant, but may not hold for the genetic markers that are tested in association studies, which are usually not causal. The effects of non-causal genetic markers depend on how strongly their presence correlates with the presence of the causal marker, and this may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect sizes are allowed to depend on the ancestry of the allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a dramatic increase in statistical power to detect associations. In addition, the method allows for testing for difference in effect size between ancestral populations, which can be used to determine if a SNP is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.
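To illustrate the idea of ancestry-specific effect sizes, here is a minimal Python sketch. It is not the authors' method: it assumes the ancestry of every allele is known exactly, whereas the abstract stresses that the method does not rely on accurate local ancestry inference. The function names, allele frequencies, and effect sizes are all invented for illustration.

```python
import numpy as np


def simulate(n=5000, seed=0):
    """Simulate a toy admixed sample with a marker effect in one ancestry only.

    All numbers here (ancestry proportion, allele frequencies, effects)
    are hypothetical choices for the illustration.
    """
    rng = np.random.default_rng(seed)
    # Ancestry of each of the two alleles per individual: True = population 1.
    anc = rng.random((n, 2)) < 0.5
    # The tested allele has a different frequency in each ancestral population.
    freq = np.where(anc, 0.3, 0.6)
    allele = rng.random((n, 2)) < freq
    # Count copies of the tested allele separately by ancestry.
    g_pop1 = (allele & anc).sum(axis=1).astype(float)
    g_pop2 = (allele & ~anc).sum(axis=1).astype(float)
    # The marker tags the causal variant only on population-1 haplotypes.
    y = 1.0 + 0.5 * g_pop1 + 0.0 * g_pop2 + rng.normal(0.0, 1.0, n)
    return y, g_pop1, g_pop2


def fit_ancestry_specific_effects(y, g_pop1, g_pop2):
    """Least-squares fit of y = a + b1 * g_pop1 + b2 * g_pop2 + noise."""
    X = np.column_stack([np.ones_like(y), g_pop1, g_pop2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # intercept, b1, b2


if __name__ == "__main__":
    y, g1, g2 = simulate()
    intercept, b1, b2 = fit_ancestry_specific_effects(y, g1, g2)
    print(f"effect on population-1 alleles: {b1:.2f}")
    print(f"effect on population-2 alleles: {b2:.2f}")
```

A standard test would regress the phenotype on the total allele count `g_pop1 + g_pop2`, forcing a single shared effect. Splitting the count by ancestry lets the two effects differ, which is the modelling choice the abstract motivates.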

Alternative splicing QTLs in European and African populations using Altrans, a novel method for splice junction quantification

Halit Ongen, Emmanouil T Dermitzakis
doi: http://dx.doi.org/10.1101/014126

With the advent of RNA-sequencing technology we now have the power to detect different types of alternative splicing and how DNA variation affects splicing. However, given the short read lengths used in most population based RNA-sequencing experiments, quantifying transcripts accurately remains a challenge. Here we present a novel method, Altrans, for discovery of alternative splicing quantitative trait loci (asQTLs). To assess the performance of Altrans we compared it to Cufflinks, a well-established transcript quantification method. Simulations show that in the presence of transcripts absent from the annotation, Altrans performs better in quantifications than Cufflinks. We have applied Altrans and Cufflinks to the Geuvadis dataset, which comprises samples from European and African populations, and discovered (FDR = 1%) 1806 and 243 asQTLs with Altrans, and 1596 and 288 asQTLs with Cufflinks for Europeans and Africans, respectively. Although Cufflinks results replicated better across the two populations, this is likely due to the increased sensitivity of Altrans in detecting harder to detect associations. We show that, by discovering a set of asQTLs in a smaller subset of European samples and replicating these in the remaining larger subset of Europeans, both methods achieve similar replication levels (94% and 98% replication in Altrans and Cufflinks, respectively). We find that method specific asQTLs are largely due to different types of alternative splicing events detected by each method. We overlapped the asQTLs with biochemically active regions of the genome and observed significant enrichments for many functional marks and variants in splicing regions, highlighting the biological relevance of the asQTLs identified. Altogether, we present a novel approach for discovering asQTLs that is a more direct assessment of splicing compared to other methods and is complementary to other transcript quantification methods.

Geometric constraints dominate the antigenic evolution of influenza H3N2 hemagglutinin

Austin G Meyer, Claus O Wilke
doi: http://dx.doi.org/10.1101/014183

We have carried out a comprehensive analysis of the determinants of human influenza A H3 hemagglutinin evolution, considering three distinct predictors of evolutionary variation at individual sites: solvent accessibility (as a proxy for protein fold stability and/or conservation), experimental epitope sites (as a proxy for host immune bias), and proximity to the receptor-binding region (as a proxy for protein function). We have found that these three predictors individually explain approximately 15% of the variation in site-wise dN/dS. However, the solvent accessibility and proximity predictors seem largely independent of each other, while the epitope sites are not. In combination, solvent accessibility and proximity explain 32% of the variation in dN/dS. Incorporating experimental epitope sites into the model adds only an additional 2 percentage points. We have also found that the historical H3 epitope sites, which date back to the 1980s and 1990s, show only weak overlap with the latest experimental epitope data, and we have defined a novel set of four epitope groups which are experimentally supported and cluster in 3D space. Finally, sites with dN/dS > 1, i.e., the sites most likely driving seasonal immune escape, are not correctly predicted by either historical or experimental epitope sites, but only by proximity to the receptor-binding region. In summary, proximity to the receptor-binding region, rather than host immune bias, seems to be the primary determinant of H3 immune-escape evolution.
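The "variation explained" argument can be made concrete with a small Python sketch. It uses synthetic data only, not the paper's dN/dS values, and the helper names `r_squared` and `demo` are hypothetical. It shows why two largely independent predictors that each explain roughly one sixth of the variance explain roughly twice that together.

```python
import numpy as np


def r_squared(X, y):
    """Fraction of the variance in y explained by a linear model on X."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()


def demo(n=2000, seed=0):
    """Return R^2 for predictor a alone, b alone, and both together."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n)  # stand-in for one site-wise predictor
    b = rng.normal(size=n)  # stand-in for a second, independent predictor
    y = a + b + rng.normal(0.0, 2.0, n)  # synthetic response
    return (
        r_squared(a, y),
        r_squared(b, y),
        r_squared(np.column_stack([a, b]), y),
    )


if __name__ == "__main__":
    r2_a, r2_b, r2_both = demo()
    print(f"a alone: {r2_a:.2f}, b alone: {r2_b:.2f}, both: {r2_both:.2f}")
```

When predictors are correlated, as the abstract reports for the epitope sites, the combined R² is less than the sum of the individual values, which is consistent with the small gain from adding them to the model.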

Integrating crop growth models with whole genome prediction through approximate Bayesian computation

Frank Technow, Carlos D. Messina, L. Radu Totir, Mark Cooper
doi: http://dx.doi.org/10.1101/014100

Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (GxE) continue to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of GxE and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with a simulated data set. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising novel approach to improving prediction accuracy for some of the most challenging scenarios of interest to applied geneticists.
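For readers unfamiliar with ABC, the following Python sketch shows the simplest variant, rejection sampling, on a toy problem. It is an illustration under stated assumptions, not the authors' implementation: `toy_growth_model` is a made-up saturating response standing in for a real crop growth model, a single scalar parameter replaces the whole genome marker effects, and the prior, noise level, and tolerance `eps` are arbitrary choices.

```python
import numpy as np


def toy_growth_model(theta, env):
    """Hypothetical stand-in for a crop growth model: saturating response."""
    return theta * env / (1.0 + env)


def abc_rejection(y_obs, env, n_draws=20000, eps=0.3, seed=0):
    """Rejection ABC: keep prior draws whose simulated data are close to y_obs."""
    rng = np.random.default_rng(seed)
    # 1. Draw candidate parameters from a (hypothetical) uniform prior.
    theta = rng.uniform(0.0, 10.0, n_draws)
    # 2. Simulate noisy data from the model for every candidate at once.
    sim = toy_growth_model(theta[:, None], env[None, :])
    sim = sim + rng.normal(0.0, 0.1, sim.shape)
    # 3. Distance between simulated and observed data (root mean square).
    dist = np.sqrt(np.mean((sim - y_obs[None, :]) ** 2, axis=1))
    # 4. Accepted draws approximate a sample from the posterior.
    return theta[dist < eps]


if __name__ == "__main__":
    env = np.array([0.5, 1.0, 2.0, 4.0, 8.0])  # hypothetical environments
    y_obs = toy_growth_model(4.0, env)  # data generated with theta = 4
    posterior = abc_rejection(y_obs, env)
    print(f"accepted {posterior.size} draws, mean {posterior.mean():.2f}")
```

The appeal of ABC in this setting is that the model is only ever simulated, never inverted or differentiated, so a mechanistic simulator can sit inside the inference without needing a closed-form likelihood.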

Empirical determinants of adaptive mutations in yeast experimental evolution

Celia Payen, Anna B Sunshine, Giang T Ong, Jamie L Pogachar, Wei Zhao, Maitreya J Dunham
doi: http://dx.doi.org/10.1101/014068

High-throughput sequencing technologies have enabled expansion of the scope of genetic screens to identify mutations that underlie quantitative phenotypes, such as fitness improvements that occur during the course of experimental evolution. This new capability has allowed us to describe the relationship between fitness and genotype at a level never possible before, and ask deeper questions, such as how genome structure, available mutation spectrum, and other factors drive evolution. Here we combined functional genomics and experimental evolution to first map on a genome scale the distribution of potential beneficial mutations available as a first step to an evolving population and then compare these to the mutations actually observed in order to define the constraints acting upon evolution. We first constructed a single-step fitness landscape for the yeast genome by using barcoded gene deletion and overexpression collections, competitive growth in continuous culture, and barcode sequencing. By quantifying the relative fitness effects of thousands of single-gene amplifications or deletions simultaneously we revealed the presence of hundreds of accessible evolutionary paths. To determine the actual mutation spectrum used in evolution, we built a catalog of >1000 mutations selected during experimental evolution. By combining both datasets, we were able to ask how and why evolution is constrained. We identified adaptive mutations in laboratory evolved populations, derived mutational signatures in a variety of conditions and ploidy states, and determined that half of the mutations accumulated positively affect cellular fitness. We also uncovered hundreds of potential beneficial mutations never observed in the mutational spectrum derived from the experimental evolution catalog and found that those adaptive mutations become accessible in the absence of the dominant adaptive solution. 
This comprehensive functional screen explored the set of potential adaptive mutations on one genetic background, and allows us for the first time at this scale to compare the mutational path with the actual, spontaneously derived spectrum of mutations.

Feller’s Contributions to Mathematical Biology

Ellen Baake, Anton Wakolbinger
(Submitted on 21 Jan 2015)

This is a review of William Feller’s important contributions to mathematical biology. The seminal paper [Feller 1951] “Diffusion processes in genetics” was particularly influential on the development of stochastic processes at the interface to evolutionary biology, and interesting ideas in this direction (including a first characterization of what is nowadays known as “Feller’s branching diffusion”) already shaped up in the paper [Feller 1939] (written in German) “The foundations of a probabilistic treatment of Volterra’s theory of the struggle for life”. Feller’s article “On fitness and the cost of natural selection” [Feller 1967] contains a critical analysis of the concept of “genetic load”.

Approximate statistical alignment by iterative sampling of substitution matrices

Joseph L. Herman, Adrienn Szabó, István Miklós, Jotun Hein
(Submitted on 19 Jan 2015)

We outline a procedure for jointly sampling substitution matrices and multiple sequence alignments, according to an approximate posterior distribution, using an MCMC-based algorithm. This procedure provides an efficient and simple method by which to generate alternative alignments according to their expected accuracy, and allows appropriate parameters for substitution matrices to be selected in an automated fashion. In the cases considered here, the sampled alignments with the highest likelihood have an accuracy consistently higher than alignments generated using the standard BLOSUM62 matrix.

Musings on the theory that variation in cancer risk among tissues can be explained by the number of divisions of normal stem cells

Cristian Tomasetti, Bert Vogelstein
(Submitted on 21 Jan 2015)

This manuscript has been written to address questions related to our recent publication (Science 347:78-81, 2015). We appreciate the many reactions to this paper that have been communicated to us, either privately or publicly. The following addresses several of the most important statistical and technical issues related to our analysis and conclusions. Our responses to non-technical questions are available at this http URL

Mutation detection in candidate genes for paratuberculosis resistance in sheep

Bianca Moioli, Luigi De Grossi, Roberto Steri, Silvia D’Andrea, Fabio Pilla
doi: http://dx.doi.org/10.1101/014035

Marker-assisted selection exploits anonymous genetic markers that have been associated with measurable differences in complex traits; because it is based on the linkage disequilibrium between the polymorphic markers and the polymorphisms which code for the trait, its success is limited to the population in which the association has been assessed. The identification of the gene with effect on the target and the detection of the functional mutations will allow selection in independent populations, while encouraging studies on gene expression. The results of a genome-wide scan performed with the Illumina Ovine SNP50K Beadchip on 100 sheep, 50 of which were positive at paratuberculosis serological assessment, identified two candidate genes of immunity response, PCP4 and CD109, located in proximity of the markers with different allele frequency in positive and negative sheep. The coding region of the two genes was directly sequenced, and three missense mutations were detected: two in the PCP4 gene and one in the second exon of the CD109 gene. The PCP4 mutations had a very low frequency (0.12 and 0.07), making it hazardous to hypothesize their direct effect on immune response. In contrast, the mutation detected in the CD109 gene showed a strong linkage disequilibrium with the anonymous marker. Direct sequencing of the DNA of sheep of different populations showed that disequilibrium was maintained. Allele frequency at the hypothesized marker associated with immune response, calculated for other breeds of sheep, showed that the marker allele potentially associated with disease resistance is more frequent in the local breeds and in breeds that have not been submitted to selection programs.

The genetics of resistance to Morinda fruit toxin during the postembryonic stages in Drosophila sechellia

Yan Huang, Deniz Erezyilmaz
doi: http://dx.doi.org/10.1101/014027

Many phytophagous insect species are ecological specialists that have adapted to utilize a single host plant. Drosophila sechellia is a specialist that utilizes the ripe fruit of Morinda citrifolia, which is toxic to its sibling species, D. simulans. Here we apply multiplexed shotgun genotyping and QTL analysis to examine the genetic basis of resistance to M. citrifolia fruit toxin in interspecific hybrids. We find that at least four dominant and four recessive loci interact additively to confer resistance to the M. citrifolia fruit toxin. These QTL include a dominant locus of large effect on the third chromosome (QTL-IIIsima) that was not detected in previous analyses. The small-effect loci that we identify overlap with regions that were identified in selection experiments with D. simulans on octanoic acid and in QTL analyses of adult resistance to octanoic acid. Our high-resolution analysis sheds new light upon the complexity of M. citrifolia resistance, and suggests that partial resistance to lower levels of M. citrifolia toxin could be passed through introgression from D. sechellia to D. simulans in nature. The identification of a locus of major effect, QTL-IIIsima, is an important step towards identifying the molecular basis of host plant specialization by D. sechellia.