Bayesian Inference of Divergence Times and Feeding Evolution in Grey Mullets (Mugilidae)

Francesco Santini , Michael R. May , Giorgio Carnevale , Brian R. Moore
doi: http://dx.doi.org/10.1101/019075

Grey mullets (Mugilidae, Ovalentariae) are coastal fishes found in near-shore environments of tropical, subtropical, and temperate regions within marine, brackish, and freshwater habitats throughout the world. This group is noteworthy both for the highly conserved morphology of its members—which complicates species identification and delimitation—and also for the uncommon herbivorous or detritivorous diet of most mullets. In this study, we first attempt to identify the number of mullet species, and then—for the resulting species—estimate a densely sampled time-calibrated phylogeny using three mitochondrial gene regions and three fossil calibrations. Our results identify two major subgroups of mullets that diverged in the Paleocene/Early Eocene, followed by an Eocene/Oligocene radiation across both tropical and subtropical habitats. We use this phylogeny to explore the evolution of feeding preference in mullets, which indicates multiple independent origins of both herbivorous and detritivorous diets within this group. We also explore correlations between feeding preference and other variables, including body size, habitat (marine, brackish, or freshwater), and geographic distribution (tropical, subtropical, or temperate). Our analyses reveal: (1) a positive correlation between trophic index and habitat (with herbivorous and/or detritivorous species predominantly occurring in marine habitats); (2) a negative correlation between trophic index and geographic distribution (with herbivorous species occurring predominantly in subtropical and temperate regions), and; (3) a negative correlation between body size and geographic distribution (with larger species occurring predominantly in subtropical and temperate regions).

GWGGI: software for genome-wide gene-gene interaction analysis

Changshuai Wei, Qing Lu
Journal-ref: BMC Genetics 2014, 15:101
Subjects: Quantitative Methods (q-bio.QM); Data Structures and Algorithms (cs.DS); Genomics (q-bio.GN); Applications (stat.AP)

Background: While the importance of gene-gene interactions in human diseases has been well recognized, identifying them has been a great challenge, especially through association studies with millions of genetic markers and thousands of individuals. Computationally efficient and powerful tools are in great need for the identification of new gene-gene interactions in high-dimensional association studies. Result: We develop C++ software for genome-wide gene-gene interaction analyses (GWGGI). GWGGI utilizes tree-based algorithms to search a large number of genetic markers for a disease-associated joint association with the consideration of high-order interactions, and then uses non-parametric statistics to test the joint association. The package includes two functions, likelihood ratio Mann-Whitney (LRMW) and Tree Assembling Mann-Whitney (TAMW). We optimize the data storage and computational efficiency of the software, making it feasible to run the genome-wide analysis on a personal computer. The use of GWGGI was demonstrated by using two real data-sets with nearly 500 k genetic markers. Conclusion: Through the empirical study, we demonstrated that the genome-wide gene-gene interaction analysis using GWGGI could be accomplished within a reasonable time on a personal computer (i.e., ~3.5 hours for LRMW and ~10 hours for TAMW). We also showed that LRMW was suitable to detect interaction among a small number of genetic variants with moderate-to-strong marginal effect, while TAMW was useful to detect interaction among a larger number of low-marginal-effect genetic variants.

Trees Assembling Mann Whitney Approach for Detecting Genome-wide Joint Association among Low Marginal Effect loci

Changshuai Wei, Daniel J. Schaid, Qing Lu
Journal-ref: Genet Epidemiol. 2013 Jan;37(1):84-91
Subjects: Quantitative Methods (q-bio.QM); Computation (stat.CO); Machine Learning (stat.ML)

Common complex diseases are likely influenced by the interplay of hundreds, or even thousands, of genetic variants. Converging evidence shows that genetic variants with low marginal effects (LME) play an important role in disease development. Despite their potential significance, discovering LME genetic variants and assessing their joint association on high dimensional data (e.g., genome wide association studies) remain a great challenge. To facilitate joint association analysis among a large ensemble of LME genetic variants, we proposed a computationally efficient and powerful approach, which we call Trees Assembling Mann Whitney (TAMW). Through simulation studies and an empirical data application, we found that TAMW outperformed multifactor dimensionality reduction (MDR) and the likelihood ratio based Mann Whitney approach (LRMW) when the underlying complex disease involves multiple LME loci and their interactions. For instance, in a simulation with 20 interacting LME loci, TAMW attained a higher power (power=0.931) than both MDR (power=0.599) and LRMW (power=0.704). In an empirical study of 29 known Crohn’s disease (CD) loci, TAMW also identified a stronger joint association with CD than those detected by MDR and LRMW. Finally, we applied TAMW to Wellcome Trust CD GWAS to conduct a genome wide analysis. The analysis of 459K single nucleotide polymorphisms was completed in 40 hours using parallel computing, and revealed a joint association predisposing to CD (p-value=2.763e-19). Further analysis of the newly discovered association suggested that 13 genes, such as ATG16L1 and LACC1, may play an important role in CD pathophysiological and etiological processes.

Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data

Jean-Philippe Fortin , Kasper D Hansen
doi: http://dx.doi.org/10.1101/019000

Analysis of Hi-C data has shown that the genome can be divided into two compartments called A/B compartments. These compartments are cell-type specific and are associated with open and closed chromatin. We show that A/B compartments can be reliably estimated using epigenetic data from two different platforms, the Illumina 450k DNA methylation microarray and DNase hypersensitivity sequencing. We do this by exploiting the fact that the structure of long range correlations differs between open and closed compartments. This work makes A/B compartments readily available in a wide variety of cell types, including many human cancers.

Negative Niche Construction Favors the Evolution of Cooperation

Brian D Connelly , Katherine J Dickinson , Sarah P Hammarlund , Benjamin Kerr
doi: http://dx.doi.org/10.1101/018994

By benefitting others at a cost to themselves, cooperators face an ever present threat from defectors—individuals that avail themselves of the cooperative benefit without contributing. A longstanding challenge to evolutionary biology is to understand the mechanisms that support the many instances of cooperation that nevertheless exist. Hammarlund et al. recently demonstrated that cooperation can persist by hitchhiking along with beneficial non-social adaptations. Importantly, cooperators play an active role in this process. In spatially-structured environments, clustered cooperator populations reach greater densities, which creates more mutational opportunities to gain beneficial non-social adaptations. Cooperation rises in abundance by association with these adaptations. However, once adaptive opportunities have been exhausted, the ride abruptly ends as cooperators are displaced by adapted defectors. Using an agent-based model, we demonstrate that the selective feedback that is created as populations construct their local niches can maintain cooperation indefinitely. This cooperator success depends specifically on negative niche construction, which acts as a perpetual source of adaptive opportunities. As populations adapt, they alter their environment in ways that reveal additional opportunities for adaptation. Despite being independent of niche construction in our model, cooperation feeds this cycle. By reaching larger densities, populations of cooperators are better able to adapt to changes in their constructed niche and successfully respond to the constant threat posed by defectors. We relate these findings to previous studies from the niche construction literature and discuss how this model could be extended to provide a greater understanding of how cooperation evolves in the complex environments in which it is found.

Sequencing of 15,622 gene-bearing BACs reveals new features of the barley genome

María Muñoz-Amatriaín , Stefano Lonardi , MingCheng Luo , Kavitha Madishetty , Jan Svensson , Matthew Moscou , Steve Wanamaker , Tao Jiang , Andris Kleinhofs , Gary Muehlbauer , Roger Wise , Nils Stein , Yaqin Ma , Edmundo Rodriguez , Dave Kudrna , Prasanna R Bhat , Shiaoman Chao , Pascal Condamine , Shane Heinen , Josh Resnik , Rod Wing , Heather N Witt , Matthew Alpert , Marco Beccuti , Serdar Bozdag , Francesca Cordero , Hamid Mirebrahim , Rachid Ounit , Yonghui Wu , Frank You , Jie Zheng , Hana Šimková , Jaroslav Doležel , Jane Grimwood , Jeremy Schmutz , Denisa Duma , Lothar Altschmied , Tom Blake , Phil Bregitzer , Laurel Cooper , Muharrem Dilbirligi , Anders Falk , Leila Feiz , Andreas Graner , Perry Gustafson , Patrick Hayes , Peggy Lemaux , Jafar Mammadov , Timothy Close
doi: http://dx.doi.org/10.1101/018978

Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, since only 6,278 BACs in the physical map were sequenced, detailed fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15,622 BACs representing the minimal tiling path of 72,052 physical mapped gene-bearing BACs. This generated about 1.7 Gb of genomic sequence containing 17,386 annotated barley genes. Exploration of the sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high rates of recombination, there are also gene-dense regions with suppressed recombination. Knowledge of these deviant regions is relevant to trait introgression, genome-wide association studies, genomic selection model development and map-based cloning strategies. Sequences and their gene and SNP annotations can be accessed and exported via http://harvest-web.org/hweb/utilmenu.wc or through the software HarvEST:Barley (download from harvest.ucr.edu). In the latter, we have implemented a synteny viewer between barley and Aegilops tauschii to aid in comparative genome analysis.

Theoretical consequences of the Mutagenic Chain Reaction for manipulating natural populations

Robert Unckless , Philipp Messer , Andrew Clark
doi: http://dx.doi.org/10.1101/018986

The use of recombinant genetic technologies for population manipulation has mostly remained an abstract idea due to the lack of a suitable means to drive novel gene constructs to high frequency in populations. Recently Gantz and Bier showed that the use of CRISPR/Cas9 technology could provide an artificial drive mechanism, the so-called Mutagenic Chain Reaction (MCR), which could lead to rapid fixation of even a deleterious introduced allele. We establish the equivalence of this system to models of meiotic drive and review the results of simple models showing that, when there is a fitness cost to the MCR allele, an internal equilibrium exists that is usually unstable. Introductions must be at a frequency above this critical point for the successful invasion of the MCR allele. These modeling results have important implications for application of MCR in natural populations.
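The invasion threshold described in this abstract can be illustrated with a toy recursion. The sketch below is not the authors' model: it assumes random mating, complete conversion of heterozygotes to drive homozygotes, and a fitness of 1 - s for every drive carrier. The function names and the value s = 0.75 are hypothetical. Under these assumptions the internal equilibrium is q* = (2s - 1)/s, which lies between 0 and 1 only when s > 1/2.

```python
def next_drive_frequency(q: float, s: float) -> float:
    """One generation of a toy homing-drive recursion.

    Assumptions (illustrative, not taken from the preprint): random mating,
    complete conversion of heterozygotes to drive homozygotes, and a fitness
    of 1 - s for every drive-carrying individual.
    """
    carriers = 1.0 - (1.0 - q) ** 2   # zygotes with at least one drive allele
    wild_type = (1.0 - q) ** 2        # zygotes with no drive allele
    mean_fitness = (1.0 - s) * carriers + wild_type
    # After conversion every surviving carrier is a drive homozygote, so the
    # drive allele frequency equals the carriers' share after selection.
    return (1.0 - s) * carriers / mean_fitness


def critical_frequency(s: float) -> float:
    """Internal equilibrium q* = (2s - 1) / s; inside (0, 1) only for s > 1/2."""
    return (2.0 * s - 1.0) / s


def simulate(q0: float, s: float, generations: int = 200) -> float:
    """Iterate the recursion from starting frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_drive_frequency(q, s)
    return q


if __name__ == "__main__":
    s = 0.75                          # hypothetical fitness cost
    print(critical_frequency(s))      # threshold frequency under this toy model
    print(simulate(0.70, s))          # released above the threshold
    print(simulate(0.60, s))          # released below the threshold
```

In this toy setting, a release above the threshold heads toward fixation and one below it is lost, which is the qualitative behaviour of an unstable internal equilibrium.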

A Chronological Atlas of Natural Selection in the Human Genome during the Past Half-million Years

Hang Zhou , Sile Hu , Rostislav Matveev , Qianhui Yu , Jing Li , Philipp Khaitovich , Li Jin , Michael Lachmann , Mark Stoneking , Qiaomei Fu , Kun Tang
doi: http://dx.doi.org/10.1101/018929

The spatiotemporal distribution of recent human adaptation is a long standing question. We developed a new coalescent-based method that collectively assigned human genome regions to modes of neutrality or to positive, negative, or balancing selection. Most importantly, the selection times were estimated for all positive selection signals, which ranged over the last half million years, penetrating the emergence of anatomically modern human (AMH). These selection time estimates were further supported by analyses of the genome sequences from three ancient AMHs and the Neanderthals. A series of brain function-related genes were found to carry signals of ancient selective sweeps, which may have defined the evolution of cognitive abilities either before Neanderthal divergence or during the emergence of AMH. Particularly, signals of brain evolution in AMH are strongly related to Alzheimer’s disease pathways. In conclusion, this study reports a chronological atlas of natural selection in Human.

Driven to Extinction: On the Probability of Evolutionary Rescue from Sex-Ratio Meiotic Drive

Robert Unckless , Andrew Clark
doi: http://dx.doi.org/10.1101/018820

Many evolutionary processes result in sufficiently low mean fitness that they pose a risk of species extinction. Sex-ratio meiotic drive was recognized by W.D. Hamilton (1967) to pose such a risk, because as the driving sex chromosome becomes common, the opposite sex becomes rare. We expand on Hamilton’s classic model by allowing for the escape from extinction due to evolution of suppressors of X and Y drivers. We explore differences in the two systems in their probability of escape from extinction. Several novel conclusions are evident, including a) that extinction time scales approximately with the log of population size so that even large populations may go extinct quickly, b) extinction risk is driven by the relationship between female fecundity and drive strength, c) anisogamy and the fact that X and Y drive result in sex ratios skewed in opposite directions, mean systems with Y drive are much more likely to go extinct than those with X drive, and d) suppressors are most likely to become established when the strength of drive is intermediate, since weak drive leads to weak selection for suppression and strong drive leads to rapid extinction.

Author post: Rapid antibiotic resistance predictions from genome sequence data for S. aureus and M. tuberculosis

This guest post is by Zamin Iqbal [@ZaminIqbal] and Phelim Bradley [@Phelimb]

Our paper “Rapid antibiotic resistance predictions from genome sequence data for S. aureus and M. tuberculosis” has just appeared on bioRxiv. We’re excited about it for a number of reasons.

The idea of using a graph of genetic variation as a reference, instead of a linear genome, has been discussed for some time, and in fact a previous bioRxiv preprint of ours applying it to the MHC has just come out in Nature Genetics:
http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3257.html

In this paper we apply those ideas to bacteria, where we let go of the linear coordinate system in order to handle plasmid-mediated genes. Our idea is simple – we want to see if genomic data can be used to predict antibiotic resistance in bacteria, and we explicitly want to build a general framework that will extend to many species, and handle mixed infections.

The paper does not deal with the issue of discovering mechanisms/mutations/genes which drive drug resistance – we take a set of geno-pheno rules as prerequisite, and then use a graph of resistance mutations and genes on different genetic backgrounds to detect presence of alleles and compare statistical models – is the population clonal susceptible, minor resistant or major resistant? Although it is accepted that minor alleles can sweep to fixation, in general there is neither consensus nor quantitative data on the correlation between allele frequency and in vitro phenotypic resistance or patient outcome (the latter obviously being much harder). At a practical level, in some cases a clinician might avoid a drug if they knew there was a 5%-frequency resistance allele, and in others they might increase the dose. Resistance is of course a quantitative trait, often measured in terms of the minimum concentration of a drug required to stop growth of a fixed inoculum – but commonly a threshold is drawn and samples are classified in a binary fashion.

A paper last year from some of us (http://jcm.asm.org/content/52/4/1182.full) showed that a simple panel of SNPs and genes was enough to predict resistance with high sensitivity and specificity for S. aureus (where SNPs, indels, chromosomal genes and plasmid-mediated genes can all cause resistance) – once you discard all samples with any mixed strains. (Standard process is to take a patient sample and culture “overnight” (12-24 hours), thus removing almost all diversity; samples which show any morphological signs of diversity after culture are discarded or subcultured.) By contrast, for M. tuberculosis (which causes TB), known resistance mutations explain a relatively low proportion of phenotypic resistance (~85%) for first-line drugs, and even less for 2nd line (I explain below what 1st/2nd line are). The Mtb population within-host is highly structured and multiple genotypes can evolve in different loci within the body, so it’s important to be able to deal with mixtures. Typical phenotyping relies on several weeks of solid culture (Mtb is slow growing), but mixtures are more able to survive this type of culture than in the case of S. aureus.

We show with simulations that we can use the graph to detect low frequency mutations and genes (no surprise), and that for S. aureus we make no minor calls for our validation set of ~500 blood-cultured samples (no surprise). Each sample is phenotyped with 2 standard lab methods, and where they disagree a higher quality test is used to arbitrate. This consensus allows us to estimate error rates both for our method (called Mykrobe predictor) and for the phenotypic tests. As a result we’re able to show not only that we do comparably with FDA requirements for a diagnostic, but also that we match or beat the common phenotypic tests.
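To make the arbitration step concrete, here is a minimal sketch of how a consensus phenotype and the resulting sensitivity and specificity could be computed. It is not the code used in the paper; the function names and the four toy samples are invented purely for illustration, and the sketch assumes the reference calls contain at least one resistant ("R") and one susceptible ("S") sample.

```python
def consensus_phenotype(method_a: str, method_b: str, arbiter: str) -> str:
    """Return the agreed call, or the arbiter's call when the two methods disagree."""
    return method_a if method_a == method_b else arbiter


def sensitivity_specificity(predicted, truth):
    """Sensitivity and specificity of 'R'/'S' predictions against reference calls."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == "R" and t == "R" for p, t in pairs)  # resistant, called resistant
    tn = sum(p == "S" and t == "S" for p, t in pairs)  # susceptible, called susceptible
    fn = sum(p == "S" and t == "R" for p, t in pairs)  # resistant, missed
    fp = sum(p == "R" and t == "S" for p, t in pairs)  # susceptible, over-called
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration: (method A, method B, arbiter) per sample.
lab_calls = [("R", "R", "R"), ("R", "S", "R"), ("S", "S", "S"), ("S", "R", "S")]
truth = [consensus_phenotype(a, b, c) for a, b, c in lab_calls]
genotype_calls = ["R", "R", "S", "R"]  # hypothetical predictions from sequence data

print(sensitivity_specificity(genotype_calls, truth))  # (1.0, 0.5)
```

The same function can score either lab method against the consensus, which is how error rates for the phenotypic tests and for the genotype-based predictions can be put on the same footing.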

On the other hand for TB, the story is much more complex and interesting. We analyse ~3500 genomes in total, split into ~1900 training samples and ~1600 for validation. For M. tuberculosis, a sample is classed as resistant if after some weeks of culturing under drug pressure, the number of surviving colonies is >1% of the number of colonies from a control strain treated identically – the number 1% is of course arbitrary (set down by Canetti in the 1960s I think), though it has been shown that phenotypic resistance does correlate with worse patient outcome. Sequencing on the other hand is done before the drug pressure, so we are fundamentally testing a different population, and we can’t simply mirror that 1% allele frequency expectation. This is what we use the 1900 training samples for – determining what frequency to set for our minor-resistant model. We ended up using 20%, and also found that there was an appreciable amount of lower frequency resistance, which did not survive the 6-week drug-pressure susceptibility test, but which might cause resistance in a patient.
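As a rough illustration of what comparing the three models (clonal susceptible, minor resistant, major resistant) can look like, here is a minimal sketch. It is not the Mykrobe predictor implementation: it assumes a simple binomial model on read counts at a single resistance allele, and the function names and the 1% sequencing error rate are hypothetical. Only the 20% minor frequency comes from the text above.

```python
import math


def binomial_log_likelihood(k: int, n: int, p: float) -> float:
    """Log-probability of k resistant-allele reads out of n under Binomial(n, p)."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def classify(resistant_reads: int, total_reads: int,
             minor_frequency: float = 0.20, error_rate: float = 0.01) -> str:
    """Pick the best-supported of three models for one resistance allele.

    error_rate is an assumed per-read error rate, not a value from the paper.
    """
    expected_fraction = {
        "susceptible": error_rate,             # resistant reads arise only from errors
        "minor resistant": minor_frequency,    # resistant subpopulation at 20%
        "major resistant": 1.0 - error_rate,   # essentially clonal resistant
    }
    scores = {
        model: binomial_log_likelihood(resistant_reads, total_reads, p)
        for model, p in expected_fraction.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for reads in (0, 10, 50):
        print(reads, classify(reads, 50))
```

With 50 reads covering the site, 0 resistant reads favours the susceptible model, 10 favours minor resistant and 50 favours major resistant. A real caller would also need to handle coverage variation and multiple alleles, which this sketch ignores.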

Mtb infections can last a long time, and despite their slow growth, the sheer number of bacilli in a host results in a vast in-host diversity. As a result, mono therapies fail, as resistant strains sweep to fixation – standard treatment is therefore with 4 “first-line” drugs, reducing the chance that any strain has enough mutations to resist them all. If the first-line drugs fail, or if the strain is known to be resistant, then it is necessary to fall back to more toxic and less effective second-line drugs. We found, somewhat to our surprise, that:

1. Overall, minor alleles contribute very little to phenotypic resistance in first-line drugs, but they do make a significant contribution to second-line drugs, improving predictive power by >15%. This matches previous reports that patient samples had mixed R and S alleles for 2nd line drugs. This could have major public health consequences, as resistance to these drugs needs to be detected to distinguish MDR-TB (resistant to isoniazid, rifampicin) from XDR-TB (isoniazid, rifampicin + second-line), a major concern for the WHO.

2. Interestingly, a noticeable number of rifampicin false-positive calls were due to SNPs which confer resistance but have been shown to slow growth. Since the phenotyping test is intrinsically a measure of relative growth, these strains may be misclassified as susceptible – i.e. these are probably false-susceptible calls by the phenotypic test, an artefact of the nature of the test. This has been reported before, by the way.

Anyway – please check out the paper for details. We think this large-scale analysis of whether minor alleles contribute to in vitro phenotype, and whether they should be used for prediction, is new and interesting both scientifically and in terms of translation. The bigger question is what the consequences are for patient outcome, and how to deal with in-host diversity, and for that we of course need data collection and sharing. We’ve spent a lot of time in the Oxford John Radcliffe Hospital working with clinicians, and trying to determine what information they really need from this kind of predictive test, and we’ve produced Windows and Mac apps with very simple user-interfaces (drag-the-fastq on, and let it run) for them to use; we’ve also produced an Illumina Basespace app, currently submitted to Illumina for approval, which should enable automated cloud-use.

Our paper also has a whole bunch of work I’ve not mentioned here, where we needed to identify species, and detect contaminants – most interesting when common contaminants can contain the same resistance gene as the species under test.

Our software is up on GitHub here
https://github.com/iqbal-lab/Mykrobe-predictor
including some desktop apps and example fastq files so you can test it.

Comments very welcome!

Zam and Phelim

PS By the way, the 4 first-line drugs have different effectiveness in different body compartments – see this interesting paper for the modelling of the consequences: http://biorxiv.org/content/early/2014/12/19/013003.