Genetic loci with parent of origin effects cause hybrid seed lethality between Mimulus species

Austin G Garner, Amanda M Kenney, Lila Fishman, Andrea L Sweigart
doi: http://dx.doi.org/10.1101/022863

The classic finding in both flowering plants and mammals that hybrid lethality often depends on parent-of-origin effects suggests that divergence in the underlying loci might be an important source of hybrid incompatibilities between species. In flowering plants, there is now good evidence from diverse taxa that seed lethality arising from interploidy crosses is often caused by endosperm defects associated with deregulated imprinted genes. A similar seed lethality phenotype occurs in many crosses between closely related diploid species, but the genetic basis of this form of early-acting F1 postzygotic reproductive isolation is largely unknown. Here, we show that F1 hybrid seed lethality is an exceptionally strong isolating barrier between two closely related Mimulus species, M. guttatus and M. tilingii, with reciprocal crosses producing less than 1% viable seeds. Using a powerful crossing design and high-resolution genetic mapping, we identify both maternally and paternally derived loci that contribute to hybrid seed incompatibility. Strikingly, these two sets of loci are largely non-overlapping, providing strong evidence that genes with parent-of-origin effects are the primary driver of F1 hybrid seed lethality between M. guttatus and M. tilingii. We find a highly polygenic basis for both parental components of hybrid seed lethality, suggesting that multiple incompatibility loci have accumulated to cause strong postzygotic isolation between these closely related species. Our genetic mapping experiment also reveals hybrid transmission ratio distortion and chromosomal differentiation, two additional correlates of functional and genomic divergence between species.

Early modern human dispersal from Africa: genomic evidence for multiple waves of migration

Francesca Tassi, Silvia Ghirotto, Massimo Mezzavilla, Sibelle Torres Vilaça, Lisa De Santi, Guido Barbujani
doi: http://dx.doi.org/10.1101/022889

Background. Anthropological and genetic data agree in indicating the African continent as the main place of origin for modern humans. However, it is unclear whether early modern humans left Africa through a single, major process, dispersing simultaneously over Asia and Europe, or in two main waves, first through the Arabian Peninsula into Southern Asia and Oceania, and later through a Northern route crossing the Levant. Results. Here we show that accurate genomic estimates of the divergence times between European and African populations are more recent than those between Australo-Melanesia and Africa, and incompatible with the effects of a single dispersal. This difference cannot possibly be accounted for by the effects of hybridization with archaic human forms in Australo-Melanesia. Furthermore, in several populations of Asia we found evidence for relatively recent genetic admixture events, which could have obscured the signatures of the earliest processes. Conclusions. We conclude that the hypothesis of a single major human dispersal from Africa appears hardly compatible with the observed historical and geographical patterns of genome diversity, and that Australo-Melanesian populations still seem to retain a genomic signature of a more ancient divergence from Africa.

A comparative study of SVDquartets and other coalescent-based species tree estimation methods

Jed Chou, Ashu Gupta, Shashank Yaduvanshi, Ruth Davidson, Mike Nute, Siavash Mirarab, Tandy Warnow
doi: http://dx.doi.org/10.1101/022855

Background: Species tree estimation is challenging in the presence of incomplete lineage sorting (ILS), which can make gene trees different from the species tree. Because ILS is expected to occur and the standard concatenation approach can return incorrect trees with high support in the presence of ILS, “coalescent-based” summary methods (which first estimate gene trees and then combine gene trees into a species tree) have been developed that have theoretical guarantees of robustness to arbitrarily high amounts of ILS. Some studies have suggested that summary methods should only be used on “c-genes” (i.e., recombination-free loci) that can be extremely short (sometimes fewer than 100 sites). However, gene trees estimated on short alignments can have high estimation error, and summary methods tend to have high error on short c-genes. To address this problem, Chifman and Kubatko introduced SVDquartets, a new coalescent-based method. SVDquartets takes multi-locus unlinked single-site data, infers the quartet trees for all subsets of four species, and then combines the set of quartet trees into a species tree using a quartet amalgamation heuristic. Yet, the accuracy of SVDquartets relative to leading coalescent-based methods has not been assessed. Results: We compared SVDquartets to two leading coalescent-based methods (ASTRAL-2 and NJst), and to concatenation using maximum likelihood. We used a collection of simulated datasets, varying ILS levels, numbers of taxa, and number of sites per locus. Although SVDquartets was sometimes more accurate than ASTRAL-2 and NJst, most often the best results were obtained using ASTRAL-2, even on the shortest gene sequence alignments we explored (with only 10 sites per locus). Finally, concatenation was the most accurate of all methods under low ILS conditions. Conclusions: ASTRAL-2 generally had the best accuracy under higher ILS conditions, and concatenation had the best accuracy under the lowest ILS conditions. However, SVDquartets was competitive with the best methods under conditions with low ILS and small numbers of sites per locus. The good performance under many conditions of ASTRAL-2 in comparison to SVDquartets is surprising given the known vulnerability of ASTRAL-2 and similar methods to short gene sequences.
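
To give a feel for the "all subsets of four species" step mentioned in the abstract, here is a minimal Python sketch that enumerates every four-taxon subset together with its three possible unrooted resolved topologies. It only illustrates the size of the search space; it does not score the topologies or amalgamate them, and the function name and taxon labels are hypothetical, not taken from SVDquartets itself.

```python
from itertools import combinations
from math import comb


def quartet_topologies(taxa):
    """Yield (subset, splits) for every 4-taxon subset of `taxa`.

    A subset {a, b, c, d} has exactly three unrooted resolved
    topologies, written here as the splits ab|cd, ac|bd and ad|bc.
    """
    for a, b, c, d in combinations(sorted(taxa), 4):
        yield (a, b, c, d), [
            ((a, b), (c, d)),
            ((a, c), (b, d)),
            ((a, d), (b, c)),
        ]


# Hypothetical taxon labels, for illustration only.
taxa = ["A", "B", "C", "D", "E"]
quartets = list(quartet_topologies(taxa))

# The number of quartets grows as "n choose 4".
print(len(quartets), comb(len(taxa), 4))
```

A quartet-based method has to pick one of the three splits for each subset, so the work grows roughly with the fourth power of the number of taxa.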

A tree metric using structure and length to capture distinct phylogenetic signals

Michelle Kendall, Caroline Colijn
Subjects: Populations and Evolution (q-bio.PE)

Phylogenetic trees are a central tool in understanding evolution. They are typically inferred from sequence data, and capture evolutionary relationships through time. It is essential to be able to compare trees from different data sources (e.g. several genes from the same organisms) and different inference methods. We propose a new metric for robust, quantitative comparison of rooted, labeled trees. It enables clear visualizations of tree space, gives meaningful comparisons between trees, and can detect distinct islands of tree topologies in posterior distributions of trees. This makes it possible to select well-supported summary trees. We demonstrate our approach on Dengue fever phylogenies.
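
The abstract does not spell out how the metric is built, so the following Python sketch is only one plausible construction, stated here as an assumption: each rooted tree is turned into a vector holding, for every pair of tips, a blend of the number of edges and the path length from the root to the pair's most recent common ancestor, followed by one entry per pendant edge; the distance is the Euclidean distance between two such vectors. The tuple encoding of trees, the function names, and the weight `lam` are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def tip_records(tree):
    """Return {tip: (ancestors, pendant_length)}.

    A node is (content, edge_length): content is a tip name (str) or a
    tuple of child nodes. `ancestors` lists the internal nodes above a
    tip, root first, as (node_id, edges_from_root, distance_from_root).
    """
    records = {}
    counter = [0]

    def walk(node, edges, dist, ancestors):
        content, length = node
        if isinstance(content, str):          # tip: store its pendant edge
            records[content] = (ancestors, length)
            return
        node_id = counter[0]
        counter[0] += 1
        here = ancestors + [(node_id, edges, dist)]
        for child in content:
            walk(child, edges + 1, dist + child[1], here)

    walk(tree, 0, 0.0, [])
    return records


def tree_vector(tree, lam):
    """Blend topology (lam = 0) and branch lengths (lam = 1) into one vector."""
    rec = tip_records(tree)
    tips = sorted(rec)
    vec = []
    for a, b in combinations(tips, 2):
        # The deepest ancestor shared by both tips is their MRCA.
        shared = [x for x, y in zip(rec[a][0], rec[b][0]) if x[0] == y[0]]
        _, n_edges, path_len = shared[-1]
        vec.append((1 - lam) * n_edges + lam * path_len)
    for t in tips:                            # one entry per pendant edge
        vec.append((1 - lam) * 1 + lam * rec[t][1])
    return vec


def tree_distance(t1, t2, lam=0.0):
    """Euclidean distance between the vectors of two trees on the same tips."""
    v1, v2 = tree_vector(t1, lam), tree_vector(t2, lam)
    return sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))


# Two hypothetical four-tip trees with unit branch lengths.
A, B, C, D = ("A", 1.0), ("B", 1.0), ("C", 1.0), ("D", 1.0)
t1 = ((((A, B), 1.0), ((C, D), 1.0)), 0.0)    # ((A,B),(C,D))
t2 = ((((A, C), 1.0), ((B, D), 1.0)), 0.0)    # ((A,C),(B,D))
print(tree_distance(t1, t1), tree_distance(t1, t2))
```

Because the vector depends on the root position, two trees with the same unrooted shape but different roots are generally a nonzero distance apart, which fits the abstract's focus on rooted, labeled trees.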

Monoallelic methylation and allele specific expression in a social insect

Kate D Lee, Zoe N Lonsdale, Maria Kyriakidou, Despina Nathanael, Harindra E Amarasinghe, Eamonn B Mallon
doi: http://dx.doi.org/10.1101/022657

Social insects are emerging models for epigenetics. Here we examine the link between monoallelic methylation and monoallelic expression in the bumblebee Bombus terrestris using whole methylome and transcriptome analysis. We found nineteen genes displaying monoallelic methylation and expression. They were enriched for functions related to social organisation in the social insects. These are the biological processes that evolutionary theory predicts to involve imprinting.

Inference of super-exponential human population growth via efficient computation of the site frequency spectrum for generalized models

Feng Gao, Alon Keinan
doi: http://dx.doi.org/10.1101/022574

The site frequency spectrum (SFS) and other genetic summary statistics are at the heart of many population genetics studies. Previous studies have shown that human populations have undergone a recent epoch of fast growth in effective population size. These studies assumed that growth is exponential, and the ensuing models leave unexplained an excess of extremely rare variants. This suggests that human populations might have experienced recent growth at a speed faster than exponential. Recent studies have introduced a generalized growth model where the growth speed can be faster or slower than exponential. However, only simulation approaches were available for obtaining summary statistics under such models. In this study, we provide expressions to accurately and efficiently evaluate the SFS and other summary statistics under generalized models, which we further implement in publicly available software. Investigating the power to infer deviation of growth from being exponential, we observed that decent sample sizes facilitate accurate inference, e.g. a sample of 3000 individuals with the amount of data expected from exome sequencing allows observing and accurately estimating growth with speed deviating by 10% or more from that of exponential. Applying our inference framework to data from the NHLBI Exome Sequencing Project, we found that a model with a generalized growth epoch fits the observed SFS significantly better than the equivalent model with exponential growth (p-value = 3.85 × 10^-6). The estimated growth speed significantly deviates from exponential (p-value << 10^-12), with the best-fit estimate being of growth speed 12% faster than exponential.
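
For readers unfamiliar with the SFS, the short Python sketch below shows how it is tallied from a sample and how it compares with the standard constant-size neutral expectation, under which the expected count of sites with i derived copies is proportional to 1/i. It does not implement the generalized growth expressions from the paper; the function names and the toy haplotype matrix are invented for illustration.

```python
def site_frequency_spectrum(haplotypes):
    """Unfolded SFS for n sampled chromosomes.

    `haplotypes` is a list of equal-length 0/1 lists (1 = derived allele).
    Entry i - 1 of the result counts sites where exactly i chromosomes
    carry the derived allele, for i = 1 .. n - 1.
    """
    n = len(haplotypes)
    counts = [0] * (n + 1)
    for site in range(len(haplotypes[0])):
        derived = sum(h[site] for h in haplotypes)
        counts[derived] += 1
    return counts[1:n]          # drop the monomorphic classes 0 and n


def neutral_proportions(n):
    """Expected SFS proportions under constant size: weight 1/i for class i."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy data (made up): 4 chromosomes, 5 sites.
toy = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
print(site_frequency_spectrum(toy))
print(neutral_proportions(4))
```

An excess in the first entry (singletons) relative to the 1/i expectation is the kind of rare-variant signal that growth models, exponential or otherwise, try to account for.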

Investigating the Evolutionary Importance of Denisovan Introgressions in Papua New Guineans and Australians

Ya Hu, Qiliang Ding, Yi Wang, Shuhua Xu, Yungang He, Minxian Wang, Jiucun Wang, Li Jin
doi: http://dx.doi.org/10.1101/022632

Previous research reported that Papua New Guineans (PNG) and Australians contain introgressions from Denisovans. Here we present a genome-wide analysis of Denisovan introgressions in PNG and Australians. We first developed a two-phase method to detect Denisovan introgressions from whole-genome sequencing data. This method has relatively high detection power (79.74%) and a low false positive rate (2.44%) based on simulations. Using this method, we identified 1.34 Gb of Denisovan introgressions from sixteen PNG and four Australian genomes, in which we identified 38,877 Denisovan introgressive alleles (DIAs). We found that 78 Denisovan introgressions were under positive selection. Genes located in the 78 introgressions are related to evolutionarily important functions, such as spermatogenesis, fertilization, cold acclimation, circadian rhythm, development of brain, neural tube, face, and olfactory pit, immunity, etc. We also found that 121 DIAs are missense. Genes harboring the 121 missense DIAs are also related to evolutionarily important functions, such as female pregnancy, development of face, lung, heart, skin, nervous system, and male gonad, visual and smell perception, response to heat, pain, hypoxia, and UV, lipid transport, metabolism, blood coagulation, wound healing, aging, etc. Taken together, this study suggests that Denisovan introgressions in PNG and Australians are evolutionarily important, and may help PNG and Australians in local adaptation. In this study, we also proposed a method that could efficiently identify archaic hominin introgressions in modern non-African genomes.

The distribution and impact of common copy-number variation in the genome of the domesticated apple, Malus x domestica Borkh.

James Boocock, David Chagné, Tony R Merriman, Mik Black
doi: http://dx.doi.org/10.1101/021857

Background: Copy number variation (CNV) is a common feature of eukaryotic genomes, and a growing body of evidence suggests that genes affected by CNV are enriched in processes that are associated with environmental responses. Here we use next generation sequence (NGS) data to detect copy-number variable regions (CNVRs) within the Malus x domestica genome, as well as to examine their distribution and impact. Methods: CNVRs were detected using NGS data derived from 30 accessions of M. x domestica analysed using the read-depth method, as implemented in the CNVrd2 software. To improve the reliability of our results, we developed a quality control and analysis procedure that involved checking for organelle DNA, not repeat masking, and the determination of CNVR identity using a permutation testing procedure. Results: Overall, we identified 876 CNVRs, which spanned 3.5% of the apple genome. To verify that detected CNVRs were not artefacts, we analysed the B-allele frequencies (BAF) within a SNP array dataset derived from a screening of 185 individual apple accessions and found the CNVRs were enriched for SNPs having aberrant BAFs (P < 1e-13, Fisher’s exact test). Putative CNVRs overlapped 845 gene models and were enriched for resistance (R) genes (P < 1e-22, Fisher’s exact test). Of note is a cluster of resistance genes on chromosome 2 near a region containing multiple major gene loci conferring resistance to apple scab. Conclusion: We present the first analysis and catalogue of CNVRs in the M. x domestica genome. The enrichment of the CNVRs with R genes and their overlap with gene loci of agricultural significance draw attention to a form of unexplored genetic variation in apple. This research will underpin further investigation of the role that CNV plays within the apple genome.
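
The enrichment claims in this abstract rest on Fisher's exact test. As a rough illustration of what such a test computes, the Python sketch below evaluates the one-sided (enrichment) p-value for a 2x2 table as a hypergeometric tail probability. The function name is hypothetical and the counts in the example are invented; they are not the apple data, and the abstract does not say whether the authors used a one-sided or two-sided test.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows could be "inside CNVR" / "outside CNVR" and columns
    "R gene" / "other gene". Returns P(X >= a) when the row and
    column totals are held fixed (hypergeometric null).
    """
    row1 = a + b                # e.g. genes inside CNVRs
    col1 = a + c                # e.g. all R genes
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(n, col1)


# Invented counts, for illustration only.
p = fisher_exact_greater(3, 1, 1, 3)
print(p)
```

A small p-value means that seeing at least `a` R genes inside CNVRs would be unlikely if R genes were spread across the genome without regard to CNVRs.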

Accelerating Scientific Publication in Biology

Ronald D Vale
doi: http://dx.doi.org/10.1101/022368

Scientific publications enable results and ideas to be transmitted throughout the scientific community. The number and type of journal publications also have become the primary criteria used in evaluating career advancement. Our analysis suggests that publication practices have changed considerably in the life sciences over the past thirty years. Considerably more experimental data is now required for publication, and the average time required for graduate students to publish their first paper has increased and is approaching the desirable duration of Ph.D. training. Since publication is generally a requirement for career progression, schemes to reduce the time of graduate student and postdoctoral training may be difficult to implement without also considering new mechanisms for accelerating communication of their work. The increasing time to publication also delays potential catalytic effects that ensue when many scientists have access to new information. The time has come for life scientists, funding agencies, and publishers to discuss how to communicate new findings in a way that best serves the interests of the public and the scientific community.

Haplotypes of common SNPs can explain missing heritability of complex diseases

Gaurav Bhatia, Alexander Gusev, Po-Ru Loh, Bjarni J Vilhjálmsson, Stephan Ripke, Shaun Purcell, Eli Stahl, Mark Daly, Teresa R de Candia, Kenneth S Kendler, Michael C O’Donovan, Sang Hong Lee, Naomi R Wray, Benjamin M Neale, Matthew C Keller, Noah A Zaitlen, Bogdan Pasaniuc, Jian Yang, Alkes L Price, Schizophrenia Working Group Psychiatric Genomics C
doi: http://dx.doi.org/10.1101/022418

While genome-wide significant associations generally explain only a small proportion of the narrow-sense heritability of complex disease (h2), recent work has shown that more heritability is explained by all genotyped SNPs (hg2). However, much of the heritability is still missing (hg2 [...] 0.1% explained substantially more phenotypic variance (hhap2 = 0.64 (S.E. 0.084)) than genotyped SNPs alone (hg2 = 0.32 (S.E. 0.029)). These estimates were based on cross-cohort comparisons, ensuring that cohort-specific assay artifacts did not contribute to our estimates. In a large multiple sclerosis data set (WTCCC2-MS), we observed an even larger difference between hhap2 and hg2, though data from other cohorts will be required to validate this result. Overall, our results suggest that haplotypes of common SNPs can explain a large fraction of missing heritability of complex disease, shedding light on genetic architecture and informing disease mapping strategies.