Pollen-specific genes accumulate more deleterious mutations than sporophytic genes under relaxed purifying selection in Arabidopsis thaliana.

Mark Christian Harrison , Eamonn B Mallon , Dave Twell , Robert L Hammond
doi: http://dx.doi.org/10.1101/016626

The strength of purifying selection varies among loci and leads to differing frequencies of deleterious alleles within genomes. Selection is generally stronger for highly and broadly expressed genes but can be less efficient against deleterious alleles that are expressed in the diploid phase and heterozygous. In plants, expression level, tissue specificity and ploidy level differ between pollen-specific and sporophyte-specific genes. This may explain why the reported strength and direction of the relationship between selection and the specificity of a gene to either pollen or sporophytic tissues vary between studies and species. In this study, we investigate the individual effects of expression level and tissue specificity on selection efficacy within pollen genes and sporophytic genes of Arabidopsis thaliana. Due to high homozygosity levels caused by selfing, masking is expected to play a lesser role. We find that expression level and tissue specificity independently influence selection in A. thaliana. Furthermore, contrary to expectations, pollen genes are evolving faster due to relaxed purifying selection and have accumulated a higher frequency of deleterious alleles. This suggests that high homozygosity levels resulting from high selfing rates reduce the effects of pollen competition and masking in A. thaliana, so that the high tissue specificity and expression noise of pollen genes lead to lower selection efficacy compared to sporophyte genes.

Sex chromosome dosage compensation in Heliconius butterflies: global yet still incomplete?

James R Walters , Thomas J Hardcastle , Chris Jiggins
doi: http://dx.doi.org/10.1101/016675

The evolution of heterogametic sex chromosomes is often, but not always, accompanied by the evolution of dosage compensating mechanisms that mitigate the impact of sex-specific gene dosage on levels of gene expression. One emerging view of this process is that such mechanisms may only evolve in male-heterogametic (XY) species but not in female-heterogametic (ZW) species, which will consequently exhibit “incomplete” sex chromosome dosage compensation. However, some recent results from moths suggest that Lepidoptera (moths and butterflies) may prove to be an exception to this prediction. Here we report an analysis of sex chromosome dosage compensation in Heliconius butterflies, sampling multiple individuals for several different adult tissues (head, abdomen, leg, mouth, and antennae). Methodologically, we introduce a novel application of linear mixed-effects models to assess dosage compensation, offering a unified statistical framework that can estimate effects specific to chromosome, to sex, and their interactions (i.e., a dosage effect). Our results show substantially reduced Z-linked expression relative to autosomes in both sexes, as previously observed in bombycoid moths. This observation is consistent with an increasing body of evidence that at least some species of moths and butterflies possess an epigenetic sex chromosome dosage compensating mechanism that operates by reducing Z chromosome expression in males. However, this mechanism appears to be imperfect in Heliconius, resulting in a modest dosage effect that produces an average 5-20% male bias on the Z chromosome, depending on the tissue. Strong sex chromosome dosage effects have previously been reported in a pyralid moth. Thus our results reflect a mixture of previous patterns reported for Lepidoptera and cut across the emerging view that female-heterogametic ZW taxa have incomplete dosage compensation because they lack a chromosome-wide epigenetic mechanism mediating sex chromosome dosage compensation. In the case of Heliconius, sex chromosome dosage effects apparently persist despite such a mechanism. We also analyze chromosomal distributions of sex-biased genes and show an excess of male-biased and a dearth of female-biased genes on the Z chromosome relative to autosomes, consistent with predictions of sexually antagonistic evolution.

Introgression obscures and reveals historical relationships among the American live oaks

Deren Eaton , Antonio Gonzalez-Rodriguez , Andrew Hipp , Jeannine Cavender-Bares
doi: http://dx.doi.org/10.1101/016238

Introgressive hybridization challenges the concepts we use to define species and our ability to infer their evolutionary relationships. Methods for inferring historical introgression from the genomes of extant species are now widely used; however, few guidelines have been articulated for how best to interpret their results. Because these tests are inherently comparative, we show that they are sensitive to the effects of missing data (unsampled species) and to non-independence (hierarchical relationships among species). We demonstrate this using genomic RAD data sampled from populations across the geographic ranges of all extant species in the American live oaks (Quercus series Virentes), a group notorious for hybridization. By considering all species in the clade, and their phylogenetic relationships, we were able to distinguish true hybridizing lineages from those that falsely appear admixed due to phylogenetic structure among hybridizing relatives. Six of seven species show evidence of admixture, often with multiple other species, but this can be explained by hybrid introgression among a few related lineages where they occur in close proximity. We identify the Cuban oak as a highly admixed lineage and use an information-theoretic model comparison approach to test alternative scenarios for its origin. Hybrid speciation is a poor fit compared to a model in which a population from Central America colonized Cuba and received subsequent gene flow from Florida. The live oaks form a continuous ring-like distribution around the Gulf of Mexico, connected in Cuba, across which they could effectively exchange alleles. However, introgression appears to remain localized to areas of sympatry, suggesting that oak species boundaries and their geographic ranges have remained relatively stable over evolutionary time.

Two variance component model improves genetic prediction in family data sets

George Tucker , Po-Ru Loh , Iona M MacLeod , Ben J Hayes , Michael E Goddard , Bonnie Berger , Alkes L Price
doi: http://dx.doi.org/10.1101/016618

Genetic prediction based on either identity by state (IBS) sharing or pedigree information has been investigated extensively using Best Linear Unbiased Prediction (BLUP) methods. However, methods to combine IBS sharing and pedigree information for genetic prediction in humans have not been explored. We introduce a two variance component model for genetic prediction: one component for IBS sharing and one for approximate pedigree structure, both estimated using genetic markers. In simulations using real genotypes from CARe and FHS family cohorts, we demonstrate that the two variance component model achieves gains in prediction r² over standard BLUP at current sample sizes, and we project based on simulations that these gains will continue to hold at larger sample sizes. Accordingly, in analyses of four quantitative phenotypes from CARe and two quantitative phenotypes from FHS, the two variance component model significantly improves prediction r² in each case, with up to a 16% relative improvement. We also find that standard mixed model association tests can produce inflated test statistics in datasets with related individuals, whereas the two variance component model corrects for inflation.
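
For orientation, the model described in this abstract can be written in standard mixed-model notation. This is a sketch under assumed notation, not necessarily the authors' exact formulation: the symbols K_IBS, K_ped and the variance parameters are labels chosen here, and the phenotype is assumed to be mean-centered.

```latex
% Assumed notation (not taken from the abstract):
%   y                = mean-centered phenotype vector
%   K_IBS, K_ped     = marker-derived relationship matrices for the two components
%   \sigma^2 terms   = variance components, \varepsilon = residual noise
\begin{align}
  y &= g_{\mathrm{IBS}} + g_{\mathrm{ped}} + \varepsilon, \\
  \operatorname{Var}(y) = V &= \sigma^2_{\mathrm{IBS}} K_{\mathrm{IBS}}
      + \sigma^2_{\mathrm{ped}} K_{\mathrm{ped}} + \sigma^2_{e} I, \\
  \hat{y}_{\mathrm{test}} &= \left( \sigma^2_{\mathrm{IBS}} K_{\mathrm{IBS}}^{\mathrm{test,train}}
      + \sigma^2_{\mathrm{ped}} K_{\mathrm{ped}}^{\mathrm{test,train}} \right)
      V_{\mathrm{train}}^{-1} \, y_{\mathrm{train}}.
\end{align}
```

In this notation, a single-component BLUP corresponds to dropping the pedigree term, so any gain in prediction accuracy comes from letting the second component carry its own variance.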

Calculating the Human Mutation Rate by Using a NUMT from the Early Oligocene

Ian Logan
doi: http://dx.doi.org/10.1101/016428

As the number of whole genomes available for study increases, so also does the opportunity to find unsuspected features hidden within our genetic code. One such feature allows an estimate of the Human Mutation Rate in human chromosomes to be made. A NUMT is a small fragment of the mitochondrial DNA that enters the nucleus of a cell, gets captured by a chromosome and is thereafter passed on from generation to generation. Over the millions of years of evolution, this unexpected phenomenon has happened many times, but it is usually very difficult to say just when a NUMT might have been created. However, this paper presents evidence to show that for one particular NUMT the date of formation was around 29 million years ago, which places the event in the Early Oligocene, when our ancestors were small monkey-like creatures. So now all of us carry this NUMT in each of our cells, as do Old World Monkeys, the Great Apes and our nearest relations, the Chimpanzees. The estimate of the Human Mutation Rate obtained by the method outlined here gives a value which is higher than has been generally found; but this new value perhaps only applies to non-coding regions of the Human genome where there is little, if any, selection pressure against new mutations.
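
The abstract does not spell out the calculation, so the following is only a generic sketch of how a per-site rate could be derived from a dated insertion: divide the substitutions accumulated on one lineage by the number of sites and the elapsed time. The function names and the example counts are hypothetical; only the 29-million-year date comes from the abstract.

```python
def substitutions_per_site_per_year(n_substitutions, n_sites, years):
    """Rate = substitutions accumulated on one lineage / (sites * elapsed years).

    Assumes the inserted sequence can be reconstructed, so that differences
    are counted along a single lineage since the insertion.
    """
    if n_sites <= 0 or years <= 0:
        raise ValueError("n_sites and years must be positive")
    return n_substitutions / (n_sites * years)


def per_generation(rate_per_year, generation_years):
    """Convert a per-year rate to a per-generation rate (generation time assumed)."""
    return rate_per_year * generation_years


if __name__ == "__main__":
    # Hypothetical counts for illustration only; 29e6 years is the date in the abstract.
    rate = substitutions_per_site_per_year(n_substitutions=29, n_sites=1000, years=29e6)
    print(rate, per_generation(rate, generation_years=25))
```

Any real estimate would also need to account for alignment gaps, multiple hits at the same site, and uncertainty in the insertion date.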

LINKS: Scaffolding genome assemblies with kilobase-long nanopore reads

Rene L Warren , Benjamin P Vandervalk , Steven JM Jones , Inanc Birol
doi: http://dx.doi.org/10.1101/016519

Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. Established and emerging long read technologies show great promise in this regard, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value. We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a solution that makes use of the information in error-rich long reads, without the need for read alignment or base correction. We show how the contiguity of an ABySS E. coli K-12 genome assembly could be increased over five-fold by the use of beta-released Oxford Nanopore Ltd. (ONT) long reads and how LINKS leverages long-range information in S. cerevisiae W303 ONT reads to yield an assembly with less than half the errors of competing applications. We also present the re-scaffolding of the colossal white spruce assembly draft (PG29, 20 Gbp) and show how LINKS scales to larger genomes. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts.
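
As a rough illustration of how long-range information can be used without aligning or correcting reads, the sketch below extracts pairs of k-mers a fixed interval apart from a long read and counts how often the two k-mers match different contigs exactly. It is a simplified toy under stated assumptions (exact matches only, forward strand only, hypothetical function names), not the LINKS implementation.

```python
from collections import Counter


def kmer_pairs(read, k, distance, step=1):
    """Yield (left, right) k-mers whose start positions are `distance` apart."""
    for i in range(0, len(read) - distance - k + 1, step):
        yield read[i:i + k], read[i + distance:i + distance + k]


def index_contigs(contigs, k):
    """Map each k-mer found in exactly one contig to that contig's name."""
    seen = {}
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in seen and seen[kmer] != name:
                seen[kmer] = None  # shared between contigs: ambiguous, drop later
            else:
                seen.setdefault(kmer, name)
    return {kmer: name for kmer, name in seen.items() if name is not None}


def count_links(reads, contigs, k, distance, step=1):
    """Count k-mer pairs whose two ends land in two different contigs."""
    index = index_contigs(contigs, k)
    links = Counter()
    for read in reads:
        for left, right in kmer_pairs(read, k, distance, step):
            a, b = index.get(left), index.get(right)
            if a is not None and b is not None and a != b:
                links[(a, b)] += 1
    return links
```

Because only exact k-mer matches are counted, k-mers that contain read errors simply fail to match and drop out, which is why such a scheme can tolerate error-rich reads when coverage is sufficient. A real scaffolder would also handle reverse complements, estimate gap sizes, and require a minimum number of supporting pairs before joining contigs.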

Detecting hidden diversification shifts in models of trait-dependent speciation and extinction

Jeremy M Beaulieu , Brian C O’Meara
doi: http://dx.doi.org/10.1101/016386

The distribution of diversity can vary considerably from clade to clade. Attempts to understand these patterns often employ state speciation and extinction models to determine whether the evolution of a particular novel trait has increased speciation rates and/or decreased extinction rates. It is still unclear, however, whether these models are uncovering important drivers of diversification, or whether they are simply pointing to more complex patterns involving many unmeasured and co-distributed factors. Here we describe an extension to the popular state speciation and extinction models that specifically accounts for the presence of unmeasured factors that could impact diversification rates estimated for the states of any observed trait. Specifically, our model, which we refer to as HiSSE (Hidden State Speciation and Extinction), assumes that related to each observed state in the model are “hidden” states whose diversification dynamics and transition rates are potentially distinct from those of the observed states in isolation. Under rigorous simulation tests and when applied to empirical data, we find that HiSSE performs reasonably well, and can at least detect net diversification rate differences between observed and hidden states. We also discuss the remaining issues with state speciation and extinction models in general, and the important ways in which HiSSE provides a more nuanced understanding of trait-dependent diversification.

Beyond 2/3 and 1/3: the complex signatures of sex-biased admixture on the X chromosome

Amy Goldberg , Noah A Rosenberg
doi: http://dx.doi.org/10.1101/016543

Sex-biased demography, in which parameters governing migration and population size differ between females and males, has been studied through comparisons of X chromosomes, which are inherited sex-specifically, and autosomes, which are not. A common form of sex bias in humans is sex-biased admixture, in which at least one of the source populations differs in its proportions of females and males contributing to an admixed population. Studies of sex-biased admixture often examine the mean ancestry for markers on the X chromosome in relation to the autosomes. A simple framework noting that in a population with equally many females and males, 2/3 of X chromosomes appear in females, suggests that the mean X-chromosomal admixture fraction is a linear combination of female and male admixture parameters, with coefficients 2/3 and 1/3, respectively. Extending a mechanistic admixture model to accommodate the X chromosome, we demonstrate that this prediction is not generally true in admixture models, though it holds in the limit for an admixture process occurring as a single event. For a model with constant ongoing admixture, we determine the mean X-chromosomal admixture, comparing admixture on female and male X chromosomes to corresponding autosomal values. Surprisingly, in reanalyzing African-American genetic data to estimate sex-specific contributions from African and European sources, we find that the range of contributions compatible with the excess African ancestry on the X chromosome compared to autosomes has a wide spread, permitting scenarios either without male-biased contributions from Europe or without female-biased contributions from Africa.
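
The simple 2/3 and 1/3 framework mentioned in this abstract can be sketched in a few lines. The names `s_f` and `s_m` are assumed labels for the female and male contributions from one source population; the abstract argues that this linear prediction does not hold in general, so the code is only the baseline being questioned, not the authors' model.

```python
def autosomal_admixture(s_f, s_m):
    """Mean autosomal admixture: females and males contribute equally."""
    return (s_f + s_m) / 2


def naive_x_admixture(s_f, s_m):
    """Simple-framework X-chromosomal admixture: weights 2/3 (female), 1/3 (male)."""
    return (2 * s_f + s_m) / 3


def naive_sex_specific_contributions(h_a, h_x):
    """Invert the two linear equations above to recover (s_f, s_m).

    From h_a = (s_f + s_m) / 2 and h_x = (2 * s_f + s_m) / 3:
        s_f = 3 * h_x - 2 * h_a
        s_m = 4 * h_a - 3 * h_x
    """
    return 3 * h_x - 2 * h_a, 4 * h_a - 3 * h_x


if __name__ == "__main__":
    # Hypothetical contributions for illustration only.
    h_a = autosomal_admixture(0.9, 0.7)
    h_x = naive_x_admixture(0.9, 0.7)
    print(h_a, h_x, naive_sex_specific_contributions(h_a, h_x))
```

Under this baseline, X-chromosomal ancestry exceeds autosomal ancestry exactly when the female contribution exceeds the male one, which is the inference the abstract shows to be less constrained under ongoing admixture.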

The Spatial Mixing of Genomes in Secondary Contact Zones

Alisa Sedghifar , Yaniv Brandvain , Peter L. Ralph , Graham Coop
doi: http://dx.doi.org/10.1101/016337

Recent genomic studies have highlighted the important role of admixture in shaping genome-wide patterns of diversity. Past admixture leaves a population genomic signature of linkage disequilibrium (LD), reflecting the mixing of parental chromosomes by segregation and recombination. The extent of this LD can be used to infer the timing of admixture. However, the results of inference can depend strongly on the assumed demographic model. Here, we introduce a theoretical framework for modeling patterns of LD in a geographic contact zone where two differentiated populations are diffusing back together. We derive expressions for the expected LD and admixture tract lengths across geographic space as a function of the age of the contact zone and the dispersal distance of individuals. We develop an approach to infer age of contact zones using population genomic data from multiple spatially sampled populations by fitting our model to the decay of LD with recombination distance. We use our approach to explore the fit of a geographic contact zone model to three human population genomic datasets from populations along the Indonesian archipelago, populations in Central Asia and populations in India.
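
To make the idea of dating admixture from LD decay concrete, the sketch below fits the commonly used single-pulse baseline, in which admixture LD decays as exp(-t * d) with genetic distance d (in Morgans) after t generations. This is the simple demographic assumption that the abstract warns can strongly affect inference, not the spatial contact-zone model the authors derive; the function name is hypothetical.

```python
import math


def estimate_admixture_age(distances_morgans, ld_values):
    """Estimate t (generations) from LD ~ A * exp(-t * d) by log-linear least squares.

    Assumes a single pulse of admixture and strictly positive LD values.
    """
    if len(distances_morgans) != len(ld_values) or len(ld_values) < 2:
        raise ValueError("need at least two (distance, LD) pairs of equal length")
    if any(v <= 0 for v in ld_values):
        raise ValueError("LD values must be positive to take logarithms")
    xs = list(distances_morgans)
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("distances must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # log(LD) = log(A) - t * d, so the slope of the fitted line is -t.
    return -sxy / sxx
```

Fitting the same curve to samples from a spatial contact zone would generally give a biased age, which is the motivation for modeling dispersal explicitly.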

Contrasting regional architectures of schizophrenia and other complex diseases using fast variance components analysis

Po-Ru Loh , Gaurav Bhatia , Alexander Gusev , Hilary K Finucane , Brendan K Bulik-Sullivan , Samuela J Pollack , Schizophrenia Working Group Psychiatric Genomics Consortium , Teresa R de Candia , Sang Hong Lee , Naomi R Wray , Kenneth S Kendler , Michael C O’Donovan , Benjamin M Neale , Nick Patterson , Alkes L Price
doi: http://dx.doi.org/10.1101/016527

Heritability analyses of GWAS cohorts have yielded important insights into complex disease architecture, and increasing sample sizes hold the promise of further discoveries. Here, we analyze the genetic architecture of schizophrenia in 49,806 samples from the PGC, and nine complex diseases in 54,734 samples from the GERA cohort. For schizophrenia, we infer an overwhelmingly polygenic disease architecture in which ≥76% of 1Mb genomic regions harbor at least one variant influencing schizophrenia risk. We also observe significant enrichment of heritability in GC-rich regions and in higher-frequency SNPs for both schizophrenia and GERA diseases. In bivariate analyses, we observe significant genetic correlations (ranging from 0.18 to 0.85) for 13 of 36 pairs of GERA diseases; genetic correlations were consistently stronger (1.3x on average) than correlations of overall disease liabilities. To accomplish these analyses, we developed a novel, fast algorithm for multi-component, multi-trait variance components analysis that overcomes prior computational barriers that made such analyses intractable at this scale.