Patterns of positive selection in seven ant genomes


Julien Roux, Eyal Privman, Sebastien Moretti, Josephine T. Daub, Marc Robinson-Rechavi, Laurent Keller
(Submitted on 19 Nov 2013)

The evolution of ant species is marked by remarkable adaptations that allowed the development of very complex social systems. To identify how ant-specific adaptations are associated with specific patterns of molecular evolution we searched for signs of positive selection on amino-acid changes in proteins during the evolution of the ant lineage. We identified 24 functional categories of genes which were enriched for positively selected genes in the ant lineage. We also reanalyzed genome-wide datasets in bees and flies with the same methodology to check if genes under positive selection in ants were also under positive selection in the other analyzed lineages. Notably, genes implicated in immunity were enriched for positively selected genes in the three lineages, ruling out the hypothesis that the evolution of hygienic behaviors in social insects caused a major relaxation of selective pressure on this set of genes. Our scan also indicated that genes implicated in neurogenesis and olfaction started to undergo increased positive selection before the evolution of sociality in Hymenoptera, although it is assumed that the main challenges of the olfactory and neural systems in this lineage occurred with the evolution of social living. Finally, the comparison between these three lineages allowed us to pinpoint molecular evolution patterns that were specific to the ant lineage. In particular, there was relaxed selective pressure for genes related to metabolism in ants but not in bees and flies, possibly reflecting the loss of flight in ant workers. By contrast, there was recurrent positive selection on genes with mitochondrial functions specifically in ants, suggesting that the activity of mitochondria was improved during ant evolution. This might have been an important step toward the evolution of extreme lifespan that is a hallmark of this lineage.

Joint analysis of functional genomic data and genome-wide association studies of 18 human traits

Joseph Pickrell

Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWAS). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. We describe a statistical model that uses association statistics computed across the genome to identify classes of genomic element that are enriched or depleted for loci that influence a trait. The model naturally incorporates multiple types of annotations. We applied the model to GWAS of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, BMI, and Crohn’s disease. For each trait, we evaluated the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over a hundred tissues and cell lines. We show that the fraction of phenotype-associated SNPs that influence protein sequence ranges from around 2% (for platelet volume) up to around 20% (for LDL cholesterol); that repressed chromatin is significantly depleted for SNPs associated with several traits; and that cell type-specific DNase-I hypersensitive sites are enriched for SNPs associated with several traits (for example, fibroblasts in Crohn’s disease and muscle tissue in bone density). Finally, by re-weighting each GWAS using information from functional genomics, we increase the number of loci with high-confidence associations by around 5%.

The evolution of sex differences in disease genetics

William P Gilks, Jessica K Abbott, Edward H Morrow
There are significant differences in the biology of males and females, ranging from biochemical pathways to behavioural responses, which are relevant to modern medicine. Broad-sense heritability estimates differ between the sexes for many common medical disorders, indicating that genetic architecture can be sex-dependent. Recent genome-wide association studies (GWAS) have successfully identified sex-specific and sex-biased effects, where in addition to sex-specific effects on gene expression, twenty-two medical traits have sex-specific or sex-biased loci. Sex-specific genetic architecture of complex traits is also extensively documented in model organisms using genome-wide linkage or association mapping, and in gene disruption studies. The evolutionary origins of sex-specific genetic architecture and sexual dimorphism lie in the fact that males and females share most of their genetic variation yet experience different selection pressures. At the extreme is sexual antagonism, where selection on an allele acts in opposite directions between the sexes. Sexual antagonism has been repeatedly identified via a number of experimental methods in a range of different taxa. Although the molecular basis remains to be identified, mathematical models predict the maintenance of deleterious variants that experience selection in a sex-dependent manner. There are multiple mechanisms by which sexual antagonism and alleles under sex-differential selection could contribute toward the genetics of common, complex disorders. The evidence we review clearly indicates that further research into sex-dependent selection and the sex-specific genetic architecture of diseases would be rewarding. This would be aided by studies of laboratory and wild animal populations, and by modelling sex-specific effects in genome-wide association data with joint, gene-by-sex interaction tests. We predict that even sexually monomorphic diseases may harbour cryptic sex-specific genetic architecture. Furthermore, empirical evidence suggests that investigating sex-dependent epistasis may be especially rewarding. Finally, the prevalent nature of sex-specific genetic architecture in disease offers scope for the development of more effective, sex-specific therapies.

On the optimal trimming of high-throughput mRNA sequence data

Matthew D MacManes

The widespread and rapid adoption of high-throughput sequencing technologies has changed the face of modern studies of evolutionary genetics. Indeed, newer sequencing technologies, like Illumina sequencing, have afforded researchers the opportunity to gain a deep understanding of genome level processes that underlie evolutionary change. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared to other tissues, or other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that a more gentle trimming, specifically of those nucleotides whose Phred score < 2 or < 5, is optimal for most studies across a wide variety of metrics.
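
To make the trimming thresholds concrete, here is a minimal Python sketch of 3'-end quality trimming. It is not the software evaluated in the paper: the function names are hypothetical, Phred+33 quality encoding is assumed, and real trimmers typically add sliding windows, adapter removal and minimum-length filters.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, threshold: int = 5, offset: int = 33):
    """Drop trailing bases whose Phred score is below `threshold`.

    Trimming stops at the first base, scanning from the 3' end,
    whose score is at or above the threshold.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], quality[:end]


if __name__ == "__main__":
    # Hypothetical read: scores are 40, 40, 40, 40, 20, 10, 4, 0.
    read, qual = "ACGTACGT", "IIII5+%!"
    for threshold in (2, 5, 20):
        trimmed, _ = trim_3prime(read, qual, threshold)
        print(threshold, trimmed, len(trimmed))
```

On this made-up read, thresholds of 2 and 5 remove only the last one or two bases, while a threshold of 20 removes three, which illustrates how quickly more aggressive settings discard sequence.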

The genomic landscape of meiotic crossovers and gene conversions in Arabidopsis thaliana

Erik Wijnker, Geo Velikkakam James, Jia Ding, Frank Becker, Jonas R. Klasen, Vimal Rawat, Beth A. Rowan, Daniel F. de Jong, C. Bastiaan de Snoo, Luis Zapata, Bruno Huettel, Hans de Jong, Stephan Ossowski, Detlef Weigel, Maarten Koornneef, Joost J.B. Keurentjes, Korbinian Schneeberger
(Submitted on 13 Nov 2013)

Knowledge of the exact distribution of meiotic crossovers (COs) and gene conversions (GCs) is essential for understanding many aspects of population genetics and evolution, from haplotype structure and long-distance genetic linkage to the generation of new allelic variants of genes. To this end, we resequenced the four products of 13 meiotic tetrads along with 10 doubled haploids derived from Arabidopsis thaliana hybrids. GC detection through short reads has previously been confounded by genomic rearrangements. Rigid filtering for misaligned reads allowed GC identification at high accuracy and revealed an ~80-kb transposition, which undergoes copy-number changes mediated by meiotic recombination. Non-crossover-associated GCs were extremely rare, most likely due to their short average length of ~25-50 bp, which is significantly shorter than the length of CO-associated GCs. Overall, recombination preferentially targeted non-methylated nucleosome-free regions at gene promoters, which showed significant enrichment of two sequence motifs.
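
As an illustration of why sequencing all four products of a meiosis is informative, the sketch below classifies a single marker by how the two parental alleles segregate across a tetrad. It is a simplified illustration, not the authors' pipeline: the function name and the "A"/"B" allele coding are assumptions, and it ignores the read-alignment filtering that the abstract identifies as essential.

```python
from collections import Counter


def classify_marker(genotypes) -> str:
    """Classify one marker from the four products of a single meiosis.

    `genotypes` holds four parental-allele calls, "A" or "B"
    (hypothetical coding), one per meiotic product.
    """
    calls = list(genotypes)
    if len(calls) != 4 or any(call not in ("A", "B") for call in calls):
        raise ValueError("expected exactly four calls, each 'A' or 'B'")
    n_a = Counter(calls)["A"]
    if n_a == 2:
        return "2:2"  # Mendelian segregation
    if n_a in (1, 3):
        return "3:1"  # non-Mendelian: candidate gene conversion
    return "4:0"      # unexpected in a hybrid; likely a miscall or misalignment


if __name__ == "__main__":
    # Hypothetical calls along one chromosome, one string per marker.
    for marker in ("AABB", "AABB", "ABAA", "AABB"):
        print(marker, classify_marker(marker))
```

In a real analysis a 3:1 call would only be a candidate: the abstract notes that rearrangements and misaligned reads can mimic gene conversions, so each candidate needs further filtering.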

Drosophila embryogenesis scales uniformly across temperature and developmentally diverse species

Steven Gregory Kuntz, Michael B Eisen
Temperature affects both the timing and outcome of animal development, but the detailed effects of temperature on the progress of early development have been poorly characterized. To determine the impact of temperature on the order and timing of events during Drosophila melanogaster embryogenesis, we used time-lapse imaging to track the progress of embryos from shortly after egg laying through hatching at seven precisely maintained temperatures between 17.5°C and 32.5°C. We employed a combination of automated and manual annotation to determine when 36 milestones occurred in each embryo. D. melanogaster embryogenesis takes 33 hours at 17.5°C, and accelerates with increasing temperature to a low of 16 hours at 27.5°C, above which embryogenesis slows slightly. Remarkably, while the total time of embryogenesis varies over twofold, the relative timing of events from cellularization through hatching is constant across temperatures. To further explore the relationship between temperature and embryogenesis, we expanded our analysis to cover ten additional Drosophila species of varying climatic origins. Six of these species, like D. melanogaster, are of tropical origin, and embryogenesis time at different temperatures was similar for them all. D. mojavensis, a sub-tropical fly, develops more slowly than the tropical species at lower temperatures, while D. virilis, a temperate fly, exhibits slower development at all temperatures. The alpine sister species D. persimilis and D. pseudoobscura develop as rapidly as tropical flies at cooler temperatures, but exhibit diminished acceleration above 22.5°C and have drastically slowed development by 30°C. Despite ranging from 13 hours for D. erecta at 30°C to 46 hours for D. virilis at 17.5°C, the relative timing of events from cellularization through hatching is constant across all of the species and temperatures examined here, suggesting the existence of a previously unrecognized timer controlling the progress of embryogenesis that has been tuned by natural selection in response to the thermal environment in which each species lives.
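
The claim that relative timing is constant can be pictured as rescaling each embryo's milestone times by the interval between two reference events. The Python sketch below does this; the function name, milestone labels other than cellularization and hatching, and all numbers are invented for illustration and are not measurements from the paper.

```python
def relative_timing(milestones_h: dict[str, float],
                    start: str = "cellularization",
                    end: str = "hatching") -> dict[str, float]:
    """Rescale milestone times (hours) so that `start` maps to 0 and `end` to 1."""
    t0 = milestones_h[start]
    span = milestones_h[end] - t0
    if span <= 0:
        raise ValueError("end milestone must occur after start milestone")
    return {name: (t - t0) / span for name, t in milestones_h.items()}


if __name__ == "__main__":
    # Invented example: the "warm" embryo develops exactly twice as fast.
    cool = {"cellularization": 10.0, "midpoint_event": 15.0, "hatching": 40.0}
    warm = {"cellularization": 5.0, "midpoint_event": 7.5, "hatching": 20.0}
    print(relative_timing(cool))
    print(relative_timing(warm))
```

If development scales uniformly, as the abstract reports, the rescaled values for each milestone coincide across temperatures and species even though the absolute times differ.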

A Complete Public Domain Family Genomics Dataset

Manuel Corpas, Mike Cariaso, Alain Coletta, David Weiss, Andrew P Harrison, Federico Moran, Huanming Yang

BACKGROUND: The availability of open access genomic data is essential for the personal genomics field. Public genomic data allow comparative analyses, testing of new tools and genotype-phenotype association studies. Personal genomics data of unrelated individuals are available in the public domain, notably the Personal Genome Project; however, to date genomics family data and metadata are severely lacking, mainly due to cost, privacy concerns or restricted access to Next Generation Sequencing (NGS) technology. Family data have a lot to offer as they allow the study of heritability, something which is impossible to do just by using unrelated individuals. FINDINGS: A whole family from Southern Spain decided to genotype, sequence and analyse their personal genomes making them publicly available under a Creative Commons 0 license (CC0; commonly denominated as public domain). These data include a) five 23andMe SNP chip genotype bed files, b) four raw exomes with their assorted bam files and VCF files, c) a metagenomic raw sequencing data file and d) derived data of likely phenotypes using SNPedia-derived tools. CONCLUSIONS: To our knowledge this is the first CC0 released set of genomic, phenotypic and metagenomic data for a whole family. This dataset is also unique in that it was obtained through direct-to-consumer genetic tests. Hence any ordinary citizen with enough budget and samples should be able to reproduce this experiment. We envisage this dataset to be a useful resource for a variety of applications in the personal genomics field as a) negative control data for trait association discovery, b) testing data for development of new software and c) sample data for heritability studies. We encourage prospective users to share with us derived results so that they can be added to our existing collection.

Population genomics of parallel hybrid zones in the mimetic butterflies, H. melpomene and H. erato

Nicola Nadeau, Mayte Ruiz, Patricio Salazar, Brian Counterman, Jose Alejandro Medina, Humberto Ortiz-Zuazaga, Anna Morrison, W. Owen McMillan, Chris Jiggins, Riccardo Papa

Hybrid zones can be valuable tools for studying evolution and identifying genomic regions responsible for adaptive divergence and underlying phenotypic variation. Hybrid zones between subspecies of Heliconius butterflies can be very narrow and are maintained by strong selection acting on colour pattern. The co-mimetic species H. erato and H. melpomene have parallel hybrid zones where both species undergo a change from one colour pattern form to another. We use restriction associated DNA sequencing to obtain several thousand genome-wide sequence markers and use these to analyse patterns of population divergence across two pairs of parallel hybrid zones in Peru and Ecuador. We compare two approaches for analysis of this type of data: alignment to a reference genome and de novo assembly. We find that alignment gives the best results for species both closely (H. melpomene) and distantly (H. erato, ~15% divergent) related to the reference sequence. Our results confirm that the colour pattern controlling loci account for the majority of divergent regions across the genome, but we also detect other divergent regions apparently unlinked to colour pattern differences. We also use association mapping to identify previously unmapped colour pattern loci, in particular the Ro locus. Finally, we identify within our sample a new cryptic population of H. timareta in Ecuador, which occurs at relatively low altitude and is mimetic with H. melpomene malleti.

Genome-wide targets of selection: female response to experimental removal of sexual selection in Drosophila melanogaster

Paolo Innocenti, Ilona Flis, Edward H Morrow

Despite the common assumption that promiscuity should in general be favored in males, but not in females, to date there is no consensus on the general impact of multiple mating on female fitness. Notably, very little is known about the genetic and physiological features underlying the female response to sexual selection pressures. By combining an experimental evolution approach with genomic techniques, we investigated the effects of single and multiple matings on female fecundity and gene expression. We experimentally manipulated the mating system in replicate populations of Drosophila melanogaster by removing sexual selection, with the aim of testing differences in short term post-mating effects of females evolved under different mating strategies. We show that monogamous females suffer decreased fecundity, a decrease that was partially recovered by experimentally reversing the selection pressure back to the ancestral promiscuous state. The post-mating gene expression profiles of monogamous females differ significantly from promiscuous females, involving 9% of the genes tested. These transcripts are active in several tissues, mainly ovaries, neural tissues and midgut, and are involved in metabolic processes, reproduction and signaling pathways. Our results demonstrate how the female post-mating response can evolve under different mating systems, and provide novel insights into the genes targeted by sexual selection in females, by identifying a list of candidate genes responsible for the decrease in female fecundity in the absence of promiscuity.

Improved annotation of 3-prime untranslated regions and complex loci by combination of strand-specific Direct RNA Sequencing, RNA-seq and ESTs

Nick Schurch, Christian Cole, Alexander Sherstnev, Junfang Song, Céline Duc, Kate G. Storey, W. H. Irwin McLean, Sara J. Brown, Gordon G. Simpson, Geoffrey J. Barton
(Submitted on 11 Nov 2013)

The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3-prime untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3-prime polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3-prime UTR re-annotation (including extension of one 3-prime UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data.