Patterns of positive selection in seven ant genomes

Julien Roux, Eyal Privman, Sebastien Moretti, Josephine T. Daub, Marc Robinson-Rechavi, Laurent Keller
(Submitted on 19 Nov 2013)

The evolution of ant species is marked by remarkable adaptations that allowed the development of very complex social systems. To identify how ant-specific adaptations are associated with specific patterns of molecular evolution, we searched for signs of positive selection on amino-acid changes in proteins during the evolution of the ant lineage. We identified 24 functional categories of genes that were enriched for positively selected genes in the ant lineage. We also reanalyzed genome-wide datasets in bees and flies with the same methodology to check whether genes under positive selection in ants were also under positive selection in the other analyzed lineages. Notably, genes implicated in immunity were enriched for positively selected genes in all three lineages, ruling out the hypothesis that the evolution of hygienic behaviors in social insects caused a major relaxation of selective pressure on this set of genes. Our scan also indicated that genes implicated in neurogenesis and olfaction started to undergo increased positive selection before the evolution of sociality in Hymenoptera, although the main challenges to the olfactory and neural systems in this lineage are assumed to have arisen with the evolution of social living. Finally, the comparison between these three lineages allowed us to pinpoint patterns of molecular evolution that were specific to the ant lineage. In particular, selective pressure was relaxed on genes related to metabolism in ants but not in bees and flies, possibly reflecting the loss of flight in ant workers. By contrast, there was recurrent positive selection on genes with mitochondrial functions specifically in ants, suggesting that mitochondrial activity was improved during ant evolution. This might have been an important step toward the evolution of the extreme lifespan that is a hallmark of this lineage.
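Scans like this one usually test each gene on a branch of interest with a likelihood-ratio test and then correct for testing thousands of genes. The sketch below assumes a branch-site-style test whose statistic, twice the log-likelihood difference between the alternative and null models, is compared with a 50:50 mixture of a point mass at zero and a χ²₁ distribution, followed by the Benjamini-Hochberg false discovery rate procedure. These are common choices, not necessarily the exact procedure used in this study, and the function names are hypothetical.

```python
import math

def branch_site_pvalue(lnl_alt, lnl_null):
    """P-value of a likelihood-ratio test against a 50:50 mixture of 0 and chi2(1).

    Assumes the null model is nested in the alternative (for example, the
    foreground omega fixed to 1 versus free and >= 1).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return 1.0
    # Survival function of chi2 with 1 df: P(X > x) = erfc(sqrt(x / 2));
    # the mixture halves it.
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues
```

Genes whose q-value falls below a chosen threshold would then be the "positively selected" set fed into a functional-category enrichment analysis.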

Joint analysis of functional genomic data and genome-wide association studies of 18 human traits

Joseph Pickrell

Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWAS). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. We describe a statistical model that uses association statistics computed across the genome to identify classes of genomic element that are enriched or depleted for loci that influence a trait. The model naturally incorporates multiple types of annotations. We applied the model to GWAS of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, BMI, and Crohn’s disease. For each trait, we evaluated the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over a hundred tissues and cell lines. We show that the fraction of phenotype-associated SNPs that influence protein sequence ranges from around 2% (for platelet volume) up to around 20% (for LDL cholesterol); that repressed chromatin is significantly depleted for SNPs associated with several traits; and that cell type-specific DNase-I hypersensitive sites are enriched for SNPs associated with several traits (for example, fibroblasts in Crohn’s disease and muscle tissue in bone density). Finally, by re-weighting each GWAS using information from functional genomics, we increase the number of loci with high-confidence associations by around 5%.
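The idea of letting annotations inform association evidence can be illustrated with a per-SNP simplification: each SNP's prior probability of association is a logistic function of its annotations, and the GWAS data enter through Wakefield's approximate Bayes factor. This is a sketch under stated assumptions, not the published model (which, among other things, works with genomic blocks rather than isolated SNPs); the prior variance `prior_var`, the annotation weights and all function names are illustrative.

```python
import math

def log_abf(beta, se, prior_var=0.04):
    """Log of Wakefield's approximate Bayes factor (association vs. no association).

    beta, se: estimated effect and its standard error from the GWAS.
    prior_var: prior variance W of the true effect (0.04 is an illustrative value).
    """
    v = se ** 2
    r = prior_var / (v + prior_var)
    z2 = (beta / se) ** 2
    return 0.5 * math.log(1.0 - r) + 0.5 * z2 * r

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def prior_log_odds(annotations, weights, intercept):
    """Hypothetical logistic prior: annotations with positive weights raise the prior log-odds."""
    return intercept + sum(w * a for w, a in zip(weights, annotations))

def posterior_probability(beta, se, annotations, weights, intercept, prior_var=0.04):
    """Posterior probability of association: posterior log-odds = prior log-odds + log ABF."""
    return sigmoid(prior_log_odds(annotations, weights, intercept)
                   + log_abf(beta, se, prior_var))
```

With identical summary statistics, a SNP lying in an enriched annotation receives a higher posterior probability than one outside it, which is the sense in which functional data "re-weight" a GWAS.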

On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al.

Claudiu I Bandea

In a recent article entitled “On the immortality of television sets: ‘function’ in the human genome according to the evolution-free gospel of ENCODE”, Graur et al. dismantle ENCODE’s evidence and conclusion that 80% of the human genome is functional. However, the article by Graur et al. contains assumptions and statements that are questionable. Primarily, the authors limit their evaluation of DNA’s biological functions to informational roles, sidestepping putative non-informational functions. Here, I bring forward an old hypothesis on the evolution of genome size and on the role of so-called ‘junk DNA’ (jDNA), which might explain the C-value enigma. According to this hypothesis, jDNA functions as a defense mechanism against insertional mutagenesis by endogenous and exogenous inserting elements such as retroviruses, thereby protecting informational DNA sequences from inactivation or alteration of their expression. Notably, this model couples the mechanisms and the selective forces responsible for the origin of jDNA with its putative protective biological function, a classic case of ‘fighting fire with fire.’ One of the key tenets of this theory is that in humans and many other species, jDNA serves as a protective mechanism against insertional oncogenic transformation. As an adaptive defense mechanism, the amount of protective DNA varies from one species to another depending on the rate of its origin, insertional mutagenesis activity, and evolutionary constraints on genome size.

Validity of covariance models for the analysis of geographical variation

Gilles Guillot, René Schilling, Emilio Porcu, Moreno Bevilacqua
(Submitted on 17 Nov 2013)

Owing to the availability of large molecular datasets, covariance models are increasingly used to describe the structure of genetic variation as an alternative to more heavily parametrised biological models. We focus here on a class of parametric covariance models that has received sustained attention recently and show that the conditions under which they are valid mathematical models have so far been overlooked. We provide rigorous results for the construction of valid covariance models in this family. We also outline how to construct alternative covariance models for the analysis of geographical variation that are both mathematically well behaved and easily implementable.
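A covariance function is only a valid model if every covariance matrix it generates is positive (semi)definite. A quick numerical safeguard, sketched below, is to build the matrix for the sampled locations and attempt a Cholesky factorisation. The exponential model is used here only because it is known to be valid in Euclidean space of any dimension; the function names are hypothetical, and passing the check on one set of locations does not prove a model valid in general, which is what rigorous results are needed for.

```python
import math

def exponential_covariance(points, sill=1.0, scale=1.0):
    """Covariance matrix C(h) = sill * exp(-h / scale) from Euclidean distances h."""
    n = len(points)
    return [[sill * math.exp(-math.dist(points[i], points[j]) / scale) for j in range(n)]
            for i in range(n)]

def is_positive_definite(matrix, tol=1e-12):
    """Attempt a Cholesky factorisation; a non-positive pivot means the matrix is not PD.

    Only the lower triangle is read, so the caller should ensure the matrix is symmetric.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True
```

A candidate model that fails this check for some configuration of sampling locations cannot be a valid covariance function.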

Data Mining of Online Genealogy Datasets for Revealing Lifespan Patterns in Human Population

Michael Fire, Yuval Elovici
(Submitted on 18 Nov 2013)

Online genealogy datasets contain extensive information about millions of people and their past and present family connections. This vast amount of data can help identify various patterns in human populations. In this study, we present methods and algorithms for identifying variations in the lifespan distributions of human populations over past centuries, for detecting social and genetic features that correlate with human lifespan, and for constructing predictive models of human lifespan from features that can easily be extracted from genealogy datasets.
We evaluated these methods and algorithms on a large online genealogy dataset with over a million profiles and over 8.8 million connections, all collected from the WikiTree website. Our findings indicate small but significant positive correlations between parents' lifespans and their children's lifespans. We also found slightly higher, significant correlations between the lifespans of spouses. In addition, we found a very small but significant positive correlation between longevity and reproductive success in males, and a small but significant negative correlation between longevity and reproductive success in females. Moreover, our machine-learning algorithms performed better than random at predicting which people who live past age 50 will also live past age 80.
We believe that this study will be the first of many studies which utilize the wealth of data on human populations, existing in online genealogy datasets, to better understand factors which influence human lifespan. Understanding these factors can assist scientists in providing solutions for successful aging.
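The parent-child correlation described above can be sketched as follows, assuming profiles have already been reduced to a mapping from a profile ID to a known lifespan in years and to a list of (parent ID, child ID) edges. These data structures and names are hypothetical, not WikiTree's actual data model, and significance testing is omitted.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; raises ZeroDivisionError if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def parent_child_pairs(lifespans, edges):
    """Collect paired lifespans for parent-child edges where both lifespans are known."""
    pairs = [(lifespans[p], lifespans[c]) for p, c in edges
             if p in lifespans and c in lifespans]
    return [p for p, _ in pairs], [c for _, c in pairs]
```

The same pairing logic applies to spouse edges; profiles without a recorded lifespan (for example, living people) are simply skipped.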

Genetic diversity in introduced populations with Allee effect

Meike J. Wittmann, Wilfried Gabriel, Dirk Metzler
(Submitted on 18 Nov 2013)

A phenomenon that strongly influences the demography of small introduced populations, and thereby potentially their genetic diversity, is the Allee effect, a reduction in population growth rates at small population sizes. We take a stochastic modeling approach to investigate levels of genetic diversity in populations that successfully overcame a strong demographic Allee effect, a scenario in which populations smaller than a certain critical size are expected to decline. Our results indicate that, compared with successful populations without an Allee effect, successful Allee-effect populations tend to 1) derive from larger founder populations and thus have a higher initial amount of genetic variation, 2) spend fewer generations at small population sizes where genetic drift is particularly strong, and 3) spend more time around the critical population size and thus experience more drift there. Altogether, the Allee effect can either increase or decrease genetic diversity, depending on the average founder population size. In the case of multiple introduction events, there is an additional increase in diversity because Allee-effect populations tend to derive from a larger number of introduction events than other populations. Finally, we show that given genetic data from sufficiently many populations, we can statistically infer the critical population size.
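To make the scenario concrete, the sketch below simulates a population in which each individual leaves a Poisson number of offspring whose mean falls below 1 under a critical size `a` (a strong Allee effect) and above a carrying capacity `k`, and then applies the standard drift recursion H(t+1) = H(t) * (1 - 1/(2 N(t))) to the realised sizes. The growth function and all parameter values are hypothetical illustrations, not the model used in the study.

```python
import math
import random

def poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def mean_offspring(n, r, a, k):
    """Hypothetical strong-Allee growth factor: below 1 if n < a or n > k, above 1 in between."""
    return math.exp(r * (1 - n / k) * (n - a) / k)

def simulate(n0, r, a, k, generations, seed=None):
    """Population sizes over time; stops early if the population goes extinct."""
    rng = random.Random(seed)
    sizes = [n0]
    n = n0
    for _ in range(generations):
        if n == 0:
            break
        lam = mean_offspring(n, r, a, k)
        n = sum(poisson(lam, rng) for _ in range(n))
        sizes.append(n)
    return sizes

def expected_heterozygosity(h0, parental_sizes):
    """Diploid drift recursion; pass only parental generations with size > 0."""
    h = h0
    for n in parental_sizes:
        h *= 1 - 1 / (2 * n)
    return h
```

For a surviving run, `expected_heterozygosity(1.0, sizes[:-1])` gives the fraction of initial heterozygosity expected to remain; comparing many runs with and without the Allee term (`a = 0`) mimics the comparison described in the abstract.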

The evolution of sex differences in disease genetics

William P Gilks, Jessica K Abbott, Edward H Morrow
There are significant differences in the biology of males and females, ranging from biochemical pathways to behavioural responses, which are relevant to modern medicine. Broad-sense heritability estimates differ between the sexes for many common medical disorders, indicating that genetic architecture can be sex-dependent. Recent genome-wide association studies (GWAS) have successfully identified sex-specific and sex-biased effects: in addition to sex-specific effects on gene expression, sex-specific or sex-biased loci have been found for twenty-two medical traits. Sex-specific genetic architecture of complex traits is also extensively documented in model organisms using genome-wide linkage or association mapping, and in gene disruption studies. The evolutionary origins of sex-specific genetic architecture and sexual dimorphism lie in the fact that males and females share most of their genetic variation yet experience different selection pressures. At the extreme is sexual antagonism, where selection on an allele acts in opposite directions in the two sexes. Sexual antagonism has been repeatedly identified with a number of experimental methods in a range of taxa. Although its molecular basis remains to be identified, mathematical models predict the maintenance of deleterious variants that experience selection in a sex-dependent manner. There are multiple mechanisms by which sexual antagonism and alleles under sex-differential selection could contribute to the genetics of common, complex disorders. The evidence we review clearly indicates that further research into sex-dependent selection and the sex-specific genetic architecture of diseases would be rewarding. This would be aided by studies of laboratory and wild animal populations, and by modelling sex-specific effects in genome-wide association data with joint, gene-by-sex interaction tests. We predict that even sexually monomorphic diseases may harbour cryptic sex-specific genetic architecture.
Furthermore, empirical evidence suggests that investigating sex-dependent epistasis may be especially rewarding. Finally, the prevalent nature of sex-specific genetic architecture in disease offers scope for the development of more effective, sex-specific therapies.
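Sex-dependent effects can be screened from sex-stratified GWAS summary statistics. The sketch below shows two simple tests under the assumption that the male and female estimates come from independent samples: a z-test for a difference between the sexes, and a 2-df test of no effect in either sex, which is closely related to a joint test of a main effect plus a gene-by-sex interaction. It is a simplified stand-in, not necessarily the test the authors have in mind, and the function names are hypothetical.

```python
import math

def sex_difference_test(beta_m, se_m, beta_f, se_f):
    """z-test for a difference between male- and female-specific effect estimates."""
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p

def joint_sex_stratified_test(beta_m, se_m, beta_f, se_f):
    """2-df test of H0: no effect in either sex (sum of squared sex-specific z-scores)."""
    stat = (beta_m / se_m) ** 2 + (beta_f / se_f) ** 2
    p = math.exp(-stat / 2.0)  # survival function of chi-square with 2 df
    return stat, p
```

A variant with an effect in only one sex can reach significance in the joint test even when the pooled, sex-agnostic estimate is diluted, which is one way cryptic sex-specific architecture could be detected.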

On the optimal trimming of high-throughput mRNA sequence data

Matthew D MacManes

The widespread and rapid adoption of high-throughput sequencing technologies has changed the face of modern studies of evolutionary genetics. Indeed, newer sequencing technologies, like Illumina sequencing, have afforded researchers the opportunity to gain a deep understanding of genome-level processes that underlie evolutionary change. In particular, researchers interested in functional biology and adaptation have used these technologies to sequence mRNA transcriptomes of specific tissues, which in turn are often compared with other tissues or with other individuals with different phenotypes. While these techniques are extremely powerful, careful attention to data quality is required. In particular, because high-throughput sequencing is more error-prone than traditional Sanger sequencing, quality trimming of sequence reads should be an important step in all data processing pipelines. While several software packages for quality trimming exist, no general guidelines for the specifics of trimming have been developed. Here, using empirically derived sequence data, I provide general recommendations regarding the optimal strength of trimming, specifically in mRNA-Seq studies. Although very aggressive quality trimming is common, this study suggests that gentler trimming, specifically of nucleotides with Phred scores below 2 or below 5, is optimal for most studies across a wide variety of metrics.
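A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q = 2 means roughly a 63% chance that the base call is wrong and Q = 5 roughly 32%. The sketch below decodes a FASTQ quality string (assuming the common Phred+33 encoding) and removes low-quality bases from both ends of a read, conceptually similar to the LEADING/TRAILING steps of trimming tools such as Trimmomatic. It is a minimal illustration with hypothetical function names, not the pipeline evaluated in the study, and it omits sliding-window trimming.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]

def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def trim_ends(seq, quality, threshold=5, offset=33):
    """Remove bases with a Phred score below `threshold` from both ends of a read."""
    scores = phred_scores(quality, offset)
    start, end = 0, len(seq)
    while start < end and scores[start] < threshold:
        start += 1
    while end > start and scores[end - 1] < threshold:
        end -= 1
    return seq[start:end], quality[start:end]
```

Raising `threshold` (for example to 20 or 30) reproduces the kind of aggressive trimming that the abstract argues is usually unnecessary.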

The genomic landscape of meiotic crossovers and gene conversions in Arabidopsis thaliana

Erik Wijnker, Geo Velikkakam James, Jia Ding, Frank Becker, Jonas R. Klasen, Vimal Rawat, Beth A. Rowan, Daniel F. de Jong, C. Bastiaan de Snoo, Luis Zapata, Bruno Huettel, Hans de Jong, Stephan Ossowski, Detlef Weigel, Maarten Koornneef, Joost J.B. Keurentjes, Korbinian Schneeberger
(Submitted on 13 Nov 2013)

Knowledge of the exact distribution of meiotic crossovers (COs) and gene conversions (GCs) is essential for understanding many aspects of population genetics and evolution, from haplotype structure and long-distance genetic linkage to the generation of new allelic variants of genes. To this end, we resequenced the four products of 13 meiotic tetrads along with 10 doubled haploids derived from Arabidopsis thaliana hybrids. GC detection through short reads has previously been confounded by genomic rearrangements. Stringent filtering of misaligned reads allowed GC identification with high accuracy and revealed an ~80-kb transposition that undergoes copy-number changes mediated by meiotic recombination. Non-crossover-associated GCs were extremely rare, most likely because of their short average length of ~25-50 bp, which is significantly shorter than that of CO-associated GCs. Overall, recombination preferentially targeted non-methylated, nucleosome-free regions at gene promoters, which showed significant enrichment of two sequence motifs.
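Sequencing all four products of a tetrad makes COs and GCs distinguishable in principle: a reciprocal CO shows up as a change of parental origin in two of the four products within the same marker interval, whereas a GC disrupts the expected 2:2 segregation at the converted markers. The sketch below assumes genotypes have already been reduced to strings of parental origin ('A' or 'B') per marker; this encoding and the function names are hypothetical, and real data additionally require handling of missing calls, genotyping errors and misaligned reads.

```python
def allele_count(tetrad, marker, allele="A"):
    """Number of the four meiotic products carrying the given parental allele at a marker."""
    return sum(1 for product in tetrad if product[marker] == allele)

def non_mendelian_markers(tetrad):
    """Markers without 2:2 segregation, the signature expected from gene conversion."""
    return [m for m in range(len(tetrad[0])) if allele_count(tetrad, m) != 2]

def switch_points(product):
    """Indices m where parental origin changes between marker m - 1 and marker m."""
    return [m for m in range(1, len(product)) if product[m] != product[m - 1]]

def crossover_intervals(tetrad):
    """Intervals in which exactly two products switch origin, as expected for a reciprocal CO."""
    counts = {}
    for product in tetrad:
        for m in switch_points(product):
            counts[m] = counts.get(m, 0) + 1
    return sorted(m for m, c in counts.items() if c == 2)
```

A short non-crossover GC produces two closely spaced switches in a single product plus 3:1 segregation at the converted markers, so very short tracts covering few or no markers are easily missed.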

Drosophila embryogenesis scales uniformly across temperature and developmentally diverse species

Steven Gregory Kuntz, Michael B Eisen
Temperature affects both the timing and outcome of animal development, but the detailed effects of temperature on the progress of early development have been poorly characterized. To determine the impact of temperature on the order and timing of events during Drosophila melanogaster embryogenesis, we used time-lapse imaging to track the progress of embryos from shortly after egg laying through hatching at seven precisely maintained temperatures between 17.5°C and 32.5°C. We employed a combination of automated and manual annotation to determine when 36 milestones occurred in each embryo. D. melanogaster embryogenesis takes 33 hours at 17.5°C and accelerates with increasing temperature to a minimum of 16 hours at 27.5°C, above which embryogenesis slows slightly. Remarkably, while the total time of embryogenesis varies more than twofold, the relative timing of events from cellularization through hatching is constant across temperatures. To further explore the relationship between temperature and embryogenesis, we expanded our analysis to ten additional Drosophila species of varying climatic origins. Six of these species, like D. melanogaster, are of tropical origin, and their embryogenesis times at the different temperatures were all similar. D. mojavensis, a sub-tropical fly, develops more slowly than the tropical species at lower temperatures, while D. virilis, a temperate fly, develops more slowly at all temperatures. The alpine sister species D. persimilis and D. pseudoobscura develop as rapidly as tropical flies at cooler temperatures, but show diminished acceleration above 22.5°C and drastically slowed development by 30°C. Although total embryogenesis time ranges from 13 hours for D. erecta at 30°C to 46 hours for D. virilis at 17.5°C, the relative timing of events from cellularization through hatching is constant across all of the species and temperatures examined here, suggesting the existence of a previously unrecognized timer controlling the progress of embryogenesis that has been tuned by natural selection in response to the thermal environment in which each species lives.
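Uniform scaling can be checked by expressing each milestone as a fraction of the cellularization-to-hatching interval and comparing these fractions between conditions. The sketch below assumes milestone times are listed in developmental order with cellularization first and hatching last; this convention, the function names and any times passed to them are hypothetical, not measurements or code from the study.

```python
def relative_timing(times, start_index=0, end_index=-1):
    """Milestone times as fractions of the start-to-end interval (0 = start, 1 = end).

    Milestones before the start reference would get negative fractions, so pass
    only milestones from cellularization onward.
    """
    t0, t1 = times[start_index], times[end_index]
    return [(t - t0) / (t1 - t0) for t in times]

def max_timing_difference(times_a, times_b, start_index=0, end_index=-1):
    """Largest discrepancy in relative timing between two conditions; ~0 means uniform scaling."""
    rel_a = relative_timing(times_a, start_index, end_index)
    rel_b = relative_timing(times_b, start_index, end_index)
    return max(abs(a - b) for a, b in zip(rel_a, rel_b))
```

For example, two series of milestone times in which one is an exact multiple of the other give a difference of (numerically) zero, whereas shifting a single intermediate milestone produces a clearly non-zero value.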