Author post: An amino acid polymorphism in the Drosophila insulin receptor demonstrates pleiotropic and adaptive function in life history traits

This next guest post is by Annalise Paaby on her paper: Paaby et al. “An amino acid polymorphism in the Drosophila insulin receptor demonstrates pleiotropic and adaptive function in life history traits” bioRxived here.

Find the alleles!
Organisms vary, even within populations, in ways that appear adaptive. We would very much like to identify the genetic elements that encode these phenotypic differences, but this is a challenging task. For polygenic traits, the tiny contributions of single loci can be near-impossible to detect in an experimental setting. In contrast, natural selection operates on a grand scale, with power to discriminate between alleles. We took advantage of the fact that Drosophila melanogaster are distributed across an extreme environmental gradient in order to identify a specific polymorphism that contributes to adaptive variation.

D. melanogaster live along the east coasts of North America and Australia. On both continents, flies in low-latitude, warm environments develop faster and are more fecund, while flies in high-latitude, cold environments live longer and are more resistant to most stresses.

Knocking out insulin signaling genes extends lifespan, increases stress tolerance, and reduces reproduction. Given these phenotypes, we wondered whether insulin signaling genes might vary in natural populations and influence life history. In a paper published a few years ago, we showed that alleles of a polymorphism in the Insulin-like Receptor (InR) showed clines in frequency in both North America and Australia. Since the populations were founded at different times from different source populations, the replicated pattern on separate continents is good evidence that the polymorphism is a target of selection.

What is this polymorphism?
The polymorphism we discovered is a complex indel that disrupts a region of glutamines and histidines in the first exon of InR. In our original survey, we found many segregating alleles, all differing in length by multiples of three nucleotides. However, two alleles comprise the majority. An allele we call InRshort is common at high latitudes, and InRlong, which is six nucleotides longer, is common at low latitudes. The alleles differ in four amino acids across a span of 16 residues.

The alleles affect signaling
In our current study, we show that InRshort and InRlong affect levels of insulin signaling. We took InRshort and InRlong flies from a single population in New York, replaced the X and second chromosomes, and randomized the genetic backgrounds of the third chromosome, on which InR resides. We measured levels of insulin signaling in test lines by performing qPCR on seven transcriptional targets in the pathway, all downstream of the receptor.

We found that for five of the seven targets (four of which were significant), signaling was highest in InRlong, lowest in InRshort, and intermediate in the heterozygote, suggesting that InRshort and InRlong act additively on signaling levels. The directionality of these results makes sense: reduction of insulin signaling is known to extend lifespan, increase stress tolerance and reduce reproductive success, and these are the phenotypes we see at high latitudes where InRshort is common.
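To make "act additively" concrete, here is a minimal sketch of the standard decomposition of three genotype means into an additive effect and a dominance deviation. The function name and the expression values are hypothetical illustrations, not numbers from the manuscript; under pure additivity the heterozygote sits exactly at the midpoint of the two homozygotes, so the dominance deviation is zero.

```python
def additive_and_dominance(short_hom, het, long_hom):
    """Decompose three genotype means into additive and dominance components."""
    midpoint = (short_hom + long_hom) / 2.0
    a = (long_hom - short_hom) / 2.0  # additive effect: half the homozygote difference
    d = het - midpoint                # dominance deviation: 0 under pure additivity
    return a, d


# Hypothetical relative expression levels for one target gene (NOT from the paper).
a, d = additive_and_dominance(short_hom=1.0, het=1.25, long_hom=1.5)
print(a, d)
```

A dominance deviation that is small relative to the additive effect is what an intermediate heterozygote looks like in these terms.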

Fluctuations over time
In our new study, we returned to the North American populations we evaluated five years prior. However, this time around we mapped 100-bp paired-end reads from pooled population samples. (These data relate to Alan Bergland’s larger exploration of spatial and temporal variation in D. melanogaster, described here on arXiv.) We called each of the discrete polymorphisms within the complex indel polymorphism (SNPs or small indels) individually. Some of those discrete polymorphisms distinguish between the InRshort and InRlong alleles, and they confirm that the clines persist in North America.

We reasoned that alleles prevalent in high-latitude, cold climates might be selected for in the winter, and alleles prevalent in low-latitude, warm climates might be selected for in the summer. We examined a Pennsylvania population at multiple timepoints over three years and saw dramatic fluctuations in allele frequency (changes of approximately 20%) for discrete polymorphisms associated with InRshort and InRlong. As predicted, the “winter” and “summer” alleles were those common at high and low latitudes, respectively.

However, the polymorphisms that showed the most dramatic fluctuations over seasonal time were not necessarily those with the strongest clines in frequency across geographical space. We suggest that aspects of demography and selection probably vary between seasonal and geographical environments, even in the face of apparently similar climatic pressures.

A question of pleiotropy
A longstanding question in the field of life history evolution is whether single alleles affect multiple traits at once (pleiotropy) or affect traits individually but reside near each other (linkage). The question itself arises from the observation, made many times over, that life history traits are typically correlated. For example, long-lived individuals often show reduced reproductive fitness. Longevity is also often positively correlated with the ability to tolerate stress. Do the same genetic variants encode multiple trait phenotypes?

We assayed our InRshort and InRlong test lines for multiple phenotypes: fecundity, development time, body size and allometry, body weight and lipid content, tolerance for multiple stresses, and lifespan. We used the test lines described above, a replicate set of InRshort and InRlong lines derived from a second population, and lines in which we measured the effects of InRshort and InRlong in an InRhypomorph mutant background.

Our full report can be found in the manuscript, but the take-home message is that InRshort and InRlong are significantly associated with all of the tested traits, in directions predicted by a selection regime favoring fast development time, rapid egg-laying, and high heat tolerance in warm climates, and resistance to cold and starvation stresses in cold climates. The InRshort allele was also associated with increased lifespan in males, though we do not necessarily expect that lifespan itself is associated with fitness.

In conclusion, our results implicate insulin signaling as a major mediator of life history adaptation in D. melanogaster, and suggest that tradeoffs can be explained by extensive pleiotropy at a single locus.

Some other things I would like to mention
I value this study for its functional tests; phenotypic effects of candidate polymorphisms are often missing from evolutionary studies. However, and this is a major caveat: the InRshort and InRlong alleles were embedded in genotypic backgrounds that extended well beyond the locus in the test lines. On their own, I do not consider the functional tests definitive. But D. melanogaster have low linkage disequilibrium, which we know decays rapidly just outside our candidate polymorphism. In my opinion, the segregation of InRshort and InRlong in large, recombining wild populations pinpoints the functional alleles, while the experimental assays confirm our hypotheses about the selection regime.

When we first measured fecundity, we counted every single egg laid by every single female over every single one of their lives. And the InRlong females, which we knew were more fecund (their culture bottles grew like gangbusters), laid only five more eggs on average than InRshort females! Highly non-significant. But it looked like the InRlong flies laid eggs faster. We set up a different assay to measure eggs laid in the first day, and InRlong was six times more fecund. I think this provides an important lesson. We can easily imagine big fitness consequences for egg laying rate, but we might not think to measure it in the lab. Many studies, especially those from a molecular genetics point of view, have been keen to emphasize decoupling of lifespan and reproduction for so-called longevity genes. For conclusions drawn about natural genetic variants (which are the ones of utmost relevance, in my opinion), the question of tradeoffs must consider those fitness axes that are relevant to the wild organism. And these are often unknowable.

We found that InRshort and InRlong were associated with smaller and larger body sizes, respectively. This makes sense in terms of levels of insulin signaling, but not in terms of body sizes in wild populations. High-latitude flies are typically larger, not smaller. So, if InRshort and InRlong alleles affect body size, they either do so epistatically with other body size loci or they suffer antagonistic selection pressures along multiple fitness axes. Interesting!

DISEASES: Text mining and data integration of disease–gene associations

Sune Pletscher-Frankild, Albert Pallejà, Kalliopi Tsafou, Janos X Binder, Lars Juhl Jensen
doi: http://dx.doi.org/10.1101/008425

Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease–gene associations from biomedical abstracts. The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences both within and between sentences. We show that this approach is able to extract half of all manually curated associations with a false positive rate of only 0.16%. Nonetheless, text mining should not stand alone, but be combined with other types of evidence. For this reason, we have developed the DISEASES resource, which integrates the results from text mining with manually curated disease–gene associations, cancer mutation data, and genome-wide association studies from existing databases. The DISEASES resource is accessible through a user-friendly web interface at http://diseases.jensenlab.org/, where the text-mining software and all associations are also freely available for download.

A genomic map of the effects of linked selection in Drosophila

Eyal Elyashiv, Shmuel Sattath, Tina T. Hu, Alon Strustovsky, Graham McVicker, Peter Andolfatto, Graham Coop, Guy Sella
(Submitted on 23 Aug 2014)

Natural selection at one site shapes patterns of genetic variation at linked sites. Quantifying the effects of ‘linked selection’ on levels of genetic diversity is key to making reliable inference about demography, building a null model in scans for targets of adaptation, and learning about the dynamics of natural selection. Here, we introduce the first method that jointly infers parameters of distinct modes of linked selection, notably background selection and selective sweeps, from genome-wide diversity data, functional annotations and genetic maps. The central idea is to calculate the probability that a neutral site is polymorphic given local annotations, substitution patterns, and recombination rates. Information is then combined across sites and samples using composite likelihood in order to estimate genome-wide parameters of distinct modes of selection. In addition to parameter estimation, this approach yields a map of the expected neutral diversity levels along the genome. To illustrate the utility of our approach, we apply it to genome-wide resequencing data from 125 lines in Drosophila melanogaster and reliably predict diversity levels at the 1Mb scale. Our results corroborate estimates of a high fraction of beneficial substitutions in proteins and untranslated regions (UTRs). They allow us to distinguish between the contribution of sweeps and other modes of selection around amino acid substitutions and to uncover evidence for pervasive sweeps in UTRs. Our inference further suggests a substantial effect of linked selection from non-classic sweeps. More generally, we demonstrate that linked selection has had a larger effect in reducing diversity levels and increasing their variance in D. melanogaster than previously appreciated.

Escape from crossover interference increases with maternal age

Christopher L. Campbell, Nicholas A. Furlotte, Nick Eriksson, David Hinds, Adam Auton
(Submitted on 23 Aug 2014)

Recombination plays a fundamental role in meiosis, ensuring the proper segregation of chromosomes and contributing to genetic diversity by generating novel combinations of alleles. Using data derived from direct-to-consumer genetic testing, we investigated patterns of recombination in over 4,200 families. Our analysis revealed a number of sex differences in the distribution of recombination. We find the fraction of male events occurring within hotspots to be 4.6% higher than for females. We confirm that the recombination rate increases with maternal age, while hotspot usage decreases, with no such effects observed in males. Finally, we show that the placement of female recombination events becomes increasingly deregulated with maternal age, with an increasing fraction of events appearing to escape crossover interference.

IPED2: Inheritance Path based Pedigree Reconstruction Algorithm for Complicated Pedigrees

Dan He, Zhanyong Wang, Laxmi Parida, Eleazar Eskin
(Submitted on 23 Aug 2014)

Reconstruction of family trees, or pedigree reconstruction, for a group of individuals is a fundamental problem in genetics. The problem is known to be NP-hard even for datasets known to only contain siblings. Some recent methods have been developed to accurately and efficiently reconstruct pedigrees. These methods, however, still consider relatively simple pedigrees; for example, they are not able to handle half-sibling situations where a pair of individuals only share one parent. In this work, we propose an efficient method, IPED2, based on our previous work, which specifically targets reconstruction of complicated pedigrees that include half-siblings. We note that the presence of half-siblings makes the reconstruction problem significantly more challenging, which is why previous methods exclude the possibility of half-siblings. We propose a novel model as well as an efficient graph algorithm, and experiments show that our algorithm achieves relatively accurate reconstruction. To our knowledge, this is the first method that is able to handle pedigree reconstruction based on genotype data only when half-siblings exist in any generation of the pedigree.

Population split time estimation and X to autosome effective population size differences inferred using physically phased genomes

Shiya Song, Elzbieta Sliwerska, Jeffrey M Kidd
doi: http://dx.doi.org/10.1101/008367

Haplotype-resolved genome sequence information is of growing interest due to its applications in both population genetics and medical genetics. Here, we assess the ability to correctly reconstruct haplotype sequences using fosmid pooled sequencing and apply the sequences to explore historical population relationships. We resolved phased haplotypes of sample NA19240, a trio child from the Yoruba HapMap collection, using pools of a total of 521,783 fosmid clones. We phased 93% of heterozygous SNPs into haplotype-resolved blocks, with an N50 size of 318kb. Using trio information from HapMap, we linked adjacent blocks together to form paternal and maternal alleles, producing near-to-complete haplotypes. Comparison with 33 individual fosmids sequenced using capillary sequencing shows that our reconstructed sequence haplotypes have a sequence error rate of 0.005%. Utilizing fosmid-phased haplotypes from a Yoruba, a European and a Gujarati sample, we analyzed population history and inferred population split times. We date the initial split between Yoruba and out-of-Africa populations to 90,000-100,000 years ago with substantial gene flow occurring until nearly 50,000 years ago, and obtain congruent results with the autosomes and the X chromosome. We estimate that the initial split between European and Gujarati populations occurred around 45,000 years ago and gene flow ended around 28,000 years ago. Analysis of X vs autosome inferred effective population sizes reveals distinct epochs in which the ratio of the effective number of males to females changes. We find a period of female bias during the ancestral human lineage up to 1 million years ago and a short period of male bias in the Yoruba lineage from 160-400 thousand years ago. We demonstrate the construction of haplotype sequences of sufficient completeness and accuracy for population genetic analysis. As experimental and analytic methods improve, these approaches will continue to shed new light on the history of populations.
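
For readers unfamiliar with the N50 statistic quoted for the phased blocks, the following is a minimal sketch of how it is commonly computed: sort the block lengths from longest to shortest and report the length at which the running total first reaches half of the summed length. The function name and the example block lengths are hypothetical and are not taken from the preprint.

```python
def n50(block_lengths):
    """Length L such that blocks of length >= L cover at least half the total."""
    total = sum(block_lengths)
    running = 0
    for length in sorted(block_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("block_lengths must be non-empty")


# Hypothetical block lengths in kb (illustration only).
example_blocks_kb = [500, 400, 300, 200, 100]
print(n50(example_blocks_kb))
```

Because the statistic is weighted by length, a few long blocks can hold it high even when many short blocks are present.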

Sources of PCR-induced distortions in high-throughput sequencing datasets

Justus M Kebschull, Anthony M Zador
doi: http://dx.doi.org/10.1101/008375

PCR allows the exponential and sequence-specific amplification of DNA, even from minute starting quantities. Today, PCR is at the core of the most successful DNA sequencing technologies and is a fundamental step in preparing DNA samples for high-throughput sequencing. Despite its importance, we have little comprehensive understanding of the biases and errors that PCR introduces into pools of DNA molecules. Understanding PCR's imperfections and their impact on the amplification of different sequences in a complex mixture is particularly important for a proper understanding of high-throughput sequencing data. We examined the effects of bias, stochasticity, template switches and polymerase errors introduced during PCR on sequence representation in next-generation sequencing libraries. Using Illumina sequencing results of a pool of diverse PCR amplicons with a defined structure, we searched for signatures of each process. We further developed quantitative models for each process and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. PCR errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results will have particular relevance to single cell sequencing, in which sequences are represented by only one or a few molecules.
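
To give a feel for why stochasticity alone can skew representation when every sequence starts from a single molecule, here is a toy branching-process simulation. It is not the authors' quantitative model: it simply assumes that each molecule is copied independently with a fixed per-cycle probability, and the function name, efficiency value and cycle count are all hypothetical choices for illustration.

```python
import random


def amplify(initial_copies, cycles, efficiency, rng):
    """Simulate PCR in which each molecule is duplicated with probability `efficiency` per cycle."""
    copies = initial_copies
    for _ in range(cycles):
        # Count how many of the current molecules are successfully copied this cycle.
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies


rng = random.Random(0)  # fixed seed so the run is repeatable
# Ten distinct sequences, each starting from one molecule, amplified identically.
final_counts = [amplify(1, 10, 0.8, rng) for _ in range(10)]
print(final_counts)
```

Even with identical efficiency for every sequence, the final counts differ, because a failed duplication in the first few cycles is inherited by all descendants. With larger starting copy numbers these early fluctuations average out, which is consistent with the abstract's emphasis on single-cell applications.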