We investigated the evolution of the response of human, chicken, alligator and frog glucocorticoid receptors (GRs) to dexamethasone, cortisol, corticosterone, 11-deoxycorticosterone, 11-deoxycortisol and aldosterone. We find significant differences among these vertebrates in the transcriptional activation of their full-length GRs by these steroids, indicating that there were changes in the specificity of the GR for steroids during the evolution of terrestrial vertebrates. To begin to study the role of interactions between different domains of the GR in steroid sensitivity and specificity for terrestrial GRs, we investigated transcriptional activation of truncated GRs containing their hinge domain and ligand binding domain (LBD) fused to a GAL4 DNA binding domain (GAL4 DBD). Compared to the corresponding full-length GRs, transcriptional activation of GAL4 DBD-GR hinge/LBD constructs required higher steroid concentrations and displayed altered steroid specificity, indicating that interactions between the hinge/LBD and other domains are important in glucocorticoid activation of these terrestrial GRs.
Monthly Archives: January 2016
Statistical evidence for common ancestry: New tests of universal ancestry
While there is no doubt among evolutionary biologists that all living species, or merely all living species within a particular group (e.g., animals), share descent from a common ancestor, formal statistical methods for evaluating common ancestry from aligned DNA sequence data have received criticism. One primary criticism is that prior methods take sequence similarity as evidence for common ancestry while ignoring other potential biological causes of similarity, such as functional constraints. We present a new statistical framework to test separate ancestry versus common ancestry that avoids this pitfall. We illustrate the efficacy of our approach using a recently published large molecular alignment to examine common ancestry of all primates (including humans). We find overwhelming evidence against separate ancestry and in favor of common ancestry for orders and families of primates. We also find overwhelming evidence that humans share a common ancestor with other primate species. The novel statistical methods presented here provide formal means to test separate ancestry versus common ancestry from aligned DNA sequence data while accounting for functional constraints that limit nucleotide base usage on a site-by-site basis.
Phenotypic spandrel: absolute discrimination and ligand antagonism
We consider the general problem of absolute discrimination between categories of ligands irrespective of their concentration. An instance of this problem is immune discrimination between self and not-self. We connect this problem to biochemical adaptation, and establish that ligand antagonism, the ability of subthreshold ligands to negatively impact the response, is a necessary consequence of absolute discrimination. Thus antagonism constitutes a “phenotypic spandrel”: a phenotype existing as a necessary by-product of another phenotype. We exhibit a simple analytic model of absolute discrimination displaying ligand antagonism, where antagonism strength is linear in distance from threshold. This contrasts with proofreading-based models, where antagonism vanishes far from threshold and thus displays an inverted hierarchy of antagonism compared to the simple model. The phenotypic spandrel studied here is expected to structure many decision pathways such as immune detection mediated by TCRs and FcϵRIs.
Winner's curse correction and variable thresholding improve performance of polygenic risk modeling based on summary-level data from genome-wide association studies
Heritability analysis suggests that genome-wide association studies (GWAS) have the potential to improve genetic risk prediction for complex diseases. The polygenic risk score (PRS) is a widely used modelling technique that requires only summary-level data from the discovery samples. We propose two modifications to improve the performance of PRS. First, we propose threshold-dependent winner's curse adjustments for the marginal association coefficients that are used to weight the SNPs in PRS. Second, to exploit external functional/annotation knowledge that might identify a subset of SNPs highly enriched for association signals, we consider using variable thresholds for SNP selection. We applied our methods to the GWAS summary-level data of fourteen complex diseases. Our analysis shows that while a simple winner's curse correction uniformly enhances the performance of the models across traits, incorporation of functional SNPs was beneficial for only selected traits. Compared to the standard PRS algorithm, the proposed methods in combination lead to a substantial efficiency gain (25-50% increase in the prediction R2) for five out of fourteen diseases. As an example, for GWAS of type 2 diabetes, the lasso-based winner's curse correction improves prediction R2 from 2.29% based on standard PRS to 3.1% (P=0.0017), and incorporating functional annotation data further improved R2 to 3.53% (P=2.0E-5). Our simulation studies further clarified why differential treatment of certain categories of functional SNPs, even when shown to be highly enriched for GWAS heritability, does not lead to a proportionate improvement in genetic risk prediction, owing to non-uniform linkage disequilibrium structure.
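To make the two modifications concrete, here is a minimal Python sketch of a thresholded PRS for one individual. It is an illustration under stated assumptions, not the authors' implementation: the function name, its parameters, and the use of a plain soft-threshold shrinkage as a stand-in for a lasso-type winner's curse adjustment are all hypothetical choices made for this example.

```python
def polygenic_risk_score(genotypes, betas, p_values, thresholds, shrink=0.0):
    """Weighted sum of allele counts over SNPs that pass their p-value threshold.

    genotypes  : allele counts (0, 1 or 2) per SNP for one individual
    betas      : marginal association coefficients from summary-level data
    p_values   : marginal association p-values
    thresholds : per-SNP inclusion thresholds; giving annotated SNPs a more
                 lenient value is one way to express variable thresholding
                 (hypothetical interface)
    shrink     : amount subtracted from |beta| (soft thresholding), used here
                 as an assumed stand-in for a winner's curse adjustment
    """
    score = 0.0
    for g, beta, p, t in zip(genotypes, betas, p_values, thresholds):
        if p > t:
            continue  # SNP does not pass its threshold
        magnitude = max(abs(beta) - shrink, 0.0)
        adjusted = magnitude if beta >= 0 else -magnitude
        score += adjusted * g
    return score


# Toy data: the second SNP fails its threshold and is skipped.
genotypes = [2, 1, 0]
betas = [0.2, -0.1, 0.5]
p_values = [1e-9, 1e-3, 1e-10]
thresholds = [1e-5, 1e-5, 1e-5]
plain = polygenic_risk_score(genotypes, betas, p_values, thresholds)
shrunk = polygenic_risk_score(genotypes, betas, p_values, thresholds, shrink=0.05)
```

Keeping the thresholds as a per-SNP list means a single code path covers both the standard single-threshold PRS and an annotation-informed variant.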
Long single-molecule reads can resolve the complexity of the Influenza virus composed of rare, closely related mutant variants
As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm”. The ability of next-generation sequencing to produce massive quantities of genomic data inexpensively has allowed virologists to study the structure of viral populations from an infected host at an unprecedented resolution. However, the high similarity and low frequency of the viral variants pose a huge challenge to the assembly of individual full-length genomes. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. The proposed protocol is able to eliminate sequencing errors and reconstruct closely related viral mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct a clone with a frequency of 0.2% and distinguish clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. With high sensitivity and accuracy, 2SNV is anticipated to facilitate not only viral quasispecies reconstruction, but also other biological questions that require detection of rare haplotypes, such as genetic diversity in cancer cell populations and monitoring of B-cell and T-cell receptor repertoires. The open source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv
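The idea of using linkage between variant positions to separate true variants from read errors can be sketched as follows. This is a simplified illustration and not the 2SNV algorithm itself: the function name, the `min_ratio` and `min_count` parameters, and the observed-versus-expected test under independent errors are assumptions made for this example, and reads are assumed to be already aligned to a consensus of equal length.

```python
from itertools import combinations


def linked_variant_pairs(reads, consensus, min_ratio=10.0, min_count=2):
    """Return position pairs whose mismatches co-occur far more often than
    expected if mismatches at the two positions were independent errors."""
    n = len(reads)
    # For each read, the set of positions where it differs from the consensus.
    mismatches = [
        {i for i, base in enumerate(read) if base != consensus[i]}
        for read in reads
    ]
    single = {}  # position -> number of reads with a mismatch there
    pair = {}    # (i, j)   -> number of reads with mismatches at both
    for positions in mismatches:
        for i in positions:
            single[i] = single.get(i, 0) + 1
        for i, j in combinations(sorted(positions), 2):
            pair[(i, j)] = pair.get((i, j), 0) + 1

    linked = []
    for (i, j), observed in pair.items():
        expected = single[i] * single[j] / n  # count expected under independence
        if observed >= min_count and observed >= min_ratio * expected:
            linked.append((i, j))
    return sorted(linked)


# Toy data: a rare variant carried by 3 of 100 reads differs from the
# consensus at positions 2 and 7; one other read has two unrelated errors.
consensus = "AAAAAAAAAA"
reads = [consensus] * 96 + ["AACAAAAGAA"] * 3 + ["AAAACGAAAA"]
pairs = linked_variant_pairs(reads, consensus)
```

In this toy input the pair (2, 7) is reported because its mismatches always occur together, while the isolated double error is discarded by the minimum-count filter.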
Variation in the molecular clock of primates
Events in primate evolution are often dated by assuming a “molecular clock”, i.e., a constant rate of substitution per unit time, but the validity of this assumption remains unclear. Among mammals, it is well known that there exists substantial variation in yearly substitution rates. Such variation is to be expected from differences in life-history traits, suggesting that it should also be found among primates. Motivated by these considerations, we analyze whole genomes from ten primate species, including Old World Monkeys (OWMs), New World Monkeys (NWMs) and apes, focusing on putatively neutral autosomal sites and controlling for possible effects of biased gene conversion and methylation at CpG sites. We find that substitution rates are ~65% higher in lineages leading from the hominoid-NWM ancestor to NWMs than to apes. Within apes, rates are ~2% higher in chimpanzees and ~7% higher in the gorilla than in humans. Substitution types subject to biased gene conversion show no more variation among species than those not subject to it. Not all mutation types behave similarly, however: in particular, transitions at CpG sites exhibit a more clock-like behavior than do other types, presumably due to their non-replicative origin. Thus, not only the total rate, but also the mutational spectrum varies among primates. This finding suggests that events in primate evolution are most reliably dated using CpG transitions. Taking this approach, we estimate that the average time to the most recent common ancestor of human and chimpanzee is 12.1 million years and their split time 7.9 million years.
ARGON: fast, whole-genome simulation of the discrete time Wright-Fisher process
Simulation under the coalescent model is ubiquitous in the analysis of genetic data. The rapid growth of real data sets from multiple human populations led to increasing interest in simulating very large sample sizes at whole-chromosome scales. When the sample size is large, the coalescent model becomes an increasingly inaccurate approximation of the discrete time Wright-Fisher model (DTWF). Analytical and computational treatment of the DTWF, however, is generally harder. We present a simulator (ARGON) for the DTWF process that scales up to hundreds of thousands of samples and whole-chromosome lengths, with a time/memory performance comparable or superior to currently available methods for coalescent simulation. The simulator supports arbitrary demographic history, migration, variable mutation/recombination rates and gene conversion, and efficiently outputs pairwise identical-by-descent (IBD) sharing data. ARGON (version 0.1) is written in Java, open source, and freely available at https://github.com/pierpal/ARGON.
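As a rough illustration of why the DTWF differs from the coalescent for large samples, the following Python sketch traces sampled lineages backward in time in a haploid Wright-Fisher population of constant size, without recombination, mutation or migration. It is a toy model under those assumptions and does not reflect how ARGON is implemented; the function name and parameters are hypothetical.

```python
import random


def dtwf_generations_to_mrca(sample_size, pop_size, rng):
    """Generations until all sampled lineages share one ancestor.

    Each generation, every remaining lineage picks a parent uniformly at
    random among pop_size individuals; lineages that pick the same parent
    merge. Several lineages may merge in a single generation, which the
    standard coalescent (pairwise mergers only) does not allow.
    """
    lineages = sample_size
    generations = 0
    while lineages > 1:
        parents = {rng.randrange(pop_size) for _ in range(lineages)}
        lineages = len(parents)  # distinct parents = surviving lineages
        generations += 1
    return generations


rng = random.Random(42)  # seeded for reproducibility
t_mrca = dtwf_generations_to_mrca(sample_size=50, pop_size=1000, rng=rng)
```

When the sample size approaches the population size, many lineages collide in the same generation, which is the regime where the coalescent approximation degrades.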
Eighteenth century Yersinia pestis genomes reveal the long-term persistence of an historical plague focus
The 14th-18th century pandemic of Yersinia pestis caused devastating disease outbreaks in Europe for almost 400 years. The reasons for plague’s persistence and abrupt disappearance in Europe are poorly understood, but could have been due to either the presence of now-extinct plague foci in Europe itself, or successive disease introductions from other locations. Here we present five Y. pestis genomes from one of the last European outbreaks of plague, from 1722 in Marseille, France. The lineage identified has not been found in any extant Y. pestis foci sampled to date, and has its ancestry in strains obtained from victims of the 14th century Black Death. These data suggest the existence of a previously uncharacterized historical plague focus that persisted for at least three centuries. We propose that this disease source may have been responsible for the many resurgences of plague in Europe following the Black Death.
A neo-sex chromosome in the Monarch butterfly, Danaus plexippus
We report the discovery of a neo-sex chromosome in the Monarch butterfly, Danaus plexippus, and several of its close relatives. Z-linked scaffolds in the D. plexippus genome assembly were identified via sex-specific differences in Illumina sequencing coverage. Additionally, a majority of the D. plexippus genome assembly was assigned to chromosomes based on counts of 1-to-1 orthologs relative to the butterfly Melitaea cinxia (and two other Lepidopteran species), where genome scaffolds have been robustly mapped to linkage groups. Combining sequencing-coverage-based Z-linkage with homology-based chromosomal assignments provided strong evidence for a Z-autosome fusion in the Danaus lineage, involving the autosome homologous to chromosome 21 in M. cinxia. Coverage analysis also identified three scaffolds containing notable assembly errors resulting in chimeric Z-autosome fusions. The timing of this Z-autosome fusion event currently remains ambiguous due to incomplete sampling of karyotypes in the Danaini tribe of butterflies. The discovery of a neo-Z and the provisional assignment of chromosome linkage for >90% of D. plexippus genes lay the foundation for novel insights concerning sex chromosome evolution in this increasingly prominent female-heterogametic model species for functional and evolutionary genomics.
The genealogical sorting index and species delimitation
The Genealogical Sorting Index (gsi) has been widely used in species-delimitation studies, where it is usually interpreted as a measure of the degree to which each of several predefined groups of specimens displays a pattern of divergent evolution in a phylogenetic tree. Here we show that the gsi value obtained for a given group is highly dependent on the structure of the tree outside of the group of interest. By calculating the gsi from simulated datasets we demonstrate that this dependence undermines some of the desirable properties of the statistic. We also review the use of the gsi in delimitation studies, and show that the gsi has typically been used under scenarios in which it is expected to produce large and statistically significant results for samples that are not divergent from all other populations and thus should not be considered species. Our proposed solution to this problem performs better than the gsi under these conditions. Nevertheless, we show that our modified approach can produce positive results for populations that are connected by substantial levels of gene flow, and are thus unlikely to represent distinct species. We stress that the properties of the gsi made clear in this manuscript must be taken into account if the statistic is used in species-delimitation studies. More generally, we argue that the results of genetic species-delimitation methods need to be interpreted in light of the biological and ecological setting of a study, and not treated as the final test applied to hypotheses generated by other data.