Spatial localization of recent ancestors for admixed individuals

Wen-Yun Yang, Alexander Platt, Charleston Wen-Kai Chiang, Eleazar Eskin, John Novembre, Bogdan Pasaniuc

Ancestry analysis from genetic data plays a critical role in studies of human disease and evolution. Recent work has introduced explicit models for the geographic distribution of genetic variation and has shown that such explicit models yield superior accuracy in ancestry inference over non-model-based methods. Here we extend such work to introduce a method that models admixture between ancestors from multiple sources across a geographic continuum. We devise efficient algorithms based on hidden Markov models to localize on a map the recent ancestors (e.g. grandparents) of admixed individuals, jointly with assigning ancestry at each locus in the genome. We validate our methods using empirical data from individuals with mixed European ancestry from the POPRES study and show that our approach is able to localize their recent ancestors within an average of 470 km of the reported locations of their grandparents. Furthermore, simulations from real POPRES genotype data show that our method attains high accuracy in localizing recent ancestors of admixed individuals in Europe (an average of 550 km from their true location for localization of 2 ancestries in Europe, 4 generations ago). We explore the limits of ancestry localization under our approach and find that performance decreases as the number of distinct ancestries and generations since admixture increase. Finally, we build a map of expected localization accuracy across admixed individuals according to the location of origin within Europe of their ancestors.
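
The localization errors quoted above are distances on a map between inferred and reported ancestor locations. As a minimal sketch of how such an error could be computed, the code below assumes great-circle (haversine) distance on a spherical Earth of radius 6371 km; the abstract does not state which distance measure was used, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def mean_localization_error_km(predicted, reported):
    """Average distance between paired predicted and reported (lat, lon) locations."""
    if len(predicted) != len(reported) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    distances = [haversine_km(*p, *r) for p, r in zip(predicted, reported)]
    return sum(distances) / len(distances)
```

Each element of `predicted` and `reported` is a `(latitude, longitude)` pair in degrees, with the two lists paired by ancestor.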

Comparison of Y-chromosomal lineage dating using either evolutionary or genealogical Y-STR mutation rates

Chuan-Chao Wang, Li Hui

We have compared Y-chromosomal lineage dating between sequence data and commonly used Y-SNP plus Y-STR data. The coalescent times estimated using evolutionary Y-STR mutation rates correspond best with sequence-based dating when the lineages include the most ancient haplogroup A individuals. However, the times estimated using slowly mutating STR markers with genealogical rates fit well with sequence-based estimates in the main lineages, such as haplogroups CT, DE, K, NO, IJ, P, E, C, I, J, N, O, and R. In addition, genealogical rates lead to more plausible time estimates for Neolithic coalescent sublineages compared with sequence-based dating.

Detection and Polarization of Introgression in a Five-taxon Phylogeny

James B Pease, Matthew W. Hahn

In clades of closely related taxa, discordant genealogies due to incomplete lineage sorting (ILS) can complicate the detection of introgression. The D-statistic (a.k.a. the ABBA/BABA test) was proposed to infer introgression in the presence of ILS for a four-taxon clade. However, the original D-statistic cannot be directly applied to a symmetric five-taxon phylogeny, and the direction of introgression cannot be inferred for any tree topology. Here we explore the issues associated with previous methods for adapting the D-statistic to a larger tree topology, and propose new “DFOIL” tests to infer both the taxa involved in and the direction of introgressions for a symmetric five-taxon phylogeny. Using theory and simulations, we find that previous modifications of the D-statistic to five-taxon phylogenies incorrectly identify both the pairs of taxa exchanging migrants and the direction of introgression. The DFOIL statistics are shown to overcome this deficiency and to correctly determine the direction of introgressions. The DFOIL tests are relatively simple and computationally inexpensive to calculate, and can be easily applied to various phylogenomic datasets. In addition, our general approach to the problem of introgression detection could be adapted to larger tree topologies and other models of sequence evolution.
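
For reference, the original four-taxon D-statistic mentioned above compares counts of the two discordant site patterns, D = (ABBA - BABA) / (ABBA + BABA), for taxa ordered (((P1, P2), P3), O). The following is a minimal sketch of that four-taxon calculation only, not of the DFOIL tests proposed in the paper. The function name and the input format (four alleles per site in the order P1, P2, P3, outgroup, with the outgroup allele treated as ancestral) are assumptions made for illustration.

```python
def d_statistic(sites):
    """Four-taxon D from per-site alleles ordered (P1, P2, P3, outgroup)."""
    abba = baba = 0
    for p1, p2, p3, out in sites:
        if p3 == out:
            continue  # P3 carries the ancestral allele: neither ABBA nor BABA
        # From here on, P3 carries the derived allele B and the outgroup carries A.
        if p1 == out and p2 == p3:
            abba += 1  # pattern A B B A
        elif p1 == p3 and p2 == out:
            baba += 1  # pattern B A B A
        # Any other pattern (including sites with a third allele) is ignored.
    total = abba + baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites found")
    return (abba - baba) / total


# Hypothetical toy alignment: two ABBA sites, one BABA site, one invariant site.
example_d = d_statistic(["ACCA", "ACCA", "CACA", "AAAA"])
```

Under ILS alone the two patterns are expected in equal numbers, so D is expected to be near zero; an excess of ABBA sites (positive D) indicates excess allele sharing between P2 and P3.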

The Landscape of Human STR Variation

Thomas F. Willems, Melissa Gymrek, Gareth Highnam, The 1000 Genomes Project, David Mittelman, Yaniv Erlich

Short Tandem Repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across over 1,000 individuals in phase 1 of the 1000 Genomes Project. This process nearly saturated common STR variations. After employing a series of quality controls, we utilize this call set to analyze determinants of STR variation, assess the human reference genome's representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations. The resource is publicly available at http://strcat.teamerlich.org/ both in raw format and via a graphical interface.

Evolution of bow-tie architectures in biology

Tamar Friedlander, Avraham E. Mayo, Tsvi Tlusty, Uri Alon
Subjects: Molecular Networks (q-bio.MN)

Bow-tie or hourglass structure is a common architectural feature found in biological and technological networks. A bow-tie in a multi-layered structure occurs when intermediate layers have far fewer components than the input and output layers. Examples include metabolism, where a handful of building blocks mediate between multiple input nutrients and multiple output biomass components, and signaling networks, where information from numerous receptor types passes through a small set of signaling pathways to regulate multiple output genes. Little is known, however, about how bow-tie architectures evolve. Here, we address the evolution of bow-tie architectures using simulations of multi-layered systems evolving to fulfill a given input-output goal. We find that bow-ties spontaneously evolve when two conditions are met: (i) the evolutionary goal is rank deficient, where the rank corresponds to the minimal number of input features on which the outputs depend, and (ii) the effects of mutations on interaction intensities between components are described by a product rule, namely the mutated element is multiplied by a random number. Product-rule mutations are more biologically realistic than the commonly used sum-rule mutations that add a random number to the mutated element. These conditions robustly lead to bow-tie structures. The minimal width of the intermediate network layers (the waist or knot of the bow-tie) equals the rank of the evolutionary goal. These findings can help explain the presence of bow-ties in diverse biological systems, and can also be relevant for machine learning applications that employ multi-layered networks.
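
To make the two mutation schemes concrete, the following sketch mutates one randomly chosen element of an interaction matrix under either rule. It is an illustration only, not the authors' simulation code: the function name, the list-of-lists matrix representation, and the choice of a normal distribution for the random number (mean 1 for the product rule, mean 0 for the sum rule, standard deviation `sigma`) are assumptions.

```python
import random


def mutate(weights, rule, rng, sigma=0.1):
    """Return a copy of `weights` with one randomly chosen element mutated."""
    mutated = [row[:] for row in weights]  # copy so the input is left untouched
    i = rng.randrange(len(mutated))
    j = rng.randrange(len(mutated[i]))
    if rule == "product":
        # Product rule: multiply the element by a random number.
        mutated[i][j] *= rng.gauss(1.0, sigma)
    elif rule == "sum":
        # Sum rule: add a random number to the element.
        mutated[i][j] += rng.gauss(0.0, sigma)
    else:
        raise ValueError("rule must be 'product' or 'sum'")
    return mutated


rng = random.Random(0)  # fixed seed so the example is reproducible
weights = [[0.0, 2.0], [1.0, 0.0]]
after_product = mutate(weights, "product", rng)
after_sum = mutate(weights, "sum", rng)
```

One consequence visible in this sketch is that an interaction of exactly zero stays zero under the product rule, whereas a sum-rule mutation that hits it makes it nonzero.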

Most viewed on Haldane’s Sieve: April 2014

The most viewed posts on Haldane’s Sieve in April 2014 were:

The evolution of tyrosine-recombinase elements in Nematoda

Amir Szitenberg, Georgios Koutsovoulos, Mark L Blaxter, David H Lunt
Comments: 18 pages
Subjects: Populations and Evolution (q-bio.PE); Genomics (q-bio.GN)

Transposable elements can be categorised into DNA and RNA elements based on their mechanism of transposition. Tyrosine recombinase elements (YREs) are relatively rare and poorly understood, despite sharing characteristics with both DNA and RNA elements. Previously, the Nematoda have been reported to have a substantially different diversity of YREs compared to other animal phyla: the Dirs1-like YRE retrotransposon was encountered in most animal phyla but not in Nematoda, and a unique Pat1-like YRE retrotransposon has only been recorded from Nematoda. We explored the diversity of YREs in Nematoda by sampling broadly across the phylum and including 34 genomes representing the three classes within Nematoda. We developed a method to isolate and classify YREs based on both feature organization and phylogenetic relationships in an open and reproducible workflow. We also ensured that our phylogenetic approach to YRE classification identified truncated and degenerate elements, informatively increasing the number of elements sampled. We identified Dirs1-like elements (thought to be absent from Nematoda) in the nematode classes Enoplia and Dorylaimia indicating that nematode model species do not adequately represent the diversity of transposable elements in the phylum. Nematode Pat1-like elements were found to be a derived form of another PAT element that is present more widely in animals. Several sequence features used widely for the classification of YREs were found to be homoplasious, highlighting the need for a phylogenetically-based classification scheme. Nematode model species do not represent the diversity of transposable elements in the phylum.

Genome-wide Scan of Archaic Hominin Introgressions in Eurasians Reveals Complex Admixture History

Ya Hu, Yi Wang, Qiliang Ding, Yungang He, Minxian Wang, Jiucun Wang, Shuhua Xu, Li Jin
Comments: 42 Pages, 1 Table, 4 Figures, 1 Supplementary Table, and 10 Supplementary Figures
Subjects: Populations and Evolution (q-bio.PE)

Introgressions from Neanderthals and Denisovans have been detected in modern humans. Introgressions from other archaic hominins have also been implicated; however, identifying them poses a great technical challenge. Here, we introduce an approach for identifying introgressions from all possible archaic hominins in Eurasian genomes, without referring to archaic hominin sequences. We focused on mutations that emerged in archaic hominins after their divergence from modern humans (denoted as archaic-specific mutations), and identified introgressive segments that showed significant enrichment of archaic-specific mutations over the rest of the genome. Furthermore, boundaries of introgressions were identified using a dynamic programming approach to partition the whole genome into segments that contained different levels of archaic-specific mutations. We found that detected introgressions shared more archaic-specific mutations with the Altai Neanderthal than with the Denisovan, and 60.3% of archaic hominin introgressions were from Neanderthals. Furthermore, we detected more introgressions from two unknown archaic hominins that diverged from modern humans approximately 859 and 3,464 thousand years ago. The latter unknown archaic hominin contributed to the genomes of the common ancestors of modern humans and Neanderthals. In total, archaic hominin introgressions comprised 2.4% of Eurasian genomes. These results suggest a complex admixture history among hominins. The proposed approach could also facilitate admixture research across species.

Bayesian Model Comparison in Genetic Association Analysis: Linear Mixed Modeling and SNP Set Testing

Xiaoquan Wen
(Submitted on 29 Apr 2014)

We consider the problems of hypothesis testing and model comparison under a flexible Bayesian linear regression model whose formulation is closely connected with the linear mixed effect model and the parametric models for SNP set analysis in genetic association studies. We derive a class of analytic approximate Bayes factors and illustrate their connections with a variety of frequentist test statistics, including the Wald statistic and the variance component score statistic. Taking advantage of Bayesian model averaging and hierarchical modeling, we demonstrate some distinct advantages and flexibilities in the approaches utilizing the derived Bayes factors in the context of genetic association studies. We demonstrate our proposed methods using real or simulated numerical examples in applications of single SNP association testing, multi-locus fine-mapping and SNP set association testing.

Characterizing a collective and dynamic component of chromatin immunoprecipitation enrichment profiles in yeast

Lucas D. Ward, Junbai Wang, Harmen J. Bussemaker

Recent chromatin immunoprecipitation (ChIP) experiments in fly, mouse, and human have revealed the existence of high-occupancy target (HOT) regions or “hotspots” that show enrichment across many assayed DNA-binding proteins. Similar co-enrichment observed in yeast so far has been treated as artifactual, and has not been fully characterized. Here we reanalyze ChIP data from both array-based and sequencing-based experiments to show that in the yeast S. cerevisiae, the collective enrichment phenomenon is strongly associated with proximity to noncoding RNA genes and with nucleosome depletion. DNA sequence motifs that confer binding affinity for the proteins are largely absent from these hotspots, suggesting that protein-protein interactions play a prominent role. The hotspots are condition-specific, suggesting that they reflect a chromatin state or protein state, and are not a static feature of underlying sequence. Additionally, only a subset of all assayed factors is associated with these loci, suggesting that the co-enrichment cannot be simply explained by a chromatin state that is universally more prone to immunoprecipitation. Together our results suggest that the co-enrichment patterns observed in yeast represent transcription factor co-occupancy. More generally, they make clear that great caution must be used when interpreting ChIP enrichment profiles for individual factors in isolation, as they will include factor-specific as well as collective contributions.