Tandem duplications and the limits of natural selection in Drosophila yakuba and Drosophila simulans

Rebekah L Rogers, Julie M Cridland, Ling Shao, Tina T Hu, Peter Andolfatto, Kevin R Thornton
Subjects: Populations and Evolution (q-bio.PE)

Tandem duplications are an essential source of genetic novelty, and their prevalence in natural populations is expected to influence the trajectory of adaptive walks. Here, we describe evolutionary impacts of recently derived, segregating tandem duplications in Drosophila yakuba and Drosophila simulans. We observe an excess of duplicated genes involved in defense against pathogens, chorion development, cuticular peptides, and lipases or endopeptidases associated with the accessory glands, as well as insecticide metabolism, suggesting that duplications function in Red Queen dynamics and rapid evolution. We observe evidence of widespread selection on the D. simulans X, suggesting adaptation through duplication is common on the X. Though we find many high frequency variants, duplicates display an excess of low frequency variants consistent with largely detrimental impacts, limiting the variation that can effectively facilitate adaptation. Although we observe hundreds of gene duplications, we show that segregating variation is insufficient to provide duplicate copies of the entire genome, and the number of duplications in the population spans 13.4% of major chromosome arms in D. yakuba and 9.7% in D. simulans. Whole gene duplication rates are low at $1.1 \times 10^{-9}$ in D. yakuba and $6.1 \times 10^{-9}$ in D. simulans, suggesting long wait times for new mutations. Hence, if adaptive processes are dependent on individual duplications, evolution will be severely limited by mutation. As a result, parallel recruitment of the same duplicated gene in different species will be rare and standing variation will define evolutionary outcomes, in spite of convergence across rapidly evolving phenotypes.
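
A rough way to see what "long wait times" means is a back-of-the-envelope calculation. If a given gene duplicates at rate mu per generation and the population contains N diploid individuals, new copies of that duplication arise at about 2 * N * mu per generation, so the expected wait for the next one is roughly 1 / (2 * N * mu) generations. The sketch below assumes the quoted rates are per gene per generation and uses a hypothetical population size of one million; neither assumption comes from the abstract, and the calculation ignores the chance that a new duplicate is lost by drift.

```python
def expected_wait_generations(mu: float, diploid_n: int) -> float:
    """Expected generations until the next new duplication of one gene.

    mu        -- duplication rate (ASSUMED to be per gene per generation)
    diploid_n -- number of diploid individuals (hypothetical value)
    """
    if mu <= 0 or diploid_n <= 0:
        raise ValueError("mu and diploid_n must be positive")
    # 2 * N chromosomes each mutate at rate mu per generation.
    return 1.0 / (2 * diploid_n * mu)


if __name__ == "__main__":
    HYPOTHETICAL_N = 1_000_000  # illustrative only, not from the paper
    for species, mu in [("D. yakuba", 1.1e-9), ("D. simulans", 6.1e-9)]:
        wait = expected_wait_generations(mu, HYPOTHETICAL_N)
        print(f"{species}: about {wait:.0f} generations per new duplication")
```

The wait scales inversely with both the rate and the population size, so a tenfold smaller population would wait ten times longer for the same duplication to appear.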

Consistency of the Maximum Likelihood Estimator of Evolutionary Tree

Arindam RoyChoudhury
Subjects: Populations and Evolution (q-bio.PE)

Maximum likelihood estimation (MLE) methods are widely used for estimating evolutionary trees. Because an evolutionary tree is not a smooth parameter, the consistency of its MLE has been a topic of debate. It has been noted without proof that the classical proof of consistency by Wald holds for the MLE of an evolutionary tree. Other proofs of consistency under various models have also been proposed. Here we discuss shortcomings in some of these proofs and comment on the applicability of Wald’s proof.

Author post: Spatial localization of recent ancestors for admixed individuals

A guest post by Bogdan Pasaniuc [@bpasaniuc] on his paper with coauthors: Spatial localization of recent ancestors for admixed individuals by Wen-Yun Yang, Alexander Platt, Charleston Wen-Kai Chiang, Eleazar Eskin, John Novembre, Bogdan Pasaniuc. bioRxived here.

Geographic localization based on genetic data has received much attention recently. Here we present a preprint that aims to address one of the drawbacks of existing approaches. As opposed to existing works that typically make a very strong assumption that all recent ancestors come from the same location on a map, we seek to infer multiple locations for a given individual corresponding to its ancestors. That is, our approach uses genetic data from a given individual to localize on the map its recent ancestors several generations ago (e.g. grandparents).

To accomplish this we approximate the admixture process (i.e. mixing of genetic variants from different sources) in a genetic-geographic continuum. We view the mixed ancestry genome as being generated from several locations on a map (corresponding to its recent ancestors) and model the mosaic structure of local ancestries across the genome through an admixture HMM. We link geography to the admixture process by allowing allele frequencies at every site in the genome to vary across geography according to a logistic gradient function (as in SPA[1]); the complete model is an admixture HMM for a genotype-specific pair of ancestral locations on the map.
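
As a minimal sketch of what a logistic gradient function looks like, the snippet below maps a map position to an allele frequency through a logistic curve over a linear function of the coordinates. The function name, the parameter names (slope, intercept) and the example values are illustrative assumptions, not taken from the SPA or SPAMIX software.

```python
import math


def allele_frequency(position, slope, intercept):
    """Allele frequency at a 2-D map position under a logistic gradient.

    position  -- (x, y) coordinates on the map
    slope     -- (a_x, a_y), direction and steepness of the gradient
    intercept -- scalar offset b
    Returns 1 / (1 + exp(-(a . position + b))), a value in (0, 1).
    """
    x, y = position
    a_x, a_y = slope
    linear = a_x * x + a_y * y + intercept
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    # Hypothetical SNP whose frequency rises from west to east.
    slope, intercept = (0.5, 0.0), 0.0
    for x in (-4.0, 0.0, 4.0):
        print(x, round(allele_frequency((x, 0.0), slope, intercept), 3))
```

Each site gets its own slope and intercept, so the frequency changes smoothly along one direction on the map and stays constant along the perpendicular direction.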

As the number of generations since admixture increases, the total number of ancestors to localize increases dramatically, making the inference infeasible (http://gcbias.org/2013/11/11/how-does-your-number-of-genetic-ancestors-grow-back-over-time/). To account for this, we limit the number of different “ancestry locations” that contribute to admixture to a small constant, each contributing a varying amount. We devise efficient algorithms to make inferences in our model and show that accuracy decreases with the number of locations to infer, with the number of generations since admixture, and with the geographic distance among ancestors. For example, SPAMIX can localize the grandparents of the POPRES[2] individuals with multiple sub-continental European ancestries within 470 km of their reported locations.

As with all methods, limitations do exist and we outline several here. We use logistic gradient functions to relate geography to genetics, and investigating more complex functions may prove fruitful. We developed an efficient algorithm for producing point estimates for location and locus-specific ancestry; in some cases a probabilistic output may be desired. Finally, our approach models admixture-LD and assumes no background LD; more involved procedures to model background LD (such as the one we proposed [3]) are an interesting area of research.

1. Yang, Wen-Yun, et al. “A model-based approach for analysis of spatial structure in genetic data.” Nature genetics 44.6 (2012): 725-731.
2. Nelson, Matthew R., et al. “The population reference sample, POPRES: a resource for population, disease, and pharmacological genetics research.” The American Journal of Human Genetics 83.3 (2008): 347-358.
3. Baran, Yael, et al. “Enhanced localization of genetic samples through linkage-disequilibrium correction.” The American Journal of Human Genetics 92.6 (2013): 882-894.

Spatial localization of recent ancestors for admixed individuals

Wen-Yun Yang, Alexander Platt, Charleston Wen-Kai Chiang, Eleazar Eskin, John Novembre, Bogdan Pasaniuc

Ancestry analysis from genetic data plays a critical role in studies of human disease and evolution. Recent work has introduced explicit models for the geographic distribution of genetic variation and has shown that such explicit models yield superior accuracy in ancestry inference over non-model-based methods. Here we extend such work to introduce a method that models admixture between ancestors from multiple sources across a geographic continuum. We devise efficient algorithms based on hidden Markov models to localize on a map the recent ancestors (e.g. grandparents) of admixed individuals, jointly with assigning ancestry at each locus in the genome. We validate our methods using empirical data from individuals with mixed European ancestry from the POPRES study and show that our approach is able to localize their recent ancestors within an average of 470 km of the reported locations of their grandparents. Furthermore, simulations from real POPRES genotype data show that our method attains high accuracy in localizing recent ancestors of admixed individuals in Europe (an average of 550 km from their true location for localization of 2 ancestries in Europe, 4 generations ago). We explore the limits of ancestry localization under our approach and find that performance decreases as the number of distinct ancestries and generations since admixture increase. Finally, we build a map of expected localization accuracy across admixed individuals according to the location of origin within Europe of their ancestors.

Comparison of Y-chromosomal lineage dating using either evolutionary or genealogical Y-STR mutation rates

Chuan-Chao Wang, Li Hui

We compared Y-chromosomal lineage dating based on sequence data with dating based on commonly used Y-SNP plus Y-STR data. The coalescent times estimated using evolutionary Y-STR mutation rates correspond best with sequence-based dating when the lineages include the most ancient haplogroup A individuals. However, the times estimated using slowly mutating STR markers with genealogical rates fit well with sequence-based estimates in the main lineages, such as haplogroups CT, DE, K, NO, IJ, P, E, C, I, J, N, O, and R. In addition, genealogical rates lead to more plausible time estimates for Neolithic coalescent sublineages compared with sequence-based dating.

Detection and Polarization of Introgression in a Five-taxon Phylogeny

James B Pease, Matthew W. Hahn

In clades of closely related taxa, discordant genealogies due to incomplete lineage sorting (ILS) can complicate the detection of introgression. The D-statistic (a.k.a. the ABBA/BABA test) was proposed to infer introgression in the presence of ILS for a four-taxon clade. However, the original D-statistic cannot be directly applied to a symmetric five-taxon phylogeny, and the direction of introgression cannot be inferred for any tree topology. Here we explore the issues associated with previous methods for adapting the D-statistic to a larger tree topology, and propose new “DFOIL” tests to infer both the taxa involved in and the direction of introgressions for a symmetric five-taxon phylogeny. Using theory and simulations, we find that previous modifications of the D-statistic to five-taxon phylogenies incorrectly identify both the pairs of taxa exchanging migrants as well as the direction of introgression. The DFOIL statistics are shown to overcome this deficiency and to correctly determine the direction of introgressions. The DFOIL tests are relatively simple and computationally inexpensive to calculate, and can be easily applied to various phylogenomic datasets. In addition, our general approach to the problem of introgression detection could be adapted to larger tree topologies and other models of sequence evolution.
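
For readers unfamiliar with the original four-taxon test that this work builds on, here is a minimal sketch of the standard D-statistic, D = (ABBA - BABA) / (ABBA + BABA), computed from aligned sequences for P1, P2, P3 and an outgroup. It illustrates only the classical statistic, not the DFOIL tests proposed in the paper; the function names and the toy alignment are made up for illustration.

```python
def count_abba_baba(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns in four aligned sequences.

    The outgroup allele is treated as ancestral ('A'); only sites with
    exactly two alleles are considered.
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if len({a, b, c, o}) != 2:
            continue  # skip invariant and multi-allelic sites
        if a == o and b == c:    # P1 ancestral, P2 and P3 derived
            abba += 1
        elif b == o and a == c:  # P2 ancestral, P1 and P3 derived
            baba += 1
    return abba, baba


def d_statistic(n_abba, n_baba):
    """D = (ABBA - BABA) / (ABBA + BABA)."""
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites to compare")
    return (n_abba - n_baba) / total


if __name__ == "__main__":
    # Toy alignment: two ABBA sites, one BABA site, one invariant site.
    abba, baba = count_abba_baba("ACTA", "GTCA", "GCCA", "ATTA")
    print(abba, baba, d_statistic(abba, baba))
```

Under incomplete lineage sorting alone the two patterns are expected in equal numbers, so D should be near zero; an excess of one pattern points to gene flow between P3 and either P1 or P2, but not to its direction.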

The Landscape of Human STR Variation

Thomas F. Willems, Melissa Gymrek, Gareth Highnam, The 1000 Genomes Project, David Mittelman, Yaniv Erlich

Short Tandem Repeats are among the most polymorphic loci in the human genome. These loci play a role in the etiology of a range of genetic diseases and have been frequently utilized in forensics, population genetics, and genetic genealogy. Despite this plethora of applications, little is known about the variation of most STRs in the human population. Here, we report the largest-scale analysis of human STR variation to date. We collected information for nearly 700,000 STR loci across over 1,000 individuals in phase 1 of the 1000 Genomes Project. This process nearly saturated common STR variations. After employing a series of quality controls, we utilize this call set to analyze determinants of STR variation, assess the human reference genome’s representation of STR alleles, find STR loci with common loss-of-function alleles, and obtain initial estimates of the linkage disequilibrium between STRs and common SNPs. Overall, these analyses further elucidate the scale of genetic variation beyond classical point mutations. The resource is publicly available at http://strcat.teamerlich.org/ both in raw format and via a graphical interface.

Evolution of bow-tie architectures in biology

Tamar Friedlander, Avraham E. Mayo, Tsvi Tlusty, Uri Alon
Subjects: Molecular Networks (q-bio.MN)

Bow-tie or hourglass structure is a common architectural feature found in biological and technological networks. A bow-tie in a multi-layered structure occurs when intermediate layers have far fewer components than the input and output layers. Examples include metabolism, where a handful of building blocks mediate between multiple input nutrients and multiple output biomass components, and signaling networks, where information from numerous receptor types passes through a small set of signaling pathways to regulate multiple output genes. Little is known, however, about how bow-tie architectures evolve. Here, we address the evolution of bow-tie architectures using simulations of multi-layered systems evolving to fulfill a given input-output goal. We find that bow-ties spontaneously evolve when two conditions are met: (i) the evolutionary goal is rank deficient, where the rank corresponds to the minimal number of input features on which the outputs depend, and (ii) the effects of mutations on interaction intensities between components are described by a product rule, namely the mutated element is multiplied by a random number. Product-rule mutations are more biologically realistic than the commonly used sum-rule mutations that add a random number to the mutated element. These conditions robustly lead to bow-tie structures. The minimal width of the intermediate network layers (the waist or knot of the bow-tie) equals the rank of the evolutionary goal. These findings can help explain the presence of bow-ties in diverse biological systems, and can also be relevant for machine learning applications that employ multi-layered networks.
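
The difference between the two mutation rules can be sketched in a few lines. The Gaussian distributions and the sigma value below are assumptions chosen for illustration; the abstract only says that the mutated element is multiplied by, or added to, a random number.

```python
import random


def mutate_product(weight, rng, sigma=0.1):
    """Product rule: multiply the interaction strength by a random factor."""
    return weight * rng.gauss(1.0, sigma)  # distribution is an assumption


def mutate_sum(weight, rng, sigma=0.1):
    """Sum rule: add a random number to the interaction strength."""
    return weight + rng.gauss(0.0, sigma)  # distribution is an assumption


if __name__ == "__main__":
    rng = random.Random(0)
    for start in (0.0, 0.01, 1.0):
        print(start, mutate_product(start, rng), mutate_sum(start, rng))
```

One consequence visible even in this toy version: under the product rule an interaction of strength zero stays at zero and weak interactions change only slightly, whereas the sum rule perturbs every interaction by the same absolute amount regardless of its current strength.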

Most viewed on Haldane’s Sieve: April 2014

The most viewed posts on Haldane’s Sieve in April 2014 were:

The evolution of tyrosine-recombinase elements in Nematoda

Amir Szitenberg, Georgios Koutsovoulos, Mark L Blaxter, David H Lunt
Comments: 18 pages
Subjects: Populations and Evolution (q-bio.PE); Genomics (q-bio.GN)

Transposable elements can be categorised into DNA and RNA elements based on their mechanism of transposition. Tyrosine recombinase elements (YREs) are relatively rare and poorly understood, despite sharing characteristics with both DNA and RNA elements. Previously, the Nematoda have been reported to have a substantially different diversity of YREs compared to other animal phyla: the Dirs1-like YRE retrotransposon was encountered in most animal phyla but not in Nematoda, and a unique Pat1-like YRE retrotransposon has only been recorded from Nematoda. We explored the diversity of YREs in Nematoda by sampling broadly across the phylum and including 34 genomes representing the three classes within Nematoda. We developed a method to isolate and classify YREs based on both feature organization and phylogenetic relationships in an open and reproducible workflow. We also ensured that our phylogenetic approach to YRE classification identified truncated and degenerate elements, informatively increasing the number of elements sampled. We identified Dirs1-like elements (thought to be absent from Nematoda) in the nematode classes Enoplia and Dorylaimia indicating that nematode model species do not adequately represent the diversity of transposable elements in the phylum. Nematode Pat1-like elements were found to be a derived form of another PAT element that is present more widely in animals. Several sequence features used widely for the classification of YREs were found to be homoplasious, highlighting the need for a phylogenetically-based classification scheme. Nematode model species do not represent the diversity of transposable elements in the phylum.