Human migration and isolation are driving forces of modern human civilization. Recent migrations of long-isolated populations have resulted in genetically admixed populations. The history of population admixture is generally complex; however, understanding the admixture process is critical to both evolutionary and medical studies. Here, we used admixture-induced linkage disequilibrium (LD) to infer the occurrence of continuous admixture events, which are common in most existing admixed populations. Unlike previous studies, we extended the typical continuous admixture model to a more general scenario in which a period of continuous gene flow is followed by isolation, and we demonstrated that this treatment significantly improved the accuracy of inference under complex admixture scenarios. Based on the extended models, we developed a method based on weighted LD to infer admixture history that accounts for the continuous and complex demographic process of gene flow between populations. We evaluated the performance of the method by computer simulation and applied it to real data from several well-known admixed populations.
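The abstract does not give the model's equations, so the following is only a toy illustration of why "continuous gene flow, then isolation" leaves a different LD signature than a single pulse. Under a single admixture pulse t generations ago, admixture LD between markers d Morgans apart is commonly modelled as decaying like exp(-d·t); a period of gene flow can be roughly approximated as a weighted mixture of such pulses. The function name, the discrete-generation treatment and the equal default weights are assumptions made for this sketch, not the authors' weighted-LD method.

```python
import math


def weighted_ld_decay(d, t_start, t_end, weights=None):
    """Toy expected admixture-LD curve at genetic distance d (Morgans).

    Assumption (hypothetical): gene flow occurs in every generation t with
    t_end <= t <= t_start (generations before present), followed by isolation
    from t_end to the present. Each generation's migrants contribute LD that
    decays as exp(-d * t). Weights default to equal and are normalised to
    sum to 1, so the curve equals 1 at d = 0.
    """
    if t_end > t_start:
        raise ValueError("t_end must not exceed t_start")
    gens = range(t_end, t_start + 1)
    if weights is None:
        weights = [1.0] * len(gens)
    total = sum(weights)
    return sum(w / total * math.exp(-d * t) for w, t in zip(weights, gens))
```

Setting `t_start == t_end` recovers the single-pulse curve, while a longer window produces a curve that is no longer a single exponential, which is the kind of signal a continuous-admixture model can exploit.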
Monthly Archives: September 2015
An Accurate Genetic Clock
A majority of human accelerated regions represents highly conserved in non-human primates DNA sequences lacking evidence of human-specific mutations
The sequence quality of reference genome databases is essential for the accurate definition of regulatory DNA segments as candidate human-specific regulatory sequences (HSRS). It is unclear how database improvements would affect the validity of the HSRS definition. Here, sequence conservation analysis of 15,371 candidate HSRS was carried out using the most recent releases of the reference genome databases of humans and nonhuman primates (NHP), with the conservation threshold (the minimum ratio of bases that must remap) set to 1.00. This analysis revealed that 2,262 of 2,739 (82.6%) sequences of human accelerated regions lack evidence of human-specific mutations and appear highly conserved in humans and NHP. Similarly, the majority (404 of 524; 77.1%) of human accelerated DNase hypersensitive sites represent regulatory sequences that are highly conserved in humans and NHP and lack evidence of human-specific mutations. The present analysis revealed a major effect of database refinements on the validity of the HSRS definition and suggests that human-specific phenotypes may evolve as a result of the integration of both NHP-conserved and human-specific regulatory elements into human-specific genomic regulatory networks.
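The threshold above is phrased like the UCSC liftOver `-minMatch` option ("minimum ratio of bases that must remap"). A minimal sketch of how such a threshold splits candidate regions, plus the percentage arithmetic reported in the abstract, might look as follows; the input dictionary of per-region remap ratios and both function names are hypothetical and do not reproduce the author's pipeline.

```python
def classify_conserved(remap_ratios, min_match=1.00):
    """Split candidate regions by the fraction of their bases that remap.

    remap_ratios: hypothetical dict of region id -> fraction (0..1) of the
    region's bases that remap to the non-human primate genome. A region is
    called conserved when its ratio reaches min_match (1.00 = every base).
    """
    conserved = [rid for rid, r in remap_ratios.items() if r >= min_match]
    other = [rid for rid, r in remap_ratios.items() if r < min_match]
    return conserved, other


def percent(part, whole):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100.0 * part / whole, 1)
```

With `min_match=1.00`, a region that remaps at 97% of its bases is not counted as conserved, which makes this the strictest possible setting.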
Population genomics of intrapatient HIV-1 evolution
Fabio Zanini, Johanna Brodin, Lina Thebo, Christa Lanz, Göran Bratt, Jan Albert, Richard A. Neher
Many microbial populations rapidly adapt to changing environments with multiple variants competing for survival. To quantify such complex evolutionary dynamics in vivo, time-resolved, genome-wide data including rare variants are essential. We performed whole-genome deep sequencing of HIV-1 populations in 9 untreated patients, with 6–12 longitudinal samples per patient spanning 5–8 years of infection. We show that patterns of minor diversity are reproducible between patients and mirror global HIV-1 diversity, suggesting a universal landscape of fitness costs that controls diversity. Reversions towards the ancestral HIV-1 sequence are observed throughout infection and account for almost one third of all sequence changes. Reversion rates depend strongly on conservation. Frequent recombination limits linkage disequilibrium to about 100 bp in most of the genome, but strong hitch-hiking due to short-range linkage limits diversity.
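To make the LD statement concrete: a common pairwise measure is r², computed from two-locus haplotype frequencies. The sketch below assumes phased allele pairs observed at both sites (in deep-sequencing data, for example, reads spanning both positions); it shows the standard formula only and is not the authors' pipeline.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of (allele_at_site1, allele_at_site2) pairs coded 0/1,
    one pair per genome or read observed at both sites.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom
```

Recombination shuffles allele combinations between sites, pushing `p_ab` towards `p_a * p_b` and hence r² towards 0; the faster this happens per base pair, the shorter the distance over which LD persists.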
Hierarchy and extremes in selections from pools of randomized proteins
Sébastien Boyer, Dipanwita Biswas, Ananda Kumar Soshee, Natale Scaramozzino, Clément Nizak, Olivier Rivoire
Variation and selection are the core principles of Darwinian evolution, yet quantitatively relating the diversity of a population to its capacity to respond to selection is challenging. Here, we examine this problem at a molecular level in the context of populations of partially randomized proteins selected for binding to well-defined targets. We built several minimal protein libraries, screened them in vitro by phage display and analyzed their response to selection by high-throughput sequencing. A statistical analysis of the results reveals two main findings: first, libraries with the same sequence diversity but built around different “frameworks” typically have vastly different responses; second, the distribution of responses within a library follows a simple scaling law. We show how an elementary probabilistic model based on extreme value theory rationalizes these findings. Our results have implications for designing synthetic protein libraries, for estimating the density of functional biomolecules in sequence space, for characterizing diversity in natural populations and for experimentally investigating the concept of evolvability, or potential for future evolution.
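The abstract does not give the model's details. As a generic illustration of the extreme-value reasoning it invokes: if the scores of N variants in a pool are independent draws from a distribution with an exponential tail, the expected score of the best variant grows only logarithmically with N (for unit exponentials it is exactly the harmonic number H_N ≈ ln N + 0.577). The exponential score distribution and the function names below are assumptions for illustration, not the authors' model.

```python
import random


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, the expected maximum of n i.i.d. Exp(1) draws."""
    return sum(1.0 / i for i in range(1, n + 1))


def mean_best_variant(pool_size, trials, seed=0):
    """Monte Carlo mean of the best score in a pool of pool_size variants.

    Scores are assumed (hypothetically) to be i.i.d. Exp(1), standing in for
    how well each randomized protein performs under selection.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(1.0) for _ in range(pool_size))
    return total / trials
```

Because the winner's score depends on the tail of the distribution rather than its bulk, two pools of equal size but with different score distributions (for instance, different frameworks) can respond very differently to the same selection.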
Stability of Underdominant Genetic Polymorphisms in Population Networks
Áki J. Láruson, Floyd A. Reed
Heterozygote disadvantage is potentially a potent driver of population genetic divergence. Also referred to as underdominance, this phenomenon describes a situation where a genetic heterozygote has a lower overall fitness than either homozygote. Attention so far has mostly been given to underdominance within a single population and to the maintenance of genetic differences between two populations exchanging migrants. Here we explore the dynamics of an underdominant system in a network of multiple discrete, yet interconnected, populations. We assess the stability of genetic differences in response to increases in migration across various network topologies. The network topology can have a dominant and occasionally non-intuitive influence on the genetic stability of the system. Applications of these results to theories of speciation, population genetic engineering, and general dynamical systems are described.
Author post: Limits to adaptation in partially selfing species
This guest post is by Matthew Hartfield (@mathyhartfield) on his preprint (with Sylvain Glemin) “Limits to adaptation in partially selfing species”, available from bioRxiv.
Our paper “Limits to adaptation in partially selfing species” is now available from bioRxiv. This preprint is the result of a collaboration that has gone back and forth across the Atlantic for well over a year, so we are pleased to see it online.
Haldane’s Sieve, after which this blog is named, is a theory pertaining to the role of dominance in adaptation, which was initially developed for outcrossing species and then shown to be absent in selfing species. When beneficial alleles first appear in diploid individuals, they do so in heterozygous form (so only one of the two alleles at the locus carries the advantageous type). Mathematically, these heterozygotes have fitness 1 + hs, where h is the degree of dominance and s is the selective advantage. Haldane’s Sieve states that recessive mutations (h < 1/2) are less likely to fix than dominant mutations (h > 1/2), because selection is not efficient on heterozygotes if mutations are recessive. However, self-fertilising individuals are able to rapidly create homozygous forms of the mutant, increasing the efficacy of selection acting on them. Yet selfing also increases genetic drift, and hence the risk that these adaptations will go extinct by chance. Consequently, an extension of Haldane’s Sieve states that if the mutation is recessive (h < 1/2), it is more likely to fix in selfing populations, whereas if it is dominant (h > 1/2), it is more likely to fix in outcrossing populations.
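The single-locus argument can be made concrete with a commonly used approximation, assumed here, for the fixation probability of one new beneficial allele in a large, partially selfing Wright–Fisher population under weak selection: with selfing rate σ, the equilibrium inbreeding coefficient is F = σ/(2 − σ), and P ≈ 2s(F + h(1 − F))/(1 + F). The numerator captures selection acting more effectively on homozygotes; the 1 + F denominator captures the extra drift caused by selfing. This sketch ignores the linked selection that the preprint itself studies, and the function names are illustrative.

```python
def inbreeding_from_selfing(sigma):
    """Equilibrium inbreeding coefficient F = sigma / (2 - sigma)."""
    return sigma / (2.0 - sigma)


def fixation_probability(s, h, sigma):
    """Approximate fixation probability of a single new beneficial mutation.

    Assumes weak selection, a large Wright-Fisher population, a single
    locus and selfing rate sigma: P ~= 2 s (F + h (1 - F)) / (1 + F).
    """
    f = inbreeding_from_selfing(sigma)
    return 2.0 * s * (f + h * (1.0 - f)) / (1.0 + f)
```

For s = 0.02, a recessive allele (h = 0.1) gives P ≈ 0.004 under full outcrossing but 0.02 under full selfing, while a dominant allele (h = 0.9) shows the reverse (0.036 against 0.02). At exactly h = 1/2 the mating system has no effect in this approximation.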
This result holds for a single mutant in isolation. Yet mutants seldom act independently; they usually arise alongside other alleles in the genome, each of which has its own evolutionary fate. A known additional advantage of outcrossing is that, by recombining the genomes of each parent, it can move selected alleles from disadvantageous genomes onto fitter backgrounds. For example, say an adaptive allele is present in a population, and a second adaptation arises at a nearby locus. If the second allele is less strongly selected than the first, then it has to arise on the same genome as the initial adaptation; otherwise it is likely to be lost as the less-fit genotype is replaced over time, a process known as selective interference. However, outcrossing can unite the two mutations in the same genome, so both can spread.
Despite these potential advantages of outcrossing, the effect of selective interference on how facultative selfing influences the fixation of multiple beneficial alleles has not yet been investigated. Our model therefore aimed to determine how likely it is that secondary beneficial alleles can fix in the population, given that an adaptation was already present and that reproduction involved a certain degree of self-fertilisation.
After working through the calculations, two subtle yet important twists on Haldane’s Sieve revealed themselves. First, due to the effects of selection interference, Haldane’s Sieve is likely to be reinforced in areas of low recombination. That is, recessive mutants are more likely to be lost in outcrossers (when compared to single-locus results), with similar losses for dominant mutations in self-fertilising organisms. Secondly, we also investigated a case where the second beneficial mutant could be reintroduced by recurrent mutation. In this case, selection interference can be very severe in selfers due to the lack of recombination. Hence some degree of outcrossing would be optimal to prevent these beneficial alleles from being repeatedly lost, even if they are recessive. In the most extreme case, complete outcrossing is best if secondary mutations only confer minor advantages.
In recent years, the role that selection interference plays in mating-system evolution has started to be recognised. Our theoretical study is just one of many that elucidate how important outcrossing can be in augmenting the efficacy of selection. We hope that these studies will spur further empirical work quantifying the rate of adaptation in species with different mating systems, to further unravel why species reproduce in such vastly different ways.
On Tree Based Phylogenetic Networks
Louxin Zhang
A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. Reticulation-visible networks and child-sibling networks are both tree-based. In this work, we present a simple necessary and sufficient condition for tree-based networks and prove that, for each set of species, there is a universal tree-based network such that every phylogenetic tree on the same species is a base of this network. The existence of a universal tree-based network implies that for any given set of phylogenetic trees (resp. clusters) on the same species, there exists a tree-based network that displays all of them.
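The abstract does not state Zhang's condition, so the sketch below does not implement it. Instead it tests tree-basedness by brute force, using the characterisation (assumed here) that a rooted network is tree-based exactly when it has a rooted spanning tree whose leaves are precisely the network's leaves. Every non-root vertex of such a tree keeps one incoming edge, so the search tries each choice of kept parent at every reticulation. This is exponential in the number of reticulations and meant only for small examples; the function name and input format are hypothetical.

```python
from itertools import product


def is_tree_based(children):
    """Brute-force test of whether a rooted phylogenetic network is tree-based.

    children: dict mapping every vertex to a list of its children (leaves
    map to []). The network is assumed to be a rooted DAG with one root.
    """
    parents = {v: [] for v in children}
    for u, kids in children.items():
        for v in kids:
            parents[v].append(u)
    reticulations = [v for v in children if len(parents[v]) >= 2]
    # Each reticulation keeps exactly one incoming edge; try every choice.
    for choice in product(*(parents[r] for r in reticulations)):
        kept_parent = dict(zip(reticulations, choice))
        kept_out = {v: 0 for v in children}
        for v in children:
            if len(parents[v]) == 1:  # tree vertex or leaf: edge always kept
                kept_out[parents[v][0]] += 1
            elif len(parents[v]) >= 2:  # reticulation: only the chosen edge
                kept_out[kept_parent[v]] += 1
        # The spanning tree may have no leaves other than the network's
        # leaves, so every internal vertex must keep at least one child edge.
        if all(kept_out[v] > 0 for v in children if children[v]):
            return True
    return False


simple = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"],
          "x": [], "y": [], "z": []}
# Reticulations q1 and q2 must pass their edges to r1 and r2, which leaves
# tree vertex p with no child edge to keep.
blocked = {"rho": ["u", "v"], "u": ["p", "w"], "v": ["q1", "q2"],
           "w": ["q1", "q2"], "p": ["r1", "r2"], "q1": ["r1"], "q2": ["r2"],
           "r1": ["x"], "r2": ["y"], "x": [], "y": []}
```

Here `is_tree_based(simple)` is true and `is_tree_based(blocked)` is false, showing that even small binary networks need not be tree-based.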
piecewiseSEM: Piecewise structural equation modeling in R for ecology, evolution, and systematics
Jonathan S. Lefcheck
Ecologists and evolutionary biologists are relying on an increasingly sophisticated set of statistical tools to describe complex natural systems. One such tool that has gained increasing traction in the life sciences is structural equation modeling (SEM), a variant of path analysis that resolves complex multivariate relationships among a suite of interrelated variables. SEM has historically relied on covariances among variables, rather than the values of the data points themselves. While this approach permits a wide variety of model forms, it limits the incorporation of detailed model specifications. Here, I present a fully documented, open-source R package, piecewiseSEM, that builds on the base R syntax for all current generalized linear, least-squares, and mixed-effects models. I also provide two worked examples: one involving a hierarchical dataset with non-normally distributed variables, and a second involving phylogenetically independent contrasts. My goal is to provide a user-friendly and tractable implementation of SEM that also reflects the ecological and methodological processes generating data.
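Because a piecewise SEM is fitted one component model at a time, overall fit is commonly assessed with Shipley's test of directed separation. The p-values from the k independence claims implied by the path diagram are combined into Fisher's C = −2 Σ ln pᵢ, which is compared with a χ² distribution on 2k degrees of freedom. The sketch below shows only this combining step, in Python rather than the package's R code. It assumes the independence-claim p-values have already been obtained from the fitted component models, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0  # i = 0 term
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_c(p_values):
    """Fisher's C = -2 * sum(ln p) over k independence-claim tests, 2k df.

    Returns (C, df, p); a small p suggests the path model omits a link.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need one or more p-values in (0, 1]")
    c = -2.0 * sum(math.log(p) for p in p_values)
    df = 2 * len(p_values)
    return c, df, chi2_sf_even_df(c, df)
```

Because 2k is always even, the closed-form χ² tail is enough here and no statistics library is needed. With a single claim, the combined p-value simply equals that claim's p-value.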