Understanding both the role of selection in driving phenotypic change and its underlying genetic basis remains a major challenge in evolutionary biology. Here we focus on a classic system of local adaptation in the North American deer mouse, Peromyscus maniculatus, which occupies two main habitat types, prairie and forest. Using historical collections, we demonstrate that forest-dwelling mice have longer tails than those from non-forested habitats, even when we account for individual and population relatedness. Based on genome-wide SNP capture data, we find that mice from forested habitats in the eastern and western parts of their range form separate clades, suggesting that increased tail length evolved independently from a short-tailed ancestor. Two major changes in skeletal morphology can give rise to longer tails (increased number and increased length of vertebrae), and we find that forest mice in the east and west have both more and longer caudal vertebrae, but not trunk vertebrae, than nearby prairie forms. Using a second-generation intercross between a prairie and forest pair, we show that the number and length of caudal vertebrae are not correlated in this recombinant population, suggesting that variation in these traits is controlled by separate genetic loci. Together, these results demonstrate convergent evolution of the long-tailed forest phenotype through multiple, distinct genetic mechanisms (controlling vertebral length and vertebral number), thus suggesting that these morphological changes, either independently or together, are adaptive.
A crucial component of major transitions theory is that after the transition, adaptation occurs primarily at the level of the new, higher-level unit. For collective-level adaptations to occur, though, collective-level traits must be heritable. Since collective-level traits are functions of lower-level traits, collective-level heritability is related to particle-level heritability. However, the nature of this relationship has rarely been explored in the context of major transitions. We examine relationships between particle-level heritability and collective-level heritability for several functions that express collective-level traits in terms of particle-level traits. When this relationship is linear, the heritability of a collective-level trait is never less than that of the corresponding particle-level trait and is higher under most conditions. For more complicated functions, collective-level heritability is higher under most conditions, but can be lower when the function relating particle-level to collective-level traits is sensitive to small fluctuations in the state of the particles within the collective. Rather than being an impediment to major transitions, we show that collective-level heritability superior to that of the lower-level units can often arise ‘for free’, simply as a byproduct of collective formation.
Correctly estimating the age of a gene or gene family is important for a variety of fields, including molecular evolution, comparative genomics, and phylogenetics, and increasingly for systems biology and disease genetics. However, most studies use only a point estimate of a gene’s age, neglecting the substantial uncertainty involved in this estimation. Here, we characterize this uncertainty by investigating the effect of algorithm choice on gene-age inference and calculate consensus gene ages with attendant error distributions for a variety of model eukaryotes. We use thirteen orthology inference algorithms to create gene-age datasets and then characterize the error around each age-call on a per-gene and per-algorithm basis. Systematic error was found to be a large factor in estimating gene age, suggesting that simple consensus algorithms are not enough to give a reliable point estimate. We also found that different sources of error can affect downstream analyses, such as gene ontology enrichment. Our consensus gene-age datasets, with associated error terms, are made fully available at https://github.com/marcottelab/Gene-Ages so that researchers can propagate this uncertainty through their analyses.
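To illustrate what carrying an error distribution alongside a point estimate might look like for a single gene, the following is a minimal sketch and not the authors' pipeline. The function name, the algorithm names, and the age-class labels are all hypothetical. It summarizes the age calls made by several algorithms as a frequency distribution, the modal call, the fraction of algorithms supporting it, and the Shannon entropy of the calls as a crude disagreement score.

```python
from collections import Counter
from math import log2


def summarize_age_calls(calls):
    """Summarize per-algorithm age calls for one gene.

    `calls` maps a (hypothetical) algorithm name to the age class it assigned.
    Returns the call distribution, the modal age, its support, and the
    Shannon entropy (in bits) of the distribution.
    """
    counts = Counter(calls.values())
    total = sum(counts.values())
    distribution = {age: n / total for age, n in counts.items()}
    modal_age, modal_count = counts.most_common(1)[0]
    # Entropy is 0 when all algorithms agree and grows with disagreement.
    entropy_bits = -sum(p * log2(p) for p in distribution.values())
    return {
        "distribution": distribution,
        "modal_age": modal_age,
        "support": modal_count / total,
        "entropy_bits": entropy_bits,
    }


# Hypothetical example: four algorithms, three of which agree.
example = summarize_age_calls({
    "algorithm_A": "Eukaryota",
    "algorithm_B": "Eukaryota",
    "algorithm_C": "Opisthokonta",
    "algorithm_D": "Eukaryota",
})
print(example["modal_age"], example["support"])
```

A summary of this kind only describes disagreement among algorithms. It does not correct for systematic error shared by several algorithms, which the abstract identifies as a major factor.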
Genotypic fitness landscapes are constructed by assessing the fitness of all possible combinations of a given number of mutations. In recent years, several experimental fitness landscapes have been completely resolved. As fitness landscapes are high-dimensional, simple measures of their structure are used as statistics in empirical applications. Epistasis is one of the most relevant features of fitness landscapes. Here we propose a new natural measure of the amount of epistasis based on the correlation of fitness effects of mutations. This measure has a natural interpretation, captures the interaction between mutations well, and can be obtained analytically for most landscape models. We discuss how this measure is related to previous measures of epistasis (number of peaks, roughness/slope, fraction of sign epistasis, Fourier-Walsh spectrum) and how it can be easily extended to landscapes with missing data or with fitness ranks only. Furthermore, the dependence of the correlation of fitness effects on mutational distance contains interesting information about the patterns of epistasis. This dependence can be used to uncover the amount and nature of epistatic interactions in a landscape or to discriminate between different landscape models.
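As a rough illustration of a correlation of fitness effects, the sketch below assumes a complete landscape over biallelic loci stored as a dictionary from 0/1 genotype tuples to fitness. It computes the Pearson correlation between the effect of a mutation at locus j in a genotype g and the effect of the same mutation in the neighbouring genotype that differs from g at one other locus i. This is one plausible reading of the measure described above, not necessarily the authors' exact definition. The function names are hypothetical.

```python
from itertools import product
from math import sqrt


def flip(genotype, locus):
    """Return a copy of a 0/1 genotype tuple with one locus flipped."""
    return genotype[:locus] + (1 - genotype[locus],) + genotype[locus + 1:]


def fitness_effect_correlation(fitness, n_loci):
    """Correlation of the effect of mutation j across backgrounds one step apart.

    `fitness` maps every 0/1 tuple of length `n_loci` to a fitness value.
    Needs at least two loci and a landscape that is not completely flat.
    """
    effects_here, effects_neighbour = [], []
    for g in product((0, 1), repeat=n_loci):
        for j in range(n_loci):
            for i in range(n_loci):
                if i == j:
                    continue
                g_i = flip(g, i)  # background one mutation away from g
                effects_here.append(fitness[flip(g, j)] - fitness[g])
                effects_neighbour.append(fitness[flip(g_i, j)] - fitness[g_i])
    n = len(effects_here)
    mean_x = sum(effects_here) / n
    mean_y = sum(effects_neighbour) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(effects_here, effects_neighbour))
    var_x = sum((x - mean_x) ** 2 for x in effects_here)
    var_y = sum((y - mean_y) ** 2 for y in effects_neighbour)
    return cov / sqrt(var_x * var_y)


# Additive landscape: effects do not depend on background.
additive = {g: 0.1 * g[0] + 0.2 * g[1] + 0.3 * g[2]
            for g in product((0, 1), repeat=3)}
# Two-locus parity landscape: every effect reverses sign in the neighbour.
parity = {g: (g[0] + g[1]) % 2 for g in product((0, 1), repeat=2)}

print(fitness_effect_correlation(additive, 3))
print(fitness_effect_correlation(parity, 2))
```

Under this reading, a landscape without epistasis gives a correlation of 1, and the two-locus parity landscape, in which every mutation's effect flips sign in the neighbouring background, gives -1.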
Conserved genes evolve slowly in nature, by definition, but we find that some conserved genes are among the fastest-evolving genes in the long-term evolution experiment with Escherichia coli (LTEE). We identified the set of almost 2000 core genes shared among sixty clinical, environmental, and laboratory strains of E. coli. During the LTEE, these core genes accumulated significantly more nonsynonymous mutations than did flexible (i.e., noncore) genes after accounting for the mutational target size. Furthermore, the core genes under strongest positive selection in the LTEE are more conserved in nature than the average core gene based both on sequence diversity among E. coli strains and divergence between E. coli and Salmonella enterica. We conclude that the conditions of the LTEE are novel for E. coli, at least in relation to the long sweep of its evolution in nature. We suggest that what is most novel about the LTEE for the bacteria is the constancy of the environment, its biophysical simplicity, and the absence of microbial competitors, predators, and parasites.
We investigate the dependence of the site frequency spectrum (SFS) on the topological structure of genealogical trees. We show that basic population genetic statistics — for instance estimators of θ or neutrality tests such as Tajima’s D — can be decomposed into components of waiting times between coalescent events and of tree topology. Our results clarify the relative impact of the two components on these statistics. We provide a rigorous interpretation of positive or negative values of neutrality tests in terms of the underlying tree shape. In particular, we show that values of Tajima’s D and Fay and Wu’s H depend in a direct way on a measure of tree balance which is mostly determined by the root balance of the tree. We also compute the maximum and minimum values for neutrality tests as a function of sample size. Focusing on the standard coalescent model of neutral evolution, we discuss how waiting times between coalescent events are related to derived allele frequencies and thereby to the frequency spectrum. Finally, we show how tree balance affects the frequency spectrum. In particular, we derive the complete SFS conditioned on the root imbalance. We show that the conditional spectrum is peaked at frequencies corresponding to the root imbalance and strongly biased towards rare alleles.
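For readers who want to see how such statistics are built from the frequency spectrum, here is a minimal sketch that computes Tajima's D directly from an unfolded SFS using the standard formula from Tajima (1989). It is not the decomposition derived in this work. The function name and the input convention (entry i-1 holds the number of sites whose derived allele appears in i of the n sampled sequences) are assumptions made for illustration.

```python
from math import sqrt


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites with derived allele
    count i, for i = 1 .. n - 1, so the sample size is n = len(sfs) + 1.
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if s == 0:
        raise ValueError("no segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    # Both estimators of theta are linear combinations of the SFS.
    theta_w = s / a1
    pairs = n * (n - 1) / 2.0
    theta_pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (theta_pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))


# Spectrum proportional to 1/i (the neutral expectation) for n = 5.
print(tajimas_d([12, 6, 4, 3]))
# Spectrum made only of singletons for n = 5.
print(tajimas_d([10, 0, 0, 0]))
```

A spectrum proportional to 1/i makes the two estimators of θ coincide, so D is zero up to rounding, while an excess of rare alleles pushes D below zero and an excess of intermediate-frequency alleles pushes it above zero.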
To examine the role of natural selection on fecundity in a variety of Caenorhabditis elegans genetic backgrounds, we used an experimental evolution protocol to evolve 14 distinct genetic strains over 15-20 generations. Beginning with three founder worms for each strain, we were able to generate 790 distinct genealogies, which provided information on both the effects of natural selection and the evolvability of each strain. Among these genotypes are a wildtype (N2) and a collection of mutants with targeted mutations in the daf-c, daf-d, and AMPK pathways. The overarching goal of our analysis is two-fold: to observe differences in reproductive fitness and observe related changes in reproductive timing. This yields two outcomes. The first is that the majority of selective effects on fecundity occur during the first few generations of evolution, while the negative selection for reproductive timing occurs on longer timescales. The second finding reveals that positive selection on fecundity results in positive and negative selection on reproductive timing, both of which are strain-dependent. Using a derivative of population size per generation called the reproductive carry-over (RCO) measure, it is found that the fluctuation and shape of the probability distribution may be informative in terms of developmental selection. While these consist of general patterns that transcend mutations in a specific gene, changes in the RCO measure may nevertheless be products of selection. In conclusion, we discuss the broader implications of these findings, particularly in the context of genotype-fitness maps and the role of uncharacterized mutations in individual variation and evolvability.