Power analysis of artificial selection experiments using efficient whole genome simulation of quantitative traits

Darren Kessner, John Novembre

Evolve and resequence studies combine artificial selection experiments with massively parallel sequencing technology to study the genetic basis for complex traits. In these experiments, individuals are selected for extreme values of a trait, causing alleles at quantitative trait loci (QTLs) to increase or decrease in frequency in the experimental population. We present a new analysis of the power of artificial selection experiments to detect and localize quantitative trait loci. This analysis uses a simulation framework that explicitly models whole genomes of individuals, quantitative traits, and selection based on individual trait values. We find that explicitly modeling QTLs produces qualitatively different insights than considering independent loci with constant selection coefficients. Specifically, we observe how interference between QTLs under selection impacts the trajectories and lengthens the fixation times of selected alleles. We also show that a substantial portion of the genetic variance of the trait (50–100%) can be explained by detected QTLs in as little as 20 generations of selection, depending on the trait architecture and experimental design. Furthermore, we show that power depends crucially on the opportunity for recombination during the experiment. Finally, we show that an increase in power is obtained by leveraging founder haplotype information to obtain allele frequency estimates.
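
To make the idea of selection acting on individual trait values concrete, here is a minimal sketch of truncation selection on an additive trait. It is not the authors' simulation framework. The function name, the parameter defaults, the equal effect sizes, the unit environmental noise and the assumption of unlinked loci (free recombination) are all illustrative choices that do not come from the abstract.

```python
import random


def simulate_truncation_selection(n_individuals=200, n_qtl=10, effect=1.0,
                                  top_fraction=0.2, generations=20, seed=1):
    """Return per-generation frequencies of the trait-increasing allele.

    Hypothetical toy model: diploid individuals, additive QTLs of equal
    effect, unlinked loci, Gaussian environmental noise.
    """
    rng = random.Random(seed)
    # Each individual is a list of (maternal, paternal) alleles; 1 increases the trait.
    pop = [[(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(n_qtl)]
           for _ in range(n_individuals)]

    def trait(ind):
        # Phenotype = additive genetic value + environmental noise.
        return effect * sum(a + b for a, b in ind) + rng.gauss(0.0, 1.0)

    def frequencies(population):
        return [sum(ind[j][0] + ind[j][1] for ind in population) / (2 * len(population))
                for j in range(n_qtl)]

    history = [frequencies(pop)]
    n_parents = max(2, int(top_fraction * n_individuals))
    for _ in range(generations):
        # Selection acts on phenotypes, not on fixed per-locus coefficients.
        parents = sorted(pop, key=trait, reverse=True)[:n_parents]
        offspring = []
        for _ in range(n_individuals):
            mother, father = rng.sample(parents, 2)
            # Unlinked loci: each parent passes one allele per locus at random.
            offspring.append([(rng.choice(mother[j]), rng.choice(father[j]))
                              for j in range(n_qtl)])
        pop = offspring
        history.append(frequencies(pop))
    return history


history = simulate_truncation_selection()
print("mean start frequency:", sum(history[0]) / len(history[0]))
print("mean end frequency:  ", sum(history[-1]) / len(history[-1]))
```

Because the loci here are unlinked, this sketch cannot show the interference between linked QTLs that the abstract describes. Capturing that would require explicit chromosomes and recombination.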

Genomic, transcriptomic and phenomic variation reveals the complex adaptation of modern maize breeding

Haijun Liu, Xiaqing Wang, Marilyn Warburton, Weiwei Wen, Minliang Jin, Min Deng, Jie Liu, Hao Tong, Qingchun Pan, Xiaohong Yang, Jianbing Yan

The temperate-tropical division of early maize germplasm to different agricultural environments was arguably the greatest adaptation process associated with the success and near ubiquitous importance of global maize production. Deciphering this history is challenging, but new insight has been gained from the genomic, transcriptomic and phenotypic variation collected from 368 diverse temperate and tropical maize inbred lines in this study. This is the first attempt to systematically explore the mechanisms of the adaptation process. Our results indicated that divergence between tropical and temperate lines seems to have occurred 3,400-6,700 years ago. A number of genomic selection signals and transcriptomic variants including differentially expressed individual genes and rewired co-expression networks of genes were identified. These candidate signals were found to be functionally related to stress response and most were associated with directionally selected traits, which may have been an advantage under widely varying environmental conditions faced by maize as it was migrated away from its domestication center. It is also clear in our study that such stress adaptation could involve evolution of protein-coding sequences as well as transcriptome-level regulatory changes. This latter process may be a more flexible and dynamic way for maize to adapt to environmental changes over this dramatically short evolutionary time frame.

Natural variation in teosinte at the domestication locus teosinte branched1 (tb1)

Laura Vann, Thomas Kono, Tanja Pyhäjärvi, Matthew B Hufford, Jeffrey Ross-Ibarra

Premise of the study: The teosinte branched1 (tb1) gene is a major QTL controlling branching differences between maize and its wild progenitor, teosinte. The insertion of a transposable element (Hopscotch) upstream of tb1 is known to enhance the gene’s expression, causing reduced tillering in maize. Observations of the maize tb1 allele in teosinte and estimates of an insertion age of the Hopscotch that predates domestication led us to investigate its prevalence and potential role in teosinte. Methods: Prevalence of the Hopscotch element was assessed across an Americas-wide sample of 1110 maize and teosinte individuals using a co-dominant PCR assay. Population genetic summaries were calculated for a subset of individuals from four teosinte populations in central Mexico. Phenotypic data were also collected from a single teosinte population where Hopscotch was found segregating. Key results: Genotyping results suggest the Hopscotch element is at higher than expected frequency in teosinte. Analysis of linkage disequilibrium near tb1 does not support recent introgression of the Hopscotch allele from maize into teosinte. Population genetic signatures are consistent with selection on this locus revealing a potential ecological role for Hopscotch in teosinte. Finally, two greenhouse experiments with teosinte do not suggest tb1 controls tillering in natural populations. Conclusions: Our findings suggest the role of Hopscotch differs between maize and teosinte. Future work should assess tb1 expression levels in teosinte with and without the Hopscotch and more comprehensively phenotype teosinte to assess the ecological significance of the Hopscotch insertion and, more broadly, the tb1 locus in teosinte. Key words: domestication; maize; teosinte; teosinte branched1; transposable element

How the tortoise beats the hare: Slow and steady adaptation in structured populations suggests a rugged fitness landscape in bacteria

Joshua R. Nahum, Peter Godfrey-Smith, Brittany N. Harding, Joseph H. Marcus, Jared Carlson-Stevermer, Benjamin Kerr

In the context of Wright’s adaptive landscape, genetic epistasis can yield a multi-peaked or “rugged” topography. In an unstructured population, a lineage with selective access to multiple peaks is expected to rapidly fix on one, which may not be the highest peak. Conversely, beneficial mutations in a population with spatially restricted migration take longer to fix, allowing distant parts of the population to explore the landscape semi-independently. Such a population can simultaneously discover multiple peaks and the genotype at the highest discovered peak is expected to fix eventually. Thus, structured populations sacrifice initial speed of adaptation for breadth of search. As in the Tortoise-Hare fable, the structured population (Tortoise) starts relatively slow, but eventually surpasses the unstructured population (Hare) in average fitness. In contrast, on single-peak landscapes (e.g., systems lacking epistasis), all uphill paths converge. Given such “smooth” topography, breadth of search is devalued, and a structured population only lags behind an unstructured population in average fitness (ultimately converging). Thus, the Tortoise-Hare pattern is an indicator of ruggedness. After verifying these predictions in simulated populations where ruggedness is manipulable, we then explore average fitness in metapopulations of Escherichia coli. Consistent with a rugged landscape topography, we find a Tortoise-Hare pattern. Further, we find that structured populations accumulate more mutations, suggesting that distant peaks are higher. This approach can be used to unveil landscape topography in other systems, and we discuss its application for antibiotic resistance, engineering problems, and elements of Wright’s Shifting Balance Process.

Predicting evolution from the shape of genealogical trees

Richard A. Neher, Colin A. Russell, Boris I. Shraiman
(Submitted on 3 Jun 2014)

Given a sample of genome sequences from an asexual population, can one predict its evolutionary future? Here we demonstrate that the branching pattern of reconstructed genealogical trees contains information about the relative fitness of the sampled sequences and that this information can be used to infer the closest extant relative of future populations. Our approach is based on the assumption that evolution proceeds predominantly by accumulation of small effect mutations and does not require any species specific input. Hence, the resulting inference algorithm can be applied to any asexual population under persistent selection pressure. We demonstrate its performance using historical data on seasonal influenza A/H3N2 virus. We predict the progenitor lineage of the upcoming influenza season with near optimal performance in 30% of cases and make informative predictions in 16 out of 18 years. Beyond providing a practical tool for prediction, our results suggest that continuous adaptation by small effect mutations is a major component of influenza virus evolution.

Epidemic reconstruction in a phylogenetics framework: transmission trees as partitions

Matthew Hall, Andrew Rambaut
(Submitted on 2 Jun 2014)

The reconstruction of transmission trees for epidemics from genetic data has been the subject of some recent interest. It has been demonstrated that the transmission tree structure can be investigated by augmenting internal nodes of a phylogenetic tree constructed using pathogen sequences from the epidemic with information about the host that held the corresponding lineage. In this paper, we note that this augmentation is equivalent to a correspondence between transmission trees and partitions of the phylogenetic tree into connected subtrees each containing one tip, and provide a framework for Markov Chain Monte Carlo inference of phylogenies that are partitioned in this way, giving a new method to co-estimate both trees. The procedure is integrated in the existing phylogenetic inference package BEAST.
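
The correspondence between host-labelled phylogenies and partitions can be illustrated with a small sketch. It is not the BEAST implementation. The function name `transmission_tree`, the dictionary representation and the three-host example are hypothetical, and the sketch assumes one sampled tip per host, as the abstract's "each containing one tip" suggests. It checks that the nodes assigned to each host form a connected subtree containing exactly one tip, then reads off who infected whom from the edges where the host label changes.

```python
def transmission_tree(parent, host):
    """Derive an infector map from a host-labelled rooted phylogeny.

    parent: dict mapping each node to its parent (the root maps to None).
    host:   dict mapping each node (tips and internal nodes) to a host label.
    Returns a dict {infected_host: infector_host}; the index case is absent.
    """
    internal = {p for p in parent.values() if p is not None}
    tips = {n for n in parent if n not in internal}

    for h in set(host.values()):
        region = [n for n in parent if host[n] == h]
        # In a rooted tree, a node set is connected exactly when one node in
        # the set has its parent outside the set (or has no parent).
        tops = [n for n in region
                if parent[n] is None or host[parent[n]] != h]
        n_tips = sum(1 for n in region if n in tips)
        if len(tops) != 1 or n_tips != 1:
            raise ValueError(f"labels for host {h!r} are not a valid partition element")

    # A transmission happens on every branch where the host label changes.
    return {host[n]: host[p] for n, p in parent.items()
            if p is not None and host[p] != host[n]}


# Hypothetical three-host example: phylogeny ((tipA, tipB) n1, tipC) root.
parent = {"root": None, "n1": "root", "tipA": "n1", "tipB": "n1", "tipC": "root"}
host = {"root": "C", "n1": "A", "tipA": "A", "tipB": "B", "tipC": "C"}
print(transmission_tree(parent, host))
```

In this example host C is the index case, C infected A, and A infected B. Relabelling `root` as host B would break connectivity and be rejected.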

Target enrichment of ultraconserved elements from arthropods provides a genomic perspective on relationships among Hymenoptera

Brant C. Faircloth, Michael G. Branstetter, Noor D. White, Seán G. Brady
(Submitted on 2 Jun 2014)

Gaining a genomic perspective on phylogeny requires the collection of data from many putatively independent loci collected across the genome. Among insects, an increasingly common approach to collecting this class of data involves transcriptome sequencing, because few insects have high-quality genome sequences available; assembling new genomes remains a limiting factor; the transcribed portion of the genome is a reasonable, reduced subset of the genome to target; and the data collected from transcribed portions of the genome are similar in composition to the types of data with which biologists have traditionally worked (e.g., exons). However, molecular techniques requiring RNA as a template are limited to using very high quality source materials, which are often unavailable from a large proportion of biologically important insect samples. Recent research suggests that DNA-based target enrichment of conserved genomic elements offers another path to collecting phylogenomic data across insect taxa, provided that conserved elements are present in and can be collected from insect genomes. Here, we identify a large set (n=1510) of ultraconserved elements (UCE) shared among the insect order Hymenoptera. We use in silico analyses to show that these loci accurately reconstruct relationships among genome-enabled Hymenoptera, and we design a set of baits for enriching these loci that researchers can use with DNA templates extracted from a variety of sources. We use our UCE bait set to enrich an average of 721 UCE loci from 30 hymenopteran taxa, and we use these UCE loci to reconstruct phylogenetic relationships spanning very old (≥220 MYA) to very young (≤1 MYA) divergences among hymenopteran lineages. In contrast to a recent study addressing hymenopteran phylogeny using transcriptome data, we found ants to be sister to all remaining aculeate lineages with complete support.

The most parsimonious tree for random data

Mareike Fischer, Michelle Galla, Lina Herbst, Mike Steel
(Submitted on 1 Jun 2014)

Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree ‘shapes’. For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of k such characters, as we show. For k=2, and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters grows. However, again there is a twist: MP trees on six taxa are more likely to have certain shapes than a uniform distribution on binary phylogenetic trees predicts, and this difference does not appear to dissipate as k grows.
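
The symmetry in the score distribution can be checked by brute force for six taxa. The sketch below is an illustration, not the authors' code: it scores all 64 two-state characters on one caterpillar tree and one symmetric tree using Fitch's algorithm and tallies the parsimony scores. The nested-tuple tree encoding and the function names are assumptions made for this example. Each tree is rooted on an arbitrary edge, which does not change the parsimony score.

```python
from collections import Counter
from itertools import product


def fitch(tree, character):
    """Return (state set, parsimony score) for a rooted binary tree.

    tree: a leaf index (int) or a pair (left_subtree, right_subtree).
    character: a tuple giving the state of each leaf.
    """
    if isinstance(tree, int):
        return {character[tree]}, 0
    left_states, left_cost = fitch(tree[0], character)
    right_states, right_cost = fitch(tree[1], character)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # Disjoint state sets: one extra change is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


def score_distribution(tree, n_taxa):
    """Count how many 2-state characters have each parsimony score on the tree."""
    return Counter(fitch(tree, ch)[1] for ch in product((0, 1), repeat=n_taxa))


# Two unrooted shapes on six taxa, each rooted on an arbitrary edge.
caterpillar = (((((0, 1), 2), 3), 4), 5)
symmetric = (((0, 1), (2, 3)), (4, 5))

print(sorted(score_distribution(caterpillar, 6).items()))
print(sorted(score_distribution(symmetric, 6).items()))
```

The two tallies agree, which is the single-character symmetry the abstract describes. The sketch does not reproduce the shape bias among most parsimonious trees for k=2, which would require comparing scores across all trees on six taxa for pairs of characters.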

A field test for frequency-dependent selection on mimetic colour patterns in Heliconius butterflies

Patricio Alejandro Salazar Carrión, Martin Stevens, Robert T. Jones, Imogen Ogilvie, Chris Jiggins

Müllerian mimicry, the similarity among unpalatable species, is thought to evolve by frequency-dependent selection. Accordingly, phenotypes that become established in an area are positively selected because predators have learnt to avoid these forms, while introduced phenotypes are eliminated because predators have not yet learnt to associate these other forms with unprofitability. We tested this prediction in two areas where different colour morphs of the mimetic species Heliconius erato and H. melpomene have become established, as well as in the hybrid zone between these morphs. In each area we tested for selection on three colour patterns: the two parental and the most common hybrid. We recorded bird predation on butterfly models with paper wings, matching the appearance of each morph to bird vision, and plasticine bodies. We did not detect differences in survival between colour morphs, but all morphs were more highly attacked in the hybrid zone. This finding is consistent with recent evidence from controlled experiments with captive birds, which suggest that the effectiveness of warning signals decreases when a large signal diversity is available to predators. This is likely to occur in the hybrid zone where over twenty hybrid phenotypes coexist.

Phylogenetic Identification and Functional Characterization of Orthologs and Paralogs across Human, Mouse, Fly, and Worm

Yi-Chieh Wu, Mukul S Bansal, Matthew D Rasmussen, Javier Herrero, Manolis Kellis

Model organisms can serve the biological and medical community by enabling the study of conserved gene families and pathways in experimentally-tractable systems. Their use, however, hinges on the ability to reliably identify evolutionary orthologs and paralogs with high accuracy, which can be a great challenge at both small and large evolutionary distances. Here, we present a phylogenomics-based approach for the identification of orthologous and paralogous genes in human, mouse, fly, and worm, which forms the foundation of the comparative analyses of the modENCODE and mouse ENCODE projects. We study a median of 16,101 genes across 2 mammalian genomes (human, mouse), 12 Drosophila genomes, 5 Caenorhabditis genomes, and an outgroup yeast genome, and demonstrate that accurate inference of evolutionary relationships and events across these species must account for frequent gene-tree topology errors due to both incomplete lineage sorting and insufficient phylogenetic signal. Furthermore, we show that integration of two separate phylogenomic pipelines yields increased accuracy, suggesting that their sources of error are independent, and finally, we leverage the resulting annotation of homologous genes to study the functional impact of gene duplication and loss in the context of rich gene expression and functional genomic datasets of the modENCODE, mouse ENCODE, and human ENCODE projects.