Genomic, transcriptomic and phenomic variation reveals the complex adaptation of modern maize breeding

Haijun Liu, Xiaqing Wang, Marilyn Warburton, Weiwei Wen, Minliang Jin, Min Deng, Jie Liu, Hao Tong, Qingchun Pan, Xiaohong Yang, Jianbing Yan

The temperate-tropical division of early maize germplasm to different agricultural environments was arguably the greatest adaptation process associated with the success and near-ubiquitous importance of global maize production. Deciphering this history is challenging, but new insight has been gained from the genomic, transcriptomic and phenotypic variation collected from 368 diverse temperate and tropical maize inbred lines in this study. This is the first attempt to systematically explore the mechanisms of the adaptation process. Our results indicate that the divergence between tropical and temperate lines seems to have occurred 3,400-6,700 years ago. A number of genomic selection signals and transcriptomic variants, including differentially expressed individual genes and rewired co-expression networks of genes, were identified. These candidate signals were found to be functionally related to stress response, and most were associated with directionally selected traits, which may have been an advantage under the widely varying environmental conditions faced by maize as it spread away from its domestication center. It is also clear in our study that such stress adaptation could involve evolution of protein-coding sequences as well as transcriptome-level regulatory changes. This latter process may be a more flexible and dynamic way for maize to adapt to environmental changes over this dramatically short evolutionary time frame.

Natural variation in teosinte at the domestication locus teosinte branched1 (tb1)

Laura Vann, Thomas Kono, Tanja Pyhäjärvi, Matthew B Hufford, Jeffrey Ross-Ibarra

Premise of the study: The teosinte branched1 (tb1) gene is a major QTL controlling branching differences between maize and its wild progenitor, teosinte. The insertion of a transposable element (Hopscotch) upstream of tb1 is known to enhance the gene’s expression, causing reduced tillering in maize. Observations of the maize tb1 allele in teosinte and estimates of an insertion age of the Hopscotch that predates domestication led us to investigate its prevalence and potential role in teosinte. Methods: Prevalence of the Hopscotch element was assessed across an Americas-wide sample of 1110 maize and teosinte individuals using a co-dominant PCR assay. Population genetic summaries were calculated for a subset of individuals from four teosinte populations in central Mexico. Phenotypic data were also collected from a single teosinte population where Hopscotch was found segregating. Key results: Genotyping results suggest the Hopscotch element is at higher than expected frequency in teosinte. Analysis of linkage disequilibrium near tb1 does not support recent introgression of the Hopscotch allele from maize into teosinte. Population genetic signatures are consistent with selection on this locus revealing a potential ecological role for Hopscotch in teosinte. Finally, two greenhouse experiments with teosinte do not suggest tb1 controls tillering in natural populations. Conclusions: Our findings suggest the role of Hopscotch differs between maize and teosinte. Future work should assess tb1 expression levels in teosinte with and without the Hopscotch and more comprehensively phenotype teosinte to assess the ecological significance of the Hopscotch insertion and, more broadly, the tb1 locus in teosinte. Key words: domestication; maize; teosinte; teosinte branched1; transposable element

How the tortoise beats the hare: Slow and steady adaptation in structured populations suggests a rugged fitness landscape in bacteria

Joshua R. Nahum, Peter Godfrey-Smith, Brittany N. Harding, Joseph H. Marcus, Jared Carlson-Stevermer, Benjamin Kerr

In the context of Wright’s adaptive landscape, genetic epistasis can yield a multi-peaked or “rugged” topography. In an unstructured population, a lineage with selective access to multiple peaks is expected to rapidly fix on one, which may not be the highest peak. Conversely, beneficial mutations in a population with spatially restricted migration take longer to fix, allowing distant parts of the population to explore the landscape semi-independently. Such a population can simultaneously discover multiple peaks, and the genotype at the highest discovered peak is expected to fix eventually. Thus, structured populations sacrifice initial speed of adaptation for breadth of search. As in the Tortoise-Hare fable, the structured population (Tortoise) starts relatively slow, but eventually surpasses the unstructured population (Hare) in average fitness. In contrast, on single-peak landscapes (e.g., systems lacking epistasis), all uphill paths converge. Given such “smooth” topography, breadth of search is devalued, and a structured population only lags behind an unstructured population in average fitness (ultimately converging). Thus, the Tortoise-Hare pattern is an indicator of ruggedness. After verifying these predictions in simulated populations where ruggedness is manipulable, we then explore average fitness in metapopulations of Escherichia coli. Consistent with a rugged landscape topography, we find a Tortoise-Hare pattern. Further, we find that structured populations accumulate more mutations, suggesting that distant peaks are higher. This approach can be used to unveil landscape topography in other systems, and we discuss its application for antibiotic resistance, engineering problems, and elements of Wright’s Shifting Balance Process.

Predicting evolution from the shape of genealogical trees

Richard A. Neher, Colin A. Russell, Boris I. Shraiman
(Submitted on 3 Jun 2014)

Given a sample of genome sequences from an asexual population, can one predict its evolutionary future? Here we demonstrate that the branching pattern of reconstructed genealogical trees contains information about the relative fitness of the sampled sequences and that this information can be used to infer the closest extant relative of future populations. Our approach is based on the assumption that evolution proceeds predominantly by accumulation of small-effect mutations and does not require any species-specific input. Hence, the resulting inference algorithm can be applied to any asexual population under persistent selection pressure. We demonstrate its performance using historical data on seasonal influenza A/H3N2 virus. We predict the progenitor lineage of the upcoming influenza season with near-optimal performance in 30% of cases and make informative predictions in 16 out of 18 years. Beyond providing a practical tool for prediction, our results suggest that continuous adaptation by small-effect mutations is a major component of influenza virus evolution.

Author post: Inferring human population size and separation history from multiple genome sequences

This guest post is by Stephan Schiffels (@stschiff) on his paper with Richard Durbin, "Inferring human population size and separation history from multiple genome sequences", available on bioRxiv.

In our paper, we study genome sequences to learn about human history and how human populations are related to each other. Remarkably, we only need a few individuals for this, because once we look sufficiently many generations into the past, every single genome contains fragments from a very large number of ancestors. This means that given only two genomes, say one individual from Africa and one individual from Europe, we typically find shared fragments from common ancestors (great great … great grandparents) from 2,000 or more generations ago. This trace of shared segments in our genomes can be detected and enables us to make inference about human history.

A few years ago, Heng Li and Richard Durbin introduced the PSMC method, which estimates this shared common ancestry in a single diploid genome to infer population sizes. We have now introduced a major extension to this approach, called MSMC (Multiple Sequentially Markovian Coalescent), which is able to find and date traces of shared ancestry across multiple genome sequences. This is generally a hard problem because of the complex way in which sequences relate to each other through recombination and mutation (see an excellent blog post by Adam Siepel). In our method, we therefore chose to focus only on the pair of segments that coalesces first, i.e. the pair that shares the most recent common ancestor of all pairs. Because of ancestral recombinations, this pair changes along the sequences.

Consider again the example of an African and a European individual, each of them carrying two copies of a chromosome. In one part of their genomes, the most recent ancestor of any two chromosomes may be shared between the two European chromosomes; in other parts it may be shared between the two African chromosomes; and in some cases it may actually be found across a European and an African chromosome. The relative frequency with which we observe each of the three cases, and the distribution of times to the most recent common ancestor, give information about when the separation happened and how long it took for the ancestral people to part fully from each other. In the case of West Africans and Europeans, we found that the two populations started to separate from each other (at least genetically) long before the known out-of-Africa emigration 50,000 years ago. We see the same thing if we compare West Africans to Asians or Americans instead of Europeans. We can also see clearly how ancestors of Native Americans separated from Asians around 20,000 years ago, a date that precedes, and is consistent with, the known first arrival of people in the New World around 15,000 years ago.
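To illustrate why the frequency of the three cases carries information about the separation time, here is a toy simulation. It is only a sketch under strong assumptions that are not part of MSMC itself: a clean split with no migration, equal and constant population sizes, time measured in coalescent units, and no recombination. The function names and parameters are hypothetical. More recently than the split, only the two within-population pairs can coalesce; further back, all six pairs are equally likely to coalesce first, and four of those six pairs are cross-population.

```python
import math
import random


def first_coalescence_is_cross(split_time, rng):
    """Simulate one locus with two chromosomes per population.

    Returns True if the first coalescence joins chromosomes from
    different populations. Toy model: clean split at split_time
    (coalescent units), equal constant sizes, no migration.
    """
    # More recently than the split there are two within-population
    # pairs, each coalescing at rate 1, so the waiting time is Exp(2).
    waiting_time = rng.expovariate(2.0)
    if waiting_time < split_time:
        return False
    # No event before the split: all four lineages now share one
    # population, so each of the 6 pairs is equally likely to
    # coalesce first, and 4 of those pairs are cross-population.
    return rng.random() < 4 / 6


def cross_fraction(split_time, n_loci=100_000, seed=1):
    """Fraction of simulated loci whose first coalescence is cross-population."""
    rng = random.Random(seed)
    hits = sum(first_coalescence_is_cross(split_time, rng) for _ in range(n_loci))
    return hits / n_loci


def expected_cross_fraction(split_time):
    """Closed form for the toy model: exp(-2 * T) * 2/3."""
    return math.exp(-2 * split_time) * 2 / 3


if __name__ == "__main__":
    for split_time in (0.0, 0.25, 0.5, 1.0):
        print(split_time, cross_fraction(split_time), expected_cross_fraction(split_time))
```

In this toy model the cross-population fraction falls from 2/3 for fully mixed populations towards zero as the split gets older, which is the kind of signal the paragraph above describes. A real analysis also has to handle recombination, changing population sizes and gradual separation.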

Our method can also estimate effective population size changes through time. One consequence of looking only for the first common ancestor is that we can now look into a much more recent past than was previously possible with similar methods, such as PSMC. For example, we can now see a deep bottleneck in Native American ancestors around 15,000 years ago, which fits with the separation and immigration history described above, and we can see recent expansions that are consistent with the spread of agriculture in Africa.

We believe that MSMC is a useful tool for estimating population history from whole genome sequences. But more ideas and development are still needed to expand this approach to more genomes and to look into the past even more recently than 2,000 years ago, which is our current limit with MSMC. Closely related approaches are currently being developed by Yun Song, Thomas Mailund and others, and these will complement MSMC. This is a great time to work in this field, given that many more high-quality individual genome sequences are being generated, in many cases from populations that we have not covered at all in our paper. All of this will help to greatly expand our knowledge of human population history.

Epidemic reconstruction in a phylogenetics framework: transmission trees as partitions

Matthew Hall, Andrew Rambaut
(Submitted on 2 Jun 2014)

The reconstruction of transmission trees for epidemics from genetic data has been the subject of some recent interest. It has been demonstrated that the transmission tree structure can be investigated by augmenting internal nodes of a phylogenetic tree constructed using pathogen sequences from the epidemic with information about the host that held the corresponding lineage. In this paper, we note that this augmentation is equivalent to a correspondence between transmission trees and partitions of the phylogenetic tree into connected subtrees each containing one tip, and provide a framework for Markov Chain Monte Carlo inference of phylogenies that are partitioned in this way, giving a new method to co-estimate both trees. The procedure is integrated in the existing phylogenetic inference package BEAST.
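The correspondence between host-labelled phylogenies and partitions can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the BEAST implementation: the tree is given as a hypothetical parent map, every node carries a host label, and each host contributes exactly one tip. A labelling is a valid partition when each host's nodes form a connected block containing one tip, and the transmission tree is then read off from the host of the node just above each block.

```python
def transmission_tree(parent, host):
    """Derive who infected whom from a host-labelled rooted phylogeny.

    parent: dict mapping each node to its parent (the root maps to None).
    host:   dict mapping every node (tips and internal) to a host label.
    Returns a dict mapping each host to its infector (None for the
    index case). Raises ValueError if the labelling is not a partition
    into connected blocks with exactly one tip each.
    """
    internal = {par for par in parent.values() if par is not None}
    tips = [node for node in parent if node not in internal]

    # Each block must contain exactly one tip.
    tip_hosts = [host[tip] for tip in tips]
    if len(set(tip_hosts)) != len(tip_hosts):
        raise ValueError("some host labels more than one tip")
    if set(tip_hosts) != set(host[node] for node in parent):
        raise ValueError("some host labels no tip")

    infector = {}
    for node, par in parent.items():
        # A node is the top of its block if it is the root or if its
        # parent carries a different host label.
        if par is None or host[par] != host[node]:
            if host[node] in infector:
                # Two tops for one host means the block is disconnected.
                raise ValueError(f"block of host {host[node]!r} is not connected")
            infector[host[node]] = None if par is None else host[par]
    return infector


# Example: tips A, B, C sampled from hosts a, b, c.
parent = {"root": None, "n1": "root", "C": "root", "A": "n1", "B": "n1"}
host = {"root": "c", "n1": "a", "A": "a", "B": "b", "C": "c"}
print(transmission_tree(parent, host))
```

In the example, host c is the index case, c infected a, and a infected b. In a rooted tree a set of nodes is connected exactly when it has a single topmost node, which is why counting block tops is enough to check connectivity.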

Target enrichment of ultraconserved elements from arthropods provides a genomic perspective on relationships among Hymenoptera

Brant C. Faircloth, Michael G. Branstetter, Noor D. White, Seán G. Brady
(Submitted on 2 Jun 2014)

Gaining a genomic perspective on phylogeny requires the collection of data from many putatively independent loci collected across the genome. Among insects, an increasingly common approach to collecting this class of data involves transcriptome sequencing, because few insects have high-quality genome sequences available; assembling new genomes remains a limiting factor; the transcribed portion of the genome is a reasonable, reduced subset of the genome to target; and the data collected from transcribed portions of the genome are similar in composition to the types of data with which biologists have traditionally worked (e.g., exons). However, molecular techniques requiring RNA as a template are limited to using very high quality source materials, which are often unavailable from a large proportion of biologically important insect samples. Recent research suggests that DNA-based target enrichment of conserved genomic elements offers another path to collecting phylogenomic data across insect taxa, provided that conserved elements are present in and can be collected from insect genomes. Here, we identify a large set (n=1510) of ultraconserved elements (UCEs) shared among the insect order Hymenoptera. We use in silico analyses to show that these loci accurately reconstruct relationships among genome-enabled Hymenoptera, and we design a set of baits for enriching these loci that researchers can use with DNA templates extracted from a variety of sources. We use our UCE bait set to enrich an average of 721 UCE loci from 30 hymenopteran taxa, and we use these UCE loci to reconstruct phylogenetic relationships spanning very old (≥220 MYA) to very young (≤1 MYA) divergences among hymenopteran lineages. In contrast to a recent study addressing hymenopteran phylogeny using transcriptome data, we found ants to be sister to all remaining aculeate lineages with complete support.

The most parsimonious tree for random data

Mareike Fischer, Michelle Galla, Lina Herbst, Mike Steel
(Submitted on 1 Jun 2014)

Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree ‘shapes’. For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of k such characters, as we show. For k=2, and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters grows. However, again there is a twist: MP trees on six taxa are more likely to have certain shapes than a uniform distribution on binary phylogenetic trees predicts, and this difference does not appear to dissipate as k grows.
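To make the parsimony score concrete, here is a minimal sketch of the standard Fitch algorithm for a single 2-state character, together with an exhaustive check over all 64 characters on six taxa. The nested-tuple tree encoding, the function names and the two example trees are assumptions for illustration and are not taken from the paper. The trees are written as rooted, which is harmless because the Fitch score does not depend on where an unrooted tree is rooted.

```python
from collections import Counter
from itertools import product


def fitch_score(tree, states):
    """Parsimony score of one character on a binary tree (Fitch algorithm).

    tree:   a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its character state.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children can agree on a state: no extra change needed.
            return common, left_cost + right_cost
        # The children disagree: one change is needed on this subtree.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def score_histogram(tree, leaves):
    """Count how many of the 2^n two-state characters give each score."""
    hist = Counter()
    for values in product((0, 1), repeat=len(leaves)):
        hist[fitch_score(tree, dict(zip(leaves, values)))] += 1
    return hist


LEAVES = ["a", "b", "c", "d", "e", "f"]
# Caterpillar shape: two cherries (a,b) and (e,f) with c and d between them.
CATERPILLAR = ((((("a", "b"), "c"), "d"), "e"), "f")
# Symmetric shape: three cherries joined at a central node.
SYMMETRIC = ((("a", "b"), ("c", "d")), ("e", "f"))

if __name__ == "__main__":
    print(score_histogram(CATERPILLAR, LEAVES))
    print(score_histogram(SYMMETRIC, LEAVES))
```

Since a uniformly random 2-state character is equally likely to be any of the 64 leaf colourings, identical histograms for the two shapes correspond to the identical score distributions stated in the abstract. The bias the authors describe concerns which tree attains the minimum total score over k characters, which this sketch does not compute.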

A field test for frequency-dependent selection on mimetic colour patterns in Heliconius butterflies

Patricio Alejandro Salazar Carrión, Martin Stevens, Robert T. Jones, Imogen Ogilvie, Chris Jiggins

Müllerian mimicry, the similarity among unpalatable species, is thought to evolve by frequency-dependent selection. Accordingly, phenotypes that become established in an area are positively selected because predators have learnt to avoid these forms, while introduced phenotypes are eliminated because predators have not yet learnt to associate these other forms with unprofitability. We tested this prediction in two areas where different colour morphs of the mimetic species Heliconius erato and H. melpomene have become established, as well as in the hybrid zone between these morphs. In each area we tested for selection on three colour patterns: the two parental and the most common hybrid. We recorded bird predation on butterfly models with paper wings, matching the appearance of each morph to bird vision, and plasticine bodies. We did not detect differences in survival between colour morphs, but all morphs were attacked more often in the hybrid zone. This finding is consistent with recent evidence from controlled experiments with captive birds, which suggests that the effectiveness of warning signals decreases when a large signal diversity is available to predators. This is likely to occur in the hybrid zone, where over twenty hybrid phenotypes coexist.

Phylogenetic Identification and Functional Characterization of Orthologs and Paralogs across Human, Mouse, Fly, and Worm

Yi-Chieh Wu, Mukul S Bansal, Matthew D Rasmussen, Javier Herrero, Manolis Kellis

Model organisms can serve the biological and medical community by enabling the study of conserved gene families and pathways in experimentally tractable systems. Their use, however, hinges on the ability to identify evolutionary orthologs and paralogs reliably, which can be a great challenge at both small and large evolutionary distances. Here, we present a phylogenomics-based approach for the identification of orthologous and paralogous genes in human, mouse, fly, and worm, which forms the foundation of the comparative analyses of the modENCODE and mouse ENCODE projects. We study a median of 16,101 genes across 2 mammalian genomes (human, mouse), 12 Drosophila genomes, 5 Caenorhabditis genomes, and an outgroup yeast genome, and demonstrate that accurate inference of evolutionary relationships and events across these species must account for frequent gene-tree topology errors due to both incomplete lineage sorting and insufficient phylogenetic signal. Furthermore, we show that integration of two separate phylogenomic pipelines yields increased accuracy, suggesting that their sources of error are independent. Finally, we leverage the resulting annotation of homologous genes to study the functional impact of gene duplication and loss in the context of rich gene expression and functional genomic datasets of the modENCODE, mouse ENCODE, and human ENCODE projects.