Our paper: Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution

This guest post is by Daniël Melters [@DPMelters] and Keith Bradnam [@kbradnam] on their paper [along with co-authors]: Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. ArXived here.

The centromere poses an interesting paradox; although its function is essential, its molecular components are fast evolving. Centromeres in many animal and plant genomes have been characterized by the presence of large tandem repeat arrays. Numerous studies have suggested that the composition and length of the repeat units that comprise these arrays vary between species.
In this paper we tried to answer three main questions:
1) Can we identify the candidate centromere repeat sequences in genomes from hundreds of different species?
2) Do candidate centromere repeat sequences from different species share any common properties (sequence composition, length, GC% etc)?
3) How do these tandem repeats evolve?
To answer these questions, we took advantage of the large number of species with publicly available whole genome shotgun sequence data from various sequencing platforms. In total we analyzed 282 animal and plant genomes for the presence of high copy tandem repeat sequences, with the assumption that the most abundant tandem repeat is a good candidate for the centromere repeat.
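The working assumption above (the most abundant tandem repeat is the best centromere candidate) can be illustrated with a small sketch. This is not the pipeline used in the paper. It assumes a tandem repeat finder has already produced hypothetical (monomer, copy number) pairs, and the function names `canonical_rotation` and `most_abundant_repeat` are invented for illustration. Because a tandem array can be read starting at any position and from either strand, the sketch first collapses rotations and reverse complements of a monomer into one canonical form before summing bases.

```python
from collections import Counter

# Translation table for complementing DNA (assumes only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_rotation(monomer: str) -> str:
    """Return the lexicographically smallest rotation of the monomer
    or of its reverse complement, so equivalent repeats compare equal."""
    reverse_complement = monomer.translate(_COMPLEMENT)[::-1]
    rotations = [
        seq[i:] + seq[:i]
        for seq in (monomer, reverse_complement)
        for i in range(len(seq))
    ]
    return min(rotations)


def most_abundant_repeat(hits):
    """Given (monomer, copy_number) pairs, return the canonical monomer
    covering the most bases, together with that base count."""
    totals = Counter()
    for monomer, copies in hits:
        totals[canonical_rotation(monomer.upper())] += len(monomer) * copies
    return totals.most_common(1)[0]


# Hypothetical toy input: three of these hits are the same repeat
# seen in different phases or on the opposite strand.
hits = [("AACG", 10), ("ACGA", 5), ("TTAGGG", 4), ("CGTT", 3)]
candidate, bases = most_abundant_repeat(hits)
print(candidate, bases)
```

Ranking by total bases rather than by number of hits is a design choice of this sketch; a real analysis would also need to normalize for sequencing depth and read length.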

We found high copy tandem repeats in the vast majority of the 282 genomes that we analyzed. For the smaller number of species with published cytology data, we correctly identified the published repeat sequence in 38 out of 43 cases. This supports our assumption that the most abundant tandem repeat in a genome is likely to be the centromere repeat. In the five cases where we did not find the published centromere tandem repeats, we did not have data from sequencing platforms that would have allowed us to identify these repeats.

If an individual sequencing read contains at least four tandem repeats, then there is the possibility of detecting higher order repeat (HOR) structure, i.e. a tandem array made up of two alternating types of related sequence (A and B) that produce an A->B->A->B structure. In these cases, an AB dimer is more similar to other AB dimers than A is to B. We found that HOR structure was surprisingly common in the candidate centromere repeats of many different species. The very long reads from Pacific Biosciences (PacBio) sequencing allowed us to characterize repeat structure in much greater detail (for a few selected species), and this revealed additional levels of HOR structure.
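The A->B->A->B idea can be made concrete with a toy check. The sketch below is an illustration, not the method from the paper: it assumes the monomers in a read have already been extracted in order and are of equal length, it uses simple position-by-position identity in place of a real alignment, and the function names are hypothetical. If monomers two positions apart are on average more similar than adjacent monomers, the array looks like a dimeric HOR.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length monomers")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity_at_lag(monomers, lag: int) -> float:
    """Average identity between monomers that are `lag` positions apart."""
    pairs = [(monomers[i], monomers[i + lag]) for i in range(len(monomers) - lag)]
    return sum(identity(a, b) for a, b in pairs) / len(pairs)


def looks_like_dimer_hor(monomers) -> bool:
    """True if every-other monomers are more alike than neighbours are.
    At least four monomers are needed to see an A->B->A->B pattern."""
    if len(monomers) < 4:
        return False
    return mean_identity_at_lag(monomers, 2) > mean_identity_at_lag(monomers, 1)


# Hypothetical monomers: B differs from A at two of ten positions.
A = "ACGTACGTAC"
B = "ACGAACGTTC"
print(looks_like_dimer_hor([A, B, A, B]))  # alternating array
print(looks_like_dimer_hor([A, A, A, A]))  # homogeneous array
```

The same comparison could be extended to larger lags to look for the higher levels of HOR structure that long reads make visible.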

To address the important question of ‘how similar are centromere repeats in different species?’, we performed an all-vs-all comparison between the most abundant tandem repeat in every species. Surprisingly, we found only 26 groups of species that shared any significant sequence similarity in their candidate centromere repeat sequence. The species that make up these 26 groups were always closely related species which had diverged less than 50 million years ago. When comparing the repeat sequences in these groups of closely related species, we found that repeats evolve not only by accumulation of mutations, but also by the spread of indels or by repeat doubling.
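As a rough illustration of an all-vs-all comparison followed by grouping, the sketch below scores every pair of repeats and merges species whose repeats exceed a similarity threshold. Everything here is an assumption made for the example: Python's `difflib.SequenceMatcher` stands in for a proper sequence aligner, the 0.8 threshold is arbitrary, and the species names and sequences are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Crude similarity score in [0, 1]; a stand-in for a real alignment."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def group_by_similarity(repeats: dict, threshold: float = 0.8):
    """Cluster species whose repeats are similar, using union-find.
    Returns only groups containing more than one species."""
    parent = {name: name for name in repeats}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(repeats, 2):
        if similarity(repeats[a], repeats[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in repeats:
        groups.setdefault(find(name), set()).add(name)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical repeats: sp1 and sp2 differ by a single substitution.
repeats = {
    "sp1": "ACGTACGTACGTACGTACGT",
    "sp2": "ACGTACGTACGAACGTACGT",
    "sp3": "GGGGGCCCCCGGGGGCCCCC",
}
print(group_by_similarity(repeats))
```

Single-linkage grouping of this kind means that two species can end up in the same group through an intermediate species even if their own repeats fall below the threshold.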

These results are in line with the “library” hypothesis, which aims to describe how ratios of repeat variants can change over time. In addition, PacBio sequencing found very long tandem repeats (~1,500 bp). Furthermore, in switchgrass (Panicum virgatum) we identified several centromere repeat variants, but PacBio sequences did not show any mixing of these repeat variants. In summary, tandem repeats are frequently associated with centromere function and most probably evolve according to the “library” hypothesis (a.k.a. molecular drive).

This paper is dedicated to the late Simon Chan, who passed away on the 22nd of August 2012 at the young age of 38 (see here for more information).

Daniël Melters and Keith Bradnam
PS. The supplementary table can be provided on request by email.


Diversity and abundance of the Abnormal chromosome 10 meiotic drive complex in Zea mays

Lisa B. Kanizay, Tanja Pyhäjärvi, Elizabeth G. Lowry, Matthew B. Hufford, Daniel G. Peterson, Jeffrey Ross-Ibarra, R. Kelly Dawe
(Submitted on 25 Sep 2012)

Maize Abnormal chromosome 10 (Ab10) contains a classic meiotic drive system that exploits asymmetry of meiosis to preferentially transmit itself and other chromosomes containing specialized heterochromatic regions called knobs. The structure and diversity of the Ab10 meiotic drive haplotype is poorly understood. We developed a BAC library from an Ab10 line and used the data to develop sequence-based markers, focusing on the proximal portion of the haplotype that shows partial homology to normal chromosome 10. These molecular and additional cytological data demonstrate that two previously identified Ab10 variants (Ab10-I and Ab10-II) share a common origin. Dominant PCR markers were used with FISH to assay 160 diverse teosinte and maize landrace populations from across the Americas, resulting in the identification of a previously unknown but prevalent form of Ab10 (Ab10-III). We find that Ab10 occurs in at least 75% of teosinte populations at a mean frequency of 15%. Ab10 was also found in 13% of the maize landraces, but does not appear to be fixed in any wild or cultivated population. Quantitative analyses suggest that the abundance and distribution of Ab10 is governed by a complex combination of intrinsic fitness effects as well as extrinsic environmental variability.

Complex patterns of local adaptation in teosinte

Tanja Pyhäjärvi, Matthew B. Hufford, Sofiane Mezmouk, Jeffrey Ross-Ibarra
(Submitted on 3 Aug 2012)

Populations of widely distributed species often encounter and adapt to specific environmental conditions. However, comprehensive characterization of the genetic basis of adaptation is demanding, requiring genome-wide genotype data, multiple sampled populations, and a good understanding of population structure. We have used environmental and high-density genotype data to describe the genetic basis of local adaptation in 21 populations of teosinte, the wild ancestor of maize. We found that altitude, dispersal events and admixture among subspecies formed a complex hierarchical genetic structure within teosinte. Patterns of linkage disequilibrium revealed four mega-base scale inversions that segregated among populations and had altitudinal clines. Based on patterns of differentiation and correlation with environmental variation, inversions and nongenic regions play an important role in local adaptation of teosinte. Further, we note that strongly differentiated individual populations can bias the identification of adaptive loci. The role of inversions in local adaptation has been predicted by theory and requires attention as genome-wide data become available for additional plant species. These results also suggest a potentially important role for noncoding variation, especially in large plant genomes in which the gene space represents a fraction of the entire genome.

Our paper: The Genomic Signature of Crop-Wild Introgression in Maize

Our inaugural author post is by Matt Hufford and Jeff Ross-Ibarra [@lab_ri] on their paper:
The Genomic Signature of Crop-Wild Introgression in Maize ArXived here.

Evolutionary biologists have long been fascinated by introgressive hybridization. Numerous examples in which introgression has played an important evolutionary role are known, but genetic characterization has typically focused on only a handful of loci.

We took advantage of the recent development of inexpensive genotyping to address a long-standing question of introgression in maize evolution. Maize was domesticated in the warm low elevations of southwest Mexico, and likely colonized the highlands of central Mexico only thousands of years later. Maize is frequently cultivated in sympatry with its wild relatives the teosintes and is known to hybridize with them. Hybridization is especially common in the highlands, where maize and teosinte share several derived morphological features thought to be adaptive to high elevation.

We set out to discover the genomic extent of introgression in highland maize and teosinte populations and the degree to which this has been adaptive. We genotyped nine sympatric population pairs of maize and teosinte at ~39,000 SNPs. We used two different algorithms (in the software STRUCTURE and HAPMIX) to model chromosomes as mosaics of maize and teosinte, and characterized regions of putative introgression. Surprisingly, we found shared regions of introgression across many populations, primarily from teosinte into maize. To test whether this introgression may have facilitated maize adaptation to the highlands, we conducted a growth chamber experiment that revealed significant differences in putatively adaptive morphological traits between maize populations with and without introgression.

We submitted the paper to arXiv because this is a fast-moving area for empirical evolutionary genomics and we hoped to start the dialogue early on how to move forward with our results. We’d like feedback on the paper and specifically the following questions:

Are there recent advances in modeling admixture and introgression that we should apply?

Are our main findings surprising considering the putative history of maize diffusion?

Matt Hufford and Jeff Ross-Ibarra

The Genomic Signature of Crop-Wild Introgression in Maize

Matthew B. Hufford, Pesach Lubinsky, Tanja Pyhäjärvi, Michael T. Devengenzo, Norman C. Ellstrand, Jeffrey Ross-Ibarra
(Submitted on 19 Aug 2012)

The evolutionary significance of hybridization and introgression has long been appreciated, but evaluation of the genome-wide effects of these phenomena has only recently become possible. Crop-wild study systems represent ideal opportunities to examine evolution through hybridization. For example, maize and the conspecific wild teosinte Zea mays ssp. mexicana are known to hybridize in the fields of highland Mexico. Despite widespread evidence of gene flow, maize and mexicana maintain distinct morphologies and have done so in sympatry for thousands of years. Neither the genomic extent nor the evolutionary importance of introgression between these taxa is understood. We assessed patterns of genome-wide introgression based on 39,029 single nucleotide polymorphisms genotyped in 189 individuals from nine sympatric maize-mexicana populations and reference allopatric populations. While portions of these genomes were particularly resistant to introgression (notably near known cross-incompatibility and domestication loci), we detected widespread evidence for introgression in both directions of gene flow. Through further characterization of these regions and a growth chamber experiment we found evidence consistent with the incorporation of adaptive mexicana alleles into maize during its expansion to the highlands of central Mexico. In contrast, very little evidence was found indicating introgression from maize to mexicana altered the niche of this wild taxon, increasing its capacity to persist commensal to agriculture. The methods we have applied here can be replicated widely across species, greatly informing our understanding of evolution through introgressive hybridization. Crop species, due to their exceptional genomic resources and frequent histories of diffusion into sympatry with relatives, should be particularly influential in these studies.