Our paper: Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution

This guest post is by Daniël Melters [@DPMelters] and Keith Bradnam [@kbradnam] on their paper [along with co-authors]: Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. ArXived here.

The centromere poses an interesting paradox; although its function is essential, its molecular components are fast evolving. Centromeres in many animal and plant genomes have been characterized by the presence of large tandem repeat arrays. Numerous studies have suggested that the composition and length of the repeat units that comprise these arrays vary between species.
In this paper we tried to answer three main questions:
1) Can we identify the candidate centromere repeat sequences in genomes from hundreds of different species?
2) Do candidate centromere repeat sequences from different species share any common properties (sequence composition, length, GC% etc)?
3) How do these tandem repeats evolve?
To answer these questions, we took advantage of the large number of species with publicly available whole genome shotgun sequence data from various sequencing platforms. In total we analyzed 282 animal and plant genomes for the presence of high copy tandem repeat sequences, with the assumption that the most abundant tandem repeat is a good candidate for the centromere repeat.
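
To make the idea of "the most abundant tandem repeat" concrete, here is a minimal Python sketch. It is not the pipeline used in the paper: the function names (`best_period`, `canonical`, `most_abundant_repeat`), the 0.9 identity cutoff, and the simple position-by-position self-comparison are all assumptions made for illustration, and it ignores indels and sequencing errors that real tandem repeat finders have to handle.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def best_period(read, min_period=2, max_period=None, min_identity=0.9):
    """Return (period, identity) for the best self-match of a read, or None.

    For each candidate period p, the read is compared with itself shifted
    by p bases. The smallest period with the highest identity wins.
    """
    n = len(read)
    if max_period is None:
        max_period = n // 2
    best = None
    for p in range(min_period, max_period + 1):
        matches = sum(read[i] == read[i + p] for i in range(n - p))
        identity = matches / (n - p)
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (p, identity)
    return best

def canonical(unit):
    """Pick one representative among all rotations on both strands.

    A tandem repeat has no natural start point or strand, so the same
    repeat can appear in any phase or as its reverse complement.
    """
    rc = unit.translate(COMPLEMENT)[::-1]
    return min(s[i:] + s[:i] for s in (unit, rc) for i in range(len(s)))

def most_abundant_repeat(reads):
    """Return (canonical_unit, read_count) for the most common repeat, or None."""
    counts = Counter()
    for read in reads:
        hit = best_period(read)
        if hit is not None:
            period, _ = hit
            counts[canonical(read[:period])] += 1
    return counts.most_common(1)[0] if counts else None

# Toy reads: two carry the same 8 bp repeat in different phases, one carries a 5 bp repeat.
reads = ["ACGTTGCA" * 5, "GTTGCAAC" * 5, "AATTC" * 6]
print(most_abundant_repeat(reads))
```

Counting reads per repeat is only one possible proxy for abundance; counting bases or repeat copies would be equally reasonable choices in a sketch like this.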

We found high copy tandem repeats in the vast majority of the 282 genomes that we analyzed. For the smaller number of species with published cytology data, we correctly identified the published repeat sequence in 38 out of 43 cases. This supports our assumption that the most abundant tandem repeat in any genome is likely to be the centromere repeat. In the five cases where we did not find the published centromere tandem repeats, we did not have data from sequencing platforms that would have allowed us to identify these repeats.

If an individual sequencing read contains at least four tandem repeats, then there is the possibility of detecting higher order repeat (HOR) structure, i.e. where a tandem array is made up of two alternating types of related sequence (A and B) to produce an A->B->A->B structure. In these cases, the AB dimer is more similar to other AB dimers than A is to B. We found that HOR structure was surprisingly common in the candidate centromere repeats of many different species. The very long reads from Pacific Biosciences (PacBio) sequencing allowed us to further characterize repeat structure in great detail (for a few selected species), and this revealed additional levels of HOR structure.
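
The A->B->A->B test can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the method from the paper: it assumes the monomer length (`period`) is already known, that all monomers have exactly that length (no indels), and that a dimeric HOR shows up as higher identity between every second monomer than between neighbours. The function names and the `margin` value are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length monomers."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def mean_identity_at_lag(monomers, lag):
    """Average identity between monomers that are `lag` positions apart."""
    pairs = [(monomers[i], monomers[i + lag]) for i in range(len(monomers) - lag)]
    return sum(identity(a, b) for a, b in pairs) / len(pairs)

def looks_like_dimeric_hor(read, period, margin=0.05):
    """Return True if monomers two apart are clearly more alike than neighbours."""
    monomers = [read[i:i + period] for i in range(0, len(read) - period + 1, period)]
    if len(monomers) < 4:
        # With fewer than four monomers there are too few A-vs-A and B-vs-B pairs.
        return False
    neighbours = mean_identity_at_lag(monomers, 1)   # A vs B
    alternates = mean_identity_at_lag(monomers, 2)   # A vs A, B vs B
    return alternates > neighbours + margin

# Toy monomers that differ at two of ten positions.
A = "ACGTACGTAC"
B = "ACGTTCGAAC"
print(looks_like_dimeric_hor((A + B) * 2, period=10))  # alternating A and B
print(looks_like_dimeric_hor(A * 4, period=10))        # a plain monomeric array
```

The four-monomer minimum in the sketch mirrors the requirement stated above: a read needs at least two copies of each monomer type before the comparison is meaningful.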

To address the important question of ‘how similar are centromere repeats in different species?’, we performed an all-vs-all comparison between the most abundant tandem repeat in every species. Surprisingly, we found only 26 groups of species that shared any significant sequence similarity in their candidate centromere repeat sequence. The species that make up these 26 groups were always closely related species which had diverged less than 50 million years ago. When comparing the repeat sequences in these groups of closely related species, we found that repeats evolve not only by accumulation of mutations, but also by the spread of indels or by repeat doubling.
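
One simple way to picture an all-vs-all comparison of repeat units is sketched below in Python. The similarity measure here (Jaccard overlap of k-mers taken from the circularized unit on both strands), the threshold, and the species names are assumptions chosen for illustration; the paper's actual comparison method is not described in this post, and a real analysis would more likely use an alignment-based score.

```python
from itertools import combinations

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def circular_kmers(unit, k=4):
    """k-mers of a repeat unit treated as a circle, so phase does not matter."""
    doubled = unit + unit[:k - 1]
    return {doubled[i:i + k] for i in range(len(unit))}

def strand_neutral_kmers(unit, k=4):
    """k-mers from both the unit and its reverse complement."""
    rc = unit.translate(COMPLEMENT)[::-1]
    return circular_kmers(unit, k) | circular_kmers(rc, k)

def jaccard(a, b):
    """Shared k-mers divided by all k-mers seen in either set."""
    return len(a & b) / len(a | b)

def group_species(repeats, k=4, threshold=0.5):
    """Group species whose repeat units pass the similarity threshold.

    `repeats` maps a species name to its candidate repeat unit. Species are
    linked transitively (single linkage); singletons are not reported.
    """
    kmers = {sp: strand_neutral_kmers(seq, k) for sp, seq in repeats.items()}
    parent = {sp: sp for sp in repeats}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(repeats, 2):
        if jaccard(kmers[a], kmers[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for sp in repeats:
        groups.setdefault(find(sp), set()).add(sp)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical species and toy repeat units.
repeats = {
    "species_a": "ACGGTCATTGAC",
    "species_b": "CATTGACACGGT",   # same unit as species_a, different phase
    "species_c": "TTTTTTTTTTTT",   # unrelated
}
print(group_species(repeats))
```

Circularizing the unit and pooling both strands avoids treating two copies of the same repeat as different simply because they were reported in a different phase or orientation.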

These results are in line with the “library” hypothesis, which aims to describe how ratios of repeat variants can change over time. In addition, PacBio sequencing found very long tandem repeats (~1,500 bp). Furthermore, in switchgrass (Panicum virgatum) we identified several centromere repeat variants, but PacBio sequences did not show any mixing of these repeat variants. In summary, tandem repeats are frequently associated with centromere function and most probably evolve according to the “library” hypothesis (a.k.a. molecular drive).

This paper is dedicated to the late Simon Chan, who passed away on the 22nd of August 2012 at the young age of 38 (see here for more information).

Daniël Melters and Keith Bradnam
PS. Supplementary table can be provided upon email request.


Diversity and abundance of the Abnormal chromosome 10 meiotic drive complex in Zea mays

Lisa B. Kanizay, Tanja Pyhäjärvi, Elizabeth G. Lowry, Matthew B. Hufford, Daniel G. Peterson, Jeffrey Ross-Ibarra, R. Kelly Dawe
(Submitted on 25 Sep 2012)

Maize Abnormal chromosome 10 (Ab10) contains a classic meiotic drive system that exploits asymmetry of meiosis to preferentially transmit itself and other chromosomes containing specialized heterochromatic regions called knobs. The structure and diversity of the Ab10 meiotic drive haplotype is poorly understood. We developed a BAC library from an Ab10 line and used the data to develop sequence-based markers, focusing on the proximal portion of the haplotype that shows partial homology to normal chromosome 10. These molecular and additional cytological data demonstrate that two previously identified Ab10 variants (Ab10-I and Ab10-II) share a common origin. Dominant PCR markers were used with FISH to assay 160 diverse teosinte and maize landrace populations from across the Americas, resulting in the identification of a previously unknown but prevalent form of Ab10 (Ab10-III). We find that Ab10 occurs in at least 75% of teosinte populations at a mean frequency of 15%. Ab10 was also found in 13% of the maize landraces, but does not appear to be fixed in any wild or cultivated population. Quantitative analyses suggest that the abundance and distribution of Ab10 is governed by a complex combination of intrinsic fitness effects as well as extrinsic environmental variability.