Comparative Analysis of Tandem Repeats from Hundreds of Species Reveals Unique Insights into Centromere Evolution

Daniël P. Melters, Keith R. Bradnam, Hugh A. Young, Natalie Telis, Michael R. May, J. Graham Ruby, Robert Sebra, Paul Peluso, John Eid, David Rank, José Fernando Garcia, Joseph L. DeRisi, Timothy Smith, Christian Tobias, Jeffrey Ross-Ibarra, Ian F. Korf, Simon W.-L. Chan
(Submitted on 22 Sep 2012)

Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. The assumption that the most abundant tandem repeat is the centromere DNA was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and in length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond ~50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution, including the appearance of higher order repeat structures in which several polymorphic monomers make up a larger repeating unit. While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animals and plants. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.

6 thoughts on “Comparative Analysis of Tandem Repeats from Hundreds of Species Reveals Unique Insights into Centromere Evolution”

  1. Pretty cool, didn’t read the paper thoroughly, but it got my juices flowing about ideas for how to test neutrality of divergence in repeat sequences. This seems to be critical to distinguishing the meiotic drive model from the molecular drive model.

    One thing I have been thinking about is McDonald-Kreitman analogs, whereby one measures divergence in centromeric satellite sequence based on bulk amount and also polymorphism based on bulk amount. One thing I have been realizing from genome sequencing is it is a REALLY good assay for quantities of low complexity sequences. Thus, any population resequencing study should have good estimates of polymorphism in bulk content for satellite sequences. An outgroup then could be used for divergence in bulk content. The problem arises in whether a change in the mechanism of mutation happened on one lineage that would increase the “molecular drive” rate of divergence. SO, I think the way around this would be to look at polymorphism in TWO species. The second polymorphism estimate could be used to measure extant mutational processes in the second lineage. Then, accounting for variance estimates in satellite DNA content in two species, and some measure of divergence, determine whether the rate of divergence in satellite content outstripped what might be predicted based on standing variation…

    Anyway, this paper seems to provide a foundation for testing some of these meiotic drive models of centromere evolution. Pretty cool.

    Justin

  2. Neat idea, will have to think about how/if it could be implemented. Would it not still be a bit messy for several reasons, including not all taxa have one repeat type (e.g. recent crazy potato centromere paper), and centromere/meiotic drive may act on active kinetochores while molecular drive might act on pericentromeric regions as well. Definitely worth thinking about. Thanks!

  3. More thoughts and questions. So to do MK style you’d be comparing to say synonymous sites genome wide? Not clear how abundance of repeats (interpreted to mean % of genome made up of repeat) tells you something about rate of molecular drive? And not sure I follow how polymorphism in two species allows you to rule out changes to molecular drive in the lineage used as the outgroup for divergence (since molecular drive along that branch would affect repeat divergence but not synonymous site divergence, and thus look like selection)?

  4. Pingback: Our paper: Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution | Haldane's Sieve

  5. Yes, this was half baked. Let me put it in the oven a little bit more.

    Maybe MK analogs are no good. I am trying to think of the ‘hook’ that will tease apart a signature of molecular drive from meiotic drive. I assume it would come from the polymorphism signature in tandem repeat content relative to divergence. Thinking out loud here, a molecular drive process should look like a Brownian motion process with directionality, right? But it wouldn’t wipe out variation the way meiotic drive would, right? This is akin to the molecular drive process of biased gene conversion. It puts a directional aspect to mutation, but it shouldn’t show sweep signatures. Maybe that is the angle. I think you are right, if a mixed process of molecular drive and meiotic drive were taking place on two lineages in the same fashion, well, this would be tricky…

    OK, I will simmer on this.

  6. One last bite at the apple.

    Consider three taxa or more with two focal sister taxa (A and B). One would be able to estimate branch specific accumulation of bulk repeat content on each lineage of the focal sister taxa. One could also look at diversity in the two sister taxa. Suppose the following condition were met:

    ‘A’ showed a higher branch specific rate of divergence AND a lower amount of repeat diversity.

    One tricky thing here, however, is that divergence on branch ‘A’ might be slightly deleterious and reduction in diversity on branch ‘A’ might be due to stronger Hill-Robertson effects that might be present, for some reason, on branch ‘A’ (thus perhaps also explaining fixation of mildly deleterious increases in repeat content). Perhaps this could be ruled out by looking at dN/dS in genes flanking centromeres or repeats of interest.

    Anyway, if it worked, this might only be able to detect heterogeneity in selection on repeat content among species. Seems like a good system to test this in would be sister taxa in which one species is a selfer and the other not (like the Mimulus system that Lila Fishman works on).

    OK, I’m out.

    J
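
The comparison proposed in comment 6 can be made concrete with a rough sketch. Nothing below comes from the paper or from the commenters: the function names, the input format (one bulk satellite-content estimate per resequenced individual, in whatever unit the assay gives), and the use of a three-point distance decomposition to assign divergence to each sister branch are all assumptions made for illustration. It only checks the stated condition; as comment 6 notes, a positive result would not by itself rule out Hill-Robertson effects.

```python
from statistics import mean, variance


def branch_lengths(mean_a, mean_b, mean_out):
    """Split the A-B difference in mean repeat content between the two
    sister branches, using an outgroup (three-point decomposition).
    ASSUMPTION: absolute difference in mean bulk content is the distance."""
    d_ab = abs(mean_a - mean_b)
    d_ao = abs(mean_a - mean_out)
    d_bo = abs(mean_b - mean_out)
    branch_a = (d_ab + d_ao - d_bo) / 2
    branch_b = (d_ab + d_bo - d_ao) / 2
    return branch_a, branch_b


def compare_sister_taxa(content_a, content_b, content_out):
    """Each argument is a list of per-individual bulk repeat-content
    estimates (hypothetical input format; needs >= 2 values for A and B)."""
    div_a, div_b = branch_lengths(
        mean(content_a), mean(content_b), mean(content_out)
    )
    # Sample variance stands in for "repeat diversity" within each species.
    var_a, var_b = variance(content_a), variance(content_b)
    return {
        "divergence_A": div_a,
        "divergence_B": div_b,
        "diversity_A": var_a,
        "diversity_B": var_b,
        # The condition from comment 6: A diverged faster AND is less diverse.
        "A_meets_condition": div_a > div_b and var_a < var_b,
    }


if __name__ == "__main__":
    # Made-up numbers, purely to show the call; not real data.
    result = compare_sister_taxa(
        content_a=[100, 110, 100, 110],
        content_b=[40, 70, 50, 80],
        content_out=[50, 60],
    )
    for key, value in result.items():
        print(key, value)
```

A real analysis would need uncertainty on each estimate (for example by bootstrapping individuals) before reading anything into the flag, and the dN/dS check on flanking genes suggested in comment 6 is outside this sketch.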
