Introns structure patterns of variation in nucleotide composition in Arabidopsis thaliana and rice protein-coding genes
Adrienne Ressayre, Sylvain Glemin, Pierre Montalent, Laurana Serres-Giardi, Christine Dillmann, Johann Joets
Plant genomes are large and intron-rich, and they display a wide range of variation in coding-region G+C content. In coding regions, a kind of syndrome can be described in plants: an increase in G+C content is associated with both an increase in heterogeneity among genes within a genome and an increase in variation across genes. Taking advantage of the large number of genes in plant genomes and the wide range of variation in intron number per gene, we performed a comprehensive survey of the patterns of variation in G+C content at different scales, from the nucleotide level to the genome scale, in two species, Arabidopsis thaliana and Oryza sativa, comparing the patterns in genes with different intron numbers. In both species, we observed a pervasive effect of intron number and of location along genes on G+C content and on codon and amino acid frequencies, suggesting that introns have a barrier effect that structures G+C content along genes. In external gene regions (located upstream of the first intron or downstream of the last intron), species-specific factors shape G+C content, whereas in internal gene regions (flanked by introns), G+C content is constrained to remain within a range common to both species. In rice, introns appear to be a major determinant of gene G+C content, whereas in A. thaliana introns have a weaker but significant effect. The structuring effect of introns in both species could explain the G+C content syndrome observed in plants.