In Silico Biology 8, 0009 (2008); ©2008, Bioinformation Systems e.V.  


In silico analysis of microsatellites in organellar genomes of major cereals for understanding their phylogenetic relationships


Passoupathy Rajendrakumar#, Akshaya Kumar Biswal#, Sena M. Balachandran* and Raman M. Sundaram*




Biotechnology Laboratory, Crop Improvement Section, Directorate of Rice Research, Rajendranagar, Hyderabad-500030, India



# These authors contributed equally

* Corresponding authors
   Email: balasena@yahoo.com; rms_28@rediffmail.com





Edited by H. Michael; received October 02, 2007; revised January 18, 2008; accepted January 24, 2008; published February 24, 2008



Abstract

Microsatellites are abundant across prokaryotic and eukaryotic genomes. However, a comparative analysis of microsatellites in the organellar genomes of plants, and of their utility in understanding phylogeny, has not been reported. The purpose of this study was to understand the organization of microsatellites in the coding and non-coding regions of the organellar genomes of four major cereals, viz., rice, wheat, maize and sorghum. About 5.8-14.3% of mitochondrial and 30.5-43.2% of chloroplast microsatellites were located in the coding regions. About 83.8-86.8% of known mitochondrial genes had at least one microsatellite, while this value ranged from 78.6-82.9% among the chloroplast genomes. Dinucleotide repeats were the most abundant in the coding and non-coding regions of the mitochondrial genomes, while mononucleotide repeats were predominant in the chloroplast genomes. Maize harbored more repeats in its mitochondrial genome than the other cereals, which could be due to its larger mitochondrial genome. A phylogenetic analysis based on mitochondrial and chloroplast microsatellites revealed that rice and sorghum were closest to each other, while wheat was the most distant; this agreed with earlier phylogenies based on nuclear genome co-linearity and chloroplast gene sequences.

Keywords: microsatellites, phylogeny, organellar genomes, comparative analysis



Introduction

The majority of the world's population depends on four domesticated cereals, viz., rice, wheat, maize and sorghum, for daily sustenance. These cereal species provide important models for evolutionary studies of the grasses, since various aspects of their biology have been well documented. The traditional approach to plant molecular phylogenetics involves analyzing nucleotide sequence variation in one [1, 2] or a few conserved genes [3, 4] from many species. Comparing more genes reduces inherent sampling errors and makes the data more dependable. It is well established that the analysis of genome-wide datasets often provides convincing inferences. Hence, a genome-wide analysis of microsatellites is advantageous, as it provides a larger number of data points. A phylogenetic analysis of Oryza based on mononucleotide repeats and flanking sequences from organellar genomes has been reported [5]. It has been shown that different taxa exhibit different preferences for microsatellite types, and that microsatellite abundance also varies among genera and species [6]. In addition to their use as molecular markers, information on the abundance and distribution of microsatellites may help in understanding their relevance to gene function or genome evolution. Despite the availability of complete organellar genomes for a few cereal species, a comprehensive analysis of microsatellites has been reported only in rice [7]. The main objective of this study was to analyze the comparative abundance and distribution of microsatellites in the organellar genomes of major cereals in order to understand cereal phylogeny.



Methods


Identification and localization of microsatellites

The complete mitochondrial and chloroplast genome sequences of rice (gi#47118326; gi#42795473), sorghum (gi#114309646; gi#118201104), maize (gi#40794996; gi#11990232) and wheat (gi#78675232; gi#13928184) available in GenBank (http://www.ncbi.nlm.nih.gov/genomes/static/euk_o.html) were used for the study. Perfect di-, tri-, tetra-, penta- and hexanucleotide motifs (repeated ≥3 times) were identified using the Simple Sequence Repeat Identification Tool (SSRIT) [8]. Mononucleotide repeats with a repeat length of ≥6 nt were identified using the software FastPCR [9]. Repeats were assigned to coding or non-coding regions based on the sequence annotation in the GenBank database.
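The search criteria above can be illustrated with a minimal sketch. This is not SSRIT or FastPCR; it is a plain regular-expression scan written for illustration, and the function names and the example sequence are hypothetical. It assumes an uppercase or lowercase DNA string and reports only perfect repeats, skipping motifs that are themselves repetitions of a shorter unit (for example, AA is counted as a mononucleotide run rather than as a dinucleotide).

```python
import re

# Thresholds follow the Methods: mononucleotide runs of >= 6 nt,
# and di- to hexanucleotide motifs repeated >= 3 times.
MIN_REPEATS = {1: 6, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3}


def is_redundant(motif):
    """True if the motif is a repetition of a shorter unit (e.g. ATAT = AT x 2)."""
    n = len(motif)
    return any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_microsatellites(seq):
    """Return (start, motif, repeat_count) tuples for perfect repeats, sorted by start."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_redundant(motif):
                continue  # already reported under the shorter motif
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Hypothetical toy sequence containing (AT)3, (A)7 and (TTC)3.
example = "GGCATATATCCAAAAAAAGTTCTTCTTCG"
for start, motif, count in find_microsatellites(example):
    print(start, motif, count)
```

Positions are 0-based. Assigning each hit to a coding or non-coding region would additionally require the feature coordinates from the GenBank annotation, which this sketch does not parse.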


Phylogenetic Analysis

Class I and Class II microsatellites, together with 100 nt of flanking sequence, were retrieved using a JavaScript program developed in-house by the authors. For a repeat motif in one genome, the corresponding alleles in the other genomes were identified based on the presence of the same flanking sequences. Microsatellites were designated as polymorphic based on differences in repeat number. Duplicate loci were identified based on identical flanking sequences. If a particular repeat was not present in another genome, it was considered a null allele. With these criteria, binary data were generated and a phylogenetic tree was constructed using the Unweighted Pair Group Method with Arithmetic Averages (UPGMA) algorithm in the TREECONW software [10]. The reliability of the tree was tested by bootstrap analysis [11].
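The scoring and clustering steps can be sketched as follows. This is an illustration under stated assumptions, not a reimplementation of TREECONW: the text does not specify how alleles were coded or which distance measure was used, so the sketch assumes that each (locus, repeat number) pair is one presence/absence character, that a null allele scores 0 for every character of that locus, and that distance is the fraction of mismatching characters. The genome labels, locus names and repeat numbers are hypothetical, and bootstrapping is omitted.

```python
from itertools import combinations


def binary_matrix(alleles):
    """Convert {genome: {locus: repeat_count}} into {genome: [0/1, ...]}.

    Each observed (locus, repeat_count) pair is one character; a genome
    lacking the locus (null allele) scores 0 for all of its characters.
    """
    characters = sorted({(locus, n) for loci in alleles.values()
                         for locus, n in loci.items()})
    return {genome: [int(loci.get(locus) == n) for locus, n in characters]
            for genome, loci in alleles.items()}


def mismatch_distance(a, b):
    """Fraction of characters at which two binary profiles differ (assumed measure)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(profiles):
    """Cluster binary profiles by UPGMA and return the tree as nested tuples."""
    size = {name: 1 for name in profiles}  # number of leaves in each cluster
    dist = {frozenset(pair): mismatch_distance(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(profiles, 2)}
    while len(size) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c not in (a, b):
                # Arithmetic average of distances, weighted by cluster size.
                dist[frozenset((merged, c))] = (
                    size[a] * dist[frozenset((a, c))]
                    + size[b] * dist[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


# Hypothetical alleles for four genomes; g3 and g4 carry null alleles at some loci.
alleles = {
    "g1": {"locus1": 3, "locus2": 4, "locus3": 3},
    "g2": {"locus1": 3, "locus2": 4, "locus3": 5},
    "g3": {"locus1": 3, "locus2": 5},
    "g4": {"locus1": 4, "locus4": 3},
}
tree = upgma(binary_matrix(alleles))
print(tree)
```

In this toy data set g1 and g2 share the most alleles and are joined first, g3 joins next, and g4, which shares no allele with the others, is placed outermost.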



Results and discussion


Abundance of microsatellites

The total number of microsatellites in the mitochondrial genomes ranged from 2147 to 2706, and only 5.8% (rice) to 14.3% (sorghum) of them resided in the coding region (Tab. 1). The density of microsatellites ranged from 26-34 bp/kb in the coding region, while it was 32-36 bp/kb in the non-coding region. The frequency of microsatellites in the coding region ranged from 3.9 to 5.0 per kb, while it was 4.6 to 5.3 per kb in the non-coding region. Among the mitochondrial genomes studied, ~85% of genes possessed at least one repeat (Supplementary Data Tab. 1).
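The two per-kb measures quoted above can be made concrete with a short sketch. The text does not spell out the definitions, so the sketch assumes that density is the total length of repeat tracts (bp) per kb of region and that frequency is the number of repeat loci per kb of region. The function name and the example numbers are hypothetical and are not taken from the data.

```python
def density_and_frequency(repeat_lengths_bp, region_length_bp):
    """Return (density in bp/kb, frequency in repeats/kb) for one region.

    Assumed definitions: density = summed repeat length per kb of region,
    frequency = number of repeat loci per kb of region.
    """
    region_kb = region_length_bp / 1000
    density = sum(repeat_lengths_bp) / region_kb
    frequency = len(repeat_lengths_bp) / region_kb
    return density, frequency


# Hypothetical 2 kb region containing four repeats of 6, 8, 9 and 7 bp.
density, frequency = density_and_frequency([6, 8, 9, 7], 2000)
print(density, frequency)
```

Under these definitions the two measures can diverge: a region with a few long repeats has a high density but a low frequency.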


Table 1: Distribution of microsatellites in organellar genomes

Repeat motif  Genome          Rice         Wheat         Maize       Sorghum
                          C      N      C      N      C      N      C      N
Mono          MT         73    930     75    729     79    847    152    722
              CP        156    398    185    353    189    354    234    313
Di            MT         61   1139     61   1007     79   1233    143    980
              CP         86    183    102    179    104    188    122    174
Tri           MT         10    262     18    203     17    276     24    206
              CP         24     18     23     19     30     22     28     19
Tetra         MT          2     40      3     36      2     60      5     34
              CP          1      8      1      7      4      7      3      5
Penta         MT          -     10      1     10      2     79      -      4
              CP          -      -      -      3      -      -      -      -
Hexa          MT          -      1      -      4      -     32      -      1
              CP          -      1      -      -      -      -      1      -
Total         MT        146   2382    158   1989    179   2527    324   1947
              CP        267    608    311    561    327    571    388    511

MT: Mitochondria, CP: Chloroplast, C: Coding, N: Non-coding


The total number of microsatellites in the chloroplast genomes ranged from 872 (wheat) to 899 (sorghum). The rice chloroplast genome had the lowest number of microsatellites in the coding region (267), compared with wheat (311), maize (327) and sorghum (388) (Tab. 1). The density of microsatellites ranged from 34-38 bp/kb in the coding region and 50-53 bp/kb in the non-coding region. About 6.4-6.5 microsatellites were observed per kb of DNA in the coding region, while in the non-coding region the frequency ranged from 4.9-5.5 per kb. Among the four chloroplast genomes studied, ~80% of genes possessed microsatellites (Supplementary Data Tab. 2).
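The coding-region percentages quoted in the text follow directly from the totals in Table 1. The short sketch below recomputes them; the counts are the coding and non-coding totals as read from Table 1, and the variable and function names are illustrative only.

```python
# (coding, non-coding) totals per species, as read from Table 1.
MT_TOTALS = {"rice": (146, 2382), "wheat": (158, 1989),
             "maize": (179, 2527), "sorghum": (324, 1947)}
CP_TOTALS = {"rice": (267, 608), "wheat": (311, 561),
             "maize": (327, 571), "sorghum": (388, 511)}


def coding_percent(totals):
    """Percentage of microsatellites located in the coding region, per species."""
    return {species: round(100 * coding / (coding + noncoding), 1)
            for species, (coding, noncoding) in totals.items()}


def genome_totals(totals):
    """Total number of microsatellites (coding + non-coding), per species."""
    return {species: coding + noncoding
            for species, (coding, noncoding) in totals.items()}


print(coding_percent(MT_TOTALS))
print(coding_percent(CP_TOTALS))
print(genome_totals(MT_TOTALS), genome_totals(CP_TOTALS))
```

The results reproduce the reported ranges: 5.8% (rice) to 14.3% (sorghum) for mitochondria, 30.5% (rice) to 43.2% (sorghum) for chloroplasts, and genome totals of 2147 to 2706 and 872 to 899, respectively.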

Microsatellites accounted for approximately 3.5% of the mitochondrial genomes and 4.5% of the chloroplast genomes. A comparative analysis of the coding and non-coding regions revealed that the mitochondrial genomes had a higher proportion of dinucleotide repeats, while the chloroplast genomes had a higher proportion of mononucleotide repeats (Supplementary Data Fig. 1). Differences in the relative abundance of repeat types among species have been reported earlier [12]. This non-random distribution of repeats may be due to differences in mutability and to bias in the efficiency of the mismatch repair system, which could lead to over-representation of microsatellites in certain genomes [13].


Most frequent repeats

Among the most frequent repeat types (mono-, di- and trinucleotides) in mitochondria, dinucleotides were the most abundant (47-49%), with only 5.1% (rice) to 12.7% (sorghum) of them present in the coding region. The repeat motif AT/TA was the most abundant in the coding region, followed by TC/GA (Supplementary Data Fig. 2). Mononucleotides were the second most abundant repeat type, accounting for ~40% of repeats, most of which were poly(A) or poly(T) tracts. Trinucleotides were the next most abundant, accounting for ~10% of the repeats. The proportion of trinucleotide repeats located in the coding region varied considerably (3.7% for rice and 10.4% for sorghum).

In the chloroplast genomes, mononucleotides were the most abundant repeat type, accounting for 60-63% of microsatellites, with a high frequency of the poly(A) motif. Dinucleotides were the next most abundant (30-32%). The AT/TA repeat motif was predominant in the coding region (Supplementary Data Tab. 3), an observation similar to that reported for liverwort and pea chloroplasts [14]. Sorghum possessed a markedly higher number of dinucleotide repeats in the coding region (122) than rice (86), wheat (102) and maize (104). This might be because the sorghum chloroplast genome has a longer coding region. With respect to trinucleotide repeats, maize had the highest number (52), followed by sorghum (47), rice (42) and wheat (42). A similar trend was observed for trinucleotide repeats in the coding region. The motif TTC was predominant in rice and sorghum, while AAC was predominant in wheat. The motifs TTC and AGA occurred at equal frequency in the maize chloroplast genome (Supplementary Data Fig. 4).

A comparative analysis revealed that poly(A/T) was more abundant than poly(G/C) in both organellar genomes. Among dinucleotide repeats, CG/GC repeats were extremely rare in both organellar genomes, while the AT/TA motif was the most abundant. The higher AT/TA frequencies may be due to the high A/T content of the genomes and the relative ease of strand separation compared with C/G tracts [15]. Among trinucleotide repeats, mitochondria possessed ~50 different types, whereas chloroplasts had ~20. The motif AAG was the most common in the mitochondrial genomes of all the cereals studied except wheat, where it was TTC (Supplementary Data Fig. 5). In the chloroplast genomes, TTC was present in the highest proportion, except in wheat, where AAC was predominant. In contrast to mitochondria, chloroplasts possessed the majority of their trinucleotide repeats in the coding region. Recent studies have shown that certain trinucleotides and hexanucleotides are more abundant in the coding regions of higher eukaryotic genomes [16, 17]. Dinucleotides outnumbered trinucleotides in the coding regions of both organellar genomes studied, which differs from the pattern in nuclear genomes.


Least frequent repeats

Among the least frequent repeat types (tetra-, penta- and hexanucleotides) in the mitochondrial genomes, tetranucleotide repeats were the most numerous (39-62), followed by penta- (4-81) and hexanucleotide repeats (1-32). Maize had the highest number of repeats in all three classes, with markedly higher numbers of penta- (81) and hexanucleotide repeats (32). The coding regions of the mitochondrial genomes possessed 2-5 tetranucleotide repeats, while pentanucleotide repeats in the coding region were present only in maize and wheat. Notably, all the hexanucleotide repeats were present in the non-coding region.

With respect to the chloroplast genomes, maize possessed more tetranucleotide repeats (11) than rice (9), wheat and sorghum (8 each). Of these, maize and sorghum had 4 and 3 tetranucleotide repeats in the coding region, respectively, while rice and wheat each had a single tetranucleotide repeat in the coding region. Among the chloroplast genomes, only wheat possessed pentanucleotide repeats (3), all of which were located in the non-coding region. Interestingly, a single hexanucleotide repeat was present in the non-coding region of rice, (atagaa)3, and in the coding region of sorghum, (attagt)3.

A comparative analysis showed that mitochondria possessed 25-48 different types of tetra-, 4-55 types of penta- and 1-31 types of hexanucleotide repeats. The chloroplast genomes had only 8-11 types of tetranucleotide repeats. Wheat had 3 unique pentanucleotide repeats, while rice and sorghum each had a unique hexanucleotide repeat (Supplementary Data Tab. 3). The proportions of the different classes of least frequent repeats in the mitochondrial and chloroplast genomes are shown in Supplementary Data Figs. 6 and 7. Generally, dinucleotide and trinucleotide repeats tend to be longer than other repeats. In the present study, however, the penta- and hexanucleotide repeats were longer than the other classes of repeats. The lack of longer di- and trinucleotide repeats could possibly be explained by a downward mutation bias and short persistence times [18].


Implications of microsatellites in the genome

The roles of microsatellites in the regulation of gene expression [19, 20] and in the evolution of gene regulation [21] are well documented. Except for mono- and dinucleotide repeats, all classes of repeats were extremely low in number in the organellar genomes. Interestingly, maize had a markedly higher number of penta- and hexanucleotide repeats, which may be due to the larger size of its mitochondrial genome. A similar positive correlation between microsatellite content and genome size was reported earlier [6, 22]. In mitochondria, dinucleotide motifs were repeated up to 8 times, trinucleotide motifs up to 6 times and tetranucleotide motifs up to 4 times. Penta- and hexanucleotide motifs were repeated up to 4 times, except in maize, where they were repeated up to 8 and 7 times, respectively. In the chloroplast genomes, di-, tri- and tetranucleotide motifs were repeated up to 6, 5 and 4 times, respectively, while penta- and hexanucleotide motifs were repeated up to 3 times. The implications of excess numbers of short iterated repeats (<8 units) could be extremely important not only for genomic stability, but also for the evolution of additional genomic features such as codon usage [23].

The microsatellites identified in the present study could be used to develop organellar genome-specific markers for tagging specific traits such as cytoplasmic male sterility and herbicide tolerance. Recently, the development of a molecular marker for distinguishing male sterile lines from their cognate maintainer lines was reported in rice [24]. Some unique repeats in these genomes could be targeted for the development of crop-specific markers (Supplementary Data Tab. 4), which could help in the easy identification of these four crop species.


Understanding the phylogeny of major cereals

Microsatellites identified in this study were classified into Class I, Class II and Class III based on repeat length [25]. About 70 (sorghum) to 182 (maize) mitochondrial microsatellites and 15 (rice) to 25 (wheat) chloroplast microsatellites belonged to Class II (Supplementary Data Tab. 5 and 6). No Class I microsatellites were identified in the chloroplast genomes, while the maize and wheat mitochondrial genomes possessed 28 and 2 Class I microsatellites, respectively. The maximum repeat length was 48 nt, observed in the maize mitochondrial genome, while it was ≤20 nt for the other cereals. The lack of very long microsatellites has been considered as evidence that selection is also involved in maintaining microsatellites within a certain size range [26].
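The length-based classification can be sketched as a small function. The thresholds are not stated in the text; the sketch assumes the commonly used rice convention of Temnykh et al. [8], in which Class I repeats are ≥20 nt and Class II repeats are 12-19 nt, and it further assumes that Class III covers everything shorter than 12 nt. The function name is hypothetical, and the (TTCTA)4 example is an invented motif used only to show the Class I boundary.

```python
def microsatellite_class(motif, repeat_count, class1_min=20, class2_min=12):
    """Classify a perfect repeat by its total length in nucleotides.

    Assumed thresholds: Class I >= 20 nt, Class II 12-19 nt, Class III < 12 nt.
    """
    length = len(motif) * repeat_count
    if length >= class1_min:
        return "Class I"
    if length >= class2_min:
        return "Class II"
    return "Class III"


# (atagaa)3 is the rice chloroplast hexanucleotide repeat mentioned in the text.
print(microsatellite_class("ATAGAA", 3))  # 18 nt
print(microsatellite_class("AT", 8))      # 16 nt
print(microsatellite_class("A", 6))       # 6 nt
print(microsatellite_class("TTCTA", 4))   # 20 nt, invented motif
```

Keeping the thresholds as parameters makes it easy to apply a different convention if the cut-offs used in [25] differ from those assumed here.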

Cross-genome comparisons indicated that some microsatellite loci were highly conserved, while others were unique to a particular species. Conservation of microsatellite loci across species over long evolutionary periods, with the number of repeats never reaching high values, has also been reported [27]. The phylogenetic trees constructed from the microsatellite data of the two organellar genomes were congruent with each other (Fig. 1). Both genomes indicated a similar phylogeny in which rice and sorghum were closer to each other than to maize and wheat, with wheat as the out-group. The phylogenetic relationships of the major cereals determined in this study matched earlier reports based on nuclear genome co-linearity [28] and the analysis of chloroplast genes [29].



Figure 1: Phylogenetic tree based on organellar SSRs.


Through the present study, we have analyzed the microsatellites in organellar genomes of four major cereals viz., rice, wheat, sorghum and maize. Similar studies could not be carried out earlier since the sequence information of organellar genomes of the four major cereals was made publicly available only recently.



Conclusion

The present study is a step towards a better understanding of the distribution of microsatellites in the organellar genomes of major cereals. The study has identified the pattern of distribution of microsatellites in organellar genomes and validated the established syntenic relationships among the cereal genomes based on RFLP analysis [30]. It is interesting to note that the relationships revealed by these studies are identical, even though organellar genomes, unlike the nuclear genome, are inherited maternally. We have also identified a few Class II microsatellites that are likely to be useful as markers. These microsatellites could be used for the development of PCR-based markers for targeting organellar genome-specific traits [24] and for carrying out genetic and phylogenetic studies [5, 31].



Acknowledgements

We sincerely thank Dr. B. C. Viraktamath, Project Director, Directorate of Rice Research, Hyderabad for the facilities and encouragement provided to us for carrying out the study. We also thank Dr. J. S. Bentur for critically reviewing the manuscript.




References


  1. Doebley, J., Durbin, M., Golenberg, E. M., Clegg, M. T. and Ma, D. P. (1990). Evolutionary analysis of the large subunit of carboxylase (rbcL) nucleotide sequence among the grasses (Gramineae). Evolution 44, 1097-1108.

  2. Hilu, K. W. and Alice, L. A. (1999). Evolutionary implications of matK indels in Poaceae. Am. J. Bot. 86, 1735-1741.

  3. Wolfe, K. H., Gouy, M., Yang, Y.-W., Sharp, P. M. and Li, W.-H. (1989). Date of the monocot-dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl. Acad. Sci. USA 86, 6201-6205.

  4. Gaut, B. S., Muse, S. V. and Clegg, M. T. (1993). Relative rates of nucleotide substitution in the chloroplast genome. Mol. Phylogenet. Evol. 2, 89-96.

  5. Nishikawa, T., Vaughan, D. A. and Kadowaki, K. (2005). Phylogenetic analysis of Oryza species, based on simple sequence repeats and their flanking nucleotide sequences from the mitochondrial and chloroplast genomes. Theor. Appl. Genet. 110, 696-705.

  6. Hancock, J. M. (1999). Microsatellites and other simple sequences: genomic context and mutational mechanisms. In: Goldstein, D. B. and Schlötterer, C. (eds), Microsatellites: evolution and applications. Oxford University Press, Oxford, pp. 1-9.

  7. Rajendrakumar, P., Biswal, A. K., Balachandran, S. M., Srinivasarao, K. and Sundaram, R. M. (2007). Simple sequence repeats in organellar genomes of rice: frequency and distribution in genic and intergenic regions. Bioinformatics 23, 1-4.

  8. Temnykh, S., DeClerck, G., Lukashova, A., Lipovich, L., Cartinhour, S. and McCouch, S. R. (2001). Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): frequency, length variation, transposon associations, and genetic marker potential. Genome Res. 11, 1441-1452.

  9. Kalendar, R. (2006). FastPCR, PCR primer design, DNA and protein tools, repeat and own database searches program. http://www.biocenter.helsinki.fi/bi/programs/fastpcr.htm.

  10. Van de Peer, Y. and De Wachter, R. (1994). TREECONW: a software package for the construction and drawing evolutionary trees for the MS Windows environment. Comput. Applic. Biosci. 10, 569-570.

  11. Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783-791.

  12. Tóth, G., Gáspári, Z. and Jurka, J. (2000). Microsatellites in different eukaryotic genomes: survey and analysis. Genome Res. 10, 1967-1981.

  13. Harr, B., Todorova, J. and Schlötterer, C. (2002). Mismatch repair driven mutational bias in D. melanogaster. Mol. Cell 10, 199-205.

  14. Powell, W., Machray, G. C. and Provan, J. (1996). Polymorphism revealed by simple sequence repeats. Trends Plant Sci. 1, 215-222.

  15. Gur-Arie, R., Cohen, C. J., Eitan, Y., Shelef, L., Hallerman, E. M. and Kashi, Y. (2000). Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism. Genome Res. 10, 62-71.

  16. Metzgar, D., Bytof, J. and Wills, C. (2000). Selection against frame-shift mutations limits microsatellite expansion in coding DNA. Genome Res. 10, 72-80.

  17. Subramanian, S., Mishra, R. K. and Singh, L. (2003). Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions. Genome Biol. 4, R13.

  18. Harr, B. and Schlötterer, C. (2000). Long microsatellite alleles in Drosophila melanogaster have a downward mutation bias and short persistence times, which cause their genome-wide under-representation. Genetics 155, 1213-1220.

  19. Künzler, P., Matsuo, K. and Schaffner, W. (1995). Pathological, physiological, and evolutionary aspects of short unstable DNA repeats in the human genome. Biol. Chem. Hoppe Seyler 376, 201-211.

  20. Ayers, N. M., McClung, A. M., Larkin, P. D., Bligh, H. F. J., Jones, C. A. and Park, W. D. (1997). Microsatellites and a single nucleotide polymorphism differentiate apparent amylose classes in an extended pedigree of US rice germplasm. Theor. Appl. Genet. 94, 773-781.

  21. Huang, T.-S., Lee, C.-C., Chang, A.-C., Lin, S., Chao, C.-C., Jou, Y.-S., Chu, Y.-W., Wu, C.-W., Whang-Peng, J. (2003). Shortening of microsatellite deoxy (CA) repeats involved in GL331-induced down-regulation of matrix metalloproteinase-9 gene expression. Biochem. Biophys. Res. Commun. 300, 901-907.

  22. Primmer, C. R., Raudsepp, T., Chowdhary, B. P., Møller, A. P. and Ellegren, H. (1997). Low frequency of microsatellites in the avian genome. Genome Res. 7, 471-482.

  23. Field, D. and Wills, C. (1998). Abundant microsatellite polymorphism in Saccharomyces cerevisiae, and the different distributions of microsatellites in eight prokaryotes and S. cerevisiae, result from strong mutation pressures and a variety of selective forces. Proc. Natl. Acad. Sci. USA 95, 1647-1652.

  24. Rajendrakumar, P., Biswal, A. K., Balachandran, S. M., Ramesha, M. S., Viraktamath, B. C. and Sundaram, R. M. (2007). A mitochondrial repeat specific marker for distinguishing wild abortive type cytoplasmic male sterile rice lines from their cognate isogenic maintainer lines. Crop Sci. 47, 207-211.

  25. McCouch, S. R., Temnykh, S., Lukashova, A., Coburn, J., DeClerck, G., Cartinhour, S., Harrington, S., Thomson, M., Septiningsih, E., Semon, M., Moncada, P. and Li, J. (2001). Microsatellite markers in rice: abundance, diversity and applications. In: Khush, G. S., Brar, D. S. and Hardy, B. (eds), Rice Genetics IV. International Rice Research Institute, Manila, pp. 117-136.

  26. Nauta, M. J. and Weissing, F. J. (1996). Constraints on allele size at microsatellite loci: implications for genetic differentiation. Genetics 143, 1021-1032.

  27. Schlötterer, C., Amos, B. and Tautz, D. (1991). Conservation of polymorphic simple sequence loci in cetacean species. Nature 354, 63-65.

  28. Hulbert, S. H., Richter, T. E., Axtell, J. D. and Bennetzen, J. L. (1990). Genetic mapping and characterization of sorghum and related crops by means of maize DNA probes. Proc. Natl. Acad. Sci. USA 87, 4251-4255.

  29. Clark, L. G., Zhang, W. and Wendel, J. F. (1995). A phylogeny of the grass family (Poaceae) based on ndhF sequence data. Syst. Bot. 20, 436-460.

  30. Gale, M. D. and Devos, K. M. (1998). Comparative genetics in the grasses. Proc. Natl. Acad. Sci. USA 95, 1971-1974.

  31. Flannery, M. L., Mitchell, F. J. G., Coyne, S., Kavanagh, T. A., Burke, J. I., Salamin, N., Dowding, P. and Hodkinson, T. R. (2006). Plastid genome characterisation in Brassica and Brassicaceae using a new set of nine SSRs. Theor. Appl. Genet. 113, 1221-1231.