Microsatellite motifs with moderate GC content are clustered around genes on Arabidopsis thaliana chromosome 2
Atul Grover and Prakash C. Sharma*
University School of Biotechnology, Guru Gobind Singh Indraprastha University, Kashmere Gate, Delhi 110 006, India
Microsatellites, tandem arrays of 1-6 bp motifs, are abundant in almost all eukaryotic genomes. Their distribution is widely accepted to be non-random along the length of the chromosomes. The Arabidopsis thaliana genome is dominated by mononucleotide repeats, with (A)n the most abundant motif. In total, 39 microsatellite loci extend to more than 100 bp in length; 8 of these have no gene in their proximity. (AG)n is the most abundant motif among the longer repeats. The non-random distribution of microsatellites is reflected in the occurrence of microsatellite clusters. In total, 3400 microsatellite clusters were identified in the Arabidopsis genome. Chromosome 2, which is 19.7 Mb long, harbors 550 clusters that accommodate 29% of all the microsatellites present on this chromosome. Further, 409 of the 6239 genes on chromosome 2 are associated with 323 microsatellite clusters. Motifs such as (AGG)n and (ACT)n are preferentially found in clusters that overlap with genes. Among the microsatellite clusters that overlap with genes, 80% do so partially, with the cluster ending beyond the 3'-end of the gene or starting before its 5'-end. Genes with diverse functions are associated with the clusters. However, not all members of a gene family show similar associations.
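As an illustration of the two notions used above (a microsatellite as a tandem array of a 1-6 bp motif, and a cluster that overlaps a gene while extending past one of its ends), the following is a minimal Python sketch. It is not the search or clustering procedure used in this study, which is not described here. The function names, the `min_repeats` threshold, the restriction to perfect repeats, and the half-open coordinate convention are all assumptions made for the example.

```python
import re


def find_microsatellites(seq, min_repeats=3, max_motif=6):
    """Return (start, end, motif, repeat_count) for perfect tandem repeats.

    Assumption: only perfect (uninterrupted) repeats are reported, and
    min_repeats is an illustrative threshold, not a value from the study.
    """
    # The lazy quantifier tries the shortest motif first, so a run such as
    # AAAA is reported as (A)4 rather than (AA)2.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        repeats = len(m.group(0)) // len(motif)
        hits.append((m.start(), m.end(), motif, repeats))
    return hits


def extends_beyond_gene(cluster, gene):
    """True if the cluster overlaps the gene and runs past either gene end.

    Both arguments are hypothetical (start, end) half-open intervals on the
    same chromosome; strand is ignored, so "either end" covers both the
    5'-end and the 3'-end cases.
    """
    c_start, c_end = cluster
    g_start, g_end = gene
    overlaps = c_start < g_end and g_start < c_end
    return overlaps and (c_start < g_start or c_end > g_end)


if __name__ == "__main__":
    demo = "GGC" + "A" * 8 + "TC" + "AG" * 5 + "TTC"
    for hit in find_microsatellites(demo, min_repeats=4):
        print(hit)
    print(extends_beyond_gene((90, 120), (100, 200)))
```

A cluster lying entirely inside a gene, such as `(120, 150)` against `(100, 200)`, returns `False` from `extends_beyond_gene`, which separates it from the partial overlaps described above.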