An in silico mining for simple sequence repeats from expressed sequence tags of zebrafish, medaka, Fundulus, and Xiphophorus
Zhenlin Ju*, Melissa C. Wells, Al Martinez, Leona Hazlewood and Ronald B. Walter
Molecular Biosciences Research Group, Department of Chemistry and Biochemistry, Texas State University, 601 University Drive, San Marcos, TX 78666, USA
Teleost fish genome projects involving model species are resulting in a rapid accumulation of genomic and expressed DNA sequences in public databases. The expressed sequence tags (ESTs) collected in these databases can be mined for both structural and functional genomic analyses. In this study, we analyzed in silico 49,430 unigenes, representing a total of 692,654 ESTs from four model fish, for their potential use in developing simple sequence repeat (SSR), or microsatellite, markers. Bioinformatic mining identified a total of 3,018 EST-derived SSRs (EST-SSRs) in 2,335 SSR-containing ESTs (SSR-ESTs). The frequency of identified SSR-ESTs ranged from 1.5% for Xiphophorus to 7.3% for zebrafish. Dinucleotide repeats were the most abundant motif class, accounting for 47%, 52%, 64%, and 78% of the SSRs identified in medaka, Fundulus, zebrafish, and Xiphophorus, respectively. Simulation analysis suggests that a majority of these EST-SSRs have sufficient flanking sequence for polymerase chain reaction (PCR) primer design. Comparative DNA sequence analyses of SSR-ESTs identified several cross-species SSRs and sequences that may be used as cross-reference genes in comparative studies. For example, the flanking sequences of one SSR, (CTG)n, within the pituitary tumor-transforming gene (PTTG) 1 interacting protein (PTTGIP) showed conservation across the medaka, Fundulus, human, and mouse genomes. This study provides a large body of information on EST-SSRs that can be useful for the development of polymorphic markers, gene mapping, and comparative genome analysis. Functional analysis of these SSR-ESTs may reveal their role in metabolism and gene evolution in these model species.
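To make the mining step concrete, the following is a minimal Python sketch of how perfect SSRs could be located in a unigene sequence and screened for flanking sequence. It is an illustration under stated assumptions, not the pipeline used in this study: the function names (find_ssrs, has_primer_flanks), the minimum repeat counts, and the 20 bp flank length are all hypothetical choices, since the abstract does not state the search criteria.

```python
import re

# Hypothetical minimum repeat counts per motif length (di-, tri-, tetranucleotide).
# These thresholds are assumptions for illustration, not values from the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # One motif of motif_len bases, then at least min_count - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. "AA" or "ATAT"),
            # so that each repeat is reported once, under its shortest motif.
            is_compound = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if is_compound:
                continue
            hits.append((m.start(), m.end(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)


def has_primer_flanks(seq_len, start, end, min_flank=20):
    """True if at least min_flank bases lie on both sides of the SSR.

    min_flank is an assumed value; a real primer design tool applies
    further criteria (melting temperature, GC content, product size).
    """
    return start >= min_flank and seq_len - end >= min_flank


# Toy sequence: a (CTG)6 repeat between two 25 bp flanks.
example = "ACGTA" * 5 + "CTG" * 6 + "TCAGA" * 5
ssrs = find_ssrs(example)
flank_ok = [has_primer_flanks(len(example), s, e) for s, e, _, _ in ssrs]
```

Applied to each unigene in turn, a routine of this kind would yield the per-species counts of SSR-ESTs and motif classes, and the flank check stands in, very roughly, for the question of whether PCR primers can be placed around a repeat.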