In Silico Biology 7, 0026 (2007); ©2007, Bioinformation Systems e.V.
1 School of Biosciences and Biotechnology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
2 Malaysia Genome Institute, UKM-MTDC Smart Technology Centre, 43600 Bangi, Selangor, Malaysia
* Corresponding author
Email: sheila@pkrisc.cc.ukm.my
Edited by H. Michael; received November 02, 2006; revised April 17, 2007; accepted April 25, 2007; published June 16, 2007
Many members of the AraC/XylS family of transcription regulators have been shown to play a critical role in regulating bacterial virulence factors in response to environmental stress. Using a hidden Markov model (HMM) profile built from the alignment of the 99-amino-acid conserved domain sequence of 273 AraC/XylS family transcription regulators, we detected a total of 45 AraC/XylS family transcription regulators in the genome of the Gram-negative pathogen Burkholderia pseudomallei. Further in silico analysis of each detected AraC/XylS family regulator and its neighboring genes allowed us to make first-order predictions of the roles of some of these regulators in controlling important virulence factors, such as those involved in three type III secretion systems and in the biosynthesis of pyochelin, exopolysaccharide (EPS) and phospholipase C. This paper demonstrates an efficient and systematic genome-wide prediction of the AraC/XylS family that can be applied to other protein families.
Key words: AraC/XylS, α-helix-turn-α-helix, hidden Markov model, Burkholderia pseudomallei
Melioidosis is a disease of humans and livestock caused by infection with the Gram-negative, saprophytic soil bacterium Burkholderia pseudomallei. The bacterium is an important cause of community-acquired septicemia, and melioidosis is endemic in Southeast Asia and northern Australia [Dance, 2000]. B. pseudomallei is known to produce many effector molecules with a variety of virulence determinants, including those with the ability to cause tissue necrosis, hemolysis, cytolysis and cell death. The bacterium has been shown to carry genes that produce and secrete both toxic molecules and hydrolytic enzymes, including toxin, hemolysin, lecithinase, lipase, acid phosphatase and protease. Other virulence determinants such as siderophore, adhesin, flagella, lipopolysaccharide, capsule, biofilm [Vorachit et al., 1993] and the type III secretion system [Winstanley, 1999], which are responsible for adaptation, intracellular survival, multiple drug resistance and cell invasion, are considered fundamental prerequisites for host colonization and infection by B. pseudomallei (reviewed by Brett and Woods, 2000).
The AraC/XylS family of transcription regulators is one of the most common families of positive regulators. It was named after its first member, AraC, a regulator of the L-arabinose operon in Escherichia coli [Schleif, 1969]. Members of the family have been categorized into three main regulatory functions: carbon metabolism, stress response and pathogenesis. Members important for virulence, such as AfrR, AggR, CfaD, CsvR, FapR, PerA, MaoB and Rns from Escherichia coli, CafR and LcrF from Yersinia pestis, ExsA and PchR from Pseudomonas aeruginosa, HrpB from Ralstonia solanacearum, InvF from Salmonella typhimurium, MxiE from Shigella flexneri, TcpN from Vibrio cholerae, UreR from Proteus mirabilis and VirF from Shigella and Yersinia pestis, have been a focus of intense attention (reviewed by Gallegos et al., 1997). Currently, 280 members of the family, identified from the Swiss-Prot and TrEMBL databases by the PROSITE profile matrix PS01124, are available for public access at http://www.AraC-XylS.org [Gallegos et al., 1997; Tobes and Ramos, 2002]. Because they are absent from archaebacteria and eukaryotes [Olsen et al., 1994], members of this family could provide considerable opportunities for the development of therapeutic and preventive strategies with few side effects.
To predict the presence of AraC/XylS family regulators within the B. pseudomallei genome and to survey their roles in B. pseudomallei virulence, we describe here a genome-scale prediction of 45 AraC/XylS family transcription factor genes using a hidden Markov model (HMM) profile, followed by in silico functional analysis.
The complete genome sequence of B. pseudomallei strain K96243, made available by the Wellcome Trust Sanger Institute, was obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/bps/, and all putative genes within the genome were predicted using GeneMark.hmm for Prokaryotes (Version 2.4), accessible via http://opal.biology.gatech.edu/GeneMark/gmhmm2_prok.cgi [Lukashin and Borodovsky, 1998]. Predicted genes were visualized with Artemis 5.0 [Rutherford et al., 2000]. Conceptually translated amino acid sequences of all genes were compiled in a FASTA-format file as a B. pseudomallei putative virtual proteomic database.
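The conceptual translation and FASTA compilation step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy genome strings and the record name are hypothetical, coordinates are assumed to be 1-based and inclusive with the stop codon included, and codons are translated literally (alternative bacterial start codons such as GTG are not converted to methionine).

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); amino acid
# assignments are the same in the bacterial translation table 11.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_cds(genome, start, end, strand):
    """Translate one predicted CDS (1-based inclusive coordinates)."""
    dna = genome[start - 1:end].upper()
    if strand == "reverse":
        # Reverse-complement genes predicted on the opposite strand.
        dna = dna.translate(COMPLEMENT)[::-1]
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    protein = "".join(CODON_TABLE.get(codon, "X") for codon in codons)
    return protein.rstrip("*")  # drop the terminal stop codon


def fasta_record(name, protein, width=60):
    """Format one protein as a FASTA record with wrapped sequence lines."""
    lines = [protein[i:i + width] for i in range(0, len(protein), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy example: the same 9-bp CDS placed on the forward and reverse strands.
print(fasta_record("demo_forward", translate_cds("GGATGAAATGACC", 3, 11, "forward")))
print(fasta_record("demo_reverse", translate_cds("GGTCATTTCATCC", 3, 11, "reverse")))
```

Concatenating one such record per predicted gene would give a single FASTA file that can serve as the search database.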
Using the HMMER 2.3.2 package [Eddy, 1998], a profile HMM representing the helix-turn-helix (HTH) AraC/XylS domain was reconstructed from the 99-residue conserved peptide sequence alignment of the AraC/XylS family members downloaded from the AraC/XylS database at http://www.arac-xyls.org/xa-pro-fas.txt [Tobes and Ramos, 2002]. The resulting HTH_AraC profile HMM was then used to search for new members of the family within the B. pseudomallei putative virtual proteomic database. Clustal_X [Thompson et al., 1997] was used to align the newly identified members displaying the AraC/XylS conserved domain.
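As a rough sketch of how the build-and-search step might be scripted, the snippet below assembles the HMMER 2.x command lines. The programs `hmmbuild`, `hmmcalibrate` and `hmmsearch` are part of HMMER 2.3.2, but the file names are hypothetical, the calibration step is an assumption (the text does not state whether the profile was calibrated), and the input alignment is assumed to be in a format that `hmmbuild` accepts.

```python
import shutil
import subprocess


def hmmer2_commands(alignment, hmm_file, proteome_fasta):
    """Return the HMMER 2.x command lines for build, calibrate and search."""
    return [
        ["hmmbuild", hmm_file, alignment],        # build profile from alignment
        ["hmmcalibrate", hmm_file],               # assumed step: fit E-value statistics
        ["hmmsearch", hmm_file, proteome_fasta],  # score every predicted protein
    ]


def run_pipeline(commands):
    """Run each command in order and collect its standard output."""
    outputs = []
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " not found on PATH")
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        outputs.append(result.stdout)
    return outputs


# Hypothetical file names, used for illustration only.
for cmd in hmmer2_commands("arac_xyls_domain.aln", "HTH_AraC.hmm", "bps_proteome.fasta"):
    print(" ".join(cmd))
```

Only the command lists are printed here; `run_pipeline` would execute them on a machine where the HMMER 2 binaries are installed.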
To glean the associated regulatory target, the probable functions of the flanking genes clustered together with each detected AraC/XylS family member were also investigated by performing PSI-BLAST searches against the non-redundant (NR) database at GenBank® [Altschul et al., 1997]. In addition, the recently described operon prediction software developed by Alm et al., 2005 (http://www.microbesonline.org) was used to predict whether these AraC/XylS family members form an operon with their flanking genes.
In silico data mining and annotation
GeneMark.hmm for Prokaryotes (Version 2.4) predicted that chromosome 1 and chromosome 2 of B. pseudomallei K96243 contain 3664 and 2817 putative genes, respectively. These numbers of predicted coding sequences (CDSs) differ from the manually curated annotation reported by Holden et al., 2004, in which chromosome 1 and chromosome 2 were designated to encode 3,460 and 2,395 coding sequences, respectively. The objective of our study was to establish an efficient and systematic method for genome-wide prediction of any type of protein family. For this exercise, we therefore chose to use the GeneMark.hmm annotation rather than the manually curated prediction. Accordingly, all statements in this work regarding the genome of B. pseudomallei K96243 refer to the total of 6481 genes (3664 + 2817) predicted by GeneMark.hmm.
Using the HMM profile, we identified 45 AraC/XylS family members throughout the genome of B. pseudomallei K96243. The scores assigned by the profile to the 45 top-ranked putative genes ranged from 129.4 down to 70.3, with small differences between consecutively ranked genes. In contrast, a gap of 42.3 points separated the 45th-ranked putative gene (profile hit score 70.3) from the 46th (profile hit score 28.0). Manual PSI-BLAST curation showed that all 45 identified AraC/XylS family members contained the 99-amino-acid conserved sequence stretch constituting the HTH DNA-binding domain, whereas none of the putative genes ranked 46th or lower did. We therefore propose 70.3 as the threshold score for identifying new AraC/XylS members in B. pseudomallei when using this profile.
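The reasoning behind this threshold can be illustrated with a short sketch that ranks the hit scores and locates the largest drop between consecutive ranks. The 45 scores are those listed in Table 1 and 28.0 is the 46th-ranked score quoted above; the function name is hypothetical, and choosing the cut-off at the largest gap is a simplification of the manual curation described in the text.

```python
# Profile hit scores of BPtf0001..BPtf0045 as listed in Table 1,
# followed by the 46th-ranked score (28.0) quoted in the text.
SCORES = [
    89.3, 103.7, 99.9, 105.8, 92.5, 96.1, 97.2, 84.1, 93.6, 91.5,
    91.4, 129.4, 83.2, 86.9, 105.1, 108.8, 82.0, 98.8, 113.4, 93.2,
    83.2, 109.4, 102.1, 107.1, 75.5, 85.5, 112.8, 92.2, 89.4, 78.4,
    116.0, 102.1, 85.8, 105.8, 76.7, 105.7, 106.5, 84.0, 84.7, 101.2,
    70.3, 107.6, 89.1, 94.5, 81.1,
    28.0,
]


def largest_gap_threshold(scores):
    """Return (threshold, gap, n_members) at the largest drop in ranked scores."""
    ranked = sorted(scores, reverse=True)
    # Pair each gap with the index of the higher-scoring hit.
    gaps = [(ranked[i] - ranked[i + 1], i) for i in range(len(ranked) - 1)]
    gap, i = max(gaps)
    return ranked[i], round(gap, 1), i + 1


threshold, gap, n_members = largest_gap_threshold(SCORES)
print(f"threshold={threshold}, gap={gap}, members above gap={n_members}")
```

On these values the largest drop falls between the 45th and 46th hits, which matches the cut-off proposed in the text; for other genomes or profiles the gap may be less clear-cut, so manual inspection of borderline hits would still be advisable.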
Comparison with the Burkholderia pseudomallei GeneDB hosted by the Sanger Institute shows that, except for the Ada gene, which codes for a probable O-6-methylguanine-DNA-alkyltransferase (profile hit score 89.3), the results of this study are consistent with the previous finding of 44 putative AraC/XylS transcription regulators [Holden et al., 2004]. Since the Ada protein has previously been characterized as a bifunctional protein involved in the repair of alkylated DNA as well as a positive regulator of its own synthesis [Lemotte and Walker, 1985; Nakabeppu and Sekiguchi, 1986], we propose to include Ada as a member of the AraC/XylS family, a member that was earlier overlooked by Holden et al., 2004, on technical nomenclature grounds.
Table 1 summarizes the in silico prediction of each AraC/XylS family member and its proposed regulatory target. Regulatory targets for a few members important for virulence, such as those responsible for regulating the type III secretion systems [Warawa and Woods, 2005], pyochelin biosynthesis [Serino et al., 1997], exopolysaccharide (EPS) biosynthesis and/or export [DeShazer et al., 2001] and phospholipase C biosynthesis and/or export [Korbsrisate et al., 1999], have been proposed based on the similarity of their flanking genes to previously described gene cluster organizations.
Table 1: Summary of 45 B. pseudomallei AraC/XylS family transcription regulators.
| CDS name | B. pseudomallei K96243 genome position | Gene size (bp) | Theoretical molecular weight (Da) | G+C content (%) | Profile hit score | AraC/XylS domain position | Chromosome | Probable regulatory target |
| BPtf0001 | 110500:111594 forward | 1095 | 39277 | 70.04 | 89.3 | 85..182 | 1 | Alkylated DNA repair mechanism |
| BPtf0002 | 199931:200875 forward | 945 | 34150 | 70.05 | 103.7 | 201..299 | 1 | Unknown |
| BPtf0003 | 952965:953999 reverse | 1035 | 37615 | 69.27 | 99.9 | 230..328 | 1 | Unknown |
| BPtf0004 | 1441141:1442184 forward | 1011 | 38892 | 64.78 | 105.8 | 250..347 | 1 | Unknown |
| BPtf0005 | 1573837:1574889 forward | 1023 | 38615 | 70.28 | 92.5 | 252..349 | 1 | Unknown |
| BPtf0006 | 2037625:2038425 forward | 801 | 28032 | 71.91 | 96.1 | 169..266 | 1 | Transmembrane protein |
| BPtf0007 | 2133238:2134254 reverse | 990 | 37740 | 68.08 | 97.2 | 193..291 | 1 | ABC-type transport system |
| BPtf0008 | 2634115:2634978 forward | 864 | 31232 | 72.10 | 84.1 | 169..266 | 1 | Unknown |
| BPtf0009 | 3269922:3270869 forward | 948 | 35335 | 71.09 | 93.6 | 213..311 | 1 | Unknown |
| BPtf0010 | 3320835:3321854 reverse | 1020 | 36306 | 66.86 | 91.5 | 217..315 | 1 | Exopolysaccharide (EPS) biosynthesis and/or export |
| BPtf0011 | 4763900:4764865 reverse | 966 | 34987 | 71.22 | 91.4 | 219..317 | 2 | Unknown |
| BPtf0012 | 2888093:2889064 reverse | 1002 | 36137 | 68.56 | 129.4 | 200..298 | 1 | ABC-type histidine transport system |
| BPtf0013 | 3477697:3478599 forward | 888 | 31524 | 70.49 | 83.2 | 180..277 | 1 | Unknown |
| BPtf0014 | 4327998:4328435 forward | 393 | 16051 | 71.24 | 86.9 | 41..139 | 2 | Dehydrogenase |
| BPtf0015 | 4542447:4543163 forward | 717 | 26839 | 67.08 | 105.1 | 134..231 | 2 | Unknown |
| BPtf0016 | 4672051:4672977 forward | 927 | 32749 | 71.43 | 108.8 | 207..305 | 2 | Unknown |
| BPtf0017 | 3996544:3997623 reverse | 1029 | 39257 | 73.37 | 82.0 | 256..355 | 1 | Unknown |
| BPtf0018 | 4673036:4673938 reverse | 903 | 32877 | 70.43 | 98.8 | 198..297 | 2 | Unknown |
| BPtf0019 | 49289:50248 forward | 960 | 35828 | 66.35 | 113.4 | 214..315 | 1 | Unknown |
| BPtf0020 | 66187:67281 reverse | 1095 | 38339 | 73.33 | 93.2 | 231..329 | 1 | Unknown |
| BPtf0021 | 4149838:4150344 forward | 507 | 19214 | 69.82 | 83.2 | 70..167 | 2 | Non-hemolytic phospholipase C precursor |
| BPtf0022 | 4820849:4821874 reverse | 1110 | 37571 | 67.02 | 109.4 | 220..318 | 2 | Unknown |
| BPtf0023 | 4846261:4847205 reverse | 945 | 35296 | 67.08 | 102.1 | 214..312 | 2 | Unknown |
| BPtf0024 | 4869379:4870320 forward | 900 | 33769 | 68.88 | 107.1 | 202..299 | 2 | Pyochelin biosynthesis and/or export |
| BPtf0025 | 4914887:4916014 reverse | 1128 | 41590 | 69.94 | 75.5 | 259..357 | 2 | Membrane proteins |
| BPtf0026 | 4954677:4955633 forward | 957 | 34896 | 72.20 | 85.5 | 215..314 | 2 | Unknown |
| BPtf0027 | 4996197:4997171 forward | 975 | 35756 | 70.15 | 112.8 | 217..315 | 2 | Unknown |
| BPtf0028 | 5014042:5014884 reverse | 843 | 30405 | 72.95 | 92.2 | 160..257 | 2 | Unknown |
| BPtf0029 | 5239800:5240819 forward | 1020 | 38121 | 69.90 | 89.4 | 237..338 | 2 | Amino acid permease |
| BPtf0030 | 5388866:5390425 reverse | 1560 | 58057 | 67.94 | 78.4 | 380..483 | 2 | Unknown |
| BPtf0031 | 5468233:5469279 reverse | 1047 | 38036 | 71.69 | 116.0 | 198..297 | 2 | Unknown |
| BPtf0032 | 5744541:5745497 forward | 957 | 35694 | 67.71 | 102.1 | 184..282 | 2 | ABC-type sugar transport system and metabolism |
| BPtf0033 | 5885019:5885891 forward | 804 | 31720 | 68.15 | 85.8 | 180..277 | 2 | Unknown |
| BPtf0034 | 5950579:5951745 forward | 1167 | 41484 | 71.20 | 105.8 | 237..335 | 2 | Unknown |
| BPtf0035 | 5980082:5981533 reverse | 1443 | 54742 | 61.05 | 76.7 | 380..483 | 2 | Type III secretion system 1 |
| BPtf0036 | 6012776:6013774 reverse | 999 | 36440 | 70.87 | 105.7 | 231..329 | 2 | ABC-type proline/glycine betaine transport system |
| BPtf0037 | 6146667:6147728 reverse | 993 | 38910 | 70.79 | 106.5 | 247..347 | 2 | Unknown |
| BPtf0038 | 6175107:6175865 reverse | 666 | 27861 | 66.51 | 84.0 | 153..251 | 2 | Type III secretion system 3 |
| BPtf0039 | 6261115:6262560 reverse | 1446 | 52644 | 72.95 | 84.7 | 379..481 | 2 | Type III secretion system 2 |
| BPtf0040 | 6427896:6428870 forward | 975 | 35876 | 73.64 | 101.2 | 187..285 | 2 | Unknown |
| BPtf0041 | 6536877:6537827 forward | 777 | 34736 | 65.12 | 70.3 | 207..304 | 2 | Unknown |
| BPtf0042 | 6556300:6557067 forward | 768 | 27758 | 72.00 | 107.6 | 155..253 | 2 | Exopolysaccharide (EPS) biosynthesis and/or export |
| BPtf0043 | 6606291:6607322 forward | 966 | 36614 | 73.60 | 89.1 | 237..335 | 2 | Unknown |
| BPtf0044 | 6639284:6640309 forward | 966 | 37470 | 72.36 | 94.5 | 237..337 | 2 | Unknown |
| BPtf0045 | 6650605:6651657 forward | 1053 | 39297 | 72.93 | 81.1 | 247..344 | 2 | Unknown |
In addition, based on the MicrobesOnline Operon Prediction results, a number of identified members are shown to constitute individual operons with membrane proteins, transmembrane proteins, a dehydrogenase, amino acid permease and various ABC-type transport systems including those that are capable of transporting sugars, amino acids and alkaloids. The function and/or regulatory target of the other members remain unclear due to the limitation of current functional genomic understanding. These include members that clustered with hypothetical proteins or those that do not cluster in a typical operon structure with the adjacent genes.
Figs. 1a and 1b display the locations of the 45 identified AraC/XylS members on chromosome 1 and chromosome 2 of B. pseudomallei, respectively. One-third (15 members) are located on chromosome 1, while the remaining 67% (30 members) are located on chromosome 2. Chromosome 1 contains a higher proportion of CDSs involved in core functions, such as macromolecule biosynthesis, amino acid metabolism, cofactor and carrier synthesis, nucleotide and protein biosynthesis, chemotaxis, and mobility; chromosome 2 contains a greater proportion of CDSs encoding accessory functions: adaptation to atypical conditions, osmotic protection and iron acquisition, secondary metabolism, regulation, and laterally acquired DNA [Holden et al., 2004]. The higher proportion (67%) of the AraC/XylS members on chromosome 2 hints at their active involvement in stress adaptation and virulence.
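The chromosomal distribution can be reproduced with a few lines of arithmetic. The member counts and the GeneMark.hmm gene counts per chromosome come from the text; the density per 1000 predicted genes is an illustrative normalization that is not reported in the source, and the function name is hypothetical.

```python
# Counts taken from the text. The per-1000-genes density is an illustrative
# normalization added here; it is not a figure reported by the authors.
MEMBERS = {"chromosome 1": 15, "chromosome 2": 30}
PREDICTED_GENES = {"chromosome 1": 3664, "chromosome 2": 2817}


def distribution(members, genes):
    """Share of family members and density per 1000 genes for each chromosome."""
    total = sum(members.values())
    summary = {}
    for chrom, n in members.items():
        summary[chrom] = {
            "share_percent": 100.0 * n / total,
            "per_1000_genes": 1000.0 * n / genes[chrom],
        }
    return summary


for chrom, stats in distribution(MEMBERS, PREDICTED_GENES).items():
    print(f"{chrom}: {stats['share_percent']:.0f}% of members, "
          f"{stats['per_1000_genes']:.1f} per 1000 predicted genes")
```

Normalizing by the number of predicted genes suggests that the enrichment on chromosome 2 is not simply a consequence of chromosome size, since chromosome 2 carries fewer predicted genes overall.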
Fig. 2 shows the Clustal_X multiple alignment of the 45 newly identified AraC/XylS members in B. pseudomallei, cropped to the 99-amino-acid conserved domain sequence. The non-conserved N-terminal and C-terminal sequences have been trimmed, leaving alignment positions 840-940, to maintain the focus on the highly conserved 99-amino-acid stretch constituting the HTH domain involved in DNA binding and stimulation of transcription. This conserved domain is believed to consist of two α-helix-turn-α-helix motifs [Wintjens and Rooman, 1996]. The first HTH motif (846-872) has lower sequence conservation than the second HTH motif (901-928), as demonstrated for AraC [Niland et al., 1996]. The second HTH motif contains an extra amino acid in the turn with respect to canonical HTH DNA-binding motifs, but the biochemical role of this residue is unknown. It has been suggested that the higher variation in the first HTH motif might be crucial for the recognition of specific target DNA sequences at the cognate promoters by different regulators, and that conservation of the second HTH motif may thus represent a function common to all members of the family, e.g., contact with the transcriptional machinery [Gallegos et al., 1997]. With the exception of BPtf0001, the HTH DNA-binding domain of BPtf0002 - BPtf0045 is located towards the C-terminus, in the third quarter of each encoded protein.
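The relative position of the domain within each protein can be checked from the coordinates in Table 1. The sketch below uses two rows from the table (BPtf0001 and BPtf0002); it assumes that genome positions are inclusive and include the stop codon, and the function names are hypothetical.

```python
def protein_length(start, end):
    """Protein length in residues from inclusive CDS coordinates (stop codon included)."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return span // 3 - 1  # subtract the stop codon


def domain_midpoint_fraction(start, end, dom_start, dom_end):
    """Midpoint of the domain as a fraction of protein length (0 = N-term, 1 = C-term)."""
    return (dom_start + dom_end) / 2 / protein_length(start, end)


# (genome start, genome end, domain start, domain end) from Table 1.
ROWS = {
    "BPtf0001": (110500, 111594, 85, 182),
    "BPtf0002": (199931, 200875, 201, 299),
}

for name, row in ROWS.items():
    fraction = domain_midpoint_fraction(*row)
    side = "C-terminal half" if fraction > 0.5 else "N-terminal half"
    print(f"{name}: domain midpoint at {fraction:.2f} of the protein ({side})")
```

Under these assumptions the domain of BPtf0001 falls in the N-terminal half of the protein, whereas that of BPtf0002 lies close to the C-terminus, in line with the exception noted in the text.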
In conclusion, this study has generated a hidden Markov model (HMM) profile specific for the AraC/XylS family of transcription regulators and successfully identified 45 highly conserved AraC/XylS family transcription regulator genes within the B. pseudomallei K96243 genome. The approach provides an efficient and systematic scheme for identifying members of any protein family within a bacterial genome by utilizing sequence data from characterized protein families. Putative AraC/XylS transcription regulators of virulence determinants such as type III secretion systems, pyochelin biosynthesis, exopolysaccharide (EPS) biosynthesis and phospholipase C could potentially be exploited in the development of therapeutic and preventive strategies for melioidosis.
We acknowledge The Wellcome Trust Sanger Institute for providing the B. pseudomallei K96243 complete genome sequence. This project was funded by the Ministry of Science, Technology and Innovation Malaysia under the IRPA grant 09-02-02-002 (BTK/TD/003) awarded to Rahmah Mohamed. Bioinformatics facilities were generously provided by the National Bioinformatics and Biotechnology Network (NBBnet), Ministry of Science, Technology and Innovation Malaysia.