| In Silico Biology 6, 0005 (2006); ©2006, Bioinformation Systems e.V. |
1 Department of Biotechnology, Bengal College of Engineering & Technology, Durgapur - 713 212, India
2 Indian Association for the Cultivation of Science, Kolkata - 700 032, India
* Corresponding author
Email: bdebashis@excite.com
Phone: +91-33-2473 4971 (Extn:106)
Fax: +91-33-2483 6561
Edited by E. Wingender; received August 31, 2005; revised October 27 & December 29, 2005; accepted December 29, 2005; published January 03, 2006
The availability of pathogen genome sequences has provided a tremendous amount of information that can be useful in identifying drug and vaccine targets. One recently adopted strategy is the subtractive genomics approach, in which the subtraction dataset between the host and pathogen genomes yields a set of genes that are likely to be essential to the pathogen but absent in the host. This approach has recently been used successfully to identify essential genes in Pseudomonas aeruginosa. We have used the same methodology to analyse the whole genome sequence of the human gastric pathogen Helicobacter pylori. Our analysis revealed that out of the 1590 coding sequences of the pathogen, 40 represent essential genes that have no human homolog. We further analysed these 40 genes against protein sequence databases and list 10 genes whose products are possibly exposed on the pathogen surface. The preliminary work reported here identifies a small subset of the Helicobacter proteome that might be investigated further for identifying potential drug and vaccine targets in this pathogen.
Keywords: Helicobacter, comparative microbial genomics, subtractive genomics, novel drug targets, putative vaccine targets, surface proteins
The availability of genome-scale sequence data for more than 60 microbes in the past decade [De Groot, 2002] and the completion of the human genome project have revolutionised the field of drug discovery against threatening human pathogens [Miesel et al., 2003]. The strategies for drug design and development are progressively shifting from the genetic approach to the genomic approach [Galperin and Koonin, 1999]. Novel drug targets are required in order to design new defences against antibiotic-resistant pathogens. Comparative genomics and bioinformatics provide new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of their related biological processes in bacterial pathogens and their hosts. In general, a target should provide adequate selectivity, yielding a drug that is specific or highly selective against the pathogen with respect to the human host. Moreover, the target should be essential for growth and viability of the pathogen, at least under the conditions of infection.
The search for potential drug targets has increasingly relied on genomic approaches. The entire approach is built on the assumption that the potential target must play an essential role in the pathogen's survival and constitute a critical component in its metabolic pathway. At the same time, this target should not have any well-conserved homolog in the human host. This would preclude possibilities of unacceptable cross-reactivity that might prove detrimental to the host. The above approach to target identification is essentially subtractive because we use a subtraction dataset while comparing the two genomes under consideration. The focus is on the complement of the genome of the pathogen that is essential for it but is not present in human. Multiple approaches to locate essential genes in a given organism exist, some of which focus on the concept that essential genes tend to be evolutionarily conserved over species [Itaya, 1995; Tatusov et al., 1997; Koonin et al., 1998; Kobayashi et al., 2003].
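The subtraction described above can be pictured as a set difference. The following is a minimal sketch, not the authors' implementation; the function name and the gene identifiers are hypothetical and serve only to illustrate the idea.

```python
def subtract_host(essential_in_pathogen, with_host_homolog):
    """Return the pathogen genes that are essential and lack a host homolog.

    Both arguments are iterables of gene identifiers. How a gene is judged
    'essential' or 'homologous' is decided upstream and is not modelled here.
    """
    return set(essential_in_pathogen) - set(with_host_homolog)


# Hypothetical identifiers, for illustration only.
essential = {"geneA", "geneB", "geneC", "geneD"}
host_homologs = {"geneB", "geneD", "geneX"}

candidates = subtract_host(essential, host_homologs)
print(sorted(candidates))  # ['geneA', 'geneC']
```

Identifiers that appear only in the host-homolog set (here `geneX`) have no effect, since the difference is taken relative to the pathogen's essential genes.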
Subtractive genomics has been used successfully to locate novel drug targets in Pseudomonas aeruginosa [Sakharkar et al., 2004]. That work has been effectively complemented by the compilation of the Database of Essential Genes (DEG) for a number of pathogenic microorganisms [Zhang et al., 2004]. Further, an essential gene analysis for Helicobacter pylori has recently been carried out [Salama et al., 2004].
The present work makes use of the DEG and the subtractive genomics approach to analyse the completed genome of Helicobacter pylori to look for potential surface epitopes that might be used to design drugs and vaccines.
Helicobacter pylori is one of the most common bacterial pathogens in humans, and its seropositivity increases with age and low socio-economic status. As such, the pathogen is of serious concern for developing countries. The mechanism by which H. pylori is acquired and its route of transmission are unclear. The organism causes chronic persistent and atrophic gastritis in adults and children, which often culminates in the development of gastric and duodenal ulcers. Studies indicate that infected individuals have a 2- to 6-fold increased risk of developing gastric cancer and mucosa-associated lymphoid tissue lymphoma compared to their uninfected counterparts.
The complete genome of H. pylori has been sequenced. Strain 26695 has a circular genome of 1,667,867 base pairs with approximately 1,590 predicted coding sequences. Sequence analysis indicates that the bacterium has well-developed systems for motility, iron scavenging, and DNA restriction and modification. Most putative adhesion proteins, lipoproteins and outer membrane proteins have been identified, indicating a rich surface topography and a potentially complex mechanism of host-pathogen interaction [Tomb et al., 1997].
The completed genome and protein table for Helicobacter pylori strain 26695, which was sequenced at The Institute for Genomic Research (TIGR), were downloaded from the NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/). The Database of Essential Genes was accessed from its location (http://tubic.tju.edu.cu/deg1). The sequence alignments were carried out with standard BLASTX.
From the complete genome sequence data, the genes coding for proteins longer than 100 amino acids were selected. This was on the assumption that proteins shorter than 100 amino acids were unlikely to be both essential and unique to the organism. The selected genes were subjected to BLASTX against the DEG. A random expectation value (E-value) cut-off of 10^-100 and a minimum bit-score cut-off of 100 were used to screen for genes that appeared to represent essential genes. These genes were listed along with the names of their encoded products. The screened genes, which are possibly the essential genes of H. pylori, were then subjected to BLASTX against the human genome on the NCBI server. The homologs were excluded and a list of non-homologs was compiled. The protein products corresponding to the final selected genes were further analysed with the Swiss-Prot Protein Database (http://us.expasy.org/sprot) to compile the final list of proteins presumably located on the surface.
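The screening steps in this paragraph can be sketched as three successive filters. This is an illustrative sketch under stated assumptions, not the procedure as actually run: the `Hit` record, the function names and the example identifiers are hypothetical, BLAST results are assumed to have been parsed already, and the criterion used to call a human homolog is not given in the text, so the set of genes with human homologs is simply passed in.

```python
from dataclasses import dataclass

MIN_LENGTH_AA = 100   # products must be longer than this (from the text)
MAX_EVALUE = 1e-100   # E-value cut-off (from the text)
MIN_BITSCORE = 100.0  # minimum bit score (from the text)


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit against the DEG (hypothetical record type)."""
    query: str
    evalue: float
    bitscore: float


def long_enough(lengths):
    """Genes whose protein product exceeds the length threshold."""
    return {gene for gene, n in lengths.items() if n > MIN_LENGTH_AA}


def passing_queries(hits):
    """Query genes with at least one hit meeting both cut-offs."""
    return {h.query for h in hits
            if h.evalue < MAX_EVALUE and h.bitscore >= MIN_BITSCORE}


def screen(lengths, deg_hits, human_homologs):
    """Length filter, then DEG filter, then removal of human homologs."""
    essential = long_enough(lengths) & passing_queries(deg_hits)
    return essential - set(human_homologs)


# Hypothetical data, for illustration only.
lengths = {"g1": 250, "g2": 80, "g3": 400, "g4": 150}
deg_hits = [
    Hit("g1", 1e-120, 400.0),
    Hit("g2", 1e-150, 500.0),  # strong hit, but the product is too short
    Hit("g3", 1e-20, 90.0),    # fails both cut-offs
    Hit("g4", 0.0, 800.0),
]
print(sorted(screen(lengths, deg_hits, {"g4"})))  # ['g1']
```

Keeping each filter as a separate function makes it easy to report the number of genes surviving each stage.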
The results obtained by the approach described above are summarised in Table 1. The objective of the work was to locate those essential genes of H. pylori that play important roles in the normal functioning of the bacterium within the host and to shortlist them with a view to drug targeting. The symptoms of Helicobacter infection are usually disguised by those of gastritis and acidity, and hence there is a high risk of wrong diagnosis and medication [Harris et al., 2001]. This may lead to unnecessary development of resistance to the prescribed drug regimens. Moreover, to date there is no specific drug to be administered for H. pylori infection. Identification of non-human homologs among the essential genes of H. pylori, with subsequent screening of the proteome to find the corresponding protein products, is likely to lead to the development of drugs that interact specifically with the pathogen. The non-human homologs among the surface proteins would represent ideal vaccine targets.
| Table 1: | Classification of the Genes in Helicobacter pylori |
| Total number of genes | 1590 |
| Genes whose products are > 100 a.a. | 1395 |
| Essential genes [cut-off E-value < 10^-100] | 178 |
| Essential genes having no human homologs | 40 |
| Membrane associated non-human homologs of essential genes | 10 |
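As a reading aid for Table 1, the fraction of genes retained at each step can be computed directly from the tabulated counts. The snippet below is a small sketch that uses only the numbers in Table 1; the variable and function names are ours.

```python
# Counts copied from Table 1, in the order the filters were applied.
FUNNEL = [
    ("Total genes", 1590),
    ("Products > 100 a.a.", 1395),
    ("Essential genes", 178),
    ("Essential, no human homolog", 40),
    ("Membrane associated", 10),
]


def retention(steps):
    """Return (label, fraction retained from the previous step) pairs."""
    return [(label, count / prev_count)
            for (_, prev_count), (label, count) in zip(steps, steps[1:])]


for label, fraction in retention(FUNNEL):
    print(f"{label}: {fraction:.1%} of previous step")
```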
Our study has identified 178 essential genes in H. pylori. Interestingly, 37 of these 178 genes were not listed in the DEG, possibly because of the random nature of the insertion mutagenesis that forms the basis of the DEG compilation. However, these genes showed very high identity scores with genes of other organisms in the DEG. Given the notion that essential genes are highly conserved across species, we have included these 37 genes as 'essential' in our work.
Forty of the essential genes were without human homologs. A search of these 40 gene products against the TIGR database for paralogs within H. pylori returned no hits at a cut-off of 60% identity. Thus, these 40 genes may be regarded as unique. Ten of these genes coded for protein products that were membrane bound and located on the surface. They were found to represent either integral membrane proteins or outer membrane proteins that were linked to the membrane through some other molecule (summarised in Table 2). Of the 37 genes that were not listed in the DEG as essential for H. pylori, the products of 26 were enzymes, 6 were regulatory factors, 2 were signalling proteins and 3 were hypothetical proteins. Out of this entire set, only two, namely a GTP binding membrane protein (lepA) and a conserved hypothetical integral membrane protein, were localised in the membrane. Most of these 37 had human homologs and might therefore be thought to have essentially housekeeping functions. Only six of the 37 gene products did not have human homologs, of which four were enzymes and two were intracellular regulatory factors. None of these six was localised in the membrane.
| Table 2: | List of the possible membrane proteins of Helicobacter pylori identified by subtraction |
| Gene product | Swiss-Prot Accession No. | Subcellular location |
| Rare lipoprotein A (rlpA) | O26091 | Attached to membrane through lipid anchor |
| Iron (III) dicitrate transport protein (fecA) | O25487 | Outer membrane |
| Toxin-like outer membrane protein | O25331 | Outer membrane |
| Iron (III) dicitrate transport protein (fecA) | O25395 | Outer membrane |
| Iron (III) dicitrate transport protein (fecA) | O25950 | Outer membrane |
| Dipeptide ABC transporter, periplasmic dipeptide binding protein (dppA) | O25069 | Periplasmic membrane |
| Na+/H+ antiporter (nhaA) | O26076 | Integral to membrane |
| Dipeptide ABC transporter permease protein (dppC) | O25071 | Integral membrane protein |
| Signal-transducing protein histidine kinase | O24971 | Membrane |
| Cell-division membrane protein (ftsX) | O25445 | Membrane |
We have mentioned previously that the surface topography of H. pylori is known to be complex and underlies important aspects of host-pathogen interaction [Logan, 1996]. It might therefore be legitimate to consider that inactivation of some of these proteins through vaccines would likely result in inactivation of the pathogen. The 10 protein components short-listed in our study would therefore represent promising candidates for further study and characterisation with a view to vaccine design.
Completed genome sequences have already influenced vaccine design effectively for organisms such as Mycobacterium tuberculosis [Montgomery, 2000]. A number of approaches for new vaccine development exist, including sub-unit protein and DNA vaccines, recombinant vaccines, and auxotrophic organisms used to deliver genes. Testing such candidates is laborious and expensive. Bioinformatics enables us to reduce substantially the number of candidates to test.
The computational genomics approach [Sakharkar et al., 2004] described here is likely to speed up the drug discovery process by removing hindrances such as dead ends or toxicity that are encountered in classical approaches. The ten membrane-associated proteins of H. pylori are invariably linked with essential metabolic and signal transduction pathways. We are currently considering subjecting these proteins to fold-level homology searches and structural modelling to elucidate which of them may function as the most effective epitope. Presumably, screening against such novel targets for functional inhibitors will result in the discovery of novel therapeutic compounds active against bacteria, including the increasing number of antibiotic-resistant clinical strains [Thanassi et al., 2002]. The strategy is also likely to locate critical pathways and stages in the pathogenicity of H. pylori.