In Silico Biology 8, 0044 (2008); ©2008, Bioinformation Systems e.V.
Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
* Corresponding author
Phone: +91-80-2293 2837; Fax: +91-80-2360 0535
Email: ns@mbu.iisc.ernet.in
Edited by E. Wingender; received April 25, 2008; revised October 29, 2008; accepted October 31, 2008; published November 15, 2008
Using a large database of protein domain families of known 3-D structure, we present an analysis of the relationships among the sequences, structures and functions of closely-related enzymes, performed at the level of catalytic domains. In only 38% of the pairs of homologous catalytic domains sharing more than about 60% sequence identity are the functions almost completely identical. Nearly 43% of the pairs differ in their substrate specificity. Hence, the most common variation of enzyme function among closely-related homologues is a difference in substrate specificity. For homologous pairs with a sequence identity of 30-60%, the functions are highly conserved if the structural difference metric is less than about 30. For clearly homologous protein domain pairs, usually sharing less than 40% sequence identity, we observe that the chemical groups involved in the functions and the cofactors often differ. We also report extremely unusual cases of closely-related homologues belonging to entirely different classes of enzymes. Such drastic shifts in the gross functions of homologues seem to be achieved by retooling of catalytic residues or by altering the stability of the intermediates in the biochemical reactions. Our work provides guidelines for functional annotation based on homology searches and in structural genomics initiatives.
Keywords: enzyme classification, homologous proteins, protein evolution, protein function, protein structure
One of the most challenging tasks in the post-genomic era is the prediction of functions of the putative gene products encoded in genomes [Andrade and Sander, 1997]. In general, about 40% of the sequences in the genomic data correspond to open reading frames whose annotations are unavailable, incomplete or incorrect [Kenyon et al., 2002]. Experimental characterization of these putative proteins is laborious and time-consuming. Having at least a rough initial idea of the function of a protein can help immensely in designing experiments, and bioinformatics tools enable such rational design. Popular bioinformatics approaches for function annotation include homology-based searches [Rost et al., 2003]. This process is aided by powerful methods such as FASTA [Pearson, 2000], BLAST [Altschul and Gish, 1996], PSI-BLAST [Altschul et al., 1997], hidden Markov models (HMMs) [Krogh et al., 1994; Eddy, 1998; Karplus et al., 1998; Söding, 2005], MulPSSM [Gowri et al., 2006] and Cascade PSI-BLAST [Sandhya et al., 2005; Bhadra et al., 2006], which are known to identify distant relationships in sequence space. The underlying assumption in these approaches is that related sequences fold similarly, so that the likelihood of functional similarity between related proteins can be explored. However, structural and functional annotations should be transferred cautiously, as errors in annotation can spread easily [Brenner, 1999; Devos and Valencia, 2001; Tian and Skolnick, 2003].
The definition of protein function is subjective and contextual. Sequence and structural similarity are correlated when there is clear homology between the proteins [Chothia and Lesk, 1986; Russell and Barton, 1994; Wood and Pearson, 1999; Wilson et al., 2000]. When proteins share high sequence and structural similarity, they often also share good functional similarity. However, exceptions to this rule exist among divergent sequences, where similarity is poor at the level of amino acid sequence while structural and functional similarities are retained. The classical example is the globin family, two members of which can have less than 10% sequence identity. Members of different families related at the superfamily level also share poor sequence similarity while retaining a similar fold. However, the functions and functional sites in the 3-D structure suggest divergent evolution in these cases [Murzin et al., 1995; Orengo et al., 2003].
Functional relationships between proteins that share significant sequence and structural similarity have already been explored [Chothia and Lesk, 1986; Russell and Barton, 1994; Wood and Pearson, 1999; Wilson et al., 2000]. Sequence similarity thresholds for the transfer of functional annotations have also been proposed [Devos and Valencia, 2000; Pawlowski et al., 2000; Wilson et al., 2000; Todd et al., 2001; Tian and Skolnick, 2003]. Another approach to inferring protein function from sequence is to look for the presence of patterns or motifs. However, similar motifs can be present in completely different enzymes. An approach has been proposed for improving the specificity of function prediction using pattern matching procedures [Via and Helmer-Citterich, 2004], but this method is applicable only to enzymes with known 3-D structures.
Protein domains can be defined as autonomous folding units with evolutionary and functional independence [Doolittle, 1995]. It has been well documented that database searches at the domain level are more effective than searches at the level of the complete sequence [George and Heringa, 2002]. Currently available protein domain classification databases include SCOP (Structural Classification of Proteins) [Murzin et al., 1995], CATH (Class, Architecture, Topology and Homology) [Orengo et al., 2003], FSSP (Families of Structurally Similar Proteins) [Holm and Sander, 1996] and PFAM (Protein Families Database) [Bateman et al., 2002]. In the CATH and FSSP databases, proteins are grouped based on the results of structure comparison algorithms, and the groupings do not reflect any functional similarity or evolutionary relationship. PFAM is derived using protein sequence alignments and profile HMMs. Hence, the PFAM database alone cannot support a detailed analysis of sequence, structure and function relationships. The SCOP database is constructed manually, taking into consideration evolutionary and functional information from the literature. Hence, it provides a useful division for sequence and structure comparisons of homologous protein domains. Our analysis is confined to the level of protein domain families in SCOP, and at this level homology is inferred largely from high similarity in the amino acid sequences and 3-D structures. The recent release of SCOPEC [George et al., 2004], a database of catalytic domains, has led to a better understanding of sequence, structure and function relationships.
The availability of a large number of known structures, biochemical characterization of sequences and high-quality mapping of functions at the level of protein domains has enabled us to probe this relationship in greater depth. We analyze the distribution of the six enzyme classes in the dataset of closely-related homologous protein domain families and the interchange between these enzyme classes within such families. Examples of homologous enzyme pairs within a family that perform different functions in terms of their mechanisms, and hence belong to different enzyme classes, are discussed in detail.
Dataset
The dataset includes 221 multi-member enzyme families (families with two or more members) from the PALI (Phylogeny and ALIgnment of homologous protein structures) database set up by us [Balaji et al., 2001; Gowri et al., 2003]. Enzyme Commission codes (EC codes) are used for functional assignment. The SCOPEC (version 1.0) database is used for mapping the EC codes to the protein domains in homologous enzyme families. Only those catalytic domains to which an unambiguous EC code has been assigned in SCOPEC are considered. In total, 1208 catalytic domains from the 221 protein domain families are analyzed.
Structural similarity between domains
Sequence similarity is expressed in terms of percentage sequence identity between homologues. The sequence-based alignments are generated using the MALIGN program [Johnson et al., 1993]. Structural similarity is generally expressed in terms of the root-mean-square deviation (RMSD) of Cα atoms, which is calculated from the distances between Cα atoms of equivalent residues in the two structures. Since the RMSD calculation weighs the distances between all aligned residue pairs equally, a small number of local structural deviations can result in a high RMSD even when the global topologies of the two structures are similar. Furthermore, the average RMSD of randomly related proteins depends on the lengths of the structures compared. This renders the absolute magnitude of the RMSD meaningless [Betancourt and Skolnick, 2001]. In order to overcome these problems, we use a structural dissimilarity measure (SDM) based on the Levitt-Gerstein metric (LGM) [Levitt and Gerstein, 1998], which weighs residue pairs at smaller distances more strongly than those at larger distances. The STAMP (Structural Alignment of Multiple Protein structures) suite of programs is used to generate the structural alignments [Russell and Barton, 1992].
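The contrast between RMSD and a distance-weighted score can be illustrated with a minimal sketch. It assumes the two domains are already superposed and that equivalent Cα atoms are paired in order. The function names, the weighting form 1/(1 + (d/d0)^2) and the scale d0 = 5 Å are illustrative assumptions in the spirit of distance-dependent weighting; this is not the SDM as computed in PALI, and gap penalties and normalization are omitted.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over equivalent C-alpha positions.

    Both arguments are equal-length lists of (x, y, z) tuples, assumed to be
    already superposed and paired by the structural alignment.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def distance_weighted_score(coords_a, coords_b, d0=5.0):
    """Sum of 1 / (1 + (d/d0)^2) over aligned pairs (illustrative only).

    Close pairs contribute almost 1 and distant pairs almost 0, so a few
    large local deviations cannot dominate the total. d0 (in angstroms)
    is an assumed scale parameter.
    """
    return sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
               for a, b in zip(coords_a, coords_b))

# Toy example: ten aligned residues, nine identical, one displaced by 20 A.
a = [(float(i), 0.0, 0.0) for i in range(10)]
b = a[:9] + [(9.0, 20.0, 0.0)]
print(round(rmsd(a, b), 2))                     # the single outlier dominates
print(round(distance_weighted_score(a, b), 2))  # still close to the maximum of 10
```

In this toy case one displaced residue pushes the RMSD above 6 Å, while the weighted score stays near its maximum, which is the behaviour that motivates a distance-weighted measure.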
Degree of functional conservation within protein domain families
Enzyme Commission (EC) codes are commonly used for functional classification of enzymes [Bairoch, 2000]. Each enzyme function is represented by the letters "EC" followed by four numbers that represent a hierarchy of functional classification. EC codes specify not the enzymes themselves but the reactions they catalyze. The first number represents the type of enzymatic activity, such as hydrolases (enzymes that cleave the substrate by hydrolysis) or isomerases (which catalyze intramolecular rearrangement of the substrates). The second number corresponds to the nature of the chemical bonds or groups in the substrate on which the enzyme acts. The third number refers to the nature of the cofactors that the enzyme requires to catalyze the reaction. The fourth number corresponds to the nature of the substrate on which the enzyme acts.
The diversity of enzyme functions relative to the limited number of protein folds [Chothia, 1992] suggests that new enzymes have likely evolved from pre-existing ones by gene duplication followed by retooling of the active site to catalyze new reactions. Although divergence in function accompanies divergence in sequence in protein superfamilies (distantly-related sequences) [Babbitt, 2003; Bartlett et al., 2003], the members within a family are expected to perform similar functions, which for enzymes means similar enzymatic reactions. In order to understand the divergence in enzyme function among the members of protein families (closely-related sequences), functional divergence is analyzed in terms of the EC code. The variations at each of the four levels of the EC hierarchy are analyzed for every possible enzyme pair in all the 221 protein structural families. The frequency of variation of enzyme functions within the homologous protein domain families is plotted as a pie chart (Fig. 1). The figure shows the number of pairs of homologous enzyme domains with all four numbers in their EC codes the same, only the first three numbers the same, only the first two numbers the same, only the first number the same, and not even the first number the same. It thus indicates the frequency of homologous enzymatic domains at different levels of functional similarity.
In 38% of the pairs, enzyme function is conserved, with the EC codes of the two protein domains identical. Change in substrate specificity is the most common functional variation observed among the homologous domain pairs, accounting for approximately 43% of the observed cases. These examples correspond to conservation of catalytic residues with variations occurring at the substrate or cofactor binding sites. In 10% of the enzyme pairs, we observed the paradoxical situation in which the functional class of the two enzymes is different, i.e. the EC codes mismatch at the first number. The 16 SCOP enzyme families in which the domain pairs differ in their enzyme class are listed in Tab. 1.
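The pairwise comparison of EC codes described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the sketch assumes complete four-field EC codes (no "-" placeholders), in line with the restriction to unambiguous assignments.

```python
from collections import Counter
from itertools import combinations

def ec_match_level(ec_a, ec_b):
    """Number of leading EC fields (0 to 4) shared by two complete EC codes."""
    level = 0
    for field_a, field_b in zip(ec_a.split("."), ec_b.split(".")):
        if field_a != field_b:
            break
        level += 1
    return level

def tally_family(ec_codes):
    """Count every domain pair within one family by its EC match level."""
    return Counter(ec_match_level(a, b) for a, b in combinations(ec_codes, 2))

# Two pairs with EC codes taken from Table 1 and its families.
print(ec_match_level("2.4.1.19", "3.2.1.133"))  # 0: different enzyme class
print(ec_match_level("2.6.1.1", "2.6.1.13"))    # 3: only the last field differs

# A made-up four-member family, for illustration only.
print(tally_family(["3.2.1.1", "3.2.1.1", "3.2.1.133", "2.4.1.19"]))
```

A level of 4 corresponds to identical EC codes, 3 to a difference in the fourth number only, and 0 to a change of enzyme class; summing such tallies over all families would give the five categories of the pie chart.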
| Table 1: List of homologous protein domain pairs with very different functions. |
| SCOP family | Domain 1 (EC code) | Domain 1 SCOP domain name | Domain 2 (EC code) | Domain 2 SCOP domain name | Sequence identity (SID, %) | SDM |
| Class I aldolase | 1f05a_ (2.2.1.2) | Transaldolase | 1l6wa_ (4.1.2.14) | Fructose-6-phosphate aldolase | 26.3 | 99.02 |
| D-Glucarate dehydratase-like | 1bqg_1 (4.2.1.40) | D-Glucarate dehydratase | 2chr_1 (5.5.1.7) | Chlormuconate cycloisomerase | 23.8 | 81.54 |
| Phosphoenolpyruvate mutase/Isocitrate lyase-like | 1f6la_ (4.1.3.1) | Isocitrate lyase | 1pyma_ (5.4.2.9) | Phosphoenolpyruvate mutase | 25.7 | 59.24 |
| Tryptophan biosynthesis enzymes | 1nsj__ (5.3.1.24) | N-(5'-Phosphoribosyl)antranilate isomerase, PRAI | 1pii_1 (4.1.1.48) | Indole-3-glycerophosphate synthase, IPGS | 29.8 | 27.06 |
| Crotonase-like | 1hzda_ (4.2.1.17) | AUH protein | 1nzya_ (3.8.1.6) | 4-Chlorobenzoyl-CoA dehalogenase | 26.7 | 33.47 |
| Alanine racemase-like, N-terminal domain | 1bd0a2 (5.1.1.1) | Alanine racemase | 2toda2 (4.1.1.17) | Eukaryotic ornithine decarboxylase | 18.1 | 66.52 |
| Amylase, catalytic domain | 1ciu_4 (2.4.1.19) | Cyclodextrin glycosyltransferase | 1qhoa4 (3.2.1.133) | α Amylase | 45.8 | 12.87 |
| Tyrosine-dependent oxidoreductases | 1ek6a_ (5.1.3.2) | Uridine diphosphogalactose-4-epimerase (UDP-galactose 4-epimerase) | 1hdoa_ (1.3.1.24) | Biliverdin IX β reductase | 32.7 | 37.08 |
| Class I glutamine amidotransferases (GAT) | 1gpma2 (6.3.5.2) | GMP synthetase | 1qdlb_ (4.1.3.27) | Anthranilate synthase GAT subunit, TrpG | 26.2 | 28.44 |
| Nitrogenase iron protein-like | 1ihua2 (3.2.3.16) | Arsenite-translocating ATPase ArsA | 1iwea_ (6.3.4.4) | Adenylosuccinate synthetase, PurA | 30.4 | 106.28 |
| *AAT-like | 1bjwa_ (2.6.1.1) | Aspartate aminotransferase, AAT | 1m6sa_ (4.1.2.5) | Low-specificity threonine aldolase | 24.3 | 65.72 |
| *Cystathionine synthase-like | 1cs1a_ (2.5.1.48) | Cystathionine gamma-synthase, CGS | 1cl1a_ (4.4.1.8) | Cystathionine β-lyase, CBL | 31.1 | 29 |
| *GABA aminotransferase like | 2dkb__ (4.1.1.64) | Dialkylglycine decarboxylase | 2oata_ (2.6.1.13) | Ornithine aminotransferase | 30.2 | 32.08 |
| AraD-like aldolase/epimerase | 1e4cp_ (4.1.2.17) | L-Fuculose-1-phosphate aldolase | 1k0wa_ (5.1.3.4) | L-Ribulose-5-phosphate 4-epimerase | 30.5 | 30.5 |
| DNase I-like | 1ako__ (3.1.11.2) | DNA repair enzyme exonuclease III | 1hd7a_ (4.2.99.18) | DNA repair endonuclease Hap1 | 24.7 | 29.91 |
| Class II glutamine amidotransferases | 1ct9a2 (6.3.5.4) | Asparagine synthetase B, N-terminal domain | 1ecfa2 (2.4.2.14) | Glutamine PRPP amidotransferase, N-terminal domain | 30.2 | 39.43 |
| * PLP-dependent enzyme families. |
Sequence - structure - function relationship within the closely-related homologous enzyme families
The sequence-structure relationship, as expressed by the root-mean-square (RMS) deviation of the aligned Cα atoms and the percent sequence identity, has been characterized previously as an exponential function [Chothia and Lesk, 1986; Flores et al., 1993; Russell and Barton, 1994]. A sequence identity threshold of 60% has been suggested for function annotation through homology [Tian and Skolnick, 2003]. However, identifying structural similarity/dissimilarity thresholds for function transfer would also be appropriate, as proteins sharing similar folds generally perform similar functions. In order to explore how functional conservation depends on sequence and structural similarity, the structural dissimilarity measure (SDM) is plotted as a function of sequence identity for the catalytic domain pairs (Fig. 2).
Information on the extent of functional similarity for all the pairs has been incorporated by plotting the values in three different colors. The plot shows that domain pairs with identical EC codes vary in sequence and structural similarity even among the members of a single family. 65% of the domain pairs with identical EC codes cluster below an SDM value of 40 and above a sequence identity of 30%. Very few pairs with an SDM value less than 30 have completely different functions, even if the sequence identity is less than 60%. In other words, homologous pairs with sequence identity in the range of 30-60% and good gross structural similarity (SDM lower than 30) usually have the same function. 18% of the pairs with identical EC codes share good structural similarity (SDM below 40) despite poor sequence similarity. The most obvious reason for this observation could be that, although the sequences in the family have diverged, the chemistry of the protein activity in the family demands structural conservation. In 11% of the domain pairs with identical EC codes, structural dissimilarity is very high and sequence identity is poor. High SDM values could be due to structural plasticity, mutations and different functional forms of the enzymes (such as active and inactive conformations). While pairs with very different functions usually have SDM values higher than 30, not all pairs with SDM greater than 30 have different functions. For pairs with SDM greater than 30, a good indication of retention or variation of function is provided by whether functionally important residues are conserved. Hence, it is important to analyze the conservation of functional residues in such pairs in order to predict whether the functions of the homologues are the same or different.
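The thresholds discussed above can be condensed into a rough triage rule. The sketch below is only an illustration of that reading: the function name and the three labels are hypothetical, the cut-offs (60% identity, 30-60% identity with SDM below 30) are the approximate values quoted in the text, and the rule is a guide rather than a classifier.

```python
def annotation_hint(seq_identity, sdm):
    """Map a domain pair to a coarse, hypothetical confidence label.

    seq_identity: percentage sequence identity (0-100).
    sdm: structural dissimilarity measure (lower means more similar).
    """
    if seq_identity > 60.0:
        # Region where annotation transfer is considered most reliable.
        return "transfer"
    if 30.0 <= seq_identity <= 60.0 and sdm < 30.0:
        # Moderate identity plus good gross structural similarity.
        return "transfer-with-care"
    # Everything else: check conservation of functional residues first.
    return "inspect-residues"

# Two pairs from Table 1, as (sequence identity %, SDM).
print(annotation_hint(26.3, 99.02))  # inspect-residues (Class I aldolase pair)
print(annotation_hint(45.8, 12.87))  # transfer-with-care (amylase family pair)
```

The second call is a useful caution: the CGTase and α-amylase domains fall in the region where function is usually conserved, yet their EC codes differ in the first number, so the rule flags likelihood rather than certainty.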
Distribution and interchange of enzyme functional classes within the protein domain families
In order to identify the most preferred enzyme class across the different folds and families in the protein structural classification, the distribution of enzyme functional classes among the various folds, superfamilies and families is tabulated (Tab. 2). Hydrolases account for 84 of the 221 enzyme families with known 3-D structures, followed by oxidoreductases and transferases (41 families each). However, the enzyme families with lyase function are more prone to enzyme class variations (Tab. 2). The frequency of observed inter-conversion between the different enzyme functional classes within the protein domain families is shown in Fig. 3. There are 8 conversions between the enzyme functional classes that have not been observed so far (represented by 0 in the figure), and 9 conversions that have been observed only once (denoted by 1). The most commonly observed inter-conversion is lyase to isomerase (observed in 4 cases). Probable inter-conversions in the enzyme functions in the various families, superfamilies and folds are listed in Tab. 3. In this table, function 1 refers to the function of most of the members of a protein family. Functions 2 and 3 refer to the different functional classes to which the other members of the family belong. Among the various fold types, the TIM β/α-barrel fold and the PLP-dependent transferase fold, both belonging to the α/β structural class, accommodate large functional variations, as seen in Tab. 3.
| Table 2: Distribution of enzyme functional class in various protein domain families, superfamilies and folds. |
| Enzyme class | No. of families | Families with change in enzyme class* | No. of superfamilies | No. of folds |
| Oxidoreductase | 41 | 1 | 34 | 34 |
| Transferase | 41 | 4 | 32 | 28 |
| Hydrolase | 84 | 3 | 54 | 48 |
| Lyase | 28 | 8 | 23 | 20 |
| Isomerase | 15 | 1 | 15 | 10 |
| Ligase | 12 | 1 | 10 | 9 |
| Total | 221 | 18 | 168 | 149 |
| * Change in enzyme functional class with the enzyme class in the first column as the predominant function. |
Figure 3: Frequency of inter-conversions between the enzyme functional classes within the protein domain families.
| Table 3: Interconversions between enzyme functional classes in protein domain families. |
| Function 1 | Enzyme family name | Enzyme superfamily name | Fold | Function 2 | Function 3 |
| Transferases | Citrate synthase | Citrate synthase | Citrate synthase | Lyases | - |
| Lyases | Class I aldolase | Aldolase | TIM β/α-barrel | Transferases | - |
| Isomerases | D-Glucarate dehydratase-like | Enolase C-terminal domain-like | TIM β/α-barrel | Lyases | - |
| Lyases | Phosphoenolpyruvate mutase/Isocitrate lyase-like | Phosphoenolpyruvate/pyruvate domain | TIM β/α-barrel | Isomerases | - |
| Lyases | Tryptophan biosynthesis enzymes | Ribulose-phoshate binding barrel | TIM β/α-barrel | Isomerases | - |
| Lyases | Crotonase-like | ClpP/crotonase | ClpP/crotonase | Isomerases | Hydrolases |
| Lyases | Alanine racemase-like, N-terminal domain | PLP-binding barrel | TIM β/α-barrel | Isomerases | - |
| Hydrolases | Amylase, catalytic domain | (Trans)glycosidases | TIM β/α-barrel | Transferases | - |
| Oxidoreductases | Tyrosine-dependent oxidoreductases | NAD(P)-binding Rossmann-fold domains | NAD(P)-binding Rossmann-fold domains | Lyases | Isomerases |
| Lyases | Class I glutamine amidotransferases (GAT) | Class I glutamine amidotransferase-like | Flavodoxin-like | Ligases | Hydrolases |
| Ligases | Nitrogenase iron protein-like | P-loop containing nucleoside triphosphate hydrolases | P-loop containing nucleoside triphosphate hydrolases | Oxidoreductases | Hydrolases |
| Hydrolases | DnaQ-like 3'-5' exonuclease | Ribonuclease H-like | Ribonuclease H-like motif | Transferases | - |
| Transferases | AAT-like | PLP-dependent transferases | PLP-dependent transferases | Lyases | - |
| Lyases | Cystathionine synthase-like | PLP-dependent transferases | PLP-dependent transferases | Transferases | - |
| Transferases | GABA-aminotransferase-like | PLP-dependent transferases | PLP-dependent transferases | Lyases | Isomerases |
| Lyases | Tryptophan synthase β subunit-like PLP-dependent enzymes | Tryptophan synthase β subunit-like PLP-dependent enzymes | Tryptophan synthase β subunit-like PLP-dependent enzymes | Hydrolases | Transferases |
| Hydrolases | DNase I-like | DNase I-like | DNase I-like | Lyases | - |
| Transferases | Class II glutamine amidotransferases | N-Terminal nucleophile aminohydrolases (Ntn hydrolases) | Ntn hydrolase-like | Ligases | - |
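The conversions in Table 3 can be tallied directly, treating each row as a set of directed conversions from the predominant class (function 1) to the other classes (functions 2 and 3). The sketch below is one possible reading of the table; the function name and the directed-pair convention are assumptions, and the tally is not claimed to reproduce every cell of Fig. 3.

```python
from collections import Counter

# (predominant class, other classes) per family, transcribed from Table 3.
TABLE3 = [
    ("Transferases", ["Lyases"]),
    ("Lyases", ["Transferases"]),
    ("Isomerases", ["Lyases"]),
    ("Lyases", ["Isomerases"]),
    ("Lyases", ["Isomerases"]),
    ("Lyases", ["Isomerases", "Hydrolases"]),
    ("Lyases", ["Isomerases"]),
    ("Hydrolases", ["Transferases"]),
    ("Oxidoreductases", ["Lyases", "Isomerases"]),
    ("Lyases", ["Ligases", "Hydrolases"]),
    ("Ligases", ["Oxidoreductases", "Hydrolases"]),
    ("Hydrolases", ["Transferases"]),
    ("Transferases", ["Lyases"]),
    ("Lyases", ["Transferases"]),
    ("Transferases", ["Lyases", "Isomerases"]),
    ("Lyases", ["Hydrolases", "Transferases"]),
    ("Hydrolases", ["Lyases"]),
    ("Transferases", ["Ligases"]),
]

def tally_conversions(rows):
    """Count directed conversions from the predominant class to each other class."""
    counts = Counter()
    for predominant, others in rows:
        for other in others:
            counts[(predominant, other)] += 1
    return counts

counts = tally_conversions(TABLE3)
for (source, target), n in counts.most_common():
    print(f"{source} -> {target}: {n}")
```

Read this way, lyase to isomerase is the most frequent conversion (four families) and nine conversions occur exactly once, in agreement with the counts quoted in the text. The number of unobserved conversions depends on how Fig. 3 lays out the classes and is not derived here.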
The functional variations observed within some protein domain families (Tab. 1) are mainly due to the nature of the EC nomenclature, which classifies enzymes by their overall reaction rather than by the molecular details of the reaction. Within the same family, the homologous enzymes mostly use the same chemical strategy. By retooling the active site through addition, deletion or modification of a single residue or prosthetic group, these enzymes change the stability of intermediates and thus arrive at different overall reactions. A small change in these protein domain families therefore leads to diversity in the reactions they catalyze. A few of these protein domain families, in which the members share high sequence similarity (expressed by their percentage sequence identity) and high structural similarity (reflected in their lower SDM values), are discussed in the sections below.
Amylase catalytic domain family
The catalytic domains of cyclodextrin glycosyltransferase (CGTase; 1ciu_4) and α-amylase (1qhoa4) both belong to the amylase catalytic domain family. CGTase (EC 2.4.1.19) catalyzes transglycosylation reactions, whereas α-amylase (EC 3.2.1.133) and many other members of this family are known to be hydrolases. Previous sequence analyses led to several cases of incorrect classification of CGTase as α-amylase [Janeček et al., 1995]. Comparison of their catalytic domains shows high sequence and structural similarity (45.8% sequence identity and an SDM of 12.9). However, their EC codes indicate a large functional divergence.
Both proteins degrade starch by breaking α(1→4)-glycosidic bonds into linear products, and the active site residues Asp230, Glu258 and Asp329 (CGTase numbering) are absolutely conserved (Fig. 4). Biochemical investigations show that CGTase catalyzes four different reactions, namely cyclization, coupling, disproportionation and saccharification, but preferentially catalyzes transglycosylation reactions [Penninga et al., 1995]. Residue 196, located in the active site cleft, is either Tyr or Phe in CGTase, whereas a small residue such as Gly, Leu, Ser, Thr or Val is present in amylases [Wind et al., 1998]. Phe196 and Phe184 expose a hydrophobic surface and stabilize the intermediate important for the cyclization reaction. This has been confirmed by site-directed mutagenesis (F196G): the mutation results in a drastic reduction in the cyclization activity of CGTase and a doubling of its saccharification activity [Penninga et al., 1995].
Figure 4: Structural superposition of α-amylase (1qhoa4, green) and CGTase (1ciu_4, violet) catalytic domains. Active site residues and the phenylalanine residues of CGTase (Phe196 and Phe184; pink) and the structurally equivalent residues from α-amylase (red) are shown as ball and stick. This figure was generated using the SETOR software [Evans, 1993].
Class I aldolase family
The catalytic domains of fructose-6-phosphate aldolase (1l6wa_; EC 4.1.2.-) and transaldolase (1f05a_; EC 2.2.1.2) belong to the class I aldolase family. All the members of this family are known to employ a Schiff-base mechanism in which an active site lysine residue forms a Schiff-base intermediate with the substrate. Transaldolases catalyze the reversible transfer of a dihydroxyacetone moiety, derived from fructose-6-phosphate, to erythrose-4-phosphate, yielding sedoheptulose-7-phosphate and glyceraldehyde-3-phosphate. Fructose-6-phosphate aldolases, however, catalyze the cleavage of fructose-6-phosphate and its formation from dihydroxyacetone and glyceraldehyde-3-phosphate, and lack transaldolase activity [Schurmann and Sprenger, 2001]. These protein domains share a sequence identity of 31.8% and their structural dissimilarity (SDM) is 89.6.
A comparison of these two domains shows that the mechanism of fructose-6-phosphate aldolase is similar to that of transaldolases (Fig. 5). However, subtle differences in the active site composition within the family might be responsible for the switch in enzyme activity. It has been proposed that a modified affinity for the acceptor substrate (Arg134/Arg228) or differences in the hydrogen bonding pattern of the catalytic water molecule in the active site (Glu and Thr in transaldolases; Gln, Thr and Tyr in aldolases) change the stability of the Schiff-base intermediate [Thorell et al., 2002]. Control of transaldolase versus aldolase activity would be exerted through the relative stability of the Schiff-base intermediate, as transaldolases require the intermediate to live long enough to allow product release and binding of the acceptor substrate.
Figure 5: Structural superposition of fructose-6-phosphate aldolase (1l6wa_; green) and transaldolase (1f05a_; violet) catalytic domains. Active site residues involved in hydrogen bonding in fructose-6-phosphate aldolase and transaldolase are shown in pink and red, respectively. This figure was generated using the SETOR software [Evans, 1993].
Pyridoxal phosphate dependent enzyme families
Three of the examples mentioned in Tab. 3 have pyridoxal phosphate (PLP) as cofactor, which gives rise to the functional promiscuity within these families. PLP-dependent enzymes catalyze a wide variety of reactions (Tab. 3) involving amines and amino acids, generally by stabilizing carbanionic intermediates. The functional diversity of PLP-dependent enzymes is illustrated by the fact that the Enzyme Commission has assigned more than 140 EC codes to them [Toney, 2005]. Their functions range from racemase, decarboxylase, transaminase and aldolase to synthase. Despite the diversity of the overall reactions, all PLP-dependent enzymes share a common step, the Schiff-base exchange. In all known PLP-dependent enzymes, the cofactor exists as a Schiff base with an active site lysine residue (internal aldimine). The incoming amine-containing substrate displaces the ε-amino group of the lysine residue and forms a new Schiff base with PLP (external aldimine).
Divergence in reaction specificity occurs from this point, as the external aldimine undergoes covalency changes at the α-, β- or γ-carbon atoms, leading to the diverse reactions indicated by the EC codes [Alexander et al., 1994]. Hence, assigning the function of a PLP-dependent enzyme simply on the basis of sequence/structure criteria is not straightforward, as the versatility of function in these enzymes is conferred upon the cofactor by the intricate design of the active site. For example, the catalytic domains of cystathionine γ-synthase (CGS; EC 2.5.1.48) and cystathionine β-lyase (CBL; EC 4.4.1.8) belong to the cystathionine synthase-like family. CGS catalyzes the first committed step in methionine biosynthesis, which is the transsulfuration of cysteine to form the intermediate cystathionine. CBL catalyzes the subsequent step in the pathway, the β-cleavage of L-cystathionine to homocysteine, pyruvate and ammonia. The domains share 31% sequence identity and an SDM of 29.
Both enzymes require PLP as cofactor for their activity. It has been proposed that the two enzymes probably evolved from a common ancestor [Martel et al., 1987], a view supported by a number of common structural and biochemical characteristics; for example, both enzymes function in vivo as homotetramers with one PLP cofactor per monomer. In spite of the complete conservation of the active site architecture, differences in substrate-binding characteristics are responsible for the different reaction chemistry. The size of the active site entrance differs between CBL and CGS. Hence, in this homologous pair of enzymes, the active site residues are completely conserved, but the size of the entrance to the active site leads to differences in substrate binding and changes the reaction chemistry.
Enoyl-CoA-isomerase enzymes
Fatty acids are one of the main sources of metabolic energy, are precursors of hormones, sometimes act as intracellular messengers and are building blocks of biological membranes. Fatty acids are metabolized by the sequential removal of two-carbon units in the β-oxidation pathway. Only saturated fatty acids, or cis- or trans-unsaturated fatty acids with double bonds extending from even-numbered carbon atoms, can enter the main β-oxidation pathway. However, most naturally occurring fatty acids are cis-unsaturated at either odd- or even-numbered positions [Stryer, 1995]. Thus, a number of auxiliary pathways exist that allow the fatty acid derivatives to enter the main β-oxidation pathway [Tsai et al., 1969; Müller-Newen and Stoffel, 1991; Tserng and Jin, 1991; van Veldhoven et al., 1991].
The auxiliary pathway responsible for this fatty acid metabolism involves three enzymes [Luthria et al., 1995]. Δ3,5,Δ2,4-dienoyl-coenzyme A (dienoyl-CoA) isomerase shifts two double bonds from positions 3 and 5 to positions 2 and 4, respectively [Luo et al., 1994]. The product is passed on to 2,4-dienoyl-CoA reductase, an NADPH-dependent enzyme which, in turn, produces 3-enoyl-CoA. The third enzyme in the auxiliary pathway is Δ3,Δ2-enoyl-CoA-isomerase.
Δ3,5,Δ2,4-dienoyl-CoA isomerase (1dcia_; EC 4.2.1.17) and Δ3,Δ2-enoyl-CoA-isomerase (1hnua_; EC 5.3.3.8) belong to the enoyl-CoA-isomerase protein domain family. Sequence comparisons show that dienoyl-CoA isomerase and enoyl-CoA isomerase share similar sequence motifs and the same overall fold [Engel et al., 1996]. The reaction catalyzed by enoyl-CoA isomerase shifts a double bond by proton abstraction (from atom C2) and subsequent proton donation (to atom C4). This reaction is similar to the one catalyzed by dienoyl-CoA isomerase; however, the proton abstracted from the C2 atom by dienoyl-CoA isomerase is transferred to C6 of the fatty acyl chain, which is further away from the common thioester moiety. A conserved Asp-Lys salt bridge is shown in Fig. 6, together with the conserved Glu196 (1dcia_ numbering), which is directly involved in the reaction. However, the catalytic Asp204, which is actively involved in proton transfer to Glu196, is replaced by a Phe in the enoyl-CoA isomerase structure. Hence, the variation in the properties of the active site alters the chemistry of the reaction.
Figure 6: Structural superposition of dienoyl-CoA isomerase (1dcia_; green) and enoyl-CoA isomerase (1hnua_; violet). Conserved Asp-Lys salt bridge interactions in both the enzymes and residues directly involved in catalysis are shown in ball and stick model. This figure is generated using SETOR software [Evans, 1993].
Homologous proteins display a complex relationship among sequences, structures and functions. Sequence/structure similarity is used as a quick and simple measure for function prediction. The present work demonstrates that functional annotation of individual domains is most reliable if the sequence identity between the homologues is greater than 60%. More importantly, it also shows that if the sequence identity is in the range of 30-60%, the structural dissimilarity measure (SDM) serves as a useful diagnostic of potential functional similarity. Often, proteins of higher sequence divergence (30-60% identity) with SDM values lower than 30 have the same function. This feature is particularly useful in structural genomics initiatives, where the 3-D structure of a protein is solved before its function is known.
The Enzyme Commission (EC) code, which is the most widely accepted enzyme classification scheme, classifies enzyme functions based on the overall reaction of the enzymes rather than the molecular details of the reaction. Homologues inherit common catalytic machinery from their ancestors through evolution. However, retooling of the region near the active site through mutation events sometimes results in different overall reactions in spite of the conservation of active site residues.
Enzyme families with most of their members belonging to the lyase class are probable candidates for inter-conversion into other enzyme classes. Lyase-isomerase inter-conversion is a common transformation. Overall, inter-conversions among the enzyme classes are observed rarely among the closely-related homologues within protein domain families. All these observations suggest that homologous enzymes alter the chemistry of their reactions by retaining similar folds and subtly varying the active site topology.
AR is supported by a fellowship from the Department of Biotechnology, Government of India. We thank Mr. D.C. Dinesh for his help in the preparation of the manuscript. This research is supported to NS by the Department of Biotechnology, Government of India.