In Silico Biology 3, 0038 (2003); ©2003, Bioinformation Systems e.V.  

Investigation of interaction between Pax5 isoforms and thioredoxin using de novo modelling methods

Miroslava Cuperlovic-Culf1,*, Gilles A. Robichaud1, Michel Nardini1 and Rodney J. Ouellette1,2




1Laboratoire de pathologie moléculaire, Institut de recherche médicale Beauséjour
37 Providence Street, Moncton, NB. E1C 8X3, Canada.
2Département de chimie et biochimie, Université de Moncton, Moncton, N.-B., Canada

* corresponding author
Email: miroslavac@health.nb.ca
Fax: 506-862-4222
Tel: 506-862-4848





Edited by E. Wingender; received June 03, 2003; revision received September 26, 2003; accepted September 28, 2003; published October 10, 2003



Abstract

The Pax-5 transcription factor plays a crucial role in B-cell development, activation and differentiation. In murine B-cells four different isoforms of Pax-5 have been identified, and their role in regulating the activity of the wild-type protein has been revealed, although it is still not fully understood. Using theoretical methods, we investigated the properties of one region of the Pax-5e and Pax-5d isoforms (named the UDE domain) and we present a possible theoretical model for the interaction of this domain with thioredoxin, which has previously been postulated on the basis of experimental results. Domain UDE (MW 4.8 kDa) is characterised by an extremely high ratio of positively charged residues (8) in comparison to negatively charged amino acids (3), as well as unusually large concentrations of prolines (11.6%) and cysteines (4.7%). This is indicative of a role in protein-protein interaction. An experimental 3D structure is not yet available for either the UDE domain or any analogous sequence, and we therefore resorted to various bioinformatics methods to predict the secondary and 3D structure from the primary sequence of UDE. Physicochemical properties of the predicted UDE structure gave further indications of the possibilities for UDE-thioredoxin binding. In addition, the UDE domain was shown to be analogous in both sequence and structure to a segment of the NAD-reducing hydrogenase HOXS subunit, which is believed to interact with thioredoxin. These studies showed that the UDE domain in Pax-5d and Pax-5e represents an ideal binding site for thioredoxin, and we developed a model of the UDE-TRX complex with two disulphide bridges. The active site of thioredoxin remained exposed after binding to UDE in this model, and therefore binding of thioredoxin to Pax-5d could explain the unexpectedly high resistance of this isoform to oxidation.
The complex between thioredoxin and Pax-5e could be a means of transporting thioredoxin into the nucleus and also into the vicinity of Pax-5a, explaining the observed activator role of Pax-5e.

Key words: protein structure, protein structure prediction, protein-protein interaction, molecular modelling, bioinformatics, transcription factors, alternative splicing, HMMSTR, Rosetta, theoretical analysis, proteomics, electrostatics, disulphide bridge calculation



Introduction

Transcription factors play a central role in the regulation of gene expression. They can be defined as proteins that regulate transcription after nuclear translocation by specific interaction with DNA, or by stoichiometric interaction with a protein that can be assembled into a sequence-specific DNA-protein complex [Wingender, 1997]. Alternative splicing of transcription factor genes often leads to the presence of several transcription factor isoforms which have various roles in the regulation of gene transcription [Strachan and Read, 1999]. This mechanism provides the cell with rapid and efficient ways to dictate a variety of functionally related activities from a single gene locus [Zwollo et al., 1997].

One small family of transcription factors involved in development is the paired box (Pax) gene family, containing at least nine members, Pax-1 through Pax-9 [Walther et al., 1991; Stapleton et al., 1993]. One of the best studied members of the Pax family is Pax-5. Pax-5 is expressed at all stages of B cell development except for the terminally differentiated plasma cell stage and it influences the expression of several B-cell-specific genes. The B-cell specific transcription factor BSAP, which is encoded by the Pax-5 gene, regulates the CD19 gene which, in turn, encodes a co-stimulatory molecule of the B cell receptor [Dorfler and Busslinger, 1996]. Alternative splicing of transcription factors is a common theme among members of the Pax family. The murine Pax-5 gene produces at least four characterised isoforms as a result of alternative splicing: Pax-5a (full-length Pax-5 or BSAP), Pax-5b, Pax-5d and Pax-5e (Figure 1) [Lowen et al., 2001; Zwollo et al., 1997]. The expression levels of the different isoforms vary according to B-cell development stage, indicating an important regulatory role for these alternatively spliced products. In normal B-cells, Pax-5a, Pax-5d and Pax-5e are expressed at detectable levels, with the Pax-5a expression level being significantly higher. Pax-5b is not expressed at detectable levels until the later stages of B-cell development [Lowen et al., 2001; Zwollo et al., 1997]. Previous experimental studies have determined various possible roles of the murine Pax-5 isoforms, as summarised below.



Figure 1: Sequences of Pax-5 isoforms translated from the isoform cDNAs using Translate.
a. Graphical representation of domains in Pax-5 isoforms



b. Naming convention and amino acid sequences of graphically represented domains, where domain abbreviations are: INT – short initial sequence, PAI and RED domains of the paired domain of Pax-5, BIG - complete C-terminal domain which includes octapeptide, homeodomain, serine-threonine-tyrosine rich transactivation domain and repression (inhibitory) domain, and UDE – domain unique to Pax-5d and Pax-5e.


The Pax-5a isoform is characterised as the full length (wild type) murine Pax-5 cDNA. This isoform performs Pax-5 function in regulation of the CD19 gene in B-cells and the activity of Pax-5a is regulated through changes in relative amounts of alternative isoforms Pax-5d and Pax-5e [Anspach et al., 2001; Tell et al., 1998; Tell et al., 2000]. Pax-5a is known to bind to several co-acting transcription factors. A well studied example is the binding with Ets-1 transcription factor where protein-protein binding is achieved through the paired domain of Pax-5 [Garvie et al., 2001]. Recent reports provided evidence on the regulatory role of a redox mechanism that involves Ref-1 protein in Pax-5a activity. Cell experiments in oxidising conditions have shown that oxidation leads to the formation of an intramolecular disulphide bond within a PAI region of the paired domain of Pax-5a causing interference with specific DNA binding [Anspach et al., 2001; Tell et al., 1998; Tell et al., 2000].

The Pax-5b isoform has a partial paired domain (RED domain) but a complete transactivation domain (BIG). From the X-ray structure of the complete paired domain (PAI and RED) bound to DNA and Ets-1, it is known that the RED domain provides only a minor contribution to the bonding with DNA (one amino acid forms bonds with two nucleic acids and another amino acid contacts a sugar phosphate backbone) and it is not involved in the interaction with Ets-1 [Garvie et al., 2001]. Therefore, although Pax-5b can still bind to DNA with its RED domain, it would have much lower affinity and specificity. Since the RED domain is not sensitive to oxidation, Pax-5b would have the same DNA binding activity regardless of the oxidation state of the cell. The presence of Pax-5b in B-lymphoid cells appears to correlate with the differentiation stage, and it was therefore previously suggested that this isoform plays an important role in transcriptional regulation [Zwollo et al., 1997]. Pax-5b, with its partial paired domain and complete BIG domain, is able to interact with other regulatory proteins and possibly functions as a co-repressor by competing with Pax-5a for binding to unidentified regulatory molecules or co-acting transcription factors [Zwollo et al., 1997; Lowen et al., 2001].

The Pax-5d isoform contains the complete DNA-binding domain (paired domain) but an alternative C-terminal domain (here named domain UDE in Figure 1 and further text). Earlier in vitro experimental studies have shown that Pax-5d binds to DNA with a similar affinity as Pax-5a [Lowen et al., 2001; Zwollo et al., 1997]. In addition, levels of Pax-5d protein are highest in resting cells, and decrease gradually in activated B cells. Based on the sequence and these experimental findings, it was suggested that Pax-5d represents a dominant transcriptional suppressor of Pax-5a through competitive binding to DNA. Since Pax-5d lacks both transactivating domain and a homeodomain region it would be unable to interact with the basal initiation complex and other regulatory factors [Lowen et al., 2001]. But, unlike Pax-5a's binding to DNA which is dependent on the Pax-5a being in the reduced state, it was determined from experiments on aged mice that, even though Pax-5d includes oxidation sensitive PAI domain, this isoform binds to DNA with the same affinity regardless of the oxidation state of the cell [Anspach et al., 2001].

The Pax-5e isoform contains the partial DNA-binding domain (RED), as does Pax-5b, and the alternate C-terminal domain (UDE) as does Pax-5d. Thus, Pax-5e is unlikely to interact strongly with the Pax-5 DNA binding site directly but may compete with Pax-5a for functional associations with other co-repressors or co-activators. Although the majority of co-factors interact with the transactivating domain of Pax-5, two transcription factors, PU.1 and AML1, both essential for proper B-cell development were shown to bind to the paired domain of Pax-5 even in the absence of the transactivating domain [Libermann et al., 1999; Maitra and Atchison, 2000]. Pax-5e expression was found to increase relative to Pax-5d in B-cells undergoing accelerated growth. Also, it was shown experimentally that Pax-5e specifically enhances the activity of Pax-5a possibly via the recruitment, shuttling, and/or redox function in the B cell [Lowen et al., 2001].

Previous transfection studies using SDS-PAGE gels have revealed protein bands at 38 kDa [Lowen et al., 2001; Zwollo et al., 1997] and 27 kDa [Zwollo et al., 1997] that would correspond to Pax-5d and Pax-5e bound to protein(s) with a M.W. of 8-12 kDa. Furthermore, iodoacetamide treatment, which is known to prevent oxidation-dependent aggregation of cysteine-containing proteins, did not result in a significant shift in Pax-5e migration [Lowen et al., 2001]. Subsequent Western blot analysis revealed that substituting a serine for the most C-terminal cysteine residue in the UDE region of Pax-5e results in a protein that migrates to the expected 19 kDa position and abolishes the band at 27 kDa [Lowen et al., 2001]. This strongly suggested the formation of a disulphide bridge complex between Pax-5e and another protein. Resistance of a disulphide bridge complex to the reducing agents used in SDS-PAGE is possible in the case of either a single strong bond or multiple bonds. Subsequent experimental work pointed to a redox factor, the thioredoxin (TRX) protein, finally showing that the complex found at 27 kDa in the Pax-5e experiment can be immunoprecipitated using both anti-Pax-5e and anti-TRX antibodies [Lowen et al., 2001]. Thioredoxins are enzymes that participate in redox reactions via the reversible oxidation of an active centre disulphide bond. Interestingly, thioredoxins are known to lack a nuclear localisation signal (NLS) and therefore one suggested function of Pax-5e involves shuttling TRX into the nucleus [Powis and Montfort, 2001].

Even though the amino acid sequences of the murine Pax-5 isoforms are well known and the crystal structure of the paired domain (PAI and RED) bound to DNA and Ets-1 is available [Garvie et al., 2001; Wheat et al., 1999], there is no structural information about the C-terminal domains of these proteins. Over the last decade, several very accurate methods have been developed for protein secondary structure prediction, primarily based on the observation that similar protein sequences can be expected to have similar secondary structures, and several reports have demonstrated this direct relationship between protein sequence and structure [Chothia and Lesk, 1986; Rost, 1999; Devos and Valencia, 2000]. Modelling of a protein's 3D structure is also becoming more important as the gap between the number of known protein sequences and the number of known structures increases. For proteins with high sequence similarity to one or more template proteins of known structure, the most accurate structural predictions are achieved using comparative modelling. If a template structure cannot be identified, the target sequence may be modelled using de novo prediction methods [Schonbrun et al., 2002]. De novo methods have progressed to the point where models with correct overall topology can be obtained in a reasonable time. De novo methods start from the assumption that the native state of a protein is at the global free energy minimum, and carry out a large-scale search of conformational space for protein tertiary structures that are particularly low in free energy for the given amino acid sequence [Baker and Sali, 2001]. A method that has shown particularly good results in the CASP4 experiment is Rosetta [Schonbrun et al., 2002; Simons et al., 2001]. Rosetta is based on a picture of protein folding in which short segments of the chain independently sample distinct distributions of local conformations biased by their local sequence [Schonbrun et al., 2002].
The CASP4 experiment has shown that de novo methods give an overall correct structure, and the RMSD was estimated to be 4-8 Å for the Rosetta method applied to proteins 80 amino acids long with no sequence similarity to any protein of known structure [Baker and Sali, 2001]. Even though some initial information about protein function can be obtained on the basis of similarity with other, better known proteins using sequence alignment between the investigated sequence and protein databases, protein function information has to be further established by studying the protein structure and properties using molecular modelling [Gallet et al., 2000; Lu et al., 2001].

In this paper we investigated the properties of domain UDE in the murine Pax-5d and Pax-5e isoforms from the primary sequence and also from secondary and 3D structures modelled using de novo methods. From the theoretical structures obtained for UDE and the experimental structure of TRX we developed a possible model for the UDE-TRX interaction which would be in agreement with previously published experimental data.



Methods

All computational work was carried out using a Dell Workstation running under Linux RedHat 7.2. Figures were prepared within MOE and edited using StarOffice 6.0.


Sequence properties.

The protein sequences of the Pax-5 isoforms were obtained from the cDNA sequences using the Translate software available from the ExPASy web site [Appel et al., 1994]. General sequence properties were obtained using the program ProtParam [Appel et al., 1994]. Comparisons of the investigated sequence with protein sequences described in databases were carried out using the sequence alignment algorithms PSI-BLAST [Altschul et al., 1997] and WU-BLAST2 [Gish, 1996-2002]. The WU-BLAST2 search was performed against the Swiss-Prot database [Boeckmann et al., 2003]. The protein-protein interaction properties of thioredoxin were obtained from the DIP database [Xenarios and Eisenberg, 2001; Xenarios et al., 2002]. Information about transcription factors, i.e. Pax-5 and the interaction of thioredoxin with other transcription factors, was obtained from the TRANSFAC database [Wingender et al., 1996].


Secondary and 3D structure prediction.

The theoretical structures were obtained using the bioinformatics de novo protein structure prediction server HMMSTR/I-Sites/Rosetta [Bystroff et al., 2000; Simons et al., 1999]. The HMMSTR/Rosetta server predicts the secondary, local, supersecondary, and tertiary structure of proteins from the sequence. The HMMSTR software uses a hidden Markov model based on the I-sites library of sequence-structure motifs [Bystroff et al., 2000]. ROSETTA is a Monte Carlo fragment insertion protein folding program for the de novo prediction of tertiary structure [Simons et al., 1999]. The accuracy of the secondary structures predicted by HMMSTR was estimated by comparison with another secondary structure prediction method, PSIPRED. PSIPRED [McGuffin et al., 2000] is a simple and reliable secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST.
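The idea of Monte Carlo fragment insertion can be illustrated with a toy sketch. The code below is not Rosetta and does not reproduce its scoring function or its position-specific fragment libraries: the energy function `toy_energy`, the three-residue fragment list `FRAGMENTS`, the starting conformation and all parameter values are hypothetical and serve only to show how a fragment of backbone torsion angles is inserted at a random position and then accepted or rejected with the Metropolis criterion.

```python
import math
import random

# Hypothetical fragment "library": each fragment is three (phi, psi) pairs in degrees.
FRAGMENTS = [
    [(-120.0, 130.0)] * 3,   # extended-like fragment
    [(-60.0, -45.0)] * 3,    # helix-like fragment
    [(-90.0, 0.0)] * 3,      # turn-like fragment
]

def toy_energy(torsions):
    """Hypothetical score: squared deviation from an extended-like target.

    A real de novo method would use a physically or statistically derived
    energy function; this stand-in only gives the search something to minimise.
    """
    target_phi, target_psi = -120.0, 130.0
    return sum((phi - target_phi) ** 2 + (psi - target_psi) ** 2
               for phi, psi in torsions) / 1000.0

def fragment_insertion_mc(n_res, fragments, steps=2000, kT=1.0, seed=0):
    """Metropolis Monte Carlo over backbone torsions using fragment insertion."""
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    conf = [(-60.0, -45.0)] * n_res          # arbitrary starting conformation
    energy = toy_energy(conf)
    best_conf, best_energy = list(conf), energy
    for _ in range(steps):
        # Replace a randomly chosen window with a randomly chosen fragment.
        start = rng.randrange(n_res - frag_len + 1)
        fragment = rng.choice(fragments)
        trial = conf[:start] + list(fragment) + conf[start + frag_len:]
        trial_energy = toy_energy(trial)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-delta / kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            conf, energy = trial, trial_energy
            if energy < best_energy:
                best_conf, best_energy = list(conf), energy
    return best_conf, best_energy

final_conf, final_energy = fragment_insertion_mc(n_res=12, fragments=FRAGMENTS)
print(f"lowest toy energy found: {final_energy:.2f}")
```

The sketch keeps track of the lowest-energy conformation visited, so the returned energy can never exceed that of the starting conformation.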


Structure properties.

For the determination of structural properties, energy minimisation and structure comparisons we used the MOE program suite [MOE Molecular Operating Environment, Montreal, CCG Inc. 2001]. Any unusual structural features of the models were identified from Ramachandran plots using the MOE sequence editor. Surface properties were also calculated in MOE. The properties studied here were electrostatic properties, partial charges and hydrophobicity. Electrostatic properties of surfaces were calculated from the non-linear Poisson-Boltzmann equation as implemented in the MOE package. The results of all surface property calculations were represented on a solid grid.

The crystal structures of the protein TRX were obtained from the Brookhaven Protein Data Bank [Berman et al., 2000]. Multiple alignment was carried out using the ClustalW method [Thompson et al., 1994].



Results and discussion

Sequence Properties

The primary protein sequences of the murine Pax-5 isoforms are shown in Figure 1. Domains PAI and RED constitute the paired domain of Pax-5, which is involved in DNA binding [Lowen et al., 2001; Wheat et al., 1999]. The short leading sequence, INT (here named for the initial domain), exists in all isoforms and is thus likely to have the same role in all of them. The longest domain, BIG (named here for its size in comparison to the other segments), exists in the C-terminal part of Pax-5a and Pax-5b and includes the octapeptide, homeodomain and transactivation domain, which are all essential for the role of Pax-5 in transcriptional regulation. Finally, the domain unique to Pax-5d and Pax-5e, UDE (here named as the unique domain of isoforms d and e), is the focus of this study. Previous experimental efforts resulted in the hypothesis that the UDE domain is involved in binding to a TRX type protein [Lowen et al., 2001], but the theoretical feasibility of this binding, as well as a theoretical model for it, are unknown.

From the primary sequences of alternative splicing products it was possible to determine some general properties of the Pax-5 isoforms and the UDE domain, using the ProtParam program (Table 1). All isoforms except for Pax-5b are strongly positively charged. Pax-5b has a significantly lower positive to negative amino acids ratio, due to the deletion of PAI domain, and, according to pI prediction, Pax-5b is likely to be near electrically neutral at physiological pH. On the other hand, Pax-5e protein, also lacking PAI domain, is still strongly positively charged due to the properties of UDE domain.


Table 1: General properties of Pax-5 isoforms and UDE domain calculated using ProtParam.
                                              Pax-5a  Pax-5b  Pax-5d  Pax-5e  UDE
MW (kDa)                                      42      36      26      20      4.8
pI                                            9.08    7.79    10.02   9.8     10.05
Number of cysteines                           3       1       5       3       2
Number of negatively charged res. (Asp+Glu)   34      31      20      17      3
Number of positively charged res. (Arg+Lys)   40      32      36      28      8
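
The charge comparison between isoforms can be reproduced directly from the residue counts in Table 1. The following is a minimal sketch; the dictionary and function names are illustrative only, and the "net" value is simply (Arg+Lys) minus (Asp+Glu), which ignores histidine, the chain termini and pH, so it is not a substitute for the pI prediction of ProtParam.

```python
# Residue counts copied from Table 1.
POSITIVE = {"Pax-5a": 40, "Pax-5b": 32, "Pax-5d": 36, "Pax-5e": 28, "UDE": 8}   # Arg+Lys
NEGATIVE = {"Pax-5a": 34, "Pax-5b": 31, "Pax-5d": 20, "Pax-5e": 17, "UDE": 3}   # Asp+Glu

def charge_summary(positive, negative):
    """Return the crude net charge and positive/negative ratio per sequence."""
    return {
        name: {
            "net": positive[name] - negative[name],
            "ratio": positive[name] / negative[name],
        }
        for name in positive
    }

summary = charge_summary(POSITIVE, NEGATIVE)
for name, values in summary.items():
    print(f"{name:7s} net = {values['net']:+3d}   ratio = {values['ratio']:.2f}")
```

With these counts Pax-5b has the lowest ratio of positively to negatively charged residues and the isolated UDE domain the highest, consistent with the discussion above.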


The UDE domain has a high abundance of the charged residues arginine (11.6%) and lysine (7%) and of the polar residue serine (9.3%), as well as an unusually high concentration of proline (11.6%). Protein domains that are involved in protein-protein interaction have previously been shown to contain large concentrations of charged residues such as arginine and lysine, as well as larger percentages of proline and cysteine residues than the average for all proteins in the Swiss-Prot database [Gallet et al., 2000]. From the amino acid composition of UDE it can therefore be postulated that this domain is involved in protein-protein interactions.
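
The composition figures quoted for UDE can be checked from the 43-residue sequence given in Table 2. This is a minimal sketch using only the Python standard library; it is not the ProtParam implementation, and the function name is illustrative.

```python
from collections import Counter

# UDE sequence of Pax-5d/e as listed in Table 2.
UDE = "GKRWLRIPTRNAPSRVCVEPSQKGETKVQYDMLSCRGPGFPGS"

def composition(sequence):
    """Return (counts, percentages) of each amino acid in a one-letter sequence."""
    counts = Counter(sequence)
    percent = {aa: 100.0 * n / len(sequence) for aa, n in counts.items()}
    return counts, percent

counts, percent = composition(UDE)
for aa in "RKSPC":
    print(f"{aa}: {counts[aa]} residues, {percent[aa]:.1f}%")

positive = counts["R"] + counts["K"]   # Arg + Lys
negative = counts["D"] + counts["E"]   # Asp + Glu
print(f"positively charged: {positive}, negatively charged: {negative}")
```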


Homology search

Sequence alignment methods based on the homology search algorithms WU-BLAST2 and PSI-BLAST are well established for identifying sequences in well characterised proteins that are analogous to the investigated domain. The alignment for domain UDE, obtained using WU-BLAST2 against the protein sequences available in the Swiss-Prot database, is shown in Table 2 (PSI-BLAST results are the same and are not shown). The best and longest alignment is found with a segment of the NAD-reducing hydrogenase HOXS subunit (HOXF_ALCEU in Table 2 and further text). Subunits and of HOXS constitute an NADH oxidoreductase. The short sequence aligned with UDE is not within the oxido-reduction active site of HOXS, but this cysteine-rich C-terminal section was suggested previously as a possible site for the formation of regulatory disulphide bridges with thioredoxin's cysteine residues [Miginiac-Maslow and Lancelin, 2002; Schepens et al., 2000]. Zwollo and co-workers, 1997, determined experimentally that Pax-5e and Pax-5d form a strong bond with a thioredoxin-like protein through at least one of the two cysteine residues in the UDE domain. Thus, it appears that domain UDE creates disulphide bridge(s) with cysteines of the other protein. The alignment with the HOXF_ALCEU segment is achieved without the introduction of gaps, and both cysteine residues occupy homologous positions in the two sequences. Therefore, although the identity level is below 50%, the alignment results show that it is extremely likely that the cysteines in these two domains have the same role in disulphide bridge formation with another protein. The possibility of similar behaviour of HOXF_ALCEU and UDE is investigated further in the following sections.


Table 2: Results of WU-BLAST2 sequence alignment for domain UDE from Pax-5e and Pax-5d, presented in MView 1.41.8
                           1 [        .         .         .         .  ] 43
                   100.0%    GKRWLRIPTRNAPSRVCVEPSQKGETKVQYDMLSCRGPGFPGS   
 1 SW:HOXF_ALCEU   40.5%    GARDARAVQISGPSGECVSVAKDGERKLAYEDLSCNG------   
 2 SW:Y14B_BPT4    61.1%    -------------------------TEKQYDELFQRGPSMPGS   
 3 SW:VIF_HV2D1    35.1%    GKNWIVVPTWRVPGRM---PQRKGAARKQWRRDHWRG------   
 4 SW:LEU3_BUCUH   37.1%    GKRWDHLPINERPERASLLPLRK-----QFDVLLC--------   
 5 SW:LEU3_BUCAI   34.3%    GKKWDNLPVEQRPERAALLPLRK-----QFDVLLC--------   
 6 SW:LEU3_BUCML   34.3%    GKKWNNLPIEKRPERASLLPLRK-----QFDILLC--------   
 7 SW:YPT4_SCHPO   40.0%    ---WLSDIRAMAPSTICIDP-QDQSLGIQYGDLSFRRPVHPSS   
 8 SW:FZD9_HUMAN   35.1%    -----RLPTRNDPHALCME--------------ACRAPGSYG-   
 9 SW:VIF_HV2SB    50.0%    GKRWIAVPTWRVPGRM---------------------------   
10 SW:VIF_HV2ST    50.0%    GKRWIAVPTWRVPGRM---------------------------   
11 SW:LEU3_BUCUA   34.3%    GKKWDNLPINQRPERASLLPLRK-----QFDVLLC--------   
12 SW:CHAD_RAT     29.7%    -RRWLEAKT-SRPDATCSSPAkkGQRIRDTDAlsCKSP-----   
13 SW:HB2T_HUMAN   35.3%    ---------RRVQPRVNVSPSKKGPLQ-HHNLLVCHVTDfpGS   
14 SW:LEU3_BUCAP   34.3%    GKKWDYLPIESRPERASLLPLRK-----QFDILLC--------   
15 SW:LEU3_BUCDN   34.3%    GKKWDTLPINERPERASLLPLRK-----QFDVLLC--------   
16 SW:LEU3_BUCRP   34.3%    GKKWDYLPIESRPERASLLPLRK-----QFDVLLC--------   
17 SW:LEU3_BUCUO   34.3%    GKKWDNFPIEERPERAALLPLRK-----QFDVLLC--------   
18 SW:LE21_DEIRA   52.2%    GKTWLRVP---APGEVCVSTSNR--------------------   
19 SW:LEU3_BUCUD   34.3%    GKKWDDLPINQRPERASLLPLRK-----QFDILLC--------   
20 SW:LEU3_BUCUM   34.3%    GKKWDHLPIDKRPERASLLPLRK-----QFDILLC--------   
21 SW:FZD9_MOUSE   35.3%    -----RLPTRNDPHALCME--------------ACRTPG----   
22 SW:RR41_SCHPO   34.5%    GRRWDEM--RNFQCRIGIEPSENGSAFIE--------------   
23 SW:A2AA_RAT     41.5%    -KRRTRVpsRRGPD-ACSAPRAKGKTKASqdSLPRRGPGAAG-   
24 SW:YS71_MYCTU   36.4%    -RRGLNPPKPQAAGRYRVQPSGKGGLRPGVDLSS---------   
25 SW:HB2P_HUMAN   35.3%    ---------RRVQPRVNVSPSKKGPLQ-HHNLLVCHVTDfpGS   
26 SW:HB2Q_HUMAN   35.3%    ---------RRVQPRVNVSPSKKGPLQ-HHNLLVCHVTDfpGS   
27 SW:FOH1_MOUSE   38.1%    -QRWLRVGT---------DSSWKGGLKVPYNV----GPGFAGN   
28 SW:PYR1_SYNEL   34.3%    -------PSADVPPRACLGGSF-GESGRVyeVAGIRQPGYPG-   
29 SW:NADD_SALTI   31.7%    GERELQ---RNAPS-----------------LIVCRRPGYP--   
30 SW:LACA_EQUAS   33.3%    ----LEIILREGANHVCVE-----DTDYAHYMFFCVGPCLPSA   
31 SW:HACA_THETH   47.8%    GRTWLRVP---APGEVCVSTSNR--------------------   
32 SW:NADD_ECO57   31.2%    ---------RNAPS-----------------LIVCRRPGYP--   
33 SW:NADD_ECOLI   31.2%    ---------RNAPS-----------------LIVCRRPGYP--   
34 SW:NADD_SALTY   31.2%    ---------RNAPS-----------------LIVCRRPGYP--   
35 SW:VIF_HV2BE    30.6%    -RNWIVVPTWRVPGRM---PQRKGTARKQWRRDHWRG------   
36 SW:RS9_STAAM    41.7%    ------------------QPFDVTETKGNYDVlnVHGGGFTG-   
37 SW:THG2_ARATH   34.5%    -------------ARTCASQSQRFKGKCVSDTNccHNEGFPG-   
38 SW:FD6E_BRAJU   37.0%    GTPWVKAMWREAKECIYVEPDRQGEKK----------------   
39 SW:PEM1_PHACH   38.5%    ----LAAAVRAAPTAVC-------DTQVFLEVL-LKGVGFPGS   
40 SW:GSEP_BACLI   50.0%    -------PAQAAPSpvSSDPSYKAETSVTYD------------   
41 SW:HB2S_HUMAN   32.4%    ---------RRVQPKVNVSPSKKGPLQ-HHNLLVCHVTDfpGS   
42 SW:MYPH_HUMAN   36.8%    -----KVPTAEPPGEVAVSESTREEAKAVIDILVIEKPGPPSS   
   consensus/100%            ...........................................   
   consensus/90%             ...........................................   
   consensus/80%             ..........th.t.h...........................   
   consensus/70%             ..........phPttht..................h.......    
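
The two observations drawn from the first row of Table 2, the identity level and the matching cysteine positions, can be verified with a short column-by-column comparison. This is a minimal sketch; the function names are illustrative, and identity is computed here over the columns in which neither sequence has a gap, which is an assumption about how the percentage in Table 2 is normalised.

```python
# Query (UDE) and first hit (HOXF_ALCEU) rows copied from Table 2.
UDE_ROW  = "GKRWLRIPTRNAPSRVCVEPSQKGETKVQYDMLSCRGPGFPGS"
HOXF_ROW = "GARDARAVQISGPSGECVSVAKDGERKLAYEDLSCNG------"

def aligned_identity(row_a, row_b, gap="-"):
    """Return (identical columns, compared columns), skipping gapped columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    matches = sum(a == b for a, b in columns)
    return matches, len(columns)

def shared_positions(row_a, row_b, residue):
    """Return 1-based alignment columns where both rows carry the given residue."""
    return [i + 1 for i, (a, b) in enumerate(zip(row_a, row_b))
            if a == residue and b == residue]

matches, compared = aligned_identity(UDE_ROW, HOXF_ROW)
identity = 100.0 * matches / compared
shared_cys = shared_positions(UDE_ROW, HOXF_ROW, "C")
print(f"{matches}/{compared} identical columns ({identity:.1f}%)")
print("columns with cysteine in both sequences:", shared_cys)
```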


Secondary and tertiary structure modelling

Experimental structures are not available for either the UDE domain or the HOXF_ALCEU segment. In addition, there is no similarity between these two sequences and any of the sequences with known structures, making the use of comparative modelling impossible. Thus, structural information about these domains can only be obtained using one of the de novo modelling methods. The theoretical modelling method of choice in this study was the HMMSTR/I-sites/Rosetta suite. The reliability of the secondary structure predictions from HMMSTR was assessed by comparison with the secondary structures predicted by the PSIPRED method (Table 3) [Bairoch et al., 1997; Jones, 1999]. Sequences were submitted to each of the three HMMSTR models: one for the prediction of backbone angles (HMMSTR-R), one for the prediction of secondary structure (HMMSTR-D) and one for the prediction of superstructure (HMMSTR-C). Table 3 shows the results of the secondary structure models. For both the UDE domain of Pax-5e/d and the HOXF_ALCEU segment, the HMMSTR-D and PSIPRED methods are in excellent agreement. HMMSTR models are known to exhibit an imbalance in secondary structure prediction, usually over-predicting turns and under-predicting helices and strands; however, the length distributions of secondary structure elements are well reproduced [Bystroff et al., 2000]. In agreement with that observation, the HMMSTR secondary structure predictions contain more turns than the PSIPRED predictions.


Table 3: Secondary structure predictions for the UDE domain and the HOXF_ALCEU segment from PSIPRED and HMMSTR-D.
UDE of Pax-5d/e
  Sequence   GKRWLRIPTRNAPSRVCVEPSQKGETKVQYDMLSCRGPGFPGS
  PSIPRED    CCCEEECCCCCCCCCEECCCCCCCCCEEEEEEEECCCCCCCCC
  HMMSTR-D   LLLEEELLLLLLLLLEEELLLLLLLLLEELLLLLLLLLLLLLL
HOXF_ALCEU segment
  Sequence   GARDARAVQISGPSGECVSVAKDGERKLAYEDLSCNG
  PSIPRED    CCCCCEEEEEECCCCCEEEECCCCCCCEEECCCCCCC
  HMMSTR-D   LLLLLLEEEEELLLLLEEEEELLLLLLEEELLLLLLL
PSIPRED secondary structure: C = coil; E = strand
HMMSTR-D secondary structure: L = turn 0<phi<180; -90<psi<+90; E = extended -200<phi<20; +40<psi<260
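
The agreement between the two predictors in Table 3 can be quantified residue by residue. The sketch below is illustrative only: it assumes that the HMMSTR-D state L (turn) can be treated as equivalent to the PSIPRED state C (coil) for the purpose of comparison, which is a simplification introduced here and not part of either method.

```python
# Prediction strings copied from Table 3.
PREDICTIONS = {
    "UDE": {
        "PSIPRED":  "CCCEEECCCCCCCCCEECCCCCCCCCEEEEEEEECCCCCCCCC",
        "HMMSTR-D": "LLLEEELLLLLLLLLEEELLLLLLLLLEELLLLLLLLLLLLLL",
    },
    "HOXF_ALCEU": {
        "PSIPRED":  "CCCCCEEEEEECCCCCEEEECCCCCCCEEECCCCCCC",
        "HMMSTR-D": "LLLLLLEEEEELLLLLEEEEELLLLLLEEELLLLLLL",
    },
}

def agreement(psipred, hmmstr):
    """Count residues with the same state after mapping HMMSTR 'L' to 'C'."""
    if len(psipred) != len(hmmstr):
        raise ValueError("prediction strings must have the same length")
    mapped = hmmstr.replace("L", "C")   # assumption: turn is treated as coil
    same = sum(a == b for a, b in zip(psipred, mapped))
    return same, len(psipred)

results = {name: agreement(p["PSIPRED"], p["HMMSTR-D"])
           for name, p in PREDICTIONS.items()}
for name, (same, total) in results.items():
    print(f"{name}: {same}/{total} residues agree ({100.0 * same / total:.1f}%)")
```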


The five best 3D backbone structures for the UDE domain obtained by Rosetta are shown in cartoon backbone presentation in Figure 2. The overall topology is the same for all five structures, with only minor differences in the dihedral angles between models. Ramachandran plots for these models (not shown) demonstrate that most backbone dihedral angles are in energetically favoured regions and none of them is outside the allowed region. The secondary structure of model 5 is in the best agreement with the PSIPRED secondary structure prediction, and model 5 was therefore used in the subsequent analysis. Side-chains were added to the predicted structures using the DeepView software, and the structure of the side-chains of domain UDE (with the backbone co-ordinates fixed) was determined by energy minimisation using the MMFF94 force field in the MOE program.



Figure 2: The results of Rosetta de novo prediction of 3D structure of the UDE domain. Shown are five best models. Model 5 (yellow) has the best agreement with the secondary structure obtained using PSIPRED.


The HMMSTR/Rosetta method was also used for modelling the segment of HOXF_ALCEU. Ramachandran plots for these models, as in the case of the UDE domain, showed that most dihedral angles were within the core region and none was outside the allowed region. Once again, the model with the secondary structure most comparable to the PSIPRED secondary structure was chosen, side-chains were added using the DeepView software and their orientation was determined by finding the structure with the lowest energy using the MMFF94 force field in the MOE program.

The models of the UDE domain and of the HOXF_ALCEU segment are strikingly similar with respect to the distances between, and the surroundings of, the cysteine sulphurs. After energy minimisation of the side-chain orientations, the distance between the cysteine sulphurs was 20.6 Å in the HMMSTR model of UDE and 18.2 Å in the HOXF_ALCEU segment (Figure 3A). The overall topologies of the predicted structures of these two segments are also in very good agreement (Figure 3A). Finally, comparison of the properties of the amino acids in the vicinity of the two cysteines shows that the two segments are identical in terms of residue properties (Figure 3B). Thus, one can hypothesise that these two segments have the same role in one of the functions of Pax-5 and HOXF_ALCEU, even though the major functions of these two proteins are very different.
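
The sulphur-sulphur separation used in this comparison is a plain Euclidean distance between the two cysteine SG atoms of a model. The sketch below shows the calculation; the co-ordinates are hypothetical placeholders chosen for easy checking and are not taken from the Rosetta models, whose co-ordinates are not reproduced in this paper.

```python
import math

def sg_distance(sg_a, sg_b):
    """Euclidean distance (in Å) between two cysteine SG atoms given as (x, y, z)."""
    return math.dist(sg_a, sg_b)

def separation_difference(model_a, model_b):
    """Absolute difference between the S-S separations of two models."""
    return abs(sg_distance(*model_a) - sg_distance(*model_b))

# Hypothetical SG co-ordinates (Å) for two models; NOT the published models.
hypothetical_model_1 = ((0.0, 0.0, 0.0), (3.0, 4.0, 12.0))   # separation 13.0 Å
hypothetical_model_2 = ((1.0, 1.0, 1.0), (1.0, 1.0, 11.0))   # separation 10.0 Å

d1 = sg_distance(*hypothetical_model_1)
d2 = sg_distance(*hypothetical_model_2)
print(f"model 1: {d1:.1f} Å, model 2: {d2:.1f} Å, "
      f"difference: {separation_difference(hypothetical_model_1, hypothetical_model_2):.1f} Å")
```

Applied to the distances reported above, the same subtraction gives a difference of 2.4 Å between the UDE model (20.6 Å) and the HOXF_ALCEU segment (18.2 Å).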



Figure 3:
A. Comparison of Rosetta models of UDE and HOXF_ALCEU segments in chain colour cartoon backbone presentation with cysteine sulphurs shown as balls. Lines represent distances between cysteine sulphurs in UDE (model 5) and HOXF_ALCEU segments.

B. Residue colour presentation of UDE sequence of Pax-5d/e and HOXF_ALCEU segment, where colour code is: yellow – thiol (C), red – acidic (DE), blue – basic (RKH), pink – neutral hydrophilic (STNQ), olive – aliphatic (GAVILM), green – aromatic (FYW), orange – imino (P).


Thioredoxin Properties

One possible equivalent role of the UDE sequence of Pax-5d/e and the segment of the NAD-reducing hydrogenase HOXF_ALCEU is in the interaction with thioredoxins. The role of thioredoxins in redox regulation of transcription factors is well documented [TRANSFAC database; Nishiyama et al., 2001]. NMR structures of complexes between thioredoxin and the DNA-binding domains of the transcription factors NF-κB and Ref-1 are also available, and it is known that thioredoxin creates a disulphide bridge through cysteine 32 (active site) with the DNA-binding domain of these transcription factors. To the best of our knowledge, the interaction of thioredoxin with other domains in transcription factors has never been studied.

The primary sequence of mouse thioredoxin was obtained from the EntrezProtein database (#NP_035790). With a M.W. of 11.7 kDa, this protein fits well with the experimentally estimated weight of the protein interacting with isoforms Pax-5e and Pax-5d.


Table 4: Result of ClustalW comparison of Mus musculus and human thioredoxin
trx_mouse MVKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKYSNVVFLEVDVD 60

trx_human MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFLEVDVD 60

          *** **** ****** *****************************.:*****:*******

trx_mouse DCQDVAADCEVKCMPTFQFYKKGQKVGEFSGANKEKLEASITEYA 105

trx_human DCQDVASECEVKCMPTFQFFKKGQKVGEFSGANKEKLEATINELV 105

          ******::***********:*******************:*.* .



X-ray structures are available for human thioredoxin with the Cys 32/Cys 35 active site in both the oxidised and the reduced form. Sequence alignment of mouse and human thioredoxin using ClustalW showed that they are 89.2% identical (Table 4). Thus, the 3D structure of human thioredoxin can be expected to be closely comparable to that of mouse thioredoxin and can be used for structural investigations of interactions. The human thioredoxin X-ray structures for the oxidised and the reduced form of the protein were obtained from the Brookhaven Protein Data Bank (PDB entries 1ERU and 1ERT, respectively) and are shown in Figure 4. These structures show that formation of the Cys-Cys disulphide bridge in the active site of the oxidised form causes only minor changes in the overall structure of thioredoxin; binding of thioredoxin to another protein through other cysteines in the sequence should therefore not be affected by the oxidation state of thioredoxin. In addition, the structure shows that all cysteines in thioredoxin are surface exposed. Since free cysteines are almost always buried in a native protein, it would be very surprising to find surface-exposed cysteines that are not involved in disulphide bridge formation.
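As a rough illustration of how a percent identity can be read off an alignment such as the one in Table 4, the sketch below counts identical columns between the two sequences as printed there. It is a plain column count under the assumption of a gap-free alignment; it is not the ClustalW scoring procedure, and the function name is illustrative.

```python
# Sequences copied from Table 4 (the alignment there contains no gaps).
TRX_MOUSE = (
    "MVKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKYSNVVFLEVDVD"
    "DCQDVAADCEVKCMPTFQFYKKGQKVGEFSGANKEKLEASITEYA"
)
TRX_HUMAN = (
    "MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFLEVDVD"
    "DCQDVASECEVKCMPTFQFFKKGQKVGEFSGANKEKLEATINELV"
)


def identity(aligned_a, aligned_b, gap="-"):
    """Return (identical, compared) over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    identical = sum(a == b for a, b in pairs)
    return identical, len(pairs)


identical, compared = identity(TRX_MOUSE, TRX_HUMAN)
print(f"{identical}/{compared} identical positions "
      f"({100 * identical / compared:.1f}%)")
```

A plain count of this kind finds 92 identical positions out of 105, so it need not reproduce the score reported by ClustalW exactly; the two figures can differ with the alignment mode and scoring convention used by the program.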



Figure 4: Crystal structures of thioredoxin in oxidised (1ERU) and reduced (1ERT) form. Cysteines are shown in element colour, ball-and-stick presentation, while the rest of each protein is shown in chain colour, stick presentation. Hydrogen atoms are not included.


Surface Features of UDE and Thioredoxin

Of the five cysteines in thioredoxin, Cys 32 and Cys 35 form the active site for the redox reaction. These two cysteines were therefore not considered for interaction with the UDE domain. Of the remaining three cysteine residues, the sulphur atoms of two (residues 62 and 73) are separated by 19.09 Å, a distance comparable to the 20.6 Å between the cysteine sulphurs in the UDE model, which makes them a suitable site for the formation of two disulphide bridges with the UDE cysteines (Figure 4). In addition, WU-BLAST2 alignment of mouse thioredoxin showed that all mammalian thioredoxin sequences have cysteines at positions 62 and 73. The conservation of these residues points to a possibly significant role of these two cysteines in the function of thioredoxin in mammals. Furthermore, closer inspection of the thioredoxin and UDE structures showed that several non-polar rings on the surfaces of the two proteins could overlap once the disulphide bonds are formed, and in this way provide additional non-covalent binding.
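Sulphur-sulphur distances of the kind quoted above can be measured directly from the coordinates in a PDB file. The following is a minimal sketch, assuming standard fixed-column PDB ATOM records and a single protein chain; the function names are illustrative, and this is not the procedure used in MOE.

```python
import math
from itertools import combinations


def cysteine_sulphurs(pdb_lines):
    """Return {residue_number: (x, y, z)} for every cysteine SG atom.

    Assumes fixed-column PDB ATOM records and a single chain, so residue
    numbers are unique.
    """
    sulphurs = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        residue_name = line[17:20]        # columns 18-20
        if atom_name == "SG" and residue_name == "CYS":
            residue_number = int(line[22:26])  # columns 23-26
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            sulphurs[residue_number] = xyz
    return sulphurs


def sulphur_distances(pdb_lines):
    """Return {(res_i, res_j): distance in Å} for every pair of cysteine sulphurs."""
    sg = cysteine_sulphurs(pdb_lines)
    return {
        (a, b): math.dist(sg[a], sg[b])
        for a, b in combinations(sorted(sg), 2)
    }
```

Applied to the lines of a thioredoxin structure file, the entry for the pair (62, 73) would give the separation discussed in the text, and the same function applied to a UDE model would give the corresponding distance between Cys 17 and Cys 35.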

Graphical visualisation of the surface electrostatics and hydrophobic properties of macromolecular structures has had a significant impact on structural biology, and surface properties have been used extensively in the study of macromolecular interactions [Honig and Nicholls, 1995]. Here we compared general surface properties (electrostatic potential, hydrophobicity and partial charge distribution) in order to estimate the likelihood of interaction between UDE and thioredoxin at the positions of Cys 62 and Cys 73 (thioredoxin) and Cys 17 and Cys 35 (UDE); a pictorial representation of the results is shown in Figure 5. Electrostatic calculations were performed using the Poisson-Boltzmann equation as implemented in the MOE package. As expected from the amino acid composition, the UDE domain has a positively charged surface while thioredoxin is negatively charged, so the two proteins should attract each other electrostatically. At the expected binding site, estimated from the positions of the cysteines, the surfaces are electrostatically complementary, which suggests a strong attraction. Although there is no obvious complementarity in the distribution of partial charges, UDE is mostly either neutral or positive, while thioredoxin has more negative partial charges exposed. The surfaces are also compatible with regard to hydrophobicity. Both thioredoxin and UDE are mostly hydrophilic, but between the two binding sites on each protein there are comparable hydrophobic sections that would be shielded from the aqueous environment upon formation of a UDE-thioredoxin complex.
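The expectation that thioredoxin carries a net negative charge can be checked crudely from its amino acid composition alone. The sketch below simply counts formally charged side chains in the mouse sequence of Table 4. It is a deliberately simplified stand-in, not the Poisson-Boltzmann surface calculation performed in MOE; treating histidine as neutral and ignoring the chain termini are assumptions of this example, and the function name is illustrative.

```python
# Mouse thioredoxin sequence as printed in Table 4.
TRX_MOUSE = (
    "MVKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKYSNVVFLEVDVD"
    "DCQDVAADCEVKCMPTFQFYKKGQKVGEFSGANKEKLEASITEYA"
)


def net_charge_estimate(sequence, positive="KR", negative="DE"):
    """Crude net charge: count of basic minus count of acidic side chains.

    Histidine is left out of `positive` by default (assumed mostly neutral
    near pH 7); chain termini and pKa shifts are ignored.
    """
    seq = sequence.upper()
    n_positive = sum(seq.count(aa) for aa in positive)
    n_negative = sum(seq.count(aa) for aa in negative)
    return n_positive - n_negative


print(net_charge_estimate(TRX_MOUSE))
```

A negative value for thioredoxin, set against the excess of positively charged residues reported for the UDE domain, is consistent with the electrostatic attraction described above, although only the surface calculation shows where on each protein the charges are located.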



Figure 5: Comparison of the electrostatic potential, partial surface charge and hydrophobicity properties of UDE and thioredoxin. Cysteines proposed to be involved in bonding are indicated. The electrostatic units shown are relative and depend on the force field chosen. All calculations shown were performed using the MMFF94 force field, which makes the results shown here directly comparable.


Model of Thioredoxin-UDE Complex

The UDE-thioredoxin complex was finally formed by interactive superposition of the proposed binding sites and subsequent formation of two disulphide bridges, one between Cys 62 (TRX) and Cys 17 (UDE) and one between Cys 73 (TRX) and Cys 35 (UDE). Energy minimisation of the resulting complex was performed using the MMFF94 force field in the MOE program. The structure of the complex is shown in Figure 6 in chain colour, space-fill and cartoon backbone presentations.



Figure 6: Model of thioredoxin-UDE complex with two disulphide bridges. Active site sulphurs of thioredoxin are shown in yellow.


Following energy minimisation of the complex, the disulphide bond lengths were 2.05 Å and 2.29 Å (an acceptable S-S bond length is between 2 and 3 Å). The potential energy of the complex after minimisation is lower than the sum of the potential energies of UDE and thioredoxin before bonding (1908 + 4914 = 6822 kcal/mol):

Ep(complex) = 4306 kcal/mol; Ep(UDE) = 1908 kcal/mol; Ep(TRX) = 4914 kcal/mol

The thioredoxin active site (Cys 32 and Cys 35) remained exposed in the complex and can therefore take part in redox processes (labelled yellow in Figure 6). Alignment of the structures of thioredoxin and UDE before bonding with those after bonding and energy minimisation shows that only minimal changes in the structure of both proteins were necessary to minimise the energy of the complex with two disulphide bridges.

The UDE-thioredoxin complex shown in Figure 6 represents only one possible theoretical model, and experimental work will be necessary to validate the mode of interaction and the structure of this complex. Still, the model could be valuable in assisting the design of future experiments. It offers a working hypothesis for the location of the potential binding site and proposes a disulphide bonding pattern that can be tested by site-directed mutagenesis.




Conclusions

Various bioinformatics and molecular modelling methods were used to explore in greater detail the possibility of binding between thioredoxin and the UDE domain of Pax-5d and Pax-5e. Based on the results of this study, it can be concluded that UDE provides a very good binding site for thioredoxin, and it is hypothesised that the function of UDE in Pax-5e and Pax-5d is to bind thioredoxin. Molecular modelling resulted in a possible structure of the UDE-thioredoxin complex in which the redox active site of thioredoxin remains completely exposed and able to take part in oxidoreduction. Thioredoxin bonded to Pax-5d could thus maintain the PAI domain of the Pax-5d isoform in a reduced state. It is therefore hypothesised that binding of Pax-5d to thioredoxin can explain the apparent resistance of this isoform to oxidation that was previously shown experimentally. Pax-5e does not have the PAI domain, and thioredoxin in this complex must therefore have a different role. As Pax-5e was observed to increase the activity of Pax-5a, it is hypothesised here that thioredoxin bonded to UDE in Pax-5e maintains the PAI domain of the Pax-5a isoform in a reduced state. In this model Pax-5e would act as a vessel that brings thioredoxin close to the Pax-5 binding site. In addition, it was previously proposed that, as thioredoxin does not have an NLS, Pax-5d and Pax-5e could also function as a shuttle that transports thioredoxin into the nucleus.

This work shows one of many possible applications of readily available bioinformatics modelling tools in the study of protein-protein interactions.




Acknowledgements

Atlantic Canada Innovation Fund and the Dr. G. L. Dumont Hospital Foundation provided financial support for this project. GAR was supported by a CIHR postdoctoral fellowship.



References