In Silico Biology 2, 0032 (2002); ©2002, Bioinformation Systems e.V.  
GCB '01

Comparing bound and unbound protein structures using energy calculation and rotamer statistics

Kerstin Koch, Frank Zöllner, Steffen Neumann, Franz Kummert and Gerhard Sagerer




Technische Fakultät, AG Angewandte Informatik
Universität Bielefeld,
Postfach 100131,
33501 Bielefeld
Email: kerstin@techfak.uni-bielefeld.de,  fzoellne@techfak.uni-bielefeld.de,  sneumann@techfak.uni-bielefeld.de,  franz@techfak.uni-bielefeld.de,  sagerer@techfak.uni-bielefeld.de





Edited by E. Wingender; received December 14, 2001; revised and accepted February 11, 2002; published April 22, 2002


Abstract

Protein data in the PDB covers only a snapshot of a protein structure. For flexible docking, conformational changes need to be considered. Rotamer statistics provide the likelihood of side chain conformations, and a comparison of the bound and unbound states reveals differences in preferred positions. Furthermore, we perform a full sampling of selected angles and apply the AMBER force field. The conformations at the energy minima agree with the rotamer statistics. Both types of information aim at reducing the search space for enumerative docking algorithms and provide parameters for elastic docking.

Key words: Rotamer library, flexible protein-protein docking, energy calculations, AMBER force field, side chain flexibility, flexibility measure



Introduction

Proteins are essential for many functions in organisms, e.g. cell stability, immune response or catalysis of metabolic reactions. Many of these functions involve the binding of other molecules such as substrates, small regulatory molecules or proteins.

There are two main types of flexibility within a protein [5]: local side chain flexibility and domain movement, which involves whole parts of a protein. In our work we focus on the local flexibility of side chains, since the movement of side chains plays an important role during the induced fit of proteins. Using PDB structures that represent the same protein, we compare the bound and unbound forms and describe changes upon complex formation.

Current rotamer libraries compiled by Ponder and Richards [17], Tuffery et al. [20] and Lovell et al. [12] are used for side chain placement, e.g. in folding tasks. They provide probabilities for side chain conformations without considering the backbone conformation. Ponder and Richards chose 19 high quality PDB structures with R factors less than 0.18 for compiling their library. Because of their high quality data, they were able to reduce the means and standard deviations compared to other rotamer libraries, but no angles beyond χ2 are considered. Tuffery et al. use cluster analysis to describe their rotamers, but they apply a minimization procedure before compiling their library instead of using the raw data. The newer approach by Lovell et al. uses 240 PDB structures with a resolution better than 1.7Å and the mode of the rotamers instead of their mean.

In the backbone dependent rotamer library by Dunbrack et al. [3], the influence of the backbone conformation on the possible placement of the side chain is also taken into account. The probabilities of the rotamers of the first angle are estimated over the whole area of the Ramachandran plot. Bayesian statistics make it possible to estimate the distribution in regions with little or no data as well.

To overcome some crystallographic problems, Dunbrack and coworkers chose other rotamer definitions for those side chains which are branched or have planar ring systems. These side chains show a crystallographic uncertainty, because it cannot be observed which part of the branch carries the functional group or in which direction the ring points. Because the direction of a ring and the position of the branch are important for the nomenclature of the rotamers, they include values 180° apart from the original angle in the rotamer.

All of these rotamer libraries are compiled from data of unbound proteins, whereas our library contains complexes as well. For protein-ligand interaction, a different distribution of side chain conformations has been shown by Najmanovich et al. [13].

Janin and Wodak [8] compared force field calculations and rotamer statistics in a survey similar to our approach. At that time only a small number of proteins had been resolved crystallographically. Due to computational limitations only the first two angles were considered, and the calculations were done on isolated residues without considering the whole protein. A sampling rate of 20° was chosen for the energy calculations.

We use force field calculations to study side chain flexibility as well. For this we use structures of bound and unbound proteins and rotate one angle through 360° in steps of 5°, while keeping the other angles fixed. In the energy histogram of the synthetic conformations, some energy minima can be detected which do not turn up in statistical investigations. Therefore, protein conformations that are untypical from the point of view of a rotamer library can be investigated as well. This information can help to tune our measure of flexibility.

The article is structured as follows: First we characterize the dataset and the restrictions used for the selection of a subset of PDB entries. Section "Rotamer statistics and energy calculation" describes the methods used for the rotamer statistics and energy calculation. The results and a comparison between the conformations at energy minima and the crystal structure conformations follow in section "Results and discussion". We conclude the article with a summary and an outlook towards applications of the methods we presented.



Data

In this section we describe the data we used to carry out our calculations on conformation and flexibility of residue side chains.

All data is taken from the PDB [2]. The crystallographic resolution of the structures is 2Å or better. Each structure has been tested for valid bond lengths and missing atoms.

Some entries cannot be processed by our force field for various reasons, usually missing parameter sets for the hetero groups in the entry. Overall 33,300 residues were considered for energy calculation, not counting alanine, proline and glycine.

Table 1: Summarized data set

Data set            Entries  Residues
Complexes           98       26500
Unbound             202      42700
Energy calculation  220      33300

Our testset has been classified into complexes using PDB-at-a-glance [19] and CATH [15]. Proteins with only one chain sequence identical to a complex chain are classified as unbound.



Rotamer statistics and energy calculation

In this section the methods used to process the data on side chain conformations are presented.

First we give a short definition of our rotamers and the corresponding angle ranges. Then our two approaches to flexibility, energy calculations and rotamer statistics, are presented. Finally we describe how we compared the two approaches.


Definition of rotamers

There are up to six torsion angles per residue. The two backbone torsion angles φ and ψ describe the rotation of the backbone along the primary sequence. The side chain torsion angles χ1 up to χ4 are located along the side chain of a residue.

Angles were calculated for all 69,000 bound and unbound residues in our dataset in section "Data". Using the IUPAC Convention [7] these angles are placed in three discrete bins, as shown in Table 2.


Table 2: Angle ranges for the different rotamers

Rotamer  IUPAC Convention  Angle range
1        g-                0° to 120°
2        t                 120° to 240° and -120° to -180°
3        g+                240° to 360° and -120° to 0°
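
The binning of Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `rotamer_bin` is hypothetical, and the treatment of angles that fall exactly on a bin boundary (half-open intervals) is an assumption, since the text does not specify it.

```python
def rotamer_bin(angle_deg: float) -> int:
    """Return the rotamer number (1, 2 or 3) of Table 2 for a torsion angle in degrees.

    Accepts both the 0..360 and the -180..180 convention.
    Assumption: bins are half-open, i.e. [0, 120), [120, 240), [240, 360).
    """
    a = angle_deg % 360.0  # maps e.g. -60 to 300 and -180 to 180
    if a < 120.0:
        return 1  # 0 to 120 degrees
    if a < 240.0:
        return 2  # 120 to 240 degrees
    return 3      # 240 to 360 degrees


if __name__ == "__main__":
    for angle in (60.0, 180.0, -60.0, -180.0):
        print(angle, "->", rotamer_bin(angle))
```

Reducing the angle modulo 360° first means the two equivalent ranges listed for rotamers 2 and 3 do not need separate cases.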

Rotamer and synthetic protein conformation

The protein structure information from the PDB lacks hydrogen atoms, which are needed for correct energy calculations. Therefore we have added hydrogens to the protein structures. To obtain valid positions for the hydrogens, an energy minimization of the protein model is performed using the AMBER force field [21; 4] and the conjugate gradient minimizer from the BALL library [10].

Afterwards, synthetic protein conformations are built and scored. The conformations are created by rotating each side chain torsion angle through 360° at a sampling rate of 5°. For each of these conformations the total energy of the structure is calculated using the AMBER force field. (Footnote a)
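
The geometric part of this sampling can be sketched as follows. This is only an illustration under stated assumptions: the authors used the BALL library, whereas the sketch below uses plain Python and Rodrigues' rotation formula, and the names `rotate_about_axis` and `sample_torsion` are hypothetical. The energy evaluation itself is not shown.

```python
import math


def rotate_about_axis(point, origin, axis_end, angle_deg):
    """Rotate `point` by `angle_deg` about the axis origin -> axis_end (Rodrigues' formula)."""
    axis = [e - o for e, o in zip(axis_end, origin)]
    norm = math.sqrt(sum(c * c for c in axis))
    k = [c / norm for c in axis]                      # unit vector along the bond
    v = [p - o for p, o in zip(point, origin)]        # point relative to the axis origin
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    rotated = [v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
               for i in range(3)]
    return tuple(r + o for r, o in zip(rotated, origin))


def sample_torsion(moving_atoms, origin, axis_end, step_deg=5):
    """Yield (offset_deg, rotated_atoms) for the offsets 0, step, ..., 360 - step.

    `moving_atoms` are the coordinates of all atoms beyond the rotated bond;
    `origin` and `axis_end` are the two atoms that define the bond.
    """
    for offset in range(0, 360, step_deg):
        yield offset, [rotate_about_axis(p, origin, axis_end, offset) for p in moving_atoms]


if __name__ == "__main__":
    conformations = list(sample_torsion([(1.0, 0.0, 1.0)], (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
    print(len(conformations), "synthetic conformations per torsion angle")
```

With a step of 5° each torsion angle yields 72 synthetic conformations, each of which would then be scored with the force field.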

The total energy of the unmodified conformation is included for inspecting energy differences. The data is stored in a relational database for further processing (see [22]).

To investigate the flexibility we calculate the distribution of the angles for each residue type. For this distribution only angles are taken into account which correspond to the global energy minimum of the residue. The histogram h is defined as follows, with bins i in 5° steps:

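(1)  $$ h(i) = \sum_{\mathrm{residue}=1}^{N} \delta(\mathrm{residue}, i) $$

where

$$ \delta(\mathrm{residue}, i) = \begin{cases} 1 & \text{if } E(\mathrm{residue}, i) = \min_{j} E(\mathrm{residue}, j) \\ 0 & \text{otherwise} \end{cases} $$

and

0 < i,j < 360°   range of angles

E(residue,i)   energy for given residue and torsion angle

N   number of residues in testset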

We use the statistics software R [6] which implements histogram and density approximation algorithms.
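
The histogram of energy minima described above can be sketched in Python. This is a minimal illustration and not the authors' R code: the function name `energy_minimum_histogram` and the input layout (one dictionary of angle to energy per residue) are assumptions made for the example.

```python
from collections import Counter


def energy_minimum_histogram(energy_scans, step_deg=5):
    """Count, per angle bin, the residues whose global energy minimum lies in that bin.

    energy_scans: one dict per residue, mapping sampled torsion angle (degrees)
                  to the total energy of the corresponding synthetic conformation.
    Returns a list h with 360 / step_deg entries.
    Assumption: if several angles share the minimal energy, the first one wins.
    """
    n_bins = 360 // step_deg
    counts = Counter()
    for scan in energy_scans:
        best_angle = min(scan, key=scan.get)          # angle of the global minimum
        counts[int(best_angle % 360) // step_deg] += 1
    return [counts[b] for b in range(n_bins)]


if __name__ == "__main__":
    scans = [
        {0: 5.0, 60: -1.0, 180: 2.0},
        {0: 1.0, 60: 0.5, 180: -3.0},
        {60: -2.0, 300: 0.0},
    ]
    h = energy_minimum_histogram(scans)
    print([(b * 5, n) for b, n in enumerate(h) if n])
```

Each residue contributes exactly one count, so the histogram sums to the number of residues N.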




Comparing rotamer library and synthetic conformations

Both approaches provide a measure of flexibility for amino acid side chains. To compare them we calculated the correlation factor between the distributions for each type of amino acid:

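(2)  $$ r = \frac{\sum_{k=1}^{n} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{n} (x_k - \bar{x})^2 \, \sum_{k=1}^{n} (y_k - \bar{y})^2}} $$

where

x and y are the vectors to be correlated, here containing the counts of the histograms, $\bar{x}$ and $\bar{y}$ are their means, and n is the number of bins.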

We also applied a correlation technique taken from the field of image processing called histogram intersection [18]. We use the classical approach calculating the minimum distance where I and M are two histograms and n is the number of bins:

(3)  $$ H(I, M) = \frac{\sum_{j=1}^{n} \min(I_j, M_j)}{\sum_{j=1}^{n} M_j} $$
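
Both similarity measures can be sketched in plain Python. This is an illustration under stated assumptions: the correlation is taken to be the Pearson coefficient (the default of `cor` in R), and the histogram intersection is normalized by the sum of the model histogram M as in Swain and Ballard [18]; the function names are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equally long sequences of bin counts."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def histogram_intersection(i_hist, m_hist):
    """Sum of bin-wise minima, normalized by the total count of the model histogram."""
    return sum(min(a, b) for a, b in zip(i_hist, m_hist)) / sum(m_hist)


if __name__ == "__main__":
    rotamer_counts = [4, 0, 0]
    energy_counts = [2, 2, 0]
    print(pearson_correlation(rotamer_counts, energy_counts))
    print(histogram_intersection(rotamer_counts, energy_counts))
```

The intersection lies between 0 and 1 when both histograms contain the same number of samples; it rewards overlapping mass bin by bin, whereas the correlation coefficient compares the overall shape of the two distributions.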



Results and discussion

The methods explained in the last section were applied to the PDB entries in our testset. The resulting raw data is summarized in the following part.


Rotamer statistics

We used different granularities to look at side chain conformation. The backbone independent approach uses only the up to four side chain angles, whereas the backbone dependent approach also considers secondary structure and includes the backbone angles φ and ψ. Afterwards we describe changes upon complex formation. For this, all residues that occur both in the complex structure and in the unbound protein are taken. Later we will separate this dataset into exposed and buried residues for further investigations.


Backbone independent

The conformation of a side chain angle is investigated as a function of the preceding angle.

The distribution of the first angle is trimodal with peaks near -180°, -60° and +60°. Because χ1 is close to the backbone, its conformation is mainly influenced by the φ and ψ angles. The third rotamer (g+ conformation) is preferred, especially for polar or bulky amino acids [9].

The conformation of χ2 is less influenced by the backbone, although it mainly determines the position of the functional group. More flexibility is allowed for this angle. The distribution is trimodal for apolar residues and bimodal in the case of PHE, TYR and HIS. To avoid steric clashes, the ring system is positioned parallel to the backbone, and the rotamer denotes the orientation of the ring. In the case of the branched ARG and ASN residues, the planar end of the amino acid has to be positioned, which leads to a bimodal distribution as well.

Table 3: Rotamer dependencies

residue  χ1 rotamer: χ1 [%]  χ2 given χ1 [%]  χ1, χ2 combination over all [%]
         (χ2 rotamer)        1 2 3            1 2 3
ARG r1: 10.31 17.07 78.86 4.07 1.8 8.1 0.4
  r2: 23.55 4.27 90.39 5.34 1.0 21.3 1.3
  r3: 66.14 6.46 78.83 14.7 4.27 52.1 9.7
ASN r1: 15.32 45.86 14.29 39.85 7.0 2.2 6.1
  r2: 33.03 40.23 10.12 49.65 13.3 3.3 16.4
  r3: 51.65 17.10 20.97 61.93 8.8 10.8 32.0
ASP r1: 16.13 54.52 4.01 41.47 8.8 0.6 6.7
  r2: 33.55 61.90 11.58 26.53 20.8 3.9 8.9
  r3: 50.32 10.29 9.54 80.17 5.23 4.8 40.3
CYS r1: 8.69 NA
  r2: 17.44
  r3: 73.87
GLN r1: 6.98 0.79 91.34 7.87 0.1 6.4 0.5
  r2: 31.50 42.41 56.02 1.5 13.4 17.6 0.5
  r3: 61.52 6.97 57.73 35.30 4.3 35.5 21.7
GLU r1: 14.94 1.22 43.90 54.88 0.2 6.6 8.2
  r2: 29.23 23.68 71.96 4.36 6.9 21 1.3
  r3: 55.83 20.72 41.11 38.17 11.6 23 21.3
HIS r1: 19.16 3.25 0.81 95.93 0.62 0.2 18.4
  r2: 51.87 61.26 2.7 36.04 31.78 1.4 18.7
  r3: 28.97 27.42 2.15 70.43 7.94 0.6 20.4
ILE r1: 21.17 6.75 92.15 1.09 1.4 19.5 0.2
  r2: 13.25 24.49 73.47 2.04 3.2 9.7 0.3
  r3: 65.57 4.42 74.78 20.80 2.9 49.0 13.6
LEU r1: 1.61 47.73 45.45 6.82 0.8 0.7 0.1
  r2: 29.50 92.19 5.95 1.86 27.2 1.8 0.5
  r3: 68.90 10.98 86.53 2.49 7.6 59.6 1.7
LYS r1: 7.39 10.14 77.54 12.32 0.7 5.7 0.9
  r2: 29.62 12.84 83.18 3.98 3.8 24.6 3.98
  r3: 62.99 5.10 67.01 27.89 3.2 42.2 17.6
MET r1: 12.05 20.00 74.00 6.00 2.4 8.9 0.7
  r2: 29.64 10.57 86.99 2.44 3.1 25.8 0.7
  r3: 58.31 3.31 76.45 20.25 1.9 44.6 11.8
PHE r1: 31.17 60.00 0.00 40.00 18.7 0 12.4
  r2: 10.57 76.92 1.71 21.37 8.1 0.2 2.3
  r3: 58.27 51.16 6.05 42.79 29.8 3.5 25
SER r1: 41.38 NA
  r2: 21.89
  r3: 36.73
THR r1: 47.26 NA
  r2: 6.72
  r3: 46.02
TRP r1: 20.83 6.2 0 93.8 1.29 0 19.54
  r2: 23.85 56.63 1.2 42.17 13.5 0.29 10.06
  r3: 55.32 73.51 0.52 25.97 40.66 0.29 14.37
TYR r1: 16.17 59.32 0 40.68 9.6 0 6.58
  r2: 33.35 78.63 3.29 18.08 26.22 1.1 6.03
  r3: 50.48 31.76 1.54 66.7 16.03 0.78 33.66
VAL r1: 7.40 NA
  r2: 73.34
  r3: 19.26

In Table 3, the χ1, χ2 dependencies can be seen. The first column shows the distribution of the χ1 angle, the second column the percentage for a certain χ2 rotamer given a χ1 rotamer. The next columns contain the percentage for all χ1, χ2 combinations relative to all available data for a residue type.
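
The three kinds of columns are linked by a simple product: the overall percentage of a χ1, χ2 combination is the χ1 percentage times the conditional χ2 percentage. For the ARG row r1 and χ2 rotamer 2, for instance, 10.31% × 78.86% ≈ 8.1%, which is the value listed in the table. A minimal Python sketch of how such a table could be derived from raw angle pairs is given below; the function names and the half-open bin boundaries are assumptions made for illustration.

```python
from collections import Counter


def rotamer_bin(angle_deg):
    """Rotamer number 1, 2 or 3 for an angle in degrees (bins of 120 degrees, assumed half-open)."""
    return int((angle_deg % 360.0) // 120.0) + 1


def rotamer_dependencies(angle_pairs):
    """Tabulate chi1/chi2 rotamer percentages for one residue type.

    angle_pairs: iterable of (chi1, chi2) angles in degrees.
    Returns (marginal, conditional, joint), all in percent:
      marginal[r1]          share of chi1 rotamer r1
      conditional[(r1, r2)] share of chi2 rotamer r2 among residues with chi1 rotamer r1
      joint[(r1, r2)]       share of the combination among all residues
    """
    pairs = [(rotamer_bin(c1), rotamer_bin(c2)) for c1, c2 in angle_pairs]
    total = len(pairs)
    chi1_counts = Counter(r1 for r1, _ in pairs)
    pair_counts = Counter(pairs)
    marginal = {r1: 100.0 * n / total for r1, n in chi1_counts.items()}
    conditional = {(r1, r2): 100.0 * n / chi1_counts[r1]
                   for (r1, r2), n in pair_counts.items()}
    joint = {(r1, r2): 100.0 * n / total for (r1, r2), n in pair_counts.items()}
    return marginal, conditional, joint


if __name__ == "__main__":
    demo = [(60, 180), (60, 180), (60, -60), (-60, 180)]  # made-up angles, not data from the paper
    marginal, conditional, joint = rotamer_dependencies(demo)
    print(marginal, conditional, joint)
```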

The distribution of χ3 is bimodal in the case of GLN and GLU and trimodal for ARG and LYS residues. The highest peak in the distribution can be seen near 180° for ARG and LYS residues, near -70° for GLU and near 60° for GLN residues. The distribution of χ3 as a function of χ2 shows a preference for the r2=2, r3=2 rotamers for ARG and LYS (40% and 52% respectively).

For the branched side chains of GLU and GLN, the combinations of r2=2 with r3=1 and with r3=3 are equivalent (about 20% each) because of the symmetry of the side chain.

The χ4 angle of ARG and LYS residues shows a unimodal distribution with the highest peak near 80° for ARG and near 170° for LYS. The preferred χ3, χ4 combination for both residues is r3=2, r4=2, with 23% for ARG residues and 53% for LYS residues.

The rotamer distributions for ARG, ASN and GLU residues vary with the degree of solvent accessible surface area [16]. Because we have sparse data for exposed residues, we have not divided the data into exposed and buried samples for these residues. In further research, the data for these rotamers has to be split according to their accessibility and new rotamer calculations have to be done.

The ratio of amino acids changing their rotamer upon complex formation was checked. Residues which show a high percentage of rotamer changes are considered to be flexible. The flexibility scale using this measure is: ARG, SER, GLN, LYS > VAL, GLU, THR, ASN, ASP > ILE, MET, LEU, HIS, PHE, TYR, CYS, TRP.

In the first group 20% to 30% of the side chains change their χ1 rotamer, in the second group 12% to 14% move during docking, and in the last group 5% or less of the residues move. The percentage of side chains that change their rotamer can be used as a measure of flexibility, as explained in the summary section.

Directions of movement were investigated as well. If no direction were preferred, the expected percentage would be about 16.6% for each change (3 × 2 = 6 possible directions of movement). The observed percentages are higher in many cases, that is, some directions are clearly preferred. For example, 29% of the ARG side chains which change their rotamer move from the second to the first rotamer of χ1.

Table 4 shows a few examples of rotamer changes and their direction, which is given in the second column. The third column contains the percentage of a particular movement among the moving side chains, and the last column shows what percentage of all investigated residues of that type change their rotamer at all.

Table 4: Direction of movements for some residues.

residue  direction  % among moving side chains  % overall changes
ARG      2 → 1      29%                         30%
         1 → 3      27%
         2 → 3      18%
         3 → 2      17%
         3 → 1      8%
SER      1 → 3      40%                         25%
         3 → 1      23%
         2 → 1      10%
         1 → 2      10%
         2 → 3      9%
         3 → 2      8%
GLN      2 → 3      45%                         20%
         3 → 2      23%
         3 → 1      13%
         2 → 1      9%
         1 → 3      9%
         1 → 2      2%
ASN      3 → 2      31%                         12%
         2 → 3      27%
         1 → 2      17%
         3 → 1      13%
         1 → 3      11%
         2 → 1      1%
ASP      2 → 3      51%                         12%
         3 → 2      15%
         3 → 1      11%
         1 → 2      11%
         2 → 1      7%
         1 → 2      5%
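
The two percentages of Table 4 can be computed from paired torsion angles of the same residues in the unbound and the complexed structure. The following Python sketch illustrates the bookkeeping; the function names, the input layout and the half-open 120° bins are assumptions made for the example, not the authors' code.

```python
from collections import Counter

# Expected share of each direction if none were preferred: 3 start rotamers x 2 targets.
UNIFORM_EXPECTATION = 100.0 / 6


def rotamer_bin(angle_deg):
    """Rotamer number 1, 2 or 3 for an angle in degrees (bins of 120 degrees, assumed half-open)."""
    return int((angle_deg % 360.0) // 120.0) + 1


def rotamer_changes(unbound_angles, bound_angles):
    """Compare paired torsion angles (degrees) of the same residues in both states.

    Returns (percent_changed, directions) where
      percent_changed          share of all residues that change their rotamer
      directions[(src, dst)]   share of that direction among the changing residues
    """
    transitions = Counter()
    for unbound, bound in zip(unbound_angles, bound_angles):
        src, dst = rotamer_bin(unbound), rotamer_bin(bound)
        if src != dst:
            transitions[(src, dst)] += 1
    total = len(unbound_angles)
    moved = sum(transitions.values())
    percent_changed = 100.0 * moved / total if total else 0.0
    directions = {d: 100.0 * n / moved for d, n in transitions.items()}
    return percent_changed, directions


if __name__ == "__main__":
    unbound = [60, 180, 180, -60, -60]   # made-up angles, not data from the paper
    bound = [65, 60, 50, -70, 180]
    changed, directions = rotamer_changes(unbound, bound)
    print(changed, directions, UNIFORM_EXPECTATION)
```

Directions whose share clearly exceeds the uniform expectation of about 16.6% are the preferred movements.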


Backbone dependent

In the backbone dependent approach the influence of the backbone angles on the side chain angle is included. The (φ, ψ, χ1) data is clustered into 128 classes with the LBG cluster algorithm [11]. The probability of each class is calculated from the number of vectors in the class. These class probabilities can be used as a priori probabilities for the placement of side chains given the φ and ψ angles.
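
The clustering step can be illustrated with a compact Python sketch of the LBG algorithm in its codebook-splitting form. It is a simplified illustration under stated assumptions: the perturbation `eps`, the iteration limit and the function names are chosen for the example, the number of classes must be a power of two (128 = 2^7), and plain Euclidean distance is used, which ignores the periodicity of angles. The settings actually used for the library are not given in the text.

```python
def _dist2(a, b):
    """Squared Euclidean distance (assumption: angle periodicity is ignored)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid(vectors):
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))


def _assign(vectors, codebook):
    """Group the vectors by their nearest codebook centre."""
    clusters = [[] for _ in codebook]
    for v in vectors:
        nearest = min(range(len(codebook)), key=lambda k: _dist2(v, codebook[k]))
        clusters[nearest].append(v)
    return clusters


def lbg(vectors, n_classes, eps=0.01, max_iter=50):
    """Return (codebook, priors) for a list of equally long tuples.

    The codebook starts with the global centroid and is doubled by splitting
    every centre into two perturbed copies, followed by Lloyd iterations.
    """
    codebook = [_centroid(vectors)]
    while len(codebook) < n_classes:
        codebook = [tuple(x * (1 + s * eps) + s * eps for x in centre)
                    for centre in codebook for s in (+1, -1)]
        for _ in range(max_iter):
            clusters = _assign(vectors, codebook)
            updated = [_centroid(members) if members else centre  # keep empty classes in place
                       for members, centre in zip(clusters, codebook)]
            if updated == codebook:
                break
            codebook = updated
    # a priori probability of a class = fraction of vectors assigned to it
    priors = [len(members) / len(vectors) for members in _assign(vectors, codebook)]
    return codebook, priors


if __name__ == "__main__":
    data = [(-60, -45, 60), (-62, -47, 58), (-58, -43, 62),       # made-up (phi, psi, chi1) triples
            (-120, 130, -60), (-122, 128, -62), (-118, 132, -58)]
    centres, priors = lbg(data, 2)
    print(centres, priors)
```

For real data a periodic distance measure would be a natural refinement, since -179° and +179° are close as angles but far apart in Euclidean terms.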


Figure 1: 3D plots for (φ, ψ, χ1) data.
(1a) Distribution of χ1 according to the φ and ψ angles.
(1b) Centers of the 128 LBG clusters.

Figure 1a shows the distribution of χ1 according to the backbone angles φ and ψ. In Figure 1b the centers of the 128 clusters are shown. A view from the top onto the (φ, ψ) plane would yield the classical Ramachandran plot.

It can be seen that in the region of β-sheets (in the upper left part of the (φ, ψ) plane) all three χ1 rotamers are allowed, whereas in other regions of the Ramachandran plot the possibilities for particular rotamers are restricted.


Energy calculation on synthetic protein conformations

For all generated synthetic protein structures we calculated the potential energy using the AMBER force field [21]. A low potential energy suggests that a conformation is preferred among the others generated by rotating side chain torsion angles.

At first only one torsion angle is changed at a time. At a later stage we will investigate the relations between the different torsion angles. To examine the distribution of each torsion angle for each residue type, the energetic minimum of each residue is calculated and the corresponding angle is entered into a histogram, as shown in Figure 2.

The distribution of the torsion angles shows significant features. For the χ1 angle all distributions are trimodal, with peaks at the g-, t and g+ rotamers.


Figure 2: Distribution of the χ1 torsion angle for ARG.

The three rotamers are not equally populated. The g- position is least preferred. The t position is preferred by the residues ASP, ASN, SER and GLN. The g+ position is preferred by the rest of the residue types. This is due to different constraints, mostly steric hindrance, charge and polarity, or the length of the side chain.

The χ2 torsion angle behaves differently. Plotting the distribution shows a division of the residues into two groups. The first group of residues (ASN, ASP, HIS, TYR and PHE) shows two peaks, at the g- and g+ positions. The other amino acids (ARG, GLU, GLN, LYS, LEU, ILE, MET and TRP) have three rotamers. Some residues from the first group have a ring system within their side chain (HIS, TYR and PHE). This ring system is attached to the Cγ carbon atom, which is directly influenced by changes of the χ2 torsion angle. Changing the χ2 torsion angle turns the whole ring system. The distribution shows that the ring systems can exist in only two positions without steric hindrance.


Figure 3: Distribution of the χ2 torsion angle of PHE.

Although ASP and ASN do not contain a ring system in the side chain, they show the same characteristic distribution as HIS, TYR and PHE. Because the side chain branches at Cγ, these residues have the same steric restrictions. In addition, their side chain is charged (ASP) or polar (ASN), which affects the placement of the side chain as well.

The distribution of the second group of residues shows three peaks. This group consists of amino acids with very different side chains.
The functional group of ARG, LYS, GLU, MET and GLN sits at the end of the side chain and does not affect the χ2 angle directly. The long, stretched side chain has to be placed without steric hindrance. This is the same as for the χ1 torsion angle.
The side chains of ILE and LEU are branched but neither charged nor polar. The branched part of LEU is like the side chain of VAL. The only difference is that the LEU side chain has one more carbon atom. Fitting the LEU side chain at the χ2 position is the same procedure as fitting the VAL side chain, just shifted by one carbon atom.

The side chain of ILE is branched asymmetrically. This is reflected by the plot of the energy minima for ILE (see Figure 4a).


Figure 4: Distribution of the χ2 torsion angle of ILE and LEU.

TRP differs from the rest of this group. The side chain of TRP consists of a double ring system and its distribution lies somewhere in between. On the one hand it has characteristics of the first group mentioned above, but on the other hand there is also a small third peak.

The remaining torsion angles have been analysed, too. For the χ3 torsion angle the residues can also be divided into two groups. The first group contains GLN and MET, the second group ARG and LYS. The reasons for this partitioning are the same as described for the χ2 torsion angle.

GLU is an exception, since in the plot of energy minima (Fig. 5) no peaks can be determined. Because of the great distance to the backbone we assume no steric hindrance for this angle so that the rest of this side chain can be rotated freely.


Figure 5: Distribution of the χ3 torsion angle of GLU.

The χ4 torsion angles of ARG and LYS residues show a similar distribution, although the residues differ in their functional group. Their distribution of energy minima shows three peaks with a preference for the t position.

Unbound protein structures and the corresponding complex structures are compared to measure the flexibility upon complex formation. As in the statistical approach, rotamer changes have been detected and preferred directions of change can be pointed out.


Figure 6: Rotamer changes for the χ1 torsion angle of ARG.

In Figure 6, the first torsion angle of ARG residues in the unbound case is plotted against the corresponding angle in the complex form. Each point in the plot represents one residue of the testset. Residues on or near the diagonal do not change their rotamer. Points above or below the diagonal do change during docking (two examples are pointed out by arrows). Table 5 shows the preferred directions of rotamer changes for ARG.

Table 5: Rotamer changes of amino acid side chain torsion angles in ARG.

Rotamer change from unbound to complex  % for χ1  % for χ2  % for χ3  % for χ4
g- → t                                  19        24        62        28
g- → g+                                 8         10        32        26
t → g-                                  30        20        10        10
t → g+                                  31        31        37        54
g+ → g-                                 3         2         5         8
g+ → t                                  5         6         26        15

On the one hand, each torsion angle has preferred directions of change (e.g. from t to g+ for χ1). On the other hand, these preferences change from one torsion angle to the next. We think that this relates to the greater distance from the backbone: the higher torsion angles are subject to fewer steric restrictions and can therefore rotate more freely. The number of side chains that change their rotamer varies with the distance from the backbone (see Table 6).

Table 6: Percentage of rotamer changes for each torsion angle in ARG.

torsion angle  % side chains changing
χ1             37
χ2             35
χ3             65
χ4             53

Similar results can be observed for the remaining residues. Each residue type can be ranked by the percentage of changes, here shown for the χ1 torsion angle:

SER > ARG > GLN > VAL > ASN > ASP > THR > LYS > LEU > ILE > PHE > GLU > HIS > MET/TRP > TYR > CYS



Comparison of the statistical and energetic best conformation

In this section we compare the results from rotamer statistics and energy calculation. The comparison is done by calculating the correlation factor between the histograms of torsion angles generated by the two approaches. We used a standard correlation method (see Figure 10 in the Appendix). In addition we applied the histogram intersection (see Figure 11).

Figure 7a: Histogram plot of rotamer distribution (red/gray) and energetic best torsion angle distribution (blue/black) for all ARG in the dataset, correlation coefficient 0.95 (standard method) / 0.849 (histogram intersection).
Figure 7b: Correlation of all residues for χ1.

For ARG residues, the two approaches show a good correlation for the χ1 angle (Figure 7). The calculated correlation coefficient is 0.95. The histograms for the higher torsion angles (χ2 to χ4) do not correlate that well, but this can be attributed to a greater degree of flexibility (see Figures 8 and 9).


Figure 8: Histogram plot of rotamer distribution (red) and energetic best torsion angle distribution (blue) for all ARG in dataset, correlation coefficient 0.795 (standard method) / 0.758 (histogram intersection).

For comparing the histograms of the two approaches, we implement a histogram intersection. Intersecting histograms result in a model histogram [18]. By normalizing the intersection we get a value for the correlation of the two histograms (see Figure 11 at appendix "Results of ..."). The agreement of the two approaches can be seen as well.


Figure 9: Histogram of rotamer distribution (red) and energetic best torsion angle distribution (blue) for all ARG in dataset, correlation coefficient 0.401 (standard method) / 0.663 (histogram intersection).

As described above, both methods result in nearly equal rotamer distributions. Differences in the distributions may occur because some side chains in crystal structures can differ from the optimal conformation without affecting the energetic state of the whole protein much. By generating the synthetic conformations for a side chain we get the globally best conformation of that side chain, neglecting any adaptation of the rest of the protein, because all other residues are held fixed.

In Table 7 we have summarized the flexibility scales of both approaches:

Table 7: Flexibility scales for both approaches ordered by percentage of overall changes

statistical approach    energy calculations
residue  % changes      residue  % changes
ARG      30             SER      40.0
SER      25             ARG      37.0
GLN      20             GLN      28.4
LYS      20             VAL      27.7
VAL      14             ASN      26.6
GLU      13             ASP      23.3
THR      13             THR      19.3
ASN      12             LYS      9.6
ASP      12             LEU      9.1
ILE      5              ILE      8.8
MET      5              PHE      8.2
LEU      2              GLU      8.0
HIS      2              HIS      7.0
PHE      1              MET      4.4
TYR      1              TRP      4.4
CYS      0.5            TYR      4.3
TRP      0.1            CYS      3.9

Comparing the rotamer changes found by both approaches, differences in the ranking of the residues can be seen. Some residues are ranked equally (GLN, THR, ILE, HIS and CYS), some differ by only one position (ARG, SER, TYR, TRP, VAL), and the rankings of all other residues differ by more than one position.

We think the differences in the flexibility scales of the two approaches are due to the fact that the number of samples processed for the unbound/complex comparison in the energetic approach is much smaller than the amount of data used in the statistical approach (see also section "Data"). With ongoing calculations we hope that these differences will disappear. Summarising these results, one can observe that the side chains of ARG, SER and GLN change their rotamer most frequently. The side chains of TYR, TRP and CYS are nearly inflexible and rarely change their rotamer. In between these groups there are residues whose side chain flexibility varies between the two approaches.


Summary and outlook

The results show that both rotamer statistics and AMBER energy calculation agree on allowed and preferred side chain conformations. We have shown that a statistical approach to side chain conformations comparing bound and unbound form of a protein yields obvious differences.

Fast methods for volume based protein-protein docking exist. However they employ the rigid-body assumption, and often miss the correct solution. Flexibility can be considered by enumerating possible conformations starting with the unbound protein using information about preferred movements. A different approach uses a weighting mechanism which assigns weights to the residues and thus decreases the penalizing influence of some residue which is likely to have a different conformation in the complexed form.

Finally the amino acids can be ranked by their flexibility, and depending on the available computing power the less flexible ones can be excluded from flexibility considerations and processed under the rigid-body assumption.

Future work will be focused on using the available information to reduce the search space for enumerative docking algorithms. We will extract parameters for elastic docking [14] using pairwise distances between bound and unbound data.

Assessing flexibility automatically is even more important since in a 1:n docking scenario manual inspection of flexibility is not feasible.


Acknowledgements

This work was supported by the German Research Foundation (DFG) within the graduate programs "Structure Formation" and "Bioinformatics" and the research focus "Analysis and interpretation of large genomic data sets".


Appendix - Results of correlation experiments

In this section the results of the correlation between the rotamer statistics and the synthetic protein conformations are shown. Figure 10 shows the correlation coefficients obtained using standard correlation methods implemented in R [6].


Figure 10: Correlation coefficient for different angles.

Overall, the coefficients indicate that the data correlate well, except for χ3 of GLU, where the correlation is only 0.46. Figure 11 shows our second correlation method, histogram intersection. Looking at these correlation results, most amino acids have a correlation coefficient around 0.75. Here the correlation of χ3 of GLU performs much better and reaches a value of 0.68.


Figure 11: Coefficient of histogram intersection for different angles.


References

  1. Aubertin, T., Boghossian, N. P., Burchardt, A., Hildebrandt, A., Klein, H., Kerzmann, A., Kohlbacher, O., Lenhof, H.-P., Moll, A., Müller, P. and Strobel, S. (2002). BALL reference manual. http://www.mpi-sb.mpg.de/BALL/doc_1.0b/html/AmberFF.html

  2. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. and Bourne, P. E. (2000). The protein data bank. Nucleic Acids Res. 28, 235-242.

  3. Bower, M., Cohen, F. E. and Dunbrack, R. L. (1997). Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: A new homology modelling tool. J. Mol. Biol. 267, 1268-1282.

  4. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M., Ferguson, D. M., Spellmeyer, D. C., Fox, T., Caldwell, J. W. and Kollman, P. A. (1995). A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J. Am. Chem. Soc. 117, 5179-5197.

  5. Gerstein, M., Lesk, A. and Chothia, C. (1994). Structural mechanisms for domain movements in proteins. Biochemistry 33, 6739-6749.

  6. Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics 5, 299-314.

  7. IUPAC-IUB Commission on Biochemical Nomenclature (CBN) (1967). Abbreviations and symbols for the description of the conformation of polypeptide chains. http://www.chem.qmw.ac.uk/iupac/misc/noGreek/ppep1.html.

  8. Janin, J. and Wodak, S. (1978). Conformation of amino acid side-chains in proteins. J. Mol. Biol. 125, 357-386.

  9. Koch, K., Neumann, S. and Sagerer, G. (2000). Towards a protein-protein docking optimized rotamer library. In: GCB 2000, Poster Abstracts, page 41.

  10. Kohlbacher, O. (2000). New approaches to protein docking. PhD thesis, University Saarbrücken.

  11. Linde, Y., Buzo, A. and Gray, R. M. (1980). An algorithm for vector quantizer design. IEEE Trans. on Communications COM 28, 84-95.

  12. Lovell, S. C., Word, J. M., Richardson, J. S. and Richardson, D. C. (2000). The penultimate rotamer library. Proteins 40, 389-408.

  13. Najmanovich, R., Kuttner, J., Sobolev, V. and Edelman, M. (2000). Side-chain flexibility in proteins upon ligand binding. Proteins 39, 261-268.

  14. Neumann, S., Posch, S. and Sagerer, G. (1999). Towards evaluation of docking hypotheses using elastic matching. In: Computer Science and Biology: Proceedings of the German Conference on Bioinformatics, page 220.

  15. Orengo, C. A., Pearl, F. M., Bray, J. E., Todd, A. E., Martin, A. C., Lo Conte, L. and Thornton, J. M. (1999). The CATH database provides insights into protein structure/function relationships. Nucleic Acids Res. 27, 275-279.

  16. Pickett, S. D. and Sternberg, M. J. (1993). Empirical scale of side-chain conformational entropy in protein folding. J. Mol. Biol. 231, 825-839.

  17. Ponder, J. W. and Richards, F. M. (1987). Tertiary templates for proteins: Use of packing criteria in the enumeration of allowed sequences for different structural classes. J. Mol. Biol. 193, 775-791.

  18. Swain, M. J. and Ballard, D. H. (1991). Color indexing. International Journal of Computer Vision 7, 11-32.

  19. The Center for Molecular Modeling (CMM) (1996). PDB-at-a-glance. http://cmm.info.nih.gov/modeling/pdb_at_a_glance.html. Accessed 11 December 2001.

  20. Tuffery, P., Etchebest, C. and Hazout, S. (1997). Prediction of protein side chain conformations: a study on the influence of backbone accuracy on conformation stability in the rotamer space. Protein Eng. 10, 361-372.

  21. Weiner, S. J., Kollman, P. A., Case, D. A., Singh, U. C., Ghio, C., Alagona, G., Profeta, S. and Weiner, P. (1984). A new force field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc. 106, 765-784.

  22. Zöllner, F. (2001). Bewertung der Flexibilität von Aminosäureseitenketten in Proteinkonformationen durch empirische Energiefelder. Master's thesis, Universität Bielefeld.


Footnotes

Footnote a:
The following parameters are used: a non-bonded cutoff of 200Å, a van der Waals cutoff of 150Å, a van der Waals cuton of 130Å, an electrostatic cutoff of 150Å, an electrostatic cuton of 130Å, an electrostatic scaling factor for 1-4 interactions of 2.0, a van der Waals scaling factor for 1-4 interactions of 2.0 and a distance dependent dielectric constant of 1.0 (standard values that come with the BALL library [1]). All calculations are done without solvent molecules.