| In Silico Biology 2, 0032 (2002); ©2002, Bioinformation Systems e.V. |
| GCB'01 |
Technische Fakultät, AG Angewandte Informatik
Universität Bielefeld,
Postfach 100131,
33501 Bielefeld
Email: kerstin@techfak.uni-bielefeld.de, fzoellne@techfak.uni-bielefeld.de, sneumann@techfak.uni-bielefeld.de,
franz@techfak.uni-bielefeld.de, sagerer@techfak.uni-bielefeld.de
Edited by E. Wingender; received December 14, 2001; revised and accepted February 11, 2002; published April 22, 2002
Protein data in the PDB cover only a snapshot of a protein
structure. For flexible docking, conformational changes need to be
considered. Rotamer statistics provide the likelihood of side chain
conformations, and a comparison of the bound and unbound states
yields differences in preferred positions. Furthermore, we perform a full sampling of selected
angles and apply
the AMBER force field. The conformations at the energy minima comply with the
rotamer statistics. Both types of information aim at reducing the search space for
enumerative docking algorithms and provide parameters for elastic
docking.
Key words: Rotamer library, flexible protein-protein docking, energy calculations, AMBER force field, side chain flexibility, flexibility measure
Proteins are essential for many functions in organisms, e. g. cell stability, immune response or catalysis of metabolic reactions. Many of these functions involve binding of other molecules and substrates, small regulatory molecules or proteins.
There are two main types of flexibility within a protein [5]: local side chain flexibility and domain movement, which involves whole parts of a protein. In our work we focus on the local flexibility of side chains, since the movement of side chains plays an important role during the induced fit of proteins. Using PDB structures that represent the same protein, we compare the bound and unbound forms and describe the changes upon complex formation.
Current rotamer libraries compiled by Ponder and
Richards [17], Tuffery et al. [20] and Lovell et
al. [12] are used for side chain placement, e. g. in folding tasks. They provide probabilities for side chain conformations
without regard to the backbone conformation. Ponder and Richards
chose 19 high quality PDB structures with R factors less than 0.18
for compiling their library. Because of their high quality data,
they were able to reduce the means and standard deviations compared
to other rotamer libraries, but no angles higher than
χ2 are
considered. Tuffery et al. use cluster analysis to describe their
rotamers, but they apply a minimization procedure before compiling their
library instead of using the raw data. The newer approach by Lovell et
al. uses 240 PDB structures with a resolution better than 1.7Å
and the mode of the rotamers instead of their mean.
In a backbone dependent rotamer library by Dunbrack et al.
[3], the influence of the backbone conformation on a possible
placement of the side chain is also taken into account. The
probabilities for the rotamers of the first
χ angle are estimated
over the whole area of the Ramachandran plot. Bayesian statistics make it possible to
estimate the distribution in regions with little or no data as well.
To overcome some crystallographic problems, Dunbrack and coworkers
chose other rotamer definitions for those side chains which are
branched or have planar ring systems. These side chains show a
crystallographic uncertainty, because it cannot be observed which
part of the branch carries the functional group or in which direction
the ring is oriented. Because the direction of a ring and the position
of the branch are important for the nomenclature of the rotamers, they
include
χ values 180° apart from the original
χ angle in the rotamer.
All of these rotamer libraries are compiled from data on unbound proteins, whereas our library contains complexes as well. For protein-ligand interaction a different distribution of side chain conformations has been shown by Najmanovich et al. [13].
Janin and Wodak [8] compared force field calculations and
rotamer statistics in a survey similar to our approach. At that time
only a small number of protein structures had been resolved crystallographically.
Due to computational limitations only the first two
χ angles were
considered, and the calculations were done on isolated residues without
regard to the whole protein. A sampling step of 20° was chosen
for the energy calculations.
We use force field calculations to study side chain flexibility as
well. To this end we use structures of bound and unbound proteins and
rotate one
χ angle through 360° in steps of 5°, while
keeping the other angles fixed. In the energy
histogram of the synthetic conformations, some energy minima can be
detected which do not turn up in statistical investigations. Therefore
atypical protein conformations (from the view of a rotamer library)
can be investigated as well. This information can help to tune our
measure of flexibility.
The article is structured as follows: First we characterize the dataset and the restrictions used for the selection of a subset of PDB entries. Section "Rotamer statistics and energy calculation" describes the methods used for the rotamer statistics and energy calculation. The results and a comparison between conformations in energy minima and crystal structure conformations follow in section "Results and discussion". We conclude the article with a summary and an outlook towards applications of the methods we presented.
In this section we describe the data we used to carry out our calculations on conformation and flexibility of residue side chains.
All data is taken from the PDB [2]. The crystallographic resolution of the structures is 2Å or better. Each structure has been tested for valid bond length and missing atoms.
Some entries cannot be processed by our force field for various reasons, usually missing parameter sets for the hetero groups in the entry. Overall 33,300 residues were considered for energy calculation, not counting alanine, proline and glycine.
| | Entries | Residues |
|---|---|---|
| Complexes | 98 | 26500 |
| Unbound | 202 | 42700 |
| Energy calculation | 220 | 33300 |
Our testset has been classified into complexes using PDB-at-a-glance [19] and CATH [15]. Proteins with only one chain sequence identical to a complex chain are classified as unbound.
In this section the methods used to process the data on side chain conformations are presented.
First we give a short definition of our rotamers and the corresponding angle ranges. Then our two approaches to flexibility, energy calculations and rotamer statistics, are presented. Finally we describe how we compared the different approaches.
Definition of rotamers
There are up to six torsion angles per residue. The two backbone
torsion angles φ and ψ describe the rotation of the backbone
along the primary sequence. The side chain torsion angles
χ1 up to χ4 are located along the side chain of a residue.
Angles were calculated for all 69,000 bound and unbound residues of the dataset described in section "Data". Using the IUPAC convention [7], these angles are placed in three discrete bins, as shown in Table 2.
Table 2: Angle ranges for the different rotamers
| Rotamer | IUPAC Convention | angle range |
|---|---|---|
| 1 | g | 0° to 120° |
| 2 | t | 120° to 240° and -120° to -180° |
| 3 | g+ | 240° to 360° and -120° to 0° |
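The binning in Table 2 can be sketched as a small function. This is a minimal illustration, not the authors' code: the name `rotamer_bin` is hypothetical, and the treatment of angles that fall exactly on a bin boundary (half-open intervals) is an assumption, since Table 2 does not specify it.

```python
def rotamer_bin(angle_deg: float) -> int:
    """Return rotamer 1, 2 or 3 for a torsion angle in degrees (Table 2).

    Assumption: bins are half-open, [0, 120), [120, 240), [240, 360).
    """
    a = angle_deg % 360.0  # maps negative angles, e.g. -60 -> 300
    if a < 120.0:
        return 1
    if a < 240.0:
        return 2
    return 3


# Labels as printed in Table 2
ROTAMER_LABEL = {1: "g", 2: "t", 3: "g+"}

if __name__ == "__main__":
    for angle in (60.0, 180.0, -180.0, -60.0):
        print(angle, rotamer_bin(angle), ROTAMER_LABEL[rotamer_bin(angle)])
```

Mapping every angle into [0°, 360°) first means that the two equivalent ranges given per row in Table 2 are handled by a single comparison.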
Rotamer and synthetic protein conformation
The protein structure information from the PDB lacks hydrogen atoms, which are needed for correct energy calculations. Therefore we have added hydrogens to the protein structures. To obtain valid positions for the hydrogens, an energy minimization of the protein model is performed using the AMBER force field [21; 4] and the conjugate gradient minimizer from the BALL library [10].
Afterwards, synthetic protein conformations are built and scored. The conformations are created by rotating each side chain torsion angle by 360°. We use a sampling rate of 5°. For each of these conformations the total energy of the structure is calculated using the AMBER force field. (Footnote a)
The total energy of the unmodified conformation is included for inspecting energy differences. The data is stored in a relational database for further processing (see [22]).
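The 5° sampling of one torsion angle can be sketched as follows. This is only an illustration under stated assumptions: `toy_energy` is a hypothetical stand-in for the total AMBER energy of the modified structure (it is a single threefold cosine term of the kind used in force field torsion potentials), and `scan_torsion` is an assumed helper name, not a BALL function.

```python
import math


def scan_torsion(energy_fn, step_deg=5):
    """Evaluate energy_fn at 0, step, ..., 360 - step degrees.

    Returns a list of (angle_deg, energy) pairs, one per synthetic conformation.
    """
    return [(angle, energy_fn(angle)) for angle in range(0, 360, step_deg)]


def toy_energy(angle_deg, barrier=3.0):
    """Hypothetical stand-in for the force field: threefold cosine term."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(3 * angle_deg)))


if __name__ == "__main__":
    profile = scan_torsion(toy_energy)
    best_angle, best_energy = min(profile, key=lambda pair: pair[1])
    print(len(profile), "conformations, lowest energy at", best_angle, "degrees")
```

With a 5° step the scan yields 72 conformations per torsion angle; in the real setting each evaluation would be a full energy calculation on the whole structure.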
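To investigate the flexibility we calculate the distribution of the
χ angles for each residue type. For this distribution only those
angles are taken into account which correspond to the global energy
minimum of the residue. The histogram h is defined as follows, with
bins i in 5° steps:

$$h(i) = \sum_{\mathrm{residue}=1}^{N} \delta(\mathrm{residue}, i) \qquad (1)$$

where

$$\delta(\mathrm{residue}, i) = \begin{cases} 1 & \text{if } E(\mathrm{residue}, i) \le E(\mathrm{residue}, j) \text{ for all } j \\ 0 & \text{otherwise} \end{cases}$$

and
0 < i,j < 360°: range of angles
E(residue,i): energy for given residue and torsion angle
N: number of residues in testset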
We use the statistics software R [6] which implements histogram and density approximation algorithms.
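The histogram of energy-minimum angles can be sketched as below. This is a minimal illustration, assuming each residue is represented by a mapping from sampled angle (in degrees) to energy; the function name and the data layout are hypothetical and do not reflect the database schema or the R scripts used by the authors.

```python
def minimum_angle_histogram(profiles, step_deg=5):
    """Count, per 5-degree bin, how many residues have their energy minimum there.

    profiles: iterable of dicts {angle_deg: energy}, one dict per residue.
    """
    n_bins = 360 // step_deg
    hist = [0] * n_bins
    for energies in profiles:
        best_angle = min(energies, key=energies.get)  # angle of lowest energy
        hist[int((best_angle % 360) // step_deg)] += 1
    return hist


if __name__ == "__main__":
    example = [
        {0: 1.0, 60: -2.0, 180: 0.5},
        {0: 3.0, 60: 0.1, 180: -1.0},
        {60: -5.0, 300: 0.0},
    ]
    hist = minimum_angle_histogram(example)
    print(hist[12], hist[36])  # bins for 60 and 180 degrees
```

Each residue contributes exactly one count, so the bins sum to the number of residues N.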
Comparing rotamer library and synthetic conformations
Both approaches provide a measure of flexibility for amino acid side chains. To compare them we calculated the correlation factor between the distributions for each type of amino acid:
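$$r_{xy} = \frac{\sum_{k} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k} (x_k - \bar{x})^2 \, \sum_{k} (y_k - \bar{y})^2}} \qquad (2)$$

where x and y are the vectors to be correlated, here containing the counts of the histograms.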
We also applied a correlation technique taken from the field of image processing called histogram intersection [18]. We use the classical approach calculating the minimum distance where I and M are two histograms and n is the number of bins:
$$H(I, M) = \frac{\sum_{j=1}^{n} \min(I_j, M_j)}{\sum_{j=1}^{n} M_j} \qquad (3)$$
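Both comparison measures can be sketched in plain Python. Two assumptions are made here: the correlation factor is taken to be the Pearson coefficient (the default of the correlation function in R), and the histogram intersection is normalised by the total count of the model histogram M, as in the classical formulation of [18]. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def histogram_intersection(image, model):
    """Sum of bin-wise minima, normalised by the total count of the model."""
    return sum(min(a, b) for a, b in zip(image, model)) / sum(model)


if __name__ == "__main__":
    statistics_hist = [2, 0, 1]
    energy_hist = [1, 1, 2]
    print(pearson(statistics_hist, energy_hist))
    print(histogram_intersection(statistics_hist, energy_hist))
```

The intersection is 1 for identical histograms and decreases as the two distributions put their counts into different bins, whereas the Pearson coefficient is insensitive to the overall scale of the counts.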
The methods explained in the last section were applied to the PDB entries in our testset. The resulting raw data is summarized in the following part.
Rotamer statistics
We used different granularities to look at side chain conformation.
The backbone independent approach uses only the up to four
χ angles,
whereas the backbone dependent approach also considers secondary structure and
includes the backbone angles
φ, ψ. Afterwards we describe
changes upon complex formation. For this purpose all residues that occur both in
the complex structure and in the unbound protein are taken. Later we
will separate this dataset into exposed and buried residues
for further investigations.
The conformation of a side chain angle is investigated in dependence on the previous angle.
The distribution of the first
χ angle is trimodal with peaks in
the range of -180°, -60° and +60°. Because
χ1 is close to the backbone, its conformation is mainly
influenced by the φ and ψ angles. The third rotamer (g+
conformation) is preferred, especially for polar or bulky amino acids
[9].
The conformation of
χ2 is less influenced by the backbone,
although it mainly determines the position of the functional group.
More flexibility is allowed for this angle. The distribution is
trimodal for apolar residues and bimodal in the case of PHE, TYR, HIS.
To avoid steric clashes, the ring system is positioned parallel to the
backbone; the rotamer denotes the orientation of the ring. In the case
of the branched ARG and ASN residues, the planar end of the amino acid has
to be positioned, which leads to a bimodal distribution as well.
Table 3: Rotamer dependencies
| residue | χ1 [%] | χ2=1 [%] | χ2=2 [%] | χ2=3 [%] | χ2=1 over all [%] | χ2=2 over all [%] | χ2=3 over all [%] |
|---|---|---|---|---|---|---|---|
| ARG | r1: 10.31 | 17.07 | 78.86 | 4.07 | 1.8 | 8.1 | 0.4 |
| | r2: 23.55 | 4.27 | 90.39 | 5.34 | 1.0 | 21.3 | 1.3 |
| | r3: 66.14 | 6.46 | 78.83 | 14.7 | 4.27 | 52.1 | 9.7 |
| ASN | r1: 15.32 | 45.86 | 14.29 | 39.85 | 7.0 | 2.2 | 6.1 |
| | r2: 33.03 | 40.23 | 10.12 | 49.65 | 13.3 | 3.3 | 16.4 |
| | r3: 51.65 | 17.10 | 20.97 | 61.93 | 8.8 | 10.8 | 32.0 |
| ASP | r1: 16.13 | 54.52 | 4.01 | 41.47 | 8.8 | 0.6 | 6.7 |
| | r2: 33.55 | 61.90 | 11.58 | 26.53 | 20.8 | 3.9 | 8.9 |
| | r3: 50.32 | 10.29 | 9.54 | 80.17 | 5.23 | 4.8 | 40.3 |
| CYS | r1: 8.69 | NA | | | | | |
| | r2: 17.44 | | | | | | |
| | r3: 73.87 | | | | | | |
| GLN | r1: 6.98 | 0.79 | 91.34 | 7.87 | 0.1 | 6.4 | 0.5 |
| | r2: 31.50 | 42.41 | 56.02 | 1.5 | 13.4 | 17.6 | 0.5 |
| | r3: 61.52 | 6.97 | 57.73 | 35.30 | 4.3 | 35.5 | 21.7 |
| GLU | r1: 14.94 | 1.22 | 43.90 | 54.88 | 0.2 | 6.6 | 8.2 |
| | r2: 29.23 | 23.68 | 71.96 | 4.36 | 6.9 | 21 | 1.3 |
| | r3: 55.83 | 20.72 | 41.11 | 38.17 | 11.6 | 23 | 21.3 |
| HIS | r1: 19.16 | 3.25 | 0.81 | 95.93 | 0.62 | 0.2 | 18.4 |
| | r2: 51.87 | 61.26 | 2.7 | 36.04 | 31.78 | 1.4 | 18.7 |
| | r3: 28.97 | 27.42 | 2.15 | 70.43 | 7.94 | 0.6 | 20.4 |
| ILE | r1: 21.17 | 6.75 | 92.15 | 1.09 | 1.4 | 19.5 | 0.2 |
| | r2: 13.25 | 24.49 | 73.47 | 2.04 | 3.2 | 9.7 | 0.3 |
| | r3: 65.57 | 4.42 | 74.78 | 20.80 | 2.9 | 49.0 | 13.6 |
| LEU | r1: 1.61 | 47.73 | 45.45 | 6.82 | 0.8 | 0.7 | 0.1 |
| | r2: 29.50 | 92.19 | 5.95 | 1.86 | 27.2 | 1.8 | 0.5 |
| | r3: 68.90 | 10.98 | 86.53 | 2.49 | 7.6 | 59.6 | 1.7 |
| LYS | r1: 7.39 | 10.14 | 77.54 | 12.32 | 0.7 | 5.7 | 0.9 |
| | r2: 29.62 | 12.84 | 83.18 | 3.98 | 3.8 | 24.6 | 3.98 |
| | r3: 62.99 | 5.10 | 67.01 | 27.89 | 3.2 | 42.2 | 17.6 |
| MET | r1: 12.05 | 20.00 | 74.00 | 6.00 | 2.4 | 8.9 | 0.7 |
| | r2: 29.64 | 10.57 | 86.99 | 2.44 | 3.1 | 25.8 | 0.7 |
| | r3: 58.31 | 3.31 | 76.45 | 20.25 | 1.9 | 44.6 | 11.8 |
| PHE | r1: 31.17 | 60.00 | 0.00 | 40.00 | 18.7 | 0 | 12.4 |
| | r2: 10.57 | 76.92 | 1.71 | 21.37 | 8.1 | 0.2 | 2.3 |
| | r3: 58.27 | 51.16 | 6.05 | 42.79 | 29.8 | 3.5 | 25 |
| SER | r1: 41.38 | NA | | | | | |
| | r2: 21.89 | | | | | | |
| | r3: 36.73 | | | | | | |
| THR | r1: 47.26 | NA | | | | | |
| | r2: 6.72 | | | | | | |
| | r3: 46.02 | | | | | | |
| TRP | r1: 20.83 | 6.2 | 0 | 93.8 | 1.29 | 0 | 19.54 |
| | r2: 23.85 | 56.63 | 1.2 | 42.17 | 13.5 | 0.29 | 10.06 |
| | r3: 55.32 | 73.51 | 0.52 | 25.97 | 40.66 | 0.29 | 14.37 |
| TYR | r1: 16.17 | 59.32 | 0 | 40.68 | 9.6 | 0 | 6.58 |
| | r2: 33.35 | 78.63 | 3.29 | 18.08 | 26.22 | 1.1 | 6.03 |
| | r3: 50.48 | 31.76 | 1.54 | 66.7 | 16.03 | 0.78 | 33.66 |
| VAL | r1: 7.40 | NA | | | | | |
| | r2: 73.34 | | | | | | |
| | r3: 19.26 | | | | | | |
In Table 3, the χ1, χ2 dependencies can
be seen. The χ1 column shows the distribution of the χ1
angle, the χ2 columns the percentage for a certain χ2 rotamer
given a χ1 rotamer. The remaining columns contain the percentage for
all χ1, χ2 combinations relative to all available data for
a residue type.
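The relation between the conditional and the overall columns of Table 3 can be checked with a short calculation: the overall percentage of a χ1, χ2 combination is the χ1 percentage multiplied by the conditional χ2 percentage. The sketch below uses two rows from Table 3; the function name is hypothetical.

```python
def joint_percent(chi1_percent, chi2_given_chi1_percent):
    """Overall percentage of a chi1/chi2 combination from the two table columns."""
    return chi1_percent * chi2_given_chi1_percent / 100.0


if __name__ == "__main__":
    # ARG: chi1 rotamer 3 (66.14 %), chi2 rotamer 2 given chi1 rotamer 3 (78.83 %)
    print(round(joint_percent(66.14, 78.83), 1))
    # LEU: chi1 rotamer 3 (68.90 %), chi2 rotamer 2 given chi1 rotamer 3 (86.53 %)
    print(round(joint_percent(68.90, 86.53), 1))
```

Rounded to one decimal, these products reproduce the tabulated overall values of 52.1% for ARG and 59.6% for LEU.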
The distribution of
χ3 is bimodal in the case of GLN and GLU,
and trimodal for ARG and LYS residues. The highest peak in the
distribution can be seen near 180° for ARG and LYS residues,
near -70° for GLU and near 60° for GLN residues. The
distribution of
χ3 according to χ2 shows a
preference for the r2=2, r3=2 rotamers for ARG and LYS (40% and 52%
respectively).
For the branched side chains of GLU and GLN, the combinations of r2=2 with r3=1 and with r3=3 are equivalent (about 20% each) because of the symmetry of the side chain.
The
χ4 angle of ARG and LYS residues shows a unimodal
distribution with the highest peak near 80° for ARG and near
170° for LYS. The preferred
χ3, χ4 combination for
both residues is r3=2, r4=2, with 23% for ARG residues
and 53% for LYS residues.
The rotamer distributions for ARG, ASN and GLU residues vary with the degree of solvent accessible surface area [16]. Because we have sparse data for exposed residues, we have not divided the data into exposed and buried samples for these residues. In further research, the data for these rotamers has to be split according to accessibility and the rotamer calculations have to be repeated.
The ratio of amino acids changing their rotamer upon complex formation was checked. Residues which show a high percentage of rotamer changes are considered to be flexible. The flexibility scale using this measure is: ARG, SER, GLN, LYS > VAL, GLU, THR, ASN, ASP > ILE, MET, LEU, HIS, PHE, TYR, CYS, TRP.
In the first group 20% to 30% of the side chains change their
χ1 rotamer, in the second group 12% to 14% move during
docking, and in the last group 5% or less of the residues move. The
percentage of side chains that change their rotamer can be used as a
measure of flexibility, as explained in the summary section.
Directions of movement were investigated as well. If no direction
were preferred, the expected percentage would be
about 16.6% for each change (3 × 2 = 6 possible directions of
movement). The observed probabilities are higher in many cases, that is, some directions
are clearly preferred. For example, 29% of the ARG side chains which
change their rotamer move from the second to the first rotamer of
χ1.
Table 4 shows a few examples of rotamer changes. The second column gives the direction of the change, the third column the percentage of that particular movement among the moving side chains, and the last column the percentage of all investigated residues of that type that change their rotamer at all.
Table 4: Direction of movements for some residues.
| residue | direction | % in this direction | % overall changes |
|---|---|---|---|
| ARG | 2 → 1 | 29% | 30% |
| | 1 → 3 | 27% | |
| | 2 → 3 | 18% | |
| | 3 → 2 | 17% | |
| | 3 → 1 | 8% | |
| SER | 1 → 3 | 40% | 25% |
| | 3 → 1 | 23% | |
| | 2 → 1 | 10% | |
| | 1 → 2 | 10% | |
| | 2 → 3 | 9% | |
| | 3 → 2 | 8% | |
| GLN | 2 → 3 | 45% | 20% |
| | 3 → 2 | 23% | |
| | 3 → 1 | 13% | |
| | 2 → 1 | 9% | |
| | 1 → 3 | 9% | |
| | 1 → 2 | 2% | |
| ASN | 3 → 2 | 31% | 12% |
| | 2 → 3 | 27% | |
| | 1 → 2 | 17% | |
| | 3 → 1 | 13% | |
| | 1 → 3 | 11% | |
| | 2 → 1 | 1% | |
| ASP | 2 → 3 | 51% | 12% |
| | 3 → 2 | 15% | |
| | 3 → 1 | 11% | |
| | 1 → 2 | 11% | |
| | 2 → 1 | 7% | |
| | 1 → 2 | 5% | |
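The two percentages reported in Table 4 can be computed from matched rotamer assignments as sketched below. This is an illustration only: the function name and the input layout (two equally long lists of rotamer numbers for the same residues in the unbound and the complexed structure) are assumptions, and the example data are made up.

```python
from collections import Counter


def direction_percentages(unbound, bound):
    """Return ({(from, to): % among moving side chains}, % of residues that move)."""
    moves = Counter((u, b) for u, b in zip(unbound, bound) if u != b)
    n_moved = sum(moves.values())
    n_total = len(unbound)
    per_direction = {
        direction: 100.0 * count / n_moved for direction, count in moves.items()
    }
    overall = 100.0 * n_moved / n_total if n_total else 0.0
    return per_direction, overall


if __name__ == "__main__":
    # Made-up chi1 rotamers of ten matched residues
    unbound = [2, 2, 1, 3, 3, 1, 2, 3, 1, 2]
    bound = [1, 1, 3, 3, 3, 1, 2, 3, 1, 2]
    per_direction, overall = direction_percentages(unbound, bound)
    print(per_direction, overall)
```

With three rotamers there are six possible directions, so values well above 100/6 (about 16.7%) in the per-direction result point to a preferred direction of movement.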
In the backbone dependent approach the influence of the backbone
angles on the side chain angle is included. The
<φ, ψ, χ1> data is clustered into 128 classes with the
LBG cluster algorithm [11]. The probability of each
class is calculated from the number of vectors in
the class. These class probabilities can be used as a priori
probabilities for the placement of side chains given
φ and ψ angles.
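The clustering step can be illustrated with a basic LBG scheme: start from the global centroid, split every code vector into two slightly perturbed copies, and refine with nearest-neighbour reassignment until the desired codebook size is reached. The sketch below is a simplified version under several assumptions: plain Euclidean distance is used (the periodicity of angles is ignored), the perturbation `eps` and the fixed number of refinement iterations are arbitrary choices, and all names are hypothetical.

```python
def _nearest(point, codebook):
    """Index of the code vector closest to point (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, codebook[k])),
    )


def lbg(vectors, n_codes, eps=1e-3, n_iter=20):
    """Basic LBG vector quantisation; n_codes should be a power of two.

    Returns (codebook, labels, priors), where priors[k] is the fraction of
    vectors assigned to class k.
    """
    dim = len(vectors[0])
    codebook = [[sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]]
    while len(codebook) < n_codes:
        # Split each code vector into two perturbed copies
        codebook = [
            [c + sign * eps for c in code]
            for code in codebook
            for sign in (1.0, -1.0)
        ]
        for _ in range(n_iter):
            labels = [_nearest(v, codebook) for v in vectors]
            for k in range(len(codebook)):
                members = [v for v, lab in zip(vectors, labels) if lab == k]
                if members:  # empty classes keep their previous centre
                    codebook[k] = [
                        sum(m[d] for m in members) / len(members) for d in range(dim)
                    ]
    labels = [_nearest(v, codebook) for v in vectors]
    priors = [labels.count(k) / len(vectors) for k in range(len(codebook))]
    return codebook, labels, priors


if __name__ == "__main__":
    data = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [10, 10, 10], [11, 10, 10], [10, 11, 10]]
    codebook, labels, priors = lbg(data, n_codes=2)
    print(codebook, priors)
```

Calling `lbg(data, n_codes=128)` on (φ, ψ, χ1) triples would yield 128 class centres together with class frequencies that could serve as a priori probabilities in the sense described above.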
Figure 1: 3D plots for (φ, ψ, χ1) data. (1a) Distribution of χ1 according to the φ and ψ angles. (1b) Centers of the 128 LBG clusters.
Figure 1a shows the distribution of
χ1 according to the backbone angles
φ and ψ. In
Figure 1b the centers of the 128 clusters are shown.
A view from the top onto the (φ, ψ) plane would yield the
classical Ramachandran plot.
It can be seen that in the region of
β sheets (in the upper left
part of the φ, ψ plane) all three
χ angle rotamers are
allowed, whereas in other regions of the Ramachandran plot the
possibilities for particular
χ rotamers are restricted.
Energy calculation on synthetic protein conformations
For all generated synthetic protein structures we calculated the potential energy using the AMBER force field [21]. A low potential energy suggests that a conformation is preferred among the others generated by rotating side chain torsion angles.
At first only one torsion angle is changed at a time. In a later stage we will investigate the relations between the different torsion angles. To examine the distribution of each torsion angle for each residue type, the energetic minimum for each residue is calculated and the corresponding angle is included in a histogram, as shown in Figure 2.
The distribution of the
χ torsion angles shows significant
features. For the
χ1 angle all distributions are trimodal,
with peaks at the g, t and g+ rotamers.
The three rotamers are not equally distributed. The g position is least preferred. The t position is preferred by the residues ASP, ASN, SER and GLN. The g+ position is preferred by the rest of the residue types. This is due to different constraints, mostly steric hindrance, charge and polarity or length of the side chain.
The
χ2 torsion angle behaves differently. Plotting the
distribution shows a division of the residues into two groups. The first
group of residues (ASN, ASP, HIS, TYR and PHE) shows two peaks, at the
g and g+ positions. The other amino acids (ARG, GLU, GLN, LYS,
LEU, ILE, MET and TRP) have three rotamers. Some residues of the
first group have a ring system within their side chain (HIS, TYR and
PHE). This ring system is attached to the Cγ carbon atom,
which is directly influenced by changes of the
χ2 torsion
angle. Changing the
χ2 torsion angle turns the whole ring
system. The distribution shows that the ring systems can only exist
in two positions without steric hindrance.
Although ASP and ASN do not contain a ring system in the side chain,
they show the same characteristic distribution as HIS, TYR and PHE.
Because the side chain branches at Cγ, these residues have
the same steric restrictions. In addition their side chain is charged (ASP)
or polar (ASN), which affects the placement of the side chain as well.
The distribution of the second group of residues shows three peaks.
This group consists of amino acids with very different side
chains.
The functional group of ARG, LYS, GLU, MET and GLN, sitting at the end
of the side chain, does not affect the
χ2 angle directly. The
long stretched side chain has to be placed without steric hindrance.
This is the same as for the
χ1 torsion
angle.
The side chains of ILE and LEU are branched but neither charged nor
polar. The branched part of LEU is like the side chain of VAL; the
only difference is that the LEU side chain has one more carbon atom.
Fitting the LEU side chain at the
χ2 position is the same
procedure as fitting the VAL side chain, shifted by one carbon atom.
The side chain of ILE is branched asymmetrically. This is reflected by the plot of the energy minima for ILE (see Figure 4a).
TRP differs from the rest of this group. The side chain of TRP consists of a double ring system and the distribution is somewhere in between. On the one hand it has characteristics of the first group mentioned above, but on the other hand there is also a small third peak.
The remaining
χ torsion angles have been analysed, too. For the
χ3 torsion angle the residues can also be divided into two
groups. The first group contains GLN and MET, the second group
ARG and LYS. The reasons for this partitioning are the same as described
for the
χ2 torsion angle.
GLU is an exception, since in the plot of energy minima (Fig. 5) no peaks can be determined. Because of the great distance to the backbone we assume no steric hindrance for this angle so that the rest of this side chain can be rotated freely.
The
χ4 torsion angle of ARG and LYS residues shows a similar
distribution for both, although they differ in their functional group. Their
distribution of energy minima shows three peaks with a preference
for the t position.
Unbound and corresponding complex protein structures are compared to measure the flexibility upon complex formation. As in the statistical approach, rotamer changes have been detected and preferred directions of change can be pointed out.
In Figure 6, the first torsion angle of ARG residues in the unbound case is plotted against the corresponding angle in the complex form. Each point in the plot represents one residue of the testset. Residues on or near the diagonal do not change their rotamer. Points well above or below the diagonal do change during docking (two examples are pointed out by arrows). Table 5 shows the preferred directions of rotamer changes for ARG.
Table 5: Rotamer changes of amino acid side chain torsion angles in ARG.
| Rotamer change from unbound to complex | % for χ1 | % for χ2 | % for χ3 | % for χ4 |
|---|---|---|---|---|
| g → t | 19 | 24 | 62 | 28 |
| g → g+ | 8 | 10 | 32 | 26 |
| t → g | 30 | 20 | 10 | 10 |
| t → g+ | 31 | 31 | 37 | 54 |
| g+ → g | 3 | 2 | 5 | 8 |
| g+ → t | 5 | 6 | 26 | 15 |
On the one hand, each torsion angle has preferred
directions of change (e. g. t → g+ for χ1).
On the other hand, these preferences differ from one torsion angle to another. We
think that this relates to the greater distance from the backbone: the
higher torsion angles are subject to fewer steric restrictions and
can therefore rotate more freely. The number of side chains that
change their rotamer varies with the distance from the backbone
(see Table 6).
Table 6: Percentage of rotamer changes for each torsion angle in ARG.
| torsion angle | % side chains changing |
|---|---|
| χ1 | 37 |
| χ2 | 35 |
| χ3 | 65 |
| χ4 | 53 |
Similar results can be observed for the remaining residues. The
residue types can be ranked by the percentage of changes, here shown
for the
χ1 torsion angle:
SER > ARG > GLN > VAL > ASN > ASP > THR > LYS > LEU > ILE > PHE > GLU > HIS > MET/TRP > TYR > CYS
Comparison of the statistical and energetic best conformation
In this section we compare the results from rotamer statistics and energy calculation. The comparison is done by calculating the correlation factor between the histograms of torsion angles generated by the two approaches. We used a standard correlation method (see Figure 10 in the Appendix). In addition we applied the histogram intersection (see Figure 11).
For ARG residues, the two approaches show a good correlation for the
χ1 angle (Figure 7). The calculated correlation coefficient is 0.95. The
histograms for the higher torsion angles (χ2 to χ4) do not correlate
that well, but this can be attributed to a greater degree of flexibility
(see Figures 8 and 9).
For comparing the histograms of the two approaches, we implement a histogram intersection. Intersecting histograms result in a model histogram [18]. By normalizing the intersection we get a value for the correlation of the two histograms (see Figure 11 at appendix "Results of ..."). The agreement of the two approaches can be seen as well.
As described above, both methods result in nearly equal rotamer distributions. Differences in the distributions may occur because some side chains in crystal structures can differ from the optimal conformation without affecting the energetic state of the whole protein that much. By generating the synthetic conformations for a side chain we get the globally best conformation of that side chain, neglecting the rest of the protein because all other residues are held fixed.
In Table 7 we have summarized the flexibility scales of both approaches:
Table 7: Flexibility scales for both approaches ordered by percentage of overall changes
| residue (statistical approach) | % changes | residue (energy calculations) | % changes |
|---|---|---|---|
| ARG | 30 | SER | 40.0 |
| SER | 25 | ARG | 37.0 |
| GLN | 20 | GLN | 28.4 |
| LYS | 20 | VAL | 27.7 |
| VAL | 14 | ASN | 26.6 |
| GLU | 13 | ASP | 23.3 |
| THR | 13 | THR | 19.3 |
| ASN | 12 | LYS | 9.6 |
| ASP | 12 | LEU | 9.1 |
| ILE | 5 | ILE | 8.8 |
| MET | 5 | PHE | 8.2 |
| LEU | 2 | GLU | 8.0 |
| HIS | 2 | HIS | 7.0 |
| PHE | 1 | MET | 4.4 |
| TYR | 1 | TRP | 4.4 |
| CYS | 0.5 | TYR | 4.3 |
| TRP | 0.1 | CYS | 3.9 |
Comparing the rotamer changes found by both approaches, differences in the ranking of the residues can be seen. Some residues are ranked equally (GLN, THR, ILE, HIS and CYS), and some differ by only one position (ARG, SER, TYR, TRP, VAL). The rankings of all other residues differ by more than one position.
We think the differences in the flexibility scales of our approaches are due
to the fact that the number of samples processed for the
unbound/complex comparison under the energetic approach is much
smaller than the amount of data used in the statistical approach
(see also section "Data"). With ongoing calculations we hope
that these differences will disappear. Summarising these results, one can
observe that the side chains of ARG, SER and GLN change their rotamer
most frequently. The side chains of TYR, TRP and CYS are nearly
inflexible and rarely change their rotamer. In between these groups
there are residues whose side chain flexibility varies (according to
both approaches).
The results show that both rotamer statistics and AMBER energy calculation agree on allowed and preferred side chain conformations. We have shown that a statistical approach to side chain conformations comparing bound and unbound form of a protein yields obvious differences.
Fast methods for volume based protein-protein docking exist. However, they employ the rigid-body assumption and often miss the correct solution. Flexibility can be considered by enumerating possible conformations starting with the unbound protein, using information about preferred movements. A different approach uses a weighting mechanism which assigns weights to the residues and thus decreases the penalizing influence of residues which are likely to have a different conformation in the complexed form.
Finally the amino acids can be ranked by their flexibility, and depending on the available computing power the less flexible ones can be excluded from flexibility considerations and processed under the rigid-body assumption.
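The selection of residues to be treated flexibly can be sketched as a simple threshold on the percentage of rotamer changes. The values below are the statistical approach column of Table 7; the threshold of 20% and the function name are arbitrary assumptions made for illustration and would in practice depend on the available computing power.

```python
# Percentage of rotamer changes per residue type (statistical approach, Table 7)
STATISTICAL_CHANGE_PERCENT = {
    "ARG": 30, "SER": 25, "GLN": 20, "LYS": 20, "VAL": 14, "GLU": 13,
    "THR": 13, "ASN": 12, "ASP": 12, "ILE": 5, "MET": 5, "LEU": 2,
    "HIS": 2, "PHE": 1, "TYR": 1, "CYS": 0.5, "TRP": 0.1,
}


def split_by_flexibility(change_percent, threshold):
    """Split residue types into (flexible, rigid) lists at a given threshold."""
    flexible = sorted(r for r, p in change_percent.items() if p >= threshold)
    rigid = sorted(r for r, p in change_percent.items() if p < threshold)
    return flexible, rigid


if __name__ == "__main__":
    flexible, rigid = split_by_flexibility(STATISTICAL_CHANGE_PERCENT, threshold=20)
    print("flexible:", flexible)
    print("rigid-body:", rigid)
```

Lowering the threshold moves more residue types into the flexible set and enlarges the search space accordingly.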
Future work will be focused on using the available information to reduce the search space for enumerative docking algorithms. We will extract parameters for elastic docking [14] using pairwise distances between bound and unbound data.
Assessing flexibility automatically is even more important since in a
1:n docking scenario manual inspection of flexibility is not feasible.
This work was supported by the German Research Foundation (DFG) within the graduate programs "Structure Formation" and "Bioinformatics" and the research focus "Analysis and interpretation of large genomic data sets".
In this section the results of the correlation between the rotamer statistics and the synthetic protein conformations are shown. Figure 10 shows the correlation coefficients obtained using standard correlation methods implemented in R [6].
Overall, the coefficients indicate that the data correlate
well, except for
χ3 of GLU where the correlation is only
0.46. Figure 11 shows our second correlation method,
histogram intersection. Looking at these results, most
amino acids have a correlation coefficient around 0.75. Here
χ3 of GLU performs much better and reaches a
value of 0.68.
Footnote a:
Using a non-bonded cutoff of 200Å, a
van der Waals cutoff of 150Å, a van der Waals cuton of 130Å, an
electrostatic cutoff of 150Å, an electrostatic cuton of 130Å, an
electrostatic scaling factor for 1-4 interactions of 2.0, a van der Waals
scaling factor for 1-4 interactions of 2.0 and a distance dependent
dielectric constant of 1.0 (standard values that come with the BALL
library [1]). All calculations are done without solvent
molecules.