CoPreTHi: A Web tool which combines transmembrane protein segment prediction methods

V. J. Promponas, G. A. Palaios, C. M. Pasquier, J. S. Hamodrakas and S. J. Hamodrakas

Faculty of Biology,
Department of Cell Biology and Biophysics,
University of Athens,
Athens 157 01

Edited by E. Wingender; received December 04, 1998; accepted December 11, 1998


CoPreTHi is a Java-based Web application that combines the results of methods that predict the location of transmembrane segments in protein sequences into a joint prediction histogram. On a test set of 155 transmembrane proteins, the joint prediction algorithm produces results of higher quality than the individual prediction schemes. The program is available at

Key words: Joint Prediction, Transmembrane Proteins, Web-Tool, Java Interface


Membrane proteins are involved in a variety of important biological functions. Their biological activity depends primarily on their spatial conformation. The three-dimensional structure of a protein is usually determined by single-crystal X-ray crystallography. However, suitable single crystals are not easily produced for membrane proteins; therefore, experimental data at atomic or near-atomic resolution for membrane proteins are scarce.

Since experimental findings indicate that all the information necessary for a protein to fold into its native structure is encoded in its amino acid sequence [Anfinsen, 1973], several attempts have been made to predict the three-dimensional structure of a protein from sequence alone [Rost and Sander, 1994; Rost and Sander, 1996], but with limited success. Moreover, recently determined genomic sequences from different organisms contain many orphan ORFs, which increases the need for accurate methods for predicting protein structure and function. In these cases, even discriminating between globular and membrane proteins might be rewarding.

In the field of membrane proteins, it is often very important to predict the location of transmembrane segments along the sequence, since these segments are the basic structural building blocks that define the topology of the protein.

Several successful prediction algorithms have been developed for membrane proteins, some of which predict not only transmembrane segments but also topology and secondary structure. For the secondary structure prediction of globular proteins, it has been claimed that combined prediction schemes provide a higher degree of accuracy than individual prediction methods [Schulz et al., 1974; Argos et al., 1976; Hamodrakas, 1988].


In this report, we present CoPreTHi, a Web-based application that uses the results of some popular prediction methods freely accessible over the WWW:

DAS [Cserzo et al., 1997],
ISREC-SAPS [Brendel et al., 1992],
PHD [Rost et al., 1995],
SOSUI [Hirokawa et al., 1998],
TmPred [Hofmann and Stoffel, 1993],
TopPredII [von Heijne, 1992],

and a method developed by our group:
PRED-TMR [Pasquier et al., 1998]

combining them into a joint prediction histogram, to predict the location of transmembrane segments in protein sequences. A user can also submit the results of any other method in a specified format and include them in the joint prediction histogram.

An amino acid residue predicted to be part of a transmembrane domain by three or more methods is considered to lie inside a transmembrane region; thus, a combined prediction is obtained. Optionally, observed results can also be entered (e.g. the FT records of a SWISS-PROT entry [Bairoch and Apweiler, 1997]). In this case, a reliability index Q [Chou and Fasman, 1978] and a correlation coefficient C [Matthews, 1975] are calculated to evaluate the accuracy of each prediction method separately, as well as of the joint prediction. All results can be presented either in plain text (without HTML tables, for Web browsers lacking this capability) or in HTML mode, including a graphical representation.
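The voting rule described above can be illustrated with a short sketch. It is not the CoPreTHi source (the calculations there are performed by a C program); the function names, the representation of segments as 1-based inclusive (start, end) pairs and the example data are assumptions made for illustration only. The sketch builds the per-residue vote histogram and then extracts the stretches supported by at least three methods.

```python
from typing import Dict, List, Tuple

# Assumption: a segment is a 1-based, inclusive (start, end) pair of residue positions.
Segment = Tuple[int, int]


def vote_histogram(predictions: Dict[str, List[Segment]], seqlen: int) -> List[int]:
    """For every residue, count how many methods place it in a transmembrane segment."""
    votes = [0] * seqlen
    for method, segments in predictions.items():
        covered = set()  # a method votes at most once per residue
        for start, end in segments:
            if not 1 <= start <= end <= seqlen:
                raise ValueError(f"{method}: segment {start}-{end} outside 1..{seqlen}")
            covered.update(range(start, end + 1))
        for pos in covered:
            votes[pos - 1] += 1
    return votes


def joint_segments(votes: List[int], min_votes: int = 3) -> List[Segment]:
    """Return maximal runs of residues supported by at least min_votes methods."""
    segments: List[Segment] = []
    start = None
    for pos, count in enumerate(votes, start=1):
        if count >= min_votes and start is None:
            start = pos  # a consensus run begins here
        elif count < min_votes and start is not None:
            segments.append((start, pos - 1))  # the run ended at the previous residue
            start = None
    if start is not None:  # run extends to the end of the sequence
        segments.append((start, len(votes)))
    return segments


# Hypothetical results of four methods for a 30-residue sequence.
example = {
    "method_a": [(5, 12)],
    "method_b": [(7, 14)],
    "method_c": [(8, 11)],
    "method_d": [(20, 25)],
}
votes = vote_histogram(example, seqlen=30)
joint = joint_segments(votes)  # only residues 8-11 receive three votes
```

In this example only residues 8 to 11 are predicted by three methods, so the joint prediction contains the single segment (8, 11); the segment proposed by method_d alone is discarded.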


CoPreTHi consists of three subprograms. The first, a Java program, creates the input form for the data (the results of the individual prediction methods). The second, a C program, performs all calculations and sends its output to the third, which is also a Java program. The third program displays the results: it draws a graphical representation and creates a table containing details about the individual prediction schemes, the joint prediction and, optionally, their performance against the observed data (if available). Since input/output is performed with Java programs, the interface is user friendly.

On the main page of our server (at the URL: ) there are links to the methods mentioned above. A user can therefore run the individual methods separately and manually copy and paste each method's results, as well as any observed results, into the input form of our tool. The only extra information required to produce the joint prediction histogram is the number of amino acids in the sequence, which should be entered manually in the 'seqlen' text area. The name of the sequence may optionally be entered in the appropriate text area; it is then displayed on the results page.


The predictions of the individual methods mentioned above were tested on a representative set of 155 sequences of transmembrane proteins with reliable topology, deposited in SWISS-PROT (as described by [Pasquier et al., 1998]). For all individual methods and the joint prediction, a reliability index (Q) and a correlation coefficient (C) were calculated for each protein. Significant differences were found between the different prediction methods for some of these proteins, even for characteristic examples such as Bacteriorhodopsin. Nevertheless, the mean values of Q and C obtained by the individual methods over the entire test set of 155 proteins do not differ significantly, ranging from approximately 86.4% to 89.3% (Q) and from 0.71 to 0.78 (C), whereas the joint prediction gives 91.6% and 0.79, respectively. On this set of 155 transmembrane proteins, the joint prediction algorithm therefore produces results of higher quality than the individual prediction schemes.
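A per-protein evaluation of this kind might be computed as in the following sketch. It assumes that Q is the percentage of residues whose predicted state (transmembrane or not) agrees with the observed state, and that C is the Matthews correlation coefficient computed from the per-residue confusion counts; the exact definitions used by CoPreTHi are those of the cited references. The function names and the example data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(predicted: Sequence[bool], observed: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) over residues; True means 'in a transmembrane segment'."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    tp = sum(p and o for p, o in zip(predicted, observed))
    tn = sum((not p) and (not o) for p, o in zip(predicted, observed))
    fp = sum(p and (not o) for p, o in zip(predicted, observed))
    fn = sum((not p) and o for p, o in zip(predicted, observed))
    return tp, tn, fp, fn


def q_index(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Assumed definition: percentage of correctly predicted residues."""
    tp, tn, fp, fn = confusion_counts(predicted, observed)
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def matthews_c(predicted: Sequence[bool], observed: Sequence[bool]) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion_counts(predicted, observed)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical 10-residue example: one false positive and one false negative.
F, T = False, True
predicted = [F, T, T, T, F, F, T, T, F, F]
observed = [F, F, T, T, T, F, T, T, F, F]
q = q_index(predicted, observed)     # 8 of 10 residues agree
c = matthews_c(predicted, observed)  # (4*4 - 1*1) / sqrt(5*5*5*5)
```

For this example Q is 80% and C is 0.6. Averaging such per-protein values over a test set gives mean figures of the kind reported above.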


CoPreTHi is freely available for use through the Internet at the URL: It can be executed over the World Wide Web on any Java-compatible Web browser. A list of the results obtained for the set of 155 protein sequences used for our tests is also available at the URL: Detailed help and useful comments are located at the URL: