In Silico Biology 2, 0029 (2002); ©2002, Bioinformation Systems e.V.  

ProML - the Protein Markup Language for specification of protein sequences, structures and families

Daniel Hanisch1, Ralf Zimmer2 and Thomas Lengauer3

1Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, D-53754 Sankt Augustin, Germany
2Institut für Informatik, LMU München, Theresienstraße 39, D-80333 München, Germany
3Max-Planck Institut fuer Informatik, Stuhlsatzenhausweg 85, D-66123 Saarbruecken, Germany

Edited by E. Wingender; received November 30, 2001; revised and accepted March 18, 2002; published April 08, 2002


We propose ProML, a specification language for protein sequences, structures, and families based on the open XML standard. The language provides a portable, system-independent, machine-parsable, and human-readable representation of the essential features of proteins. It is of immediate use in several bioinformatics applications: we discuss the clustering of proteins into families and the representation of the features shared within each cluster. Moreover, we use ProML to specify the data used in fold recognition benchmarks that exploit experimentally derived distance constraints.

Key words: Protein Markup Language, ProML, XML, protein properties, protein families, protein structures, distance constraints, protein clusters