A hypergraph-based method for unification of existing protein structure- and sequence-families
Jan Freudenberg1, Ralf Zimmer2, Daniel Hanisch3 and Thomas Lengauer4
1,2,3,4 Previous Address: GMD-Forschungsinstitut Informationstechnik, Schloss Birlinghoven St. Augustin; Present Addresses: 1 Institut für Humangenetik, Universitätsklinikum Bonn; 2 Institut für Informatik, LMU München; 3 FhG-SCAI, Schloss Birlinghoven, St. Augustin; 4 MPI für Informatik, Saarbrücken
Classification of proteins is a major challenge in bioinformatics. Here, an approach is presented that unifies different existing classifications of protein structures and sequences. Protein structural domains are represented as nodes in a hypergraph, and shared membership in a sequence family is represented as a hyperedge connecting the corresponding domains. The presented method partitions the hypergraph into clusters of structural domains, each of which is based on a set of shared sequence family memberships. The clusters thus place existing protein sequence families in the context of structural family hierarchies. Conversely, structural domains are related to their sequence family memberships, which can be used to gain further knowledge about the respective structural families.
Key words: sequence analysis, structure analysis, domain boundary delineation, protein databases, protein homology, protein structure prediction, threading, template selection, optimization, protein clustering
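The hypergraph representation described in the abstract can be illustrated with a minimal sketch. The domain and family identifiers below (d1, famA, and so on) are hypothetical placeholders, not entries from any real database. The grouping step is a deliberately simple stand-in that collects domains with identical family memberships; it is an assumption made for illustration and is not the partitioning method presented in this work.

```python
from collections import defaultdict


def build_hyperedges(memberships):
    """Invert domain -> families into family -> domains.

    Each sequence family yields one hyperedge: the set of structural
    domains that are members of that family.
    """
    hyperedges = defaultdict(set)
    for domain, families in memberships.items():
        for family in families:
            hyperedges[family].add(domain)
    return dict(hyperedges)


def cluster_by_signature(memberships):
    """Group domains that share exactly the same set of family memberships.

    Illustrative stand-in only (an assumption), not the paper's
    partitioning procedure.
    """
    clusters = defaultdict(set)
    for domain, families in memberships.items():
        clusters[frozenset(families)].add(domain)
    return dict(clusters)


# Hypothetical toy input: structural domain -> sequence families it belongs to.
memberships = {
    "d1": {"famA", "famB"},
    "d2": {"famA", "famB"},
    "d3": {"famA"},
    "d4": {"famC"},
}

hyperedges = build_hyperedges(memberships)
clusters = cluster_by_signature(memberships)
```

In this toy input, the hyperedge for famA connects d1, d2 and d3, while d1 and d2 end up in one cluster because they share the same set of family memberships.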