Facilitating context-sensitive queries: A tool to rephrase SWISS-PROT

Steffen Möller 1, Michael Wise 2, David Kreil 3, Michael Schröder 4, David Gilbert 5 and Rolf Apweiler 6




1,3,6 European Bioinformatics Institute
Hinxton, Cambridge, UK
Phone: +44 1223 494444
Fax: +44 1223 494468
1 E-mail: moeller@ebi.ac.uk
3 E-mail: kreil@ebi.ac.uk
6 E-mail: apweiler@ebi.ac.uk
2 Centre for Communications Systems Research, University of Cambridge,
Cambridge, UK
2 E-mail: M.Wise@ccsr.cam.ac.uk
4,5 Department of Computing
City University, London, UK
4 E-mail: msch@cs.city.ac.uk
5 E-mail: drg@cs.city.ac.uk

D. Gilbert and M. Wise are visiting scientists at the EBI.






Database search tools like SRS 5 (http://srs.ebi.ac.uk), which biologists widely use to search a multitude of biomedical databases, are well suited to context-free queries, that is, queries that match regular expressions against predefined attributes. Such context-independent queries are limited, however: they cannot compare the value of one attribute with the value of another.

One way to overcome such limitations is to rewrite the information in SWISS-PROT entries automatically so that context-sensitive queries become possible. We developed a tool that reformulates any SWISS-PROT entry as a set of predicates; Figure 1 shows an example.

 
id(p29358,'143b_bovin').
ac(p29358,[p29358]).
de(p29358,'14-3-3 protein beta/alpha (kcip-1)').
os(p29358,['bos taurus (bovine)','ovis aries (sheep)']). 
oc(p29358,[eukaryota,metazoa,chordata,craniata,vertebrata,mammalia,eutheria, 
cetartiodactyla,ruminantia,pecora,bovoidea,bovidae,bovinae,bos]). 
gn(p29358,ywhab). 
.. 
cc(p29358,'subcellular location',cytoplasmic). 
.. 
ft(p29358,mod_res,185,185,phosphorylation,'').
.. 
kw(p29358,[brain,neurone,phosphorylation,acetylation,'multigene family','alternative initiation']). 
sq(p29358,tmdkselvqkaklaeqaeryddmaaamkavteqghelsneernllsvayknvvgarrsswrvissieqkternekkqqmgkeyrek 
ieaelqdicndvlqlldkylipnatqpeskvfylkmkgdyfrylsevasgdnkqttvsnsqqayqeafeiskkemqpthpirlglalnfsvfyyei 
lnspekacslaktafdeaiaeldtlneesykdstlimqllrdnltlwtsenqgdegdagegen).
'//'(p29358).

Figure 1: Rewrite of SWISS-PROT entry P29358
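
To illustrate the kind of context-sensitive query that facts in the format of Figure 1 allow, the following is a minimal sketch, assuming SWI-Prolog. It relates two attributes of the same entry: the keyword list (kw) and the feature table (ft). The entry x00001 and the predicate names annotated_phosphosite/2 and keyword_without_site/1 are invented for illustration and are not part of the tool.

```prolog
:- use_module(library(lists)).

% Miniature fact base in the format of Figure 1.
% p29358 is abridged from Figure 1; x00001 is an invented entry.
kw(p29358, [brain, neurone, phosphorylation, acetylation]).
kw(x00001, [phosphorylation]).

ft(p29358, mod_res, 185, 185, phosphorylation, '').
ft(x00001, carbohyd, 10, 10, '', '').

% annotated_phosphosite(?AC, ?Pos)
% The keyword list and the feature table of one entry agree:
% the entry carries the keyword and has a matching FT line at Pos.
annotated_phosphosite(AC, Pos) :-
    kw(AC, Keywords),
    memberchk(phosphorylation, Keywords),
    ft(AC, mod_res, Pos, Pos, phosphorylation, _).

% keyword_without_site(?AC)
% The entry carries the keyword, but no FT line supports it.
keyword_without_site(AC) :-
    kw(AC, Keywords),
    memberchk(phosphorylation, Keywords),
    \+ ft(AC, mod_res, _, _, phosphorylation, _).
```

With these facts, the query annotated_phosphosite(AC,Pos) binds AC=p29358 and Pos=185, while keyword_without_site(AC) binds AC=x00001. Neither query can be expressed as a regular expression on a single attribute, because each one compares the content of one attribute with that of another.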


The resulting predicates can be read by any implementation of PROLOG, the language in which the user formulates queries. Figure 2 shows how we formally differentiate N- and O-glycosylation, a distinction that SWISS-PROT FT (Feature Table) lines do not make explicitly. Such implicit information needs to be formalized for computer-aided analysis and data-mining applications. 'ft' and 'sq' are PROLOG facts derived directly from SWISS-PROT; they represent a single FT line and the sequence lines of an entry, respectively.



 
glycosylation(AccessionNumber,Position,NorO):-
% a feature table entry stating a glycosylation (ft has six arguments, as in Figure 1)
ft(AccessionNumber,carbohyd,Position,_,_,_),
% retrieve the sequence
sq(AccessionNumber,Sequence),
% get the residue that is glycosylated and the three residues that follow it
residues(Position,4,Sequence,Seq),
% find out what type it is; if-then-else keeps the remaining ft solutions reachable
( % check if it matches the pattern for N-glycosylation
matches(Seq,['N', not('P'),['S','T'], not('P')]) -> NorO = n
; % otherwise assume it is O-glycosylated
NorO = o ).

Figure 2: Declaration of a Prolog Predicate to Determine Glycosylation Sites

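A query 'glycosylation(p50635,Pos,Type)' returns the two solutions below. The accession number is written in lower case, as in Figure 1, because a name beginning with a capital letter is a PROLOG variable.

Pos=112
Type=n ;

Pos=390
Type=n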
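
The helper predicates residues/4 and matches/2 used in Figure 2 are not defined in this abstract. The following is a minimal sketch of how they might look, assuming SWI-Prolog (upcase_atom/2 and is_list/1 are SWI-Prolog built-ins) and assuming that the sequence is stored as a lower-case atom, as in Figure 1. The entry demo1 and the helper matches_one/2 are invented for illustration, and the glycosylation rule is restated with if-then-else so that the sketch is self-contained. It is not necessarily the authors' implementation.

```prolog
:- use_module(library(lists)).

% Invented demo entry (not a SWISS-PROT record):
% n at position 3 is followed by g, s, a; t at position 9 by p, e, k.
ft(demo1, carbohyd, 3, 3, '', '').
ft(demo1, carbohyd, 9, 9, '', '').
sq(demo1, aangsakltpekk).

% residues(+Position, +Length, +Sequence, -Residues)
% Residues is the list of Length upper-case one-letter atoms that
% starts at the 1-based Position. Fails if fewer than Length
% residues remain, so sites too close to the C-terminus are skipped.
residues(Position, Length, Sequence, Residues) :-
    Before is Position - 1,
    sub_atom(Sequence, Before, Length, _, Window),
    upcase_atom(Window, Upper),
    atom_chars(Upper, Residues).

% matches(+Residues, +Pattern)
% Each pattern element is a residue, a list of alternatives,
% or not(Residue).
matches([], []).
matches([R|Rs], [P|Ps]) :-
    matches_one(P, R),
    matches(Rs, Ps).

matches_one(not(X), R) :- !, R \== X.
matches_one(Alternatives, R) :- is_list(Alternatives), !, memberchk(R, Alternatives).
matches_one(R, R).

% The rule of Figure 2, restated so that this block runs on its own.
glycosylation(AccessionNumber, Position, NorO) :-
    ft(AccessionNumber, carbohyd, Position, _, _, _),
    sq(AccessionNumber, Sequence),
    residues(Position, 4, Sequence, Seq),
    (   matches(Seq, ['N', not('P'), ['S','T'], not('P')])
    ->  NorO = n
    ;   NorO = o
    ).
```

With the invented entry, the query glycosylation(demo1,Pos,Type) yields Pos=3, Type=n and, on backtracking, Pos=9, Type=o. Converting the window to upper case inside residues/4 is one way to reconcile the lower-case sequences of Figure 1 with the upper-case pattern of Figure 2; writing the pattern in lower case would work equally well.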

Please contact moeller@ebi.ac.uk for further information and program sources.