In Silico Biology 2, 0021 (2002); ©2002, Bioinformation Systems e.V.  
GCB '01

The Semantic Metadatabase (SEMEDA):
Ontology based integration of federated molecular biological data sources

Jacob Köhler1 and Steffen Schulze-Kremer2

1University Bielefeld,

2 Resource Center DHGP,
Berlin, Germany

Edited by E. Wingender; received November 30, 2001; revised and accepted February 3, 2002; published March 19, 2002


A system for "intelligent" semantic integration and querying of federated databases is being implemented using three main components: a component which enables SQL access to integrated databases by database federation (MARGBench), an ontology based semantic metadatabase (SEMEDA) and an ontology based query interface (SEMEDA-query). In this publication we explain and demonstrate the principles, architecture and use of SEMEDA. Since SEMEDA is implemented as a three-tiered web application, database providers can enter all relevant semantic and technical information about their databases themselves via a web browser. SEMEDA's collaborative ontology editing feature is not restricted to database integration, and might also be useful for ongoing ontology developments, such as the "Gene Ontology" [Ashburner et al., 2000]. SEMEDA can be found at

We explain how this ontologically structured information can be used for semantic database integration. In addition, requirements for ontologies for molecular biological database integration are discussed and relevant existing ontologies are evaluated. We further discuss how ontologies and structured knowledge sources can be used in SEMEDA and whether they can be merged, supplemented or updated to meet the requirements for semantic database integration.

Keywords: Semantic database integration, molecular biology, meta-database, ontology, knowledge representation, controlled vocabulary


In conjunction with the rapid progress of biotechnologies and the human genome project [Aldhous, 1990], an increasing amount of data is being generated. The amount of new data is so large that human genetics journals are increasingly reluctant to publish mutation reports [Krawczak et al., 2000]. However, much data is published in publicly accessible data sources. The annual list of current database systems in the journal Nucleic Acids Research (January issue) currently lists about 300 systems with molecular biological data. The various data sources are maintained by many different institutions and companies, and vary widely in their content, formats and access methods. Whereas only a few years ago proprietary solutions, often based on flatfiles, were used for data storage, nowadays relational DBMSs (Database Management Systems) are the de facto standard. Many biological databases were started in the early 1980s, i.e. at a time when the Internet was not widely used and DBMSs themselves required advanced technical skills. Data was made available by proprietary methods, later via static web pages, and even later, when flatfiles grew too big, server-side scripts such as CGI scripts were used for searching and retrieving data from flatfiles. For data exchange, proprietary flatfile formats were usually used, and several flatfile formats evolved.

At present, most databases are connected to the Internet and can be accessed via web pages. Many databases provide hypertext links to entries in other databases. In most cases, AC numbers (accession numbers) or other database-specific identifiers are generated when a new entry is added to a database. Those identifiers are often used for interlinking between databases via their web pages. However, because different databases in many cases use different identifiers or terms for equivalent entries, interlinking databases is tedious. Usually, pair-wise mappings between database entries have to be generated in order to provide links between databases. Therefore, databases usually only provide links to the "most relevant" databases by using accession numbers.

However, besides accession numbers, many other database attributes which use common controlled vocabularies such as EC numbers [NC-IUBMB, 1992], CAS Registry numbers [Buntrock, 2001], GO terms [Ashburner et al., 2000; Gene-Ontology-Consortium, 2001], etc. are suitable for linking between databases. Even when databases use the same controlled vocabularies, these are often not used for linking between databases. This is because the number of existing molecular biological databases is too high to survey, i.e. a systematic approach for mapping equivalent database attributes is missing. Therefore, given that at present more than 400 molecular biological databases exist [Baxevanis, 2002], the degree of interlinking is low [Williams, 1997].

Database integration solutions based on indexed flatfiles such as DBGET/LinkDB [Kanehisa, 1997a; Kanehisa, 1997b; Kanehisa et al., 2002], SRS [Etzold et al., 1996] and SIR [Ramu, 2001] are the de facto standard for the integration of large numbers of heterogeneous databases. However, the aforementioned problems also apply to those systems. For example, in SRS an indexing script has to be provided for each database, and each indexing script is responsible for generating the links to other relevant databases. Based on principles and requirements (multi-user support, importing of ontologies, web supported ontology editing and database metadata editing etc.) as described in [Schulze-Kremer, 1997a] and [Köhler et al., 2000], a system for "intelligent" semantic integration and querying of federated databases is being implemented using three main components: a component which enables SQL access to integrated databases by database federation (MARGBench [Freier et al., 1999]), an ontology based semantic metadatabase (SEMEDA) and an ontology based query interface (SEMEDA-query). In the following we explain and demonstrate the principles, architecture and use of SEMEDA. The principal idea of ontology based semantic database integration is to define database attributes by referencing them to ontological concepts. In a next step, which will not be described in this publication, the structure of the ontology can be used for "intelligent" and user-friendly database querying.


General Considerations

What are Ontologies in SEMEDA. The notion of what ontologies are and how they should be implemented varies between people and research groups [Noy et al., 1997]. In SEMEDA, an ontology can be considered as a set of concepts which are connected by binary relations. Concepts are well-defined entities: they have a unique meaning and properties such as a name (label), a description and an identifier. The terms concept and node will be used as synonyms in this publication, and so will the terms relation and edge. Relation type (edge type) addresses the characteristics of a relation. The term hierarchy applies to the closure (in its mathematical sense) of a concept over a given relation type.

SEMEDA allows relational algebraic properties such as symmetry, reflexivity and transitivity to be assigned to relation types. Transitivity and symmetry in particular are used to derive the semantic entailments of concepts, and will be especially important for SEMEDA's database query interface.

More formally, in SEMEDA an ontology is a pair:

O := (N, E)

N: set of concepts; E: set of edges, where each edge is a triple (a, b, t) with a, b ∈ N, and t is the type of the relation, which defines the semantic (is-a, is-part etc.) and the algebraic properties (transitivity, symmetry etc.) of the relation.
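The definition above can be illustrated with a minimal Python sketch. It is not SEMEDA's implementation (which stores ontologies in a relational database); the names RelationType, Edge, Ontology and related are hypothetical and chosen only for illustration. The sketch shows how the algebraic properties attached to a relation type (here transitivity and symmetry) determine which concepts can be derived from a given concept.

```python
from collections import namedtuple

# Hypothetical names for illustration; not SEMEDA's actual schema.
RelationType = namedtuple("RelationType", ["name", "transitive", "symmetric"])
Edge = namedtuple("Edge", ["a", "b", "t"])  # edge (a, b, t): a is related to b by t


class Ontology:
    """O := (N, E): a set of concepts N and a set of typed edges E."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()

    def add_edge(self, a, b, t):
        # Both endpoints become concepts of the ontology (a, b in N).
        self.nodes.update((a, b))
        self.edges.add(Edge(a, b, t))

    def related(self, start, t):
        """Concepts derivable from `start` over relation type `t`."""
        neighbours = {}
        for e in self.edges:
            if e.t != t:
                continue
            neighbours.setdefault(e.a, set()).add(e.b)
            if t.symmetric:
                neighbours.setdefault(e.b, set()).add(e.a)
        direct = neighbours.get(start, set())
        if not t.transitive:
            return set(direct)
        # Transitive relation type: follow edges until nothing new appears.
        seen, stack = set(), list(direct)
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(neighbours.get(node, ()))
        return seen


IS_A = RelationType("is-a", transitive=True, symmetric=False)
onto = Ontology()
onto.add_edge("enzyme", "protein", IS_A)
onto.add_edge("protein", "organic compound", IS_A)
print(sorted(onto.related("enzyme", IS_A)))  # ['organic compound', 'protein']
```

Keeping the algebraic properties on the relation type rather than on individual edges mirrors the definition of t above: every edge of a given type is interpreted in the same way.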

Elaborate data structures and design principles for "formal ontologies" have been developed by different research groups, for example within the GALEN project [Rector et al., 1994] or based on lattice theory [Ganter et al., 1999]. Since many "real world" ontologies do not follow those principles, SEMEDA can import such ontologies and supports several advanced features but does not require that they be used.

Support for Collaborative Ontology Editing. Several users have to be able to collaboratively edit the ontology and database meta-information. Problems related to transaction management are solved by using locks and permissions.

Three groups of users with different permissions exist at present:

Admins: full permission on everything. Only people who are responsible for maintaining the system should use administrative accounts.

DB Provider: Objects which are generated by a DB provider will be treated as "suggested objects", i.e. they will not be fully integrated into the ontology before an administrative account releases them. Therefore a database provider may: a) add nodes, edge-types and edges; b) edit nodes and edge-types which he generated; c) delete nodes and edge-types which he generated; d) add a database and edit or delete his own database meta-information; e) define table attributes as nodes.

After an administrative account has released objects which were suggested by a database provider, the database provider can no longer edit these objects.

Everybody: everybody has permission to browse the ontology. However, confidential database information (host, port, login/password) can only be browsed by the database provider and by admins.
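The permission rules described above can be summarised in a short Python sketch. This is only an illustration of the stated rules under simplifying assumptions: the role names "admin", "provider" and "guest", the classes User and OntologyObject, and both functions are hypothetical and do not reflect how SEMEDA implements permissions internally.

```python
from dataclasses import dataclass


@dataclass
class OntologyObject:
    owner: str              # user name of the provider who suggested the object
    released: bool = False  # set once an administrative account releases it


@dataclass
class User:
    name: str
    role: str  # assumed roles: "admin", "provider" or "guest"


def can_modify(user, obj):
    """Admins may change everything; providers only their own suggested objects."""
    if user.role == "admin":
        return True
    return (user.role == "provider"
            and obj.owner == user.name
            and not obj.released)


def can_view_connection_info(user, db_owner):
    """Host, port and login/password are visible to the owner and to admins."""
    if user.role == "admin":
        return True
    return user.role == "provider" and user.name == db_owner


node = OntologyObject(owner="alice")
alice = User("alice", "provider")
print(can_modify(alice, node))  # True: still a suggested object
node.released = True
print(can_modify(alice, node))  # False: released objects are admin-only
```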


Because querying large ontologies can be computing intensive, performance was important even though our intention was to develop "only" a prototype. We implemented the system as a three-tiered system consisting of a relational database (backend) and JSP 1.1 (JavaServer Pages) as the middle tier, which dynamically generates the HTML frontend. This architecture has several advantages: data (ontologies and database meta-information) can be stored consistently and independently of the application, and can also be retrieved or imported by using the various built-in interfaces and tools of the DBMS.

Backend. Oracle 8i was used as the relational DBMS, but any other database system could also have been used. Whereas the "application logic" is often located in the middle tier, we implemented it partly in JSP and partly by using DBMS features such as constraints and PL/SQL (triggers, procedures, functions). This results in better performance (fewer JDBC calls, query optimisation by the DBMS) and has the advantage that much of the implemented functionality can be used by other applications without having to use the middle tier. On the other hand, the use of DBMS features makes it more difficult to port SEMEDA to other DBMSs. However, this does not affect raw data exchange of ontologies and database metadata between SEMEDA and other systems.

Figure 1 shows a simplified ER schema of the backend. The actual implementation of the schema contains more features, which are not displayed in Figure 1: adaptations for synonyms exist, and a relationship REFINE connects TABLEs and NODEs and allows further definition of the content of a database table. This makes it possible, for example, to specify that a database table contains data about just one species, even if the table itself does not have an attribute with species information.

Figure 1: Simplified ER schema of SEMEDA's backend.

As can be seen in the ER schema (Figure 1), we basically store ontologies as a set of nodes which are connected by edges. However, a tree or net representation of the ontology is needed when a user browses an ontology, or when all "child nodes" of a given hierarchy have to be selected. For this, we used the CONNECT BY PRIOR clause, a DBMS-specific SQL extension [Oracle-Corp., 1997]. In [NHS Information Authority, 2000] several methods for "treewalking" in databases were investigated, and it is stated that this proprietary SQL extension performs "excellent".
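The kind of treewalk described above can be sketched in Python with the standard sqlite3 module. Note the substitution: SQLite has no CONNECT BY PRIOR clause, so the sketch uses a recursive common table expression (WITH RECURSIVE) to select all child nodes of a given node for one edge type. The table and column names (node, edge, child, parent, edge_type) and the IDs are assumptions made for this example; they are not taken from SEMEDA's schema.

```python
import sqlite3

# In-memory toy schema: nodes connected by typed edges (names are assumed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id TEXT PRIMARY KEY, label TEXT);
    CREATE TABLE edge (child TEXT, parent TEXT, edge_type TEXT);
    INSERT INTO node VALUES ('N1', 'organic compound'), ('N2', 'protein'),
                            ('N3', 'enzyme'), ('N4', 'DNA');
    INSERT INTO edge VALUES ('N2', 'N1', 'is-a'), ('N3', 'N2', 'is-a'),
                            ('N4', 'N1', 'is-a');
""")


def child_nodes(conn, root_id, edge_type):
    """Labels of all nodes below `root_id` in the hierarchy of `edge_type`."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT child FROM edge WHERE parent = ? AND edge_type = ?
            UNION
            SELECT e.child FROM edge e JOIN subtree s ON e.parent = s.id
            WHERE e.edge_type = ?
        )
        SELECT n.label FROM subtree s JOIN node n ON n.id = s.id
        ORDER BY n.label
        """,
        (root_id, edge_type, edge_type),
    ).fetchall()
    return [row[0] for row in rows]


print(child_nodes(conn, "N1", "is-a"))  # ['DNA', 'enzyme', 'protein']
```

Using UNION rather than UNION ALL removes duplicate rows, so a node that is reachable via several paths is returned only once.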

Database identifiers are often considered an unimportant issue. However, in order to facilitate data and ontology exchange, finding a good solution was crucial, and several requirements had to be met. IDs should be referable from external applications, i.e. they should not change and should be unique, also across SEMEDA implementations, so that IDs do not have to be adjusted when an ontology is transferred between two SEMEDA implementations or exchanged with other ontology applications. The IDs should further be able to accommodate the IDs of other ontologies to facilitate the import of ontologies (a number can be stored as a string, but a string cannot easily and unambiguously be stored as a number).

In order to meet these requirements, we decided to use a prefix which is unique for each SEMEDA deployment. Such a prefix could be similar to a stock symbol. For example, concepts generated at a SEMEDA version deployed at the Resource Center Primary Databases might use the prefix RZPD, followed by an integer. Such concept IDs should not be modified when ontologies or metadata are transferred between SEMEDA implementations. Also, when ontologies are imported from other systems, the original IDs should be used and only when necessary be extended by a unique, source-ontology-specific prefix.
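The ID scheme described above might look as follows in Python. This is a sketch under assumptions: the class IdGenerator, the function import_id, the source prefix "SRC" and the ":" separator used on import are all hypothetical; the text only specifies a deployment prefix followed by an integer and a source-specific prefix that is added when necessary.

```python
import itertools


class IdGenerator:
    """Generates string IDs of the form <deployment prefix><integer>."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._counter = itertools.count(1)

    def new_id(self):
        return f"{self.prefix}{next(self._counter)}"


def import_id(original_id, source_prefix, existing_ids):
    """Keep the source ontology's ID; add a prefix only if it would clash."""
    original_id = str(original_id)  # numeric IDs are stored as strings
    if original_id not in existing_ids:
        return original_id
    return f"{source_prefix}:{original_id}"  # ":" separator is an assumption


gen = IdGenerator("RZPD")
print(gen.new_id())                       # RZPD1
print(import_id(42, "SRC", {"RZPD1"}))    # 42
print(import_id(42, "SRC", {"42"}))       # SRC:42
```

Storing every ID as a string is what allows numeric IDs from imported ontologies and prefixed local IDs to share one column.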

Middle tier. The middle tier indirectly connects the frontend with the backend, i.e. it maps HTTP GET and POST requests to the appropriate SQL/DML statements and PL/SQL procedures via JDBC. In addition, JSP is used for session tracking.

Frontend. For the frontend, dynamically generated HTML and JavaScript were used. Internet Explorer 5.5 and Netscape 4.7 (or newer versions) are supported. By using Cascading Style Sheets, the layout of the frontend can easily be adjusted.

Ontologies are often large (see Table 1). Thus, for a visual representation of the ontology, only a subset of all concepts can be displayed at a given time. For this purpose a tree filtered by depth and by relation type is used, which is equivalent to a depth-limited closure of a node over a given hierarchy.
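A depth-limited closure of this kind can be sketched in a few lines of Python. The function name filtered_tree and the (child, parent, relation type) edge layout are assumptions for illustration, and the "active site is-part enzyme" edge is a made-up example that only serves to show the relation type filter.

```python
def filtered_tree(edges, root, relation_type, max_depth):
    """Return {concept: depth} for `root` and its descendants up to `max_depth`,
    following only edges of the given relation type."""
    children = {}
    for child, parent, rtype in edges:
        if rtype == relation_type:
            children.setdefault(parent, []).append(child)

    depths = {root: 0}
    frontier = [root]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for node in frontier:
            for child in children.get(node, []):
                if child not in depths:  # keep the shallowest depth only
                    depths[child] = depth
                    next_frontier.append(child)
        frontier = next_frontier
    return depths


edges = [
    ("protein", "organic compound", "is-a"),
    ("DNA", "organic compound", "is-a"),
    ("enzyme", "protein", "is-a"),
    ("active site", "enzyme", "is-part"),  # hypothetical edge, filtered out below
]
print(filtered_tree(edges, "organic compound", "is-a", 1))
# {'organic compound': 0, 'protein': 1, 'DNA': 1}
```

Raising max_depth to 2 would additionally include "enzyme", while "active site" stays hidden as long as only the is-a relation type is displayed.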

SEMEDA can be browsed at (see also Fig. 2).

Figure 2: Screenshot of SEMEDA in edit mode. Whereas this user interface is complex and rich in functionality, the user interface for querying semantically integrated databases will be simple and user friendly. Left: Database metadata editor; middle: Ontology Editor; right: context dependent frame. See

The frontend consists of three frames: the left frame is a database metadata editor and the middle frame is used for ontology editing and browsing. The right frame is used context dependently to display and edit details of the database metadata and the ontology.

Both in the ontology and metadata editor frames, concept, database, table and attribute information can be viewed in the right frame by clicking the appropriate objects. These objects can be edited by selecting the appropriate radio-button before clicking on the appropriate "add", "edit" or "delete" button. Whereas all concepts should be part of the is-a relation, other hierarchies can be browsed (browse hierarchy) and new hierarchies can be defined (edit hierarchies).

Figure 3 shows some concepts of SEMEDA's "Main Ontology". Since the is-a relation is transitive, SEMEDA can derive: (enzyme is-a protein) and (protein is-a organic compound) ⇒ (enzyme is-a organic compound). Conversely, if a user wants to find all organic compounds, all subconcepts of "organic compound" can be derived (in this example: protein, enzyme, DNA). This is useful, for example, if a user wants to know which databases contain attributes that can be searched for organic compounds, since all relevant database attributes are defined as "organic compound" or as a subconcept of "organic compound".
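The lookup described in this paragraph can be sketched as follows. The attribute names (db_a.enzyme_name etc.) and the helper functions are hypothetical; the sketch only assumes that each database attribute has been defined as one or more concepts and that the is-a relation is transitive.

```python
def subconcepts(is_a_edges, concept):
    """All concepts below `concept` in the transitive is-a hierarchy."""
    found, stack = set(), [concept]
    while stack:
        parent = stack.pop()
        for child, p in is_a_edges:
            if p == parent and child not in found:
                found.add(child)
                stack.append(child)
    return found


def attributes_for(concept, is_a_edges, definitions):
    """Attributes defined as `concept` or as one of its subconcepts."""
    wanted = subconcepts(is_a_edges, concept) | {concept}
    return sorted(attr for attr, defined_as in definitions.items()
                  if defined_as & wanted)


is_a = [("protein", "organic compound"), ("enzyme", "protein"),
        ("DNA", "organic compound")]
# Hypothetical attribute definitions: attribute -> set of concepts.
definitions = {
    "db_a.enzyme_name": {"enzyme"},
    "db_b.sequence": {"DNA"},
    "db_c.author": {"author"},
}
print(attributes_for("organic compound", is_a, definitions))
# ['db_a.enzyme_name', 'db_b.sequence']
```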

Figure 3: is-a hierarchy of the subconcepts of "Substance" in SEMEDA's "Main Ontology". Left: screenshot. Right: equivalent graph visualisation. Further explanations are given in the text.

Database attributes can be defined using the "define" button after selecting both the attribute and the appropriate concept in the ontology. By "defining" a database attribute as a concept, the database provider states: "entries of this database attribute are is-a children of the selected concept". A database attribute can be defined more than once, since the contents of a database attribute are sometimes heterogeneous: for example, the attribute "source" in the Protein Data Bank [Berman et al., 2000] may contain tissue, species, cell line etc. information. Defining an attribute as only one concept means that all attribute entries are is-a children of that concept; defining it both as a specific concept and as the root concept (thing) means that only some entries are is-a children of the specific concept.

The "refine" button refines the content of a database table, i. e. makes statements about all entries of a table. Refining a table for example as mouse means that all data in this table is mouse data. Tables can also be refined to several concepts since for example a table may contain protein data from mice.

A clear distinction between browse and edit mode exists and the user always sees in which mode he is working. "Suggested objects" are visually differentiated and the username of the owner, i. e. the user who suggested an object, is displayed in the edit mode, thus a user can see if he has the permission to modify an object. Objects, which do not "belong" to somebody specific, can only be edited by administrative accounts. Such nodes should be 100% correct and not be modified. Thus a DB provider can rely on the semantics of the concepts in SEMEDA, and he can be sure that the semantics will not change after he defined database attributes as concepts.

Evaluation of existing ontologies for semantic database integration

SEMEDA can handle several ontologies; however, for the purpose of database integration all databases should be semantically defined using the same ontology. In the subsequent sections we list requirements for ontologies for our database integration approach, then list ontologies which might be appropriate for semantic database integration, match the ontologies against the criteria, and finally discuss whether and how ontologies can be merged and supplemented. Even though we found several good ontologies, it might still be the best solution to build a small custom ontology. Whether it will be better to build a small custom ontology or to use one of the systems evaluated below can only be decided by trying both approaches.


Generating large ontologies is a labour-intensive, time-consuming task [Schulze-Kremer, 1997b]. Therefore, ontologies should be imported whenever possible. However, in order to keep the ontologies up to date, it should be possible to re-import them when they have been updated at the source.

In the following we list criteria for ontologies and "knowledge collections" for molecular database integration. General requirements for molecular biological ontologies are given in [Schulze-Kremer, 1997b; Schulze-Kremer, 1998; Rector et al., 1998]; however, the requirements for ontologies to be used in SEMEDA are more specific:

  1. IDs: unique id of concepts or unique label of ontology concepts.
  2. Stability of concepts: Whereas the text of concept definitions and the name (label) of concepts may change, the semantics of a concept should remain the same. This and the "unique id" criterion are important for "re-importing" ontologies and referencing ontology concepts.
  3. Valid is-a hierarchy: the is-a hierarchy is important for "intelligent" database queries; hence the transitive closure of the is-a hierarchy has to be mathematically sound.
  4. Size: The ontologies may be small as long as they cover the database attributes to be integrated. Small ontologies are easier to survey, but a high number of synonyms would be very useful, since it would enable the users to use their own terminology.
  5. Availability: The ontology has to be available; being free of charge is a plus but not a prerequisite.
  6. Wide use: wide acceptance and use is not essential but makes sure that the potential users are familiar with the terminology/structure of the imported ontology.
  7. Maintenance: the ontology should still be maintained and updated and it should be possible to suggest new concepts. Although it is possible to add new concepts within SEMEDA, it would be helpful if the newly needed concepts could also become part of the public domain/source ontology.
  8. Good definition of concepts.

In addition, the ontologies to be imported have to cover the field of research of molecular biological databases. Since we intend to map database attribute semantics, not database entries, the ontologies generally do not have to be very deep. In order to get an overview of the terminology needed in SEMEDA, we compiled the most common attributes from some molecular biological databases. The SRS implementation of the European Bioinformatics Institute gives a good overview of molecular biological databases. From each database category (group), the attributes of the databases with the highest number of entries were checked. In addition, we investigated databases which are already connected to MARGBench: Brenda, RegulonDB [Salgado et al., 2001], TRANSFAC/TRANSPATH [Wingender et al., 2000] and MDDB [Hofestädt et al., 2000; Hofestädt et al., 1998]. Our aim was to get a broad overview of the knowledge domain needed for database integration. Therefore we did not have to match the database attributes exactly, nor did we list all attributes of the selected databases.

Common database attributes are: Molecule, Enzyme, Organelle, Gene, Transcription Factor, Transcription Binding Sites, Chromosome, Author, Title, Journal, Abstract, MedlineID, Reference, Comment, Species, Organism, Tissue, Organ, Reaction, Compound, Age, Date, Sequence, Sequence length, Protein Chain, Evidence, EC-Number, Link, Sequence Type, Cell Lines, NMR Experimental Data, Atomic coordinates, Crystallographic Protein Coordinate Information, Protein Residues, Codon Change, Codon Change Position, Substrate, Product, Cytogenetic Location, Phenotype, Mutation, Mutation Type, Amino Acid Changes, Pathology, Therapy, Clotting Activity, Validated, Allele, Reaction Equation, Reaction Direction, Specific Activity, Purification Steps, Storage Conditions, Temperature, Coenzyme, Optimal pH, Michaelis Constant, Molecular Mass, Restriction Enzyme, Restriction Position, Vendor, Accession Number.

Evaluation of existing ontologies

In Table 1 some ontologies, controlled vocabularies and other structured knowledge sources are listed and evaluated against the above-mentioned criteria and terms. The criterion "good definition of concepts" is more or less subjective and therefore not discussed in the table. Some other ontologies which are listed and evaluated in [Schulze-Kremer, 1998] or [Noy et al., 1997] were not further investigated because they obviously did not cover the knowledge domain of molecular biology.

Table 1: Evaluation of ontologies and knowledge sources for semantic database integration.

Ontology / Evaluation

Molecular Biological Ontology (MBO)
Info: Uses IDs, fairly stable, transitive is-a, size: 1272 concepts, available for free, rarely used, is maintained.
Comment: Covers fundamental molecular biological concepts and some general terms and concepts.

"upper" CYC
Info: Uses IDs, fairly stable, transitive is-a, size: 3000 concepts, available for free, sometimes used, but not as much as WordNet, is maintained.
Comment: The "upper CYC" ontology is a general "toplevel ontology"; commercial extensions to the upper CYC ontology are available.

WordNet 1.6 [Fellbaum, 1998]
Info: Uses IDs, fairly stable, fairly transitive is-a, size: 66025 concepts (noun synsets), available for free, widely used, is maintained. Covers most common English words.
Comment: Fundamental molecular biological concepts are also covered. Specific molecular biological concepts are not covered. Word, id, and lex_file_num together identify concepts. The "Semantic Concordance Package" can be used to trace updates between WordNet versions; however, its complicated proprietary structure and flatfile format make imports and updates tedious.

EcoCyc/MetaCyc [Karp et al., 2000; Maranas et al., 2001]
Info: IDs: uses IDs from the sources from which it was compiled (for example EC Numbers), fairly stable, fairly transitive is-a, available free for academic use, mainly used within the CYC system, is maintained.
Comment: Small but "deep". All entries are subconcepts of Metabolic Pathways, Signaling Pathways, Reactions, Enzymes, Genes, tRNAs, Compounds or Citations. Data is Escherichia coli centric.

GO [Ashburner et al., 2000]
Info: Uses IDs, fairly stable, non-transitive is-a, size: about 8000 concepts, available for free, widely used in genome databases, is maintained. Covers exclusively genetic and closely related terminology.
Comment: The Gene Ontology actually consists of three ontologies where some concepts are defined in more than one of the ontologies, but have different IDs in each ontology. The "is a" hierarchy is not a formal is-a hierarchy.

UMLS (Unified Medical Language System)
Info: Uses IDs, fairly stable, non-transitive is-a, size: about 800 000 concepts, available for free, widely used, is maintained.
Comment: The Unified Medical Language System maps the terminology of 60 different biomedical source vocabularies.

MeSH (Medical Subject Headings)
Comment: The Medical Subject Headings are a subset of UMLS.

MmCIF [Westbrook et al., 2000]
Info: Uses IDs, fairly stable, non-transitive is-a, available for free, widely used in protein databases and protein structure related sciences, is maintained.
Comment: The Macromolecular Crystallographic Information File is a very specific, large and detailed ontology of crystallographic information, closely integrated with the Protein Data Bank [Berman et al., 2000].

Controlled Anatomical Vocabulary
Info: Uses IDs, fairly stable, mixed is-a/is-part hierarchy, size: about 500 concepts, available for free, concepts compiled from several sources, is maintained.
Comment: Concept definitions have to be looked up in the sources from which the controlled vocabulary has been compiled.

LinkBase [Ceusters, 2001]
Info: Uses IDs, fairly stable, transitive is-a, size: 950 000 concepts, commercially available, used in L&C products and custom tailored systems, is maintained.
Comment: Biomedical terminology as well as basic general concepts are covered. In addition to the "is a" hierarchy, LinkBase implements several other relation types in a strictly formal way, and 2 100 000 instantiated relations exist. In addition, 300 000 cross references to foreign systems exist.

GALEN Common Reference Model (CRM) [Rector et al., 1994]
Info: Uses IDs, fairly stable, transitive is-a, size: about 1200 concepts, available for free (GALEN Open Source License), widely used, is maintained.
Comment: Extensible Core Model of Biomedical Terminology. Many instantiated relations exist. The core model is a tree and not a directed acyclic graph, i.e. multiple inheritance is not used.

SNOMED
Info: Uses IDs, fairly stable, information on transitive is-a was not available, size: about 200 000 concepts, commercially available, widely used, is maintained.
Comment: Biomedical Terminology.

Taxonomy [Benson et al., 2000]
Info: Uses IDs, fairly stable, phylogenetic trees can be seen as excellent transitive is-a trees, size: huge, most species relevant to molecular biology are covered, available for free, is maintained.
Comment: Changes are documented from update to update. Is part of NCBI and thus closely integrated and referenced to and from other applications.

Tree of Life
Info: IDs: Phylogenetic groups seem to be mapped as a file structure, thus no IDs exist, fairly stable, phylogenetic trees can be seen as excellent transitive is-a trees, size: huge, unclear license restrictions.
Comment: Taxonomic multi-authored phylogenetic tree. Also uses a picture for phylogenetic groups, implements a review process and offers much useful information. However, the information seems to be stored as plain HTML files, which would make importing the phylogeny very tedious. Larger than necessary for our purpose.

Of the mentioned criteria, formal transitive is-a hierarchies and coverage of the database knowledge domains are the most important for "intelligent" ontology based database queries as described in [Köhler et al., 2000]. Therefore, the non-commercial ontologies best suited for our purposes are the Molecular Biological Ontology (MBO), WordNet and GALEN's Common Reference Model. Suitable commercial systems are LinkBase and SNOMED. Its size and the fact that it covers many other relation types in a formal way make LinkBase especially suitable. For SNOMED, however, information concerning the transitivity of the is-a hierarchy was not available, neither from the documentation nor on request. None of the ontologies listed in Table 1 covers all relevant database concepts, so any of them will have to be supplemented either manually or by merging ontologies [Uschold et al., 1998]. In ontologies which consist mainly of an is-a hierarchy, merging specialised ontologies by substituting whole is-a branches is possible. For example, NCBI's Taxonomy could be merged into WordNet, although this would have to be done with some care: substituting WordNet's "organism" hierarchy with NCBI's Taxonomy would also erase Author, since "Author is-a Human and Human is-a ... is-a organism". Since parts of non-formal ontologies such as GO are often correct is-a hierarchies, merging parts of such hierarchies into a top-level ontology would also be possible. A labour-intensive approach for using ontologies with valuable concepts (UMLS, GO, MmCIF) but non-formal is-a hierarchies would be to introduce a formal is-a hierarchy.
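The caveat about substituting whole is-a branches can be made concrete with a small Python sketch. It is a toy model under assumptions: ontologies are reduced to sets of (child, parent) is-a pairs, the function names are hypothetical, and the three-concept "general" ontology and one-concept "taxonomy" are made-up stand-ins rather than actual WordNet or NCBI Taxonomy content. The function reports which concepts would be lost so that they can be re-attached manually.

```python
def descendants(edges, root):
    """All concepts below `root` in a set of (child, parent) is-a edges."""
    found, stack = set(), [root]
    while stack:
        parent = stack.pop()
        for child, p in edges:
            if p == parent and child not in found:
                found.add(child)
                stack.append(child)
    return found


def substitute_branch(target_edges, root, replacement_edges):
    """Replace everything below `root` by `replacement_edges`.

    Returns the merged edge set and the concepts that were lost."""
    old = descendants(target_edges, root)
    kept = {(c, p) for c, p in target_edges if c not in old and p not in old}
    merged = kept | set(replacement_edges)
    remaining = {concept for edge in merged for concept in edge}
    return merged, old - remaining


# Toy stand-ins for a general ontology and a specialised taxonomy.
general = {("organism", "thing"), ("human", "organism"), ("author", "human")}
taxonomy = {("Homo sapiens", "organism")}

merged, lost = substitute_branch(general, "organism", taxonomy)
print(sorted(lost))  # ['author', 'human']
```

In this toy case "author" disappears together with the replaced branch, which is the situation the text warns about.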

Even though it is possible to transform any of the mentioned ontologies into formal ontologies, the time needed would be a matter of many days (MmCIF) or several "man-years" (UMLS). In ontologies which were generated by merging ontologies, keeping track of updates and re-importing the updated source ontologies would be cumbersome.

Therefore we decided that for semantic database integration it might be best either to build a small custom ontology or to import one of the ontologies which support formal is-a hierarchies. Thus, non-commercial candidates are GALEN's CRM, MBO and WordNet. The commercial LinkBase might be best suited due to its size and formal modelling of the is-a hierarchy. However, whether a small custom ontology or the import of a large generic ontology works best can only be decided after both approaches have been tried.


In [McEntire et al., 2000] a list of some ontology exchange data formats/languages is given and evaluated, and a standard for conceptual graphs has been proposed. Interestingly, most "real world" applications do not make use of these formats or standards. Almost none of the ontologies or knowledge resources in Table 1 supports any of the exchange formats; instead they provide database dumps, tab-delimited ASCII files or proprietary data formats. This might be due to the fact that most systems which handle large ontologies are implemented as databases, i.e. both for importing and exporting, database dumps and tab-delimited files are easier to handle than the elaborate exchange formats.

Whereas we implemented SEMEDA with molecular biological databases in mind, databases from other knowledge domains could also be integrated (ecology, chemistry, agriculture, GIS, stock-market related databases, socio-economic databases etc.). Another application for SEMEDA might be to check the syntactical correctness of database entries. This might be especially useful for database attributes which contain "natural language" terms such as species data, i.e. data for which no other algorithmic validity checking is possible. Since database attributes should contain is-a children of the concepts as which they are defined, syntactic correctness checking could be implemented in a straightforward way, provided that the ontology is large enough to cover all entries of the attributes to be checked.
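A correctness check of this kind could be sketched as follows. The function invalid_entries and the tiny species hierarchy are assumptions for illustration only; the check simply reports every entry of an attribute that is not an is-a child of the concept as which the attribute was defined, and it presupposes that the ontology covers all legitimate entries.

```python
def invalid_entries(entries, defined_as, is_a_edges):
    """Entries that are not is-a children of the concept `defined_as`."""
    allowed, stack = set(), [defined_as]
    while stack:
        parent = stack.pop()
        for child, p in is_a_edges:
            if p == parent and child not in allowed:
                allowed.add(child)
                stack.append(child)
    return [entry for entry in entries if entry not in allowed]


# Toy is-a hierarchy as (child, parent) pairs; a real check would need an
# ontology that covers every legitimate entry of the attribute.
is_a = [("Homo sapiens", "species"), ("Mus musculus", "species")]
entries = ["Homo sapiens", "Mus musculus", "Homo sapines"]
print(invalid_entries(entries, "species", is_a))  # ['Homo sapines']
```

The comparison here is an exact string match; handling of synonyms, case or whitespace would have to be added for real data.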

SEMEDA is a system which allows database providers to provide all technical and semantic meta-information which is necessary to access their database. The next step after completion of SEMEDA-edit will be to implement support for integrated schema modeling [Freier et al., 1999] and "intelligent" database queries (SEMEDA-query) as described in [Köhler et al., 2000]. SEMEDA can handle several ontologies, and ontologies can be imported and exported by using the built-in features of the DBMS. Therefore, SEMEDA's collaborative ontology editing feature is not restricted to database integration, and might also be useful for ongoing ontology developments such as the "Gene Ontology" [Ashburner et al., 2000].