Volume 4, Special Issue: Ontology Workshop, Tokyo 2003




In Silico Biology 4, 0005 (2003); ©2003, Bioinformation Systems e.V.  



Indexing anatomical concepts to OMIM Clinical Synopsis using the UMLS Metathesaurus

Teruyoshi Hishiki1, Osamu Ogasawara2, Yoshimasa Tsuruoka3, Kousaku Okubo2*

1 Biological Information Research Center, National Institute of Advanced Industrial Science and Technology (AIST)
   Email: t-hishiki@jbirc.aist.go.jp
2 National Institute of Genetics
   Email: oogasawa@lab.nig.ac.jp, kousaku@genomatrix.com
3 CREST, JST (Japan Science and Technology Corporation)
   Email: tsuruoka@is.s.u-tokyo.ac.jp

*  corresponding author


Edited by E. Wingender; received September 22, 2003; revised and accepted December 24, 2003; published December 28, 2003


Abstract

As a first step toward the quantitative comparison of the clinical features of diseases, we indexed the text descriptions in the Clinical Synopsis section of Online Mendelian Inheritance in Man (OMIM) with concepts for body parts, organs, and tissues contained in the Metathesaurus of the Unified Medical Language System (UMLS). We also indexed the text with diseases and disorders that have links to body parts specified in the Metathesaurus. The vocabulary comprised approximately 177,540 representations for 81,435 concepts, and 2,161 concepts were indexed to 3,779 OMIM entries. The indexed concepts included 134 concepts for the noun forms of anatomical concepts and 985 concepts for diseases and disorders that were linked to 132 and 408 anatomical concepts, respectively. We show that the retrieval of OMIM entries for diseases affecting specific organs becomes more comprehensive when it uses the anatomical concepts indexed to the Clinical Synopsis, or linked to the indexed concepts, than when organ names are simply matched against the Clinical Synopsis text. Recall and precision in identifying relevant body parts in the Clinical Synopsis were 78% and 92.5%, respectively, based on random sampling. Examination of the body parts that went unidentified for lack of indexed diseases and disorders showed that, although most of the concepts for diseases and disorders were contained in the Metathesaurus, their relations to body parts were not. The indexing results demonstrate the effectiveness of the Metathesaurus as a resource for identifying concepts that indicate body parts, diseases, and disorders.

Key words: text mining, automated indexing, OMIM, UMLS
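
The indexing idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, the concept identifiers (which are placeholders, not real UMLS identifiers), the disease-to-anatomy links, and the function names are all hypothetical, and simple whole-word matching stands in for whatever matching procedure was actually used. The sketch shows how anatomical concepts can be reached both directly, from a body-part term in the text, and indirectly, through a disease term linked to a body part, and how precision and recall could be computed against a manually judged sample.

```python
import re

# Hypothetical toy vocabulary: surface form -> concept ID.
# The IDs are placeholders for illustration, not real UMLS identifiers.
TERM_TO_CONCEPT = {
    "heart": "A_HEART",
    "kidney": "A_KIDNEY",
    "cardiomyopathy": "D_CARDIOMYOPATHY",
    "renal failure": "D_RENAL_FAILURE",
}

# Hypothetical links from disease/disorder concepts to anatomical concepts.
DISEASE_TO_ANATOMY = {
    "D_CARDIOMYOPATHY": {"A_HEART"},
    "D_RENAL_FAILURE": {"A_KIDNEY"},
}


def index_concepts(text):
    """Return the concept IDs whose surface forms occur in the text."""
    lowered = text.lower()
    found = set()
    for term, concept in TERM_TO_CONCEPT.items():
        # Whole-word, case-insensitive match (an assumed matching rule).
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept)
    return found


def anatomy_for(text):
    """Return anatomical concepts indexed directly or reached via diseases."""
    concepts = index_concepts(text)
    anatomy = {c for c in concepts if c.startswith("A_")}
    for concept in concepts:
        anatomy |= DISEASE_TO_ANATOMY.get(concept, set())
    return anatomy


def precision_recall(predicted, relevant):
    """Compute precision and recall of predicted concepts against a judged set."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall
```

In this toy setting, a synopsis line such as "Cardiomyopathy; renal failure" names no organ, so plain organ-name matching would miss it, whereas `anatomy_for` reaches the heart and kidney concepts through the disease links. A disease term with no entry in `DISEASE_TO_ANATOMY` would leave its body part unidentified, which mirrors the kind of missing relation the abstract reports.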