In Silico Biology 5, 0039 (2005); ©2005, Bioinformation Systems e.V.  

KEGG-based pathway visualization tool for complex omics data


Kazuharu Arakawa¹, Nobuaki Kono¹, Yohei Yamada¹, Hirotada Mori¹,² and Masaru Tomita¹*




1 Institute for Advanced Biosciences, Keio University, Endo 5322, Fujisawa, Kanagawa 252-8520, Japan
2 Research and Education Center of Genetic Information, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0101, Japan



* Corresponding author
   E-mail: mt@sfc.keio.ac.jp





Edited by E. Wingender; received July 09, 2005; revised and accepted July 15, 2005; published July 18, 2005



Abstract

Pathway-level visualization of omics data is an essential means in systems biology for capturing the system-level properties of the inner activities of cells. Here we describe a web-based resource consisting of two parts: a web-application that maps complex omics data onto KEGG pathways, giving an overview of all entities in the context of cellular pathways, and databases created with this software to visualize a series of microarray data sets. The web-application accepts transcriptome, proteome, or metabolome data, or a combination of these, as input; because it can handle several data layers at once, it is well suited to visualizing cell simulation results. The web server can be accessed at http://www.g-language.org/data/marray/.

Keywords: KEGG, pathway visualization, systems biology, microarray, transcriptome, metabolome



Introduction

The advent of high-throughput measurement techniques, such as microarrays for the transcriptome and mass spectrometry for the metabolome, has made it possible to capture a snapshot of cell-wide activity [Oliver et al., 2002]. Omics data become even more informative in a systems biology context, where the data as a whole are interpreted in terms of cellular systems and pathways, allowing, for example, a systematic study of how gene expression regulates metabolism [Kitano, 2002]. Given the large volume of data, visualization is crucial for the interpretation of results. However, most existing visualization tools are confined to the genomic [Awad et al., 2004] or transcriptomic [Rees et al., 2004] context, and only a few link the data to biological pathways, which is needed for a systematic study of pathway regulation and provides essential insight into the inner activity of a cell.

Both VitaPad [Holford et al., 2004] and ArrayXPath [Chung et al., 2004] can visualize transcriptome data in the context of pathways, but VitaPad relies on automatic layout, and ArrayXPath draws the maps de novo from coordinates extracted from pathway databases. In many cases, however, it is advantageous to view the data on familiar reference pathway maps that also show gaps and candidate enzymes. GenMAPP [Dahlquist et al., 2002] and the EcoCyc Omics Viewer [Keseler et al., 2005] provide such functionality, but they are oriented toward gene expression data and cannot simultaneously map complex omics data from different layers. A growing number of studies emphasize the importance of integrating multiple omics data sets, such as genome, transcriptome, and metabolome [Covert et al., 2004], and simultaneous representation is crucial for displaying cell simulation results [Arita et al., 2005].

In this work, we developed a visualization tool, in the form of a web-application, that maps complex omics data onto KEGG pathways [Kanehisa et al., 2004]. The KEGG database provides a SOAP API for generic mapping, but for the purposes of this work it has the following shortcomings: 1) it is a generic programming library rather than software for direct use; 2) the generated images are bitmaps of relatively low resolution; 3) it can map only one gene to each enzyme and is therefore unable to correctly map heteromeric enzymes with multiple subunits; and 4) it cannot map transcriptome and metabolome data simultaneously.



Implementation

The visualization tool was developed using the G-language Genome Analysis Environment, a generic workbench for bioinformatics [Arakawa et al., 2003]. It is written entirely in Perl, using the Ming library (http://ming.sourceforge.net/) to generate Shockwave Flash (SWF) graphics, and is therefore cross-platform. SWF is a vector graphics format and thus enables resolution-independent rendering; moreover, because most web browsers come with a Flash player pre-installed, which is not the case for other vector image formats such as Scalable Vector Graphics (SVG), the graphics are readily portable to the Internet.

The web-application interface requires three inputs from the user: the name of the organism, the name of the pathway, and the data to be mapped onto the pathway image. The organism and pathway are selected with pull-down menus from a dynamically obtained list of the organisms and pathways available in KEGG. Given a query, the web-application first downloads the pathway image and its HTML file from the KEGG database for the specified organism, so the pathway image and the gene/enzyme/metabolite coordinates are always obtained from the latest KEGG distribution. The coordinates of genes, enzymes, and metabolites are then parsed from the downloaded HTML, and the values given in the query are mapped onto the image at these positions. For generic usage, the software takes comma-delimited name-value pairs, in which a value between 1 and 100 determines the color of each entity. For reference pathways, EC numbers and KEGG compound IDs can be used as entry names; for all other pathways, common and canonical gene names and KEGG compound IDs can be used. Consequently, enzymes can be mapped only onto reference pathways, and the possible combinations of omics data are proteome and metabolome for reference pathways, and transcriptome and metabolome for all other pathways.
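
The parsing and matching steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Perl implementation. It assumes that the downloaded KEGG HTML describes each box or circle with an `<area>` tag in an image map, that the `href` of each area lists the linked KEGG identifiers after a `?`, joined by `+` when a box carries several genes, and that the query holds one `name,value` pair per line. All function names are hypothetical, and matching common gene names would additionally need a synonym table, which is not shown.

```python
from html.parser import HTMLParser


class AreaMapParser(HTMLParser):
    """Collect (shape, coords, href) for every <area> tag of an HTML image map."""

    def __init__(self):
        super().__init__()
        self.areas = []

    def handle_starttag(self, tag, attrs):
        if tag != "area":
            return
        attr = dict(attrs)
        # Assumed coords format: "x1,y1,x2,y2" for boxes, "x,y,r" for circles.
        raw = attr.get("coords") or ""
        coords = tuple(int(c) for c in raw.split(",") if c.strip())
        self.areas.append((attr.get("shape", "rect"), coords, attr.get("href", "")))


def parse_area_map(html_text):
    """Return the list of (shape, coords, href) tuples found in the HTML."""
    parser = AreaMapParser()
    parser.feed(html_text)
    parser.close()
    return parser.areas


def ids_in_href(href):
    """Identifiers after '?', split on '+' (assumed link format)."""
    query = href.partition("?")[2]
    return [i for i in query.split("+") if i]


def parse_query(text, lo=1.0, hi=100.0):
    """Parse one 'name,value' pair per line; each value must lie in [lo, hi]."""
    values = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        name, sep, raw = line.partition(",")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'name,value'")
        value = float(raw)
        if not lo <= value <= hi:
            raise ValueError(f"line {lineno}: value {value} outside {lo}-{hi}")
        values[name.strip()] = value
    return values


def match_values(areas, values):
    """Pair each identifier linked from each area with its query value (None if absent)."""
    matched = []
    for shape, coords, href in areas:
        # Drop the database prefix ("eco:", "cpd:") so bare IDs from the query match.
        ids = [i.split(":", 1)[-1] for i in ids_in_href(href)]
        matched.append((shape, coords, [(i, values.get(i)) for i in ids]))
    return matched
```

For example, an area whose link ends in `?eco:b0001+eco:b0002` yields two (identifier, value) pairs for a single box, which is the information needed to draw a heteromeric enzyme with all of its subunit genes.
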
Genes and enzymes are colored on a red-to-green scale, and metabolites on a blue-to-yellow scale. Note also that, for convenience, the circles representing metabolites are drawn with a larger diameter than in the original KEGG maps. The software also correctly identifies enzymes with multiple subunits and divides the box representing such an enzyme so that all of its constituent genes are shown. By integrating transcriptome, proteome, metabolome, and pathway information, the system-level control of a pathway, for example through the regulation of gene expression, can be observed at a glance.
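
The coloring and subunit handling described above can be illustrated with a small Python sketch. The paper does not specify these details, so the sketch assumes linear interpolation in RGB between the endpoint colors, with a value of 1 mapped to the first color (red or blue) and 100 to the second (green or yellow), and assumes that a heteromeric enzyme box is divided into equal-width vertical strips, one per gene. All names are hypothetical, and genes without a query value are left uncolored (`None`).

```python
RED, GREEN = (255, 0, 0), (0, 255, 0)
BLUE, YELLOW = (0, 0, 255), (255, 255, 0)


def interpolate(c0, c1, value, lo=1.0, hi=100.0):
    """Linearly blend RGB color c0 (at lo) into c1 (at hi); out-of-range values are clamped."""
    t = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def gene_color(value):
    return interpolate(RED, GREEN, value)      # genes/enzymes: red -> green (assumed direction)


def metabolite_color(value):
    return interpolate(BLUE, YELLOW, value)    # metabolites: blue -> yellow (assumed direction)


def split_box(x1, y1, x2, y2, n):
    """Divide a box into n equal-width vertical strips, one per subunit gene."""
    width = (x2 - x1) / n
    return [(x1 + i * width, y1, x1 + (i + 1) * width, y2) for i in range(n)]


def color_enzyme_box(coords, gene_values):
    """Return (strip, color) pairs for a box; genes with no value get color None."""
    if not gene_values:
        return []
    strips = split_box(*coords, len(gene_values))
    return [(strip, None if v is None else gene_color(v))
            for strip, (_, v) in zip(strips, gene_values)]
```

Blending in RGB makes the midpoint of the gene scale a dark olive, (128, 128, 0); interpolating in another color space would change the mid-tones but not the endpoints.
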



Figure 1: Screenshot of the database of microarray data generated with the visualization tool. All pathway maps are listed for a systematic overview, and clicking one of the thumbnails opens a detailed pathway map with color-coded expression data. Heteromeric enzymes with multiple subunits are correctly shown with multiple genes. Although not shown in this example, the software can simultaneously map different layers of omics data, such as transcriptome and metabolome data.




Application of the tool to microarray data

Using this visualization tool, we created a database of pathway images generated from the transcriptome analysis of 38 two-component regulatory system mutants of Escherichia coli K-12 W3110 [Oshima et al., 2002]. Each of the 38 knockout conditions comprises images of 104 pathways onto which the corresponding gene expression levels are mapped. The top page of the generated website displays tiled thumbnails of the mapped images, produced with the ImageMagick utility (http://www.imagemagick.org/), for a systematic overview of the data set, and the full-size scalable images open when a thumbnail is clicked. In this way, cell-wide regulation of gene expression can be grasped at a glance, and the details can then be examined following the system-level overview. In the thumbnail overview, the name of each pathway is displayed when the mouse cursor is moved over its image. Within an enlarged pathway map, users can cycle through the different knockout conditions using a pull-down menu, and a second pull-down menu allows navigation between pathways within the same knockout condition.
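
The size of this database follows directly from the numbers above: 38 conditions × 104 pathways = 3,952 mapped images. The Python sketch below enumerates these rendering jobs and computes the pixel offsets of thumbnails laid out in a grid. The condition and pathway labels, the column count, and the thumbnail size are hypothetical placeholders, and the sketch does not reproduce the ImageMagick step that the authors used to produce the thumbnails.

```python
from itertools import product


def build_jobs(conditions, pathways):
    """One (condition, pathway) rendering job per mapped image."""
    return list(product(conditions, pathways))


def tile_positions(n_items, columns, thumb_w, thumb_h):
    """Top-left pixel offset of each thumbnail in a row-major grid."""
    return [((i % columns) * thumb_w, (i // columns) * thumb_h) for i in range(n_items)]


# Hypothetical labels standing in for the 38 mutants and 104 KEGG pathways.
conditions = [f"mutant{i:02d}" for i in range(1, 39)]
pathways = [f"pathway{j:03d}" for j in range(1, 105)]
jobs = build_jobs(conditions, pathways)

# Grid for the 104 pathway thumbnails of one condition;
# 8 columns of 120 x 90 px thumbnails are arbitrary choices.
layout = tile_positions(len(pathways), columns=8, thumb_w=120, thumb_h=90)
```

With 8 columns, the 104 thumbnails fill exactly 13 rows, so the last one sits at offset (840, 1080).
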



Limitations

Although all entities mapped onto the pathway image, including the colored boxes and circles and the text of gene names, are drawn as vector graphics and are therefore scalable, the original KEGG pathway image on which the Flash image is based is a bitmap. This limits the overall image resolution when the image is enlarged. Moreover, although we use Flash as a vector image format, its animation features are currently not employed. Interactivity and time-series representation using Flash animation are left for future work.



Conclusions

The web resource described in this work provides a web-application for pathway-level visualization of omics data as vector images based on the KEGG database, together with a database of visualized microarray data. The generated database, the software, and the original data set are accessible at our websites (http://www.g-language.org/data/marray/, http://ecoli.aist-nara.ac.jp/).



Acknowledgements

The authors would like to thank Yoichi Nakayama for useful advice. Pathway Solutions Inc. granted us an academic license to distribute the software and the database on our website. This research was supported by the Japan Society for the Promotion of Science (JSPS) and by CREST of JST (Japan Science and Technology).




References