In Silico Biology 10, 0002 (2010); ©2009, Bioinformation Systems e.V.  
Special Issue: Petri Net Applications in Molecular Biology


Cell Illustrator 4.0: A computational platform for systems biology


Masao Nagasaki*, Ayumu Saito, Euna Jeong, Chen Li, Kaname Kojima, Emi Ikeda and Satoru Miyano




Human Genome Center, Institute of Medical Science, The University of Tokyo, Japan




* Corresponding author
   Phone: +81-5449-5615; Fax: +81-5449-5442
   Email: masao@ims.u-tokyo.ac.jp





Edited by E. Wingender; received October 05, 2009; revised December 22, 2009; accepted December 25, 2009; published January 09, 2010



Abstract

Cell Illustrator is a software platform for Systems Biology that uses the concept of Petri nets for modeling and simulating biopathways. It is intended for biological scientists working at the bench. The latest version, Cell Illustrator 4.0, uses Java Web Start technology and is enhanced with new capabilities, including: automatic grid graph layout algorithms using ontology information; tools using Cell System Markup Language (CSML) 3.0 and Cell System Ontology 3.0; a parameter search module; a high-performance simulation module; a CSML database management system; conversion from CSML models to programming languages (FORTRAN, C, C++, Java, Python, and Perl); import from SBML, CellML, and BioPAX; and export to SVG and HTML. Cell Illustrator employs an object-oriented extension of the hybrid Petri net, so that biopathway models can include objects such as DNA sequences, molecular density, 3D localization information, transcription with frame-shift, and translation with a codon table, as well as biochemical reactions.

Keywords: biopathway, simulation, Petri net, modeling, pathway database, ontology, CSML, CSO, CI, ODE, Cell Illustrator, Java, JWS



Introduction

Systems Biology requires computational tools that enable us to understand and analyze complex biopathways. In particular, there is a strong need for biology-oriented software with which biological scientists (users) can intuitively model and simulate the complex dynamic interactions and processes in biopathways comprising hundreds of entities within and among cells, e. g., gene regulatory networks, metabolic pathways, signal transduction pathways, and cell-cell interactions. To this end, we started developing a software tool in 1999; the first version was published as Genomic Object Net 1.0 in 2002 [1, 2], and it was later released under the name Cell Illustrator 1.0.

This paper presents the new technologies and tools introduced in the latest version, Cell Illustrator 4.0, and discusses their impact on biopathway modeling and simulation. For instance, Cell Illustrator 4.0 uses Java Web Start [3] and includes new pathway layout algorithms [4-7], new formats (Cell System Markup Language (CSML) [8] and Cell System Ontology (CSO) [9]), and tools related to pathway modeling and simulation. Cell Illustrator employs the concept of a hybrid Petri net [10, 11] as its modeling method. We extended this concept to an object-oriented style and developed its core architecture, the hybrid functional Petri net with extension (HFPNe) [10-12], which is optimized for biopathway modeling and simulation. HFPNe was designed to handle any type of object, in keeping with the original Petri net concept; to this end, it introduces the new elements generic entity and generic process (Fig. 1). In addition, because HFPNe provides both discrete and continuous entities and processes, it can handle discrete and continuous events in the same model, and any kind of function can be assigned to the delay, weight, and speed parameters of these elements. ODE models can easily be expressed with a subset of the HFPNe elements, i. e., continuous entities and continuous processes, with all input connector weight parameters set to "nocheck". For a detailed formal definition and the properties of HFPNe, readers are referred to [12, 13].
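The reduction of an ODE to continuous HFPNe elements can be made concrete with a small sketch. The Python below is not Cell Illustrator's simulation engine. It assumes that a continuous process consumes its input entities and produces its output entities at the rate given by its speed function, that "nocheck" simply means firing is not gated by an input threshold, and that a fixed-step Euler scheme is good enough for illustration. The names `ContinuousProcess` and `simulate` and the rate constants are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, float]


@dataclass
class ContinuousProcess:
    inputs: List[str]                # entities consumed at the process speed
    outputs: List[str]               # entities produced at the process speed
    speed: Callable[[State], float]  # rate as a function of current entity values


def simulate(state: State, processes: List[ContinuousProcess],
             dt: float, steps: int) -> State:
    """Explicit Euler integration of dX/dt = (speeds producing X) - (speeds consuming X).

    Input connectors are treated as "nocheck": no threshold gates firing.
    """
    x = dict(state)
    for _ in range(steps):
        # Evaluate every speed on the same snapshot so process order does not matter.
        speeds = [p.speed(x) for p in processes]
        for p, v in zip(processes, speeds):
            for e in p.inputs:
                x[e] -= v * dt
            for e in p.outputs:
                x[e] += v * dt
    return x


# A -> B at k1*A and B -> C at k2*B, i.e. the ODE system
# dA/dt = -k1*A, dB/dt = k1*A - k2*B, dC/dt = k2*B
k1, k2 = 0.5, 0.2
model = [
    ContinuousProcess(["A"], ["B"], lambda s: k1 * s["A"]),
    ContinuousProcess(["B"], ["C"], lambda s: k2 * s["B"]),
]
final = simulate({"A": 10.0, "B": 0.0, "C": 0.0}, model, dt=0.01, steps=1000)
```

Evaluating all speeds on one snapshot before updating any entity keeps the result independent of the order in which processes are listed, which is what the ODE reading requires.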



Figure 1: Petri net elements in HFPNe and their icons in Cell Illustrator. (a) A Petri net consists of three elements: place, transition, and arc. To bridge the gap between computer science and biology researchers, HFPNe renames these elements with more intuitive terms: entity, process, and connector, respectively. In HFPNe, entities and processes each have three types (discrete, continuous, and generic), and connectors have three types (process, associate, and inhibitory). In Cell Illustrator, entities and processes annotated with CSO, a biological pathway ontology, can be displayed with more intuitive icons. (b) The connection rules of elements in HFPNe.


Figure 2: Sequence-level transcription simulation in Cell Illustrator. The upper part shows a sequence-level simulation model using generic entities and a generic process (http://www.csml.org/download/model/csml30/generic_transcription_30.xml). The lower part shows a concentration-level simulation model that uses only one continuous entity and one continuous process.

With Cell Illustrator, we can model and simulate any biological object in biopathways: not only biochemical reactions, molecular density, and 3D localization information, but also sequence-level information (Fig. 2), e. g., translated products with frame-shift and translation with a codon table.
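As a sketch of what a generic process can compute, the hypothetical Python function below translates an mRNA sequence held by a generic entity into a protein string using the standard codon table, optionally reading in a shifted frame or applying a single frameshift at a given position. It only illustrates the idea of sequence-level objects; it is not the implementation used in Cell Illustrator, and the parameter names `frame`, `shift_at`, and `shift` are assumptions.

```python
from typing import Optional

# Standard genetic code; codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str, frame: int = 0,
              shift_at: Optional[int] = None, shift: int = 0) -> str:
    """Generic process: read codons from offset `frame` until a stop codon ('*').

    If `shift_at` is given, the reading position jumps by `shift` bases
    (e.g. +1 or -1) once it reaches `shift_at`, mimicking a ribosomal frameshift.
    """
    seq = mrna.upper().replace("U", "T")  # accept RNA or DNA letters
    protein, pos = [], frame
    while pos + 3 <= len(seq):
        if shift_at is not None and pos >= shift_at:
            pos += shift      # one-time frameshift
            shift_at = None
            continue
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
        pos += 3
    return "".join(protein)
```

For example, `translate("ATGAGCCTAA")` reads ATG, AGC, CTA in frame 0, whereas a +1 shift at position 3 skips one base and reads ATG, GCC, then the stop codon TAA.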

Some modeling applications, including E-CELL [14], Gepasi [15], and BioSPICE [16], require skills in mathematics and programming. Cell Illustrator, in contrast, is designed not to require any prior knowledge of differential equations or programming. To achieve this, we developed biology-oriented GUIs and the related tools described in the following sections. The prerequisites for Cell Illustrator are advertised as "interest in biology, ability to operate a cell phone, and the mathematical ability of standard middle school student or better." Since 1999, new versions of Cell Illustrator have been released almost every two years. Although much software in the community is distributed as open source, Cell Illustrator is distributed as commercial software so that every user's needs can be addressed and improvements can be made continuously and quickly. Cell Illustrator Player (CI Player), a full viewer of Cell Illustrator models without the simulation engine, is distributed free of charge. Thus, users can share and view complete models, much as Adobe Reader does for PDF documents. The textbook [13] and [17] describe the use of Cell Illustrator and its applications in detail.

A considerable number of users have used Cell Illustrator to model and simulate the biopathways of interest to them and have demonstrated its practicality in their research. For example, Troncale et al. [18] modeled the regulation of hematopoiesis and investigated the role of interleukin-6 in early human hematopoiesis by simulating an HFPNe model constructed with Cell Illustrator. Koh et al. [19] estimated parameters by applying an evolutionary technique to their HFPNe model of the Akt and MAPK signaling pathways and investigated their working hypothesis on crosstalk interaction. Hardy and Robillard [20] simulated the Ca2+/calmodulin-dependent protein kinase II (CaMKII) regulation network with Cell Illustrator and then analyzed the dynamics of signal propagation in the CaMKII regulation pathway. Sato et al. [21] modeled the olfactory transduction pathway and suggested that increased PDE1C dosage extends the longevity of the depolarization signals of the olfactory receptor neuron. Wu and Voit [22] demonstrated how the canonical GMA (generalized mass action) and S-system models of BST (Biochemical Systems Theory) can be implemented directly in the HFPNe framework using Cell Illustrator. In addition, they described, using Cell Illustrator, how to account for different types of time delays as well as for discrete, stochastic, and switching effects [23].

The studies mentioned above show that pathway modeling based on the Petri net concept is accepted in practice by biological scientists. We believe that Cell Illustrator will further promote biological research using biopathway modeling and simulation. In the Methods section, we describe the new functions of Cell Illustrator 4.0 for graph layout, format exchange to and from CSML, CSO with visualized icons, and SaaS (Software as a Service) technology and modules. Finally, we discuss the limitations of Cell Illustrator 4.0 and the further functionalities needed for large-scale Systems Biology.



Methods


Automatic graph layout

When a biopathway model has fewer than fifty elements in total, automatic pathway layout is of little importance, because the elements can simply be placed and arranged manually. Accordingly, no automatic layout function was implemented in Cell Illustrator 1.0, released in 2002. With the progress of Systems Biology, however, there arose a strong need to handle larger pathway models as well as pathway models written in other XML formats that lack graphical layout information, and automatic layout functionality became keenly demanded. The first, simple solution was to use an existing graph layout library: a later version of Cell Illustrator adopted the JGraph library with its Circle, Moen, Sugiyama, and organic layout algorithms [24]. Unfortunately, these layout algorithms proved insufficient for most biopathway models. Therefore, new grid-based layout algorithms were developed [4-6] and implemented in Cell Illustrator. Fig. 3 shows all of the graph layout algorithms, including BLK [4, 25], SCCB [6], and Grid Eades [5]. These grid layout algorithms place the elements on grid points. With this function, Cell Illustrator can lay out pathway models while taking into account cellular location information with a complicated structure, e. g., the need to place some elements in the inner region of a torus shape. In Fig. 4, elements in the nucleus, e. g., the transcription process and the pri-micro RNA entity, are arranged on the nucleus cell component, while elements in the cytoplasm, e. g., the translation process, are placed on the cytoplasm cell component. Such a layout with complicated cellular location information is difficult to generate with force-directed layout algorithms. This feature distinguishes Cell Illustrator 4.0 from other such software applications.
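The idea of placing elements on grid points while respecting cellular locations can be illustrated with a deliberately simple greedy strategy. This Python sketch is not BLK, SCCB, or Grid Eades: it snaps each node to the nearest free grid point inside its own compartment, where a compartment is any set of grid points, so a cytoplasm ring surrounding a nucleus (a region with a hole, as in the torus example) is handled naturally. The node names follow Fig. 4, but the coordinates and the function names are made up for illustration.

```python
from itertools import product


def rect(x0, y0, x1, y1):
    """All integer grid points (x, y) with x0 <= x <= x1 and y0 <= y <= y1."""
    return set(product(range(x0, x1 + 1), range(y0, y1 + 1)))


def grid_layout(nodes, regions):
    """Snap each node to the nearest free grid point of its own compartment.

    nodes:   {name: (compartment, (x, y))} with arbitrary (e.g. random) start positions
    regions: {compartment: set of allowed grid points}
    """
    occupied, placed = set(), {}
    for name, (comp, (x, y)) in sorted(nodes.items()):
        free = regions[comp] - occupied
        if not free:
            raise ValueError(f"no free grid point left in {comp!r}")
        # Nearest free point; ties broken by coordinates so the result is deterministic.
        best = min(free, key=lambda p: ((p[0] - x) ** 2 + (p[1] - y) ** 2, p))
        occupied.add(best)
        placed[name] = best
    return placed


nucleus = rect(3, 3, 5, 5)
cytoplasm = rect(0, 0, 8, 8) - nucleus  # ring around the nucleus: a region with a hole
layout = grid_layout(
    {
        "transcription": ("nucleus", (0.2, 7.9)),
        "pri-miRNA": ("nucleus", (0.5, 7.5)),
        "translation": ("cytoplasm", (4.1, 4.2)),
    },
    {"nucleus": nucleus, "cytoplasm": cytoplasm},
)
```

Here the translation process starts inside the nucleus area but is pushed to the nearest cytoplasm point, (4, 6), because its compartment excludes the nucleus. A real layout algorithm would also minimize edge lengths and crossings, which this sketch ignores.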



Figure 3: Graph Layout dialog. Six grid layout algorithms (BLK, CB, SCCB, Eades, Random Grid, Adjustment) are implemented. Subcellular localization information can be used for layout. Layout parameters can be set up using the "Option" button.


Figure 4: Layout result of the ASE cell fate model in C. elegans, computed with the grid graph layout algorithm (SCCB) from the random layout shown at the bottom left. The model has 76 nodes (entities: 24; processes: 52) and 82 connectors.


CSML 3.0 and import from and export to other formats

The native XML format of Cell Illustrator 1.0 was CSML 1.9, which was designed to represent HFPNe simulation models with custom graphical information. Meanwhile, various pathway databases have been built with their own XML formats. It is therefore important to develop an XML format with enough expressive power to cover most of them, so that their data can be imported without loss of information while keeping their contents, e. g., biological meaning, simulation models, and layout information. To address this, CSML 3.0 [8] was developed as an XML format optimized for biopathway modeling and simulation that largely achieves this objective (3.0 is the latest version as of September 2009). Its major features are as follows:

  1. Full compatibility with Cell System Ontology 3.0 (CSO) [9], an ontology format for pathway modeling and simulation. With this feature, data can be exchanged with other ontology-based formats, e. g., BioPAX [26].
  2. Ability to contain fact-based information that has no effect on the simulation model but carries important biological meaning for the biopathway, e. g., indirect regulation of one gene by another.
  3. Capacity to represent not only high-level Petri net models but also ODE based models.

Cell Illustrator 4.0 faithfully implements the major CSML 3.0 specifications: models in other modeling and simulation XML formats, such as SBML [27] and CellML [28], can be imported into Cell Illustrator 4.0 without loss of information. BioPAX can also be imported into Cell Illustrator 4.0 via the CSO 3.0 format, with kinetics supplemented to yield template simulation models. Data from other pathway databases, e. g., KEGG [29, 30] and TRANSPATH [31], can also be imported in CSML 3.0 format. These functions of Cell Illustrator 4.0 build on and extend the results of several studies [32-35]. The internal conversion data flow of Cell Illustrator 4.0 is summarized in Fig. 5.
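The structural correspondence used when importing SBML can be sketched as follows, assuming the usual mapping of SBML species to entities, reactions to processes, reactants to entity-to-process connectors, and products to process-to-entity connectors. This Python sketch uses only `xml.etree.ElementTree`; it ignores modifiers, kinetic laws, compartments, and units, and it is not the SBML2CSML converter. The function name `sbml_to_petri_net` is hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def sbml_to_petri_net(sbml_text):
    """Map SBML species to entities and reactions to processes (structure only)."""
    root = ET.fromstring(sbml_text)
    entities, processes = {}, {}
    for el in root.iter():
        kind = _local(el.tag)
        if kind == "species":
            amount = el.get("initialAmount", el.get("initialConcentration", "0"))
            entities[el.get("id")] = float(amount)
        elif kind == "reaction":
            arcs = {"listOfReactants": [], "listOfProducts": []}
            for child in el:
                refs = arcs.get(_local(child.tag))
                if refs is None:
                    continue  # modifiers, kineticLaw, etc. are ignored in this sketch
                for ref in child:
                    refs.append((ref.get("species"),
                                 float(ref.get("stoichiometry", "1"))))
            processes[el.get("id")] = {
                "inputs": arcs["listOfReactants"],  # entity -> process connectors
                "outputs": arcs["listOfProducts"],  # process -> entity connectors
            }
    return entities, processes


EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy">
    <listOfSpecies>
      <species id="A" compartment="cell" initialAmount="10"/>
      <species id="B" compartment="cell" initialAmount="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B" stoichiometry="2"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

entities, processes = sbml_to_petri_net(EXAMPLE)
```

Stoichiometries are kept on the connectors, where in a Petri net reading they would act as connector weights.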



Figure 5: XML format transformation. CellML and SBML are transformed to CSML 1.9 with CellML2CSML and SBML2CSML, and BioPAX is transformed to CSML 3.0 with BioPAX2CSO, without losing any information. Models in CSML 1.9, CSML 3.0, and CSO 3.0 all run on CIO 4.0. All transformation tools, including CellML2CSML, SBML2CSML, and BioPAX2CSO, are included in CIO 4.0, so users can import any of these models directly into CIO 4.0.

Cell Illustrator 4.0 can export a CSML 3.0 model to a CSO 3.0 model without loss of information, since CSML 3.0 is fully compatible with CSO 3.0. Other well-known XML formats, i. e., SBML V3L3, CellML 1.1, and BioPAX L2, are not rich enough to hold a full CSML 3.0 model; thus, exporting from CSML 3.0 to these formats loses important information that they cannot represent. For example, none of these formats has a unified graphical representation, so a model developed in one application cannot be loaded into another application with the same view. Although SBGN L1 has recently been proposed [36], the format is still under development and covers only a limited graphical representation of biological components. As another example, BioPAX L2 can handle neither signal transduction reactions nor information related to pathway simulation. From the user's point of view, export to more formats, e. g., for Cytoscape [37], would improve usability; this is a task for a future release.


Cell System Ontology and standard icons

The native XML format of Cell Illustrator 4.0 is CSML 3.0, which is backed by Cell System Ontology (CSO) 3.0 [32]. CSO enables ontology-based representation of signal transduction pathways, gene regulatory networks, metabolic pathways, and cell-cell interactions, together with kinetics and graphical information. Formally, its schema is defined in the Web Ontology Language (OWL) [38]. The major features of CSO are as follows:

  1. CSO 3.0 can be applied to a pathway model both with and without simulation; and
  2. CSO 3.0 provides the core biopathway vocabulary as entities, processes, and cellular locations (92, 275, and 114 terms, respectively), and every term in the vocabulary is equipped with a standard icon.

Feature 1 allows a pathway model without kinetics, e. g., from KEGG [29] or Reactome [39], to be represented without loss of information. Feature 2 not only enables more exact data exchange with other applications, as ontologies are designed to do, but also makes data exchange among users more intuitive, because standard icons remove the ambiguity of a graphical pathway model, e. g., a given process icon, such as the one for phosphorylation, has the same meaning in every application that supports CSO 3.0. In Cell Illustrator 4.0, all standard icons are collected in the Biological Elements dialog (Fig. 6). Thus, users can create a graphical pathway model with an ontology background simply by repeatedly dragging and dropping (D&D) icons from the dialog, after a filtering step such as a keyword search. The ontology information is also used effectively in other functions: CIO 4.0 implements automatic pathway layouts that take the cellular locations given by this ontology into account. In [40] and [41], semi-automatic pathway validation and model checking using the ontology information are applied to generate qualified pathway models from given pathway models. The results of this research on pathway validation and model checking will be made available in a future release.



Figure 6: Biological Elements Dialog. Three tabs, "Entity", "Processes", and "Cell Component", are shown. The "Filter" box allows easy search for target elements.


SaaS technology and Cell Illustrator Online

Systems Biology research and development generates various types of analysis requests. Some require a huge supercomputer system for tasks such as optimal parameter search, some require databases that are very large or expensive to own, and some require very specific analyses focused on a particular research topic. No single software package can cover all of these capabilities, and customizing software to do so is very expensive or impossible. Thus, a Systems Biology computational platform inevitably requires Software as a Service (SaaS) technology [42]. SaaS is a software delivery model that is usually associated with software businesses and is considered a low-cost way to obtain the benefits of software without the complexity and high initial cost associated with licensing. This technology is introduced in Cell Illustrator 4.0 and its modules (described in the next section), which are provided by the CIO servers. Users can select the modules they need on demand. The following six modules, including one in beta, are provided on the server side:

  1. CSMLDB Search Module
  2. Project Management Module
  3. High-performance Simulation Module
  4. Pathway Parameter Search Module
  5. Pathway Model to Multiple Program Languages Export Module (Java, FORTRAN, C++, C, Perl, and Python)
  6. CSML to SVG Module and HTML Module (beta)

Moreover, third parties can develop their own modules and deliver them from the server side using APIs. To distinguish the version of Cell Illustrator launched via Java Web Start (JWS) [3] from the original (standalone) version, the former is named Cell Illustrator Online (CIO). Since JRE 1.6.0_12 (2009), Sun Microsystems, Inc. has also supported a 64-bit version of JWS. This allows users to allocate more than 1.4 gigabytes of memory to CIO 4.0, which is very useful because pathways with more than 3,000 elements with biological annotations sometimes require more memory than a 32-bit JRE can provide. In our experience, more than 2 terabytes of memory can be allocated; the limit depends on the total memory of the client machine.

With JWS technology, users can easily publish their CSML models on a website by creating URL links according to the rules listed in Tab. 1. If a user creates a link that launches Cell Illustrator Online Player (CIO Player), the read-only version of CIO that can be launched without user registration, the linked CSML model can be freely accessed by any user with Java 6.0 installed. If the linked file is a CIL file, i. e., a CSML file generated with simulation log information, CIO Player can also replay the simulation using the logged information (Fig. 7). Thus, the overview of the pathway structure, the kinetics of the reactions, and the simulation behavior of a CSML model are available via the Internet without using CIO itself.


Table 1: URLs and options for CIO applications launched by Java Web Start
(a) Base URLs
| Usage | Application | URL |
|---|---|---|
| Academic | CIO Player | https://cionline.hgc.jp/cifileserver/launchCIOPlayer? |
| Academic | CIO | https://cionline.hgc.jp/cifileserver/launchCIO? |
| Commercial | CIO Player | https://cio.bioillustrator.hgc.jp/cifileserver/launchCIOPlayer? |
| Commercial | CIO | https://cio.bioillustrator.hgc.jp/cifileserver/launchCIO? |

(b) Allowed options (suffixes)
| Option | Allowed value | Status | Default | Note |
|---|---|---|---|---|
| model | https://xxx/xxx.csml or http://xxx/xxx.csml | required | | Usually the suffix is .csml, .csml.gz, .cil, or .cil.gz; multiple CSML models can be specified, separated by ",". |
| antialias | on or off | optional | | If on (off), antialiasing is forced on (off) for the displayed image; if omitted, CIO starts without changing its antialias setting. |
| mode | GN or BP | optional | | If GN, CIO is forced to launch in gene network mode; if BP, in Petri net mode. If omitted, CIO starts without changing its mode setting. |
| XMX | integer | optional | 512 | Maximum memory (MB) of the launched application; use a larger value when a large gene network model is to be loaded. On a 32-bit machine the maximum is 1400; on a 64-bit machine it is almost unlimited. |

(a) Select the server and the Java Web Start application. (b) Options for the applications. For example, if a model is located at http://www.aaa.bbb/file.csml, a user can create the URL https://cionline.hgc.jp/cifileserver/launchCIOPlayer?model=http://www.aaa.bbb/file.csml to view the model in file.csml with CIO Player.
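The link rules of Tab. 1 can also be applied programmatically. The Python helper below, whose name `cio_launch_url` is hypothetical, assembles a launch URL from the base URLs and options in the table. Joining several options with "&" is an assumption based on ordinary query-string syntax, since the table only shows the single-option form; as in the table's example, the model URL is appended without percent-encoding.

```python
from typing import Optional, Sequence

# Base URLs from Tab. 1(a), keyed by (usage, application).
BASE_URLS = {
    ("academic", "CIO Player"): "https://cionline.hgc.jp/cifileserver/launchCIOPlayer?",
    ("academic", "CIO"): "https://cionline.hgc.jp/cifileserver/launchCIO?",
    ("commercial", "CIO Player"): "https://cio.bioillustrator.hgc.jp/cifileserver/launchCIOPlayer?",
    ("commercial", "CIO"): "https://cio.bioillustrator.hgc.jp/cifileserver/launchCIO?",
}


def cio_launch_url(models: Sequence[str], usage: str = "academic",
                   app: str = "CIO Player", antialias: Optional[str] = None,
                   mode: Optional[str] = None, xmx: Optional[int] = None) -> str:
    """Build a JWS launch URL using the options of Tab. 1(b)."""
    if not models or not all(m.startswith(("http://", "https://")) for m in models):
        raise ValueError("model must be one or more http:// or https:// URLs")
    params = ["model=" + ",".join(models)]  # several models are separated by ","
    if antialias is not None:
        if antialias not in ("on", "off"):
            raise ValueError("antialias must be 'on' or 'off'")
        params.append("antialias=" + antialias)
    if mode is not None:
        if mode not in ("GN", "BP"):
            raise ValueError("mode must be 'GN' or 'BP'")
        params.append("mode=" + mode)
    if xmx is not None:
        if xmx <= 0:
            raise ValueError("XMX must be a positive number of megabytes")
        params.append("XMX=" + str(xmx))
    # Assumption: options are joined with "&" as in an ordinary query string.
    return BASE_URLS[(usage, app)] + "&".join(params)
```

Called with only `["http://www.aaa.bbb/file.csml"]`, the helper reproduces the example URL given in the note to Tab. 1.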



Figure 7: CIO Player replaying the simulation of the circadian rhythm model of Drosophila melanogaster. CIO Player can replay a simulation by loading a CIL file, i. e., a CSML file with log information. This image shows the loaded URL https://cionline.hgc.jp/cifileserver/launchCIOPlayer?model=http://www.csml.org/download/model/csml30/circadian_drosophila_30.cil.gz.



Modules of CIO 4.0


CSMLDB Search Module

The "CSMLDB Search Module" stores CSML pathway models in an XML database and lets users search their content via a GUI (Fig. 8). As of September 2009, TRANSPATH Academic and TRANSPATH Professional are fully supported. CSMLDB Academic is derived from the academic version of TRANSPATH 7.4 with the Transpath2CSML technology [31] and will be available to academic users via BIOBASE GmbH (a one-month trial version is available after registration). CSMLDB Professional is derived from the commercial version of TRANSPATH 8.4 with the Transpath2CSML technology and contains more than 100,000 reactions discovered in mammals (Tab. 2).



Figure 8: CSMLDB Search dialog. Shown are the results of a search for "sam" in CSMLDB 8.4 (CSMLDB Professional). By default, the search results are sorted by ID.


Table 2: Total number of entries in CSMLDB 7.4 and CSMLDB 8.4
| Database | Element type | Number of entries |
|---|---|---|
| CSMLDB 7.4 | Entity | 89,469 |
| CSMLDB 7.4 | Process | 140,868 |
| CSMLDB 8.4 | Entity | 117,967 |
| CSMLDB 8.4 | Process | 182,383 |

The CSMLDB GUI provides three tabs, "Entity", "Process", and "Fact", and supports searching by the name, ID, and synonyms of entity, process, and fact elements (Fig. 8). A matched result can be dragged and dropped (D&D) onto the main canvas, either merged into the current model of the active canvas or inserted into a new canvas, with or without automatic graph layout (Fig. 9). Note that the source CSML models of CSMLDB Academic and Professional are simulatable models, so CSMLDB stores only entity and process elements; in other words, the Fact tab of the dialog contains no content in the current version. Facts that have no effect on simulation models, e. g., indirect reactions, will be provided as fact elements in a future release.

The module reduces modeling to two steps: (i) searching for the genes and proteins of interest among the more than 100,000 known reactions, and (ii) dragging and dropping the filtered matches onto the canvas. Moreover, the resulting model serves as a template ready for simulation.
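The search step can be pictured as a case-insensitive substring match over the ID, name, and synonyms of the elements shown in one tab, with results sorted by ID as in Fig. 8. The following Python sketch is a hypothetical in-memory stand-in for the XML database behind CSMLDB; the `Element` record, the `search` function, and the sample IDs are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    id: str
    name: str
    kind: str                                    # "entity", "process" or "fact"
    synonyms: List[str] = field(default_factory=list)


def search(elements: List[Element], query: str, kind: str) -> List[Element]:
    """Case-insensitive substring match on ID, name and synonyms, sorted by ID."""
    q = query.lower()
    hits = [
        e for e in elements
        if e.kind == kind
        and any(q in text.lower() for text in [e.id, e.name, *e.synonyms])
    ]
    return sorted(hits, key=lambda e: e.id)


# Hypothetical records; IDs do not correspond to real TRANSPATH/CSMLDB entries.
db = [
    Element("E0002", "SAM68", "entity", ["KHDRBS1"]),
    Element("E0001", "S-adenosylmethionine", "entity", ["SAM", "AdoMet"]),
    Element("E0003", "p53", "entity", ["TP53"]),
    Element("P0001", "SAM68 phosphorylation", "process"),
]
```

A query for "sam" on the entity tab then returns E0001 (matched through its synonym) before E0002 (matched through its name).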



Figure 9: Importing models from CSMLDB. A canvas holds the model p53 + Mdm → p53:Mdm, and a model for p53 → MDM2 (activation of transcription of MDM2 by p53) is imported into it with combinations of the checkbox options "Merge", "Auto-Layout", and "Create New Canvas" (six patterns in total). If "Merge" is checked, entities with the same ID on the canvas are merged into one entity. If "Auto-Layout" is checked, automatic layout is applied to all elements on the canvas. If "Create New Canvas" is checked, the newly inserted model (without the "Merge" option) or the merged model (with the "Merge" option) is placed on a new canvas.
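The effect of the "Merge" option described in Fig. 9 can be sketched in Python as follows. This is an illustration under stated assumptions, not CIO's implementation: a model is represented as a dictionary of entities keyed by ID plus a list of processes that refer to entities by ID. With merging, an incoming entity whose ID already exists on the canvas is unified with it; without merging, it is kept as a separate copy under a fresh ID. The function name `insert_model` and the `_2` renaming scheme are hypothetical.

```python
def insert_model(canvas, incoming, merge=True):
    """Insert `incoming` into `canvas` and return a new model.

    Both models are {"entities": {id: name}, "processes": [(name, inputs, outputs)]},
    where inputs/outputs are lists of entity IDs. The canvas is not modified.
    """
    entities = dict(canvas["entities"])
    processes = list(canvas["processes"])
    taken = set(entities) | set(incoming["entities"])  # IDs a fresh copy must avoid
    rename = {}
    for eid, name in incoming["entities"].items():
        new_id = eid
        if eid in entities and not merge:
            n = 2
            while f"{eid}_{n}" in taken:
                n += 1
            new_id = f"{eid}_{n}"
            taken.add(new_id)
        rename[eid] = new_id
        entities.setdefault(new_id, name)  # a merged entity keeps the canvas version
    for name, inputs, outputs in incoming["processes"]:
        processes.append((name,
                          [rename[e] for e in inputs],
                          [rename[e] for e in outputs]))
    return {"entities": entities, "processes": processes}


canvas = {
    "entities": {"p53": "p53", "Mdm": "Mdm", "p53:Mdm": "p53:Mdm complex"},
    "processes": [("binding", ["p53", "Mdm"], ["p53:Mdm"])],
}
incoming = {
    "entities": {"p53": "p53", "MDM2": "MDM2"},
    "processes": [("transcription of MDM2", ["p53"], ["MDM2"])],
}
merged = insert_model(canvas, incoming, merge=True)
separate = insert_model(canvas, incoming, merge=False)
```

With merging, the imported transcription process attaches to the p53 already on the canvas; without it, the canvas ends up with two independent p53 entities.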


Project Management and CSML Pathway Library Modules

As already mentioned, CIO 4.0 is launched with JWS technology after user authentication. With this feature, the CIO 4.0 server can identify each user and provide services tailored to each individual user.

The "Project Management Module" (the top rectangle region in Fig. 10a) allows users to create their own projects on the server side and to store CSML models and any other files related to those projects, e. g., pdf, ppt, doc, xls, and txt files. With this module, users can launch CIO 4.0 on any computer and access their projects on the server side. Dragging and dropping a CSML model onto the main canvas opens it there automatically. Moreover, users can share content with other users at the project level with read or read-write permissions (Fig. 10b).



Figure 10: Project Manager Dialog. (a) The Project Manager dialog consists of two sections, "User area" (top) and "Library area" (bottom). (b) Sharing step of a project (here "our shared project") in the "User area." Each project can be shared with "Read", "Write," or both permissions.

The "CSML Pathway Library Module" (the bottom rectangle region in Fig. 10a) provides CSML pathway libraries from the public domain (public library) and from commercial sources (commercial library). The public library contains all CSML models at http://www.csml.org/ in three categories: signal transduction pathways, gene regulatory networks, and metabolic pathways. The library also contains all CSML models in the textbook [13]. The commercial library contains more than 1,000 well-established biopathways derived from TRANSPATH [43] of BIOBASE with the Transpath2CSML application [31]. All of these CSML models can be loaded, edited, saved, and simulated with CIO 4.0.

These modules make it easy for users to access and share their CSML models and to reuse public or commercial CSML libraries.


High-performance Simulation Module

The native simulation engine in CIO 4.0 is tightly integrated with the script engine Pnuts [44], whose scripts can be compiled into Java bytecode. The native simulation engine has two modes, a simple math engine and a complex math engine, and switches between them automatically depending on the complexity of the reactions in the pathway model. More precisely, if the reaction rules consist only of simple math, e. g., the four arithmetic operations, the simple math engine is used; if they use other constructs that can be described in the Pnuts scripting language, e. g., "if ... then ...", pow, log, or Java methods themselves, the complex math engine is used. The selected mode is shown in the status bar at the bottom as "Optimize on" or "Optimize off" ("on" means that the simple math engine is used).
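The automatic switch between the two engines amounts to classifying each rate expression. In the Python sketch below, Python's own `ast` module stands in for a Pnuts parser (an assumption; Pnuts syntax differs in detail): an expression counts as simple math if its syntax tree contains only names, numeric constants, parentheses, and the four arithmetic operators, and a model uses the simple engine ("Optimize on") only if every expression is simple. The function names `is_simple_math` and `engine_mode` are hypothetical.

```python
import ast

# Node types allowed in "simple math": names, constants, +, -, *, / and unary signs.
SIMPLE_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub,
)


def is_simple_math(expr: str) -> bool:
    """True if `expr` uses only the four arithmetic operations on names and numbers."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    # ast.walk visits every node, including operator and context nodes.
    return all(isinstance(node, SIMPLE_NODES) for node in ast.walk(tree))


def engine_mode(rate_exprs) -> str:
    """'simple' ("Optimize on") only when every rate expression is simple math."""
    return "simple" if all(is_simple_math(e) for e in rate_exprs) else "complex"
```

Function calls such as `pow(m1, 2)` or `log(m2)`, conditional expressions, and the power operator all produce node types outside the allowed set, so a single such expression moves the whole model to the complex engine.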

On average, the simple math engine runs about ten times faster than the complex math engine, and both engines perform acceptably on an ordinary machine for models with about 100 reactions (200 to 300 elements in total). Most simulation models fall within this range, but some exceed a thousand reactions (3,000 to 4,000 elements in total, depending on the complexity of the scripts). For example, a Caenorhabditis elegans vulval development model, which describes cell-cell regulation among six cells and the signal transduction regulatory network in each cell, contains 1,649 elements with highly complex scripts and takes several hours on an ordinary machine (12,410 seconds, about 3.4 hours, on an Intel Core 2 Extreme X9650 at 3 GHz) [40, 45]. In CIO 4.0, under the SaaS concept, users can activate and use the high-performance simulation module on demand. The module translates the CSML model into plain Java source code and compiles it with javac, which is freely distributed with the Java Development Kit (JDK) [46] (JDK 6.0 or higher is required). Using this module, the C. elegans model above runs within one minute (31 seconds on the same machine), i. e., about 400 times faster. Thus, the High-performance Simulation Module can handle models with thousands of elements.
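The speedup comes from generating specialized source code for the whole model once and compiling it, rather than interpreting each script at every step. CIO 4.0 emits Java and compiles it with javac; the Python sketch below only mirrors that idea, generating straight-line Python source for one Euler step and compiling it with the built-in `compile` and `exec`. The model representation and the names `generate_step_source` and `compile_model` are hypothetical.

```python
from typing import Dict, List, Tuple

# A process: (rate expression over entity names, input entities, output entities)
Process = Tuple[str, List[str], List[str]]


def generate_step_source(entities: List[str], processes: List[Process]) -> str:
    """Emit straight-line source code for one explicit Euler step of the model."""
    lines = ["def step(x, dt):"]
    lines += [f"    {e} = x[{e!r}]" for e in entities]  # entity values as locals
    lines += [f"    v{i} = {rate}" for i, (rate, _, _) in enumerate(processes)]
    for i, (_, inputs, outputs) in enumerate(processes):
        lines += [f"    x[{e!r}] -= v{i} * dt" for e in inputs]
        lines += [f"    x[{e!r}] += v{i} * dt" for e in outputs]
    lines.append("    return x")
    return "\n".join(lines)


def compile_model(entities: List[str], processes: List[Process]):
    """Compile the generated source once and return the resulting step function."""
    namespace: Dict[str, object] = {}
    code = compile(generate_step_source(entities, processes), "<generated model>", "exec")
    exec(code, namespace)
    return namespace["step"]


step = compile_model(["m1", "m2"], [("0.1 * m1", ["m1"], ["m2"])])
x = {"m1": 10.0, "m2": 0.0}
for _ in range(100):
    x = step(x, 0.1)
```

Because `exec` runs arbitrary code, a generator like this should only be applied to trusted models, and entity names must be valid identifiers in the target language.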


Pathway Parameter Search Module

Parameter search for dynamic pathway models is one of the most crucial problems in Systems Biology. Several attempts have been made at automatic parameter estimation for HFPNe models using a technique called data assimilation (DA), which rationally blends simulation models with observational data [47, 48]. In geophysics, the DA approach is applied to the prediction of the El Niño-Southern Oscillation (ENSO) phenomenon [49], which is known as the strongest climate variation on seasonal to inter-annual timescales. These efforts are expected to lead to groundbreaking modeling platforms for Systems Biology. However, this DA method is better suited to high-performance computing systems with petaFLOPS-scale computing ability; because it requires high-performance computers to achieve acceptable performance, we decided not to include this function as a CIO 4.0 module.

As an alternative, CIO 4.0 provides the "Pathway Parameter Search Module" for ordinary computers. This module executes multiple simulations at once, one for each initial condition in a specified range of values, e. g., six simulations with initial values from zero to ten in steps of two, and displays the results in 2D or 3D plots. If a user searches ten different values for each of three entities, a total of one thousand (10^3) simulations must be executed at once, so the module relies on the technology of the "High-performance Simulation Module" to achieve acceptable performance. The module shows how the behavior of the system changes with its initial values. In addition, with minor modifications of the target model, the module can be used to investigate the effects of the rate coefficient of a process and the threshold value of a connector. In those cases, the coefficient (or threshold) itself must be represented by an entity, which we call externalization of the coefficient (or threshold). Once the coefficient (or threshold) has been externalized, the module is used in the same way. An example of externalizing the coefficient of a mass action is shown in Fig. 11.
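The sweep over initial values and an externalized coefficient can be sketched as follows, using a model shaped like the one in Fig. 11 (m1 converted to m2 at the mass action rate m1*k1, with k1 held by an entity). The Python below, with hypothetical names `run_once` and `sweep`, runs one Euler simulation for every combination of the requested values; with ten values for each of three entities, `product` would yield the 10^3 combinations mentioned above. It is not the module's implementation, which relies on compiled simulation code.

```python
from itertools import product


def run_once(m1_0, k1, dt=0.1, steps=100):
    """Euler simulation of m1 -> m2 with mass action m1*k1.

    k1 is an externalized coefficient: an entity that the process reads
    but never consumes, so its value stays constant during the run.
    """
    x = {"m1": float(m1_0), "m2": 0.0, "k1": float(k1)}
    for _ in range(steps):
        v = x["m1"] * x["k1"]
        x["m1"] -= v * dt
        x["m2"] += v * dt
    return x


def sweep(grid):
    """One simulation per combination of the values in `grid` ({argument: values})."""
    names = list(grid)
    return {
        combo: run_once(**dict(zip(names, combo)))
        for combo in product(*(grid[n] for n in names))
    }


# Initial m1 from 0 to 10 in steps of 2 (six values) times three values of k1.
results = sweep({"m1_0": range(0, 11, 2), "k1": [0.05, 0.1, 0.2]})
```

Because k1 is just another entity after externalization, sweeping it needs no special handling beyond adding it to the grid, which is the point of the externalization step.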



Figure 11: Example of externalization of coefficient. The top model has a mass action m1*0.1 on the process p1. The coefficient of the mass action, i. e., 0.1, can be externalized with an entity e3 and the mass action on the process becomes m1*k1 as in the bottom model. Once externalized, the parameter can be estimated with the "Pathway Parameter Search Module."


Pathway Model to Multiple Program Languages Export Module

In advanced use, users may want to use their CSML models as simulation models in other applications. For this purpose, the "Pathway Model to Multiple Program Languages Export Module" is provided. The module exports a CSML model as a simulation program in Java, FORTRAN, C++, C, Perl, or Python. The exported program can be compiled and executed directly with a suitable compiler for each language, e. g., javac, gfortran95, g++, or gcc, or, in the case of Perl or Python, executed without a compilation step. If the scripts of the input model contain only the predefined kinetics provided in CIO 4.0 (mass, stochasticmass, stochasticlognormalmass, and Michaelis Menten), or custom kinetics (connectorrate, connectorcustom, and custom) limited to the four arithmetic operations, IfTime(simulator, compare time), getElapsedTime(simulator), getSamplingInterval(simulator), and the ternary operator (e. g., condition ? x : y), the generated program can be compiled without modification. (Note that the ternary operator is a special scripting syntax for modules 3 and 4 in the list above, where it takes the place of the "if" and "else" statements that Pnuts scripts support in normal simulation mode.) Conversely, if the model uses other advanced Pnuts syntax, the exported program must be adapted manually. The result of exporting a CSML model is shown in Fig. 12. The Java export function of this module is also used in one of the processing steps of the "Pathway Parameter Search Module".
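An export of this kind can be sketched as a code generator that first checks whether every rate expression stays within a supported subset and otherwise asks for manual porting. The Python below emits a standalone C program that integrates the model with explicit Euler steps. The accepted subset (identifiers, numbers, the four arithmetic operators, comparisons, and the ternary operator) is a simplification chosen for illustration, and the function name `export_c` is hypothetical; this is not the module's actual output format.

```python
import re

# Simplified supported subset: arithmetic, parentheses, comparisons, ternary ?:.
ALLOWED_CHARS = re.compile(r"[\w\s.+\-*/()?:<>=!]*")
TOKEN = re.compile(r"\d*\.?\d+(?:[eE][-+]?\d+)?|[A-Za-z_]\w*")


def export_c(initial, processes, dt, steps):
    """Emit a standalone C program that integrates the model with explicit Euler steps.

    initial:   {entity name: initial value}; names must be valid C identifiers
    processes: [(rate expression in C syntax, input entities, output entities)]
    """
    for rate, _, _ in processes:
        names = {t for t in TOKEN.findall(rate) if not (t[0].isdigit() or t[0] == ".")}
        if not ALLOWED_CHARS.fullmatch(rate) or not names <= set(initial):
            raise ValueError(f"rate {rate!r} uses unsupported syntax; port it by hand")
    out = ["#include <stdio.h>", "", "int main(void) {"]
    out += [f"    double {e} = {v!r};" for e, v in initial.items()]
    out += [f"    const double dt = {dt!r};",
            f"    for (int step = 0; step < {steps}; ++step) {{"]
    # Evaluate all rates before any update, as in the simulation itself.
    out += [f"        double v{i} = {rate};" for i, (rate, _, _) in enumerate(processes)]
    for i, (_, inputs, outputs) in enumerate(processes):
        out += [f"        {e} -= v{i} * dt;" for e in inputs]
        out += [f"        {e} += v{i} * dt;" for e in outputs]
    out.append("    }")
    fmt = " ".join(f"{e}=%g" for e in initial)
    out += [f'    printf("{fmt}\\n", {", ".join(initial)});', "    return 0;", "}"]
    return "\n".join(out)


src = export_c({"m1": 10.0, "m2": 0.0}, [("0.1 * m1", ["m1"], ["m2"])], dt=0.1, steps=100)
```

The generated text compiles as an ordinary C99 program (for example with gcc); an expression such as `pow(m1, 2)` is rejected instead, mirroring the manual adaptation required for advanced Pnuts syntax.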



Figure 12: Program Language Export Module. The HFPNe model at the top, converted into Java source code with the Program Language Export Module.


CSML to SVG Module and CSML to HTML Module

In CIO 4.0, without any module, a model can be saved in raster image formats, i. e., png or jpeg, which are usually suitable for display only. For editing, a vector image format such as pdf, ai, or SVG [50] is better. The "CSML to SVG Module" was developed for this purpose and exports a CSML model as an SVG file. SVG was chosen among these vector formats because it is the only XML-based one and because CSML 3.0 officially supports it as a vector image format, e. g., <image format="svg">. For the same reason, the icons of all predefined biological terms in CSML 3.0 are distributed in SVG format. Many SVG viewers and editors are available, e. g., Inkscape [51], Adobe Illustrator, and CorelDraw, but their implementations of SVG differ slightly. The exported result may therefore not display correctly on some platforms, and the module is currently in beta status. Fig. 13 shows a snapshot of CIO 4.0 with the circadian rhythms in Mus musculus model and the exported result loaded in Inkscape.
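Because SVG is plain XML, a vector drawing of a Petri net can be produced with an ordinary XML library. The sketch below uses Python's standard xml.etree.ElementTree to draw places as circles, transitions as bars, and arcs as lines. The function name, the coordinate input, and the visual conventions are assumptions made for illustration; the sketch does not reproduce the CSML to SVG Module or the CSO icons.

```python
import xml.etree.ElementTree as ET

def petri_net_to_svg(places, transitions, arcs, width=300, height=120):
    """Render places as circles, transitions as bars, arcs as lines.

    places, transitions: {name: (x, y)}; arcs: [(source_name, target_name)].
    """
    svg = ET.Element("svg", xmlns="http://www.w3.org/2000/svg",
                     width=str(width), height=str(height))
    pos = {**places, **transitions}
    for src, dst in arcs:                      # draw arcs first so nodes cover them
        (x1, y1), (x2, y2) = pos[src], pos[dst]
        ET.SubElement(svg, "line", x1=str(x1), y1=str(y1),
                      x2=str(x2), y2=str(y2), stroke="black")
    for name, (x, y) in places.items():        # entities as labelled circles
        ET.SubElement(svg, "circle", cx=str(x), cy=str(y), r="20",
                      fill="white", stroke="black")
        label = ET.SubElement(svg, "text", {"x": str(x), "y": str(y + 35),
                                             "text-anchor": "middle"})
        label.text = name
    for name, (x, y) in transitions.items():   # processes as thin black bars
        ET.SubElement(svg, "rect", x=str(x - 5), y=str(y - 20),
                      width="10", height="40", fill="black")
    return ET.tostring(svg, encoding="unicode")

print(petri_net_to_svg({"m1": (50, 50), "m2": (250, 50)}, {"p1": (150, 50)},
                       [("m1", "p1"), ("p1", "m2")]))
```

Every shape stays an editable XML element, which is the practical advantage over png or jpeg for later editing in tools such as Inkscape.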

For reporting CSML models, the "CSML to HTML Module" is provided. As shown in Fig. 14, this module takes a CSML model as input and generates HTML files with png images.
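The report layout described for Fig. 14 is a header listing entities and processes, each linked to a detailed description. A minimal sketch of that layout in Python might look like the following. It assumes each element is given as a dictionary with at least "id" and "name" keys and an optional "url". The function name and input shape are hypothetical, and the sketch omits the png images that the actual module generates.

```python
from html import escape

def html_report(entities, processes):
    """Build a single-page HTML report.

    entities, processes: lists of dicts with at least 'id' and 'name';
    an optional 'url' key is rendered as an external link.
    """
    parts = ["<html><body>", "<h1>Model report</h1>"]
    # Header: one line per element type, each name linking to its details.
    for title, items in (("Entities", entities), ("Processes", processes)):
        links = ", ".join(
            f'<a href="#{escape(item["id"])}">{escape(item["name"])}</a>' for item in items
        )
        parts.append(f"<p><b>{title}:</b> {links}</p>")
    # Detailed description of every element, addressed by its id.
    for item in entities + processes:
        parts.append(f'<h2 id="{escape(item["id"])}">{escape(item["name"])}</h2>')
        parts.append("<ul>")
        for key, value in item.items():
            text = escape(str(value))
            if key == "url":
                text = f'<a href="{text}">{text}</a>'
            parts.append(f"<li>{escape(key)}: {text}</li>")
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)

report = html_report(
    [{"id": "e1", "name": "mRNA<A>", "initial value": 0.0}],
    [{"id": "p1", "name": "transcription", "url": "http://example.org/p1"}],
)
print(report)
```

Escaping every name and value keeps model identifiers containing characters such as < or & from breaking the generated page.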



Figure 13: Result of CSML to SVG Module. The circadian rhythms in Mus musculus model (http://www.csml.org/models/csml-models/circadian-rhythms-in-mouse/) (top), converted to an SVG image and loaded in Inkscape (bottom).


Figure 14: Result of CSML to HTML Module. The circadian rhythms in Mus musculus model (http://www.csml.org/models/csml-models/circadian-rhythms-in-mouse/), converted into an HTML report with the "CSML to HTML Module" and loaded in a web browser. Entities and processes are listed in the header, with links to their detailed descriptions, which contain the ID, name, simulation properties, biological properties, and external URL links.



Results and discussion

Since the first release of Cell Illustrator 1.0, many developments and improvements have been made. First, the native format has been extended with the new formats CSML 3.0 and CSO 3.0 to better support ontology-based modeling, visualization, and simulation of biological pathways. CSML 3.0 can represent not only Petri net models but also ODE-based models. Standard icons are provided in CSO 3.0, so users can create a graphical pathway model with an ontology background simply by drag-and-drop (D&D) operations from the Biological Element dialog, which provides all of these predefined icons. Because the created model is represented with HFPNe, it serves as a template that is ready for simulation. Because CSML 3.0 and CSO 3.0 are designed specifically to represent biopathways, pathway data in other formats can be imported into CIO 4.0 directly without loss of any information, e. g., KEGG, Reactome [39] via BioPAX [26], the CellML repository [28], or BioModels [52].

In line with the SaaS concept, Cell Illustrator 4.0 was developed with Java Web Start technology and server-side authentication, so that each user can select the combination of Cell Illustrator modules best suited to his or her needs. The "High-performance Simulation Module" helps users who need to run heavy simulations with thousands of elements. The "Pathway Parameter Search Module" allows users to find better parameter sets for their models. The "Pathway Model to Multiple Program Languages Export Module" will be useful for advanced users who need to connect a CIO 4.0 simulation model to their own applications at the source-code level. The "CSMLDB Search Module" will help users who need to create mammalian pathway models from scratch, since more than 100,000 reliable reactions are registered in CSMLDB. The websites [8, 53] and the textbook [17] are useful for users who want to develop their skills in building biological pathway models with Cell Illustrator.

The forthcoming Cell Illustrator Online (CIO 5.0) will fully implement CSML 3.0. With this feature, an entity-fact based pathway model (static pathway) and an entity-process based pathway model (simulatable pathway) can be combined into one model, whereas CIO 4.0 requires the two kinds of model to be built in different modes, named "gene network mode" and "normal mode". Moreover, users will be able to create multiple sub-views of the main model by filtering its contents with rules, e. g., by gene layer, protein layer, nuclear layer, cytoplasmic layer, or the expression level of each element. Sub-views have no effect on simulation, since it is the main model (the model before filtering) that is simulated. As networks grow to several thousand elements, the sub-view concept will become indispensable for grasping the characteristic features of such pathways.
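The key property of a sub-view is that it is a filtered reading of the model, not a modified copy. The following Python sketch illustrates this with a hypothetical element record (name, layer, location, expression level) and filtering rules given as predicates. The class, field names, and example elements are assumptions for illustration only and do not describe the CIO 5.0 implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    name: str
    layer: str          # e.g. "gene" or "protein"
    location: str       # e.g. "nucleus" or "cytoplasm"
    expression: float   # e.g. an expression level

def sub_view(model, rule):
    """Return the names of the elements that satisfy `rule`.

    The model tuple itself is never modified, so a simulator would still
    run on the full main model.
    """
    return [element.name for element in model if rule(element)]

model = (
    Element("geneA", "gene", "nucleus", 2.0),
    Element("ProteinA", "protein", "cytoplasm", 5.0),
    Element("ProteinB", "protein", "nucleus", 0.5),
)
print(sub_view(model, lambda e: e.layer == "protein"))     # ['ProteinA', 'ProteinB']
print(sub_view(model, lambda e: e.location == "nucleus"))  # ['geneA', 'ProteinB']
print(sub_view(model, lambda e: e.expression > 1.0))       # ['geneA', 'ProteinA']
```

Keeping the elements immutable and returning only names makes it explicit that filtering cannot change what is simulated.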

In CSML 3.0, the scripting language for each kinetic expression and initial value can be chosen freely through the <script language=""> element. However, CIO 4.0 currently supports only the Pnuts language [44]. CIO 5.0 will additionally support the scripting languages JavaScript and Jython [54] and the compiled language Java. Furthermore, CIO 5.0 will allow several languages to be mixed in one model, e. g., one reaction speed written in JavaScript and another in Java. In CIO 4.0, an ODE-compatible model as described before (one built only from continuous elements, with "nocheck" assigned as the weight parameter of its arcs) yields simulation results similar to those of numerical integration with the Euler method. To achieve better compatibility with high-precision ODE-based simulators without violating the Petri net formalism, higher-order numerical integration methods, e. g., Runge-Kutta, will be selectable in the next release.
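The gap between Euler-like updates and a higher-order integrator can be seen on a single mass-action decay whose exact solution is known. The sketch below is a generic numerical illustration outside Cell Illustrator. The rate constant, step size, and end time are arbitrary assumptions. It compares the explicit Euler method with the classical fourth-order Runge-Kutta method at the same step size.

```python
import math

def rate(m, k):
    """Mass-action consumption of a single entity: dm/dt = -k * m."""
    return -k * m

def euler_step(m, k, dt):
    return m + dt * rate(m, k)

def rk4_step(m, k, dt):
    """Classical fourth-order Runge-Kutta step."""
    s1 = rate(m, k)
    s2 = rate(m + dt * s1 / 2, k)
    s3 = rate(m + dt * s2 / 2, k)
    s4 = rate(m + dt * s3, k)
    return m + dt * (s1 + 2 * s2 + 2 * s3 + s4) / 6

def integrate(step, m0=100.0, k=0.5, dt=0.5, end=10.0):
    m = m0
    for _ in range(round(end / dt)):
        m = step(m, k, dt)
    return m

exact = 100.0 * math.exp(-0.5 * 10.0)   # analytic solution m0 * exp(-k * t)
print("Euler error:", abs(integrate(euler_step) - exact))
print("RK4 error:  ", abs(integrate(rk4_step) - exact))
```

With this coarse step, the Euler error is several orders of magnitude larger than the Runge-Kutta error. This is the kind of discrepancy that a selectable higher-order method would reduce when results are compared with ODE-based simulators.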

In CIO 4.0, biological elements such as mRNA, proteins, their modified forms, and complexes are available as the 92 standard icons of the CSO core vocabulary and as 100,000 icons in the "CSMLDB Search Module". However, CIO 4.0 supports a smaller vocabulary of chemical compounds, and future releases should address this weakness, given high user demand.



Acknowledgements

We are grateful to many people. First and foremost, we would like to thank the current and former members of the Cell System Markup Language projects: Hiroko Nishihata, Kazuyuki Numata, Atsushi Doi, Yayoi Sekiya, Yoshinori Tamada, Teppei Simamura, Ruy Yamaguchi, Seiya Imoto, and Kazuko Ueno of the Human Genome Center, University of Tokyo; and Hanji Hioka, Yuto Ikegami, Hironori Kitakaze, Yoshimasa Miwa, Daichi Saihara, Tomoaki Yamamotoya, and Hiroshi Matsuno of Yamaguchi University. We also thank the users of Cell Illustrator who have developed excellent models on this platform and given insightful feedback for its development.



References


  1. Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2003). Genomic Object Net: I. A platform for modeling and simulating biopathways. Appl. Bioinformatics 2, 181-184.

  2. http://www.genomicobject.net/

  3. http://java.sun.com/javase/technologies/desktop/javawebstart/index.jsp

  4. Kato, M., Nagasaki, M., Doi, A. and Miyano, S. (2005). Automatic drawing of biological networks using cross cost and subcomponent data. Genome Inform. 16, 22-31.

  5. Kojima, K., Nagasaki, M., Jeong, E., Kato, M. and Miyano, S. (2007). An efficient grid layout algorithm for biological networks utilizing various biological attributes. BMC Bioinformatics 8, 76.

  6. Kojima, K., Nagasaki, M. and Miyano, S. (2008). Fast grid layout algorithm for biological networks with sweep calculation. Bioinformatics 24, 1433-1441.

  7. Hashimoto, T. B., Nagasaki, M., Kojima, K. and Miyano, S. (2009). BFL: a node and edge betweenness based fast layout algorithm for large scale networks. BMC Bioinformatics 10, 19.

  8. http://www.csml.org/

  9. Jeong, E., Nagasaki, M., Saito, A. and Miyano, S. (2007). Cell System Ontology: Representation for modeling, visualizing, and simulating biological pathways. In Silico Biol. 7, 0055.

  10. Alla, H. and David, R. (1998). Continuous and Hybrid Petri Nets. Journal of Circuits, Systems, and Computers 8, 159-188.

  11. Matsuno, H., Doi, A., Nagasaki, M. and Miyano, S. (2000). Hybrid Petri net representation of gene regulatory network. Pac. Symp. Biocomput. 5, 341-352.

  12. Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2004). A versatile Petri net based architecture for modeling and simulation of complex biological processes. Genome Inform. 15, 180-197.

  13. Nagasaki, M., Saito, A., Doi, A., Matsuno, H. and Miyano, S. (2009). Foundations of Systems Biology: Using Cell Illustrator and Pathway Databases. Springer, Berlin Heidelberg.

  14. Tomita, M. (2001). Whole-cell simulation: a grand challenge of the 21st century. Trends Biotechnol. 19, 205-210.

  15. Mendes, P. (1993). GEPASI: a software for modeling the dynamics, steady states and control of biochemical and other systems. Comput. Appl. Biosci. 9, 563-571.

  16. http://biospice.sourceforge.net/

  17. Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2005). Computational modeling of biological processes with Petri Net-based architecture. In: Bioinformatics Technologies, Chen, Y.-P. P. (ed.), Springer, pp. 179-242.

  18. Troncale, S., Tahi, F., Campard, D., Vannier, J.-P. and Guespin, J. (2006). Modeling and simulation with Hybrid Functional Petri Nets of the role of interleukin-6 in human early haematopoiesis. Pac. Symp. Biocomput. 11, 427-438.

  19. Koh, G., Teong, H. F., Clément, M. V., Hsu, D. and Thiagarajan, P. S. (2006). A decompositional approach to parameter estimation in pathway modeling; a case study of the Akt and MAPK pathways and their crosstalk. Bioinformatics 22, e271-e280.

  20. Hardy, S. and Robillard, P. N. (2008). Petri net-based method for the analysis of the dynamics of signal propagation in signaling pathways. Bioinformatics 24, 209-217.

  21. Sato, Y., Hashiguchi, Y. and Nishida, M. (2009). Evolution of multiple phosphodiesterase isoforms in stickleback involved in cAMP signal transduction pathway. BMC Syst. Biol. 3, 23.

  22. Wu, J. and Voit, E. (2009). Hybrid modeling in biochemical systems theory by means of functional Petri nets. J. Bioinform. Comput. Biol. 7, 107-134.

  23. Wu, J. and Voit, E. (2009). Integrative biological systems modeling: challenges and opportunities. Front. Comput. Sci. China 3, 92-100.

  24. http://www.jgraph.com/jgraph.html

  25. Li, W. and Kurata, H. (2005). A grid layout algorithm for automatic drawing of biochemical networks. Bioinformatics 21, 2036-2042.

  26. http://www.biopax.org/

  27. http://www.sbml.org/

  28. http://www.cellml.org/

  29. http://www.kegg.org/

  30. Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. and Kanehisa, M. (1999). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 27, 29-34.

  31. Nagasaki, M., Saito, A., Li, C., Jeong, E. and Miyano, S. (2008). Systematic reconstruction of TRANSPATH data into Cell System Markup Language. BMC Syst. Biol. 2, 53.

  32. Jeong, E., Nagasaki, M. and Miyano, S. (2007). Conversion from BioPAX to CSO for System Dynamics and Visualization of Biological Pathway. Genome Inform. 18, 225-236.

  33. http://www.csml.org/tools/sbml2csml/

  34. http://www.csml.org/tools/cellml2csml/

  35. Nagasaki, M., Doi, A., Matsuno, H. and Miyano, S. (2003). Recreating biopathway databases towards simulation. In: Computational Methods in Systems Biology, Miyano, S., Wolkenhauer, O., Degano, P., Danos, V., Lincoln, P. and Cho, K.-H. (eds.). Lecture Notes in Computer Science 2602, 168-169.

  36. Le Novère, N., et al. (2009). The Systems Biology Graphical Notation. Nat. Biotechnol. 27, 735-741.

  37. Killcoyne, S., Carter, G. W., Smith, J. and Boyle, J. (2009). Cytoscape: a community-based framework for network modeling. Methods Mol. Biol. 563, 219-239.

  38. http://www.w3.org/TR/owl-features/

  39. http://www.reactome.org/

  40. Li, C., Nagasaki, M., Ueno, K. and Miyano, S. (2009). Simulation-based model checking approach to cell fate specification during Caenorhabditis elegans vulval development by hybrid functional Petri net with extension. BMC Syst. Biol. 3, 42.

  41. Jeong, E., Nagasaki, M. and Miyano, S. (2008). Rule-based reasoning for system dynamics in cell systems. Genome Inform. 20, 25-36.

  42. Hoch, F., Kerr, M. and Griffith, A. (2001). Software as a Service: Strategic Backgrounder, SIIA eBusiness Division, Software & Industry. http://www.siia.net/estore/pubs/SSB-01.pdf.

  43. Schacherer, F., Choi, C., Götze, U., Krull, M., Pistor, S. and Wingender, E. (2001). The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics 17, 1053-1057.

  44. http://pnuts.org/

  45. http://www.csml.org/models/csml-models/vulvaldev/

  46. http://java.sun.com/

  47. Nagasaki, M., Yamaguchi, R., Yoshida, R., Imoto, S., Doi, A., Tamada, Y., Matsuno, H., Miyano, S. and Higuchi, T. (2006). Genomic data assimilation for estimating hybrid functional Petri net from time-course gene expression data. Genome Inform. 17, 46-61.

  48. Tasaki, S., Nagasaki, M., Oyama, M., Hata, H., Ueno, K., Yoshida, R., Higuchi, T., Sugano, S. and Miyano, S. (2006). Modeling and estimation of dynamic EGFR pathway by data assimilation approach using time series proteomic data. Genome Inform. 17, 226-238.

  49. Chen, D., Zebiak, S. E., Busalacchi, A. J. and Cane, M. A. (1995). An improved procedure for El Niño forecasting: Implications for predictability. Science 268, 1699-1702.

  50. http://www.w3.org/Graphics/SVG/

  51. http://www.inkscape.org/

  52. http://www.biomodels.net/

  53. http://genome.ib.sci.yamaguchi-u.ac.jp/~gon/

  54. http://www.jython.org/