In Silico Biology 7, 0013 (2007); ©2007, Bioinformation Systems e.V.
1 Institute for Advanced Biosciences, Keio University, Fujisawa, 252-8520, Japan
2 Bioinformatics Program, Graduate School of Media and Governance, Keio University, Fujisawa, 252-8520, Japan
3 Department of Environmental Information, Keio University, Fujisawa, 252-8520, Japan
* Corresponding author
Email: rsaito@sfc.keio.ac.jp
5322 Endo, Fujisawa, 252-8520 Kanagawa, Japan
Phone: +81-466-47-5099; Fax: +81-466-47-5099
Edited by E. Wingender; received November 28, 2006; revised January 20 and 26, 2007; accepted January 26, 2007; published February 15, 2007
Analysis and visualization of biological networks, such as protein-protein and protein-DNA interactions, are crucial for obtaining a thorough understanding of living systems. Here, we present an integrative software platform, eXpanda, which enables analysis of a very broad range of biological networks, with a special focus on the extraction of characteristic topologies that potentially function as units in the networks. eXpanda is provided as a Perl library that connects fully automatically to various biological databases via a programmable Perl interface and can perform topological analysis based on graph theory. The results of these analyses can be visualized as vector graphics. eXpanda is distributed under the GNU General Public License. The software package, detailed documentation, source code, and sample scripts can be downloaded from http://medcd.iab.keio.ac.jp/expanda/.
Keywords: interactome, network analysis, network visualization, Perl library, Scalable Vector Graphics (SVG)
In recent years, the entirety of cellular-level biomolecular interactions has come to be referred to as the 'interactome', and a large number of interactome datasets have been generated by various methods such as the yeast two-hybrid assay [Shrivastava et al., 1993], in vitro virus selection [Amstutz et al., 2001] and in silico prediction [Pellegrini et al., 1999]. To extract biological knowledge from such intracellular interactomes, it is important to analyze and visualize the biologically important sub-networks within these complicated networks. Because unifying and analyzing these rapidly multiplying datasets is difficult, demand has been growing for software that integrates and analyzes huge biological networks, and several tools based on a Graphical User Interface (GUI) have been developed. Although such GUIs may be suitable for manual inspection of individual objects within a network, a programmable interface is necessary for automated and efficient analysis of the interactome focused on topological aspects. eXpanda is an integrated software platform, provided as a Perl library, for analyzing and visualizing large amounts of data from unified biological networks. It focuses especially on (1) the extraction and integration of biological information from various resources such as web-based databases, (2) topological analysis of network structures based on graph theory, and (3) versatile and aesthetic visualization of network graphs that reflects the results of the analyses.
eXpanda is developed as a set of Perl modules. The methods in this library can be called from Perl code written by the user, which allows seamless access to eXpanda from the user's own Perl programs. The conceptual diagram of the library architecture is shown in Fig. 1.
Figure 1: The conceptual diagram of eXpanda. eXpanda can visualize network graphs by flexibly combining various network databases and topological analyses.
The network initialization and editing components
Previously reported interactions stored in web-based databases can be downloaded as flat files via FTP from the respective servers or accessed through network APIs. However, it is difficult to integrate them manually. Using the parsing modules of eXpanda, the interaction data are imported automatically from flat files and also through the APIs of the respective databases: DIP [Xenarios et al., 2000], MINT [Zanzoni et al., 2002], IntAct [Hermjakob et al., 2004], HPRD [Peri et al., 2004], MIPS [Mewes et al., 2002], BioCyc [Keseler et al., 2002] and KEGG [Kanehisa and Goto, 2000]. Files in the Simple Interaction File (SIF) format and the Graph Modelling Language (GML) format, along with graphical attributes that the user has created in another application such as Cytoscape [Shannon et al., 2003], can also be imported as eXpanda data structures. After data initialization, the user can edit the data structures or add further information to them. Objects such as nodes, edges or subgraphs can be selected by methods in the library so that the user can easily edit their information. For example, node label names can be obtained and changed by DBGET [Fujibuchi et al., 1998] through the KEGG API, and the PMID list of the literature describing each protein node can be retrieved via the PIR [Barker et al., 1993] entry IDs.
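To make the flat-file import step concrete, the following is a minimal sketch in Python of reading SIF text into a node set and an edge set. It is an illustration only and does not reproduce eXpanda's Perl parsing modules; the function name `parse_sif` and the returned data layout are assumptions made for this example. It relies on the SIF convention used by Cytoscape: each line holds a source node, an interaction type, and one or more target nodes, and a line with a single field declares an isolated node.

```python
def parse_sif(text):
    """Parse SIF text into (nodes, edges).

    Hypothetical helper for illustration; not part of eXpanda.
    edges is a set of (source, interaction_type, target) tuples.
    """
    nodes = set()
    edges = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # SIF uses tabs as delimiters if any tab is present, else whitespace
        fields = line.split("\t") if "\t" in line else line.split()
        source = fields[0]
        nodes.add(source)
        # fields[1] is the interaction type; every later field is a target
        for target in fields[2:]:
            nodes.add(target)
            edges.add((source, fields[1], target))
    return nodes, edges


if __name__ == "__main__":
    sample = "YAL001C pp YBR123C YDR362C\nYGR047C\n"
    nodes, edges = parse_sif(sample)
    print(sorted(nodes))
    print(sorted(edges))
```

Keeping the interaction type on each edge makes it possible to merge files from several sources later without losing the information about which kind of interaction was reported.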
The analysis components
In recent studies, network analyses using topological features have been suggested to be important [Vazquez et al., 2004] because such features potentially reflect functional units in the network. In eXpanda, several interaction scoring methods based on graph theory have been implemented, such as calculation of the degree of each node and extraction of network motifs, including cliques. A method for comparing multiple initialized datasets has also been implemented; it allows the user to assess the reliability of a network constructed by integrating experimentally verified datasets. All of these methods help the user to characterize local topologies in the entire network and to assess their statistical significance.
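The topological measures named above can be sketched in a few lines of Python. This is an illustrative sketch, not eXpanda's implementation: the source does not state which clique algorithm eXpanda uses, so the Bron-Kerbosch algorithm with pivoting is swapped in here as one standard choice, and all function names (`build_adjacency`, `degrees`, `maximal_cliques`, `shared_edges`) are hypothetical. The edge-set intersection at the end is only one simple way to compare two datasets.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def degrees(adj):
    """Degree of each node: the number of distinct neighbours."""
    return {node: len(neigh) for node, neigh in adj.items()}


def maximal_cliques(adj, min_size=3):
    """Enumerate maximal cliques with Bron-Kerbosch (pivoting variant)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            # r cannot be extended any further, so it is maximal
            if len(r) >= min_size:
                found.append(frozenset(r))
            return
        # pick the pivot with the most neighbours in p to prune branches
        pivot = max(p | x, key=lambda n: len(adj[n] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def shared_edges(edges_a, edges_b):
    """Undirected edges present in both datasets."""
    def undirected(edges):
        return {frozenset(e) for e in edges}
    return undirected(edges_a) & undirected(edges_b)


if __name__ == "__main__":
    edges = [("A", "B"), ("A", "C"), ("B", "C"),
             ("C", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
    adj = build_adjacency(edges)
    print(degrees(adj))
    print([sorted(c) for c in maximal_cliques(adj)])
```

Maximal clique enumeration is exponential in the worst case, so on large interactomes it is usually worth restricting the search to a size threshold or to a subgraph of interest.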
The visualization components
eXpanda can save the constructed or edited network as files in various formats: a SIF file, a GML file that can be imported into Cytoscape, and a Scalable Vector Graphics (SVG) file as a vector image. Each node has 13 graphical attributes, such as size, opacity, shape and label font, and each edge has seven attributes. The scores calculated by the topological analysis modules can be reflected in these attributes, for example in the thickness of lines and the gradation of colors. An example of SVG output from an analysis of protein-protein interactions in Saccharomyces cerevisiae using eXpanda is shown in Fig. 2.
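The idea of reflecting a topological score in graphical attributes can be sketched as follows. This Python example is an assumption-laden illustration and not eXpanda's rendering code: the circular layout, the attribute ranges (radius 6 to 18, opacity 0.4 to 1.0), and the function names `scale` and `render_svg` are all chosen for this sketch. It maps node degree linearly onto circle radius and fill opacity and writes plain SVG elements.

```python
import math
from xml.sax.saxutils import escape


def scale(value, lo, hi, out_lo, out_hi):
    """Map value linearly from [lo, hi] onto [out_lo, out_hi]."""
    if hi == lo:
        return (out_lo + out_hi) / 2
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def render_svg(edges, size=400):
    """Draw an undirected graph on a circle; node degree sets radius and opacity.

    Hypothetical helper for illustration; edges is a list of (u, v) pairs.
    """
    header = (f'<svg xmlns="http://www.w3.org/2000/svg" '
              f'width="{size}" height="{size}">')
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return header + "</svg>"
    degree = {n: 0 for n in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    lo, hi = min(degree.values()), max(degree.values())

    # place the nodes evenly on a circle (a simple assumed layout)
    center, ring = size / 2, size * 0.4
    pos = {}
    for i, n in enumerate(nodes):
        angle = 2 * math.pi * i / len(nodes)
        pos[n] = (center + ring * math.cos(angle),
                  center + ring * math.sin(angle))

    parts = [header]
    for u, v in edges:
        (x1, y1), (x2, y2) = pos[u], pos[v]
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" '
                     f'y2="{y2:.1f}" stroke="gray" stroke-width="1"/>')
    for n in nodes:
        x, y = pos[n]
        radius = scale(degree[n], lo, hi, 6, 18)
        opacity = scale(degree[n], lo, hi, 0.4, 1.0)
        parts.append(f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{radius:.1f}" '
                     f'fill="blue" fill-opacity="{opacity:.2f}"/>')
        parts.append(f'<text x="{x:.1f}" y="{y - radius - 2:.1f}" '
                     f'font-size="10" text-anchor="middle">{escape(n)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    with open("network.svg", "w", encoding="utf-8") as handle:
        handle.write(render_svg([("A", "B"), ("A", "C"), ("A", "D")]))
```

Because SVG is plain XML, the same mapping approach extends to other attributes, such as edge stroke width or a color gradient, by adding further scaled values.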
Figure 2: Protein-protein interaction network of S. cerevisiae edited and visualized by eXpanda. The five panels show the intermediate SVG outputs after the principal manipulation steps within a sample Perl script of eXpanda. For detailed procedures and the source code, see the sample gallery on the eXpanda web site (sample05.pl, http://medcd.iab.keio.ac.jp/expanda/sample.html). Step 1: A GML format file of network data was generated using Cytoscape 2.3.1 and imported into eXpanda. Step 2: The default attributes of the network graphics were modified; all objects were colored blue. Step 3: The size, opacity and label size of each node were edited according to the degree calculated using eXpanda analysis methods. Step 4: By automatically referring to the KEGG pathway database, the node colors of enzymatic proteins found in the database were changed to red, and the nodes were labeled with their respective gene names by converting the SGD entry IDs. Step 5: Cliques in the network were screened, and edges included in the motifs were colored green.
The authors are grateful to the members of MGSP at the Institute for Advanced Biosciences, Keio University, for helpful suggestions.