- Communication -





Volume 2

Special Issue
GCB'01



In Silico Biology 2, 0024 (2002); ©2002, Bioinformation Systems e.V.  



Computer system "Gene Discovery" for promoter structure analysis

Eugenii E. Vityaev1, Yury L. Orlov2, Oleg V. Vishnevsky2, Mikhail A. Pozdnyakov2 and Nikolay A. Kolchanov2

1 Sobolev Institute of Mathematics SB RAS, Acad. Koptyug prospect, 4, Novosibirsk, 630090, Russia
E-mail: vityaev@math.nsc.ru
2 Institute of Cytology and Genetics SB RAS, Acad. Lavrentiev ave., 10, Novosibirsk, 630090, Russia
E-mail: orlov@bionet.nsc.ru, oleg@bionet.nsc.ru, mike@bionet.nsc.ru, kol@bionet.nsc.ru


Edited by E. Wingender; received December 20, 2001; revised and accepted February 08, 2002; published March 15, 2002


Abstract

This paper presents an implementation of Data Mining and Knowledge Discovery techniques for finding regularities in tables of context features of DNA sequences involved in the regulation of transcription. The goal is to discover regularities that relate nucleotide sequences to their functional classes. The search patterns for regularities are constructed in first-order logic augmented by probabilistic estimates. To this end, the PC software system "Gene Discovery" has been designed. The system accepts molecular genetic data retrieved from a database by SQL queries. Nucleotide sequences of promoters from several functional systems were extracted from the TRRD database (http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/) and analysed. The data include nucleotide sequences of erythroid-specific gene promoters, endocrine system gene promoters, promoters of genes controlling the cell cycle, promoters of genes regulating lipid metabolism, and muscle-specific gene promoters. For each functional class, several regularities were found that relate nucleotide sequences in the regulatory DNA, and their location relative to the transcription start, to that class.

Keywords: machine learning, knowledge discovery, data mining, bioinformatics, eukaryotic promoter recognition, transcription factor binding sites, oligonucleotide patterns