|
Computer system "Gene Discovery" for promoter structure analysis
Eugenii E. Vityaev 1, Yury L. Orlov 2, Oleg V. Vishnevsky 2, Mikhail A. Pozdnyakov 2 and Nikolay A. Kolchanov 2
1 Sobolev Institute of Mathematics SB RAS, Acad. Koptyug prospect, 4, Novosibirsk, 630090, Russia
Abstract
This paper presents an implementation of Data Mining and Knowledge Discovery techniques for finding regularities in tables of context features of DNA sequences involved in the regulation of transcription. The goal is to discover regularities that relate nucleotide sequences to their functional classes. The search patterns for regularities are constructed in first-order logic augmented by probabilistic estimates. To this end, the PC software system "Gene Discovery" has been designed. The system accepts molecular genetic data retrieved from a database by SQL queries. Nucleotide sequences of promoters from several functional systems were extracted from the TRRD database (http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/) and analysed. The data include nucleotide sequences of erythroid-specific gene promoters, endocrine system gene promoters, promoter regions of genes controlling the cell cycle, promoters of genes regulating lipid metabolism, and muscle-specific gene promoters. Several regularities were found that relate nucleotide sequences in regulatory DNA, and their location relative to the transcription start, to each functional class.
Keywords: Machine learning, knowledge discovery, data mining, bioinformatics, eukaryotic promoter recognition, transcription factor binding sites, oligonucleotide patterns
|