Volume 2, Special Issue GCB'01




In Silico Biology 2, 0022 (2002); ©2002, Bioinformation Systems e.V.  


Construction of stochastic context trees for genetic texts

Yury L. Orlov¹, Vladimir P. Filippov¹, Vladimir N. Potapov², Nikolay A. Kolchanov¹

¹ Institute of Cytology and Genetics SB RAS, Acad. Lavrentiev ave., 10, Novosibirsk, 630090, Russia
² Sobolev Institute of Mathematics SB RAS, Acad. Koptyug prospect, 4, Novosibirsk, 630090, Russia
E-mail: orlov@bionet.nsc.ru, filippov@narod.ru, vpotapov@math.nsc.ru, kol@bionet.nsc.ru


Edited by E. Wingender; received December 10, 2001; revised and accepted March 25, 2002; published April 11, 2002


Abstract

A method has been developed for constructing a tree source model of genetic text generation. Visualising the model as a suffix (context) tree provides a new way to analyse the contexts of symbol sequences. The stochastic complexity of the data, estimated within the framework of the model, serves as the criterion for selecting the model. The model and the complexity values are used to analyse genetic texts. The software implementation of this algorithm makes it possible to reveal statistical properties of genetic sequences on the basis of an information measure. The program is available on the Internet at http://wwwmgs.bionet.nsc.ru/mgs/programs/complexity/.

Key words: complexity, information measure, suffix tree visualisation, variable memory Markov model, genetic texts, statistical modelling
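To illustrate how a context tree can be built and pruned by a stochastic complexity criterion, here is a minimal sketch. It assumes a Krichevsky-Trofimov (KT) estimator for the symbol probabilities at each node and a simple two-part MDL rule with one structure bit per split/leaf decision. These choices are assumptions made for illustration, not necessarily the estimator or criterion used by the authors. The function names `context_counts`, `kt_code_length` and `prune` are hypothetical. Contexts are stored with the nearest preceding symbol first, so each context string is a path from the root of the suffix (context) tree.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: DNA text over a four-letter alphabet


def context_counts(seq, max_depth):
    """Map each context (nearest preceding symbol first) to next-symbol counts.

    Only positions i >= max_depth are counted, so every depth sees the same
    positions and a node's counts equal the sum of its children's counts.
    """
    counts = defaultdict(lambda: [0] * len(ALPHABET))
    for i in range(max_depth, len(seq)):
        sym = ALPHABET.index(seq[i])
        for d in range(max_depth + 1):
            ctx = seq[i - d:i][::-1]  # d = 0 gives the root context ""
            counts[ctx][sym] += 1
    return counts


def kt_code_length(c):
    """Code length in bits of counts c under the Krichevsky-Trofimov estimator."""
    m, n = len(c), sum(c)
    log_p = (math.lgamma(m / 2) - math.lgamma(n + m / 2)
             + sum(math.lgamma(k + 0.5) - math.lgamma(0.5) for k in c))
    return -log_p / math.log(2)


def prune(counts, ctx, max_depth):
    """Return (cost_in_bits, leaf_contexts) of the cheapest subtree at ctx.

    Leaf: 1 structure bit + KT code length of the node's counts.
    Split: 1 structure bit + sum of the children's optimal costs.
    At max_depth no decision is needed, so no structure bit is charged.
    Children that never occur are skipped (a simplification: 0 bits).
    """
    if len(ctx) == max_depth:
        return kt_code_length(counts[ctx]), [ctx]
    leaf_cost = 1 + kt_code_length(counts[ctx])
    split_cost, split_leaves = 1.0, []
    for a in ALPHABET:
        child = ctx + a
        if child in counts:  # membership test does not create entries
            cost, leaves = prune(counts, child, max_depth)
            split_cost += cost
            split_leaves += leaves
    if split_cost < leaf_cost:
        return split_cost, split_leaves
    return leaf_cost, [ctx]


if __name__ == "__main__":
    seq = "ACGT" * 50  # toy periodic text: each symbol fixes the next one
    counts = context_counts(seq, max_depth=2)
    cost, leaves = prune(counts, "", max_depth=2)
    print("leaves:", leaves, "total bits:", round(cost, 1))
```

On a periodic toy text the pruning keeps one level of contexts, because the previous symbol alone determines the next one and deeper contexts add only structure bits. On a constant text it collapses to the root. The tree depth is thus chosen by the data rather than fixed in advance, which is the property a variable memory Markov model is meant to capture.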