Methods for Predicting Target Sites of Transcription Factors
Akinori Sarai
RIKEN Institute (The Institute for Physical and Chemical Research),
3-1-1 Koyadai,
Tsukuba 305-0074 Japan
E-mail: sarai@rtc.riken.go.jp
Gene regulation in higher organisms is achieved by a complex system of transcription factors. The explosive growth in sequence information and in the number of identified transcription factors presents a great challenge to bioinformatics. Transcription factors usually bind to multiple target sequences and regulate multiple genes. Thus, the intrinsic specificity of transcription factors is rather low compared with that of prokaryotic DNA-binding proteins, probably because the synergistic action of multiple transcription factors on the same promoter is the strategy used for the complex regulation of gene expression. Therefore, finding potential target sites in a vast sequence space is a multiple-minimum problem, similar to protein folding. Indeed, the problem resembles the inverse folding problem of proteins, in which sequences that fit a given fold are sought. Thus, similar algorithms may be applied. Methods for predicting target sites may be classified according to whether or not they use structural information. Here, I will consider the following four methods:
- Sequence-based method. Currently, this is the most commonly used method for target prediction. It relies on sequence information obtained from known binding sequences. Usually, consensus sequence patterns or weight matrices are used to scan sequence databases. The method is quite straightforward, but its validity depends heavily on the quality of the sequence information.
- ΔG-based method. This is based on experimental measurements of binding between protein and DNA. The binding-affinity data for systematic single-base mutations of a consensus binding site can be used to derive matrices similar to the weight matrices in the sequence-based method. Because the data are based on physical interactions, these matrices are expected to be more reliable than the sequence-based weight matrices. However, the method requires laborious experiments. Its application to particular transcription factors has shown some success [1]. Nevertheless, it has limitations because of the complexities of transcription factor systems, such as cooperativity.
- Structure-based method. This is based on the analysis of a structural database of protein-DNA complexes. We can derive empirical potential functions for the specific interactions between bases and amino acids from the statistical analysis [2]. These statistical potentials are then used to evaluate the fitness of sequences to the complex structures of particular transcription factors by a combinatorial threading procedure similar to the fold recognition of protein structures. The accuracy of this method for target prediction is still limited because of the limited amount of available structural data. However, the strength of this method is that we can examine the effects of DNA deformation, cooperativity and other structural effects on specificity in a quantitative manner [2]. In addition, the growth of structural data should make this method increasingly promising.
- Ab-initio method. This method does not rely on any experimental data; instead, it uses computer simulations to derive contact potentials between bases and amino acids. The simulations, which consider structural flexibility and interaction redundancy, require intensive computation. The interaction "free-energy maps" derived from the calculations show different specificities for different pairs of base and amino acid [3]. Such data for all combinations of bases and amino acids can eventually be used for the prediction of target sequences.
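As an illustration of the sequence-based method in the list above, the following is a minimal Python sketch of weight-matrix scanning. It is not the procedure of any particular program: the function names, the pseudocount, the uniform background frequency, the threshold and the toy binding sites are all assumptions made for the example.

```python
import math

BASES = "ACGT"

def build_log_odds_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from aligned sites of equal length.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    width = len(sites[0])
    matrix = []
    for pos in range(width):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log2((counts[base] / total) / background) for base in BASES}
        )
    return matrix

def scan_sequence(sequence, matrix, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows with ambiguous characters such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits

# Toy aligned sites, invented for this example (not measured data).
toy_sites = ["TAACGG", "CAACGG", "TAACTG", "CAACTG"]
toy_matrix = build_log_odds_matrix(toy_sites)
for start, window, score in scan_sequence("GGGTAACGGTTT", toy_matrix, threshold=5.0):
    print(start, window, round(score, 2))
```

The quality of the predictions depends directly on the aligned sites supplied to `build_log_odds_matrix`, which is the dependence on sequence information noted above.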
In this presentation I will discuss each method and compare the different methods in more detail.
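The matrix derived from binding-affinity measurements on single-base mutants, described in the list above, can be sketched in the same way. This is a hedged illustration only: it assumes that the free-energy changes of individual positions are additive, the function names are hypothetical, and the numerical values are invented rather than taken from [1].

```python
import math

BASES = "ACGT"
RT = 0.001987 * 298.15  # kcal/mol, gas constant times an assumed 298.15 K

def mutation_matrix(consensus, ddg_measurements):
    """Build a per-position table of binding free-energy changes.

    ddg_measurements maps (position, base) to the measured change in
    binding free energy (kcal/mol) relative to the consensus site.
    The consensus base at each position is assigned 0 by definition.
    """
    matrix = []
    for pos, consensus_base in enumerate(consensus):
        row = {}
        for base in BASES:
            if base == consensus_base:
                row[base] = 0.0
            else:
                row[base] = ddg_measurements[(pos, base)]
        matrix.append(row)
    return matrix

def predicted_ddg(site, matrix):
    """Sum single-base contributions (additivity is an assumption)."""
    return sum(matrix[i][base] for i, base in enumerate(site))

def relative_affinity(ddg):
    """Binding constant relative to the consensus site, exp(-ddG / RT)."""
    return math.exp(-ddg / RT)

# Invented values for a three-base consensus, for illustration only.
toy_ddg = {
    (0, "C"): 1.2, (0, "G"): 0.8, (0, "T"): 0.5,
    (1, "C"): 2.0, (1, "G"): 1.5, (1, "T"): 1.8,
    (2, "A"): 1.0, (2, "G"): 2.2, (2, "T"): 0.9,
}
toy_matrix = mutation_matrix("AAC", toy_ddg)
for site in ["AAC", "TAC", "TAT"]:
    ddg = predicted_ddg(site, toy_matrix)
    print(site, round(ddg, 2), round(relative_affinity(ddg), 3))
```

Scanning a sequence with such a matrix proceeds as with a sequence-based weight matrix, except that lower predicted free-energy changes, rather than higher scores, indicate better candidate sites.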
REFERENCES
[1] Deng, Q.-L., Ishii, S. and Sarai, A. (1996). Binding-site analysis of c-Myb: screening of potential binding sites by the mutational matrix derived from systematic binding affinity measurements. Nucleic Acids Res. 24, 766-774.
[2] Kono, H. and Sarai, A. (1999). Structure-based prediction of DNA target sites by regulatory proteins. Proteins 35, 114-131.
[3] Pichierri, F., Aida, M., Gromiha, M. and Sarai, A. (1999). Free-energy maps of base-amino acid interactions for protein-DNA recognition. J. Am. Chem. Soc. 121, 6152-6157.