In Silico Biology 6, 0003 (2006); ©2005, Bioinformation Systems e.V.  

In silico discrimination of single nucleotide polymorphisms and pathological mutations in human gene promoter regions by means of local DNA sequence context and regularity

Imtiaz A. Khan1, Matthew Mort2, Paul R. Buckland3, Michael C. O'Donovan3, David N. Cooper2 and Nadia A. Chuzhanova1,2*

1 Biostatistics and Bioinformatics Unit, Cardiff University, Cardiff, CF14 4XN, UK
2 Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK
3 Department of Psychological Medicine, Cardiff University, Heath Park, Cardiff, CF14 4XN, UK

* Corresponding author
   School of Computer Science, Cardiff University, 5 The Parade, Cardiff CF24 3AA, UK
   Phone: +44-2920-879090

Edited by E. Wingender; received October 24, 2005; revised December 16, 2005; accepted December 18, 2005; published December 27, 2005


DNA sequence features were sought that could be used for the in silico ascertainment of the likely functional consequences of single nucleotide changes in human gene promoter regions. To identify relevant features of the local DNA sequence context, we transformed into consensus tables the nucleotide composition of the sequences flanking 101 promoter SNPs of type CT or AG, each defined empirically as either 'functional' or 'non-functional' on the basis of a standardised reporter gene assay. The similarity of a given sequence to these consensus tables was then measured by means of the Shapiro-Senapathy score. A decision rule was proposed that discriminated between the empirically ascertained functional and non-functional SNPs with a sensitivity of 80% and a specificity of 20%. Two further datasets, viz. disease-associated SNPs of types AG and CT (N = 75) and pathological promoter mutations (transitions, N = 114), were retrieved from the Human Gene Mutation Database (HGMD) and analyzed using the consensus tables derived from the functional and non-functional promoter SNPs; ~70% were correctly recognized as being of probable functional significance. Complexity analysis was also used to quantify the regularity of the local DNA sequence environment. Functional SNPs/mutations of type CT were found to occur in DNA regions characterized by lower average sequence complexity, as measured with respect to symmetric elements; complexity values increased gradually from functional SNPs and pathological mutations, through functional disease-associated SNPs, to non-functional SNPs. This may reflect the internal axial symmetry that frequently characterizes transcription factor binding sites.

Keywords: promoter, polymorphism, mutation, functional significance, discriminant analysis, DNA sequence complexity
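
The scoring step summarized in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of the Shapiro-Senapathy score, 100 × (t − t_min) / (t_max − t_min), where t is the sum of the consensus-table percentages of the nucleotides observed in the query sequence, and t_min and t_max are the sums of the lowest and highest percentages at each position. The function names, the toy sequences, and the choice of equal-length ungapped flanks are illustrative assumptions, not the authors' implementation; the exact variant, window length, and decision rule used in the article may differ.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def consensus_table(sequences):
    """Build a position-wise table of nucleotide percentages.

    Assumption: all sequences are aligned, ungapped, and of equal length.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    table = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        # Percentage of each nucleotide observed at position i.
        table.append({n: 100.0 * counts[n] / len(sequences) for n in NUCLEOTIDES})
    return table


def shapiro_senapathy_score(sequence, table):
    """Similarity of `sequence` to a consensus table, scaled to 0..100."""
    if len(sequence) != len(table):
        raise ValueError("sequence and table must have the same length")
    # Sum of the percentages of the nucleotides actually present in the query.
    t = sum(column[n] for column, n in zip(table, sequence))
    # Lowest and highest sums attainable by any sequence against this table.
    t_min = sum(min(column.values()) for column in table)
    t_max = sum(max(column.values()) for column in table)
    if t_max == t_min:
        raise ValueError("table is uninformative (all positions uniform)")
    return 100.0 * (t - t_min) / (t_max - t_min)
```

With a table built from a set of flanking sequences, a query that matches the most frequent nucleotide at every position scores 100 and one that matches the rarest nucleotide at every position scores 0. Scoring the same query against separate tables for functional and non-functional SNPs would give the pair of values on which a decision rule could operate.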