Suzuki et al., In Silico Biol. 4, 0036 (2004), Supplement
Search database by simple queries
As shown in Supplementary Information Figure S1A, the query fields appear on the top page of DBTSS. To retrieve the TSSs/promoters of a gene of interest, users can enter simple queries such as RefSeq IDs (NMs), Ensembl IDs (ENSTs) or gene definitions, for either human or mouse genes. Alternatively, users can submit a sequence for a BLAST search. For human genes, users can also search for genes within a specified distance of SNPs of interest, which should help to identify candidate functional SNPs located in promoter regions (regulatory SNPs; rSNPs; for further information on this issue, see Brookes (1999) Gene 234, 177-186; Ponomarenko et al. (2002) Hum. Mutat. 20, 239-248).
Figure S1: Screenshots from the search pages (A-C) of DBTSS. Forms that are used for search by queries (A), by putative TF binding sites and their combinations (B), and from the gene list (C) are illustrated. As exemplified in (B), the fields that specify the TF binding sites are represented by red, yellow and blue boxes (Factors 1-3). Using these boxes, users can retrieve target promoters that include all of Factors 1, 2 and 3. For each box, users can choose between an exact sequence match and a matrix search using PWMs. As exemplified for "Factor 3", users can also specify alternative TF binding sites, any one of which must be contained in the targets, by creating additional boxes. This search can be done for human or mouse promoters individually, or for the promoter elements conserved between human and mouse.
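The SNP-based query described above amounts to a distance filter between SNP positions and TSS positions. The following is a minimal sketch of that idea, not the DBTSS implementation: the record types `Tss` and `Snp`, the function `genes_near_snp` and all identifiers in the example are hypothetical, and coordinates are assumed to be on the same genome assembly.

```python
from typing import NamedTuple


class Tss(NamedTuple):
    gene: str    # gene name (hypothetical identifiers in the example below)
    chrom: str   # chromosome name
    pos: int     # genomic coordinate of the TSS


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    pos: int


def genes_near_snp(snp, tss_list, max_dist):
    """Return sorted names of genes with a TSS within max_dist bp of the SNP."""
    return sorted({
        t.gene
        for t in tss_list
        if t.chrom == snp.chrom and abs(t.pos - snp.pos) <= max_dist
    })


tss_list = [
    Tss("GENE_A", "chr1", 1000),
    Tss("GENE_B", "chr1", 5000),
    Tss("GENE_C", "chr2", 1000),
]
snp = Snp("rs_hypothetical", "chr1", 1200)
print(genes_near_snp(snp, tss_list, 500))
```

A gene with several alternative TSSs is reported once if any of them falls inside the window, which matches the intent of finding SNPs that may lie in a promoter region.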
Clicking "search" brings up two graphical views of the result, as illustrated in Supplementary Information Figure S2A and S2B. The result is shown as a genomic view of the gene, divided into two panels. The first panel gives an overview of the genomic organization of the hit gene in terms of the exon-intron structure of the corresponding RefSeq entries, Ensembl transcripts and mapped full-length cDNAs. The annotated positions of the protein-coding regions are also shown. To simplify the view, the items to be displayed can be selected in the "viewer controller". The second panel displays the exact sequence around the TSSs. Experimentally characterized TF binding sites, if any, and SNPs registered in public databases are displayed according to the information recorded in TRANSFAC Public (ver. 6.0) and dbSNP [Sherry et al. (2001) Nucleic Acids Res. 29, 308-311], respectively. A promoter sequence of arbitrary length, measured from an arbitrarily designated reference point, can also be retrieved as text in this viewer.
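Retrieving a promoter sequence of arbitrary length around a chosen reference point can be sketched as follows. This is an illustration under stated assumptions, not the code behind the viewer: the function `promoter_window` and its parameters are hypothetical, the reference position is taken as a 0-based index into a chromosome string, and the reference base itself is counted as the first downstream base.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(chrom_seq, ref_pos, strand, upstream, downstream):
    """Return `upstream` bases before and `downstream` bases from ref_pos.

    The result is always written 5'->3' with respect to the gene, so a
    minus-strand gene is returned as the reverse complement. Windows that
    run past either end of chrom_seq are clipped.
    """
    if strand == "+":
        start, end = ref_pos - upstream, ref_pos + downstream
    else:
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = ref_pos - downstream + 1, ref_pos + upstream + 1
    start, end = max(start, 0), min(end, len(chrom_seq))
    window = chrom_seq[start:end]
    return window if strand == "+" else reverse_complement(window)


genome = "AAACCCGGGTTT"
print(promoter_window(genome, 6, "+", 3, 2))
print(promoter_window(genome, 5, "-", 3, 2))
```

Returning minus-strand promoters as the reverse complement keeps every retrieved sequence in the same orientation relative to its TSS, which is what a downstream motif search usually expects.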
For comparative analysis of the promoters, users can enter the "comparative view of the promoters" page from either the human or the mouse promoter viewer described above. Whenever information on a mouse/human counterpart is available, the "Go mouse/human counterpart" button appears in the upper left corner of the first panel. Alternatively, users can directly enter the comparative promoter viewer for the genes of interest by specifying the human and mouse gene pairs from the correlation tables (Supplementary Information Figure S1C).
Comparative viewer of the promoters
The results of the sequence comparison of the promoters between human and mouse counterparts can be browsed as shown in Supplementary Information Figure S2D. On this page, the sequence alignment calculated using LALIGN is displayed. The positions of the aligned sequences are represented by boxes, and corresponding nucleotides are connected by lines. The TSSs identified by the full-length cDNAs and the 5'-ends of the RefSeqs are represented by red and blue arrows on the human and mouse promoters, respectively. In the lower panel, the sequence match is displayed and the TSSs identified by the full-length cDNAs and the 5'-ends of RefSeqs are marked on the nucleotides. Users can also dynamically change the reference positions of the alignment by specifying TSSs of their choice. The default alignment is computed with LALIGN, a local alignment program, but this can be switched to ClustalW, which is designed for global alignment [Thompson et al. (1994) Nucleic Acids Res. 22, 4673-4680].
Figure S2: Screenshots from the result pages (A-D) of DBTSS. Results of the search are displayed in the genomic viewer (A) and the sequence viewer (B). Examples from the comparative viewer of the promoters are also displayed in (C) and (D).
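The practical difference between the local (LALIGN) and global (ClustalW) options can be illustrated with two score-only dynamic-programming routines. This is a textbook sketch with a linear gap penalty and arbitrarily chosen scores; it does not reproduce the algorithms or scoring parameters of LALIGN or ClustalW, and the function names are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best score over all pairs of substrings (Smith-Waterman recurrence)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Score of the best end-to-end alignment (Needleman-Wunsch recurrence)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


# Two toy "promoters" that share only a short conserved element.
human = "AAAATATAAGGGG"
mouse = "CCCTATAACCC"
print(local_alignment_score(human, mouse), global_alignment_score(human, mouse))
```

When two promoters share only short conserved elements, the local score is driven by those elements alone, whereas the global score is pulled down by the unrelated flanking sequence. This is one reason a local method is a reasonable default for promoter comparison.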
Search database by putative TF binding sites and their combinations
The most important feature added in this version of DBTSS is an engine that searches for promoters by putative TF binding sites. Combined with the comparative promoter data, it should give experimental biologists a practical and powerful tool for promoter analysis. Users can search with keys such as "promoters containing putative TF binding site(s) of a particular kind that are conserved between human and mouse". To narrow down the targets, users can perform combinatorial searches of the TF sites.
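The combinatorial logic (every factor must be present, while alternatives within one factor are interchangeable, as in Figure S1B) can be sketched with exact sequence matching. This is an assumption-laden illustration rather than the DBTSS search engine: `combinatorial_search`, the promoter names and the toy sequences are hypothetical, and sites are matched on either strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def contains_site(promoter, site):
    """True if `site` occurs in `promoter` on either strand (exact match)."""
    promoter, site = promoter.upper(), site.upper()
    return site in promoter or reverse_complement(site) in promoter


def combinatorial_search(promoters, factor_groups):
    """Return names of promoters that satisfy every factor group.

    factor_groups is a list of lists: the outer list is combined with AND,
    and the alternatives inside each inner list are combined with OR.
    """
    return [
        name
        for name, seq in promoters.items()
        if all(any(contains_site(seq, s) for s in group) for group in factor_groups)
    ]


promoters = {
    "P1": "GGTGACTCATTGGGCGGAA",
    "P2": "AATGACTCATT",
    "P3": "CCGCCCAATTT",
}
# Factor 1 must be present AND at least one of the two Factor 2 alternatives.
query = [["TGACTCA"], ["GGGCGG", "TATAAA"]]
print(combinatorial_search(promoters, query))
```

The same AND-of-ORs structure applies unchanged if `contains_site` is replaced by a matrix-based test, so the query logic and the site-detection method can be varied independently.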
For this search, users can create an arbitrary number and combination of search fields for putative TF binding sites. For each of the position weight matrices (PWMs), which define the consensus sequences of the TF binding sites [Matys et al. (2003) Nucleic Acids Res. 31, 374-378], users can specify arbitrary cut-offs, target regions and the strand to be searched (the default parameters are those of "minSUM64.prf", which is documented for the Match tool of TRANSFAC as minimizing both false negatives and false positives). Users can also choose an exact sequence match instead of a PWM, so that they can search for newly discovered TF target sites or for consensus sequences whose existing PWMs are less reliable.
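How a matrix search with a user-specified cut-off and strand works can be sketched as follows. This is a simplified illustration, not the Match algorithm: the score is a plain min-max normalization of summed matrix entries, whereas Match also applies a separate core similarity cut-off and its own position weighting, neither of which is reproduced here. The function `scan_pwm` and the four-column `TOY_PWM` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_pwm(seq, pwm, cutoff, strands="+-"):
    """Return (start, strand, score) for every window scoring >= cutoff.

    pwm is a list of per-position dicts mapping A/C/G/T to a count or weight.
    Scores are rescaled to 0..1 between the worst and best possible window.
    `start` is the 0-based forward-strand coordinate of the leftmost base.
    """
    seq = seq.upper()
    width = len(pwm)
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    if hi == lo:
        raise ValueError("matrix cannot discriminate between sequences")
    hits = []
    for strand in strands:
        target = seq if strand == "+" else reverse_complement(seq)
        for i in range(len(target) - width + 1):
            window = target[i:i + width]
            if set(window) - set("ACGT"):
                continue  # skip windows with ambiguous bases such as N
            raw = sum(col[base] for col, base in zip(pwm, window))
            score = (raw - lo) / (hi - lo)
            if score >= cutoff:
                start = i if strand == "+" else len(seq) - i - width
                hits.append((start, strand, score))
    return hits


# Toy matrix (hypothetical counts) that prefers TATA and tolerates TATT.
TOY_PWM = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 8, "C": 0, "G": 0, "T": 2},
]
print(scan_pwm("GGTATAGG", TOY_PWM, cutoff=0.9))
```

Raising the cut-off trades sensitivity for specificity: with this toy matrix the variant site TATT scores about 0.84, so it is reported at a cut-off of 0.8 but not at 0.9.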
The results of the search by putative TF binding sites can be browsed as exemplified in Supplementary Information Figure S1B. Because most searches are expected to return multiple hits, the results are first shown as a list of hits identified by their gene names. When users choose a hit of interest from the list, a viewer appears showing the exact sequence around the TSSs and the positions of the predicted TF binding sites. When the search has been performed with the "human and mouse conserved" option, the "Go mouse/human comparison" button appears next to the information table of the predicted TF binding sites. From there, users can open the comparative promoter viewer to examine directly, at the sequence level, the conservation of the predicted TF binding sites between human and mouse.
Table S1: List of the predicted TF binding sites.

| Matrix | TF definition | Matrix sim. cut-off | Core sim. cut-off | Human hits | Mouse hits | Conserved hits |
|---|---|---|---|---|---|---|
| V$AFP1_Q6 | AFP1 | 1 | 0.947 | 35 | 25 | 0 |
| V$AHR_01 | AhR | 1 | 0.958 | 2 | 0 | 0 |
| V$AMEF2_Q6 | aMEF-2 | 1 | 0.928 | 44 | 36 | 0 |
| V$AML_Q6 | AML | 1 | 1 | 405 | 386 | 23 |
| V$AP1_C | AP-1 | 0.989 | 0.991 | 678 | 638 | 64 |
| V$AP4_01 | AP-4 | 1 | 0.954 | 47 | 39 | 1 |
| V$AR_Q2 | AR | 1 | 0.955 | 7 | 7 | 0 |
| V$ATF_B | ATF | 1 | 0.985 | 447 | 327 | 82 |
| V$BACH2_01 | Bach2 | 1 | 0.987 | 97 | 44 | 0 |
| V$CDP_01 | CDP | 0.829 | 0.832 | 55 | 35 | 0 |
| V$CDX2_Q5 | Cdx-2 | 1 | 0.982 | 17 | 10 | 0 |
| V$COUP_01 | COUP-TF / HNF-4 | 0.988 | 0.964 | 30 | 34 | 2 |
| V$COUP_DR1_Q6 | COUP direct repeat 1 | 1 | 0.951 | 33 | 43 | 2 |
| V$CP2_01 | CP2 | 0.987 | 0.992 | 125 | 95 | 1 |
| V$CRX_Q4 | Crx | 1 | 0.961 | 3480 | 1415 | 368 |
| V$E2F_01 | E2F | 1 | 0.884 | 84 | 45 | 4 |
| V$E4F1_Q6 | E4F1 | 1 | 0.985 | 64 | 48 | 4 |
| V$EGR1_01 | Egr-1 | 0.885 | 0.854 | 111 | 74 | 2 |
| V$ER_Q6 | ER | 1 | 0.963 | 120 | 103 | 2 |
| V$FXR_Q3 | FXR | 1 | 0.971 | 1 | 2 | 0 |
| V$GRE_C | GR | 1 | 0.87 | 86 | 81 | 0 |
| V$HFH4_01 | HFH-4 | 1 | 0.906 | 170 | 185 | 3 |
| V$HIF1_Q5 | HIF-1 | 1 | 0.968 | 191 | 106 | 14 |
| V$HNF1_01 | HNF-1 | 1 | 0.935 | 125 | 80 | 9 |
| V$HNF3ALPHA_Q6 | HNF-3alpha | 0.972 | 0.962 | 1473 | 1982 | 113 |
| V$HNF6_Q6 | HNF-6 | 1 | 0.991 | 54 | 35 | 2 |
| V$HOX13_01 | Hox-1.3 | 1 | 0.924 | 5 | 2 | 0 |
| V$HP1SITEFACTOR_Q6 | HP1 site factor | 0.941 | 0.944 | 55 | 39 | 2 |
| V$HSF_Q6 | HSF | 1 | 0.986 | 3 | 5 | 1 |
| V$IPF1_Q4 | IPF1 | 1 | 0.965 | 218 | 176 | 3 |
| V$IRF7_01 | IRF-7 | 0.976 | 0.958 | 181 | 112 | 14 |
| V$ISRE_01 | ISRE | 1 | 0.988 | 5 | 1 | 0 |
| V$LEF1_Q6 | LEF-1 | 1 | 0.95 | 695 | 717 | 53 |
| V$LHX3_01 | Lhx3 | 1 | 0.979 | 318 | 236 | 5 |
| V$LXR_Q3 | LXR | 1 | 0.902 | 44 | 26 | 0 |
| V$MAF_Q6 | MAF | 1 | 0.977 | 2 | 0 | 0 |
| V$MTF1_Q4 | MTF-1 | 1 | 0.961 | 40 | 21 | 1 |
| V$MYCMAX_B | c-Myc/Max | 1 | 0.966 | 990 | 772 | 114 |
| V$MYOD_01 | MyoD | 1 | 0.979 | 163 | 107 | 12 |
| V$NF1_Q6 | NF-1 | 1 | 0.986 | 483 | 468 | 32 |
| V$NFE2_01 | NF-E2 | 1 | 1 | 22 | 32 | 2 |
| V$NFKB_C | NF-kappaB | 0.973 | 0.972 | 102 | 70 | 7 |
| V$NFMUE1_Q6 | NF-muE1 | 1 | 1 | 87 | 48 | 8 |
| V$NFY_Q6 | NF-Y | 1 | 0.978 | 323 | 336 | 28 |
| V$NRF1_Q6 | Nrf-1 | 1 | 0.991 | 411 | 276 | 49 |
| V$OCT1_Q6 | Oct-1 | 1 | 0.994 | 14 | 22 | 1 |
| V$OLF1_01 | Olf-1 | 0.99 | 0.963 | 9 | 7 | 0 |
| V$P53_01 | p53 | 0.664 | 0.753 | 0 | 0 | 0 |
| V$PITX2_Q2 | PITX2 | 1 | 0.976 | 2588 | 736 | 187 |
| V$POU1F1_Q6 | POU1F1 | 0.985 | 0.979 | 329 | 202 | 10 |
| V$PPARA_01 | PPARalpha/RXR-alpha | 0.899 | 0.863 | 6 | 4 | 0 |
| V$PTF1BETA_Q6 | PTF1-beta | 1 | 0.991 | 9 | 8 | 0 |
| V$RORA1_01 | RORalpha1 | 1 | 0.969 | 502 | 614 | 29 |
| V$SF1_Q6 | SF-1 | 1 | 1 | 604 | 495 | 48 |
| V$SMAD4_Q6 | SMAD-4 | 0.99 | 0.917 | 317 | 256 | 10 |
| V$SP1_Q6 | Sp1 | 1 | 0.969 | 5937 | 3316 | 1486 |
| V$SREBP1_02 | SREBP-1 | 1 | 0.992 | 20 | 11 | 0 |
| V$SRF_Q6 | SRF | 0.99 | 0.983 | 36 | 40 | 3 |
| V$STAT_01 | STATx | 1 | 0.976 | 515 | 421 | 40 |
| V$TEF_Q6 | TEF | 1 | 0.881 | 640 | 448 | 25 |
| V$TEL2_Q6 | Tel-2 | 1 | 1 | 83 | 62 | 5 |
| V$TFIIA_Q6 | TFIIA | 0.961 | 0.959 | 252 | 251 | 9 |
| V$TFIII_Q6 | TFII-I | 1 | 1 | 2225 | 1627 | 246 |
| V$USF_Q6 | USF | 1 | 0.959 | 1127 | 780 | 239 |
| V$YY1_02 | YY1 | 1 | 0.94 | 174 | 117 | 18 |

Using the created dataset of the promoters, putative TF binding sites were searched by Match with the corresponding cut-offs (the third and the fourth columns). The numbers of hits detected in human and mouse promoters are shown in the fifth and the sixth columns. The numbers of hits detected in both the human and the corresponding mouse promoters are shown in the seventh column. For the cut-offs, we used very strict values. The statistical significance of the cut-off for each matrix is described in Kel et al. (2003) Nucleic Acids Res. 31, 3576-3579.
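The relationship between the last three columns of Table S1 can be sketched as a count over human-mouse gene pairs. This is an illustration only: it assumes that a "hit" is counted once per promoter and that a conserved hit means both promoters of a counterpart pair contain a site for the same matrix, which the text above does not spell out. The function `count_hits` and all identifiers are hypothetical.

```python
def count_hits(human_hits, mouse_hits, ortholog_pairs):
    """Return (human, mouse, conserved) counts for one matrix.

    human_hits and mouse_hits are sets of gene IDs whose promoter contains
    at least one predicted site; ortholog_pairs is a list of
    (human_id, mouse_id) tuples linking counterpart genes.
    """
    conserved = sum(
        1 for h, m in ortholog_pairs if h in human_hits and m in mouse_hits
    )
    return len(human_hits), len(mouse_hits), conserved


human_hits = {"H1", "H2", "H3"}
mouse_hits = {"M1", "M4"}
pairs = [("H1", "M1"), ("H2", "M2"), ("H3", "M3")]
print(count_hits(human_hits, mouse_hits, pairs))
```

Under these assumptions the conserved count can never exceed the smaller of the two species counts restricted to paired genes, which is consistent with the pattern seen in the table.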