In Silico Biology 4, 0036 (2004); ©2004, Bioinformation Systems e.V.  


Large-scale collection and characterization of promoters of human and mouse genes

Yutaka Suzuki1*, Riu Yamashita1, Matsuyuki Shirota1, Yuta Sakakibara1,2, Joe Chiba2, Junko Mizushima-Sugano1, Alexander E. Kel3, Takahiro Arakawa4, Piero Carninci4,5, Jun Kawai4,5, Yoshihide Hayashizaki4,5, Toshihisa Takagi1, Kenta Nakai1 and Sumio Sugano1




1 Human Genome Center, The Institute of Medical Science, The University of Tokyo: 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan;
2 Department of Biological Science and Technology, Science University of Tokyo, 2641 Yamazaki, Noda-shi, Chiba, 278-8510, Japan;
3 BIOBASE GmbH, Halchtersche Str. 33, D-38304 Wolfenbüttel, Germany;
4 Genome Science Laboratory, Discovery and Research Institute, RIKEN Wako Main Campus, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan;
5 Laboratory for Genome Exploration Research Group, RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan

*  Corresponding author; email: ysuzuki@ims.u-tokyo.ac.jp
   Phone: +81-3-5449 5343; Fax: +81-3-5449 5416





Edited by E. Wingender; received June 10, 2004; revised and accepted July 20, 2004; published July 23, 2004



Abstract

We report the generation and initial characterization of a large-scale collection of sequences of putative promoter regions (PPRs) of human and mouse genes. Based on our unique collection of 400,225 and 580,209 human and mouse full-length cDNAs, we determined exact transcriptional start sites (TSSs). Using positional information of the TSSs, we could retrieve adjacent sequences as PPRs for 8,793 and 6,875 human and mouse genes, respectively. The positions of the PPRs were, on average, 4 kb upstream of the previously reported 5'-ends of cDNAs, demonstrating that full-length cDNA information is indispensable for this purpose. Among those PPRs supported by experimentally validated TSSs, 3,324 could be paired as mutually homologous genes between human and mouse and were used for the comprehensive comparative studies. The sequence identities in the proximal regions of the TSSs were 45% on average, and 22,794 putative transcription factor binding sites that are conserved between human and mouse were identified. The data resource created in the present work and the results of its initial characterization should lay a firm foundation for deciphering the transcriptional modulation of human genes. All the data have been deposited and made available through a database for comparative studies, DBTSS.

Key words: full-length cDNA, promoter, comparative genomics, transcriptional start sites



Introduction

In order to understand the transcriptional network of human genes, it is essential to characterize their regulatory regions, which include regions called promoters. To this end, one of the challenges confronted by both experimental and bioinformatics researchers has been to decode what kind of functional sequence elements reside in which parts of the promoters and how they serve as modulators of transcription. A large number of regulatory proteins, which are collectively called transcription factors (TFs), have been identified and their sequence-specific binding to promoter elements has been shown to play the central role in regulation [Mitchell and Tjian, 1989; Novina and Roy, 1996]. As many of the TF binding sites are short (6-12 bp) and their consensus sequences are often degenerated, it was an intricate problem to discriminate the genuine TF binding sites, which have biological significance in vivo, from insignificant sequences, which occur randomly and very frequently in the large volume of human genomic sequences [Fickett and Wasserman, 2000].

Comparative study of human and other organisms' sequences, namely comparative genomics, is a powerful method to extract biologically meaningful information as to which parts of the genomic sequences are likely to have functional relevance. It is expected that the functionally important regions, such as exons and promoter elements, are evolutionarily conserved and could be discriminated from non-conserved ones, which are supposed to be subject to fewer functional constraints [Hardison, 2000; Boguski, 2002]. The almost-complete sequencing of both human and mouse genomes [Lander et al., 2001; Venter et al., 2001; Waterston et al., 2002] provided us with the basic material with which to initiate large-scale comparative studies of the promoters. If the positional information of the transcriptional start sites of mRNAs (TSSs) were available, the promoter sequences could be identified by computational mapping of the TSS onto the genomic sequences, since in most cases, the promoters are located just proximal to or overlapping with the TSS. Once promoter sequences were retrieved, they could be subjected to further analyses for the presence of particular TF binding sites.

Transcriptional start sites correspond to the 5'-ends of the full-length cDNAs. Therefore, obtaining the TSS information is equivalent to obtaining the 5'-end information of the full-length cDNAs. However, it was often difficult to obtain the 5'-end sequences of the full-length cDNAs from public databases. For most of the cDNAs registered there, the exact TSSs had not been determined either by S1 mapping, primer extension or 5'RACE and their authentic 5'-ends remain uncharacterized. Since these cDNAs cannot be regarded as full-length cDNAs in a strict sense, it would be inappropriate to use their 5'-end information for promoter retrieval. Indeed, even for the cDNAs registered in one of the most reliable cDNA databases, RefSeq (http://www.ncbi.nlm.nih.gov/RefSeq/)[Suzuki et al., 2002; Pruitt et al., 2003], about half of the 5'-ends of RefSeq sequences should be extended towards the 5'-end according to our previous observation [Suzuki et al., 2002].

We have developed a method of constructing a full-length enriched cDNA library using a cap selection method, "oligo-capping", and have been collecting the full-length cDNAs [Suzuki and Sugano, 2003]. Based on the human genomic DNA and full-length cDNA data, we recently reported identification and computational characterization of human gene promoters on a large-scale [Suzuki et al., 2001]. The 5'-end one-pass sequences of 217,402 of the full-length cDNAs were mapped onto the human genomic sequences and adjacent promoter sequences were identified [Suzuki et al., 2002].

In the present study, we expanded the human full-length cDNA dataset and applied a similar strategy to mouse data. For mouse cDNA data, we used full-length cDNA sequences, which were derived from cDNA libraries constructed by another cap selection method, the "cap trapper" method [Carninci and Hayashizaki, 1999; Kawai et al., 2001; Okazaki et al., 2002]. It is estimated that more than 80% of the cDNA clones isolated from the cDNA libraries constructed either by the oligo-capping method or by the cap-trapper method should represent full-length cDNAs [Carninci and Hayashizaki, 1999; Suzuki and Sugano, 2003]. Here we report generation and initial characterization of a large-scale dataset of promoter sequences and construction of a database, DBTSS, for comparative studies of promoters of human and mouse genes.



Materials and methods

Processing of the full-length cDNA sequences and Mapping of the TSSs on the Genomic Sequences

For human TSSs, each sequence produced by the oligo-capping method was first processed to trim its vector site and its low-quality parts. We also used FANTOM 5'-end sequences from Genbank (acc. No. BB561685-BB667065, BB838020-BB873800) and our full-length cDNA data to determine mouse TSSs. They were compared with human or mouse RefSeq using BLASTN. If a sequence alignment displayed an identity greater than 95% and an e-value less than 1.0e-100, it was regarded as identical to the RefSeq sequence. Sequences that had multiple hits in RefSeq were discarded. Then, the exact positions of the TSSs on the human (build 31) or mouse (mm2) genomic sequences were determined (http://genome.ucsc.edu/downloads.html), using the sim4 program (http://pbil.univ-lyon1.fr/sim4.html) [Florea et al., 1998]. In order to identify precise TSS information, we removed all the entries that were not mapped on the human genome sequence from their first base. Where fluctuating TSSs were observed, the most frequently used TSS was defined as the representative. If several TSSs were tied as the most frequent, we defined their median as the representative.
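
The rule for choosing a representative TSS can be sketched as follows. This is a minimal illustration rather than the pipeline used in this work: the function name `representative_tss` and the input format (a list of integer genomic coordinates, one per mapped cDNA 5'-end) are assumptions. The text does not state how the median is taken when an even number of positions are tied, so the sketch uses the lower of the two middle values, which keeps the representative at an observed position.

```python
from collections import Counter
from statistics import median_low


def representative_tss(positions):
    """Pick one representative TSS from the 5'-end positions of mapped cDNAs.

    positions: genomic coordinates (ints), one per supporting cDNA.
    """
    if not positions:
        raise ValueError("at least one mapped 5'-end is required")
    counts = Counter(positions)
    top = max(counts.values())
    # All positions tied for the highest cDNA support.
    modes = sorted(p for p, n in counts.items() if n == top)
    # One mode: it is returned as is. Several modes: take their median
    # (median_low is an assumption for an even number of ties).
    return median_low(modes)
```

For example, three cDNAs starting at 100, 100 and 101 would give 100 as the representative position.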


Generation of the correlation table between human and mouse counterparts

In order to generate the relational table between human and mouse counterparts, human and mouse representative transcripts were compared with each other using BLAST with a cut-off e-value of 1.0e-100. For the datasets of representative human and mouse transcripts, RefSeq and RTPS (representative transcripts and protein sequences from the FANTOM project) were used, respectively. The generated pairs were further sorted by having at least one Ref-full or RTPS per pair. Where homology searches gave ambiguous results (with mutually multiple hits), they were excluded from the table, so that the obtained relational table consisted only of the gene pairs of reciprocal best match homologs.
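
The construction of a reciprocal best match table can be sketched as below. This is an illustration under stated assumptions, not the procedure actually run: the hit tuples `(query, subject, evalue, bitscore)`, the use of the bit score to rank hits, and the treatment of tied top scores as "ambiguous" are all choices made for the example. Only the e-value cut-off of 1.0e-100 is taken from the text.

```python
def best_hits(hits, max_evalue=1e-100):
    """Map each query to its single best subject, dropping tied queries.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    """
    best = {}  # query -> (top bitscore, set of subjects reaching it)
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # fails the e-value cut-off
        score, subjects = best.get(query, (float("-inf"), set()))
        if bitscore > score:
            best[query] = (bitscore, {subject})
        elif bitscore == score:
            subjects.add(subject)
            best[query] = (score, subjects)
    # Keep only queries whose best hit is unambiguous.
    return {q: next(iter(s)) for q, (_, s) in best.items() if len(s) == 1}


def reciprocal_best_pairs(human_vs_mouse, mouse_vs_human):
    """Return sorted (human, mouse) pairs that are each other's best hit."""
    h2m = best_hits(human_vs_mouse)
    m2h = best_hits(mouse_vs_human)
    return sorted((h, m) for h, m in h2m.items() if m2h.get(m) == h)
```

A pair is kept only when the human transcript's best mouse hit points back to the same human transcript, which is what restricts the table to 1:1 relationships.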


Sequence comparison between promoter pairs of human and mouse genes

Sequences of the promoters were compared between human and mouse homologues. Sequences of the -1000 to +200 bp relative to the TSSs were used and sequence identity was calculated. For the sequence alignment, LALIGN was run with the default parameters. The sequence identities were averaged for the 1200 bp regions. The identity counts of the regions where no alignment was generated using LALIGN were scored as 0.
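
One way to read this identity measure is sketched below. It is an assumption-laden illustration: the function name `window_identity` and the input format (two gapped alignment strings of equal length, as one might extract from LALIGN output) are hypothetical, and identical columns are simply counted and divided by the fixed 1200 bp window, so that any part of the window without an alignment contributes 0.

```python
def window_identity(aligned_human, aligned_mouse, window_len=1200):
    """Fraction of the window covered by identical aligned nucleotides.

    aligned_human, aligned_mouse: gapped alignment rows of equal length
    ('-' marks a gap). Pass two empty strings when no alignment exists.
    """
    if len(aligned_human) != len(aligned_mouse):
        raise ValueError("alignment rows must have the same length")
    matches = sum(
        1
        for h, m in zip(aligned_human.upper(), aligned_mouse.upper())
        if h == m and h != "-"
    )
    # Dividing by the full window scores unaligned positions as 0.
    return matches / window_len
```

Under this reading, a promoter pair with a short but perfect local alignment still receives a low identity, because the denominator is always the whole -1000 to +200 region.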


Search for putative TF binding sites

Putative TF binding sites were searched by using the position weight matrices (PWMs) from TRANSFAC(a) Professional 7.1. Searching was done by Match [Kel et al., 2003], a weight matrix-based tool for searching putative transcription factor binding sites in DNA sequences. Match is closely integrated with, and distributed together with, the TRANSFAC database. Match applies two cut-offs for the score values of the matrix matches: core cut-off for the 5 core nucleotides and matrix similarity cut-off for the whole match. Match allows usage of different cut-offs for every matrix. We used several sets of cut-offs (so called matrix profiles) provided by TRANSFAC: 1) minFP, to minimize the false positive (over-prediction error) rate, 2) minSUM, to minimize the sum of both errors. For analyzing putative AP-1, NF-κB and NF-AT sites, the PWMs of V$AP1_01 for AP-1, V$NFKAPPAB_01 for NF-κB, and V$NFAT_Q6 for NF-AT in TRANSFAC were used with the core and matrix similarity cut-offs of (0.8, 0.93), (0.8, 0.92), (0.8, 0.97), respectively.
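
The idea of a two-cut-off matrix scan can be illustrated with the simplified sketch below. It is not the Match program and does not reproduce its scoring: the score here is a plain min-max normalization of summed matrix counts and ignores any per-position weighting that the real tool applies. The function names, the matrix representation (a list of per-position dictionaries) and the `core_start` argument are assumptions made for the example.

```python
def similarity(matrix, site):
    """Min-max normalized score of `site` against a count matrix.

    matrix: list of dicts, one per position, mapping A/C/G/T to a count.
    """
    current = sum(col[base] for col, base in zip(matrix, site))
    lowest = sum(min(col.values()) for col in matrix)
    highest = sum(max(col.values()) for col in matrix)
    if highest == lowest:
        return 0.0  # uninformative matrix; report 0 instead of dividing by 0
    return (current - lowest) / (highest - lowest)


def scan(sequence, matrix, core_start, core_cutoff, matrix_cutoff, core_len=5):
    """Return (offset, site, score) for windows passing both cut-offs."""
    hits = []
    width = len(matrix)
    core = matrix[core_start:core_start + core_len]
    for i in range(len(sequence) - width + 1):
        site = sequence[i:i + width]
        if any(base not in "ACGT" for base in site):
            continue  # skip windows with ambiguous bases
        # First filter: the core positions must score well on their own.
        if similarity(core, site[core_start:core_start + core_len]) < core_cutoff:
            continue
        # Second filter: the whole match must pass the matrix cut-off.
        score = similarity(matrix, site)
        if score >= matrix_cutoff:
            hits.append((i, site, round(score, 3)))
    return hits
```

Checking the core first rejects most windows cheaply before the full matrix is evaluated, which is the practical reason for keeping two separate thresholds.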


Availability of the Database

From the download site at DBTSS, major resources used for the database construction are available by FTP, including the flat files of the human/mouse one-pass sequences with Genbank accession numbers, retrieved promoter sequences and correlation tables of the promoters. The DBTSS and the data it displays are freely available for academic, nonprofit, and personal use.



Results

Collection and clustering of the human and mouse full-length cDNAs

In total, our database, DBTSS, now records 400,225 full-length cDNA sequences, including an additional 182,823 sequences compared to the previous version (Genbank accession numbers are BP192706-BP383670). This additional data should have improved not only the coverage of genes represented in DBTSS but also the overall reliability of the identified TSSs, since the probability should have greatly increased for a particular TSS being a correctly identified TSS when the redundancy of the supporting full-length cDNAs increased. These cDNAs are isolated from 137 kinds of full-length cDNA libraries, all of which are constructed using the "oligo-capping" method (further details on the library information including the completeness (whether they are full-length) of each of the libraries are presented at http://dbtss.hgc.jp/ in the "Statistics" section).

The "oligo-capped" cDNA sequences were first searched against RefSeqs (as of November 14, 2002) using BLAST [Altschul et al., 1990]. When hits were found, their 5'-ends were compared with those of the RefSeqs. When the RefSeq was truncated, its 5'-end sequence was complemented to obtain a putative representative full-length cDNA, which we refer to as a Ref-full. As summarized in Table 1, we could generate 9,270 Ref-fulls based on RefSeqs and our full-length cDNA sequences (for further details, see Materials and methods). In 6,042 Ref-fulls, the sequence was extended towards the 5'-end relative to the RefSeq, by 71.6 bp on average (see Table 2 and Figure 1A for the distribution of the differences). Some of the extended parts overlapped with the open reading frames (ORFs) and were thus also useful for revising the currently truncated N-terminal sequences of the deduced amino acid sequences in RefSeq. In some of the sequences, upstream ATGs and ORFs are embedded (for further discussion on this issue, refer to our recent papers [Suzuki et al., 2000; Yamashita et al., 2003]).

Table 1: Statistics of the collected promoter data.

                                              Human             Mouse
RefSeq and Ref-full
Ref-full (promoter retrieval successful)      8,793 (48%)       6,875 (53%)
Ref-full (total)                              9,270 (51%)       7,524 (58%)
    Ref-full that extended RefSeq             6,042 (33%)       5,018 (38%)
    Ref-full that did not extend RefSeq       3,228 (18%)       2,506 (19%)
RefSeq that are not covered by Ref-full       8,944 (49%)       5,557 (42%)
RefSeq (total)                                18,214 (100%)     13,081 (100%)

One-pass sequences and genome mapping
Hit to RefSeq (genome mapping successful)     190,964 (48%)     195,446 (34%)
Hit to RefSeq (genome mapping ambiguous)      36,267 (9%)       36,624 (6%)
No hit to RefSeq                              172,994 (43%)     348,139 (60%)
One-pass (total)                              400,225 (100%)    580,209 (100%)

Statistics of the number of promoters, the redundancies of the supporting full-length cDNAs and the differences between the public data are shown.


Table 2: Statistics of full-length cDNA sequences used for the retrieval.

                     Number of registered genes    Average length difference from RefSeq (bp)
                     (average redundancy)          mRNA level       Genomic level
Human                8,793 (21.7)                  71.6             4,396
Mouse                6,875 (28.4)                  76.0             4,027
Human/mouse pairs    3,324 (25.2/38.0)             63.3/68.8        3,998/3,380

Statistics of the full-length cDNAs used for the database construction are shown.


For the remaining 3,228 cDNAs, the 5'-ends of the Ref-fulls were almost consistent with the RefSeq 5'-ends; these were used to confirm that the RefSeqs had originally represented full-length cDNAs. In the present study, we excluded the cDNAs that did not correspond to RefSeqs, as the one-pass sequences without RefSeq support are singletons in many cases. Among them, a number of spurious cDNAs, such as cloning artifacts and other kinds of aberrant transcripts, might be included. Besides, our recent analyses suggested that sporadic transcription from non-genic regions is inherent in human and mouse genomes (Sakakibara et al., in preparation). Since we were concerned that incorporating this part of the data could make the dataset confusing, we did not include it in the current dataset.

As for the mouse genes, full-length cDNAs were obtained from the FANTOM database (http://fantom.gsc.riken.go.jp/) and processed by a similar procedure as that used for the human cDNAs. Starting from 580,209 one-pass sequences of the 5'-ends of the full-length cDNAs, 7,524 Ref-fulls were obtained of which 5,018 extended pre-existing RefSeq sequences by 76.0 bp on average (Figure 1A).



Figure 1: Comparison between Ref-fulls and RefSeqs. The distributions of the differences between Ref-fulls and RefSeqs are presented, when compared at the mRNA level (A) and genomic level (B). Black and gray bars represent the cases for human and mouse genes, respectively.


Retrieval of promoter sequences based on Ref-fulls

The one-pass sequences of 190,964 and 195,446 human and mouse cDNAs corresponding to 8,793 and 6,875 Ref-fulls were precisely mapped onto the human and mouse genomes, using strict criteria described in Materials and Methods. Exact positional information of the TSSs could be determined on each of the genomes (Table 1). The average redundancy, that is, the number of full-length cDNAs supporting TSS of each of the genes, was 21.7 and 28.4, respectively. Although 1,980 human TSSs and 691 mouse TSSs were determined by single full-length cDNA data (singletons), the others were supported by multiple full-length cDNA data (Figure 2). As the average frequency of the full-length cDNAs in the cDNA libraries is more than 80%, the probability should be low that the truncated erroneous full-length cDNAs happened to be mapped closely so as to lead to misidentification of the promoters.
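
The average redundancies quoted above follow directly from the counts in Table 1, as the short check below shows. The variable names are chosen for the example; the numbers are those reported in the text.

```python
# Mapped one-pass sequences and genes with a retrieved promoter (Table 1).
mapped = {"human": 190_964, "mouse": 195_446}
genes = {"human": 8_793, "mouse": 6_875}
singletons = {"human": 1_980, "mouse": 691}

# Average number of full-length cDNAs supporting each TSS.
redundancy = {sp: round(mapped[sp] / genes[sp], 1) for sp in mapped}

# Share of TSSs supported by only one cDNA, in percent.
singleton_pct = {sp: round(100 * singletons[sp] / genes[sp], 1) for sp in genes}

print(redundancy)     # {'human': 21.7, 'mouse': 28.4}
print(singleton_pct)  # {'human': 22.5, 'mouse': 10.1}
```

The singleton shares are derived here rather than stated in the text; they indicate that roughly four in five human TSSs and nine in ten mouse TSSs have support from more than one cDNA.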



Figure 2: Distribution of the numbers of mapped full-length cDNAs. The distribution of the numbers of mapped TSSs is shown for human and mouse genes by black and gray bars, respectively.


The average distances between the 5'-ends of the RefSeqs and the Ref-fulls calculated at the genomic level were 4,396 bp and 4,027 bp for human and mouse, respectively (Figure 1B). In this dataset, 62% and 56% of the mapped TSSs of the human and mouse genes are located in the CpG islands, respectively. As large introns were observed just downstream of the exact TSSs in many cases, the distances between the 5'-ends of RefSeqs and Ref-fulls were much greater than those calculated at the mRNA level. In these cases it was impossible to identify real promoters based solely on the RefSeqs, even if the differences calculated at the mRNA level were small.

Relating the promoters of human and mouse gene counterparts

In order to compare the retrieved human and mouse PPR sequences with each other, we wished to relate the human genes to the mouse gene counterparts. We started from RefSeqs and the RTPS dataset, the latter being the representative mouse set created in the FANTOM mouse full-length cDNA annotation meetings (for further details, see Okazaki et al., 2002). We compared their sequences both at the nucleotide and amino acid level so that all of the related gene pairs should be 1:1 reciprocal best hit homologs. In total, we could correlate 8,185 human and mouse genes.

Using the obtained relational table, we could define 3,324 human and mouse gene pairs among our PPR dataset, supported by 83,708 (redundancy: 25.2) human and 126,326 (redundancy: 38.0) mouse full-length cDNA data. Of these, 2,256 promoter pairs were supported by more than three full-length cDNA sequence data of both human and mouse (in total more than six cDNAs were mapped). The PPR pairs were aligned with each other using a sequence alignment program, LALIGN [Huang et al., 1992]. On average, the overall sequence conservation between the promoter pairs was 45%, when evaluated in the regions from -1000 to +200 (TSS was designated as 0) of the 2,256 dataset. The average length of the aligned upstream sequences was 510 bp. However, the size and patterns of the sequence alignment were quite different between promoters (for further details, refer to Suzuki et al. in preparation). Figure 3 represents the extent to which the sequences were conserved between PPR pairs.



Figure 3: Distribution of the sequence conservation between human and mouse promoters. Sequence alignments were performed using LALIGN with the default parameters. The sequence identity was evaluated as the number of aligned nucleotides in the regions of -1000 to +200 (TSS: 0).


Construction of DBTSS for comparative studies

All of the created data is made publicly available through our newly developed database, DBTSS. The schematic of the user interface is illustrated in Figure 4 and details of the database description are presented in Supplementary Information.



Figure 4: Schematic of the user interface of DBTSS. The boxes that are marked with asterisks (A~G*) correspond to the respective forms illustrated in Supplementary Figures S1 and S2.


It should be noted that this version of DBTSS has implemented the search for the PPRs by putative TF-binding sites that are conserved between human and mouse genes. For this search, arbitrary combinations/positions of the putative TF-binding sites can be set. For example, it is possible to search "TATA-plus PPRs containing NF-κB binding site(s) and either NF-AT site(s) or AP-1 site(s), all of which are conserved between human and mouse within 500 bp of the TSSs" (this combination of the TFs is frequently observed in the promoters responsible for inflammatory responses) [Baeuerle and Baltimore, 1996; Ho and Glimcher, 2002; Praz et al., 2002].

Practically, when the PPRs were searched using TRANSFAC [Matys et al., 2003] with strict parameters (minFP64.prf; see also Supplementary Information), a total of 183,712 and 170,926 hits were detected in human and mouse promoters, respectively. However, we were concerned that these matches might include many false positive hits. To decrease the number of false predictions, we used the comparative PPR data, on the assumption that, among the detected putative TF binding sites, the evolutionarily conserved ones are likely to have functional relevance. Consistently, confidential data elucidated that most functionally relevant TF binding sites are conserved throughout evolution (64-75%; Hannenhalli and Levy, 2002; Sauer and Wingender, in preparation). Using the promoter alignment data as a filter for selecting the conserved TF binding sites, DBTSS could pick up 22,794 putative TF binding sites in human promoters which are conserved between human and mouse. By doing this, it is possible to select the TF binding sites that should have first priority for experimental validation. Results of the search for representative TFs are presented in Supplementary Information Table.
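
A filter of this kind can be sketched as follows. The exact criterion used by DBTSS is not spelled out in the text, so the rule below is an assumption: a human site is called conserved when its start position falls in an aligned block and the mouse promoter carries a site for the same matrix at the corresponding aligned position. The function name, the `(matrix_id, start)` tuples, the `human_to_mouse` position map and the `tolerance` parameter are all hypothetical.

```python
def conserved_sites(human_sites, mouse_sites, human_to_mouse, tolerance=0):
    """Keep human sites that have a same-matrix mouse site at the aligned position.

    human_sites, mouse_sites: lists of (matrix_id, start) tuples.
    human_to_mouse: dict mapping aligned human positions to mouse positions.
    tolerance: allowed offset (bp) between expected and observed mouse start.
    """
    mouse_index = {}
    for matrix_id, start in mouse_sites:
        mouse_index.setdefault(matrix_id, set()).add(start)

    kept = []
    for matrix_id, start in human_sites:
        partner = human_to_mouse.get(start)
        if partner is None:
            continue  # the site lies outside any aligned block
        candidates = mouse_index.get(matrix_id, ())
        if any(abs(partner - m) <= tolerance for m in candidates):
            kept.append((matrix_id, start))
    return kept
```

Because the filter requires both an alignment and a matching prediction in the second species, it removes sites in unaligned sequence as well as sites that are aligned but predicted in only one genome.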

We temporarily focused on evolutionarily conserved TF binding sites. Actually, we observed that about 85% of the predicted conserved-TF binding sites are located in the conserved regions of the promoters. However, this does not imply that non-conserved predicted TF binding sites always should have no functional relevance. Some of the TF binding sites which are not conserved between human and mouse might play roles in a species-specific manner. This always should be kept in mind whenever this kind of search is attempted.

The so-called "phylogenetic footprinting" approach is the most powerful when the combination(s) of the TF binding sites is taken into account as well. For example, when promoters containing putative binding sites of NF-κB were searched using the standard cut-offs (for further details see Materials and methods), 1,491 and 983 sites were detected in the human and mouse promoters, respectively. However, when the hits were restricted to the conserved ones, the number of hits decreased to 36. When a similar search was performed for promoters containing putative NF-AT or AP-1 binding sites, the numbers of hits were 7,368, 5,545 and 652 for human, mouse and conserved, respectively. When searching for promoters containing both of the conserved NF-κB and NF-AT/AP-1, the number of hits was 22. These should be primary targets for initiating the experimental characterization of promoters as to whether they really respond to an inflammatory stimulus [Kel et al., 2003].
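
The combined query (conserved NF-κB together with conserved NF-AT or AP-1) amounts to a simple set test per promoter, sketched below. The function name, the factor labels and the input format (a list of `(factor, position)` tuples of conserved sites, with positions relative to the TSS) are assumptions for illustration; the 500 bp window is the one used in the example query described for the DBTSS interface.

```python
def matches_inflammatory_query(conserved, window=500):
    """True if a promoter has NF-kappaB plus NF-AT or AP-1 near the TSS.

    conserved: list of (factor, position) tuples for conserved sites,
    with position given relative to the TSS (TSS = 0).
    """
    nearby = {factor for factor, pos in conserved if abs(pos) <= window}
    return "NF-kappaB" in nearby and bool(nearby & {"NF-AT", "AP-1"})


def select_promoters(promoters, window=500):
    """Return the sorted gene names whose conserved sites satisfy the query.

    promoters: dict mapping a gene name to its list of conserved sites.
    """
    return sorted(
        gene
        for gene, sites in promoters.items()
        if matches_inflammatory_query(sites, window)
    )
```

Requiring the combination rather than a single factor is what reduces the candidate list so sharply in the counts reported above.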



Discussion

In this paper, we described the large-scale collection and initial comprehensive comparative analyses of promoters of human and mouse genes. The dataset generated in this study and the newly developed database are unique in being based on experimentally identified TSSs. Although there are a number of databases which enable genome-wide comparison between human and mouse genes, such as HGB at UCSC, Ensembl at EBI, Map Viewer at NCBI [Clamp et al., 2003; Karolchik et al., 2003; Wheeler et al., 2003], they are mainly focused on the global alignments of the genomes, and are intended for finding exonic regions rather than for the characterization of promoters. To our knowledge, rVISTA and GALA are rare exceptions, mainly focusing on promoter comparison [Loots et al., 2002; Giardine et al., 2003; Ureta-Vidal et al., 2003]. However, in all of these pre-existing databases, most of the "5'-flanking regions" are not defined by experimentally determined TSSs; therefore, it has been difficult to distinguish which part should correspond to exons and which should be regarded as promoters, even if conserved regions were identified. Indeed, previous observations reported that the average sequence identity of the "upstream regions" of human and mouse genes was approximately 70-75% [Waterston et al., 2002], which is apparently higher than our calculation (45%; Figure 3). This may be because they used the upstream 200 bp regions. The degree of sequence identity might be lower in more upstream regions. Consistently, a previous report indicated that the frequency of alignable sequences decreases relatively rapidly in the upstream regions [Jareborg et al., 1999]. Since we used the entire -1000 bp to +200 bp regions in the present study, the calculated sequence identity might be lower than the previous result. Further extensive analyses of the sequence alignments generated from various global/local alignment programs should reveal how the sequences in the upstream regions of the TSSs are conserved between human and mouse.

Taking advantage of the large-scale collection of the full-length cDNAs, we could focus on the limited regions of the genomic sequences for the analysis of promoters. Also, we could take into account the positions of the predicted TF sites relative to the TSS in the search for and analysis of putative TF-binding sites. Recent reports described that TF sites predicted kilobases away from the TSS are less likely to have biological consequences [Liu et al., 2003]. In order to expedite the experimental analyses of the promoters by minimizing the false positives, the target regions that should be used for the primary searches have to be defined for each of the TFs. By implementing this feature, DBTSS should be the first database that makes full use of the promoter data for the practical requirements of experimental biologists.

In this work we did not consider the distant regions from the TSSs, more than 1 kb upstream and 0.2 kb downstream, in order to maintain the fidelity of the search results. Therefore, the current dataset does not cover the TF binding sites located very far from the TSSs. However, these are the regions where the sequence conservations were most significant throughout the neighboring 10 kb (data not shown) and where the transcriptional initiation events actually take place. Thus, it should be important to start the characterization of the promoters by investigating the nature of these regions.

Genome sequencing and full-length cDNA sequencing projects are underway for various kinds of model organisms, such as chimpanzee, macaque and zebrafish, as well as many microbes (http://www.nih.gov/science/models/). The progress of these projects should shortly accumulate genomic sequences and a large number of full-length cDNA data, from which promoter sequences could be retrieved and analyzed in a manner similar to that described here. Also, very recently, new technologies named CAGE and the 5'TSS library were developed. Using these technologies, accumulation of the TSS data at even higher throughput will be possible without degrading the data quality [Shiroki et al., 2003; Hashimoto et al., 2004]. These data should be presented in DBTSS, which will enable more accurate and versatile analyses of the promoters. Comprehensive analyses of the conservation/divergence of the promoters between human and monkeys, mouse, fish, flies, worms and other model organisms should identify which populations of promoters and what kinds of promoter elements therein play roles in modulating the transcriptional network for each of the organisms. These analyses should clarify what features of the transcriptional network of human genes allow human cells to function as the cells of a human, a primate, a mammal, a multi-cellular organism and so on. Our data resource, together with the newly developed database, DBTSS, should for the first time lay a firm foundation for this goal, as well as providing an invaluable platform for genome-wide comparative analysis of promoters. From this, the achievements of the genome projects, which would otherwise be no more than a meaningless DNA sequence, should truly come alive.



Acknowledgements

We are grateful to H. Hata and members of the HGC sequencing team for their excellent sequencing work. We are thankful to E. Nakajima and C. Gough for helpful discussion and critical reading of the manuscript. This study was supported by a Grant-in-Aid for Scientific Research on Priority Areas and by special coordination funds for promoting science and technology (SCF), both from the Ministry of Education, Culture, Sports, Science and Technology in Japan. This study was also supported by Research Grant for the RIKEN Genome Exploration Research Project from the Ministry of Education, Culture, Sports, Science and Technology of the Japanese Government to Y.H.




References




Footnotes:
a TRANSFAC is a registered trademark, Match is a trademark of BIOBASE GmbH, Germany