Large-scale collection and characterization of promoters of human and mouse genes
Yutaka Suzuki1*, Riu Yamashita1, Matsuyuki Shirota1, Yuta Sakakibara1,2, Joe Chiba2, Junko Mizushima-Sugano1, Alexander E. Kel3, Takahiro Arakawa4, Piero Carninci4,5, Jun Kawai4,5, Yoshihide Hayashizaki4,5, Toshihisa Takagi1, Kenta Nakai1 and Sumio Sugano1
1 Human Genome Center, The Institute of Medical Science, The University of Tokyo: 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan;
We report the generation and initial characterization of a large-scale collection of sequences of putative promoter regions (PPRs) of human and mouse genes. Using our collections of 400,225 human and 580,209 mouse full-length cDNAs, we determined exact transcriptional start sites (TSSs). From the positional information of these TSSs, we retrieved the adjacent sequences as PPRs for 8,793 human and 6,875 mouse genes. On average, the PPRs lay 4 kb upstream of the previously reported 5'-ends of cDNAs, demonstrating that full-length cDNA information is indispensable for this purpose. Among the PPRs supported by experimentally validated TSSs, 3,324 could be paired as mutually homologous genes between human and mouse and were used for comprehensive comparative studies. Sequence identity in the regions proximal to the TSSs was 45% on average, and we identified 22,794 putative transcription factor binding sites conserved between human and mouse. The data resource created in this work, together with the initial characterization of the sequences, should lay a firm foundation for deciphering the transcriptional modulation of human genes. All data have been deposited and made available through DBTSS, a database for comparative studies.
Key words: full-length cDNA, promoter, comparative genomics, transcriptional start sites
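As an illustration of how a PPR can be retrieved once a TSS position is known, the following minimal sketch extracts a strand-aware window around a TSS from a genomic sequence. The function names (`revcomp`, `extract_ppr`), the 0-based coordinate convention, and the default window of 1,000 bases upstream and 200 bases downstream are assumptions made for this example only; they are not taken from the procedure summarized above.

```python
# Hypothetical sketch: retrieve the sequence adjacent to a TSS as a PPR.
# The window sizes and coordinate convention are illustrative assumptions.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_ppr(chrom_seq, tss, strand, upstream=1000, downstream=200):
    """Return the window around a TSS, oriented 5'->3' on the gene's strand.

    chrom_seq  -- forward-strand sequence of the chromosome or contig
    tss        -- 0-based forward-strand index of the first transcribed base
    strand     -- "+" or "-"
    upstream   -- bases to take 5' of the TSS (assumed default)
    downstream -- bases to take from the TSS onward, TSS included (assumed default)

    The window is clipped at the ends of chrom_seq.
    """
    if strand == "+":
        start = max(0, tss - upstream)
        end = min(len(chrom_seq), tss + downstream)
        return chrom_seq[start:end]
    if strand == "-":
        # On the minus strand, "upstream" lies at higher forward coordinates.
        start = max(0, tss - downstream + 1)
        end = min(len(chrom_seq), tss + upstream + 1)
        return revcomp(chrom_seq[start:end])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_genome = "AAAACCCCGGGGTTTT"
    print(extract_ppr(toy_genome, 8, "+", upstream=4, downstream=2))
    print(extract_ppr(toy_genome, 7, "-", upstream=4, downstream=2))
```

When the window is not clipped, the TSS base sits at index `upstream` of the returned string on either strand, which keeps human and mouse PPRs in a common frame for later alignment.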