| In Silico Biology 6, 0015 (2006); ©2006, Bioinformation Systems e.V. |
1 Distributed Information Sub-centre
2 Interdisciplinary Biotechnology Unit Aligarh Muslim University, Aligarh 202002, India
3 International Center for Genetic Engineering and Biotechnology, Aruna Asaf Ali Marg, New Delhi 110 067, India
* Corresponding author
Email: huzzi99@hotmail.com
Phone: +91-571-2723088; Fax: +91-571-2721776
Edited by E. Wingender; received January 02, 2006; revised February 25 and March 03, 2006; accepted March 03, 2006; published June 17, 2006
Avian influenza (bird flu) is an infectious disease of birds, ranging from a mild to a severe form of illness. Influenza viruses pose significant challenges to both human and animal health. The proteins nucleoprotein (NP), neuraminidase (NA) and hemagglutinin (HA) of influenza A virus (bird flu virus) sub-type A/Hatay/2004/(H5N1) from chicken were selected for this study. Our in silico analysis predicted that HA of influenza A virus is highly sensitive to mutations and may therefore be significant for its pathogenic nature. None of the mutations fell within a predicted functional site except K332R in NA, which lies at a PKC phosphorylation site. Sequence comparison showed that the maximum number of mutations was observed in HA. These mutations are significant because they change polarity or hydrophobicity as well as the propensity of each amino acid residue to stabilize the secondary structure. The program MAPMUTATION can be used to monitor mutations and to predict the trend of mutations.
Keywords: nucleoprotein, neuraminidase, hemagglutinin, mutation, protein kinase C phosphorylation
Influenza is unusual among viral diseases in that it may repeatedly infect the same individual in spite of the immunity conferred by each infection. This is due to the high mutation rate of the influenza virus genome, which allows the virus to escape the immune response of the host population [Tria et al., 2005]. Influenza viruses pose significant challenges to both human and animal health. "Avian influenza" or "bird flu" is an infectious disease of birds, ranging from a mild to a severe form of illness. Fifteen subtypes of the influenza A virus are found in nature. Viruses of low pathogenicity can become highly pathogenic while circulating in poultry populations because of the high rate of mutation [Padhi et al., 2004].
Influenza virus belongs to the viral family Orthomyxoviridae and has a segmented, negative-sense, single-stranded RNA genome in an enveloped virion [Smith et al., 1933]. Influenza A viruses cause epidemics and pandemics of influenza in mammals and birds, while aquatic birds are known to be the natural reservoir of these viruses [Slemons et al., 1974; Webster et al., 1978; Hinshaw et al., 1980]. The viral particles are highly pleomorphic: most are spherical or ovoid, 80-120 nm in diameter, but many forms occur, including long filamentous particles (up to 2000 nm long × 80-120 nm diameter). The outer surface of the particle consists of a lipid envelope from which project prominent glycoprotein spikes of two types: hemagglutinin (HA), a 135 Å trimer, and neuraminidase (NA), a 60 Å tetramer. The inner side of the envelope is lined by the matrix protein (Microbiology @ Leicester website, http://www-micro.msb.le.ac.uk/3035/Orthomyxoviruses.html#ortho).
The H5N1 sub-type is the only highly pathogenic avian viral sub-type that has been documented to cause an outbreak of respiratory disease in humans. An earlier study on humans exposed to chickens infected with an H5N2 virus failed to find any evidence of human infection [Bean et al., 1985]. The pathogenicity of avian H5N1 influenza viruses to mammals has been evolving since the mid-1980s [Chen et al., 2004]. During the past 6 years, infection of humans with avian influenza viruses of three subtypes (H5, H7, and H9) has been detected on multiple occasions [Subbarao and Katz, 2000; Webby and Webster, 2003]. In 1997, H5N1 avian influenza viruses transmitted from birds to humans in Hong Kong caused the deaths of 6 of 18 infected persons [Claas et al., 1998; Subbarao et al., 1998]. The virus was eradicated by the slaughter of all poultry in Hong Kong, but new genotypes of H5N1 virus continued to emerge in the poultry of Hong Kong in 2000 and 2001 [Guan et al., 2002; Webster et al., 2002]. Moreover, in 2003, an antigenically and biologically novel H5N1 sub-type of influenza virus killed one of two infected humans [Sturm-Ramirez et al., 2004]. This outbreak was unique in the sense that the virus transmitted to humans is lethal in chickens [Lu et al., 1999; Katz et al., 2000]. An outbreak of highly pathogenic avian influenza A (H5N1) has recently spread to poultry in 9 Asian countries. H5N1 infections caused more than 52 human deaths in Vietnam, Thailand, and Cambodia from January 2004 to April 2005 [The World Health Organization Global Influenza Program Surveillance Network, 2005]. There is no evidence of human-to-human transmission to date [MacKenzie, 2004]. Fortunately, these viruses lack the ability to 'hop' easily between people. However, in the future this strain might acquire the ability to spread among humans, either by mutation or by recombination of genetic material with a human influenza virus [Pearson and Cyranoski, 2004].
In view of these outbreaks, which occurred due to mutations or changes in the genome of influenza virus, we initiated an in silico study of the three genes NP, NA and HA to analyze amino acid replacements at different positions compared to other strains of the same sub-type. The great genetic variability of influenza A virus leads to difficulties in the diagnosis, treatment, and prevention of influenza in humans. It is therefore important to analyze these proteins for mutations and to carry out a phylogenetic comparison with other strains of influenza virus.
The genes for nucleoprotein, neuraminidase and hemagglutinin of influenza A virus (bird flu virus) sub-type A/Hatay/2004/(H5N1) from chicken were analyzed. These protein sequences are available at NCBI (http://www.ncbi.nlm.nih.gov/) with accession numbers [GenBank: CAI29280] for nucleoprotein (NP), [GenBank: CAI29279] for neuraminidase (NA) and [GenBank: CAI29278] for hemagglutinin (HA).
Protein-protein BLAST (Blast NCBI Server, http://www.ncbi.nlm.nih.gov/BLAST/) was performed at the NCBI server, and only sequences of sub-type H5N1 with 99 - 100% similarity were selected for the analysis of the respective genes (nucleoprotein, neuraminidase and hemagglutinin). The sequences of the following subtypes were selected for each protein in this study.
These sequences were then aligned using the ClustalW (1.83) (http://align.genome.jp/) multiple alignment tool available at the GenomeNet service of the Kyoto University Bioinformatics Center, using the BLOSUM weight matrix for proteins. The alignments were then analyzed for amino acid differences at specific positions. A Perl program, MAPMUTATION, was developed to point out mutations in different strains [Anwar and Khan, 2006]. Hydrophobicity values were obtained from the tool ProtScale (http://www.expasy.org/tools/protscale.html) at ExPASy, choosing the Kyte & Doolittle hydrophobicity scale. Phylogenetic trees were generated using the N-J Tree method at the GenomeNet service of the Kyoto University Bioinformatics Center. Secondary structures of the proteins were predicted using the secondary structure prediction program NNPREDICT (http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html). Domains or motifs in all three protein sequences of A/Hatay/2004/(H5N1) were searched using ScanProsite at the ExPASy server (http://ca.expasy.org/tools/scanprosite/). Motifs with a high probability of occurrence were also included in the search.
The analyses described in this communication were intended to characterize the mutations in the three proteins NP, NA and HA of influenza A virus from chicken.
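As an illustration of how a windowed Kyte & Doolittle hydrophobicity value can be computed for each residue position, the following Python sketch averages the scale over a sliding window. It is not the ProtScale implementation. The window length of 9 with uniform weights is an assumption (the settings used are not stated in the text), and the function name `windowed_hydropathy` is hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(sequence, window=9):
    """Return {1-based position: mean hydropathy of the window centred there}.

    Assumption: odd window length, uniform weights. Positions too close
    to either end to hold a full window are omitted.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a centre residue")
    half = window // 2
    scores = {}
    for centre in range(half, len(sequence) - half):
        segment = sequence[centre - half:centre + half + 1]
        total = sum(KYTE_DOOLITTLE[residue] for residue in segment)
        scores[centre + 1] = total / window
    return scores
```

Calling the function on a wild-type and a mutant sequence and comparing the scores at the mutated position gives a before/after pair of the kind listed in Table 1.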
Sequence analysis
On comparing our NP sequence, after multiple alignment, with the selected sequences of the same subtype having 99-100% similarity, we found that the serine at position 3 was replaced by proline (S3P) in A/Hatay/2004/(H5N1). Proline is an uncharged amino acid, while serine is a hydrophilic, polar amino acid. Arrese and Portela, 1996, demonstrated that serine 3 is essential for phosphorylation at the N-terminal end of the NP molecule in influenza virus A/Victoria/3/75. At position 320, glutamic acid, a hydrophilic, acidic amino acid, was replaced by hydrophobic glycine. Being the smallest amino acid, glycine fits into tight places inside a folded protein; moreover, glutamic acid helps in the formation of α-helix, whereas glycine disrupts α-helix formation. At position 482, asparagine was replaced by serine; both are hydrophilic, polar amino acids (Tab. 1). McCullers et al., 2005, demonstrated that a mutation in the C-terminal domain of the M1 protein, a change from asparagine to serine at position 221, improved growth and virulence of B/Yamanashi/166/98 virus in mice, demonstrating that the serine 221 mutation was responsible for this deadly trait.
| Table 1: | Amino acid mutations in nucleoprotein, neuraminidase and hemagglutinin at specific positions and their properties. |
| Position | Substitution in Hatay/2004 | Change in properties | Sec. structure | Hydrophobicity (Kyte & Doolittle) |
| Nucleoprotein | | | | |
| 3 | S → P | Hydrophilic, Polar → Uncharged | Coil | -0.222 → -0.311 |
| 320 | E → G | Hydrophilic, Acidic → Hydrophobic | Coil | -1.678 → -1.333 |
| 482 | N → S | Hydrophilic, Polar → Hydrophilic, Polar | Coil | -1.256 → -0.956 |
| Neuraminidase | | | | |
| 145 | E → V | Hydrophilic, Acidic → Hydrophobic | Coil | -0.111 → 0.744 |
| 332 | K → R | Hydrophilic, Basic → Hydrophilic, Basic | Strand | -0.567 → -0.633 |
| Hemagglutinin | | | | |
| 8 | F → L | Hydrophobic → Hydrophobic | Helix | 3.2 → 3.311 |
| 59 | D → A | Hydrophilic, Acidic → Hydrophobic | Coil | 0.289 → 0.878 |
| 204 | T → I | Hydrophilic, Polar → Hydrophobic | Helix | -1.000 → -0.422 |
| 272 | I → L | Hydrophobic → Hydrophobic | Helix | -0.467 → -0.544 |
| 500 | N → S | Hydrophilic, Polar → Hydrophilic, Polar | Coil | -1.556 → -1.256 |
In NA, glutamic acid (hydrophilic) was replaced by valine (hydrophobic) at position 145, and lysine (hydrophilic, basic) was replaced by arginine (hydrophilic, basic) at position 332 (Tab. 1).
In HA, mutations were found at position 8, where phenylalanine (hydrophobic) was replaced by leucine (hydrophobic); at position 59, where aspartic acid (hydrophilic, acidic) was replaced by alanine (hydrophobic); at position 204, where threonine (hydrophilic, polar) was replaced by isoleucine (hydrophobic); at position 272, where isoleucine (hydrophobic) was replaced by leucine (hydrophobic); and at position 500, where asparagine (hydrophilic, polar) was replaced by serine (hydrophilic, polar) (Tab. 1). The mutations at positions 59 and 204 may be important because hydrophilic amino acids are replaced by hydrophobic ones, which may help the protein to attain a more stable conformation, whereas at positions 8, 272 and 500 the substitution is by an amino acid of the same type.
The program MAPMUTATION reports all the mutations along with their specific positions in the sequence. Its output was checked against the mutations identified manually after multiple alignment; the results were the same in both cases, confirming that MAPMUTATION produced accurate results for these sequences. The program should be helpful to bird flu researchers interested in determining the rate of mutation in influenza viruses, as the virus continuously undergoes antigenic shift and antigenic drift.
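The kind of comparison described above can be sketched in a few lines. The Python code below is not the MAPMUTATION program (which is written in Perl and whose source is not reproduced here); it is a minimal illustration, under the assumption that two sequences have already been aligned to equal length with `-` as the gap character. The function name `map_mutations` and the toy sequences are hypothetical.

```python
def map_mutations(reference, query):
    """List substitutions between two pre-aligned sequences.

    Each substitution is reported as <reference residue><position><query
    residue>, e.g. "S3P", with positions counted along the ungapped
    reference. Columns containing a gap in either sequence are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("aligned sequences must have the same length")
    mutations = []
    position = 0  # 1-based residue index in the ungapped reference
    for ref_residue, query_residue in zip(reference, query):
        if ref_residue != "-":
            position += 1
        if ref_residue == "-" or query_residue == "-":
            continue  # insertion or deletion, not a substitution
        if ref_residue != query_residue:
            mutations.append(f"{ref_residue}{position}{query_residue}")
    return mutations


# Toy example (hypothetical sequences, not the real NP alignment)
print(map_mutations("MASQGT", "MAPQGT"))  # ['S3P']
```

Numbering along the reference keeps the reported positions stable when the query carries insertions; a different convention would be needed if positions are to be counted along the query strain instead.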
Secondary Structure Prediction
Analysis of the predicted structure of NP showed no significant difference in the predicted regions among the sequences, except that in strain HK/213/03 the region from 425 to 429 was predicted to be an extended strand, whereas in all the other sequences the region 422 - 426 is a helix and amino acids 427 - 428 show extended strand propensity. In NA, the region 233 - 240 is helical in all the strains except A/Dk/Vietnam/11/2004(H5N1), in which the helix is present at 234 - 239. Secondary structure prediction of HA shows that the region 267 - 273 is a helix in A/Hatay/2004/(H5N1), while in the other two strains the region 267 - 272 was detected as helix. Furthermore, position 273 shows strand propensity. The results of secondary structure prediction are given at http://www.geocities.com/amubioinfo/InfluenzaAVirus.htm.
Domain/motif search
The domains found in the three proteins are given in Tables 2, 3 and 4. None of the mutations in NP fell within these functional domains (Tab. 2), while in NA, K332R was found at a predicted protein kinase C (PKC) phosphorylation site at positions 330 to 332 (SfR) (Prosite Documentation, http://ca.expasy.org/cgi-bin/nicedoc.pl?PDOC00005). Since the signature for PKC phosphorylation sites is [S or T] - X - [R or K], the mutation from K to R does not alter the match to this signature and is therefore not expected to affect PKC phosphorylation at this site.
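The signature argument can be checked with a small pattern scan. The Python sketch below expresses [S or T] - X - [R or K] as a regular expression and reports every match with 1-based coordinates. It only illustrates the pattern logic and is not the ScanProsite tool; the function name `find_pkc_sites` and the short test peptides are hypothetical.

```python
import re

# [ST]-x-[RK] wrapped in a lookahead so that overlapping sites are all found.
PKC_SITE = re.compile(r"(?=([ST].[RK]))")


def find_pkc_sites(sequence):
    """Return (start, end, motif) for each [ST]-x-[RK] match, 1-based inclusive."""
    return [
        (match.start() + 1, match.start() + 3, match.group(1))
        for match in PKC_SITE.finditer(sequence)
    ]


# Toy peptides (hypothetical): a K-to-R change at the third motif position
print(find_pkc_sites("GGSFKGG"))  # [(3, 5, 'SFK')]
print(find_pkc_sites("GGSFRGG"))  # [(3, 5, 'SFR')]
```

Both peptides match at the same coordinates, which mirrors the reasoning above: a K-to-R substitution at the last position leaves the site within the signature.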
| Table 2: | Domains/motifs in nucleoprotein representing the site name, its position on the sequence and the sequence of the site. |
| Nucleoprotein | | |
| Site | Position | Sequence |
| N-myristoylation site | 5 - 10 | GTkrSY |
| | 177 - 182 | GAagAA |
| | 282 - 287 | GLavAS |
| | 356 - 361 | GQlsTR |
| | 362 - 367 | GVqiAS |
| | 393 - 398 | GGntNQ |
| | 431 - 436 | GNteGR |
| Protein kinase C phosphorylation site | 6 - 8 | TkR |
| | 359 - 361 | StR |
| | 450 - 452 | SaR |
| Casein kinase II phosphorylation site | 15 - 18 | TggE |
| | 50 - 53 | SdyE |
| | 69 - 72 | SafD |
| | 157 - 160 | TgmD |
| | 287 - 290 | SgyD |
| N-glycosylation site | 21 - 24 | NATE |
| Tyrosine kinase phosphorylation site | 103 - 111 | KwvrElilY |
| Amidation site | 211 - 214 | nGRR |
| Tyrosine sulfation site | 282 - 296 | glavasgYdferegy |
| Table 3: | Domains/motifs in neuraminidase representing the site name, its position on the sequence and the sequence of the site. |
| Neuraminidase | | |
| Site | Position | Sequence |
| N-myristoylation site | 27 - 32 | GNmiSI |
| | 117 - 122 | GAllND |
| | 216 - 221 | GScfTV |
| | 311 - 316 | GTgsCG |
| | 336 - 341 | GVwiGR |
| | 420 - 425 | GSsiSF |
| Protein kinase C phosphorylation site | 56 - 58 | TeK |
| | 62 - 64 | SvK |
| | 85 - 87 | SiR |
| | 128 - 130 | TvK |
| | 195 - 197 | TiK |
| | 198 - 200 | SwR |
| | 232 - 234 | ShK |
| | 279 - 281 | SnR |
| | 330 - 332 | SfR |
| | 368 - 370 | SvK |
| N-glycosylation site | 68 - 71 | NSSL |
| | 126 - 129 | NGTV |
| | 215 - 218 | NGSC |
| Casein kinase II phosphorylation site | 90 - 93 | SkgD |
| | 105 - 108 | ShlE |
| | 128 - 131 | TvkD |
| | 152 - 155 | SrfE |
| | 176 - 179 | SgpD |
| | 349 - 352 | SgfE |
| | 361 - 364 | TetD |
| | 393 - 396 | TglD |
| | 436 - 439 | SwpD |
| Tyrosine sulfation site | 255 - 269 | hyeecscYpdageit |
In HA, again, no mutation was found in any functional domain (Tables 1 and 4).
| Table 4: | Domains/motifs in hemagglutinin representing the site name, its position on the sequence and the sequence of the site. |
| Hemagglutinin | | |
| Site | Position | Sequence |
| N-glycosylation site | 26 - 29 | NNST |
| | 27 - 30 | NSTE |
| | 39 - 42 | NVTV |
| | 170 - 173 | NSTY |
| | 181 - 184 | NNTN |
| | 302 - 305 | NSSM |
| | 559 - 562 | NGSL |
| Casein kinase II phosphorylation site | 34 - 37 | TimE |
| | 139 - 142 | SshE |
| | 183 - 186 | TnqE |
| | 283 - 286 | SelE |
| | 314 - 317 | TigE |
| | 400 - 403 | SiiD |
| | 407 - 410 | TqfE |
| N-myristoylation site | 79 - 84 | GNpmCD |
| | 146 - 151 | GVssAC |
| | 288 - 293 | GNcnTK |
| | 299 - 304 | GAinSS |
| | 347 - 352 | GLfgAI |
| | 350 - 355 | GAiaGF |
| | 358 - 363 | GGwqGM |
| | 362 - 367 | GMvdGW |
| | 377 - 382 | GSgyAA |
| | 560 - 565 | GSlqCR |
| ATP/GTP-binding site motif A (P-loop) | 150 - 157 | AcpyqGKS |
| cAMP- and cGMP-dependent protein kinase phosphorylation site | 168 - 171 | KKnS |
| Protein kinase C phosphorylation site | 175 - 177 | TiK |
| | 239 - 241 | SgR |
| | 324 - 326 | SnR |
| | 387 - 389 | TqK |
| | 395 - 397 | TnK |
| | 497 - 499 | SvR |
| Tyrosine sulfation site | 501 - 515 | gtydypqYseearlk |
Sequence comparison shows that the maximum number of mutations was observed in HA (5 sites), followed by NP (3 sites) and NA (2 sites) (Fig. 2). Thus, among the three proteins examined, HA of influenza A virus H5N1 appears to be the most sensitive to mutations. These mutations involve changes in polarity or hydrophobicity. Furthermore, most mutations significantly alter not only the polarity or hydrophobicity (Tab. 1) but also the propensity of each amino acid residue to stabilize the secondary structure [Prösch et al., 1990]. In HA, the conformation at positions 57 and 58 changed from coil to strand and helix, respectively, and at position 205 the conformation changed from coil to helix. These changes in structural conformation may be due to a change in the sharing of side chains caused by the mutations at positions 59 (D → A) and 204 (T → I). All the important domains in the three sequences were tracked (Tables 2, 3 and 4). The mutation K332R was found at a PKC phosphorylation site in NA; according to the in silico analysis it does not affect phosphorylation, but if a further mutation occurs at this position in the future, the virus may become lethal to humans.
Although we have not yet identified any mutation that may lead to an outbreak of bird flu, we can in principle monitor the mutations over time and predict the trend of mutations. Further mutational analysis will have to be carried out to map the specific amino acid changes in a protein that cause the high pathogenicity of the virus.
Our study revealed that HA is most prone to mutations (5 vs. 3 in NP and 2 in NA); we therefore conclude that HA might be an important protein involved in the pathogenesis of influenza A virus. We also found that these mutations involve changes in polarity or hydrophobicity. Furthermore, most mutations significantly alter not only the polarity or hydrophobicity but also the propensity of each amino acid residue to stabilize the secondary structure. The mutation from K to R at the PKC phosphorylation site in NA does not make any difference, but if another mutation occurs at this position it might be fatal. The secondary structure prediction file for all three proteins is available at http://www.geocities.com/amubioinfo/InfluenzaAVirus.htm. The program MAPMUTATION can be used to identify mutations in new strains.
The authors are grateful to Prof. M Saleemuddin for providing facilities to carry out this work and for his moral support throughout this project. We also thank the staff of the Distributed Information Sub-centre for their technical help. The Department of Biotechnology, Ministry of Science and Technology, Government of India, is acknowledged for financial support.
Additional file 1 - Secondary structure prediction results file at http://www.geocities.com/amubioinfo/InfluenzaAVirus.htm.