In Silico Biology 9, 0022 (2009); ©2009, Bioinformation Systems e.V.  


HIV1 V3 loop hypermutability is enhanced by the guanine usage bias in the part of env gene coding for it


Vladislav Victorovich Khrustalev




Department of General Chemistry, Belarussian State Medical University
83 Dzerzinskogo Prospect, Minsk 220000, Belarus


   Email: vvkhrustalev@mail.ru
   7-24 Communisticheskaya street, Minsk 220029, Belarus
   Phone/Fax: 80172845957





Edited by E. Wingender; received December 11, 2008; revised March 03 and May 05, 2009; accepted May 21, 2009; published August 02, 2009



Abstract

Guanine is the most mutable nucleotide in HIV genes because of frequent G to A transitions, which result from cytosine deamination in the viral minus-strand DNA catalyzed by APOBEC enzymes. The distribution of guanine among the three codon positions should therefore influence the probability that a G to A mutation is nonsynonymous (i.e. occurs in the first or second codon position). We found that the env gene regions coding for the third variable regions (V3 loops) of gp120 from HIV1 and HIV2 have different kinds of guanine usage bias. In the HIV1 reference strain and 100 additionally analyzed HIV1 strains, the guanine usage bias in V3 loop coding regions (2G>1G>>3G) should lead to elevated rates of nonsynonymous G to A transitions. In the HIV2 reference strain and 100 other HIV2 strains, the guanine usage bias in V3 loop coding regions (3G>2G>1G) should protect V3 loops from hypermutability. According to the HIV1 and HIV2 V3 alignment, a sequence enriched with 2G (21 codons in length) was inserted during the evolution of the HIV1 predecessor, while a different sequence enriched with 3G (19 codons in length) was inserted during the evolution of the HIV2 predecessor. The higher the level of 3G in the V3 coding region, the lower the rate of immune-escaping mutations should be. This hypothesis was tested by comparing guanine usage in V3 loop coding regions from HIV1 fast and slow progressors. All calculations were performed with our algorithms "VVK In length", "VVK Dinucleotides" and "VVK Consensus" (www.barkovsky.hotmail.ru).

Keywords: HIV1, HIV2, APOBEC, mutational pressure, gp120, V3 loop, guanine usage, fast AIDS progressors, slow AIDS progressors, codon usage bias, immune escaping



Introduction

The third variable region of the human immunodeficiency virus (HIV) envelope glycoprotein gp120 is known to vary greatly between viral strains as well as within each host, despite its functional role in interacting with the entry co-receptors (CCR5 or CXCR4) [Yamaguchi and Gojobori, 1997]. The status of the gp120 V3 region (V3 loop) as the most variable part of HIV proteins has been discussed in many studies. Some investigators concluded that the V3 loop is not the most variable part of HIV proteins [Yang et al., 2003], while others suggest that it is under positive selection [Yamaguchi and Gojobori, 1997]. In our opinion, the results of such studies depend strongly on the number of clones analyzed and on the characteristics of the specific nucleotide sequences coding for the V3 region.

Polypeptide gp160, the precursor of gp120, is encoded by the env gene. The precursor is cleaved by cellular proteases into the mature proteins gp120 and gp41. In the reference strain of HIV type 1, gp120 is 483 amino acid residues long and its third variable region is 36 amino acid residues long. The V3 region is a strong discontinuous (3D) B-cell epitope [Spenlehauer et al., 1998]. Neutralizing antibodies directed against the V3 loop are found in almost every infected individual [Moore and Ho, 1993]. Because of the hypervariability of the third variable region during the course of HIV infection, the immune system usually becomes misdirected and deregulated: the immune response cannot shift from the frequently changing V3 loop to more protective targets [Spenlehauer et al., 1998]. Hyperstimulation of B cells may also lead to their transformation into lymphomas [Williamson, 2003].

It has been shown that disease progression is slower in persons infected with HIV type 2 than in those infected with HIV type 1 [Sankalé et al., 1995]; HIV2-associated AIDS also develops more slowly than HIV1-associated immunodeficiency. The V3 region of HIV2 has a much lower level of variability than the V3 region of HIV1 [Sankalé et al., 1995]. A widespread hypothesis connects these two facts: the level of V3 region variability may influence the speed of AIDS progression, along with many other factors associated with critical features of the virus and its host. On the other hand, some studies show that V3 region diversity is not always positively correlated with disease progression rates [Williamson, 2003].

The goal of this study is to determine the cause of the increased variability of the HIV1 V3 region in comparison with that of the HIV2 V3 loop.

The main molecular mechanism of HIV hypermutation is thought to have been identified: cellular RNA-editing enzymes of the APOBEC family deaminate cytosine in viral minus-strand DNA [Liddament et al., 2004; Izumi et al., 2008; Pillai et al., 2008]. This process results in G to A (guanine to adenine) hypermutation in the plus-strand RNA [Izumi et al., 2008] and makes a major contribution to the mutational A-pressure in HIV genomes [Liddament et al., 2004; Izumi et al., 2008; Pillai et al., 2008]. The term "A-pressure" was proposed in 1994 [Berkhout and van Hemert, 1994] after the discovery that coding regions of HIV genes are enriched with adenine (A), especially in the neutral third codon positions. Consequently, the most mutable nucleotide in the plus-strand RNA should be guanine (G). Recent studies have identified several targets for particular APOBEC enzymes. APOBEC3G binds to GG dinucleotides, catalyzing GG to AG mutations (strictly speaking, APOBEC3G binds to the CC dinucleotide in the minus-strand DNA, introducing a CC to CU mutation) [Liddament et al., 2004]. APOBEC3F [Liddament et al., 2004], APOBEC3C [Bourara et al., 2007] and APOBEC3DE [Dang et al., 2006] preferentially cause GA to AA mutations, while APOBEC3G is also able to cause GC to AC mutations.
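To make the target contexts concrete, the following minimal Python sketch scans a plus-strand sequence for the three dinucleotide contexts named above (GG, GA and GC) and marks the 5' G of each as the edited base. It is an illustration only, not a model of APOBEC specificity: the function names `apobec_target_sites` and `hypermutate` are hypothetical, enzyme preferences are ignored, and `hypermutate` applies every possible G to A change in a single step.

```python
from typing import List, Tuple

# Plus-strand contexts in which the 5' G is edited:
# GG -> AG (APOBEC3G), GA -> AA (APOBEC3F/3C/3DE), GC -> AC (APOBEC3G).
TARGET_DINUCLEOTIDES = {"GG", "GA", "GC"}


def apobec_target_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based index of the edited G, context) for every target context."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    sites = []
    for i in range(len(seq) - 1):
        context = seq[i:i + 2]
        if context in TARGET_DINUCLEOTIDES:
            sites.append((i, context))
    return sites


def hypermutate(seq: str) -> str:
    """Idealized maximal editing: every target G becomes A at once.

    Contexts are read from the original sequence, so an edit made in this
    call cannot create or destroy another target within the same call.
    """
    bases = list(seq.upper().replace("U", "T"))
    for i, _ in apobec_target_sites(seq):
        bases[i] = "A"
    return "".join(bases)


print(apobec_target_sites("GGAGCT"))  # [(0, 'GG'), (1, 'GA'), (3, 'GC')]
print(hypermutate("GGAGCT"))          # AAAACT
```

Output uses DNA letters (T rather than U) regardless of the input alphabet.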

In this study we show that there is a significant guanine usage bias in nucleotide sequences coding for the HIV1 V3 loop: guanine is located mostly in the second and first codon positions, while the level of G in third codon positions is minimal. In nucleotide sequences coding for the HIV2 V3 loop the situation is entirely different: guanine is located mostly in third codon positions. This means that the probability for a G to A transition to occur in the second or first codon position (i.e. to be nonsynonymous) is much higher in sequences coding for HIV1 V3 than in sequences coding for the HIV2 V3 loop. As to dinucleotide usage, the total frequency of dinucleotides that can mutate nonsynonymously (GG, GA and GC dinucleotides located in the first and second or in the second and third codon positions) is 14.96% for the region coding for HIV1 V3 and just 9.88% for the region coding for HIV2 V3. It is known that the GA to AA bias prevails in HIV-1 sequences derived from infected individuals [Liddament et al., 2004]. For this most common target, the frequency of GA dinucleotides located in the first and second or in the second and third codon positions is 8.4% for the region coding for HIV1 V3, but just 2.97% for the region coding for HIV2 V3.

Our hypothesis states that HIV1 gp120 V3 region hypermutability is enhanced by the biased guanine usage (2G>1G>>3G) in the nucleotide sequence coding for it. This bias should lead to elevated rates of nonsynonymous mutations and, consequently, to increased V3 variability.

To test our hypothesis, we calculated guanine usage in 100 sequences coding for HIV1 V3 and 100 sequences coding for the HIV2 V3 region. We also compared the usage of G in sequences coding for HIV1 V3 from slow and fast AIDS progressors.

Guanine usage in each codon position along the gene length, as well as dinucleotide usage patterns, were calculated for the complete env genes from the HIV1 and HIV2 reference strains. All calculations were performed with our computer algorithms "VVK in length", "VVK Consensus" and "VVK Dinucleotides", which can be downloaded free of charge from our web site (www.barkovsky.hotmail.ru).



Methods

Our first in-silico experiment was performed on nucleotide sequences of env genes from the HIV1 and HIV2 reference genomes (GenBank accession numbers NC_001802 and NC_001722, respectively). Because the V3 loop differs in length between the HIV1 and HIV2 reference strains, the nucleotide sequence coding for HIV1 gp120 was cut into 36-codon pieces with our MS Excel tool "VVK in length", while the sequence coding for HIV2 gp120 (gp105) was cut into 34-codon pieces. The piece length in this algorithm is adjustable: to use "VVK in length", one enters the nucleotide sequence in the designated cell on the "full sequence" sheet and the length of each piece (in codons) in the designated cell on the "length" sheet. The nucleotide content of each piece is then counted on the "content" sheet. "VVK in length" counts not only the total nucleotide content but also the nucleotide content in each of the three codon positions. This method is useful for determining the contribution of insertions, deletions and frameshifts to the evolution of a coding region.

In this work we focused on the distribution of guanine content along the length of gp120 coding regions. We removed the first 15 codons of the HIV1 env region coding for gp120 so that the entire sequence coding for the V3 loop fell within a single piece (part 8). For the same purpose, we removed the first 17 codons of the HIV2 gp120 (gp105) coding region. The results are shown in Fig. 1 and Fig. 2.
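The piece-wise counting performed by "VVK in length" can be sketched in Python as follows. This is a hypothetical re-implementation, not the authors' Excel tool: `g_profile` and `profile_pieces` are illustrative names, codon positions are assumed to start at the first base of the entered sequence, and an incomplete final piece is simply dropped (how the original tool handles it is not stated).

```python
from typing import Dict, List


def g_profile(cds: str) -> Dict[str, float]:
    """Fraction of G overall (G) and per codon position (1G, 2G, 3G)."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("need at least one complete codon")
    counts = [0, 0, 0]  # G count at codon positions 1, 2 and 3
    for i in range(3 * n_codons):
        if cds[i] == "G":
            counts[i % 3] += 1
    return {
        "G": sum(counts) / (3 * n_codons),
        "1G": counts[0] / n_codons,
        "2G": counts[1] / n_codons,
        "3G": counts[2] / n_codons,
    }


def profile_pieces(cds: str, piece_codons: int, skip_codons: int = 0) -> List[Dict[str, float]]:
    """Skip leading codons, cut the rest into pieces of `piece_codons` codons
    and profile each complete piece (an incomplete tail is ignored)."""
    step = 3 * piece_codons
    start = 3 * skip_codons
    return [g_profile(cds[b:b + step]) for b in range(start, len(cds) - step + 1, step)]
```

For example, `profile_pieces(gp120_cds, 36, skip_codons=15)`, where `gp120_cds` is a hypothetical variable holding the HIV1 gp120 coding sequence, would return one profile per 36-codon piece, with part 8 at index 7.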



Figure 1: Sequence of HIV1 env gene coding for gp120 (beginning from codon N15) cut in pieces of 36 codons in length each. Levels of total guanine usage (G) and the usage of guanine in three codon positions (1G, 2G and 3G) have been counted for each piece of the sequence. Location of the piece coding for the V3 loop is shown.


Figure 2: Sequence of HIV2 env gene coding for gp105 (beginning from codon N17) cut in pieces of 34 codons in length each. Levels of total guanine usage (G) and the usage of guanine in three codon positions (1G, 2G and 3G) have been counted for each piece of the sequence. Location of the piece coding for the V3 loop is shown.

To compare accurately the guanine usage biases caused by mutational pressure, we removed possible insertions and frameshifted regions from the complete env genes of HIV1 and HIV2. We deleted all variable amino acid positions from the alignment and compared the levels of guanine in the three codon positions (1G, 2G and 3G) in the regions coding only for amino acid residues conserved between the HIV1 and HIV2 env genes.
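The removal of variable positions can be sketched as a column filter over the protein alignment that keeps, for each gene, only the codons of columns where both sequences carry the same residue. The sketch assumes that the aligned protein strings use '-' for gaps and that each coding sequence is ungapped, in frame and exactly three nucleotides per residue; `conserved_codons` is a hypothetical name, not part of MEGA4 or the VVK tools.

```python
from typing import Tuple


def conserved_codons(aln_a: str, aln_b: str, cds_a: str, cds_b: str) -> Tuple[str, str]:
    """Keep only codons of alignment columns where both proteins carry the
    same residue; columns with a gap or a substitution are dropped."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned proteins must have equal length")
    kept_a, kept_b = [], []
    ia = ib = 0  # index of the next residue (codon) in each ungapped protein
    for res_a, res_b in zip(aln_a, aln_b):
        if res_a != "-" and res_a == res_b:
            kept_a.append(cds_a[3 * ia:3 * ia + 3])
            kept_b.append(cds_b[3 * ib:3 * ib + 3])
        if res_a != "-":
            ia += 1
        if res_b != "-":
            ib += 1
    return "".join(kept_a), "".join(kept_b)


print(conserved_codons("MK-L", "MRGL", "ATGAAACTG", "ATGAGGGGTCTC"))
# ('ATGCTG', 'ATGCTC')
```

The two returned sequences can then be profiled for 1G, 2G and 3G in the same way as any other coding sequence.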

We performed the second experiment with our MS Excel tool "VVK Dinucleotides". This algorithm counts the total frequency of each dinucleotide in the entered nucleotide sequence as well as the distribution of each dinucleotide type between codon positions. A dinucleotide can be located (i) in the first and second, (ii) in the second and third, or (iii) in the third and first codon positions. It is important to distinguish between these three variants, especially in the regions coding for the HIV1 V3 and HIV2 V3 loops. We therefore counted dinucleotide frequencies and their distribution between codon positions in these two regions (Tab. 1 and Tab. 2) as well as in the env genes from the HIV1 and HIV2 reference strains (Tab. 3 and Tab. 4). Moreover, we counted dinucleotide frequencies and their distribution between codon positions in the five canonical variable and five conserved regions of HIV1 gp120 from the reference strain.
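A minimal Python sketch of the counting done by "VVK Dinucleotides" follows. It assumes that overlapping dinucleotides are counted and that the sequence begins at a first codon position; `dinucleotide_table` is a hypothetical name. Under these assumptions a 36-codon V3 coding region yields 107 dinucleotides, which is consistent with (though not confirmed by) the 0.93% step seen in Tab. 1.

```python
from collections import Counter
from typing import Dict, Tuple

JUNCTIONS = ("1-2", "2-3", "3-1")  # codon positions spanned by a dinucleotide


def dinucleotide_table(cds: str) -> Dict[str, Tuple[float, Dict[str, float]]]:
    """Map each dinucleotide to (total frequency in %, share in % at each junction).

    Overlapping dinucleotides are counted, and position 0 of `cds` is assumed
    to be a first codon position.
    """
    cds = cds.upper().replace("T", "U")  # RNA letters, as in Tab. 1-4
    n = len(cds) - 1  # number of overlapping dinucleotides
    if n < 1:
        raise ValueError("sequence too short")
    totals = Counter()
    by_junction = Counter()
    for i in range(n):
        dinuc = cds[i:i + 2]
        totals[dinuc] += 1
        by_junction[dinuc, JUNCTIONS[i % 3]] += 1
    return {
        d: (100 * c / n, {j: 100 * by_junction[d, j] / c for j in JUNCTIONS})
        for d, c in totals.items()
    }
```

For instance, in "AUGGCA" the single GG spans codon positions 3 and 1, so `dinucleotide_table("AUGGCA")["GG"]` reports 20% of all dinucleotides, all of them at the 3-1 junction.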


Table 1: Dinucleotide composition of region coding for HIV1 V3 loop. Total frequency of GC, GG and GA dinucleotides that can mutate nonsynonymously is 14.96% (84% of all hypermutable dinucleotides).
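| Dinucleotide | CC | CG | CU | CA | GC | GG | GU | GA |
|---|---|---|---|---|---|---|---|---|
| Total frequency, % | 4.67 | 0.93 | 0.00 | 11.22 | 1.87 | 4.67 | 3.74 | 11.22 |
| 1-2 codon positions, % | 40 | 100 | - | 25 | 100 | 80 | 25 | 0 |
| 2-3 codon positions, % | 20 | 0 | - | 50 | 0 | 20 | 75 | 75 |
| 3-1 codon positions, % | 40 | 0 | - | 25 | 0 | 0 | 0 | 25 |

| Dinucleotide | UC | UG | UU | UA | AC | AG | AU | AA |
|---|---|---|---|---|---|---|---|---|
| Total frequency, % | 1.87 | 3.74 | 3.74 | 6.54 | 8.41 | 12.15 | 8.41 | 16.82 |
| 1-2 codon positions, % | 0 | 50 | 25 | 0 | 33 | 46 | 56 | 33 |
| 2-3 codon positions, % | 100 | 25 | 50 | 29 | 22 | 7 | 33 | 17 |
| 3-1 codon positions, % | 0 | 25 | 25 | 71 | 45 | 46 | 11 | 50 |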


Table 2: Dinucleotide composition of region coding for HIV2 V3 loop. Total frequency of GC, GG and GA dinucleotides that can mutate nonsynonymously is 9.88% (62% of all hypermutable dinucleotides).
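| Dinucleotide | CC | CG | CU | CA | GC | GG | GU | GA |
|---|---|---|---|---|---|---|---|---|
| Total frequency, % | 3.96 | 0.00 | 3.96 | 12.87 | 4.95 | 5.94 | 5.94 | 4.95 |
| 1-2 codon positions, % | 100 | - | 25 | 23 | 20 | 33 | 33 | 0 |
| 2-3 codon positions, % | 0 | - | 50 | 62 | 20 | 50 | 17 | 60 |
| 3-1 codon positions, % | 0 | - | 25 | 15 | 60 | 17 | 50 | 40 |

| Dinucleotide | UC | UG | UU | UA | AC | AG | AU | AA |
|---|---|---|---|---|---|---|---|---|
| Total frequency, % | 4.95 | 4.95 | 6.93 | 6.93 | 7.92 | 10.89 | 5.94 | 8.91 |
| 1-2 codon positions, % | 40 | 60 | 43 | 0 | 38 | 28 | 50 | 44 |
| 2-3 codon positions, % | 20 | 40 | 43 | 43 | 24 | 36 | 17 | 0 |
| 3-1 codon positions, % | 40 | 0 | 14 | 57 | 38 | 36 | 33 | 56 |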


Table 3: Dinucleotide composition of region coding for HIV1 env. Total frequency of GC, GG and GA dinucleotides that can mutate nonsynonymously is 12.78% (68% of all hypermutable dinucleotides).
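| Dinucleotide | CC | CG | CU | CA | GC | GG | GU | GA |
|---|---|---|---|---|---|---|---|---|
| Total frequency, % | 3.74 | 1.01 | 4.51 | 7.86 | 4.24 | 6.58 | 5.10 | 8.05 |
| 1-2 codon positions, % | 30 | 23 | 37 | 28 | 42 | 34 | 42 | 36 |
| 2-3 codon positions, % | 40 | 50 | 34 | 34 | 26 | 32 | 35 | 33 |
| 3-1 codon positions, % | 30 | 27 | 29 | 38 | 32 | 34 | 23 | 31 |

| Dinucleotide | UC | UG | UU | UA | AC | AG | AU | AA |
|---|---|---|---|---|---|---|---|---|
| Total frequency, % | 3.31 | 7.16 | 6.30 | 7.51 | 5.84 | 9.22 | 8.37 | 11.21 |
| 1-2 codon positions, % | 29 | 30 | 40 | 10 | 40 | 34 | 38 | 36 |
| 2-3 codon positions, % | 47 | 35 | 32 | 45 | 27 | 22 | 37 | 29 |
| 3-1 codon positions, % | 24 | 35 | 28 | 45 | 33 | 44 | 25 | 35 |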


Table 4: Dinucleotide composition of region coding for HIV2 env. Total frequency of GC, GG and GA dinucleotides that can mutate nonsynonymously is 12.93% (67% of all hypermutable dinucleotides).
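| Dinucleotide | CC | CG | CU | CA | GC | GG | GU | GA |
|---|---|---|---|---|---|---|---|---|
| Total frequency, % | 4.49 | 2.48 | 5.5 | 8.13 | 5.27 | 6.62 | 4.76 | 7.47 |
| 1-2 codon positions, % | 30 | 24 | 30 | 30 | 40 | 30 | 39 | 40 |
| 2-3 codon positions, % | 36 | 38 | 33 | 37 | 28 | 39 | 24 | 24 |
| 3-1 codon positions, % | 34 | 38 | 37 | 33 | 32 | 31 | 37 | 36 |

| Dinucleotide | UC | UG | UU | UA | AC | AG | AU | AA |
|---|---|---|---|---|---|---|---|---|
| Total frequency, % | 4.22 | 6.86 | 5.69 | 6.66 | 6.62 | 8.17 | 7.47 | 9.57 |
| 1-2 codon positions, % | 29 | 33 | 41 | 20 | 40 | 27 | 37 | 36 |
| 2-3 codon positions, % | 45 | 33 | 30 | 41 | 33 | 28 | 39 | 30 |
| 3-1 codon positions, % | 26 | 34 | 29 | 39 | 27 | 45 | 24 | 34 |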

The third experiment was performed on 100 nucleotide sequences coding for the HIV1 V3 region and 100 nucleotide sequences coding for the HIV2 V3 region. Their GenBank accession numbers are EU664612 - EU664683, U70809 - U70821 and U70597 - U70614 for HIV1, and U24287 - U24388 for HIV2.

With another MS Excel tool, "VVK Consensus", we counted the guanine content in 100 nucleotide sequences coding for the HIV1 V3 loop and 100 sequences coding for the HIV2 V3 loop. When previously aligned nucleotide sequences are entered in the cells of the "sequences" sheet of "VVK Consensus", their nucleotide content appears on the "content" sheet. The algorithm therefore makes it easy to plot, in MS Excel, the distribution of nucleotide usage across a set of alleles or phylogenetically related genes (no more than 100 sequences, each up to 4000 nucleotides long). The main function of "VVK Consensus" is to count nucleotide substitutions relative to the consensus sequence, but in this study we used only the additional function described above. We built two graphs with the total level of G on the X-axis and the levels of G in the three codon positions (1G, 2G and 3G) on the Y-axis: the first (see Fig. 4) for the 100 HIV1 sequences coding for V3, and the second (see Fig. 5) for the 100 HIV2 sequences coding for the V3 loop.
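The content-counting function of "VVK Consensus" used here can be approximated in Python by computing G, 1G, 2G and 3G for each aligned sequence. This is a sketch under stated assumptions, not the authors' tool: `g_levels` is a hypothetical name, and gaps are assumed to occur in whole codons so that deleting them restores each reading frame.

```python
from typing import Dict, List


def g_levels(aligned: List[str]) -> List[Dict[str, float]]:
    """G, 1G, 2G and 3G for each sequence of a codon-aware alignment.

    Gaps ('-') are assumed to span whole codons, so deleting them restores
    the reading frame of every sequence.
    """
    rows = []
    for seq in aligned:
        cds = seq.upper().replace("-", "")
        n_codons = len(cds) // 3
        if n_codons == 0:
            raise ValueError("sequence without a complete codon")
        counts = [0, 0, 0]  # G count at codon positions 1, 2 and 3
        for i in range(3 * n_codons):
            if cds[i] == "G":
                counts[i % 3] += 1
        rows.append({"G": sum(counts) / (3 * n_codons),
                     "1G": counts[0] / n_codons,
                     "2G": counts[1] / n_codons,
                     "3G": counts[2] / n_codons})
    return rows


rows = g_levels(["GGA---TTG", "GGAGGGTTG"])
x = [r["G"] for r in rows]    # X-axis: total G
y2 = [r["2G"] for r in rows]  # one Y-axis series: 2G
```

Plotting `x` against the 1G, 2G and 3G series gives graphs of the kind shown in Fig. 4 and Fig. 5.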



Figure 4: Levels of guanine usage in 100 nucleotide sequences coding for HIV1 V3 loops. Dependences between total guanine usage (G) and guanine usage in three codon positions (1G, 2G and 3G) are shown.


Figure 5: Levels of guanine usage in 100 nucleotide sequences coding for HIV2 V3 loops. Dependences between total guanine usage (G) and guanine usage in three codon positions (1G, 2G and 3G) are shown.

The fourth in-silico experiment was performed on HIV1 nucleotide sequences coding for the V3 loop from two HIV-infected babies. All these sequences came from a single study [Ripamonti et al., 2007]; their GenBank accession numbers are as follows: 44 sequences from the fast progressor, EF657933 - EF657976; and 42 sequences from the slow progressor, EF657890 - EF657932. With "VVK Consensus" we compared the guanine usage in nucleotide sequences coding for V3 regions from the slow and fast HIV1 progressors.

For the alignment of gp120 and complete env polyproteins from HIV1 and HIV2 we used the MEGA4 program [Tamura et al., 2007]. Alignments were performed with the PAM matrix included in the program.



Results

As Fig. 1 shows, the level of guanine varies along the length of the nucleotide sequence coding for HIV1 gp120. The part of the env gene coding strictly for gp120 contains no overlapping reading frames. Variations in the distribution of guanine between codon positions in the first 7 parts of this coding region may be interpreted as evidence of frequent deletions, insertions and frameshifts during the evolution of the env gene. If mutational A-pressure associated with G to A hypermutation were the only force acting on this gene, the levels of guanine in third codon positions (3G) would be low and similar in all of its parts. As shown in Fig. 1, 3G is lower than 1G and 2G only in parts 8 (coding for the V3 loop) through 13.

Transitions in third codon positions, unlike those in first and second codon positions, are mostly synonymous [Berkhout and van Hemert, 1994]. The most common nucleotide mutation in HIV genes is the G to A transition [Liddament et al., 2004; Izumi et al., 2008]. The probability that a G to A mutation takes place in the first or second codon position can therefore be calculated easily, assuming that every guanine is equally likely to mutate: divide the sum of guanine levels in the first and second codon positions (1G + 2G) by the sum of guanine levels in all three codon positions (1G + 2G + 3G). Thus, the lower 3G and the higher 1G and 2G, the higher the probability of a nonsynonymous G to A transition. By this calculation, the probability for a G to A transition to be nonsynonymous in the sequence coding for the HIV1 V3 loop is 87.0% (see Fig. 1).
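The formula above translates directly into code. The sketch below (the function name `p_nonsynonymous` is hypothetical) applies it to the 1G, 2G and 3G values reported later in this section for the two possible V3 insertions (0.286, 0.381 and 0.143 for HIV1; 0.158, 0.053 and 0.211 for HIV2).

```python
def p_nonsynonymous(g1: float, g2: float, g3: float) -> float:
    """(1G + 2G) / (1G + 2G + 3G): the share of guanines in codon positions
    1 and 2, i.e. the chance that a G->A transition is nonsynonymous if
    every G is equally likely to mutate."""
    total = g1 + g2 + g3
    if total <= 0:
        raise ValueError("no guanine in the sequence")
    return (g1 + g2) / total


print(round(p_nonsynonymous(0.286, 0.381, 0.143), 3))  # possible HIV1 V3 insertion -> 0.823
print(round(p_nonsynonymous(0.158, 0.053, 0.211), 3))  # possible HIV2 V3 insertion -> 0.5
```

Raw G counts per codon position can be used instead of fractions, because the common denominator cancels.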

Fig. 2 shows that 3G in the sequence coding for the HIV2 V3 loop is higher than 2G and 1G. The probability for a G to A transition to be nonsynonymous in this sequence (59.1%) is much lower than in the HIV1 sequence. In other words, the first and second codon positions of the sequence coding for the HIV2 V3 loop are relatively protected from G to A transitions by the "buffer" of guanine in neutral third codon positions. In the HIV1 V3 coding region this "buffer" is much smaller, while the level of guanine in second codon positions is much higher.

To compare guanine usage accurately in the complete env genes from the HIV1 and HIV2 reference strains, we deleted all variable amino acid positions from their alignment (PAM method). There are 155 amino acid residues conserved between the HIV1 and HIV2 env genes. The level of 3G in the nucleotide sequence coding for these conserved residues is somewhat lower for the HIV1 env gene than for its HIV2 homologue (0.148 versus 0.174, respectively), whereas the level of 2G is slightly higher (0.290 for HIV1 "conserved" env and 0.277 for HIV2 "conserved" env). Consequently, the probability for a G to A transition to be nonsynonymous in the sequence coding for conserved residues of HIV1 env is 78.09%, versus 74.76% for the conserved residues of HIV2 env.

The distribution of the APOBEC target dinucleotides (GG, GA and GC) between codon positions of the regions coding for HIV1 V3 (see Tab. 1) and HIV2 V3 (see Tab. 2) follows the distribution of G. Hypermutable dinucleotides account for 17.76% of all dinucleotides in the HIV1 V3 coding region and for 15.84% in the HIV2 V3 coding region. Furthermore, 38% of the hypermutable dinucleotides in the region coding for the HIV2 V3 loop are located in the third and first codon positions, where G to A mutations cause synonymous substitutions in most cases. For the HIV1 V3 coding region the percentage of such "silent" hypermutable dinucleotides is much lower (16%).

GA is thought to be the most hypermutable dinucleotide in HIV genes [Liddament et al., 2004]. As Tab. 1 and Tab. 2 show, the frequency of GA dinucleotides located in the first and second or in the second and third codon positions is 2.8 times higher for the region coding for HIV1 V3 than for the region coding for HIV2 V3.
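The dinucleotide percentages quoted in this section can be re-derived from Tab. 1 and Tab. 2: the nonsynonymously mutable part of a target dinucleotide is its total frequency multiplied by the fraction located at the 1-2 or 2-3 junctions, and the "silent" part uses the 3-1 fraction. The values in the dictionaries below are transcribed from the tables; the helper name `summarize` is hypothetical.

```python
# dinucleotide -> (total %, % at 1-2, % at 2-3, % at 3-1), transcribed from Tab. 1 and Tab. 2
TAB1_HIV1_V3 = {"GC": (1.87, 100, 0, 0), "GG": (4.67, 80, 20, 0), "GA": (11.22, 0, 75, 25)}
TAB2_HIV2_V3 = {"GC": (4.95, 20, 20, 60), "GG": (5.94, 33, 50, 17), "GA": (4.95, 0, 60, 40)}


def summarize(table):
    """Return (all targets %, nonsynonymously mutable %, nonsyn share %, silent share %)."""
    all_targets = sum(row[0] for row in table.values())
    nonsyn = sum(t * (p12 + p23) / 100 for t, p12, p23, _ in table.values())
    silent = sum(t * p31 / 100 for t, _, _, p31 in table.values())
    return all_targets, nonsyn, 100 * nonsyn / all_targets, 100 * silent / all_targets


hiv1 = summarize(TAB1_HIV1_V3)
hiv2 = summarize(TAB2_HIV2_V3)
# GA located at the 1-2 or 2-3 junctions, HIV1 V3 versus HIV2 V3
ga_ratio = (11.22 * 0.75) / (4.95 * 0.60)
```

With these inputs, `hiv1` comes to about 17.76%, 14.96%, 84% and 16%, `hiv2` to about 15.84%, 9.88%, 62% and 38%, and `ga_ratio` to about 2.8, matching the text up to rounding.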

Interestingly, the frequencies of hypermutable dinucleotides able to mutate nonsynonymously in the complete env genes from HIV1 (see Tab. 3) and HIV2 (see Tab. 4) are practically equal (12.78% and 12.93%). This may highlight the importance of the guanine (and, likewise, target dinucleotide) usage bias specific to the regions coding for the V3 loop. However, the frequency of GA dinucleotides located in the first and second or in the second and third codon positions is somewhat higher for the HIV1 env gene (5.55%) than for the HIV2 env gene (4.78%).

We aligned HIV1 gp120 and HIV2 gp105 using the PAM matrix to see what happened to their V3 regions, which differ so strongly in guanine usage. Fig. 3 shows this alignment, which contains large gaps in both the HIV1 and HIV2 sequences. The "central" parts of the V3 regions appear to be nonhomologous between HIV1 and HIV2. An insertion of 21 codons likely occurred during the evolution of the common HIV1 predecessor (after the divergence of the lineages leading to HIV1 and HIV2). This possible insertion is enriched with guanine in its second codon positions (2G = 0.381) as well as in its first ones (1G = 0.286), while its 3G level is low (3G = 0.143).



Figure 3: Aligned amino acid sequences of V3 regions from HIV1 and HIV2 reference strains. In this figure only a part of gp120 HIV1 and gp105 HIV2 alignment is shown. For this alignment a PAM matrix has been used (see Methods). Conserved amino acid residues are written in bold. Borders of possible insertions in HIV1 and HIV2 V3 sequences are indicated.

The possible insertion that occurred in the common predecessor of HIV2 (after the divergence of the lineages leading to HIV2 and HIV1) is 19 codons long. This sequence is poor in guanine (1G = 0.158; 2G = 0.053; 3G = 0.211), but its highest G level is in the third codon positions. We can therefore state that the HIV1 and HIV2 sequences coding for the V3 loop differ in the distribution of guanine between codon positions mostly because of the guanine usage biases of the possible insertions that occurred during their separate evolution.

As Fig. 3 shows, the possible insertion in the HIV1 V3 loop is enriched with arginine (four of twenty-one amino acid residues) and glycine (four residues), which are encoded by codons containing guanine in the second position. This insertion was probably fixed in the HIV1 population by natural selection because of features beneficial for the virus (for example, better binding to the co-receptor). At the same time, this insertion is a hotspot for nonsynonymous G to A mutations. Mutational pressure caused by APOBEC editing tends to produce various V3 loop mutants, while negative selection should eliminate these mutants from the viral population. Nevertheless, these frequently occurring mutants may still cause immune escape and immune deception, as well as a switch in co-receptor usage, without becoming fixed in the whole viral population.

To see whether the guanine usage bias observed in the reference strain (2G>1G>>3G) is characteristic of most HIV1 sequences coding for the V3 loop, we counted guanine usage levels in 100 sequences. These sequences came from two studies: in one, sequences were obtained from patients infected with HIV1 subtype C [Tsibris et al., 2008]; in the other, from patients infected with subtypes B and D [Brengel-Pesce et al., 1996]. We selected from GenBank only nucleotide sequences with no missing data, to obtain reliable estimates of guanine usage. The result is shown in Fig. 4: the 3G level is much lower than the 2G and 1G levels in all 100 HIV1 nucleotide sequences coding for the V3 loop. There is also a significant correlation between 2G and the total level of G (R = 0.54), and no correlation between 3G and the total level of guanine (R = 0.07). These data support our suggestion that guanine in the second codon positions of HIV1 V3 coding regions is the most probable target for mutation because of its high level, unlike guanine in the third codon positions, where its level is extremely low.
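The correlation coefficients quoted here can be computed from per-sequence G levels with a plain Pearson formula. The text does not say which coefficient R denotes, so Pearson's r is an assumption; `pearson_r` is a hypothetical helper written without third-party libraries.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# e.g. total G of each sequence (X) against its 2G level (Y)
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

In practice `xs` would hold the total G of each of the 100 sequences and `ys` their 2G (or 3G) levels.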

Fig. 5 shows that the guanine usage bias observed in the reference strain (3G>2G>1G) is characteristic of the majority of HIV2 nucleotide sequences coding for the V3 loop. The HIV2 V3 coding sequences were obtained in a single study focused on their variability [Sankalé et al., 1993]. There is a strong correlation between 3G and the total level of guanine in HIV2 V3 coding regions (R = 0.92). This is evidence that guanine in the third codon positions of HIV2 V3 coding regions may act as a "buffer" protecting the first and second codon positions from frequent mutations.

According to a widespread hypothesis, elevated variability of HIV epitopes, including the V3 loop, may play a significant role in AIDS progression: the higher the frequency of amino acid substitutions in the V3 loop, the faster AIDS can develop [Williamson, 2003; Ripamonti et al., 2007]. The frequency of amino acid substitutions in the V3 loop depends on the intensity of the mutation process and also on the probability that a mutation is nonsynonymous.

Intrapatient variability of V3 loops is wide. Some amino acid substitutions in the V3 region can certainly reduce or abolish viral infectivity, but viral particles with such "negatively" mutated V3 loops can still be targets for the immune response, especially if they keep arising one after another during the course of infection. Higher levels of 3G in nucleotide sequences coding for HIV2 V3 loops may be one of the factors leading to the decreased variability of HIV2 V3 and the relatively better survival of HIV2-infected persons.

Our final experiment was performed on sequences coding for the HIV1 V3 region from two siblings born to an HIV-infected mother [Ripamonti et al., 2007]. The first-born child was a slow progressor; sequences coding for his V3 regions were obtained between 9 and 67 months. The second-born child was a fast progressor; sequences coding for his V3 loop were obtained between 6 and 42 months. The second child was born three years after the first. In the original study [Ripamonti et al., 2007], from which all the sequences came, faster nucleotide substitution rates were observed for viruses from the fast progressor. In our in-silico work we counted the distribution of guanine between codon positions in all sequences from the fast (Fig. 6) and slow (Fig. 7) progressors.



Figure 6: Levels of guanine usage in 44 nucleotide sequences coding for HIV1 V3 loops from a fast progressor. Dependences between total guanine usage (G) and guanine usage in the three codon positions (1G, 2G and 3G) are shown.


Figure 7: Levels of guanine usage in 42 nucleotide sequences coding for HIV1 V3 loops from a slow progressor. Dependences between total guanine usage (G) and guanine usage in the three codon positions (1G, 2G and 3G) are shown.

Fig. 6 shows decreased 3G levels and a correlation of 2G and 1G with the total guanine usage level. In contrast, Fig. 7 shows no significant correlation of 2G and 1G with the total guanine usage level; the level of 3G is higher than in Fig. 6, and 3G correlates with the total G level.

Notably, both children were infected by their mother. The viral population in the mother apparently changed over the three years: the level of 3G in V3 coding regions decreased significantly. Thus, the first child was infected with viruses whose V3 loops were relatively protected from nonsynonymous G to A mutations, whereas the second child was infected with viruses whose V3 coding regions had already lost their 3G "buffer".

Our hypothesis holds at least for these two siblings. Counting guanine usage in nucleotide sequences coding for the V3 loop may be used as a prognostic criterion in the future (once our hypothesis has been confirmed on more clinical and molecular data), although many other, still undiscovered, factors can influence the course of this disease.



Discussion

In this in-silico study, attention was focused on guanine usage levels in the three codon positions of nucleotide sequences coding for HIV V3 loops. Guanine is thought to be the most mutable nucleotide in HIV genes [Izumi et al., 2008], which is why its distribution between the three codon positions is so important.

G to A mutations in third codon positions are synonymous in most cases (the AUG and UGG codons are the two exceptions). The same mutation results in an amino acid change if it occurs in the first or second codon position. To calculate the probability for a G to A transition to take place in the first and second codon positions of nucleotide sequences coding for V3 from the HIV1 and HIV2 reference strains, we simply divided the sum of guanine levels in their first and second codon positions (1G + 2G) by the sum of guanine levels in all three codon positions (1G + 2G + 3G). This probability may be calculated for other genes if the main direction of mutations and the most mutable nucleotide are known.
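The claim about third-position exceptions can be checked against the standard genetic code. The sketch below (the helper name `g_to_a_changes` is hypothetical; DNA letters are used, so AUG and UGG appear as ATG and TGG) lists, for a chosen codon position, every G-containing codon whose G to A transition changes the encoded amino acid or stop signal.

```python
from itertools import product
from typing import List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def g_to_a_changes(position: int) -> List[str]:
    """Codons with G at `position` (1, 2 or 3) whose G->A transition changes
    the encoded amino acid or stop signal ('*')."""
    changed = []
    for codon, aa in CODE.items():
        if codon[position - 1] == "G":
            mutant = codon[:position - 1] + "A" + codon[position:]
            if CODE[mutant] != aa:
                changed.append(codon)
    return changed


print(g_to_a_changes(3))  # ['TGG', 'ATG']
```

For position 1, every one of the 16 G-containing codons changes; for position 2, all but TGA do, since TGA to TAA converts one stop codon into another rather than changing an amino acid.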

We observed significant differences in guanine usage between sequences coding for HIV1 and HIV2 V3 loops. The bias in the HIV1 V3 loop coding region can be written as the following inequality: 2G>1G>>3G. The bias in the HIV2 V3 loop coding region is entirely different: 3G>2G>1G.

The distribution of hypermutable dinucleotides (GA, GG and GC) between codon positions follows that of guanine itself: just 16% of these dinucleotides in the region coding for the HIV1 V3 loop can mutate synonymously, whereas 38% of the APOBEC targets in the region coding for HIV2 V3 are "silent". The frequency of these hypermutable dinucleotides located in the first and second or in the second and third codon positions is 1.51 times higher for the region coding for HIV1 V3 than for the region coding for the HIV2 V3 loop. This frequency is higher in the region coding for the V3 loop than in the whole env gene of HIV1 (14.96% versus 12.78%); in HIV2 the situation is the opposite (9.88% versus 12.93%).

These different guanine usage patterns seem to be connected with insertion/deletion events rather than with single nucleotide substitutions. According to our alignment, 21 codons of the HIV1 V3 coding region arose from the insertion of genetic material with biased guanine usage into the env gene of its evolutionary predecessor (the guanine level in the second codon positions of this possibly inserted sequence is extremely high). The nucleotide sequence coding for the HIV2 V3 region contains a nonhomologous part about 19 codons long. This possible insertion in an HIV2 evolutionary predecessor has low 2G and 1G levels, while its 3G level is somewhat higher.

The significance of V3 loop hypervariability during HIV infection has been described in different ways [Spenlehauer, 1998]. Positive selection has been detected by different methods in sequences coding for the HIV1 V3 region obtained from the same patient [Yamaguchi and Gojobori, 1997].

In this work we identified a possible cause of elevated nonsynonymous mutation rates, relative to synonymous ones, in HIV1 V3 loop coding regions: the rates of synonymous mutations may be decreased because of low guanine levels in third codon positions. We have already described a similar situation [Khrustalev and Barkovsky, 2008], but for a different direction of mutational pressure (in ICP0 genes from Cercopithecine simplexviruses). In general, because of negative selection, substitutions caused by mutational pressure are fixed mostly in third codon positions. As this "buffer" of neutral third codon positions "melts", the probability that a substitution caused by mutational pressure is synonymous decreases and the probability that it is nonsynonymous increases [Khrustalev and Barkovsky, 2008].

The same kind of guanine usage bias (2G>1G>>3G) was observed in 100 nucleotide sequences coding for the HIV1 V3 region. On the other hand, HIV1 V3 coding regions from slow and fast progressors showed some differences in 3G levels. The process of 3G "melting" in V3 loop coding regions might have taken place in the viral population of an HIV-infected mother who gave birth to a fast-progressor baby three years after her first baby (a slow progressor) was born.

According to our data, we can describe a common hypothetical tendency for HIV1 and HIV2 V3 loop coding regions: the higher the level of guanine in third codon positions, the lower the probability that a G to A transition is nonsynonymous; the lower this probability, the slower immune escape through amino acid substitutions in the V3 loop; and the slower this escape, the more favorable the course of HIV infection.

To test whether the hypermutability of other variable regions of HIV1 env is enhanced by guanine usage bias, we calculated the frequency of target dinucleotides (GA, GG and GC) in regions coding for the canonical conserved (C) and variable (V) regions of gp120, as well as their distribution between codon positions. It should be noted that the real borders of the variable regions in HIV1 env are still debated: the canonical borders were estimated from a limited number of sequences, the V5 region is relatively short (8 amino acid residues in length), and the first and second variable regions are separated from each other by a single cysteine residue. With these caveats, our approach shows a bias in target dinucleotide usage not only in the region coding for V3, but also in the region coding for V4 (see Tab. 5).


Table 5: Frequency of target dinucleotides (GA, GG and GC) usage and their distribution between codon positions in conserved (C) and variable (V) regions of HIV1 gp120 coding region.
Region   Length, nt   Total   1-2 and 2-3   3-1     Nonsynonymous, %
C1       306          0.141   0.092         0.049   65.25
V1-V2    198          0.168   0.112         0.056   66.67
C2       297          0.122   0.081         0.047   66.39
V3       108          0.178   0.150         0.028   84.27
C3       159          0.165   0.127         0.038   76.97
V4       87           0.116   0.105         0.011   90.52
C4       138          0.146   0.117         0.029   80.14
V5       24           0.174   0.131         0.043   75.29
C5       132          0.237   0.176         0.061   74.26

Region: region of the HIV1 gp120 coding region; Total: total frequency of target dinucleotides (GA, GG and GC) usage; 1-2 and 2-3: frequency of target dinucleotides situated in the first and second and in the second and third codon positions; 3-1: frequency of target dinucleotides situated in the third and first codon positions; Nonsynonymous, %: percent of target dinucleotides which can mutate nonsynonymously.
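The last column of Table 5 appears to be the ratio of the "1-2 and 2-3" column to the total frequency. The sketch below, with values copied from the table, recomputes that percentage and also checks that the two positional columns add up to the total, allowing for rounding to three decimals; the tolerances are our arbitrary choices.

```python
# Table 5 values: region -> (total, positions 1-2 and 2-3, positions 3-1, printed %)
TABLE_5 = {
    "C1": (0.141, 0.092, 0.049, 65.25),
    "V1-V2": (0.168, 0.112, 0.056, 66.67),
    "C2": (0.122, 0.081, 0.047, 66.39),
    "V3": (0.178, 0.150, 0.028, 84.27),
    "C3": (0.165, 0.127, 0.038, 76.97),
    "V4": (0.116, 0.105, 0.011, 90.52),
    "C4": (0.146, 0.117, 0.029, 80.14),
    "V5": (0.174, 0.131, 0.043, 75.29),
    "C5": (0.237, 0.176, 0.061, 74.26),
}


def check_table(table, pct_tol=0.006, sum_tol=0.0015):
    """Return (regions whose % is not reproduced, regions whose columns do not add up)."""
    pct_mismatch, sum_mismatch = [], []
    for region, (total, f12_23, f31, printed_pct) in table.items():
        # percent nonsynonymous = share of targets in positions 1-2 and 2-3
        if abs(100 * f12_23 / total - printed_pct) > pct_tol:
            pct_mismatch.append(region)
        # the two positional frequencies should sum to the total frequency
        if abs(f12_23 + f31 - total) > sum_tol:
            sum_mismatch.append(region)
    return pct_mismatch, sum_mismatch


print(check_table(TABLE_5))  # ([], ['C2'])
```

With the printed values every percentage is reproduced, while the column-sum check fails only for C2 (0.081 + 0.047 = 0.128 rather than 0.122), so one of these three C2 values may contain a misprint.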

Hypervariability of an amino acid sequence may be caused by two main mechanisms: the variable region of a protein may be coded by a nucleotide sequence containing many mutational hot spots, or the regions flanking it may be subject to stronger negative selection. Hence, the hypervariability of a given HIV1 gp120 region may or may not be enhanced by guanine usage bias (as it is in the case of HIV1 V3 and V4).

The probability that a G to A mutation in target dinucleotides is nonsynonymous is practically the same for the C1, V1-V2 and C2 regions, but the frequency of target dinucleotides situated in the first and second and in the second and third codon positions is higher in the V1-V2 region than in C1 and C2 (see Tab. 5). We hypothesize that hypermutability may be enhanced by both a bias in target dinucleotide distribution between codon positions and an increased level of target dinucleotides able to mutate nonsynonymously (as in the case of the V3 region), only by the bias (as in the case of the V4 region), or only by increased levels of target dinucleotides (as in the case of the V1-V2 region).
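The three scenarios can be illustrated by comparing each variable region with its two flanking conserved regions in Table 5. In this sketch, which is our illustration rather than a procedure used in this study, "bias" means that the percent of nonsynonymously mutable targets exceeds both flanks by more than a hypothetical margin of 2 percentage points, and "level" means that the frequency of targets in positions 1-2 and 2-3 exceeds both flanks.

```python
# Two columns of Table 5: region -> (frequency in positions 1-2 and 2-3, % nonsynonymous)
TABLE_5 = {
    "C1": (0.092, 65.25), "V1-V2": (0.112, 66.67), "C2": (0.081, 66.39),
    "V3": (0.150, 84.27), "C3": (0.127, 76.97), "V4": (0.105, 90.52),
    "C4": (0.117, 80.14), "V5": (0.131, 75.29), "C5": (0.176, 74.26),
}
ORDER = ["C1", "V1-V2", "C2", "V3", "C3", "V4", "C4", "V5", "C5"]
LABELS = {
    (True, True): "bias and level",
    (True, False): "bias only",
    (False, True): "level only",
    (False, False): "neither",
}


def classify(region, margin=2.0):
    """Compare a variable region with its two flanking conserved regions.

    margin (percentage points) is a hypothetical threshold, not taken from the study.
    """
    if not region.startswith("V"):
        raise ValueError("only variable regions have two conserved flanks here")
    i = ORDER.index(region)
    flanks = (ORDER[i - 1], ORDER[i + 1])
    level, pct = TABLE_5[region]
    has_bias = all(pct - TABLE_5[f][1] > margin for f in flanks)
    has_level = all(level > TABLE_5[f][0] for f in flanks)
    return LABELS[(has_bias, has_level)]


for region in ("V1-V2", "V3", "V4", "V5"):
    print(region, classify(region))
```

With these settings V3 is classified as "bias and level", V4 as "bias only" and V1-V2 as "level only", in line with the scenarios above; V5 falls into neither group when compared with both C4 and C5.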

The V5 region is relatively short; in it, the bias in target dinucleotide distribution between codon positions is lower, but the level of target dinucleotides is higher than in the C4 region. The C5 region has the highest level of target dinucleotides. According to our approach, nonsynonymous G to A mutations should occur frequently in this region.

In fact, the consequences of G to A hypermutation in HIV genes caused by APOBEC enzymes are not limited to accelerated immune escape. For example, some of these mutations in V3 coding regions may cause a switch in co-receptor usage. They can also lead to drug resistance [Berkhout and de Ronde, 2004]. Mutations beneficial for the virus may be selected during the course of the disease, but to be selected these mutations first have to occur. Synonymous nucleotide substitutions leave the amino acid sequence of a protein unchanged, so they cannot alter its function through amino acid replacement. That is why our attention was focused on the probability of nonsynonymous G to A mutation.

From this point of view, it seems questionable that activators of APOBEC have been proposed as a new remedy against HIV infection [Izumi et al., 2008]. If G to A transition rates increase, the number of viable viral particles will decrease, but at the same time the number of immune escape variants (including variants with a mutated V3 loop) and drug-resistant variants will grow [Pillai et al., 2008]. In our opinion, the use of APOBEC activators may help the virus escape the immune response (HIV1 in particular, due to its biased guanine usage in the region coding for the V3 loop) instead of blocking its replication.

G to A hypermutation is also thought to be caused by HIV reverse transcriptase itself [Berkhout and de Ronde, 2004]. However, the conclusions of the present study do not depend on the cause of this hypermutation, because we analyzed the probability that a G to A mutation is nonsynonymous in the regions coding for the third variable regions of the HIV1 and HIV2 envelope glycoproteins gp120 (gp105).



References