Uveal melanoma is the most common primary intraocular tumour. Approximately 50% of patients die of metastases, primarily in the liver. The propensity to form metastases has been shown to be significantly higher in patients whose tumour cells have lost one entire copy of chromosome 3 (monosomy 3) [1]. The reasons for metastasis formation remain to be investigated at the genomic and gene expression levels. Several explanations related to an anomaly of chromosome 3 come to mind:
The first explanation is difficult to investigate further, because the expression levels of most genes on chromosome 3 that are not involved in metastatic disease are also reduced in tumours with monosomy 3. Here, we present a method by which the other explanation can be tested. Let n be the number of genes on a chromosome whose positions relative to one another are known well enough to determine the order in which they appear on the chromosome. Suppose that all n genes are tested for expression under certain conditions, and that m of the n genes are measured as unexpressed. Lack of expression can be traced either to inactivity of the gene or to the absence of the gene from the current sample. If a stretch of DNA is deleted, a number of consecutive genes are rendered unexpressed. Assuming random ordering of expressed and unexpressed genes, the distribution of the length of the longest run of unexpressed genes can be calculated, and the longest run observed along the chromosome of interest can be tested against this distribution.
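The null model of random ordering can be illustrated by simulation. The Python sketch below is a Monte Carlo approximation, not the exact method of this section. The function names `longest_run` and `permutation_pvalue`, and the encoding of an unexpressed gene as `True`, are hypothetical choices for this example. It shuffles a fixed set of m unexpressed labels among n positions and estimates how often the longest run of unexpressed genes reaches a given length.

```python
import random


def longest_run(labels):
    """Length of the longest run of True values (True = unexpressed gene)."""
    best = current = 0
    for unexpressed in labels:
        current = current + 1 if unexpressed else 0
        best = max(best, current)
    return best


def permutation_pvalue(observed_run, n, m, n_sim=10_000, seed=0):
    """Monte Carlo estimate of P(longest run >= observed_run) when the
    m unexpressed genes are placed uniformly at random among n positions."""
    rng = random.Random(seed)
    labels = [True] * m + [False] * (n - m)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(labels)  # one random ordering under the null hypothesis
        if longest_run(labels) >= observed_run:
            hits += 1
    return hits / n_sim


# Hypothetical profile: 20 genes, 4 unexpressed, all four in one block
print(permutation_pvalue(4, 20, 4))
```

For this example the exact value under the model is 17/4845: the block of four can start at any of 17 positions among the C(20, 4) = 4845 equally likely placements.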
The problem can be reduced to a "consecutive k-out-of-n" problem, a variant of which has been treated in the literature. Godbole (1990) gives a solution to the following problem: consider a sequence of n independent Bernoulli trials, each of which yields a zero with probability p, and let k be the length of a run of zeros within the sequence. Let further $N_{n,k}$ denote the number of non-overlapping runs of k consecutive zeros. The probability distribution $P_G$ of the number $N_{n,k}$ of runs of length k within n trials can be expressed through the probability function:
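The longest run has length at least k exactly when $N_{n,k} \ge 1$. The probability that the longest run has length exactly k is therefore

$$P(\text{longest run} = k) = \bigl[1 - P_G(N_{n,k}=0)\bigr] - \bigl[1 - P_G(N_{n,k+1}=0)\bigr] = P_G(N_{n,k+1}=0) - P_G(N_{n,k}=0).$$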
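The quantity $P_G(N_{n,k}=0)$ can also be evaluated numerically without Godbole's closed form. The sketch below swaps in a standard Markov-chain recursion over the length of the current trailing run of zeros (an assumption of this example, not the formula cited above) and then forms the difference for the longest run. The function names are hypothetical.

```python
def prob_no_run(n, k, p):
    """P_G(N_{n,k} = 0): probability that n independent trials, each giving a
    zero (unexpressed gene) with probability p, contain no run of k zeros."""
    if k < 1:
        return 0.0  # a run of length 0 is always present
    # state[j]: probability of no k-run so far with a trailing run of j zeros
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        total = sum(state)
        # a one resets the trailing run; a zero extends it (reaching k is dropped)
        state = [(1 - p) * total] + [p * state[j - 1] for j in range(1, k)]
    return sum(state)


def prob_longest_run_equals(n, k, p):
    """P(longest run of zeros = k) = P_G(N_{n,k+1}=0) - P_G(N_{n,k}=0)."""
    return prob_no_run(n, k + 1, p) - prob_no_run(n, k, p)


# Example: n = 50 genes, p = m/n = 0.2
print(prob_longest_run_equals(50, 4, 0.2))
```

The recursion keeps only k numbers per step, so its cost grows like n times k.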
In the present case, we do not know a probability of non-expression; instead, the marginal number m of unexpressed genes is given, so the probability of the longest run should be calculated conditional on this total. Under the Bernoulli model above, the number of unexpressed genes will tend to exceed the marginal number: a sequence containing a run of length k is expected to contain approximately $k + p(n-k)$ unexpressed genes, which is larger than m when p is estimated as $m/n$. The above formula can therefore be used as an upper bound for the probability.
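To make the comparison explicit, substituting $p = m/n$ into the approximate expected count gives

$$k + \frac{m}{n}(n-k) - m = k\left(1 - \frac{m}{n}\right) > 0 \qquad \text{for } k \ge 1,\ m < n,$$

so the Bernoulli model overstates the number of unexpressed genes by about $k(1 - m/n)$.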
An exact expression can be formulated as well. Given a total of m unexpressed genes in a sequence of length n, the length k of the longest run of unexpressed genes is then distributed according to a probability distribution that can be given recursively as:
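The exact conditional distribution can also be obtained by direct counting. The Python sketch below uses an inclusion-exclusion count over the gaps between expressed genes, a swapped-in counting argument rather than the recursion referred to here; the function names are hypothetical. It also assumes that a critical value is the smallest k with $P(\text{longest run} \ge k) \le \alpha$. Under that reading, the sketch returns the values marked e for n = 20 in both tables and for n = 50 with m/n = 0.2.

```python
from fractions import Fraction
from math import comb


def count_without_long_run(n, m, k):
    """Number of orderings of m unexpressed and n - m expressed genes in which
    every run of unexpressed genes is shorter than k (inclusion-exclusion)."""
    if not 0 <= m <= n or k < 1:
        raise ValueError("need 0 <= m <= n and k >= 1")
    gaps = n - m + 1  # slots before, between and after the expressed genes
    total = 0
    for j in range(gaps + 1):
        remaining = m - j * k  # j chosen slots are forced to hold >= k genes
        if remaining < 0:
            break
        total += (-1) ** j * comb(gaps, j) * comb(remaining + gaps - 1, gaps - 1)
    return total


def tail_probability(n, m, k):
    """Exact P(longest run >= k) when the m unexpressed genes are randomly ordered."""
    if k < 1:
        return Fraction(1)
    return 1 - Fraction(count_without_long_run(n, m, k), comb(n, m))


def critical_value(n, m, alpha):
    """Smallest k with P(longest run >= k) <= alpha (assumed convention)."""
    k = 1
    while tail_probability(n, m, k) > alpha:
        k += 1
    return k


print(critical_value(20, 4, 0.05), critical_value(50, 10, 0.01))  # prints 4 5
```

Each probability needs only about m/k + 1 binomial terms, evaluated with exact rational arithmetic.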
Computing this distribution is cumbersome, because the number of terms to be evaluated grows in proportion to $n^3$. For larger values of n, $P_G(N_{n,k+1}=0) - P_G(N_{n,k}=0)$ can therefore be used as an upper bound.
The present problem is a special case of the distribution of randomly occurring words in a sequence of letters from a given alphabet, a problem that arises in the analysis of DNA sequences. It has been shown [3], via the Stein-Chen method, that the distribution of $N_{n,k}$ is bounded from above by a Poisson distribution with mean $np^k$. This leads to the approximation $1 - \exp(-np^k)$ for the probability of at least one run of length k.
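The Poisson approximation as stated above turns into a critical value by increasing k until the approximate tail probability falls to the chosen level. The Python sketch below assumes the same convention as before (smallest k with tail probability at most α); the function names are hypothetical.

```python
import math


def poisson_tail(n, p, k):
    """Approximate probability of at least one run of k zeros: 1 - exp(-n p^k)."""
    return 1.0 - math.exp(-n * p ** k)


def poisson_critical_value(n, p, alpha):
    """Smallest k with 1 - exp(-n p^k) <= alpha (assumed convention)."""
    if not 0 <= p < 1:
        raise ValueError("need 0 <= p < 1")
    k = 1
    while poisson_tail(n, p, k) > alpha:
        k += 1
    return k


print(poisson_critical_value(200, 0.3, 0.05))  # prints 7
```

Variants of the Poisson mean (for example, counting only runs preceded by an expressed gene, which multiplies the mean by roughly 1 − p) shift these values, so results need not coincide exactly with tabulated entries.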
These considerations allow us to construct a test for deletion or methylation of a contiguous stretch of genes. The test is conservative where the asymptotic approximations are used, but its actual level is not far from the nominal level when k is small compared with n. It is not necessary to give the power of the test, since the length of a nonexpressed stretch, when present, is not governed by chance.
Table 1: Critical values for a level-0.05 test for a run of nonexpressed genes among a total of n genes in a specified order, under the assumption of random ordering. The letter after each value indicates how it was determined: e, by the exact formula; g, by Godbole's formula with p = m/n; p, by Godbole's Poisson approximation with p = m/n.
| α=0.05 | m/n=0.2 | m/n=0.3 | m/n=0.4 | m/n=0.5 |
|---|---|---|---|---|
| n=20 | 4e | 4e | 5e | 7e |
| n=50 | 4e | 6g | 7g | 9g |
| n=100 | 5g | 6g | 8g | 10g |
| n=200 | 5p | 7p | 9p | 11p |
| n=500 | 6p | 8p | 10p | 13p |
Table 2: Critical values for a level-0.01 test for a run of nonexpressed genes among a total of n genes in a specified order, under the assumption of random ordering; letters as in Table 1.
| α=0.01 | m/n=0.2 | m/n=0.3 | m/n=0.4 | m/n=0.5 |
|---|---|---|---|---|
| n=20 | 4e | 5e | 6e | 8e |
| n=50 | 5e | 7g | 9g | 11g |
| n=100 | 5g | 8g | 10g | 12g |
| n=200 | 6p | 8p | 10p | 13p |
| n=500 | 7p | 9p | 12p | 15p |
If r tumour samples are available, the runs can be modelled as independent under the null hypothesis. This allows an overall level $\alpha^* = 0.01$ or $0.05$ to be chosen and the per-sample level $\alpha$ to be set such that $1 - (1-\alpha)^r = \alpha^*$. From there, one can proceed to check for longer runs that match across the different expression patterns. Contrasting these samples with uveal melanomas with disomy 3 will allow the hypothesis to be specified more precisely and further increase the power of the test.
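Solving $1 - (1-\alpha)^r = \alpha^*$ for the per-sample level gives $\alpha = 1 - (1-\alpha^*)^{1/r}$, which is the Šidák correction. A minimal Python sketch, assuming independence of the r samples as stated above (the function name is hypothetical):

```python
def per_sample_level(overall_alpha, r):
    """Per-sample level alpha with 1 - (1 - alpha)**r == overall_alpha,
    assuming the r samples are independent under the null hypothesis."""
    if not 0 < overall_alpha < 1 or r < 1:
        raise ValueError("need 0 < overall_alpha < 1 and r >= 1")
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / r)


for r in (1, 5, 10):
    print(r, round(per_sample_level(0.05, r), 5))
```

The resulting per-sample level is slightly larger than the Bonferroni level $\alpha^*/r$, because it uses the independence assumption.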
The total length n of the sequence is the number of genes in the intersection of two sets: the sequenced and localised genes on the chromosome under study, and the genes contained in the design of the DNA microarray used.
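The ordered gene sequence and its length n can be assembled with a set intersection followed by a sort on chromosomal position. The Python sketch below assumes, as a hypothetical input format, a mapping from gene identifier to base-pair coordinate on the chromosome under study and a list of genes on the array; the gene names are placeholders.

```python
def ordered_test_genes(chromosome_positions, array_genes):
    """Genes that are both localised on the chromosome and present on the
    array, sorted by chromosomal position; n is the length of the result."""
    shared = chromosome_positions.keys() & set(array_genes)
    return sorted(shared, key=lambda gene: chromosome_positions[gene])


# Hypothetical positions (bp) and array design
positions = {"GENE_A": 1_200_000, "GENE_B": 350_000, "GENE_C": 2_900_000}
ordered = ordered_test_genes(positions, ["GENE_B", "GENE_C", "GENE_D"])
print(ordered, len(ordered))
```

GENE_A is dropped because it is not on the array, and GENE_D because it has no known position, so n = 2 here.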