Volume 7

Special Issue
BGRS 2006



In Silico Biology 7, 0041 (2007); ©2007, Bioinformation Systems e.V.  



Quality assessment of the Affymetrix U133A&B probesets by target sequence mapping and expression data analysis

Yuriy L. Orlov1, Jiangtao Zhou1, Leonard L. Lipovich1, Atif Shahab2 and Vladimir A. Kuznetsov1*

1 Genome Institute of Singapore, 60 Biopolis str., Genome, Singapore, 138672
2 Bioinformatics Institute, Singapore, 30 Biopolis Street, Matrix, Singapore 138671

* Corresponding author
   Email: kuznetsov@gis.a-star.edu.sg


Edited by S. Rodin (guest editor) and N. Kolchanov; received March 15, 2006; revised and accepted May 04, 2007; published August 19, 2007


Abstract

Careful analysis of microarray probe design should be an obligatory component of the MicroArray Quality Control (MAQC) project [Patterson et al., 2006; et al., 2006], initiated by the FDA (USA) to provide quality control tools to researchers of gene expression profiles and to translate microarray technology from bench to bedside. The identification and filtering of unreliable probesets are important preprocessing steps before the analysis of microarray data. These steps may substantially improve the selection of differentially expressed genes, gene clustering and the construction of co-regulatory expression networks. We revised the genome localization of the initial (target) probe sequences of the Affymetrix U133A&B GeneChip and evaluated the impact of erroneous and poorly annotated target sequences on the quality of gene expression data. We found that about 25% of Affymetrix target sequences overlap with interspersed repeats, which could cause cross-hybridization effects. In total, discrepancies in target sequence annotation affect up to ~30% of the 44692 Affymetrix probesets. We introduce a novel quality control algorithm based on mapping target sequences onto the genome and on analysis of GeneChip expression data. To validate the quality of probesets, we used expression data from large, clinically and genetically distinct groups of breast cancers (249 samples). For the first time, we quantitatively evaluated the effect of repeats and other sources of inadequate probe design on the specificity, reliability and discrimination ability of Affymetrix probesets. We propose that only the functionally reliable Affymetrix probesets that passed our quality control algorithm (~86%) should be used for gene expression analysis. The target sequence annotation and filtering results are available upon request.


Keywords: U133 microarray, target sequences, noise signals, cross-hybridization, human genome, gene expression, sense-antisense gene pairs, interspersed repeats, breast cancer
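
The repeat-overlap check mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes that each target sequence has already been mapped to a single genomic interval and that repeat annotations are available as intervals per chromosome. The function names, the probeset identifiers and the `max_fraction` threshold are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so bases are not counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_fraction(target: Interval, repeats: List[Interval]) -> float:
    """Fraction of the target interval covered by repeat intervals."""
    t_start, t_end = target
    length = t_end - t_start
    if length <= 0:
        return 0.0
    covered = 0
    for r_start, r_end in merge_intervals(repeats):
        covered += max(0, min(t_end, r_end) - max(t_start, r_start))
    return covered / length


def flag_probesets(
    targets: Dict[str, Tuple[str, int, int]],
    repeats_by_chrom: Dict[str, List[Interval]],
    max_fraction: float = 0.0,  # assumed threshold, not taken from the article
) -> List[str]:
    """Return probeset IDs whose repeat coverage exceeds max_fraction."""
    flagged = []
    for probeset_id, (chrom, start, end) in targets.items():
        frac = repeat_fraction((start, end), repeats_by_chrom.get(chrom, []))
        if frac > max_fraction:
            flagged.append(probeset_id)
    return sorted(flagged)


# Hypothetical probesets and repeat annotations, for illustration only.
targets = {
    "ps_A": ("chr1", 100, 200),
    "ps_B": ("chr1", 500, 600),
    "ps_C": ("chr2", 0, 100),
}
repeats_by_chrom = {"chr1": [(150, 180), (170, 220)], "chr2": []}
flagged = flag_probesets(targets, repeats_by_chrom)
```

In this toy input only `ps_A` overlaps a repeat. A real filter would also have to handle target sequences that map to several genomic locations or to none, which this sketch ignores.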