In Silico Biology 5, 0052 (2005); ©2005, Bioinformation Systems e.V.  

Comparative analysis of methodologies for the detection of horizontally transferred genes: a reassessment of first-order Markov models

Diego Q. Cortez, Antonio Lazcano and Arturo Becerra*

Facultad de Ciencias, UNAM, Apdo. Postal 70-407 Cd. Universitaria, 04510, Mexico D.F., Mexico

* Corresponding author

Edited by J. Collado-Vides; received June 02, 2005; revised October 20, 2005; accepted October 25, 2005; published November 12, 2005


With the advent of larger genome databases, the detection of horizontal gene transfer (HGT) events has become an increasingly important issue. Here we present a simple theoretical analysis based on the in silico artificial addition of known foreign genes from different prokaryotic groups into the genome of Escherichia coli K12 MG1655. Using this dataset as a control, we have tested the efficiency of four methodologies commonly employed to detect horizontally transferred genes (HTG), which are based on (a) the codon adaptation index, codon usage, and GC percentage (CAI/GC); (b) a distributional profile (DP) approach based on gene searches in the genomes of closely related organisms; (c) a Bayesian model (BM); and (d) a first-order Markov model (MM). All methods exhibit limitations although, as shown here, the BM and the MM are better approximations. Moreover, the MM detects transferred genes more accurately when genes from closely related organisms are evaluated. The application of the MM to detect recently transferred genes in the genomes of E. coli strains K12 MG1655 and O157 EDL933 and of Salmonella typhimurium shows that these organisms have acquired a rather significant number of HTG, most of which appear to be pseudogenes. Only a few of the sequences that have undergone HGT appear to have well-defined functions; these may be involved in the organism's adaptation.
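As a rough illustration of the idea behind a first-order Markov model, the sketch below estimates nucleotide transition probabilities from a set of genes assumed to be native and scores a candidate gene by its mean log-likelihood per transition. It is not the authors' implementation: the nucleotide-level alphabet, the pseudocount, the function names, and the toy sequences are all assumptions made for this example, and the abstract does not specify how the model was parameterized or which threshold was applied.

```python
import math

ALPHABET = "ACGT"  # assumption: the model is built over single nucleotides


def train_first_order(sequences, pseudocount=1.0):
    """Estimate P(next base | previous base) from reference sequences.

    The pseudocount (an assumption of this sketch) keeps unseen
    transitions from receiving zero probability.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in sequences:
        seq = seq.upper()
        for prev, nxt in zip(seq, seq[1:]):
            # Skip ambiguous symbols such as N.
            if prev in counts and nxt in counts[prev]:
                counts[prev][nxt] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs


def mean_log_likelihood(seq, probs):
    """Average log-probability per transition of seq under the model.

    Lower values mean the sequence looks less like the reference set.
    Returns NaN when the sequence has no scorable transition.
    """
    seq = seq.upper()
    pairs = [(p, n) for p, n in zip(seq, seq[1:])
             if p in probs and n in probs[p]]
    if not pairs:
        return float("nan")
    return sum(math.log(probs[p][n]) for p, n in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    native_genes = ["ATATATATAT" * 5]
    model = train_first_order(native_genes)
    for name, gene in [("native-like", "ATATAT"), ("atypical", "GGGGGG")]:
        print(name, round(mean_log_likelihood(gene, model), 3))
```

In this toy setting the atypical sequence receives the lower score. In practice, a candidate would be flagged as a possible HTG only when its score falls below a cutoff calibrated on the host genome, and that calibration is outside the scope of this sketch.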

Keywords: horizontally transferred gene, methodologies to detect HTG, first-order Markov model