In Silico Biology 7, 0015 (2007); ©2007, Bioinformation Systems e.V.  

Deep metazoan phylogeny

Daniel Gerlach1,2, Matthias Wolf1*, Thomas Dandekar1, Tobias Müller1, Andreas Pokorny1 and Sven Rahmann3

1 Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
2 Department of Genetic Medicine and Development, University of Geneva Medical School, 1 rue Michel-Servet, CH-1211 Geneva, Switzerland
3 Junior research group Computational Methods for Emerging Technologies (COMET); Genome Informatics, Faculty of Technology, Bielefeld University, D-33594 Bielefeld, Germany

* Corresponding author


Edited by E. Wingender; received November 11, 2006; revised February 14, 2007; accepted February 18, 2007; published February 24, 2007


We reconstructed a robust phylogenetic tree of the Metazoa, consisting of almost 1,500 taxa, by profile neighbor joining (PNJ), an automated computational method that inherits the efficiency of the neighbor joining algorithm. This tree supports the one proposed in the latest review on metazoan phylogeny. Our main goal is not to discuss aspects of the phylogeny itself, but rather to point out that PNJ can be a valuable tool when the basal branching pattern of a large phylogenetic tree must be estimated, a setting in which traditional methods would be computationally impractical.

Keywords: 18S rRNA, animals, Articulata, bootstrap, Coelomata, Deuterostomia, Ecdysozoa, large tree, Lophotrochozoa, Metazoa, phylogenetics, phylogeny, Platyzoa, PNJ, probabilistic sequence profile, ProfDist, profile neighbor-joining, supertree method, systematics


Reconstructing the deep metazoan phylogeny remains a challenging task for standard phylogenetic methods, mainly because of the large number of species to consider. The most recent comprehensive review, by Halanych [1], delineates in detail how the view of metazoan evolution has changed over the past decades towards the now widely accepted so-called modern synthesis. We should point out that this is essentially a hand-crafted phylogeny without any calculated support, derived by merging the results of small-scale studies that used different methods (neighbor joining (NJ), maximum parsimony (MP), maximum likelihood (ML), Bayesian, or supertree approaches) and that, in turn, each drew on morphological or molecular data from only a few representative taxa.

This motivates the question of whether the modern synthesis can be derived from large-scale datasets (>1,000 sequences), e.g., from the metazoan 18S (small subunit) rRNA genes, by an essentially automatic computational method. The large number of species makes the most accurate methods, i.e., maximum likelihood or Bayesian methods, computationally impractical, particularly if a bootstrap analysis is conducted. In addition, these methods cannot make efficient use of prior information concerning known monophyletic taxa.

On the positive side, about 35 subclades of the deep metazoan phylogeny, each containing up to several hundred taxa, are well known and also well supported by bootstrap analysis, so the main challenge is to determine the basal branching pattern of the tree. Ideally, therefore, we want a computationally efficient method that can use this prior knowledge and that produces accurate and robust trees.

We recently developed the profile neighbor-joining method (PNJ [2, 3]) and its software implementation ProfDist [4], which fulfills all of the above desiderata by using sequence profiles (probabilistic sequence models) to represent known subclades. The method combines the NJ algorithm with profile-based maximum likelihood (or, alternatively, LogDet) distance estimation. It has so far been validated in a detailed study on the Chlorophyceae [2], where it produced both robust and accurate trees. The basal branching pattern of almost 6,000 eukaryotic 18S rRNA genes in nine monophyletic groups was estimated in [3]. One of these nine subclades represents the Metazoa, whose substructure was not considered further. These results and the recent review on the challenges of metazoan phylogeny by Halanych [1] suggest reconstructing the basal branching pattern of the metazoan rRNA genes and comparing the result to the hand-crafted modern synthesis.
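To make the notion of a sequence profile concrete, the following is a minimal Python sketch, not the ProfDist implementation. It assumes that a profile is simply the vector of relative nucleotide frequencies in each alignment column, that gaps and ambiguity codes are ignored, and that all sequences of a subclade are weighted equally. The function name build_profile and the toy sequences are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def build_profile(aligned_seqs):
    """Return one dict of relative nucleotide frequencies per alignment column.

    Assumption: equal weighting of sequences; characters outside ALPHABET
    (gaps, ambiguity codes) are skipped.
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    profile = []
    for col in range(length):
        counts = Counter(
            seq[col].upper() for seq in aligned_seqs if seq[col].upper() in ALPHABET
        )
        total = sum(counts.values())
        if total == 0:
            # Column contains only gaps or ambiguity codes.
            profile.append({nt: 0.0 for nt in ALPHABET})
        else:
            profile.append({nt: counts[nt] / total for nt in ALPHABET})
    return profile


# Toy subclade of three aligned sequences (hypothetical data).
profile = build_profile(["ACGT", "ACGA", "AC-T"])
```

In this toy example the first column is represented by A with frequency 1, whereas the last column is a mixture of T and A; a whole subclade is thus summarized by a single probabilistic "sequence".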


From the European ribosomal RNA database [5] we obtained all 1,766 available metazoan 18S rRNA gene sequences in pre-aligned EMBL format (afterwards hand-corrected), as well as additional information about the taxonomic lineage of each sequence. In contrast to the global eukaryotic tree derived in [3], we sub-selected metazoan taxa to achieve a compromise between richness in taxon sampling and robustness, removing potentially problematic taxa as follows.

To avoid long-branch attraction (LBA), all long branches were first removed and then added back one by one, beginning with the shortest. This process was continued until a newly added lineage attracted a previously positioned lineage in such a way that both lineages were attracted by the long-branching outgroup. Additionally, trees were reconstructed by taxon deletion experiments, i.e., by using only slowly evolving sequences for Nematoda and Platyhelminthes in order to overcome LBA of these groups [6].

Using 36 pre-defined monophyletic clades (1,269 sequences) according to the review of Halanych [1] and their deduced sequence profiles, we ran ProfDist 0.9.5 with the iterative version of the PNJ algorithm. We used a minimal bootstrap confidence threshold of 51% that an estimated monophyletic group had to achieve before it was considered trustworthy and represented as a superprofile in the next iteration. This procedure was repeated, with bootstrap values increasing from iteration to iteration, until no further profiles were generated. Moreover, an identity threshold of 90% was used for immediate profile formation in the first iteration.
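The control flow of such an iterative scheme can be sketched as follows. This is only an illustration under stated assumptions and not the ProfDist code: the tree estimation with bootstrap support is hidden behind a caller-supplied function (here the hypothetical supported_clades), a superprofile is represented merely by the pooled list of member sequences, and overlapping candidate clades are resolved by taking the best-supported one first. All names and the toy support values are invented for the example.

```python
def iterate_superprofiles(groups, supported_clades, threshold=51.0):
    """Merge groups into superprofiles until no clade reaches the threshold.

    groups: dict mapping a group name to a list of member sequence ids.
    supported_clades: callable(groups) -> iterable of (set of names, support %),
        standing in for one round of tree estimation plus bootstrap analysis.
    Returns the final groups and the number of merging iterations.
    """
    groups = dict(groups)
    iterations = 0
    while True:
        n_groups = len(groups)
        candidates = [
            (frozenset(names), support)
            for names, support in supported_clades(groups)
            if support >= threshold and 1 < len(names) < n_groups
        ]
        # Assumption: prefer better-supported clades when candidates overlap.
        candidates.sort(key=lambda item: item[1], reverse=True)
        used = set()
        for names, _support in candidates:
            if used & names:
                continue
            used |= names
            ordered = sorted(names)
            members = [m for name in ordered for m in groups.pop(name)]
            groups["+".join(ordered)] = members
        if not used:
            return groups, iterations
        iterations += 1


def toy_estimator(groups):
    """Hypothetical stand-in returning fixed clades and support values."""
    names = set(groups)
    clades = []
    if {"A", "B"} <= names:
        clades.append(({"A", "B"}, 94.0))
    if {"C", "D"} <= names:
        clades.append(({"C", "D"}, 40.0))
    if {"A+B", "C"} <= names:
        clades.append(({"A+B", "C"}, 87.0))
    return clades


start = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1"], "D": ["d1"]}
final_groups, n_iterations = iterate_superprofiles(start, toy_estimator)
```

In the toy run, A and B are merged in the first iteration, the resulting superprofile is joined with C in the second, and the loop stops in the third round because the only remaining candidate stays below the 51% threshold.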

The substitution model was estimated from the dataset and is available on request. Evolutionary distances were estimated with the LogDet transformation, which deals gracefully with possible lineage-specific changes in the substitution model [7, 8, 9].
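For orientation, a LogDet-type distance between two aligned sequences can be computed as in the following sketch. It uses one common textbook form, d = -(1/4) [ln det F - (1/2) ln(prod_i f_x(i) f_y(i))], where F is the 4 x 4 matrix of relative frequencies of aligned nucleotide pairs and f_x, f_y are its row and column sums. This is an assumption for illustration: ProfDist estimates distances between profiles, and its exact formula and normalization may differ. The function names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def logdet_distance(seq_x, seq_y):
    """LogDet distance of two aligned sequences (gapped sites are skipped)."""
    index = {nt: i for i, nt in enumerate(ALPHABET)}
    counts = [[0.0] * 4 for _ in range(4)]
    total = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in index and b in index:
            counts[index[a]][index[b]] += 1.0
            total += 1
    if total == 0:
        raise ValueError("no comparable sites")

    freq = [[c / total for c in row] for row in counts]
    row_sums = [sum(row) for row in freq]
    col_sums = [sum(freq[i][j] for i in range(4)) for j in range(4)]

    det_f = determinant(freq)
    marginals = math.prod(row_sums) * math.prod(col_sums)
    if det_f <= 0.0 or marginals <= 0.0:
        raise ValueError("LogDet distance is undefined for these sequences")
    return -(math.log(det_f) - 0.5 * math.log(marginals)) / 4.0


x = "ACGTACGTACGTACGT"
y = "ACGTACGTACGTACGA"
d_xy = logdet_distance(x, y)
```

Because the distance depends only on the determinant of the pair-frequency matrix and on the marginal compositions, it does not require the two lineages to share the same base composition, which is the property referred to above.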

Including the bootstrap analysis based on 1,000 pseudo-replicates, the runtime of the whole reconstruction was about 9 minutes on a single-processor machine (Pentium 4®, 3 GHz, 1024 MB RAM).
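A bootstrap pseudo-replicate is conventionally obtained by resampling alignment columns with replacement. The sketch below shows this standard procedure for a plain sequence alignment; it is an assumption that this corresponds to what happens inside ProfDist, where the resampling concerns profiles. The function name and parameters are hypothetical.

```python
import random


def bootstrap_replicates(alignment, n_replicates=1000, seed=None):
    """Yield pseudo-replicates built by sampling columns with replacement.

    alignment: list of equally long, aligned sequence strings.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")

    rng = random.Random(seed)
    for _ in range(n_replicates):
        # Draw as many column indices as the alignment has columns.
        columns = [rng.randrange(length) for _ in range(length)]
        yield ["".join(seq[c] for c in columns) for seq in alignment]


toy_alignment = ["ACGTAC", "ACGAAC", "TCGTAC"]
replicates = list(bootstrap_replicates(toy_alignment, n_replicates=5, seed=1))
```

A tree is then estimated from each replicate, and the bootstrap value of a group is the percentage of replicate trees in which that group appears.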


The basal branching pattern of the ProfDist tree strongly supported the modern synthesis, in which the Bilateria consist of the Deuterostomia (including Hemichordata, Echinodermata, Tunicata, Cephalochordata and Craniata), the Ecdysozoa (including Kinorhyncha, Priapulida, Onychophora, Tardigrada, Nematomorpha, Nematoda, Chelicerata, Myriapoda, Crustacea, Hexapoda and Pentastomida), and the Lophotrochozoa (including Platyhelminthes, Gastrotricha, Cycliophora, Rotifera, Entoprocta, Bryozoa, Annelida, Nemertea, Sipuncula, Mollusca, Echiuridae, Siboglinidae (Vestimentifera, Pogonophora), Brachiopoda and Phoronida). There was no support for, e.g., either Coelomata or Articulata (Fig. 1).

Figure 1: Phylogenetic tree of 36 metazoan taxa (1,269 sequences) calculated with ProfDist. The tree was obtained after three iterations. Numbers at nodes indicate bootstrap values greater than 50%. Numbers in brackets denote the number of sequences in the respective subtree. Note the high bootstrap values of 94%, 87% and 84% for the Deuterostomia, Ecdysozoa and Lophotrochozoa, respectively. Platyzoa, at the base of the Lophotrochozoa, are polyphyletic (indicated by a dashed line). Color code: green, Deuterostomia; red, Ecdysozoa; blue, Lophotrochozoa. (Animal pictures © BIODIDAC).

However, the phylogenetic position of some long branches (Gnathostomulida, Dicyemida, Myxozoa, Myzostomida, Chaetognatha, and Orthonectida), which had to be excluded from the phylogenetic analysis to avoid LBA phenomena, remained unclear. These taxa clustered together at the base of the Bilateria, and it is not clear whether this is due to LBA or whether some of these lineages are indeed basal. The positions of slowly evolving sequences from Acanthocephala and Acoelomorpha corresponded well with the ones proposed by Halanych [1], albeit with low bootstrap support. Furthermore, the order of divergence of sponges, ctenophorans, cnidarians, and placozoans could not be determined beyond doubt.


The results of any phylogenetic study depend on the chosen marker and on the taxon coverage of each subclade. Here we focused on taxon-rich sampling by choosing the 18S rRNA gene as phylogenetic marker; no other marker is known in a comparable number of metazoan taxa. It is conceivable that using more or different markers and fewer taxa may give different results. The goal of this communication, however, is not to argue about the phylogeny, but to point out that the profile neighbor-joining method is a valuable tool, particularly for estimating the basal branching pattern of large phylogenetic trees. Additionally, monophyletic groups that are in question can easily be subdivided into new profiles, and a new ProfDist tree can be calculated quickly. In this way, many phylogenetic scenarios may be compared.

To summarize, this is the first study to suggest the feasibility of reconstructing a robust phylogenetic tree of the Metazoa, consisting of almost 1,500 taxa, by an efficient automated computational method. The ProfDist tree supports the manually crafted one from the latest review on metazoan phylogeny [1].


We thank Joachim Friedrich for fruitful discussions.


  1. Halanych, K. M. (2004). The new view of animal phylogeny. Annu. Rev. Ecol. Evol. Syst. 35, 229-256.

  2. Müller, T., Rahmann, S., Dandekar, T. and Wolf, M. (2004). Accurate and robust phylogeny estimation based on profile distances: a study of the Chlorophyceae (Chlorophyta). BMC Evol. Biol. 4, 20.

  3. Rahmann, S., Müller, T., Dandekar, T. and Wolf, M. (2006). Efficient and robust analysis of large phylogenetic datasets. In: Advanced Data Mining Technologies in Bioinformatics, Hsu, H. H. (ed.), Idea Group Publishing, Hershey USA, pp. 104-117.

  4. Friedrich, J., Dandekar, T., Wolf, M. and Müller, T. (2005). ProfDist: a tool for the construction of large phylogenetic trees based on profile distances. Bioinformatics 21, 2108-2109.

  5. Wuyts, J., Perrière, G. and Van De Peer, Y. (2004). The European ribosomal RNA database. Nucleic Acids Res. 32, D101-103.

  6. Aguinaldo, A. M., Turbeville, J. M., Linford, L. S., Rivera, M. C., Garey, J. R., Raff, R. A. and Lake, J. A. (1997). Evidence for a clade of nematodes, arthropods and other moulting animals. Nature 387, 489-493.

  7. Lake, J. A. (1994). Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances. Proc. Natl. Acad. Sci. USA 91, 1455-1459.

  8. Lockhart, P. J., Steel, M. A., Hendy, M. D. and Penny, D. (1994). Recovering evolutionary trees under a more realistic model of sequence evolution. Mol. Biol. Evol. 11, 605-612.

  9. Steel, M. (1994). Recovering a tree from the leaf colourations it generates under a Markov model. Appl. Math. Lett. 7, 19-24.