Assembling reads obtained by shotgun sequencing is a nontrivial task, given the genome complexity of higher organisms and the fact that sequencing itself is an error-prone chemical process. We present a current snapshot of our work on a new assembler, which is part of an interdisciplinary effort to solve problems arising in sequence assembly and sequence finishing when using shotgun sequencing.
An effective assembly method for shotgun-sequenced DNA has been developed which reduces the number of editing steps needed to reconstruct the original DNA. Our approach to assembling contigs is based on the insight that existing algorithms work sequentially on a base-oriented (and perhaps base-quality-oriented) assembly and thus do not take into account the potential wealth of information present in the original DNA trace data and in additional files generated before assembly. We have therefore developed an algorithm that constructs a multiple alignment of shotgun reads, starting with high-reliability regions (HRR) and iteratively expanding the assembly with less reliable sequences. The assembler works in conjunction with an automated finisher which can analyse problematic regions in an assembly and propose alternative base calls when needed.
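To illustrate the notion of a high-reliability region, the following minimal sketch extracts one candidate HRR from a read. It rests on an assumption not stated above: that an HRR can be approximated as the longest contiguous run of bases whose quality value meets a threshold. The function name high_reliability_region and the parameters min_quality and min_length are hypothetical, and the criteria actually used by the assembler may differ.

```python
def high_reliability_region(qualities, min_quality=30, min_length=5):
    """Return (start, end) of the longest run of bases with quality
    >= min_quality, as a half-open interval, or None if no run reaches
    min_length. Assumed definition of an HRR, for illustration only."""
    best = None   # longest run found so far, as (start, end)
    start = None  # start index of the run currently being scanned

    def consider(run_start, run_end):
        nonlocal best
        if best is None or run_end - run_start > best[1] - best[0]:
            best = (run_start, run_end)

    for i, q in enumerate(qualities):
        if q >= min_quality:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            consider(start, i)     # the run ended just before position i
            start = None
    if start is not None:
        consider(start, len(qualities))  # run extends to the end of the read

    if best is not None and best[1] - best[0] >= min_length:
        return best
    return None
```

Under this assumption, an assembly would be seeded with the intervals returned for each read, and the lower-quality flanks would be considered only in later iterations.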
The assembly is carried out in a multiphase procedure:
(1) data preprocessing;
(2) whole shotgun prefiltering for potential read pairs;
(3) systematic match inspection and quality criteria calculation;
(4) contig assembly; and
(5) contig validation.
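As an illustration of what phase (2) might look like, the sketch below prefilters a set of reads for potentially overlapping pairs by counting shared k-mers. This is a common generic technique and is swapped in here as an assumption; it is not claimed to be the filter used by the assembler. The function name candidate_read_pairs and the parameters k and min_shared are hypothetical, and reverse-complement matches are ignored for brevity.

```python
from collections import defaultdict
from itertools import combinations


def candidate_read_pairs(reads, k=8, min_shared=2):
    """Map each pair of read names to the number of distinct k-mers the
    two reads share, keeping only pairs with at least min_shared.
    Illustrative prefilter only; forward strand only."""
    # Index: k-mer -> set of reads that contain it.
    index = defaultdict(set)
    for name, seq in reads.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)

    # Count, for every pair of reads, how many distinct k-mers they share.
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}
```

Only the pairs returned by such a filter would then need to be passed on to the more expensive match inspection of phase (3).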
The assembler is currently being tested intensively at the Institute of Molecular Biotechnology (IMB) Jena sequencing centre and, although it is still under development, has already proven useful when assembling shotgun data with a high proportion of repetitive sequences. We have, for example, successfully assembled a 142 kilobase contig containing 47 spatially separated ALU sites without errors in the assembly, covering about 98% of the original target sequence using only high-quality sequence parts. A description of the latest development state of the MIRA assembler and the EdIt automatic editor can be found at the project's homepage on the Web.