Double Digest Revisited: Complexity and Approximability in the Presence of Noisy Data
Authors: Mark Cieliebak, Stephan Eidenbenz, and Gerhard J. Woeginger
Reference: Technical Report 382, ETH Zurich, Department of Computer Science, November 2002
Download: Postscript (.ps) or PDF (.pdf)
We revisit the Double Digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that double digest is strongly NP-complete, improving previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites turns out to be strongly NP-complete. In a second part, we model errors in data as they occur in real-life experiments: we propose several optimization variations of Double Digest that model partial cleavage errors, which occur for various reasons. We then show APX-completeness for most of these variations. In a third part, we investigate these variations with the additional restriction that conincident cut sites are disallowed and we show that it is NP-hard to even find feasible solutions in this case, thus making it impossible to guarantee any approximation ratio at all.

Note: electronic versions may not always correspond exactly to printed versions.