Untitled Document

We revisit the Double Digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that Double Digest is strongly NP-complete, improving upon previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites turns out to be strongly NP-complete. In a second part, we model errors in data as they occur in real-life experiments: we propose several optimization variations of Double Digest that model partial cleavage errors. We then show APX-completeness for most of these variations. In a third part, we investigate these variations with the additional restriction that conincident cut sites are disallowed, and we show that it is NP-hard to even find feasible solutions in this case, thus making it impossible to guarantee any approximation ratio at all.

	Double Digest Revisited: Complexity and Approximability in the Presence of Noisy Data

Authors:	Mark Cieliebak, Stephan Eidenbenz, and Gerhard J. Woeginger
Reference:	In "Proceedings of the 9th International Computing and Combinatorics Conference (COCOON 2003)", Montana, USA, July 2003. Springer, LNCS 2697, pp. 519-527, 2003
Download:	Postscript (.ps) or PDF (.pdf)
Abstract:	We revisit the Double Digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that Double Digest is strongly NP-complete, improving upon previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites turns out to be strongly NP-complete. In a second part, we model errors in data as they occur in real-life experiments: we propose several optimization variations of Double Digest that model partial cleavage errors. We then show APX-completeness for most of these variations. In a third part, we investigate these variations with the additional restriction that conincident cut sites are disallowed, and we show that it is NP-hard to even find feasible solutions in this case, thus making it impossible to guarantee any approximation ratio at all.
Remarks: