Untitled Document

We revisit the Double Digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that double digest is strongly NP-complete, improving previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites turns out to be strongly NP-complete. In a second part, we model errors in data as they occur in real-life experiments: we propose several optimization variations of Double Digest that model partial cleavage errors, which occur for various reasons. We then show APX-completeness for most of these variations. In a third part, we investigate these variations with the additional restriction that conincident cut sites are disallowed and we show that it is NP-hard to even find feasible solutions in this case, thus making it impossible to guarantee any approximation ratio at all.

	Double Digest Revisited: Complexity and Approximability in the Presence of Noisy Data

Authors:	Mark Cieliebak, Stephan Eidenbenz, and Gerhard J. Woeginger
Reference:	Technical Report 382, ETH Zurich, Department of Computer Science, November 2002
Download:	Postscript (.ps) or PDF (.pdf)
Abstract:	We revisit the Double Digest problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes: we first show that double digest is strongly NP-complete, improving previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites turns out to be strongly NP-complete. In a second part, we model errors in data as they occur in real-life experiments: we propose several optimization variations of Double Digest that model partial cleavage errors, which occur for various reasons. We then show APX-completeness for most of these variations. In a third part, we investigate these variations with the additional restriction that conincident cut sites are disallowed and we show that it is NP-hard to even find feasible solutions in this case, thus making it impossible to guarantee any approximation ratio at all.
Remarks: