Noisy Data Make the Partial Digest Problem NP-hard | |

Authors: | Mark Cieliebak, Stephan Eidenbenz, and Paolo Penna |

Reference: | Technical Report 381, ETH Zurich, Department of Computer Science, October 2002 |

Download: | Postscript (.ps) or PDF (.pdf) |

Abstract: |
The Partial Digest problem -- well-known for its applications
in computational biology and for the intriguingly open status of its computational
complexity -- asks for the coordinates of
n points on a line
such that the pairwise distances of the points form a given multi-set
of $n \choose 2$ distances. In an effort to model real-life data,
we study the computational complexity of a minimization version of Partial
Digest, in which only a subset of all pairwise distances is given and
the rest are lacking due to experimental errors. We show that this variation
is NP-hard to solve exactly, thus making the existence of polynomial-time
algorithms for this problem extremely unlikely. Our result answers an
open question posed by Pevzner (2000). We then study a maximization version
of Partial Digest where a superset of all pairwise distances is given,
with some additional distances due to inaccurate measurements. We show
that this maximization version is NP-hard to approximate to within a factor
of $|D|^{\frac{1}{2} -\epsilon}$ for any $\epsilon > 0$, where $|D|$
is the number of input distances, which implies that polynomial-time algorithms
cannot even guarantee to find a solution for the problem that comes close
to the optimum. Our inapproximability result is tight up to low-order
terms as we give a trivial approximation algorithm that achieves a matching
approximation ratio. Our optimization variations model two different error
types that occur in real-life data. |

Remarks: |

Note: electronic versions may not always correspond exactly to printed versions.