Noisy Data Make the Partial Digest Problem NP-hard | |
Authors: | Mark Cieliebak, Stephan Eidenbenz, and Paolo Penna |
Reference: | In "Proceedings of the 3rd Workshop on Algorithms in Bioinformatics (WABI 2003)", Budapest, Hungary, September 2003. Springer, LNBI 2812, pp. 111-123, 2003 |
Download: | Postscript (.ps) or PDF (.pdf) |
Abstract: |
The problem to find the coordinates of n points
on a line such that the pairwise distances of the points form a given
multi-set of $n \choose 2$ distances is known as Partial Digest
problem, which occurs for instance in DNA physical mapping and de novo
sequencing of proteins. Although Partial Digest was -- as a combinatorial
problem -- already proposed in the 1930's, its computational complexity
is still unknown. In an effort to model real-life data, we introduce two
optimization variations of Partial Digest that model two different error
types that occur in real-life data. First, we study the computational
complexity of a minimization version of Partial Digest in which only a
subset of all pairwise distances is given and the rest are lacking due
to experimental errors. We show that this variation is NP-hard to solve
exactly. This result answers an open question posed by Pevzner (2000).
We then study a maximization version of Partial Digest where a superset
of all pairwise distances is given, with some additional distances due
to inaccurate measurements. We show that this maximization version is
NP-hard to approximate to within a factor of $|D|^{\frac{1}{2} -\varepsilon}$
for any $\varepsilon > 0$, where $|D|$ is the number of input distances.
This inapproximability result is tight up to low-order terms as we give
a trivial approximation algorithm that achieves a matching approximation
ratio.
|
Remarks: |