Noisy Data Make the Partial Digest Problem NPhard  
Authors:  Mark Cieliebak, Stephan Eidenbenz, and Paolo Penna 
Reference:  In "Proceedings of the 3rd Workshop on Algorithms in Bioinformatics (WABI 2003)", Budapest, Hungary, September 2003. Springer, LNBI 2812, pp. 111123, 2003 
Download:  Postscript (.ps) or PDF (.pdf) 
Abstract: 
The problem to find the coordinates of n points
on a line such that the pairwise distances of the points form a given
multiset of $n \choose 2$ distances is known as Partial Digest
problem, which occurs for instance in DNA physical mapping and de novo
sequencing of proteins. Although Partial Digest was  as a combinatorial
problem  already proposed in the 1930's, its computational complexity
is still unknown. In an effort to model reallife data, we introduce two
optimization variations of Partial Digest that model two different error
types that occur in reallife data. First, we study the computational
complexity of a minimization version of Partial Digest in which only a
subset of all pairwise distances is given and the rest are lacking due
to experimental errors. We show that this variation is NPhard to solve
exactly. This result answers an open question posed by Pevzner (2000).
We then study a maximization version of Partial Digest where a superset
of all pairwise distances is given, with some additional distances due
to inaccurate measurements. We show that this maximization version is
NPhard to approximate to within a factor of $D^{\frac{1}{2} \varepsilon}$
for any $\varepsilon > 0$, where $D$ is the number of input distances.
This inapproximability result is tight up to loworder terms as we give
a trivial approximation algorithm that achieves a matching approximation
ratio.

Remarks: 