Finding Submasses in Weighted Strings with Fast Fourier Transform | |
Authors: | Nikhil Bansal, Mark Cieliebak, and Zsuzsanna Lipták |
Reference: | Discrete Applied Mathematics (DAM), Special Issue on Computational Biology, to appear. |
Download: | Postscript (.ps) or PDF (.pdf) |
Abstract: | We study the Submass Finding Problem: Given a string s over a weighted alphabet, i.e., an alphabet Σ with a weight function μ : Σ → N, we refer to a mass M
as a submass of s if s has a substring whose weights sum up to M. Now, for a set
of input masses {M_1, ..., M_k}, we want to find those M_i which are submasses of s,
and return one or all occurrences of substrings with mass M_i. We present efficient
algorithms for both the decision and the search problem. Furthermore, our approach
allows us to compute efficiently the number of different submasses of s. The main idea of our algorithms is to define appropriate polynomials such that we can determine the solution for the Submass Finding Problem from the coefficients of the product of these polynomials. We obtain very efficient running times by using Fast Fourier Transform to compute this product. Our main algorithm for the decision problem runs in time O(μ_s log μ_s), where μ_s is the total mass of string s. Employing methods for compressing sparse polynomials, this runtime can be viewed as O(σ(s) log σ(s)), where σ(s) denotes the number of different submasses of s. In this case, the runtime is independent of the size of the individual masses of characters. |
Remarks: |
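As an illustration of the polynomial idea described in the abstract, the following is a minimal Python sketch of the decision problem. It is not taken from the paper. It assumes positive integer character weights, uses NumPy's FFT routines, and encodes the prefix masses p_0 = 0, p_1, ..., p_n of s as exponents of two polynomials P(x) = Σ x^(p_i) and Q(x) = Σ x^(μ_s - p_j). Under these assumptions, the coefficient of x^(μ_s + M) in P·Q is nonzero exactly when p_i - p_j = M for some i > j, i.e. when M is a submass. The function names `submass_table` and `find_submasses` are hypothetical.

```python
import numpy as np


def submass_table(weights):
    """Return a boolean array t where t[M] is True iff M is a submass.

    `weights` lists the (positive integer) weights of the characters of s.
    """
    # Prefix masses p_0 = 0, p_1, ..., p_n; the last one is the total mass.
    prefix = np.concatenate(([0], np.cumsum(weights)))
    total = int(prefix[-1])

    # Coefficient vectors of P(x) = sum x^(p_i) and Q(x) = sum x^(total - p_j).
    p = np.zeros(total + 1)
    q = np.zeros(total + 1)
    p[prefix] = 1.0
    q[total - prefix] = 1.0

    # Multiply the polynomials via FFT, padded to a power of two that
    # can hold all 2 * total + 1 coefficients of the product.
    size = 1
    while size < 2 * total + 1:
        size *= 2
    prod = np.fft.irfft(np.fft.rfft(p, size) * np.fft.rfft(q, size), size)

    # The coefficient at index total + M counts pairs with p_i - p_j = M.
    counts = np.rint(prod[total + 1 : 2 * total + 1]).astype(int)
    table = np.zeros(total + 1, dtype=bool)
    table[1:] = counts > 0
    return table


def find_submasses(weights, masses):
    """Return those query masses that are submasses of the string."""
    table = submass_table(weights)
    return [m for m in masses if 0 < m < len(table) and table[m]]


# Example with assumed weights mu(a) = 1, mu(b) = 4 and s = "abab".
example_weights = [1, 4, 1, 4]
example_hits = find_submasses(example_weights, [2, 5, 9, 11])
example_count = int(submass_table(example_weights).sum())
```

For the example string, the substrings "ab" and "bab" have masses 5 and 9, so those two queries are reported, while 2 and 11 are not. Summing the table gives the number of different submasses. The FFT product takes O(μ_s log μ_s) time, matching the bound quoted in the abstract; the sparse-polynomial compression mentioned there is not attempted in this sketch.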