Faster Pattern Matching under Edit Distance: A Reduction to Dynamic Puzzle Matching and the Seaweed Monoid of Permutation Matrices

Bibliographic Details
Published in: 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 698-707
Main Authors: Charalampopoulos, Panagiotis; Kociumaka, Tomasz; Wellnitz, Philip
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2022

More Information
Summary: We consider the approximate pattern matching problem under the edit distance. Given a text $T$ of length $n$, a pattern $P$ of length $m$, and a threshold $k$, the task is to find the starting positions of all substrings of $T$ that can be transformed into $P$ with at most $k$ edits. More than 20 years ago, Cole and Hariharan [SODA'98, J. Comput.'02] gave an $\mathcal{O}(n+k^{4}\cdot n/m)$-time algorithm for this classic problem, and this running time has not been improved since.

Here, we present an algorithm that runs in time $\mathcal{O}(n+k^{3.5}\sqrt{\log m\log k}\cdot n/m)$, thus breaking through this long-standing barrier. In the case where $n^{1/4+\varepsilon}\leq k\leq n^{2/5-\varepsilon}$ for some arbitrarily small positive constant $\varepsilon$, our algorithm improves over the state of the art by polynomial factors: it is polynomially faster than both the algorithm of Cole and Hariharan and the classic $\mathcal{O}(kn)$-time algorithm of Landau and Vishkin [STOC'86, J. Algorithms'89].

We observe that the bottleneck case of the alternative $\mathcal{O}(n+k^{4}\cdot n/m)$-time algorithm of Charalampopoulos, Kociumaka, and Wellnitz [FOCS'20] is when the text and the pattern are (almost) periodic. Our new algorithm reduces this case to a new Dynamic Puzzle Matching problem, which we solve by building on tools developed by Tiskin [SODA'10, Algorithmica'15] for the so-called seaweed monoid of permutation matrices. Our algorithm relies only on a small set of primitive operations on strings and thus also applies to the fully compressed setting (where the text and pattern are given as straight-line programs) and to the dynamic setting (where we maintain a collection of strings under creation, splitting, and concatenation), improving over the state of the art.
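To make the problem statement in the summary concrete, here is a minimal Python sketch of the textbook $\mathcal{O}(nm)$ dynamic-programming baseline (in the style of Sellers' algorithm). It is neither the Cole and Hariharan algorithm, nor the Landau and Vishkin algorithm, nor the algorithm of this paper. The function name `approximate_match_starts` is hypothetical, and reporting starting positions by reversing both strings is an illustrative choice, not something taken from the paper.

```python
def approximate_match_starts(text: str, pattern: str, k: int) -> list[int]:
    """Return every 0-indexed start i such that some substring text[i:j]
    is within edit distance k of pattern (plain O(n*m) dynamic programming)."""
    # Reverse both strings: edit distance is unchanged by reversal, and a
    # substring of the reversed text ending at position j corresponds to a
    # substring of the original text starting at len(text) - j.
    t, p = text[::-1], pattern[::-1]
    n, m = len(t), len(p)
    # Row i = 0: the empty pattern prefix matches an empty substring anywhere.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        # cur[j] = min edit distance between p[:i] and a substring of t ending at j.
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j - 1] + (p[i - 1] != t[j - 1]),  # match or substitution
                prev[j] + 1,                           # p[i-1] left unmatched
                cur[j - 1] + 1,                        # t[j-1] left unmatched
            )
        prev = cur
    # prev now holds row m; j runs over 1..n, so starts lie in 0..n-1.
    return sorted(n - j for j in range(1, n + 1) if prev[j] <= k)


if __name__ == "__main__":
    print(approximate_match_starts("abcabd", "abd", 0))  # [3]
    print(approximate_match_starts("abcabd", "abd", 1))  # [0, 2, 3, 4]
```

With $k=1$, position 0 qualifies via the substring "ab" (one insertion), position 2 via "cabd" (one deletion), and position 4 via "bd" (one insertion). The quadratic cost of this baseline is what the $\mathcal{O}(kn)$ and $\mathcal{O}(n+\mathrm{poly}(k)\cdot n/m)$ bounds quoted above improve on.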
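The seaweed monoid mentioned in the summary can be illustrated with a naive sketch. Following Tiskin's definition as we understand it, the seaweed (distance) product of two $n\times n$ permutation matrices is the permutation matrix whose distribution matrix $S(i,j)=\#\{i'\ge i : \pi(i')<j\}$ equals the $(\min,+)$ product of the operands' distribution matrices. The cubic-time Python code is an illustration with hypothetical function names. It is not Tiskin's fast algorithm and not the Dynamic Puzzle Matching procedure of this paper.

```python
def distribution(perm: list[int]) -> list[list[int]]:
    """S[i][j] = number of rows i' >= i with perm[i'] < j, for 0 <= i, j <= n."""
    n = len(perm)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(n + 1):
            S[i][j] = S[i + 1][j] + (perm[i] < j)
    return S


def min_plus(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Naive (min,+) matrix product of two square matrices."""
    size = len(A)
    return [[min(A[i][j] + B[j][k] for j in range(size)) for k in range(size)]
            for i in range(size)]


def from_distribution(S: list[list[int]]) -> list[int]:
    """Recover perm via P(i,j) = S[i][j+1] - S[i][j] - S[i+1][j+1] + S[i+1][j]."""
    n = len(S) - 1
    perm = []
    for i in range(n):
        cols = [j for j in range(n)
                if S[i][j + 1] - S[i][j] - S[i + 1][j + 1] + S[i + 1][j] == 1]
        assert len(cols) == 1, "not the distribution matrix of a permutation"
        perm.append(cols[0])
    return perm


def seaweed_product(p: list[int], q: list[int]) -> list[int]:
    """Seaweed product of permutations given as lists (row i -> column p[i])."""
    return from_distribution(min_plus(distribution(p), distribution(q)))


if __name__ == "__main__":
    print(seaweed_product([1, 0], [1, 0]))        # [1, 0]
    print(seaweed_product([2, 1, 0], [1, 0, 2]))  # [2, 1, 0]
```

For $n=2$, the single adjacent transposition is idempotent under this product, unlike in the symmetric group, where it squares to the identity. The identity permutation acts as the neutral element, and the reversal permutation absorbs everything it is multiplied with.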
ISSN: 2575-8454
DOI: 10.1109/FOCS54457.2022.00072