Multiscale approximation methods (MAME) to locate embedded consecutive subsequences—its applications in statistical data mining and spatial statistics

In statistical data mining and spatial statistics, many problems (such as detection and clustering) can be formulated as optimization problems whose objective functions are functions of consecutive subsequences. Some examples are (1) searching for a high activity region in a Bernoulli sequence, (2)...

Full description

Saved in:
Bibliographic Details
Published inComputers & industrial engineering Vol. 43; no. 4; pp. 703 - 720
Main Author Huo, Xiaoming
Format Journal Article
LanguageEnglish
Published Seoul Elsevier Ltd 01.09.2002
Oxford Pergamon Press
New York, NY Pergamon Press Inc
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In statistical data mining and spatial statistics, many problems (such as detection and clustering) can be formulated as optimization problems whose objective functions are functions of consecutive subsequences. Some examples are (1) searching for a high activity region in a Bernoulli sequence, (2) estimating an underlying boxcar function in a time series, and (3) locating a high concentration area in a point process. A comprehensive search algorithm always ends up with a high order of computational complexity. For example, if a length- n sequence is considered, the total number of all possible consecutive subsequences is ( n+1 2 )≈n 2/2. A comprehensive search algorithm requires at least O( n 2) numerical operations. We present a multiscale-approximation-based approach. It is shown that most of the time, this method finds the exact same solution as a comprehensive search algorithm does. The derived multiscale approximation methods (MAMEs) have low complexity: for a length- n sequence, the computational complexity of an MAME can be as low as O( n). Numerical simulations verify these improvements. The MAME approach is particularly suitable for problems having large size data. One known drawback is that this method does not guarantee the exact optimal solution in every single run. However, simulations show that as long as the underlying subjects possess statistical significance, a MAME finds the optimal solution with probability almost equal to one.
Bibliography:SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
ISSN:0360-8352
1879-0550
DOI:10.1016/S0360-8352(02)00134-1