A Mapper Algorithm with Implicit Intervals and Its Optimization

The Mapper algorithm is an essential tool for visualizing complex, high-dimensional data in topological data analysis and has been widely used in biomedical research. It outputs a combinatorial graph whose structure encodes the shape of the data. However, the need for manual parameter tuning and fix...

Full description

Saved in:
Bibliographic Details
Published inJournal of computational biology Vol. 32; no. 8; pp. 781 - 796
Main Authors Tao, Yuyang, Ge, Shufei
Format Journal Article
LanguageEnglish
Published United States Mary Ann Liebert, Inc., publishers 01.08.2025
Subjects
Online AccessGet full text
ISSN1557-8666
1557-8666
DOI10.1089/cmb.2024.0919

Cover

Loading…
More Information
Summary:The Mapper algorithm is an essential tool for visualizing complex, high-dimensional data in topological data analysis and has been widely used in biomedical research. It outputs a combinatorial graph whose structure encodes the shape of the data. However, the need for manual parameter tuning and fixed (implicit) intervals, along with fixed overlapping ratios, may impede the performance of the standard Mapper algorithm. Variants of the standard Mapper algorithms have been developed to address these limitations, yet most of them still require manual tuning of parameters. Additionally, many of these variants, including the standard version found in the literature, were built within a deterministic framework and overlooked the uncertainty inherent in the data. To relax these limitations, in this work, we introduce a novel framework that implicitly represents intervals through a hidden assignment matrix, enabling automatic parameter optimization via stochastic gradient descent (SGD). In this work, we develop a soft Mapper framework based on a Gaussian mixture model for flexible and implicit interval construction. We further illustrate the robustness of the soft Mapper algorithm by introducing the Mapper graph mode as a point estimation for the output graph. Moreover, a SGD algorithm with a specific topological loss function is proposed for optimizing parameters in the model. Both simulation and application studies demonstrate its effectiveness in capturing the underlying topological structures. In addition, the application to an RNA expression dataset obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank successfully identifies a distinct subgroup of Alzheimer’s Disease. The implementation of our method is available at https://github.com/FarmerTao/Implicit-interval-Mapper.git .
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1557-8666
1557-8666
DOI:10.1089/cmb.2024.0919