DADApy: Distance-based analysis of data-manifolds in Python

Bibliographic Details
Published in: Patterns (New York, N.Y.), Vol. 3, no. 10, p. 100589
Main Authors: Glielmo, Aldo; Macocco, Iuri; Doimo, Diego; Carli, Matteo; Zeni, Claudio; Wild, Romina; d’Errico, Maria; Rodriguez, Alex; Laio, Alessandro
Format: Journal Article
Language: English
Published: Elsevier Inc, 14.10.2022

Summary: DADApy is a Python software package for analyzing and characterizing high-dimensional data manifolds. It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering, and for comparing different distance metrics. We review the main functionalities of the package and exemplify its usage on a synthetic dataset and in a real-world application. DADApy is freely available under the open-source Apache 2.0 license.

•DADApy is a Python software library to characterize data manifolds
•DADApy can compute intrinsic dimension, density, cluster structures, and optimal metrics
•DADApy is not based on projections and can also work on topologically complex manifolds
•DADApy has an easy-to-use Python interface and efficient C-compiled routines

Data are often represented via many thousands of features. Fortunately, in most applications, such high-dimensional spaces are very sparsely populated, and data points effectively live on low-dimensional “data manifolds.” This is the key reason behind the success of dimensionality reduction schemes. These schemes, however, cannot be easily deployed on data manifolds with nontrivial geometries and topologies, for which no single set of coordinates can describe the manifold globally. In these scenarios, one can analyze the data manifold directly, without an explicit dimensionality reduction step, and compute fundamental properties such as the intrinsic dimension of the manifold and the density of the points lying on it. DADApy implements a set of methods recently developed for this purpose. DADApy is easy to use, as it is written entirely in Python, but also computationally efficient, as time-consuming routines are C-compiled through Cython.

Real-world data are typically represented by high-dimensional features, but live on low-dimensional data manifolds with a great deal of hidden structure. One can analyze such a structure, for instance, by estimating the intrinsic dimension of the manifold, as well as the density of the points lying on it. DADApy collects several algorithms for data-manifold characterization that have already proven effective in specific applications, and aims to popularize them and make them available to data-science practitioners.
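
To make the idea of a distance-based intrinsic dimension estimate concrete, the following is a minimal sketch of one well-known estimator of this kind, the two-nearest-neighbor (TwoNN) approach, which uses only the ratio between each point's second and first nearest-neighbor distances. It is written in plain Python and is not DADApy's implementation or API; the function names `two_nn_intrinsic_dimension` and `sample_embedded_plane` are hypothetical and introduced here for illustration only.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """Estimate the intrinsic dimension from 2nd/1st nearest-neighbor distance ratios."""
    log_ratios = []
    for i, p in enumerate(points):
        # Brute-force distances from point i to every other point (O(N^2) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0.0:  # skip exact duplicates, for which the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    # Maximum-likelihood estimate assuming the ratio r2/r1 is Pareto distributed.
    return len(log_ratios) / sum(log_ratios)


def sample_embedded_plane(n, seed=0):
    """Draw n points uniformly on a 2D square embedded linearly in 5 dimensions."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        u, v = rng.random(), rng.random()
        # Five features, but only two independent degrees of freedom.
        points.append((u, v, u + v, u - v, 2.0 * u))
    return points


if __name__ == "__main__":
    data = sample_embedded_plane(400)
    estimate = two_nn_intrinsic_dimension(data)
    print(f"estimated intrinsic dimension: {estimate:.2f}")
```

Although each point has five features, the estimate should come out close to 2, the number of independent coordinates used to generate the data. The brute-force neighbor search is chosen for readability; a practical implementation would use a faster neighbor search.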
ISSN: 2666-3899
DOI: 10.1016/j.patter.2022.100589