k-Means NANI: an improved clustering algorithm for Molecular Dynamics simulations
Published in | bioRxiv |
---|---|
Main Authors | , , , , |
Format | Journal Article, Paper |
Language | English |
Published | United States: Cold Spring Harbor Laboratory Press, 08.03.2024 |
Subjects | |
Summary: | One of the key challenges of k-means clustering is seed selection, i.e. the estimation of the initial centroids, since the clustering result depends heavily on this choice. Alternatives such as k-means++ have mitigated this limitation by estimating the centroids from an empirical probability distribution. However, with high-dimensional and complex datasets such as those obtained from molecular simulations, k-means++ fails to partition the data optimally. Furthermore, the stochastic elements in all flavors of k-means++ lead to a lack of reproducibility. k-means N-Ary Natural Initiation (NANI) is presented as an alternative that tackles this challenge by using efficient n-ary comparisons both to identify high-density regions in the data and to select a diverse set of initial conformations. Centroids generated by NANI are not only representative of the data and different from one another, which helps k-means partition the data accurately, but also deterministic, which yields consistent cluster populations across replicates. In peptide and protein folding molecular simulations, NANI created compact, well-separated clusters and accurately identified metastable states that agree with the literature. NANI can cluster diverse datasets and can be used as a standalone tool or as part of our MDANCE clustering package. |
---|---|
ISSN: | 2692-8205 |
DOI: | 10.1101/2024.03.07.583975 |