An ensemble approach to determine the number of latent dimensions and assess its reliability
Published in | Communications in statistics. Simulation and computation Vol. 54; no. 7; pp. 2770 - 2795 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Taylor & Francis, 03.07.2025 |
Summary: | Determining the number of latent dimensions (LD) of a data set is a ubiquitous problem, for which numerous methods have been developed. We compare some of the most effective ones on synthetic data, which allows proper evaluation given that the true number of LD is known. Results show that their performance is sensitive to data set attributes such as sparsity, number of observations in relation to number of features, and underlying feature distributions. Results also show this sensitivity is different across methods. This observation brings us to devise an ensemble technique to combine LD estimates from multiple methods and achieve an estimate that is more reliable than any single method. We also demonstrate that the variance of the estimates across the single methods is a good indicator of the expected loss of the ensemble-based LD estimate. This observation leads, in turn, to deriving a method for the assessment of the reliability of the estimate. Finally, we discuss the practical implications of the findings. |
---|---|
ISSN: | 0361-0918 1532-4141 |
DOI: | 10.1080/03610918.2024.2328166 |