An ensemble approach to determine the number of latent dimensions and assess its reliability

Bibliographic Details
Published in: Communications in statistics. Simulation and computation, Vol. 54, no. 7, pp. 2770-2795
Main Authors: Neishabouri, Asana; Desmarais, Michel C.
Format: Journal Article
Language: English
Published: Taylor & Francis, 03.07.2025
Summary: Determining the number of latent dimensions (LD) of a data set is a ubiquitous problem, for which numerous methods have been developed. We compare some of the most effective ones on synthetic data, where the true number of LD is known and a proper evaluation is therefore possible. Results show that their performance is sensitive to data set attributes such as sparsity, the number of observations relative to the number of features, and the underlying feature distributions. Results also show that this sensitivity differs across methods. This observation leads us to devise an ensemble technique that combines LD estimates from multiple methods to obtain an estimate that is more reliable than that of any single method. We also demonstrate that the variance of the estimates across the single methods is a good indicator of the expected loss of the ensemble-based LD estimate. This observation leads, in turn, to a method for assessing the reliability of the estimate. Finally, we discuss the practical implications of the findings.
ISSN: 0361-0918, 1532-4141
DOI: 10.1080/03610918.2024.2328166
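
Illustration: the summary describes combining LD estimates from several methods and using the variance across those estimates as an indicator of reliability. The sketch below shows that general idea only. This record does not give the authors' combination rule or reliability model, so the low median and the population variance used here are assumed stand-ins, and the function name `ensemble_ld` and the method names are hypothetical.

```python
from statistics import median_low, pvariance


def ensemble_ld(estimates):
    """Combine per-method LD estimates into one estimate and a spread score.

    `estimates` maps a method name to the number of latent dimensions that
    method proposed. The low median is an assumed combination rule (it always
    returns one of the proposed integers); it is not taken from the article.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    values = list(estimates.values())
    combined = median_low(values)  # assumed ensemble rule
    spread = pvariance(values)     # disagreement across methods
    return combined, spread


# Hypothetical estimates from five unnamed methods.
estimates = {
    "method_a": 5,
    "method_b": 6,
    "method_c": 5,
    "method_d": 9,
    "method_e": 5,
}
combined, spread = ensemble_ld(estimates)
print(combined, spread)
```

A larger spread means the individual methods disagree more. According to the summary, this disagreement is a good indicator of the expected loss of the ensemble estimate; how the spread should be mapped to a reliability score is not stated in this record.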