Subjects classification from high-dimensional and small-sample size datasets using a strategy based on Clustering Variables around Latent Components (CLV) method

Bibliographic Details
Main Author: Abramov, Dimitri Marques
Format: Journal Article
Language: English
Published: 14.06.2017
DOI: 10.48550/arxiv.1706.04633

Summary: High-dimensional complex systems can be studied through multivariate analysis, such as Principal Component Analysis; however, such analyses frequently need large samples of observations. Here, a method for small samples based on clustering variables around latent variables (CLV) is examined for classifying subjects into two presumed groups. To this end, a predictive model was developed to generate datasets with two groups of cases whose variables exhibit randomness (up to 30% of the variables differ between groups, and up to 7% of those are correlated with one another). The method recovered the latent-factor information needed to classify the subjects, with 80 to 95% agreement, and classifier precision showed a positive relationship with the ratio [number of variables / number of subjects].
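The summary does not spell out the simulation design or the classification rule. The following is therefore only a rough Python sketch under stated assumptions, not the author's implementation. It works in three steps:

1. It generates two groups of subjects. About 30% of the variables are mean-shifted in one group, and a small share of those also load on a common factor.
2. It partitions the variables around latent components in the spirit of CLV. Each cluster's latent component is taken as the first principal component of its variables, and each variable is reassigned to the component with which it has the highest squared correlation.
3. It classifies subjects by the sign of the latent component of the cluster with the largest summed squared correlation.

The shift size, the number of clusters, the power-iteration settings, the sign-split rule and the names `simulate`, `clv_partition`, `classify_subjects` and `agreement` are all hypothetical choices.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def standardize(col):
    """Center a variable and scale it to unit norm, so dot() of two such
    vectors is their Pearson correlation."""
    m = sum(col) / len(col)
    c = [v - m for v in col]
    s = math.sqrt(dot(c, c)) or 1.0
    return [v / s for v in c]


def simulate(rng, n_per_group, p, frac_diff=0.30, frac_corr=0.07, shift=1.5):
    """Hypothetical generator (assumed design): the first frac_diff*p variables
    are shifted by `shift` in group 1; a frac_corr share of those share a factor."""
    n = 2 * n_per_group
    groups = [0] * n_per_group + [1] * n_per_group
    n_diff = round(frac_diff * p)
    n_corr = round(frac_corr * n_diff)
    factor = [rng.gauss(0, 1) for _ in range(n)]  # common per-subject term
    X = []
    for j in range(p):
        col = [rng.gauss(0, 1) for _ in range(n)]
        if j < n_diff:
            col = [v + shift * g for v, g in zip(col, groups)]
        if j < n_corr:
            col = [v + f for v, f in zip(col, factor)]
        X.append(standardize(col))
    return X, groups


def first_component(cols, rng, iters=50):
    """Unit-norm first principal component scores of the variables in `cols`,
    by power iteration on the subject-by-subject matrix sum_j x_j x_j^T."""
    n = len(cols[0])
    c = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iters):
        new = [0.0] * n
        for x in cols:
            w = dot(x, c)
            for i in range(n):
                new[i] += w * x[i]
        norm = math.sqrt(dot(new, new)) or 1.0
        c = [v / norm for v in new]
    return c


def clv_partition(X, k, rng, max_iter=20):
    """CLV-style partitioning: each cluster's latent component is the first PC
    of its variables; each variable moves to the component it is most
    squared-correlated with. Returns (labels, components)."""
    p = len(X)
    order = list(range(p))
    rng.shuffle(order)
    labels = [0] * p
    for pos, j in enumerate(order):
        labels[j] = pos % k  # random round-robin start: no empty cluster if p >= k
    comps = [None] * k
    for _ in range(max_iter):
        for g in range(k):
            members = [X[j] for j in range(p) if labels[j] == g]
            if members:  # an emptied cluster keeps its previous component
                comps[g] = first_component(members, rng)
        new = [max(range(k), key=lambda g: dot(X[j], comps[g]) ** 2) for j in range(p)]
        if new == labels:
            break
        labels = new
    return labels, comps


def classify_subjects(X, k, rng):
    """Assumed rule (not from the paper): split subjects by the sign of the
    component of the cluster with the largest summed squared correlation."""
    labels, comps = clv_partition(X, k, rng)

    def fit(g):
        return sum(dot(X[j], comps[g]) ** 2 for j in range(len(X)) if labels[j] == g)

    best = max(range(k), key=fit)
    return [1 if s > 0 else 0 for s in comps[best]]


def agreement(pred, truth):
    """Share of matching labels, allowing the two group names to be swapped."""
    hits = sum(a == b for a, b in zip(pred, truth)) / len(truth)
    return max(hits, 1 - hits)


if __name__ == "__main__":
    rng = random.Random(0)
    X, groups = simulate(rng, n_per_group=10, p=40)
    pred = classify_subjects(X, k=2, rng=rng)
    print(f"agreement: {agreement(pred, groups):.2f}")
```

Changing `p` relative to the number of subjects is one way to probe the reported link between precision and the variables-to-subjects ratio. This toy setup is not expected to reproduce the reported 80 to 95% agreement.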