Subjects classification from high-dimensional and small-sample size datasets using a strategy based on Clustering Variables around Latent Components (CLV) method

Bibliographic Details
Main Author Abramov, Dimitri Marques
Format Journal Article
Language English
Published 14.06.2017
Subjects Statistics - Applications
Online Access https://arxiv.org/abs/1706.04633
DOI 10.48550/arxiv.1706.04633

Abstract High-dimensional complex systems can be studied through multivariate analysis, such as Principal Component Analysis, but such analyses frequently require large samples of observations. Here, a method for small samples based on clustering variables around latent variables (CLV) is examined for classifying subjects into two presumed groups. To this end, a predictive model was developed to generate datasets with two groups of cases whose variables show randomness features (up to 30% of the variables differ between groups, and up to 7% of those are correlated with one another). The method recovered the latent-factor information and used it to classify the subjects with 80% to 95% agreement, and classifier precision was positively related to the ratio [number of variables / number of subjects].
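The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the author's predictive model or classification strategy. It uses a simplified, k-means-style version of the CLV idea: each cluster of variables gets a latent component equal to the first principal component of its standardized variables, and each variable is reassigned to the latent component with which it has the highest squared correlation. Subjects are then split by the sign of their score on the latent component of the cluster with the largest eigenvalue; this selection rule is an assumption. The simulated design (20 subjects, 100 variables, 30% of them with a mean shift) and all function names (`simulate`, `clv_partition`, `classify_subjects`, `agreement`) are hypothetical. The correlated subset of informative variables mentioned in the abstract is not reproduced.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit variance (population SD)."""
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant columns
    return Xc / sd


def cluster_latent(Z):
    """PC1 scores of the standardized block Z (n x m) and the largest
    eigenvalue of its correlation matrix (Z.T @ Z / n)."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, 0] * s[0], s[0] ** 2 / Z.shape[0]


def clv_partition(X, k=2, n_iter=50, seed=0):
    """Simplified, k-means-style partition of the columns (variables) of X.

    Each cluster's latent component is the PC1 of its variables; each variable
    is then moved to the latent component it is most squared-correlated with.
    """
    rng = np.random.default_rng(seed)
    Z = standardize(X)
    n, p = Z.shape
    labels = rng.integers(0, k, size=p)
    for _ in range(n_iter):
        latents = np.zeros((n, k))
        for j in range(k):
            if not np.any(labels == j):
                labels[rng.integers(p)] = j  # re-seed an empty cluster
            latents[:, j], _ = cluster_latent(Z[:, labels == j])
        # p x k matrix of squared correlations between variables and latents
        r2 = (Z.T @ standardize(latents) / n) ** 2
        new_labels = r2.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, Z


def classify_subjects(X, k=2, seed=0):
    """Split subjects by the sign of their score on the latent component of
    the variable cluster with the largest eigenvalue (an assumed rule)."""
    labels, Z = clv_partition(X, k=k, seed=seed)
    best_score, best_eig = None, -np.inf
    for j in np.unique(labels):
        score, eig = cluster_latent(Z[:, labels == j])
        if eig > best_eig:
            best_score, best_eig = score, eig
    return (best_score > 0).astype(int)


def simulate(n_per_group=10, p=100, frac_informative=0.3, shift=2.0, seed=0):
    """Hypothetical two-group design: only the first frac_informative * p
    variables have a mean shift between groups; the rest are pure noise."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_group)
    X = rng.standard_normal((2 * n_per_group, p))
    X[y == 1, : int(frac_informative * p)] += shift
    return X, y


def agreement(pred, y):
    """Fraction of subjects matching y, allowing for swapped group labels."""
    acc = np.mean(pred == y)
    return max(acc, 1.0 - acc)


if __name__ == "__main__":
    X, y = simulate()
    pred = classify_subjects(X)
    print(f"agreement with true groups: {agreement(pred, y):.2f}")
```

Agreement is computed up to label switching because the sign of a principal component, and hence which group is called 0 or 1, is arbitrary. Varying `p` relative to `n_per_group` is one way to explore the variables-to-subjects ratio mentioned in the abstract; the sketch implies nothing about the results reported in the paper.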
Author Abramov, Dimitri Marques
Author_xml – sequence: 1
  givenname: Dimitri Marques
  surname: Abramov
  fullname: Abramov, Dimitri Marques
BackLink https://doi.org/10.48550/arXiv.1706.04633 (View paper in arXiv)
ContentType Journal Article
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
Copyright_xml – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DBID EPD
GOX
DOI 10.48550/arxiv.1706.04633
DatabaseName arXiv Statistics
arXiv.org
Database_xml – sequence: 1
  dbid: GOX
  name: arXiv.org
  url: http://arxiv.org/find
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
ExternalDocumentID 1706_04633
GroupedDBID EPD
GOX
ID FETCH-arxiv_primary_1706_046333
IEDL.DBID GOX
IngestDate Tue Jul 22 22:00:33 EDT 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-arxiv_primary_1706_046333
OpenAccessLink https://arxiv.org/abs/1706.04633
ParticipantIDs arxiv_primary_1706_04633
PublicationCentury 2000
PublicationDate 2017-06-14
PublicationDateYYYYMMDD 2017-06-14
PublicationDate_xml – month: 06
  year: 2017
  text: 2017-06-14
  day: 14
PublicationDecade 2010
PublicationYear 2017
Score 3.2533414
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Statistics - Applications
Title Subjects classification from high-dimensional and small-sample size datasets using a strategy based on Clustering Variables around Latent Components (CLV) method
URI https://arxiv.org/abs/1706.04633
hasFullText 1
inHoldings 1
linkProvider Cornell University