Subjects classification from high-dimensional and small-sample size datasets using a strategy based on Clustering Variables around Latent Components (CLV) method
Main Author | Abramov, Dimitri Marques |
---|---|
Format | Journal Article |
Language | English |
Published | 14.06.2017 |
Subjects | Statistics - Applications |
DOI | 10.48550/arxiv.1706.04633 |
Abstract | High-dimensional complex systems can be studied through multivariate
analysis, such as Principal Component Analysis; however, large samples of
observations are frequently needed for it. Here, a method for small samples
based on clustering variables around latent variables (CLV) is examined for
classifying subjects into two presumed groups. To this end, a predictive model
was developed to generate datasets with two groups of cases whose variables show
randomness features (up to 30% of variables differ between groups, and up to 7%
of those are correlated with each other). The method recovered the information
of the latent factors to classify the subjects with 80 to 95% agreement, with a
positive relationship between the classifier precision and the ratio
[number of variables / number of subjects]. |
---|---|
Author | Abramov, Dimitri Marques |
Author_xml | – sequence: 1 givenname: Dimitri Marques surname: Abramov fullname: Abramov, Dimitri Marques |
BackLink | https://doi.org/10.48550/arXiv.1706.04633$$DView paper in arXiv |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Copyright_xml | – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | EPD GOX |
DOI | 10.48550/arxiv.1706.04633 |
DatabaseName | arXiv Statistics arXiv.org |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
OpenAccessLink | https://arxiv.org/abs/1706.04633 |
ParticipantIDs | arxiv_primary_1706_04633 |
PublicationCentury | 2000 |
PublicationDate | 2017-06-14 |
PublicationDateYYYYMMDD | 2017-06-14 |
PublicationDate_xml | – month: 06 year: 2017 text: 2017-06-14 day: 14 |
PublicationDecade | 2010 |
PublicationYear | 2017 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Statistics - Applications |
Title | Subjects classification from high-dimensional and small-sample size datasets using a strategy based on Clustering Variables around Latent Components (CLV) method |
URI | https://arxiv.org/abs/1706.04633 |
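The simulation and classification outlined in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the record does not give the generative model or the CLV settings, so every function name, the Gaussian noise model, the group shift `effect`, the way correlation is injected (one shared random factor), the choice of two variable clusters, and the sign-based subject split are assumptions made only for illustration. The clustering step is a simplified CLV-style loop that alternates between computing each cluster's first principal component and reassigning variables to the component with which they have the largest squared covariance.

```python
import numpy as np


def simulate_two_groups(n_subjects=20, n_variables=200, frac_diff=0.30,
                        frac_corr=0.07, effect=1.5, seed=0):
    """Hypothetical generator: Gaussian noise plus a group shift on some variables."""
    rng = np.random.default_rng(seed)
    half = n_subjects // 2
    labels = np.repeat([0, 1], [half, n_subjects - half])
    X = rng.standard_normal((n_subjects, n_variables))
    # About frac_diff of the variables differ between the two groups.
    n_diff = int(round(frac_diff * n_variables))
    diff_idx = rng.choice(n_variables, size=n_diff, replace=False)
    X[np.ix_(labels == 1, diff_idx)] += effect
    # About frac_corr of those variables also share one common random factor
    # (an assumed way of making them mutually correlated).
    n_corr = int(round(frac_corr * n_diff))
    X[:, diff_idx[:n_corr]] += rng.standard_normal((n_subjects, 1))
    return X, labels


def latent_component(Z):
    """Standardized first principal component score of the columns of Z."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    score = U[:, 0] * s[0]
    return score / score.std()


def clv_like(X, n_clusters=2, n_iter=50, seed=0):
    """Simplified CLV-style clustering of variables around latent components."""
    rng = np.random.default_rng(seed)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each variable
    n, p = Z.shape
    assign = rng.integers(n_clusters, size=p)  # random initial partition
    for _ in range(n_iter):
        comps = np.empty((n, n_clusters))
        for k in range(n_clusters):
            members = np.flatnonzero(assign == k)
            if members.size == 0:  # re-seed an empty cluster with one variable
                members = rng.integers(p, size=1)
            comps[:, k] = latent_component(Z[:, members])
        # Squared covariance of every variable with every latent component.
        sq_cov = (Z.T @ comps / n) ** 2
        new_assign = sq_cov.argmax(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    return assign, comps, sq_cov


def classify_subjects(X, **kwargs):
    """Split subjects by the sign of the strongest latent component."""
    assign, comps, sq_cov = clv_like(X, **kwargs)
    strength = [sq_cov[assign == k, k].sum() for k in range(comps.shape[1])]
    best = int(np.argmax(strength))
    return (comps[:, best] > 0).astype(int)


def agreement(predicted, labels):
    """Fraction of matching labels, ignoring which group is called 0 or 1."""
    acc = np.mean(predicted == labels)
    return max(acc, 1.0 - acc)


if __name__ == "__main__":
    X, labels = simulate_two_groups()
    predicted = classify_subjects(X)
    print(f"agreement: {agreement(predicted, labels):.2f}")
```

Varying `n_variables` while holding `n_subjects` fixed gives a rough way to explore the relationship between agreement and the ratio of variables to subjects mentioned in the abstract, although results from this sketch should not be read as a reproduction of the reported 80 to 95% figures.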