SPXYE: an improved method for partitioning training and validation sets

Bibliographic Details
Published in	Cluster computing Vol. 22; no. Suppl 2; pp. 3069 - 3078
Main Authors	Gao, Ting; Hu, Lina; Jia, Zhizhen; Xia, Tianna; Fang, Chao; Li, Hongzhi; Hu, LiHong; Lu, Yinghua; Li, Hui
Format	Journal Article
Language	English
Published	New York: Springer US, 01.03.2019; Springer Nature B.V.
Subjects	Algorithms; Calibration; Computer Communication Networks; Computer Science; Datasets; Dependent variables; Independent variables; Modelling; Multivariate analysis; Neural networks; Operating Systems; Partitioning; Processor Architectures; Random sampling; Regression models; Training; Variables; Sample set partitioning based on joint X–Y distances; Chemical databases; Kennard–Stone; Set partition; Partial least squares
Summary:	This study proposes a sample selection strategy termed SPXYE (sample set partitioning based on joint X–Y–E distances) for partitioning data in multivariate modeling where training and validation sets are required. The method chooses the training set according to the X (independent variables), Y (dependent variables), and E (error of the preliminarily calculated results with respect to the dependent variables) spaces. This selection strategy provides a useful tool for multivariate calibration. SPXYE was applied to three household chemical molecular databases to obtain training and validation sets for partial least squares (PLS) modeling. For comparison, training and validation sets were also generated using random sampling, Kennard–Stone, and sample set partitioning based on joint X–Y distances (SPXY). All associated PLS regression models were evaluated on the same test set, which was separate from both the training set and the validation set. The results indicated that the proposed SPXYE strategy might serve as an alternative partition strategy.
ISSN:	1386-7857 1573-7543
DOI:	10.1007/s10586-018-1877-9
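
The Summary describes choosing training samples by their spread in the X, Y, and E spaces. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes the E term is added to an SPXY-style joint distance in the same normalized way as the X and Y terms (the record does not give the formula), and it uses an ordinary least-squares fit as a stand-in for the "preliminarily calculated results". The names pairwise_dist, joint_distance, and kennard_stone_select are hypothetical.

```python
import numpy as np

def pairwise_dist(a):
    """Euclidean distance matrix between the rows of a (1-D input is treated as one column)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def joint_distance(X, y, e=None):
    """Sum of max-normalized distance matrices for X, y and (optionally) e.

    With e=None this is the SPXY-style joint X-Y distance. Adding e in the
    same normalized form is an ASSUMPTION about how SPXYE combines the spaces.
    """
    parts = [X, y] if e is None else [X, y, e]
    total = 0.0
    for p in parts:
        d = pairwise_dist(p)
        m = d.max()
        total = total + (d / m if m > 0 else d)
    return total

def kennard_stone_select(dist, n_select):
    """Kennard-Stone max-min selection on a precomputed distance matrix."""
    n = dist.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # For each remaining sample, distance to its nearest selected sample;
        # pick the sample for which that distance is largest.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Synthetic data for illustration only (not from the study).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=30)

# Stand-in preliminary model: least squares with an intercept (an assumption).
A = np.c_[X, np.ones(len(X))]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
e = y - A @ coef  # preliminary error used as the E space

D = joint_distance(X, y, e)
train_idx, val_idx = kennard_stone_select(D, n_select=20)
```

Calling joint_distance(X, y) without e gives the SPXY-style baseline mentioned in the Summary, so the same selection routine can be used to compare the two partitions.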