SPXYE: an improved method for partitioning training and validation sets

Bibliographic Details
Published in: Cluster Computing, Vol. 22, no. Suppl 2, pp. 3069–3078
Main Authors: Gao, Ting; Hu, Lina; Jia, Zhizhen; Xia, Tianna; Fang, Chao; Li, Hongzhi; Hu, LiHong; Lu, Yinghua; Li, Hui
Format: Journal Article
Language: English
Published: New York, Springer US, 01.03.2019
Springer Nature B.V.
Summary: This study proposes a sample selection strategy termed SPXYE (sample set partitioning based on joint X–Y–E distances) for partitioning data in multivariate modeling, where training and validation sets are required. The method chooses the training set according to the X (independent variables), Y (dependent variables), and E (error of the preliminarily calculated results for the dependent variables) spaces. The authors present this selection strategy as a valuable tool for multivariate calibration. SPXYE was applied to three household chemical molecular databases to obtain training and validation sets for partial least squares (PLS) modeling. For comparison, training and validation sets were also generated using random sampling, Kennard–Stone, and sample set partitioning based on joint X–Y distances. All of the resulting PLS regression models were evaluated on the same testing set, which was separate from both the training set and the validation set. The results indicated that the proposed SPXYE strategy might serve as an alternative partition strategy.
ISSN: 1386-7857, 1573-7543
DOI: 10.1007/s10586-018-1877-9
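
The summary describes choosing a training set from distances in the X, Y, and E spaces but gives no formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes that each pairwise distance is scaled by its maximum, that the three scaled distances are summed with equal weight, and that samples are then picked with a Kennard–Stone style max-min rule. The function name `spxye_split`, the input `e` (errors from some preliminary model), and the toy data are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def _normalized_distances(rows):
    """Pairwise Euclidean distances, divided by the largest one."""
    n = len(rows)
    d = [[dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(r) for r in d) or 1.0  # guard against all rows coinciding
    return [[v / d_max for v in r] for r in d]


def spxye_split(X, y, e, n_train):
    """Return (training indices, validation indices).

    X: list of feature vectors; y: list of responses; e: list of errors
    from a preliminary model (assumed input; the record does not say how
    these errors are computed). Assumes 2 <= n_train <= len(X).
    """
    n = len(X)
    dx = _normalized_distances(X)
    dy = _normalized_distances([[v] for v in y])
    de = _normalized_distances([[v] for v in e])
    # Assumed joint distance: equally weighted sum of the three terms.
    d = [[dx[i][j] + dy[i][j] + de[i][j] for j in range(n)] for i in range(n)]

    # Start with the two samples that are farthest apart.
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    i0, j0 = max(pairs, key=lambda p: d[p[0]][p[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in selected]

    # Repeatedly add the sample whose nearest selected neighbour is farthest.
    while len(selected) < n_train:
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


# Toy data for illustration only (not from the paper).
X = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.5, 0.5], [0.9, 1.0]]
y = [0.0, 0.1, 2.0, 1.0, 1.9]
e = [0.0, 0.0, 0.1, 0.5, 0.0]
train_idx, valid_idx = spxye_split(X, y, e, n_train=3)
```

Dropping the `de` term from the joint distance would give the joint X–Y variant named in the summary as a comparison method, and using `dx` alone would give plain Kennard–Stone selection, so the three distance-based strategies differ only in which spaces contribute to `d`.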