SPXYE: an improved method for partitioning training and validation sets
Published in | Cluster computing Vol. 22; no. Suppl 2; pp. 3069 - 3078 |
---|---|
Main Authors | , , , , , , , , |
Format | Journal Article |
Language | English |
Published | New York, Springer US, 01.03.2019, Springer Nature B.V |
Summary: | This study aimed to propose a sample selection strategy termed SPXYE (sample set partitioning based on joint X–Y–E distances) for data partition in multivariate modeling, where training and validation sets are required. This method was applied to choose the training set according to X (the independent variables), Y (the dependent variables), and E (the error of the preliminarily calculated results with the dependent variables) spaces. This selection strategy provided a valuable tool for multivariate calibration. The proposed technique SPXYE was applied to three household chemical molecular databases to obtain training and validation sets for partial least squares (PLS) modeling. For comparison, the training and validation sets were also generated using random sampling, Kennard–Stone, and sample set partitioning based on joint X–Y distances methods. The predictions of all associated PLS regression models were performed upon the same testing set, which was different from either the training set or the validation set. The results indicated that the proposed SPXYE strategy might serve as an alternative partition strategy. |
---|---|
ISSN: | 1386-7857 1573-7543 |
DOI: | 10.1007/s10586-018-1877-9 |
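
The summary describes choosing training samples by their joint distances in the X, Y, and E spaces. The record does not give the exact distance formula, so the following is only a minimal sketch under stated assumptions: each space's pairwise Euclidean distances are divided by their maximum and summed, and samples are then picked with a Kennard–Stone style max-min rule. The function names `pairwise_dist` and `joint_distance_split` are hypothetical, and the error vector `e` is taken as an input that a preliminary model would have to supply.

```python
import numpy as np


def pairwise_dist(a):
    """Euclidean distance matrix between the rows of a (1-D input is treated as one column)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def joint_distance_split(X, y, e, n_train):
    """Split sample indices into training and validation lists.

    Assumed joint distance (not taken from the paper): the sum of the
    max-normalized distance matrices of X, y, and e.
    Selection rule: Kennard-Stone style max-min; requires n_train >= 2.
    """
    n = len(X)
    d = np.zeros((n, n))
    for block in (X, y, e):
        db = pairwise_dist(block)
        m = db.max()
        if m > 0:  # skip a space in which all samples coincide
            d += db / m

    # Seed the training set with the two most distant samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    while len(selected) < n_train and remaining:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)

    return selected, remaining


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))        # synthetic independent variables
    y = rng.normal(size=20)             # synthetic dependent variable
    e = rng.normal(scale=0.1, size=20)  # stand-in for preliminary prediction errors
    train_idx, val_idx = joint_distance_split(X, y, e, n_train=12)
    print(len(train_idx), len(val_idx))
```

In practice `e` would come from a preliminary calibration model (for example, the residuals of a first PLS fit); here it is synthetic. Setting `e` to a constant vector reduces the sketch to a joint X–Y selection, and passing constants for both `y` and `e` reduces it to plain Kennard–Stone on X.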