Workflow for the Supervised Learning of Chemical Data: Efficient Data Reduction-Multivariate Curve Resolution (EDR-MCR)
Published in | Analytical chemistry (Washington) Vol. 93; no. 12; pp. 5020 - 5027 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | United States, American Chemical Society, 30.03.2021 |
Summary: | A new method termed efficient data reduction-multivariate curve resolution (EDR-MCR) has been devised for classification of high-dimensional data. The method couples EDR and MCR as a new strategy for data splitting, variable selection, and supervised classification of high-dimensional data. It reduces data dimensionality and selects the training set using principal component analysis (PCA) and convex geometry prior to data classification. The reduced data are then categorized using an MCR model, in which numerical constraints are imposed to resolve the data into classes and readily interpretable pure component signal weights. The performance of the EDR and supervised MCR methods was tested on their ability to discriminate between the constituents of two benchmark and two high-dimensional data sets. The results were compared with those of other data splitting methods, including iterative random selection (IRS) and Kennard–Stone (KS), and of other discrimination methods, including partial least-squares-discriminant analysis (PLS-DA) and the ensemble-learning frameworks of linear discriminant analysis (LDA), k-nearest neighbors (KNN), classification and regression trees (CART), and support vector machine (SVM). Overall, EDR gave results comparable with those of the other data splitting methods despite the small size of the training sets that it created. Compared with other commonly used supervised techniques, the proposed MCR approach has the advantages of fast implementation, fewer parameters to tune, flexibility in the analysis of data characterized by low sample numbers and class imbalances, improved accuracy from the inclusion of additional system information in the form of numerical constraints, and the ability to resolve pure component signal weights. |
---|---|
ISSN: | 0003-2700 1520-6882 |
DOI: | 10.1021/acs.analchem.0c01427 |
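
The summary names Kennard–Stone (KS) as one of the data splitting methods used for comparison. As a rough illustration of what such a splitter does, the sketch below follows the usual textbook description of KS: seed the training set with the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. It is not the authors' code and it is not the EDR procedure; the function name `kennard_stone`, the Euclidean distance, and the toy points are assumptions made for this example only.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select samples picked by the Kennard-Stone rule.

    Hypothetical helper for illustration; samples is a sequence of
    equal-length numeric tuples, compared with Euclidean distance.
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all samples.
    dist = [[math.dist(a, b) for b in samples] for a in samples]

    # Seed the selection with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]

    # Add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_select:
        best = max(remaining, key=lambda i: min(dist[i][j] for j in selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy two-dimensional data (made up for this example).
points = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.1), (10.0, 0.0), (5.0, 5.0)]
training_idx = kennard_stone(points, 3)
test_idx = [i for i in range(len(points)) if i not in training_idx]
```

Because each new sample is chosen to be far from everything already selected, the training set tends to cover the spread of the data rather than its densest region. The full distance matrix makes this sketch quadratic in the number of samples, which is acceptable for small data sets but would need rethinking for large ones.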