Comparative Study of Several Machine Learning Algorithms for Classification of Unifloral Honeys

Bibliographic Details
Published in	Foods Vol. 10; no. 7; p. 1543
Main Authors	Mateo, Fernando; Tarazona, Andrea; Mateo, Eva María
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 03.07.2021
Subjects	Algorithms; Amino acids; Artificial neural networks; Beekeeping; Bees; botanical origin; Carbohydrates; Centroids; Citrus fruits; Classification; Comparative studies; Consumers; Decision trees; Dimensional analysis; Discriminant analysis; Electrical conductivity; Electrical resistivity; Eucalyptus; Flowers & plants; Food; Food science; Honey; Learning algorithms; Learning theory; Machine learning; Moisture content; Neural networks; NMR; Nuclear magnetic resonance; Origins; Personnel; physicochemical parameters; Physicochemical properties; Plant nectar; Pollen; Principal components analysis; Spectrum analysis; Sunflowers; Support vector machines; Test sets; unifloral honeys; Water content; Central Europe; Spain
Summary:	Unifloral honeys are in high demand among consumers, especially in Europe. The classical method for verifying that a honey belongs to a highly valued botanical class is palynological analysis, in which pollen grains are identified and counted. This task requires highly trained personnel, which complicates the characterization of honey botanical origins. Organoleptic assessment by expert personnel helps to confirm the classification. This study compared the ability of different machine learning (ML) algorithms to correctly classify seven types of Spanish honeys of single botanical origin (rosemary, citrus, lavender, sunflower, eucalyptus, heather and forest honeydew). The botanical origin of the samples was ascertained by pollen analysis complemented with organoleptic assessment. Physicochemical parameters such as electrical conductivity, pH, water content, carbohydrates and color of unifloral honeys were used to build the dataset. The following ML algorithms were tested: penalized discriminant analysis (PDA), shrinkage discriminant analysis (SDA), high-dimensional discriminant analysis (HDDA), nearest shrunken centroids (PAM), partial least squares (PLS), C5.0 tree, extremely randomized trees (ET), weighted k-nearest neighbors (KKNN), artificial neural networks (ANN), random forest (RF), support vector machine (SVM) with linear and radial kernels and extreme gradient boosting trees (XGBoost). The ML models were optimized by repeated 10-fold cross-validation, primarily on the basis of log loss or accuracy, and their performance was compared on a test set to select the best predictive model. Models built using PDA produced the best overall accuracy on the test set. ANN, ET, RF and XGBoost models also provided good results, while SVM performed worst.
ISSN:	2304-8158
DOI:	10.3390/foods10071543
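
The summary describes tuning models by repeated 10-fold cross-validation on log loss or accuracy. The following is a minimal sketch of that protocol only, not the authors' implementation. It assumes synthetic Gaussian data in place of the honey measurements and swaps in a plain nearest-centroid classifier with a softmax "temperature" as the single tuned hyperparameter. All function and parameter names (make_synthetic, stratified_folds, repeated_cv, temperature) are hypothetical, and the final comparison on a held-out test set is omitted.

```python
import math
import random
from collections import defaultdict

def make_synthetic(n_per_class=30, n_classes=3, n_features=5, seed=0):
    """Synthetic stand-in for the honey data: one Gaussian blob per class."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        center = [rng.uniform(-3, 3) for _ in range(n_features)]
        for _ in range(n_per_class):
            X.append([rng.gauss(m, 1.0) for m in center])
            y.append(c)
    return X, y

def stratified_folds(y, k, rng):
    """Split indices into k folds, keeping class proportions roughly equal."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def fit_centroids(X, y):
    """Nearest-centroid model: the mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_proba(centroids, row, temperature):
    """Softmax over negative squared distances to each class centroid."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(row, m)) / temperature
              for c, m in centroids.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

def repeated_cv(X, y, temperature, k=10, repeats=3, seed=0):
    """Return (mean log loss, mean accuracy) over repeats * k held-out folds."""
    rng = random.Random(seed)
    losses, accs = [], []
    for _ in range(repeats):
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            loss, hits = 0.0, 0
            for i in fold:
                p = predict_proba(model, X[i], temperature)
                loss -= math.log(max(p[y[i]], 1e-15))  # clip to avoid log(0)
                hits += max(p, key=p.get) == y[i]
            losses.append(loss / len(fold))
            accs.append(hits / len(fold))
    return sum(losses) / len(losses), sum(accs) / len(accs)

X, y = make_synthetic()
# Tune one hyperparameter by cross-validated log loss (lower is better).
results = {t: repeated_cv(X, y, t) for t in (0.5, 2.0, 8.0)}
best_t = min(results, key=lambda t: results[t][0])
print(best_t, results[best_t])
```

Selecting on log loss rather than accuracy rewards well-calibrated class probabilities, which is one plausible reason to prefer it when several candidate settings reach similar accuracy.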