Machine Learning Algorithms Based on Sampling Techniques for Raisin Grains Classification

Bibliographic Details
Published in	JOIV: International Journal on Informatics Visualization (Online), Vol. 7, no. 1, pp. 7-14
Main Authors	Bisri, Achmad; Man, Mustafa
Format	Journal Article
Language	English
Published	Politeknik Negeri Padang 2023
Subjects	classification; data mining; machine learning; raisin grains; sampling technique
Summary:	Raisin grains are among the agricultural commodities that can benefit health. Raisin grain production needs to be classified to achieve optimal results. In this case, classification is carried out on two types of grains, namely Kecimen and Besni. However, inaccurate sample data can affect the performance of the model. In this study, two sampling techniques are proposed: stratified and shuffled. The proposed classification models are RF, GBT, NB, LR, and NN. This study aims to identify the performance of classification models based on sampling techniques. The classification models are applied to a seven-feature dataset, and modeling is done with cross-validation. The models were tested with different proportions of test data, and their performance was evaluated in terms of accuracy and AUC. With stratified sampling, the best outcome across all models was found with 40 percent test data, with a mean accuracy of 85.50% and an AUC of 0.921. With shuffled sampling, the best outcome was found with 20 percent test data, with a mean accuracy of 88.11% and an AUC of 0.935. With stratified sampling, not all models reached the excellent category across the data splits, whereas with shuffled sampling all models did. Therefore, models based on shuffled sampling are superior to those based on stratified sampling. According to the significance test, RF differs significantly between the sampling techniques.
ISSN:	2549-9610 2549-9904
DOI:	10.30630/joiv.7.1.970
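
The summary contrasts stratified and shuffled sampling when holding out test data. The following is a minimal sketch of that distinction only, not the authors' pipeline: the function names, the seed, and the toy class sizes (60 Kecimen, 40 Besni) are assumptions made for illustration and do not come from the article.

```python
import random
from collections import Counter, defaultdict


def shuffled_split(labels, test_fraction, seed=0):
    """Shuffle all indices together, then cut off the test portion.

    Class proportions in the test set are left to chance.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_test = round(len(labels) * test_fraction)
    return indices[n_test:], indices[:n_test]


def stratified_split(labels, test_fraction, seed=0):
    """Split each class separately so both parts keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical toy labels; the class sizes are illustrative, not the real data.
labels = ["Kecimen"] * 60 + ["Besni"] * 40

for fraction in (0.2, 0.4):
    _, shuffled_test = shuffled_split(labels, fraction)
    _, stratified_test = stratified_split(labels, fraction)
    print(f"test fraction {fraction}")
    print("  shuffled  :", dict(Counter(labels[i] for i in shuffled_test)))
    print("  stratified:", dict(Counter(labels[i] for i in stratified_test)))
```

In this sketch the stratified test set always mirrors the 60/40 class ratio, while the shuffled test set can drift from it depending on the seed. That is the only property demonstrated here; it does not reproduce the accuracy or AUC figures reported in the summary.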