Predictive Crop Yield Modeling and Soil Quality Classification using Machine Learning

Bibliographic Details
Published in	Indian Journal of Science and Technology Vol. 18; no. 32; pp. 2594-2608
Main Authors	Sadasivan, Manju; Krithika, K Varshini; Shinty, P K; Deepa, A
Format	Journal Article
Language	English
Published	23.08.2025
ISSN	0974-6846, 0974-5645
DOI	10.17485/IJST/v18i32.1135

More Information
Summary:	Background: Assessing soil condition and predicting crop yield are essential practices in contemporary farming, with a direct impact on food-production efficiency and balanced resource consumption. Measuring these parameters with traditional methods is usually slow and expensive, and is beyond the reach of many farmers. These constraints show the need for effective, data-driven alternatives.

Objectives: The study aims to develop a comprehensive machine learning system that classifies soil into quality grades, recommends crops suited to a given soil quality, and estimates crop yield from a few vital soil parameters. The approach integrates classifiers, recommenders, and regressors into a single decision tool intended to provide sound, farmer-friendly decision support for farmers, agronomists, and agricultural advisory networks. The long-term aim is to enhance agricultural productivity, optimize resource use, and make data-driven decisions easy for farmers.

Method: Two real-world Kaggle datasets were used, one for crop recommendation and the other for yield prediction. A custom Soil Quality Index (SQI) was developed from key parameters: nitrogen, phosphorus, potassium, and pH. Three types of machine learning models were applied to both classification and regression: Random Forest, XGBoost, and support vector machines (SVM/SVR). Visual analytics (heatmaps, scatter plots, and feature-importance graphs) were used to interpret the models and understand the data.

Findings: Of the three models, the Random Forest classifier performed best, achieving a soil classification accuracy of 97.73%, significantly better than XGBoost and SVM.
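The abstract states that the SQI combines nitrogen, phosphorus, potassium, and pH into a single value but does not give its formula. The following is a minimal sketch, assuming min-max scaling of each nutrient over assumed ranges, a pH score that peaks at an assumed optimum of 6.5, equal weights, and hypothetical Low/Medium/High thresholds; none of these choices come from the paper, and the names `soil_quality_index` and `sqi_grade` are illustrative. A grade derived this way is the kind of label a classifier such as the Random Forest could be trained to predict from raw soil readings.

```python
# Hypothetical Soil Quality Index (SQI) sketch. The nutrient ranges, pH optimum,
# equal weights, and grade thresholds below are illustrative assumptions,
# not values taken from the study.

# Assumed (min, max) ranges used to min-max scale each nutrient to [0, 1].
NUTRIENT_RANGES = {"N": (0.0, 140.0), "P": (5.0, 145.0), "K": (5.0, 205.0)}
PH_OPTIMUM = 6.5      # assumed ideal pH
PH_TOLERANCE = 3.0    # assumed distance from the optimum at which the pH score reaches 0


def scale(value, low, high):
    """Min-max scale value into [0, 1], clipping values outside the range."""
    return min(max((value - low) / (high - low), 0.0), 1.0)


def ph_score(ph):
    """Score pH by closeness to the optimum: 1 at the optimum, 0 at or beyond the tolerance."""
    return max(1.0 - abs(ph - PH_OPTIMUM) / PH_TOLERANCE, 0.0)


def soil_quality_index(n, p, k, ph):
    """Combine N, P, K, and pH into a single 0-100 score using equal weights."""
    scores = [
        scale(n, *NUTRIENT_RANGES["N"]),
        scale(p, *NUTRIENT_RANGES["P"]),
        scale(k, *NUTRIENT_RANGES["K"]),
        ph_score(ph),
    ]
    return 100.0 * sum(scores) / len(scores)


def sqi_grade(sqi):
    """Map an SQI value to a class label using assumed thresholds."""
    if sqi >= 60.0:
        return "High"
    if sqi >= 35.0:
        return "Medium"
    return "Low"


# Illustrative soil reading (not taken from the study's data).
sample = {"n": 90, "p": 42, "k": 43, "ph": 6.5}
sqi = soil_quality_index(**sample)
print(round(sqi, 2), sqi_grade(sqi))  # prints: 52.43 Medium
```

Equal weighting keeps the index transparent; a weighted sum, or weights fitted against yield data, would be a natural alternative if some nutrients matter more for a given crop.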
For crop-yield estimation, the ensemble regressors (Random Forest, XGBoost) and the kernel-based regressor (SVR) performed well, with low prediction error and high predictive performance.

Novelty: The study presents a broad, interpretable machine learning framework in which soil quality classification, crop recommendation, and crop yield forecasting are implemented together on real-world data. A key innovation is the custom Soil Quality Index (SQI), which combines four factors (nitrogen, phosphorus, potassium, and pH) into a single meaningful value, thereby improving both model results and interpretability. The study also compares several ML models (Random Forest, XGBoost, and SVM/SVR) for both classification and regression. The work is entirely agriculture-focused and uses real, publicly accessible data.

Keywords: Crop yield, Soil quality, Random Forest, Support Vector Machine, XGBoost, Nitrogen-Phosphorus-Potassium (NPK) Analysis, Precision Farming
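The abstract reports low error and high predictive performance for the yield regressors but does not name the metrics. Assuming the common trio of mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²), here is a minimal pure-Python sketch of how such scores are computed for a yield model. The function name `regression_metrics` and the sample yields are hypothetical and are not results from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 for observed vs. predicted yields."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    rmse = math.sqrt(ss_res / n)                          # root mean squared error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    # R^2 is undefined when all observed values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Hypothetical observed and predicted yields, for illustration only.
observed = [3.0, 2.5, 4.0, 5.0]
predicted = [2.8, 2.7, 4.1, 4.6]
scores = regression_metrics(observed, predicted)
print({name: round(value, 3) for name, value in scores.items()})
# prints: {'MAE': 0.225, 'RMSE': 0.25, 'R2': 0.932}
```

Lower MAE and RMSE, together with an R² closer to 1, indicate a better yield model; computing these scores for Random Forest, XGBoost, and SVR on the same held-out data is one way to rank the regressors.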