DEVELOPING MACHINE LEARNING ALGORITHMS FOR PREDICTING SOYBEAN YIELD BASED ON WEATHER AND SOIL DATA
DOI:
https://doi.org/10.64038/eatf.01.2024.2
Keywords:
Soybean Yield Prediction, Machine Learning, Environmental Data Integration, Random Forest, Model Interpretability, Precision Agriculture
Abstract
Accurate forecasting of soybean (Glycine max) yield under variable environmental conditions is essential for optimizing management decisions and supporting food security. In this study, we developed a hybrid machine learning pipeline that integrates weather data (cumulative growing‐season precipitation, mean temperature, solar radiation, humidity) and soil data (moisture, organic matter, pH, texture), using principal component analysis and recursive feature elimination to reduce and select features. We trained Random Forest, support vector machine, and deep neural network models on a multi‐year (2021–2023), multi‐region dataset and evaluated them with five‐fold cross‐validation and independent test sets. Random Forest consistently outperformed the alternatives, with a best test‐set RMSE of 1.15 t/ha, MAE of 0.82 t/ha, and R² of up to 0.87; MAE across the five cross‐validation folds ranged from 0.83 to 0.90 t/ha. Regional assessments showed that the East region had the highest accuracy and the West region the greatest error variance. SHAP‐based analysis ranked cumulative precipitation and mean temperature as the top drivers of yield variability, a result supported by feature‐importance bar charts, scatter plots of predicted versus actual yields, and error‐distribution visualizations. Correlation heatmaps showed low to moderate collinearity among key predictors, which supports the benefit of multi‐modal data fusion. Our approach demonstrates robust, interpretable yield forecasting and can be transferred to other crops and regions with local calibration. This decision‐support tool offers stakeholders a scalable way to enhance soybean productivity under climatic uncertainty.
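The evaluation measures named in the abstract (RMSE, MAE, R², and five‐fold cross‐validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names `regression_metrics` and `k_fold_indices` are hypothetical, the yield values are invented for illustration, and the folds are assumed to be contiguous and unshuffled, which the abstract does not specify.

```python
import math


def regression_metrics(actual, predicted):
    """Return (rmse, mae, r2) for paired yield values, e.g. in t/ha."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)                   # root mean squared error
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # R^2 is undefined when all actual values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, mae, r2


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Hypothetical yields in t/ha, invented for illustration only.
actual_yield = [2.0, 3.0, 4.0, 5.0]
predicted_yield = [2.5, 2.5, 4.5, 4.5]
rmse, mae, r2 = regression_metrics(actual_yield, predicted_yield)

# Each sample index appears in exactly one test fold.
folds = list(k_fold_indices(10, k=5))
```

In a real workflow the samples would typically be shuffled, or grouped by region or year, before splitting, and the metrics would be computed on each held‐out fold in turn.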
License
Copyright (c) 2024 Muhammad Bilal, Muhammad Danial Ahmad Qureshi (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.







