Optimizing potato yield predictions in Uttar Pradesh, India: a comparative analysis of machine learning models.
Journal:
Scientific Reports
Published Date:
Jul 24, 2025
Abstract
Potato, as a staple food, plays a crucial role in ensuring a sustainable food supply and in mitigating poverty and malnutrition in many regions of the world. India, which ranks second in global potato production, is a significant player in the global potato market. Accurate prediction of potato yield is therefore crucial for addressing global food security and supporting sustainable farming practices. Accordingly, this study focused on seven districts in Uttar Pradesh, India, viz. Agra, Aligarh, Etawah, Farrukhabad, Firozabad, Hathras and Kannauj. The research compares the performance of five machine learning models, namely Elastic Net (ELNET), Random Forest, Artificial Neural Network (ANN), Extreme Gradient Boosting (XGBoost) and Support Vector Regression (SVR), to identify the most effective approach for potato yield forecasting. Time series data spanning 16 years (2005-2021), comprising potato yields and weather variables, were collected for the seven districts. The data were detrended, processed into weather indices, and split into 70% for training and 30% for testing. Each district was treated as a unique case, with models trained and validated independently. The ANN model demonstrated superior performance, with the highest R values and the lowest error metrics, establishing it as the most reliable model for potato yield prediction. The overall ranking of model performance was: ANN > XGBoost > Random Forest > ELNET > SVR. The study emphasizes the effectiveness of ANN for potato yield forecasting and underscores the importance of tailoring models to local conditions to improve accuracy. Based on the ANN model's current performance, future potato yields in the studied districts can be predicted with over 98% accuracy, enabling proactive planning for food supply, market stabilization, and input resource optimization in the coming growing seasons.
These findings contribute to advancing machine learning applications in agriculture, offering actionable insights for policymakers and stakeholders to support sustainable agricultural practices.
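The preprocessing and evaluation steps named in the abstract (detrending, a 70/30 split, and scoring by R and an error metric) can be illustrated with a minimal sketch for a single district. This is not the authors' code. It assumes a linear time trend, a chronological (unshuffled) split, and RMSE as the error metric, none of which the abstract specifies. The data are synthetic, the helper names `detrend`, `chronological_split` and `evaluate` are hypothetical, and an ordinary least-squares fit stands in for the five models compared in the study.

```python
import numpy as np


def detrend(years, yields):
    """Remove a linear time trend (assumed form); return residuals and trend."""
    t = years - years.mean()  # centre the years for numerical stability
    slope, intercept = np.polyfit(t, yields, deg=1)
    trend = slope * t + intercept
    return yields - trend, trend


def chronological_split(X, y, train_frac=0.7):
    """Split without shuffling (assumption): earlier years train, later years test."""
    n_train = int(round(train_frac * len(y)))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def evaluate(y_true, y_pred):
    """Return the Pearson correlation R and the root mean squared error."""
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    return r, rmse


# Synthetic data for one hypothetical district; these are NOT values from the study.
rng = np.random.default_rng(0)
years = np.arange(2005, 2021, dtype=float)          # 16 seasons (assumed reading)
weather_index = rng.normal(size=(len(years), 2))    # two made-up weather indices
yields = (
    20.0
    + 0.3 * (years - 2005)
    + weather_index @ np.array([1.5, -0.8])
    + rng.normal(scale=0.2, size=len(years))
)

residuals, trend = detrend(years, yields)
X_train, X_test, y_train, y_test = chronological_split(weather_index, residuals)

# Ordinary least squares as a stand-in for ELNET, Random Forest, ANN, XGBoost or SVR.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
A_test = np.column_stack([np.ones(len(X_test)), X_test])
y_pred = A_test @ coef

r, rmse = evaluate(y_test, y_pred)
print(f"test R = {r:.3f}, test RMSE = {rmse:.3f}")
```

Repeating this procedure once per district, with each candidate model in place of the least-squares step, would mirror the independent, district-wise training and validation the abstract describes.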