Emergent Mind

Abstract

The prediction of crop yields internationally is a crucial objective in agricultural research. This study implements six regression models (Linear, Tree, Gradient Descent, Gradient Boosting, K Nearest Neighbors, and Random Forest) to predict crop yields in 37 developing countries over 27 years. Using four key training parameters, insecticides (tonnes), rainfall (mm), temperature (Celsius), and yield (hg/ha), our Random Forest Regression model achieved a coefficient of determination (r²) of 0.94, with a margin of error (ME) of 0.03. The models were trained and tested on data from the Food and Agriculture Organization of the United Nations and the World Bank Climate Change Data Catalog. Each parameter was also analyzed to understand how varying factors could affect overall yield. In contrast to the Deep Learning (DL) and Machine Learning (ML) models commonly used in this area, we combined less conventional models with recently collected data. Existing scholarship would benefit from knowing which model performs best for agricultural research, specifically on the United Nations data.

Overview

  • The study uses regression models to predict crop yields globally by analyzing pesticide use, rainfall, temperature, and yield data.

  • It compares six regression models, with the Random Forest Regression model achieving the highest accuracy with an r² of 0.94.

  • Data was gathered from the Climate Change Knowledge Portal & FAOSTAT, covering the period from 1960 to 2021, and was thoroughly preprocessed for model use.

  • Findings suggest regression models, particularly Random Forest, can be highly effective in agricultural yield prediction, handling complex and nonlinear relationships.

  • The study recognizes limitations like historical data dependency and challenges in ensemble model interpretability, hinting at future avenues for research improvement.

Understanding the complexities of global agricultural productivity is essential to meet the demands of a growing population under the challenging conditions of climate change. Predicting crop yields has long been a critical issue within agricultural research. A recent study has approached this issue by applying various regression models to predict crop yields across 196 countries, leveraging four key parameters: pesticide use, rainfall, temperature, and yield data.

The study applied six distinct regression models: Linear, Decision Tree, Stochastic Gradient Descent, Gradient Boosting, K-Nearest Neighbors, and Random Forest. These models were selected for their diverse approaches to fitting data, which could capture the complex relationships between influencing factors and crop yields. The Random Forest Regression model performed best, achieving a coefficient of determination (r²) of 0.94, meaning it explained roughly 94% of the variance in observed yields.
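
To make the reported metric concrete, the following is a minimal sketch of how a coefficient of determination can be computed, using the standard definition 1 - SS_res / SS_tot. The function names (`r_squared`, `fit_line`) and the rainfall and yield values are hypothetical and made up for illustration; they are not taken from the study, and the score here is computed on the same points used for fitting rather than on a held-out test set.

```python
from statistics import mean

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up illustrative values, NOT data from the study.
rainfall_mm = [400.0, 600.0, 800.0, 1000.0, 1200.0]
yield_hg_ha = [21000.0, 26000.0, 29000.0, 35000.0, 38000.0]

slope, intercept = fit_line(rainfall_mm, yield_hg_ha)
predicted = [slope * r + intercept for r in rainfall_mm]
baseline = [mean(yield_hg_ha)] * len(yield_hg_ha)  # always predict the mean

print(f"linear fit r2    = {r_squared(yield_hg_ha, predicted):.3f}")
print(f"mean baseline r2 = {r_squared(yield_hg_ha, baseline):.3f}")
```

A model that always predicts the mean yield scores 0 under this definition, which is why a value of 0.94 is read as the model accounting for most of the variation in yield.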

Data were drawn from the Climate Change Knowledge Portal and FAOSTAT, spanning 1960 to 2021, and then preprocessed. The researchers had to merge and standardize several datasets so that the regression models could be applied to uniform inputs. Preprocessing also included an exploratory analysis to identify trends and relationships in the raw data, which informed the study's methodology.
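
As a rough illustration of the merging step, the sketch below joins two small tables on a (country, year) key and then rescales the climate columns. Everything in it is an assumption: the record layout, the column names (`rainfall_mm`, `temp_c`, `yield_hg_ha`), the values, and the helper names are hypothetical, and z-scoring is only one possible reading of "standardizing"; the study may instead mean harmonizing units and formats.

```python
from statistics import mean, pstdev

# Hypothetical records keyed by (country, year); all values are made up.
yield_rows = {
    ("Kenya", 2000): 15000.0,
    ("Kenya", 2001): 16000.0,
    ("India", 2000): 23000.0,
    ("India", 2001): 24500.0,
}
climate_rows = {
    ("Kenya", 2000): {"rainfall_mm": 630.0, "temp_c": 24.8},
    ("Kenya", 2001): {"rainfall_mm": 700.0, "temp_c": 24.6},
    ("India", 2000): {"rainfall_mm": 1080.0, "temp_c": 24.1},
    # ("India", 2001) is deliberately missing to show the inner-join behaviour.
}

def inner_join(yields, climate):
    """Keep only the (country, year) keys present in both sources."""
    merged = []
    for key in sorted(yields.keys() & climate.keys()):
        country, year = key
        merged.append({"country": country, "year": year,
                       "yield_hg_ha": yields[key], **climate[key]})
    return merged

def standardize(rows, column):
    """Add a z-scored copy of `column` (zero mean, unit variance)."""
    values = [row[column] for row in rows]
    mu, sigma = mean(values), pstdev(values)
    for row in rows:
        row[column + "_z"] = (row[column] - mu) / sigma
    return rows

table = inner_join(yield_rows, climate_rows)
for col in ("rainfall_mm", "temp_c"):
    standardize(table, col)
for row in table:
    print(row)
```

An inner join drops country-years that lack either yield or climate data, which is one simple way to guarantee uniform inputs at the cost of discarding incomplete records.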

The results highlight the potential of regression models, beyond conventional Deep Learning methods, for agricultural research. The Random Forest model in particular proved well suited to predicting yields because it can handle nonlinear relationships and the variability of the parameters involved. The study also emphasized the importance of understanding the combined influence of temperature, pesticide usage, and precipitation on agricultural outcomes.
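
The point about nonlinear relationships can be illustrated with a small, self-contained sketch. It is not the study's implementation: it uses bagged regression trees on a single feature (a Random Forest without the per-split feature subsampling), and the data are synthetic, generated under the assumption that yield peaks at a moderate temperature. All names and constants are hypothetical.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def r_squared(y_true, y_pred):
    y_bar = avg(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - y_bar) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def build_tree(xs, ys, depth):
    """Split one feature at the threshold that minimizes squared error."""
    if depth == 0 or len(set(xs)) < 2:
        return avg(ys)                          # leaf: mean of the targets
    best = None
    for t in sorted(set(xs))[1:]:               # both sides stay non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = avg(left), avg(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    t = best[1]
    lo = [(x, y) for x, y in zip(xs, ys) if x < t]
    hi = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return (t,
            build_tree([p[0] for p in lo], [p[1] for p in lo], depth - 1),
            build_tree([p[0] for p in hi], [p[1] for p in hi], depth - 1))

def predict_tree(node, x):
    while isinstance(node, tuple):              # internal nodes are tuples
        t, left, right = node
        node = left if x < t else right
    return node

def fit_forest(xs, ys, n_trees=20, depth=4, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        forest.append(build_tree([xs[i] for i in idx], [ys[i] for i in idx], depth))
    return forest

def predict_forest(forest, x):
    return avg(predict_tree(tree, x) for tree in forest)

def fit_line(xs, ys):
    x_bar, y_bar = avg(xs), avg(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Synthetic data (an assumption, not the study's): yield peaks near 25 C.
rng = random.Random(42)
temps = [rng.uniform(10.0, 40.0) for _ in range(300)]
yields = [30000.0 - 80.0 * (t - 25.0) ** 2 + rng.gauss(0.0, 1000.0) for t in temps]
train_x, test_x = temps[:200], temps[200:]
train_y, test_y = yields[:200], yields[200:]

slope, intercept = fit_line(train_x, train_y)
forest = fit_forest(train_x, train_y)
linear_r2 = r_squared(test_y, [slope * x + intercept for x in test_x])
forest_r2 = r_squared(test_y, [predict_forest(forest, x) for x in test_x])
print(f"linear r2 = {linear_r2:.3f}, bagged trees r2 = {forest_r2:.3f}")
```

Because the simulated relationship rises and then falls, a straight line captures almost none of it, while averaging many piecewise-constant trees follows the curve. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would be the usual choice over hand-written trees.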

However, the study acknowledges certain limitations, such as temporal constraints due to reliance on historical data and the potential complexities of model interpretability when dealing with ensemble methods like Random Forest. This paves the way for future research to improve upon these models, for instance, through real-time data integration and advanced techniques for optimization.

In conclusion, the study adds significant value to agricultural research by providing insights into the most effective models for crop yield prediction using UN data. By encompassing a wide geographic scope and considering a comprehensive set of parameters, the research enhances our understanding of crop production variability and offers a refined analytical tool for addressing the global demand for optimized agricultural productivity in the face of unprecedented climate conditions.
