Amaro, RP; Todoroff, P; Christina, M; Duft, DG; Luciano, ACD (2025). Performance evaluation of Sentinel-2 imagery, agronomic and climatic data for sugarcane yield estimation. COMPUTERS AND ELECTRONICS IN AGRICULTURE, 237, 110522.
Abstract
Given the importance of the sugarcane sector, machine learning techniques are increasingly used to improve yield estimation. This study aims to select the most relevant predictors from Sentinel-2 imagery, agronomic, and climatic data, using the Random Forest (RF) algorithm, to estimate sugarcane yield before harvest at a mill in the west of Sao Paulo state. We used radiometric bands (Red-edge1 to Red-edge3, Red, NIR, SWIR1, and SWIR2) and vegetation indices (NDVIRE1 to NDVIRE3, EVI, CIRE1 to CIRE3, NDVI, NDWI1, NDWI2, SIWSI, NDMI, SAVI) from Sentinel-2 multispectral reflectance data; agronomic data (soil type, number of harvests, variety, slope); and climatic and agroclimatic data (temperature, precipitation, radiation, and crop water balance). We built four datasets to create yield estimation models for the mill: (i) the first dataset included all variables; (ii) the second dataset excluded the strongly correlated variables from dataset (i); (iii) the third dataset included the variables identified by feature selection within dataset (ii) using the RF algorithm's impurity index, and gave the best model results; (iv) the fourth dataset consisted of the 20 highest-ranked variables from dataset (i), selected by SHapley Additive exPlanations (SHAP). With dataset (iii), the models showed R² values ranging from 0.58 to 0.70, and the d-Willmott index ranged from 0.83 to 0.89. The most relevant variables for estimating sugarcane yield were the number of harvests, climatic data, and vegetation indices that used the Red-edge, narrow near-infrared, red, and SWIR bands.
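The d-Willmott index reported above is Willmott's index of agreement, commonly written as d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)², where O are observed values, P are predicted values, and Ō is the observed mean. A minimal sketch of that computation follows, assuming this standard form is the one the authors used; the function name `willmott_d` and the yield values are hypothetical and do not come from the study.

```python
def willmott_d(observed, predicted):
    """Willmott's index of agreement (standard form), between 0 and 1."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal length")

    mean_obs = sum(observed) / len(observed)

    # Sum of squared prediction errors.
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))

    # Potential error: how far predictions and observations both sit
    # from the observed mean.
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )

    if potential_error == 0:
        # Every observation and prediction equals the observed mean,
        # so the index is undefined.
        raise ValueError("index undefined: no variation around the observed mean")

    return 1.0 - squared_error / potential_error


# Hypothetical yields (t/ha), for illustration only.
observed_yield = [80.0, 90.0, 100.0]
predicted_yield = [85.0, 90.0, 95.0]
d_value = willmott_d(observed_yield, predicted_yield)
```

A value of 1 indicates perfect agreement between predicted and observed yields; values closer to 0 indicate poorer agreement.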
DOI: 10.1016/j.compag.2025.110522
ISSN: 1872-7107