Wang, F; Shi, Z; Biswas, A; Yang, ST; Ding, JL (2020). Multi-algorithm comparison for predicting soil salinity. GEODERMA, 365, 114211.

Soil salinization is one of the most predominant processes responsible for land degradation globally. However, monitoring large areas presents significant challenges due to strong spatial and temporal variability. Environmental covariates show promise for predicting salinity over large areas, provided a reasonable relationship can be developed with field-measured salinity at a few points. Although methods ranging from simple regression to complex data mining have been used for prediction, their performance has not been comprehensively compared, leaving uncertainty about which algorithm to select. This study compares thirteen commonly and less commonly used algorithms against four criteria for predicting soil salinity from environmental covariates in the Kuqa Oasis, Xinjiang, China. The environmental covariates used for prediction include principal components of Landsat satellite images at multiple spectral bands, climate factors (referring to land surface temperature), vegetation indices, salinity and soil-related indices, soil moisture indices, DEM-derived indices, land use, landform and soil type; they were grouped under the factor categories of the SCORPAN model (S, soils; C, climate; O, organisms, biotic factor; R, relief; P, parent material; A, age; and N, space).
The predictive relationships were developed using algorithms that have previously been used to predict salinity, namely Multiple Linear Regression (MLR), Multi-Layer Perceptron-Artificial Neural Network (MLP-ANN), Stochastic Gradient Treeboost (SGT), M5 Model Tree (M5), Multivariate Adaptive Regression Splines (MARS), Classification and Regression Tree (CART), Random Forest (RF), and Support Vector Regression (SVR), and algorithms that have not, namely Alternating Model Tree (ATM), Gaussian Processes Regression (GPR), Gaussian Radial Basis Functions (GRBF), Least Median Squared Linear Regression (LMSLR), and Reduced Error Pruning Tree (REPTree). Here, 5-fold cross-validation and an independent dataset (30% of all samples) at three depths, 0–10 cm, 10–30 cm and 30–50 cm, were used for parameter optimization and for evaluating the performance of the algorithms. The algorithms were compared against multiple criteria: parameterization, error level/fitting accuracy (determination coefficient, R²; root mean squared error, RMSE), stability (based on the Pearson correlation coefficient, R; mean absolute percent error, MAPE; root mean squared error, RMSE; Lin's concordance correlation coefficient, LCCC) and computational efficiency. The results showed that CSRI was the most important predictor of soil salinity at the 0–10 cm and 10–30 cm depths, whereas VD was the most important predictor for the 30–50 cm depth interval. Across all models, for depths of 0–10 cm, 10–30 cm and 30–50 cm, the R² values ranged from 0.60 to 0.74, 0.15 to 0.31, and 0.30 to 0.47, and the RMSE values ranged from 18.87 to 23.49 dS m⁻¹, 9.94 to 13.48 dS m⁻¹ and 3.79 to 7.11 dS m⁻¹, respectively. Considering accuracy and stability, the optimal algorithms at the three depths of 0–10 cm, 10–30 cm and 30–50 cm were RF, M5 and GRBF, respectively.
After a comprehensive assessment of algorithm performance, we recommend RF for mapping salinity in arid environments such as that of Xinjiang and elsewhere globally. However, no algorithm performs ideally for all datasets. Therefore, we suggest that the algorithm be chosen carefully according to the purposes of the study.
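
The accuracy and stability criteria named in the abstract (R², RMSE, R, MAPE and Lin's concordance correlation coefficient) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the sample values are hypothetical, R² is computed here as 1 − SS_res/SS_tot (one common definition, which the study may or may not have used), and LCCC uses population (divide-by-n) variances.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def rmse(obs, pred):
    """Root mean squared error, in the units of the observations."""
    return math.sqrt(_mean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def r2(obs, pred):
    """Determination coefficient as 1 - SS_res / SS_tot (assumed definition)."""
    m = _mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mape(obs, pred):
    """Mean absolute percent error; undefined if any observation is zero."""
    return 100.0 * _mean([abs((o - p) / o) for o, p in zip(obs, pred)])


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mo, mp = _mean(obs), _mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def lccc(obs, pred):
    """Lin's concordance correlation coefficient (population variances)."""
    mo, mp = _mean(obs), _mean(pred)
    var_o = _mean([(o - mo) ** 2 for o in obs])
    var_p = _mean([(p - mp) ** 2 for p in pred])
    cov = _mean([(o - mo) * (p - mp) for o, p in zip(obs, pred)])
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)


# Hypothetical values only: predictions offset from observations by a constant.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

for name, fn in [("RMSE", rmse), ("R2", r2), ("MAPE", mape),
                 ("R", pearson_r), ("LCCC", lccc)]:
    print(f"{name}: {fn(observed, predicted):.3f}")
```

In this toy case the constant offset leaves the Pearson coefficient at 1 while LCCC drops to about 0.71, which shows why a concordance measure is a useful stability criterion alongside correlation.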