Fan, ZY; Zhan, QM; Yang, C; Liu, HM; Bilal, M (2020). Estimating PM2.5 Concentrations Using Spatially Local Xgboost Based on Full-Covered SARA AOD at the Urban Scale. REMOTE SENSING, 12(20), 3368.

The adverse effects of PM2.5 have drawn extensive concern, and identifying its spatial distribution is of great significance. Satellite-derived aerosol optical depth (AOD) has been widely used for PM2.5 estimation. However, its coarse spatial resolution and the gaps caused by missing data limit its application at the urban scale. Additionally, obtaining accurate results in unsampled areas when PM2.5 ground sites are few and sparsely distributed is also a challenge for estimating the spatial distribution of PM2.5. This paper aimed to develop a model, i.e., spatially local extreme gradient boosting (SL-XGB), combining the powerful fitting ability of machine learning and the optimal bandwidths of local models, to better estimate PM2.5 concentration at the urban scale, using Beijing as the study area. This paper adopted simplified high-resolution MODIS aerosol retrieval algorithm (SARA) AOD at 500 m resolution as the major independent variable, so that the estimation can be carried out at a fine scale. Moreover, the extreme gradient boosting (XGBoost) model was adopted to fill the gaps in SARA AOD, thus improving its availability. Then, based on full-covered SARA AOD and other multisource data, the SL-XGB model, integrating multiple local XGBoost models and their particular optimal bandwidths, was trained to estimate PM2.5 concentration. For comparison, SL-XGB and two other models, XGBoost and geographically weighted regression (GWR), were evaluated by 10-fold cross validation (CV). The sample-based CV results reveal that SL-XGB performed the best as assessed through R² (0.88), root mean square error (RMSE = 24.08 µg/m³) and mean prediction error (MPE = 16.90 µg/m³). Additionally, SL-XGB also performed the best in the site-based CV, with an R² of 0.86, an RMSE of 26.15 µg/m³ and an MPE of 17.97 µg/m³, which shows its good spatial generalization ability.
These results demonstrate that SL-XGB can better handle non-linearity and spatial heterogeneity simultaneously, even with spatially limited data at the urban scale. The estimated PM2.5 concentrations increased along a gradient from the northwest to the southeast of Beijing, with abundant spatial detail. Overall, the proposed approach for PM2.5 estimation showed outstanding performance and can support preventive pollution control and mitigation at the urban scale.
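
The difference between the sample-based and site-based 10-fold CV mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy data (35 hypothetical sites with 20 samples each), and the random fold assignment are all assumptions made for illustration. The point is that sample-based folds may place samples from one site in both training and test sets, whereas site-based folds hold out entire sites, which is why the latter probes spatial generalization.

```python
import random


def sample_based_folds(n_samples, k=10, seed=0):
    """Randomly assign each sample index to one of k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Take every k-th shuffled index so the folds are near-equal in size.
    return [idx[i::k] for i in range(k)]


def site_based_folds(site_ids, k=10, seed=0):
    """Assign whole monitoring sites to folds.

    All samples from a held-out site land in the same test fold, so that
    site contributes nothing to the corresponding training set.
    """
    sites = sorted(set(site_ids))
    random.Random(seed).shuffle(sites)
    fold_of_site = {site: i % k for i, site in enumerate(sites)}
    folds = [[] for _ in range(k)]
    for i, site in enumerate(site_ids):
        folds[fold_of_site[site]].append(i)
    return folds


# Hypothetical toy data: 35 sites, 20 samples per site (not from the paper).
site_ids = [f"site_{s:02d}" for s in range(35) for _ in range(20)]

sample_folds = sample_based_folds(len(site_ids))
site_folds = site_based_folds(site_ids)
```

In each CV round, one fold serves as the test set and the remaining nine as the training set; with `site_folds`, no test site ever appears in the training data for that round.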