Chen, ZY; Zhang, TH; Zhang, R; Zhu, ZM; Yang, J; Chen, PY; Ou, CQ; Guo, YM (2019). Extreme gradient boosting model to estimate PM2.5 concentrations with missing-filled satellite data in China. ATMOSPHERIC ENVIRONMENT, 202, 180-189.
Abstract
Several studies have attempted to predict ground-level PM2.5 concentrations from satellite aerosol optical depth (AOD) retrievals. However, 70%-90% of aerosol retrievals are missing, and the missingness is non-random, which limits and biases the estimation. To the best of our knowledge, this issue has not been well resolved to date. The aim of this study was to develop an interpolation technique to handle the missing-retrieval problem and to estimate daily PM2.5 concentrations as a high-coverage dataset at 3-km resolution in China by fitting the complex temporal and spatial variations. We developed a two-step interpolation method (a mixed-effect model followed by inverse distance weighting) to fill the missing AOD values. Next, an extreme gradient boosting (XGBoost) model that includes a non-linear exposure-lag-response model (NELRM) was proposed and validated to estimate daily PM2.5 levels across China during 2014-2015. After the two interpolation steps, the missing-value rate of the daily AOD data fell from 87.91% to 13.83%. The cross-validation (CV) R², root mean square error (RMSE) and mean absolute percentage prediction error (MAPE) of the interpolation were 0.76, 0.10 and 21.41%, respectively. Cross-validation of the daily PM2.5 predictions gave R² = 0.86, RMSE = 14.98, and MAPE = 23.72%. These results indicate that the two-step interpolation method can largely resolve the non-random missing data problem and that the combined XGBoost approach estimates fine particulate matter concentrations well.
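As a rough illustration of two ingredients named in the abstract, the sketch below shows a plain inverse distance weighting (IDW) estimate for one missing grid cell and the three validation metrics (R², RMSE, MAPE). It is not the authors' implementation: the function names, the distance power of 2, the use of all known points as neighbours, the coefficient-of-determination form of R², and the toy numbers are all assumptions made for the example. The mixed-effect model step and the XGBoost model are not shown.

```python
import math


def idw_fill(known, target, power=2.0):
    """Estimate a value at target (x, y) from known [(x, y, value), ...].

    Hypothetical helper: weights are 1 / distance**power (power=2 is an
    assumption, not a setting taken from the paper).
    """
    num = 0.0
    den = 0.0
    for x, y, value in known:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with an observed cell
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error, in percent (obs must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


# Toy AOD values at the corners of a unit square (made-up numbers).
known_aod = [(0, 0, 0.2), (1, 0, 0.4), (0, 1, 0.4), (1, 1, 0.6)]
filled = idw_fill(known_aod, (0.5, 0.5))

# Toy observed vs. predicted PM2.5 values (made-up numbers).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]
print(filled, r_squared(observed, predicted),
      rmse(observed, predicted), mape(observed, predicted))
```

In a cross-validation setting the same three metrics would be computed on held-out observations only. A study may also define R² as the squared correlation between observed and predicted values rather than the form used here.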
DOI: 10.1016/j.atmosenv.2019.01.027
ISSN: 1352-2310