Zhou, HW; Pan, HD; Li, S; Lv, XQ (2020). Application of Empirical Orthogonal Function Interpolation to Reconstruct Hourly Fine Particulate Matter Concentration Data in Tianjin, China. Complexity, 2020, 9724367.

Monitoring the concentration of fine particulate matter with diameters less than 2.5 μm (PM2.5) is closely related to public health, outdoor activities, environmental protection, and other fields. However, the incomplete PM2.5 observation records provided by ground-based monitoring stations pose a challenge to the study of PM2.5 propagation and evolution models. Consequently, PM2.5 concentration data imputation has been widely studied. In this paper, a new spatiotemporal interpolation method based on the empirical orthogonal function (EOF), EOF interpolation (EOFI), is introduced and applied to reconstruct the hourly PM2.5 concentration records of two stations in the first half of the year. EOFI proceeds in three main steps. First, the spatiotemporal data matrix of the original observation sites is decomposed into mutually orthogonal temporal and spatial modes with the EOF method. Second, the spatial mode at the missing-data station is estimated by inverse distance weighting interpolation of the spatial modes at the observation sites. Third, the records of the missing-data station are reconstructed by multiplying the estimated spatial mode by the corresponding temporal mode. The optimal mode number for EOFI is determined by minimizing the root mean square error (RMSE) between the reconstructed records and the corresponding valid records. Finally, six evaluation indices (mean absolute error (MAE), RMSE, correlation coefficient (Corr), deviation rate bias, Nash-Sutcliffe efficiency (NSE), and index of agreement (IA)) are calculated. The results show that EOFI performs better than three other interpolation methods, namely, inverse distance weighting interpolation, thin plate spline, and surface spline interpolation.
EOFI has the advantages of less computation, fewer parameters to select, and ease of implementation. It is an alternative method when the number of observation stations is small and the proportion of missing values at some stations is large. Moreover, it can also be applied to the interpolation and imputation of other spatiotemporal variables.
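
The three EOFI steps and the RMSE-based choice of mode number described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the EOF decomposition is computed as a singular value decomposition of a gap-free time-by-station matrix without removing the mean, that station coordinates are planar, and that the inverse distance weighting exponent is 2. The names `idw_weights`, `eofi_reconstruct`, `rmse`, and `best_mode_number` are hypothetical and chosen only for illustration.

```python
import numpy as np


def idw_weights(src_xy, target_xy, power=2.0):
    """Normalized inverse distance weights from source stations to one target.

    Assumption: planar coordinates and a default exponent of 2.
    """
    d = np.linalg.norm(src_xy - target_xy, axis=1)
    if np.any(d == 0):
        # Target coincides with a source station: use that station only.
        w = (d == 0).astype(float)
    else:
        w = 1.0 / d**power
    return w / w.sum()


def eofi_reconstruct(X, src_xy, target_xy, n_modes, power=2.0):
    """Reconstruct the target station's series from the leading EOF modes.

    X is an (n_times, n_stations) matrix of complete source records.
    """
    # Step 1: EOF decomposition via SVD (assumed here; no mean removal).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    temporal = U[:, :n_modes] * s[:n_modes]   # (n_times, n_modes)
    spatial = Vt[:n_modes, :]                 # (n_modes, n_stations)
    # Step 2: interpolate each spatial mode to the target location by IDW.
    w = idw_weights(src_xy, target_xy, power)
    spatial_at_target = spatial @ w           # (n_modes,)
    # Step 3: multiply temporal modes by the estimated spatial mode values.
    return temporal @ spatial_at_target       # (n_times,)


def rmse(est, obs):
    """RMSE over the time steps where the observation is valid (not NaN)."""
    valid = ~np.isnan(obs)
    return float(np.sqrt(np.mean((est[valid] - obs[valid]) ** 2)))


def best_mode_number(X, src_xy, target_xy, target_obs, power=2.0):
    """Pick the mode number that minimizes RMSE against the valid records."""
    max_modes = min(X.shape)
    errors = [
        rmse(eofi_reconstruct(X, src_xy, target_xy, k, power), target_obs)
        for k in range(1, max_modes + 1)
    ]
    k = int(np.argmin(errors)) + 1
    return k, errors[k - 1]
```

Under these assumptions, keeping every mode makes the reconstruction identical to plain inverse distance weighting of the raw records, so any difference between the two methods comes from truncating to the leading modes.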