Publications

Sun, WX; Li, J; Jiang, MH; Yuan, QQ (2023). Supervised and self-supervised learning-based cascade spatiotemporal fusion framework and its application. ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 203, 19-36.

Abstract
Spatiotemporal fusion (STF) is considered an effective way to address the mutual constraints of the spatiotemporal resolutions of the remote sensing images from a single satellite sensor. Although the deep learning (DL)-based STF methods have shown great potential so far, there are still some deficiencies. Supervised DL-based methods usually train the network on the original scale data with richer spatial information, but they utilize only the data at auxiliary dates to generate the fusion data at prediction dates, which may perform poorly when significant spatiotemporal changes occur between the auxiliary and prediction dates. For self-supervised DL-based methods, the training can focus more on the data at prediction dates, but the network is generally trained on the down-sampled data with less spatial information, causing insufficient spatial structures of the fusion results. Based on the above, we innovatively combine supervised and self-supervised learning and propose a dual-stage cascade STF framework. In the first stage, a model based on supervised learning is trained on the data at auxiliary dates to extract abundant spatial features and obtain the initial fusion results. In the second stage, we utilize a self-supervised strategy to excavate the spatiotemporal features of prediction dates based on the initial fusion results and the observed data at prediction dates. In addition, to alleviate the insufficient consideration of the temporal correlation in existing STF methods, we design a temporal consistency loss function to fully utilize the temporal correlation information between multiple prediction dates to generate more accurate fusion results. The proposed framework can not only realize the STF of remote sensing images but also be well applied to the STF of remote sensing products such as land surface temperature (LST).
Based on the quantitative results of three datasets, the proposed method improves the quantitative indices of root mean square error (RMSE), structural similarity (SSIM), erreur relative global adimensionnelle de synthese (ERGAS), peak signal to noise ratio (PSNR), and spectral angle mapper (SAM) by up to 0.2014, 0.0279, 0.6797, 0.7629, and 0.2006, respectively, compared with the best comparison method, which fully shows the superiority of our method.
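For readers unfamiliar with the indices named in the abstract, the following is a minimal sketch of four of them (RMSE, PSNR, ERGAS, SAM) using their commonly cited textbook definitions. It is not the authors' evaluation code: the function names, the flat-list image layout, the `data_range` default, and the `ratio` convention (fine pixel size divided by coarse pixel size) are all assumptions made for illustration. SSIM is omitted because it needs a windowed implementation.

```python
import math

def rmse(pred, ref):
    """Root mean square error between two equal-length flat lists of pixel values."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))

def psnr(pred, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the assumed maximum pixel value."""
    err = rmse(pred, ref)
    if err == 0:
        return math.inf  # identical images
    return 20 * math.log10(data_range / err)

def ergas(pred_bands, ref_bands, ratio):
    """ERGAS over bands; each band is a flat list of pixels.

    ratio is assumed to be (fine pixel size / coarse pixel size).
    """
    terms = []
    for p, r in zip(pred_bands, ref_bands):
        mean_ref = sum(r) / len(r)
        terms.append((rmse(p, r) / mean_ref) ** 2)
    return 100 * ratio * math.sqrt(sum(terms) / len(terms))

def sam(pred_bands, ref_bands):
    """Mean spectral angle (radians) between per-pixel spectra of two images."""
    angles = []
    for i in range(len(ref_bands[0])):
        p = [band[i] for band in pred_bands]
        r = [band[i] for band in ref_bands]
        dot = sum(x * y for x, y in zip(p, r))
        norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in r))
        if norm == 0:
            continue  # skip pixels with an all-zero spectrum
        # Clamp to [-1, 1] to guard against floating-point drift before acos.
        angles.append(math.acos(max(-1.0, min(1.0, dot / norm))))
    return sum(angles) / len(angles) if angles else 0.0
```

Lower values are better for RMSE, ERGAS, and SAM, while higher values are better for PSNR and SSIM, so the improvements quoted above correspond to decreases in the former and increases in the latter.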

DOI:
10.1016/j.isprsjprs.2023.07.022

ISSN:
1872-8235