Hu, PH; Pan, X; Yang, YB; Dai, Y; Chen, YC (2025). A Two-Stage Hierarchical Spatiotemporal Fusion Network for Land Surface Temperature With Transformer. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 63, 5002320.
Abstract
The potential applications of high spatiotemporal resolution land surface temperature (LST) products are extensive. However, the tradeoff between the spatial and temporal resolutions of remote sensing data has significantly constrained the availability of such LST products. Existing spatiotemporal fusion methods appear to encounter certain limitations. This article proposes a two-stage hierarchical spatiotemporal fusion network (THSTNet) to fuse MODIS LST products and Landsat LST products. THSTNet adopts the shifted windows (Swin) transformer architecture and employs a two-stage process to reconstruct the fine-resolution image at the target time. The innovation within THSTNet is multifaceted. It combines spatiotemporal mapping (ST mapping) with deep learning, enhancing the extraction of global information for fusion results. Leveraging self-attention computation, it adopts a two-stage structure to improve the understanding of intricate LST changes. In addition, it incorporates a texture converter module aimed at enhancing spatial details within the reconstruction results. The model's predictions were validated against actual images and ground observations, affirming the high reliability of THSTNet's predictive outcomes. Compared with two traditional methods [spatial and temporal adaptive reflectance fusion model (STARFM) and enhanced STARFM (ESTARFM)] and four deep learning-based methods [enhanced deep convolutional spatiotemporal fusion network (EDCSTFN), generative adversarial network (GAN)-based spatiotemporal fusion model (GANSTFM), spatiotemporal temperature fusion network (STTFN), and multistream fusion network (MSNet)], THSTNet demonstrated superior performance, with an average root-mean-square error (RMSE) below 1.3 K and an average structural similarity index (SSIM) of 0.939. The prediction results of THSTNet also maintain high consistency with ground observations (the average RMSE is 2.3 K and the average R² is 0.9).
The code will be available at https://github.com/HuPengHua2021/THSTNet.
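The RMSE and R² figures quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `rmse` and `r_squared` are hypothetical, R² is assumed here to be the coefficient of determination (the paper may define it differently, for example as a squared correlation), and the sample temperatures are made up for illustration.

```python
import numpy as np


def _valid_pairs(pred, ref):
    """Return both arrays as floats, keeping only pixels finite in both."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    mask = np.isfinite(pred) & np.isfinite(ref)  # skip cloud/no-data gaps
    return pred[mask], ref[mask]


def rmse(pred, ref):
    """Root-mean-square error, in the units of the inputs (e.g. kelvin)."""
    p, r = _valid_pairs(pred, ref)
    return float(np.sqrt(np.mean((p - r) ** 2)))


def r_squared(pred, ref):
    """Coefficient of determination of pred against ref (an assumption)."""
    p, r = _valid_pairs(pred, ref)
    ss_res = np.sum((r - p) ** 2)         # residual sum of squares
    ss_tot = np.sum((r - r.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


if __name__ == "__main__":
    # Illustrative values only: predicted vs. observed LST in kelvin.
    predicted = [300.0, 302.0, 304.0]
    observed = [301.0, 301.0, 305.0]
    print(rmse(predicted, observed))
    print(r_squared(predicted, observed))
```

Masking non-finite values before averaging is a design choice that keeps missing pixels, which are common in LST products, from turning the whole score into NaN.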
DOI: 10.1109/TGRS.2025.3552577
ISSN: 1558-0644