Publications

Yu, Y; Malone, BP; Renzullo, LJ; Burton, CA; Tian, SY; Searle, RD; Bishop, TFA; Walker, JP (2025). Spatial Soil Moisture Prediction From In Situ Data Upscaled to Landsat Footprint: Assessing Area of Applicability of Machine Learning Models. IEEE Transactions on Geoscience and Remote Sensing, 63, 4505019.

Abstract
The inherent spatial mismatch between satellite-derived and ground-observed near-surface soil moisture (SM) data necessitates cautious interpretation of point-to-pixel comparisons. Although data-driven upscaling of point-scale SM may enable statistically sound comparisons, the uncertainty across a spatial domain has been less explored in previous studies. This gap underscores the need to address spatial prediction uncertainties when extrapolating SM information to a broader spatial scale. Accordingly, this study presents a spatial prediction approach integrating machine learning (ML) and spatiotemporal fusion, which enables the characterization of SM variability at the Landsat satellite footprint. Spatially clustered SM from 28 in situ stations was extrapolated to a $100 \times 100$ km area at 100 m resolution over a cross-validation (CV) period (2016-2019) and an independent test period (2020-2021). The area of applicability (AOA), which represents the spatial extent within which a prediction model is considered reliable, was determined for two ML models: random forests (RFs) and extreme gradient boosting (XGB). The AOA of the RF and XGB models encompassed 43.1% and 41.5% of the study area, respectively. The spatial SM predictions were further evaluated against multiple independent datasets, including field campaign data, in situ SM from different networks, and satellite retrievals. Specifically, RF-predicted SM achieved a spatial R of 0.62-0.64 against field campaign data, a temporal R of 0.84-0.91 against network-recorded data, and a spatiotemporal R of 0.87 against Soil Moisture Active Passive (SMAP) L2 data during the CV period. SM predictions within the AOA showed markedly lower uncertainties, a result that was further validated across an extended area ($300 \times 300$ km) with diverse physiographic conditions. Overall, this study demonstrated the use of AOA in delineating the statistically reliable spatial extent for ML-based SM predictions.
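
The abstract does not say how the AOA was computed. As a rough illustration of the idea only, the sketch below uses a simplified dissimilarity index (DI): the distance from each prediction location to its nearest training sample in standardized predictor space, scaled by the mean pairwise distance among training samples. Locations whose DI exceeds an outlier threshold derived from the training data are flagged as outside the AOA. The function names, the synthetic predictors, and the threshold rule (Q3 + 1.5 IQR of leave-one-out training DI) are assumptions made for this example; variable-importance weighting and cross-validation folds, which a fuller treatment might include, are omitted.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def area_of_applicability(train_X, new_X):
    """Simplified AOA sketch (hypothetical helper, not the paper's code).

    Returns the DI of each new sample, a boolean mask that is True where
    the sample falls inside the AOA, and the DI threshold used.
    """
    # Standardize both sets with the training mean and standard deviation.
    mu = train_X.mean(axis=0)
    sd = train_X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant predictors
    tr = (train_X - mu) / sd
    nw = (new_X - mu) / sd

    # Mean pairwise distance among training samples (scaling factor).
    d_tr = pairwise_dist(tr, tr)
    n = len(tr)
    mean_d = d_tr[np.triu_indices(n, k=1)].mean()

    # Leave-one-out DI of the training samples: nearest *other* sample.
    np.fill_diagonal(d_tr, np.inf)
    di_train = d_tr.min(axis=1) / mean_d

    # Assumed threshold rule: upper outlier fence of the training DI.
    q1, q3 = np.percentile(di_train, [25, 75])
    threshold = q3 + 1.5 * (q3 - q1)

    # DI of the new samples: nearest training sample, same scaling.
    di_new = pairwise_dist(nw, tr).min(axis=1) / mean_d
    return di_new, di_new <= threshold, threshold


# Synthetic stand-in data: 28 "stations" with 3 made-up predictors.
rng = np.random.default_rng(0)
train_X = rng.normal(size=(28, 3))
# One location identical to a station, one far outside the sampled range.
new_X = np.vstack([train_X[0], [50.0, 50.0, 50.0]])

di_new, inside, threshold = area_of_applicability(train_X, new_X)
print(di_new, inside, threshold)
```

In a mapping workflow, `new_X` would hold the predictor values of every grid cell, and the fraction of `True` values in the mask would correspond to the share of the study area covered by the AOA.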

DOI:
10.1109/TGRS.2025.3565818

ISSN:
1558-0644