dos Santos, RS (2020). Estimating spatio-temporal air temperature in London (UK) using machine learning and earth observation satellite data. International Journal of Applied Earth Observation and Geoinformation, 88, 102066.

Urbanisation generates greater population densities and an increase in anthropogenic heat generation. These factors elevate the urban-rural air temperature (T_a) difference, thus generating the Urban Heat Island (UHI) phenomenon. T_a is used in the fields of public health and epidemiology to quantify deaths attributable to heat in cities around the world: the presence of UHI can exacerbate exposure to high temperatures during summer periods, thereby increasing the risk of heat-related mortality. Measuring and monitoring the spatial patterns of T_a in urban contexts is challenging due to the lack of an adequate network of weather stations. This study aims to produce a parsimonious model to retrieve maximum T_a (T_max) at high spatio-temporal resolution using Earth Observation (EO) satellite data. The novelty of this work is twofold: (i) it produces daily estimates of T_max for London at 1 km² resolution during the summertime between 2006 and 2017 using advanced statistical techniques and satellite-derived predictors, and (ii) it investigates for the first time the predictive power of the gradient boosting algorithm to estimate T_max for an urban area. In this work, 6 regression models were calibrated with 6 satellite products, 3 geospatial features, and 29 meteorological stations. Stepwise linear regression was applied to create 9 groups of predictors, which were trained and tested on each regression method. This study demonstrates the potential of machine learning algorithms to predict T_max: the gradient boosting model with a group of five predictors (land surface temperature, Julian day, normalised difference vegetation index, digital elevation model, solar zenith angle) was the regression model with the best performance (R² = 0.68, MAE = 1.60 °C, and RMSE = 2.03 °C).
This methodological approach can be replicated in other UK cities, benefiting national heat-related mortality assessments, since the data (provided by NASA and the UK Met Office) and the programming language (Python) are free and open. This study provides a framework to produce T_max estimates at high spatio-temporal resolution, helping public health researchers improve the estimation of mortality attributable to high temperatures. In addition, the research contributes to practice and policy-making by enhancing the understanding of the locations where mortality rates may increase due to heat. It therefore enables a more informed decision-making process for prioritising actions to mitigate heat-related mortality amongst the vulnerable population.
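
The modelling idea summarised in the abstract (a gradient boosting regressor scored with R², MAE and RMSE) can be illustrated with a minimal sketch. This is not the study's implementation: the abstract does not say which library or hyperparameters were used, so the sketch assumes plain Python, squared-error boosting with one-split decision stumps, and purely synthetic data. The two predictors (`lst`, `ndvi`), the coefficients that generate the synthetic maximum temperature, and the function names `fit_stump`, `fit_gradient_boosting`, `predict` and `scores` are all hypothetical.

```python
import math
import random


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # fallback: no split at all
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_gradient_boosting(X, y, n_rounds=50, learning_rate=0.3):
    """Squared-error boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps


def predict(model, X):
    base, lr, stumps = model
    return [base + lr * sum(lm if row[j] <= thr else rm for j, thr, lm, rm in stumps)
            for row in X]


def scores(y_true, y_pred):
    """Return (R2, MAE, RMSE), the three metrics reported in the abstract."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return 1 - ss_res / ss_tot, mae, math.sqrt(ss_res / n)


def make_rows(n):
    """Synthetic (assumed) data: not station or satellite observations."""
    rows = []
    for _ in range(n):
        lst = random.uniform(15, 40)    # hypothetical land surface temperature, deg C
        ndvi = random.uniform(0, 0.8)   # hypothetical vegetation index
        t_max = 8 + 0.6 * lst - 4 * ndvi + random.gauss(0, 1.5)
        rows.append(([lst, ndvi], t_max))
    return rows


random.seed(0)
train, test = make_rows(100), make_rows(50)
X_train, y_train = [x for x, _ in train], [y for _, y in train]
X_test, y_test = [x for x, _ in test], [y for _, y in test]

model = fit_gradient_boosting(X_train, y_train)
r2, mae, rmse = scores(y_test, predict(model, X_test))
print(f"R2={r2:.2f}  MAE={mae:.2f} degC  RMSE={rmse:.2f} degC")
```

The scores printed here describe only the synthetic data and are not comparable with the figures reported in the abstract. In practice a maintained library implementation of gradient boosting would likely replace the hand-written stumps, and the held-out set would be built from real station observations.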