Publications

Ramo, R; Garcia, M; Rodriguez, D; Chuvieco, E (2018). A data mining approach for global burned area mapping. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 73, 39-51.

Abstract
Global burned area algorithms provide valuable information for climate modellers, since fire disturbance is responsible for a significant part of the emissions and their related impact on humans. The aim of this work is to explore how four classification algorithms widely used in remote sensing, namely Random Forest (RF), Support Vector Machine (SVM), Neural Networks (NN) and a well-known decision tree algorithm (C5.0), perform when classifying burned areas at global scale through a data mining methodology using 2008 MODIS data. A training database consisting of burned and unburned pixels was created from 130 Landsat scenes. The resulting database was highly unbalanced, with the burned class representing less than one percent of the total. Therefore, the ability of the algorithms to cope with this problem was evaluated. Attribute selection was performed using three filters to remove potential noise and to reduce the dimensionality of the data: Random Forest, an entropy-based filter, and logistic regression. Eight out of fifty-two attributes were selected, most of them related to the temporal difference of the reflectance of the bands. Models were trained using 80% of the database following a ten-fold approach to reduce possible overfitting and to select the optimum parameters. Finally, the performance of the algorithms was evaluated over six different regions using official statistics, where they were available, and benchmark burned area products, namely MCD45 (V5.1) and MCD64 (V6). Compared to official statistics, the best agreement was obtained by MCD64 (OE = 0.15, CE = 0.29), followed by RF (OE = 0.27, CE = 0.21). For the remaining three areas (Angola, Sudan and South Africa), RF (OE = 0.47, CE = 0.45) yielded the best results when compared to the reference data. NN and SVM showed the worst performance, with omission and commission errors reaching 0.81 and 0.17, respectively.
SVM and NN showed higher sensitivity to unbalanced datasets, as in the case of burned area, with a clear bias towards the majority class. In contrast, tree-based algorithms are more robust to this issue, given their built-in mechanisms for dealing with large and unbalanced databases.
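
For readers unfamiliar with the OE and CE figures quoted above, the following is a minimal sketch of how omission and commission errors are conventionally computed for the burned class from pixel counts. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own validation procedure may differ in detail.

```python
def omission_commission(tp, fp, fn):
    """Return (OE, CE) for the burned class.

    tp: pixels burned in both the reference and the classification
    fp: pixels classified as burned but unburned in the reference
    fn: pixels burned in the reference but classified as unburned
    """
    if tp + fn == 0:
        raise ValueError("no burned pixels in the reference data")
    if tp + fp == 0:
        raise ValueError("no pixels were classified as burned")
    oe = fn / (tp + fn)  # omission error: share of reference burned area missed
    ce = fp / (tp + fp)  # commission error: share of mapped burned area that is wrong
    return oe, ce


# Hypothetical counts, for illustration only.
oe, ce = omission_commission(tp=80, fp=20, fn=20)
print(f"OE = {oe:.2f}, CE = {ce:.2f}")
```

Under these definitions, a classifier biased towards the unburned majority class misses many burned pixels, which raises OE while CE can stay comparatively low. This is consistent with the pattern reported for NN and SVM.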

DOI:
10.1016/j.jag.2018.05.027

ISSN:
0303-2434