This data set contains deforestation maps of an area on the border between western Brazil and northern Bolivia (corresponding to Sentinel-2 tile 20LKP). The source images were acquired by ESA's Sentinel-2A satellite. They were converted from top-of-atmosphere to surface reflectance using the Sen2Cor 2.8 software, and their clouds were masked using the Fmask 4.0 algorithm.
K-fold cross-validation was used to select the best Random Forest (RF) model among different combinations of Sentinel-2A bands and vegetation indices. The RF models were trained on the time series of the 481 samples included in this data set. The two selected models, which had the highest median F1 score for the Deforestation class, were: 1) the combination of the blue, bnir, green, nnir, red, swir1, and swir2 bands (hereafter Bands); and 2) the combination of the Enhanced Vegetation Index, the Normalized Difference Moisture Index, and the Normalized Difference Vegetation Index (hereafter Indices).
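The three indices in the Indices model have widely used textbook definitions, and the sketch below shows how they can be computed from surface reflectance values. It is a minimal illustration, not the code used to produce this data set: the function names are hypothetical, reflectance is assumed to be scaled to the 0-1 range, and which near-infrared band (bnir or nnir) was used for each index here is not stated, so the `nir` argument is left generic.

```python
import math


def _normalized_difference(a, b):
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else math.nan


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return _normalized_difference(nir, red)


def ndmi(nir, swir1):
    """Normalized Difference Moisture Index."""
    return _normalized_difference(nir, swir1)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients
    (gain 2.5, C1 = 6, C2 = 7.5, L = 1); assumes reflectance in 0-1."""
    denom = nir + 6.0 * red - 7.5 * blue + 1.0
    return 2.5 * (nir - red) / denom if denom != 0 else math.nan


if __name__ == "__main__":
    # Hypothetical reflectance values for a densely vegetated pixel.
    pixel = {"blue": 0.03, "red": 0.05, "nir": 0.40, "swir1": 0.15}
    print("NDVI:", ndvi(pixel["nir"], pixel["red"]))
    print("NDMI:", ndmi(pixel["nir"], pixel["swir1"]))
    print("EVI: ", evi(pixel["nir"], pixel["red"], pixel["blue"]))
```

Applying these functions to every date of a pixel's time series yields the kind of index time series that a classifier can be trained on.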
Each selected RF model produced one deforestation map. The models were trained with 1000 trees on the full Sentinel-2A time series, which comprises 36 observations from August 2018 to July 2019.
To assess the maps' accuracy, we followed good practices [1]. To determine the validation data set size (n), the user's accuracy was conjectured using a bootstrapping technique. Two validation data sets (n=252) were collected independently, one for each map.
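To illustrate how a conjectured user's accuracy can feed into a sample size, the sketch below uses a formula that is common in the accuracy-assessment literature for stratified sampling: n ≈ (Σ W_i·S_i / S(O))², where W_i is the mapped area proportion of class i, S_i = sqrt(U_i·(1 − U_i)) is derived from the conjectured user's accuracy U_i, and S(O) is the target standard error of the overall accuracy. Whether [1] and the authors used exactly this formula is an assumption, and every input value in the example is hypothetical; none comes from this data set.

```python
import math


def stratified_sample_size(weights, user_accuracies, target_se):
    """Approximate validation sample size for stratified sampling.

    weights          -- mapped area proportion of each class (must sum to 1)
    user_accuracies  -- conjectured user's accuracy of each class (0-1)
    target_se        -- desired standard error of the overall accuracy
    """
    if len(weights) != len(user_accuracies):
        raise ValueError("weights and user_accuracies must have equal length")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    if target_se <= 0:
        raise ValueError("target_se must be positive")
    # Per-class standard deviation S_i = sqrt(U_i * (1 - U_i)), weighted by W_i.
    weighted_sd = sum(
        w * math.sqrt(u * (1.0 - u)) for w, u in zip(weights, user_accuracies)
    )
    return math.ceil((weighted_sd / target_se) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs for three classes (not values from this data set).
    n = stratified_sample_size(
        weights=[0.2, 0.6, 0.2],
        user_accuracies=[0.90, 0.85, 0.80],
        target_se=0.02,
    )
    print("Suggested validation sample size:", n)
```

Lower conjectured accuracies or a tighter target standard error both increase the required number of validation samples.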
For the Deforestation class, the Bands model achieved a higher F1 score (93.1%) than the Indices model (91.9%). For the Forest and Other classes, the Indices model achieved higher F1 scores (85.8% and 82.2%, respectively) than the Bands model (85.3% and 78.7%, respectively). The overall accuracy is 88.9% for Bands and 84.9% for Indices. The user's accuracy (UA) and producer's accuracy (PA) of each model are as follows:
Accuracy of classification using Bands (UA: user's accuracy; PA: producer's accuracy):
Deforestation: UA = 97.4%, PA = 89.2%
Forest: UA = 80.8%, PA = 90.4%
Other: UA = 80.2%, PA = 77.3%
Accuracy of classification using Indices (UA: user's accuracy; PA: producer's accuracy):
Deforestation: UA = 96.1%, PA = 88.0%
Forest: UA = 88.6%, PA = 83.3%
Other: UA = 77.0%, PA = 88.0%
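The F1 scores quoted earlier can be related to the figures listed above: F1 is the harmonic mean of the user's accuracy (precision) and the producer's accuracy (recall), F1 = 2·UA·PA / (UA + PA). The sketch below recomputes it from the rounded percentages listed here; the dictionary layout and function name are illustrative only. Because the inputs are already rounded to one decimal place, a recomputed value can differ from the reported one by about 0.1 percentage point.

```python
def f1_score(ua, pa):
    """Harmonic mean of user's accuracy (ua) and producer's accuracy (pa)."""
    if ua + pa == 0:
        return 0.0
    return 2.0 * ua * pa / (ua + pa)


# (UA, PA) in percent, copied from the accuracy figures listed above.
ACCURACY = {
    "Bands": {
        "Deforestation": (97.4, 89.2),
        "Forest": (80.8, 90.4),
        "Other": (80.2, 77.3),
    },
    "Indices": {
        "Deforestation": (96.1, 88.0),
        "Forest": (88.6, 83.3),
        "Other": (77.0, 88.0),
    },
}

if __name__ == "__main__":
    for model, classes in ACCURACY.items():
        for label, (ua, pa) in classes.items():
            print(f"{model:8s} {label:14s} F1 = {f1_score(ua, pa):.1f}%")
```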
To produce the maps, the R package sits was used. sits is open-source software that provides tools for time series analysis and classification. The sits package is available on GitHub (https://github.com/e-sensing/sits), and the scripts used in this work can be found at http://doi.org/10.5281/zenodo.3932013.
The following data sets are provided:
(a) The classified map in compressed GeoTIFF format (10-meter resolution) using the Bands model.
(b) The classified map in compressed GeoTIFF format (10-meter resolution) using the Indices model.
(c) CSV file with the training data set.
(d) CSV file with the validation data set for the Bands model.
(e) CSV file with the validation data set for the Indices model.
(f) A QGIS style file for displaying the data in the QGIS software.
Note: The GeoTIFF raster files use the UTM projection, which is the same cartographic projection used by the input Sentinel-2 images. When opening the GeoTIFF raster maps in QGIS, please use the UTM zone 20S projection (EPSG:32720) to ensure correct navigation. The projection string parameters are:
"+proj=utm +zone=20 +south +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"