Pol-InSAR-Island - A Benchmark Dataset for Multi-Frequency Pol-InSAR Data Land Cover Classification

Abstract: The strong scientific interest in machine learning, in particular deep learning, and its accompanying rapid development have led to a significant improvement in automatic image interpretation in recent years. Research generally focuses on the classification or segmentation of optical images, but there are already several successful approaches that apply deep learning techniques to the analysis of PolSAR or Pol-InSAR images. While the success of learning-based methods for the analysis of optical images has been strongly driven by public benchmark datasets such as ImageNet and Cityscapes, which contain large amounts of annotated training and test data, comparable datasets for the PolSAR and especially the Pol-InSAR domain are almost non-existent. Zhu et al. (2021) reach the same conclusion in their analysis of the current state of deep learning-based SAR image analysis and call for large, representative, expert-annotated benchmark datasets for the SAR community. To fill this gap, this work presents a new multi-frequency Pol-InSAR benchmark dataset for training and testing learning-based methods. The dataset is intended to support the development of new approaches as well as the adaptation and improvement of existing ones. Furthermore, a fixed division of the data into training and test areas ensures that results from different works are comparable.

For the segmentation of PolSAR images, a few annotated datasets already exist and are frequently referenced in the literature. These include the PolSF dataset, which contains PolSAR images from various spaceborne and airborne systems over San Francisco, the E-SAR dataset from Oberpfaffenhofen, and the Flevoland dataset, which contains an AIRSAR image depicting agricultural areas. These datasets have several limitations that make them inadequate for evaluating learning-based image segmentation approaches. One weakness is the small amount of data, which is usually insufficient for training deep networks. Another problem is the low complexity of the segmentation task, caused by generic classes that are too easily distinguishable or by spatial class distributions that are too simplistic, too regular, and too similar to one another. As a result, simple classifiers already achieve very high classification performance, so that a meaningful comparison of more sophisticated classifiers, which are necessary for more challenging real-world tasks, is not possible. A further disadvantage of existing datasets is that the division into training and test areas is not fixed, which prevents the comparability of research that uses these datasets for evaluation.

Our annotated, multi-frequency Pol-InSAR dataset named $\textbf{Pol-InSAR-Island}$ provides information-rich data as well as a challenging segmentation task consisting of 12 land cover classes. The Pol-InSAR data were acquired in April 2022, on behalf of the Lower Saxony Water Management, Coastal Defence and Nature Conservation Agency (NLWKN), with the airborne F-SAR system (Reigber et al., 2011) of the German Aerospace Center (DLR) during a measurement campaign over the German Wadden Sea. The groundwork for this measurement campaign in terms of data acquisition and processing was developed within the GeoWAM project (Pinheiro et al., 2020; Schmitz et al., 2022). The data were acquired simultaneously in S- and L-band. Interferometric analysis is enabled by imaging the area twice, with a temporal offset of 12 minutes and a vertical baseline of 12 m. The Pol-InSAR-Island dataset does not include all flight data of the measurement campaign, but only the data sections that cover the island of Baltrum. Based on co-registered interferometric image pairs, the coherency matrix $\mathbf{T}_6$ is calculated for each pixel from the outer product of the stacked Pauli scattering vectors $\mathrm{k}_1$ and $\mathrm{k}_2$:

$\mathbf{T}_6 = \begin{bmatrix} \mathrm{k}_1 \\ \mathrm{k}_2 \end{bmatrix} \begin{bmatrix} \mathrm{k}_1^{T} & \mathrm{k}_2^{T} \end{bmatrix} = \begin{bmatrix} \mathbf{T}_{11} & \mathbf{\Omega}_{12} \\ \mathbf{\Omega}_{21} & \mathbf{T}_{22} \end{bmatrix}.$

Subsequently, this representation is projected from the slant-range to the ground-range geometry. The geocoded products have a ground resolution of 1 m x 1 m.
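The construction of the coherency matrix from the two Pauli scattering vectors can be illustrated with a minimal NumPy sketch. It is not the processing chain used for the dataset. The function names are hypothetical, and two details are assumptions that the text above does not state: the second factor is taken as the conjugate transpose, and the per-pixel outer products are averaged over small pixel blocks (a simple boxcar multilook), as is common practice for coherency matrices. The input values are random numbers, not real data.

```python
import numpy as np


def pauli_vector(s_hh, s_hv, s_vv):
    """Pauli scattering vector per pixel, shape (..., 3).

    Assumes a reciprocal (monostatic) measurement with S_hv == S_vh.
    """
    return np.stack([s_hh + s_vv, s_hh - s_vv, 2.0 * s_hv], axis=-1) / np.sqrt(2.0)


def coherency_t6(k1, k2):
    """Single-look 6x6 matrix k k^H per pixel, with k = [k1; k2]."""
    k = np.concatenate([k1, k2], axis=-1)               # (..., 6)
    return k[..., :, None] * np.conj(k[..., None, :])   # (..., 6, 6)


def multilook(t6, n):
    """Average t6 over non-overlapping n x n pixel blocks (assumed boxcar)."""
    h, w = t6.shape[0] // n * n, t6.shape[1] // n * n
    blocks = t6[:h, :w].reshape(h // n, n, w // n, n, 6, 6)
    return blocks.mean(axis=(1, 3))


# Toy example with random complex scattering coefficients (not real data).
rng = np.random.default_rng(0)


def rand():
    return rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))


k1 = pauli_vector(rand(), rand(), rand())   # first acquisition
k2 = pauli_vector(rand(), rand(), rand())   # second acquisition
t6 = multilook(coherency_t6(k1, k2), n=4)   # shape (2, 2, 6, 6)

# The four 3x3 blocks named in the equation above.
t11, omega12 = t6[..., :3, :3], t6[..., :3, 3:]
omega21, t22 = t6[..., 3:, :3], t6[..., 3:, 3:]
```

With the conjugate transpose, the resulting matrix is Hermitian, so the lower-left block is the conjugate transpose of the upper-right block and only the diagonal and upper triangle carry independent information.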

The island of Baltrum is home to many different biotopes and vegetation species and is surrounded by the North Sea and tidal flats. This diverse land cover defines the target classes of the demanding Pol-InSAR-Island dataset. For a precise annotation of the data, an existing biotope type map is used, which was published by the Lower Saxon Wadden Sea National Park Authority via Marine.Daten.Infrastruktur.Niedersachsen (MDI-NI) and results from the Trilateral Monitoring and Assessment Program processing from 2013. In this map, the island of Baltrum is classified into 40 different biotope types. Since the biotope map was generated several years prior to the SAR measurement campaign, significant changes in the extent of the different biotope types are to be expected, which makes the map insufficient for accurate annotation. Therefore, the map is revised in a semi-automatic process. In the automatic step, transition areas between different biotope types are removed and highly similar biotope types are grouped into one class each. The decision about the grouping of biotope types is based on the semantic context and on an analysis of class separability using polarimetric features. This analysis is performed visually, based on a projection of the polarimetric features into a two-dimensional feature space using the UMAP algorithm. In the manual step, the resulting annotation is improved by visually interpreting the SAR images and simultaneously acquired optical images. The Pol-InSAR-Island dataset is split into two spatially disjoint subsets, using a chessboard grid that alternately defines training and test patches. This split pattern was chosen to ensure that all 12 classes are represented in both subsets and are additionally mapped at different incidence angles.
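The chessboard split described above can be sketched as follows. This is only an illustration of the idea, not the procedure used to create the dataset. The patch size, the choice of which squares count as training, and the use of label 0 for pixels outside a subset are assumptions, and the function names are hypothetical.

```python
import numpy as np


def chessboard_mask(height, width, patch):
    """Boolean mask that is True on every other patch of a chessboard grid."""
    rows = np.arange(height) // patch   # patch row index of each pixel row
    cols = np.arange(width) // patch    # patch column index of each pixel column
    return (rows[:, None] + cols[None, :]) % 2 == 0


def split_labels(labels, patch, unassigned=0):
    """Split a 2-D label map into spatially disjoint train and test label maps."""
    train_mask = chessboard_mask(*labels.shape, patch)
    label_train = np.where(train_mask, labels, unassigned)
    label_test = np.where(train_mask, unassigned, labels)
    return label_train, label_test


# Toy label map with class ids 1..12 (not real data); patch size is hypothetical.
labels = (np.arange(6 * 8).reshape(6, 8) % 12) + 1
label_train, label_test = split_labels(labels, patch=2)
```

Because neighbouring patches alternate between the two subsets across the whole scene, both subsets sample the full extent of the image in range and azimuth, which is consistent with the stated goal of covering all classes and different incidence angles in each subset.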

Overall, this results in a labeled multi-frequency Pol-InSAR dataset that provides a challenging segmentation task. The public availability of the dataset can make a crucial contribution to the further development and comparability of learning-based segmentation methods for Pol-InSAR images in the future.

References

Pinheiro, M., Amao-Oliva, J., Scheiber, R., Jäger, M., Horn, R., Keller, M., Fischer, J., & Reigber, A. (2020). Dual-Frequency Airborne SAR for Large Scale Mapping of Tidal Flats. Remote Sensing, 12(11), 1827.

Reigber, A., Jäger, M., Fischer, J., Horn, R., Scheiber, R., Prats, P., & Nottensteiner, A. (2011). System Status and Calibration of the F-SAR Airborne SAR Instrument. 2011 IEEE International Geoscience and Remote Sensing Symposium, 1520-1523.

Schmitz, S., Hammer, H., & Thiele, A. (2022). Multi-Frequency PolInSAR Data Are Advantageous for Land Cover Classification – A Visual and Quantitative Analysis. ISPRS Annals of Photogrammetry, Remote Sensing & Spatial Information Sciences, V-1-2022, 49–56.

Zhu, X. X., Montazeri, S., Ali, M., Hua, Y., Wang, Y., Mou, L., Shi, Y., Xu, F., & Bamler, R. (2021). Deep Learning Meets SAR: Concepts, Models, Pitfalls, and Perspectives. IEEE Geoscience and Remote Sensing Magazine, 9(4), 143-172.

TechnicalRemarks: # Pol-InSAR-Island dataset:

This folder contains multi-frequency Pol-InSAR data acquired by the F-SAR system of the German Aerospace Center (DLR) over Baltrum and corresponding land cover labels.

Data structure:
- data
  - FP1 # Flight path 1
    - L # Frequency band
      - T6 # Pol-InSAR data
      - pauli.bmp # Pauli-RGB image of the master scene
    - S
      - ...
  - FP2 # Flight path 2
    - ...
- label
  - FP1
    - label_train.bin
    - ...
  - FP2
    - ...

Data format: The data is provided as flat-binary raster files (*.bin) with an accompanying ASCII header file (*.hdr) in ENVI format. For Pol-InSAR data, the real and imaginary components of the diagonal elements and upper-triangle elements of the 6 x 6 coherency matrix are stored in separate files (T11.bin, T12_real.bin, T12_imag.bin,...)
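A minimal sketch of how such files could be read with NumPy is given below. It is not an official loader for the dataset. It assumes single-band rasters, a header stored either as `T11.hdr` or `T11.bin.hdr` next to the data file, and that the file naming shown above (`T11.bin`, `T12_real.bin`, `T12_imag.bin`) continues in the same pattern for the remaining matrix elements. The function names are hypothetical.

```python
from pathlib import Path

import numpy as np

# ENVI "data type" codes and their NumPy equivalents.
ENVI_DTYPES = {
    1: np.uint8, 2: np.int16, 3: np.int32, 4: np.float32, 5: np.float64,
    6: np.complex64, 9: np.complex128, 12: np.uint16, 13: np.uint32,
    14: np.int64, 15: np.uint64,
}


def read_envi_header(hdr_path):
    """Parse the single-line 'key = value' fields of an ENVI header."""
    fields = {}
    for line in Path(hdr_path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    return fields


def read_envi_band(bin_path):
    """Read a single-band flat-binary raster as a (lines, samples) array."""
    bin_path = Path(bin_path)
    hdr_path = bin_path.with_suffix(".hdr")            # e.g. T11.hdr
    if not hdr_path.exists():
        hdr_path = Path(str(bin_path) + ".hdr")        # e.g. T11.bin.hdr
    hdr = read_envi_header(hdr_path)
    dtype = np.dtype(ENVI_DTYPES[int(hdr["data type"])])
    # ENVI byte order: 0 = little endian, 1 = big endian.
    dtype = dtype.newbyteorder("<" if int(hdr.get("byte order", 0)) == 0 else ">")
    lines, samples = int(hdr["lines"]), int(hdr["samples"])
    data = np.fromfile(bin_path, dtype=dtype, count=lines * samples,
                       offset=int(hdr.get("header offset", 0)))
    return data.reshape(lines, samples)


def read_t6_element(folder, i, j):
    """Element (i, j) of the coherency matrix, 1-based indices with i <= j."""
    folder = Path(folder)
    if i == j:
        # Diagonal elements are real-valued and stored in one file.
        return read_envi_band(folder / f"T{i}{j}.bin")
    # Off-diagonal file names follow the assumed T<ij>_real / T<ij>_imag pattern.
    real = read_envi_band(folder / f"T{i}{j}_real.bin")
    imag = read_envi_band(folder / f"T{i}{j}_imag.bin")
    return real + 1j * imag
```

Since only the diagonal and upper triangle are stored, an element below the diagonal can be obtained as the complex conjugate of its mirrored counterpart, for example `np.conj(read_t6_element(folder, 1, 2))` for element (2, 1).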

Land cover labels contained in label_train.bin and label_test.bin are encoded as integers using the following mapping:

0 - Unassigned
1 - Tidal flat
2 - Water
3 - Coastal shrub
4 - Dense, high vegetation
5 - White dune
6 - Peat bog
7 - Grey dune
8 - Couch grass
9 - Upper saltmarsh
10 - Lower saltmarsh
11 - Sand
12 - Settlement
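The integer mapping above can be expressed as a small lookup table, for example to report how many labeled pixels each class has. The following sketch uses a toy label array rather than the actual label files, and the helper name `class_pixel_counts` is hypothetical.

```python
import numpy as np

# Class ids and names as listed in the mapping above.
CLASS_NAMES = {
    0: "Unassigned", 1: "Tidal flat", 2: "Water", 3: "Coastal shrub",
    4: "Dense, high vegetation", 5: "White dune", 6: "Peat bog",
    7: "Grey dune", 8: "Couch grass", 9: "Upper saltmarsh",
    10: "Lower saltmarsh", 11: "Sand", 12: "Settlement",
}


def class_pixel_counts(labels, ignore=0):
    """Number of pixels per class name, skipping the 'Unassigned' id."""
    ids, counts = np.unique(labels, return_counts=True)
    return {CLASS_NAMES[int(i)]: int(c) for i, c in zip(ids, counts) if int(i) != ignore}


# Toy label patch (not real data).
toy_labels = np.array([[0, 1, 1],
                       [2, 2, 2],
                       [0, 12, 5]])
counts = class_pixel_counts(toy_labels)
```

Excluding id 0 from such statistics, and from any loss or accuracy computation, keeps unlabeled pixels from being treated as a thirteenth class.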

BibTex: