Deep-sea sediments of the global ocean mapped with Random Forest machine learning algorithm

The seafloor lithology of deep-sea sediments of the global ocean was spatially predicted. Five lithology classes were predicted: Calcareous sediment, Clay, Diatom ooze, Lithogenous sediment, and Radiolarian ooze. The dataset contains probability surfaces of the five seafloor lithologies, the probability of the most probable class (maximum probability), and the predicted seafloor lithology. The results are presented as geo-referenced floating-point TIFF files with a spatial resolution of 10 km, using the Wagner IV equal-area projection as the spatial reference.
Seafloor lithologies were mapped by building a predictive spatial model. This entails a two-step approach. First, the relationship between a set of predictor variables and a response variable is modelled from observations (samples). The established model is then used to predict the response variable at unsampled locations for which values of the predictor variables are known.
The response variable is seafloor lithology, a qualitative multinomial variable. Seafloor lithology data were sourced from Dutkiewicz et al. (2015) and pre-processed as follows. Only samples from water depths greater than 500 m were used, and duplicates were removed from the original sample dataset. This reduced the number of records from 14,400 to 10,438. The original classification with 13 classes (Dutkiewicz et al., 2015) was reduced to 5 classes. Clay, Diatom ooze, and Radiolarian ooze were retained. Calcareous ooze and Fine-grained calcareous sediment were grouped together as Calcareous sediment. Gravel and coarser, Sand, and Silt were grouped together as Lithogenous sediment. The rare classes Ash and volcanic sand/gravel, Sponge spicules, and Shells and coral fragments, and the mixed classes Fine-grained calcareous sediment and siliceous mud, were removed.
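The pre-processing of the sample data described above can be illustrated with a minimal Python sketch. It is not the author's code. The record layout (`lon`, `lat`, `depth_m`, `lithology`), the rule that a duplicate is a record with identical coordinates, depth and lithology, and the convention that depth is positive downwards are all assumptions made for illustration. Only the class regrouping and the 500 m threshold come from the description; any class missing from the mapping is dropped.

```python
from collections import Counter

# Regrouping as stated in the description: eight source classes map to
# five target classes. Source classes absent from this mapping are dropped.
REGROUP = {
    "Clay": "Clay",
    "Diatom ooze": "Diatom ooze",
    "Radiolarian ooze": "Radiolarian ooze",
    "Calcareous ooze": "Calcareous sediment",
    "Fine-grained calcareous sediment": "Calcareous sediment",
    "Gravel and coarser": "Lithogenous sediment",
    "Sand": "Lithogenous sediment",
    "Silt": "Lithogenous sediment",
}


def preprocess(samples, min_depth_m=500.0):
    """Keep deep samples, drop duplicates, and regroup lithology classes.

    `samples` is a list of dicts with the hypothetical keys
    lon, lat, depth_m (positive downwards) and lithology.
    """
    seen = set()
    kept = []
    for sample in samples:
        if sample["depth_m"] <= min_depth_m:
            continue  # shallower than the 500 m threshold
        target = REGROUP.get(sample["lithology"])
        if target is None:
            continue  # class not retained in the five-class scheme
        # Assumed duplicate rule: identical position, depth and source class.
        key = (sample["lon"], sample["lat"], sample["depth_m"], sample["lithology"])
        if key in seen:
            continue
        seen.add(key)
        kept.append({**sample, "lithology": target})
    return kept


# Toy records, invented for illustration only.
samples = [
    {"lon": 10.0, "lat": -5.0, "depth_m": 4200.0, "lithology": "Calcareous ooze"},
    {"lon": 10.0, "lat": -5.0, "depth_m": 4200.0, "lithology": "Calcareous ooze"},
    {"lon": 20.0, "lat": 30.0, "depth_m": 120.0, "lithology": "Sand"},
    {"lon": -40.0, "lat": 12.0, "depth_m": 3100.0, "lithology": "Silt"},
    {"lon": 150.0, "lat": -60.0, "depth_m": 3900.0, "lithology": "Sponge spicules"},
]
cleaned = preprocess(samples)
print(Counter(s["lithology"] for s in cleaned))
```

In this toy input, the second record is a duplicate, the third is too shallow, and the fifth belongs to a removed class, so two records remain.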
The choice of predictor variables was initially informed by the current understanding of the controls on the distribution of deep-sea sediments and by the availability of data with full coverage of the deep sea at a reasonable resolution. Predictor variable raster layers from Bio-ORACLE (Assis et al., 2018; Tyberghein et al., 2012) and MARSPEC (Sbrocco and Barber, 2013) were used. Whenever available, statistics of a variable other than the mean were also downloaded. These included the minimum, the maximum, and the range (maximum minus minimum). The raster layers were stacked, limited to water depths greater than 500 m, and projected to the Wagner IV global equal-area projection with a pixel resolution of 10 km by 10 km.
A variable selection wrapper algorithm (Boruta; Kursa and Rudnicki, 2010) was used to identify important predictor variables. Subsequently, the set of variables was reduced to those that were uncorrelated (|r| < 0.5). The selected predictor variables, in decreasing order of importance, were sea-surface maximum salinity, bathymetry, sea-floor minimum temperature, sea-surface minimum silicate, sea-surface maximum primary productivity, sea-surface temperature range, distance to shore, and sea-surface salinity range.
A Random Forest (Breiman, 2001) classification model was trained, and model accuracy was assessed with a spatial leave-one-out cross-validation scheme. A balanced version of Random Forest was used to account for class imbalance in the input dataset. Initial tuning of the number of trees in the forest and of the number of variables considered at each split had very little effect on model performance, while the tuning process itself was very time-consuming. The default parameter values were therefore used.
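The correlation-based reduction of the predictor set can be sketched as follows. The description gives only the threshold (|r| < 0.5); the greedy rule used here, which walks through the variables in decreasing order of importance and keeps a variable only if it is uncorrelated with every variable already kept, is an assumption, as are the function names and the toy values. Pearson correlation is assumed as the meaning of r.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, ranked_names, threshold=0.5):
    """Greedily keep variables whose |r| with all kept variables is below threshold.

    `columns` maps a variable name to its values at the sample locations.
    `ranked_names` lists variable names in decreasing order of importance,
    so the more important member of a correlated pair is the one retained.
    """
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy values, invented for illustration only.
columns = {
    "salinity_max": [1.0, 2.0, 3.0, 4.0, 5.0],
    "salinity_mean": [1.1, 2.0, 3.2, 3.9, 5.1],  # nearly collinear with salinity_max
    "bathymetry": [1.0, -2.0, 2.0, -2.0, 1.0],   # uncorrelated with salinity_max
}
ranked = ["salinity_max", "salinity_mean", "bathymetry"]
selected = drop_correlated(columns, ranked)
print(selected)
```

With these toy values, `salinity_mean` is discarded because it is strongly correlated with the higher-ranked `salinity_max`, while `bathymetry` is kept.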

Data and Resources

This dataset has no data

Cite this as

Diesing, Markus (2020). Dataset: Deep-sea sediments of the global ocean mapped with Random Forest machine learning algorithm. https://doi.org/10.1594/PANGAEA.911692

DOI retrieved: 2020

Additional Info

Imported on: November 30, 2024
Last update: November 30, 2024
License: CC-BY-4.0
Source: https://doi.org/10.1594/PANGAEA.911692
Author: Diesing, Markus
Given Name: Markus
Family Name: Diesing
Source Creation: 2020
Publication Year: 2020
Subject Areas
Name: Ecology

Name: LandSurface

Related Identifiers
Title: Bio-ORACLE v2.0: Extending marine data layers for bioclimatic modelling
Identifier: https://doi.org/10.1111/geb.12693
Type: DOI
Relation: References
Year: 2018
Source: Global Ecology and Biogeography
Authors: Assis Jorge, Tyberghein Lennert, Bosch Samuel, Verbruggen Heroen, Serrão Ester A, De Clerck Olivier

Title: Random Forests
Identifier: https://doi.org/10.1023/A:1010933404324
Type: DOI
Relation: References
Year: 2001
Source: Machine Learning
Authors: Breiman Leo

Title: Census of seafloor sediments in the world's ocean
Identifier: https://doi.org/10.1130/G36883.1
Type: DOI
Relation: References
Year: 2015
Source: Geology
Authors: Dutkiewicz Adriana, Müller R Dietmar, O'Callaghan Simon, Jónasson Hjörtur

Title: Feature Selection with the Boruta Package
Identifier: https://doi.org/10.18637/jss.v036.i11
Type: DOI
Relation: References
Year: 2010
Source: Journal of Statistical Software
Authors: Kursa Miron B, Rudnicki Witold R

Title: MARSPEC: ocean climate layers for marine spatial ecology
Identifier: https://doi.org/10.1890/12-1358.1
Type: DOI
Relation: References
Year: 2013
Source: Ecology
Authors: Sbrocco Elizabeth J, Barber Paul H

Title: Bio-ORACLE: a global environmental dataset for marine species distribution modelling
Identifier: https://doi.org/10.1111/j.1466-8238.2011.00656.x
Type: DOI
Relation: References
Year: 2012
Source: Global Ecology and Biogeography
Authors: Tyberghein Lennert, Verbruggen Heroen, Pauly Klaas, Troupin Charles, Mineur Frederic, De Clerck Olivier