Soil information is valuable for many disciplines (e.g. agriculture, geomorphology, geology, archaeology) and can be used to produce maps or statistics on soil productivity. As part of the project CRC1070 ResourceCulture, we collected information on soil quality in the Dohuk province of the Kurdistan region of Iraq. In total, 561 samples were collected at 136 locations in 2017, 2018, 2022, and 2023. The samples were taken with an auger at five depth increments (0 - 10, 10 - 30, 30 - 50, 50 - 70 and 70 - 100 cm), then prepared and measured with mid-infrared (MIR) spectroscopy. A subset of 109 samples was analyzed in a laboratory for texture, pH, organic and total carbon, nitrogen, sulfur, electrical conductivity, bulk density and calcium carbonate. A Cubist model was then used to predict these properties for the remaining samples from their MIR spectra. We produced digital soil maps of these soil properties with machine learning methods (ensemble learning, linear regression, decision trees). Additionally, we mapped soil depth using the information collected in the field. The dataset forms a unique regional database that can support any researcher in need of soil information.
For the MIR spectral predictions, the samples were air dried (35 - 45 °C) for 24 h, cleared of root fragments, sieved (< 2 mm), and ground below 1 µm with a mill (Fritsch, Pulverisette 5/4, classic line). The samples were then measured by MIR spectroscopy (Bruker, Vertex 80v-mir), recording absorbance at wavenumbers from 375 - 4,995 cm-1 at a 4 cm-1 interval. Spectra between 350 - 499 cm-1 and 2,451 - 2,500 cm-1 were removed because low signal in these ranges interfered with prediction. Depending on the target property, the spectra were transformed following the literature to improve prediction results. The predicted values are based on a Cubist model computed on one of three inputs: the raw spectra (pH, MWD); Savitzky-Golay-filtered spectra with a second-order polynomial and a window size of eleven (CaCO3, Clay); or the Standard Normal Variate of the Savitzky-Golay-filtered spectra, with the same polynomial order and window size (Nt, Ct, Corg, EC, Sand, Silt). pH is given as an absolute value, MWD in mm, Nt, Ct, Corg, Sand, Silt and Clay in %, and EC in µS/cm. The spectra are given as absorbance at each wavenumber, with wavenumbers in 1/cm.
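The spectral preprocessing described above can be illustrated with a minimal, pure-Python sketch. It is not the code used to produce the dataset. The function names (`drop_ranges`, `savgol_smooth`, `snv`) are hypothetical, and several choices are assumptions because the description does not specify them: the Savitzky-Golay filter is applied as plain smoothing (zeroth derivative), the excluded ranges are dropped before filtering, and the Standard Normal Variate uses the sample standard deviation. The filter weights are the standard 11-point, second-order smoothing coefficients.

```python
from statistics import mean, stdev

# Standard 11-point, second-order Savitzky-Golay smoothing weights.
# Assumption: zeroth derivative (smoothing only); the description does
# not state whether a derivative was taken.
SG_WEIGHTS = [-36, 9, 44, 69, 84, 89, 84, 69, 44, 9, -36]
SG_NORM = 429

# Wavenumber ranges (cm-1) removed before prediction, as quoted in the text.
EXCLUDED_RANGES = [(350, 499), (2451, 2500)]


def drop_ranges(wavenumbers, absorbance, ranges=EXCLUDED_RANGES):
    """Remove the points whose wavenumber falls inside an excluded range."""
    kept = [
        (w, a)
        for w, a in zip(wavenumbers, absorbance)
        if not any(lo <= w <= hi for lo, hi in ranges)
    ]
    return [w for w, _ in kept], [a for _, a in kept]


def savgol_smooth(values):
    """Smooth with the 11-point filter; returns interior points only,
    so the output is 10 points shorter than the input."""
    half = len(SG_WEIGHTS) // 2
    smoothed = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed.append(sum(c * v for c, v in zip(SG_WEIGHTS, window)) / SG_NORM)
    return smoothed


def snv(values):
    """Standard Normal Variate: center and scale a single spectrum.
    Assumption: sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Synthetic spectrum on the measurement grid quoted above (375 - 4,995 cm-1,
# 4 cm-1 interval); the absorbance values are made up for illustration.
wavenumbers = list(range(375, 4996, 4))
absorbance = [0.2 + 1e-4 * i for i in range(len(wavenumbers))]

kept_wn, kept_abs = drop_ranges(wavenumbers, absorbance)
sg_spectrum = savgol_smooth(kept_abs)   # input variant used for CaCO3, Clay
snv_sg_spectrum = snv(sg_spectrum)      # input variant used for Nt, Ct, Corg, EC, Sand, Silt
```

In this sketch the filter runs across the gap left by the removed 2,451 - 2,500 cm-1 range; a more careful version would filter each contiguous segment separately. For real data, an established implementation such as `scipy.signal.savgol_filter` is preferable to hand-written weights.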