PALADIN: Benchmarks, Experimental Settings, and Evaluation

This collection includes all the data and scripts necessary to reproduce the results from the experimental study of PALADIN.

Data

The data is generated with the Synthetic Data Generator, which produces process-based breast cancer treatment data following the distribution observed in a real population of breast cancer patients. The collection comprises 18 data sets: nine for relational databases and nine for RDF-based knowledge graphs. For each data format, the data sets come in three sizes:

  • Small: 1,000 patients
  • Medium: 10,000 patients
  • Large: 100,000 patients

There are three data sets of each size. They differ in the mutation probability passed to the data generator. The lower this value, the more closely the data follows the treatment guideline for breast cancer patients with an amplified HER2 gene.
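The count of 18 data sets follows from combining two data formats, three sizes, and three mutation probability settings. The sketch below only illustrates that arithmetic. The actual mutation probability values are not stated here, so the labels `low`, `mid`, and `high` are hypothetical placeholders, as are all variable and function names.

```python
from itertools import product

# Two data formats, as described in the text.
FORMATS = ("relational", "rdf")

# Number of patients modeled per size class, as described in the text.
SIZES = {"small": 1_000, "medium": 10_000, "large": 100_000}

# Hypothetical labels: the real mutation probability values are not given here.
MUTATION_LEVELS = ("low", "mid", "high")


def enumerate_data_sets():
    """Return one record per (format, size, mutation level) combination."""
    return [
        {"format": fmt, "size": size, "patients": SIZES[size], "mutation": level}
        for fmt, size, level in product(FORMATS, SIZES, MUTATION_LEVELS)
    ]


data_sets = enumerate_data_sets()
print(len(data_sets))  # 2 formats * 3 sizes * 3 levels = 18
```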

The data is available for download in the following formats:

  • Turtle format: synth_data_ttl.zip
  • Preloaded for use with Virtuoso 7.20.3237: synth_data_virtuoso.zip
  • MySQL 8.1 dump: synth_data_sql.zip
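The internal layout of these archives is not described here, so it can help to inspect a downloaded archive before loading it into a database or triple store. The following is a minimal sketch using only the Python standard library. The helper names `count_by_extension` and `summarize_archive` are hypothetical and not part of the collection.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(member_names):
    """Count archive members per file extension, skipping directory entries."""
    counts = Counter()
    for name in member_names:
        if name.endswith("/"):  # zip directory entries end with a slash
            continue
        suffix = PurePosixPath(name).suffix.lower() or "(none)"
        counts[suffix] += 1
    return dict(counts)


def summarize_archive(path):
    """Open a zip archive and summarize its members by extension."""
    with zipfile.ZipFile(path) as archive:
        return count_by_extension(archive.namelist())
```

For example, `summarize_archive("synth_data_ttl.zip")` would report how many files of each type the Turtle archive contains, assuming the file has been downloaded to the working directory.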

PALADIN Schemas

The file paladin_schemas.zip contains the PALADIN schemas used in the experimental study. There are mainly seven different schemas. One of them represents the treatment guideline for breast cancer patients with an amplified HER2 gene. The remaining six schemas are used in the scalability study. They divide the patients based on ranges over their IDs and comprise 16, 32, 64, 128, 256, 512, and 1024 nodes, respectively.
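To illustrate what dividing patients by ranges over their IDs can look like, the sketch below splits a contiguous ID space into near-equal ranges. It is not the PALADIN schema format, and how the schemas map ranges to nodes is not specified here. The sketch assumes, purely for illustration, that patient IDs run from 1 to the number of patients and that each range has roughly the same width. The function name `split_id_ranges` is hypothetical.

```python
from typing import List, Tuple


def split_id_ranges(num_patients: int, num_ranges: int) -> List[Tuple[int, int]]:
    """Split IDs 1..num_patients into contiguous, inclusive, near-equal ranges.

    Assumption: IDs are contiguous and start at 1 (not confirmed by the source).
    """
    if num_ranges < 1 or num_ranges > num_patients:
        raise ValueError("num_ranges must be between 1 and num_patients")
    base, extra = divmod(num_patients, num_ranges)
    ranges = []
    start = 1
    for i in range(num_ranges):
        # The first `extra` ranges take one additional ID each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


ranges = split_id_ranges(1_000, 16)
print(ranges[0], ranges[-1])
```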

Experimental Environment

To reproduce the results, download and unzip the file experiments.zip, then execute run_experiments.sh. Docker must be installed. Run the script with sudo permissions so that it can automatically transfer ownership of the files created by Docker to your user.
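The steps above can be sketched as a small Python wrapper. This is an illustration under several assumptions: the location of run_experiments.sh inside the archive is not documented here, so the sketch searches for it after extraction, and it assumes the script can be started with `bash` from its own directory. The function names and the `target` directory are hypothetical.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path


def docker_available() -> bool:
    """Return True if a `docker` executable is found on the PATH."""
    return shutil.which("docker") is not None


def build_command(script: str, use_sudo: bool = True) -> list:
    """Build the command line for the script, optionally prefixed with sudo."""
    cmd = ["bash", str(script)]
    return ["sudo"] + cmd if use_sudo else cmd


def run_experiments(archive: str = "experiments.zip", target: str = "experiments") -> None:
    """Extract the archive and run the experiment script (assumed layout)."""
    if not docker_available():
        raise RuntimeError("Docker does not appear to be installed")
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    # Assumption: the script's path inside the archive is unknown, so search for it.
    matches = sorted(Path(target).rglob("run_experiments.sh"))
    if not matches:
        raise FileNotFoundError("run_experiments.sh not found in extracted files")
    script = matches[0]
    # Run from the script's own directory so relative paths inside it resolve.
    subprocess.run(build_command(script.name), cwd=script.parent, check=True)
```

Calling `run_experiments()` prompts for the sudo password if needed. Passing `use_sudo=False` to `build_command` would skip the ownership transfer that the text describes.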

Cite this as

Philipp D. Rohde, Antonio Jesus Diaz-Honrubia, Emetis Niazmand, Maria-Esther Vidal (2023). Dataset: PALADIN: Benchmarks, Experimental Settings, and Evaluation. https://doi.org/10.57702/kf5tc88r

DOI retrieved: November 15, 2023

Additional Info

Created: November 10, 2023
Last update: April 11, 2024
License: cc-by-sa (Creative Commons Attribution Share-Alike)
Author: Philipp D. Rohde
More Authors: Antonio Jesus Diaz-Honrubia, Emetis Niazmand, Maria-Esther Vidal
Author Email: Philipp D. Rohde