
SynthLC models lung cancer patient data. Each patient has the following attributes:

  • Age category
  • Sex
  • Smoking habit
  • Comorbidities
  • Biomarkers
  • Drugs taken
  • Relapse

The attribute values are assigned at random, so the data does not reflect any patterns found in real lung cancer patients. In addition to a generator for synthetic lung cancer data, this entry provides pre-generated datasets modeling 1,000, 10,000, and 100,000 patients. The following files are part of this SynthLC entry:

  • SynthLC CSV: The 1k, 10k, and 100k datasets in CSV format.
  • SynthLC RDF: The 1k, 10k, and 100k datasets in RDF.
  • SynthLC Virtuoso: The 1k, 10k, and 100k datasets preloaded in Virtuoso 07.20.3238.
  • SynthLC Shapes: 25 SHACL shapes consisting of biomarker, drug, and relapse combinations. There are two variants: one uses SPARQL constraints, while the other uses a non-standard approach that specifies the shape targets via a query.
  • SynthLC Generator: The script used to create the 1k, 10k, and 100k datasets and the shapes. It can also be used to create more shapes or datasets of other sizes.
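
To illustrate what random attribute assignment can look like, here is a minimal Python sketch. It is not the SynthLC Generator itself. The value sets, column names, and function names (generate_patients, to_csv) are hypothetical placeholders chosen for illustration, and the actual vocabulary and file layout of SynthLC may differ.

```python
import csv
import io
import random

# Hypothetical value sets; the real SynthLC vocabulary may differ.
AGE_CATEGORIES = ["young", "old"]
SEXES = ["female", "male"]
SMOKING_HABITS = ["non-smoker", "former smoker", "current smoker"]
COMORBIDITIES = ["comorbidity_a", "comorbidity_b", "comorbidity_c"]
BIOMARKERS = ["biomarker_a", "biomarker_b", "biomarker_c"]
DRUGS = ["drug_a", "drug_b", "drug_c"]

# Hypothetical column names, one per attribute listed above plus an id.
FIELDNAMES = [
    "id", "age_category", "sex", "smoking_habit",
    "comorbidities", "biomarkers", "drugs", "relapse",
]


def random_subset(rng, values):
    """Pick a random, possibly empty, subset of the given values."""
    k = rng.randint(0, len(values))
    return sorted(rng.sample(values, k))


def generate_patients(n, seed=0):
    """Create n patients whose attribute values are drawn at random."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    patients = []
    for i in range(n):
        patients.append({
            "id": i,
            "age_category": rng.choice(AGE_CATEGORIES),
            "sex": rng.choice(SEXES),
            "smoking_habit": rng.choice(SMOKING_HABITS),
            # Multi-valued attributes are joined with ';' in a single cell.
            "comorbidities": ";".join(random_subset(rng, COMORBIDITIES)),
            "biomarkers": ";".join(random_subset(rng, BIOMARKERS)),
            "drugs": ";".join(random_subset(rng, DRUGS)),
            "relapse": rng.choice([True, False]),
        })
    return patients


def to_csv(patients):
    """Serialize the patients to a CSV string with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(patients)
    return buf.getvalue()


if __name__ == "__main__":
    # Write a 1k dataset to a file name chosen for this example only.
    with open("synthetic_patients_1k.csv", "w", newline="") as f:
        f.write(to_csv(generate_patients(1000, seed=42)))
```

Because every attribute is drawn independently, a dataset produced this way carries no correlations between, say, biomarkers and relapse, which matches the note above that no real-world patterns can be observed.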
