SynthLC
SynthLC models lung cancer patient data. Each patient has the following attributes:
- Age category
- Sex
- Smoking habit
- Comorbidities
- Biomarkers
- Drugs taken
- Relapse
The attribute values are assigned at random, so the data do not reflect any patterns found in real lung cancer patients.
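The random assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual SynthLC generator: the value domains, the field names, the split into single-valued and multi-valued attributes, and the function names `generate_patient` and `generate_dataset` are all assumptions made for the example.

```python
import random

# Hypothetical value domains; the real SynthLC generator may use different ones.
SINGLE_VALUED = {
    "age_category": ["young", "old"],
    "sex": ["female", "male"],
    "smoking_habit": ["non-smoker", "former smoker", "current smoker"],
    "relapse": [True, False],
}

# Assumed to be multi-valued: each patient receives a random subset.
MULTI_VALUED = {
    "comorbidities": ["comorbidity_A", "comorbidity_B", "comorbidity_C"],
    "biomarkers": ["biomarker_A", "biomarker_B", "biomarker_C"],
    "drugs": ["drug_A", "drug_B", "drug_C"],
}


def generate_patient(patient_id, rng):
    """Create one patient whose attribute values are drawn uniformly at random."""
    patient = {"id": patient_id}
    for name, values in SINGLE_VALUED.items():
        patient[name] = rng.choice(values)
    for name, values in MULTI_VALUED.items():
        # Pick a subset size between 0 and all values, then sample without replacement.
        size = rng.randint(0, len(values))
        patient[name] = sorted(rng.sample(values, size))
    return patient


def generate_dataset(n_patients, seed=0):
    """Create a reproducible list of n_patients random patients."""
    rng = random.Random(seed)
    return [generate_patient(i, rng) for i in range(n_patients)]


if __name__ == "__main__":
    for patient in generate_dataset(3):
        print(patient)
```

Because every attribute is drawn independently of the others, a dataset built this way contains no correlations between, for example, smoking habit and relapse, other than those arising by chance.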
In addition to a generator for synthetic lung cancer data, this entry provides pre-generated datasets modeling 1,000, 10,000, and 100,000 patients. The following files are part of this SynthLC entry:
- SynthLC CSV: The 1k, 10k, and 100k datasets in CSV format.
- SynthLC RDF: The 1k, 10k, and 100k datasets in RDF.
- SynthLC Virtuoso: The 1k, 10k, and 100k datasets preloaded in Virtuoso 07.20.3238.
- SynthLC Shapes: 25 SHACL shapes covering combinations of biomarkers, drugs, and relapse. There are two variants: one uses SPARQL constraints, while the other uses a non-standard approach that specifies the target of each shape via a query.
- SynthLC Generator: The script used to create the 1k, 10k, and 100k datasets and the shapes. It can be used to create more shapes or datasets of a different size.
BibTeX: