Changes

On April 11, 2024 at 7:22:59 AM UTC, Philipp D. Rohde:

Updated description of PALADIN: Benchmarks, Experimental Settings, and Evaluation from
This collection includes all the data and scripts necessary to reproduce the results from the experimental study of [PALADIN](https://github.com/SDM-TIB/PALADIN).

__Data__

The data is generated using the [Synthetic Data Generator](https://github.com/SDM-TIB/Synthetic-Data-Generator), which generates process-based breast cancer treatment data following the distribution in a real population of breast cancer patients. The collection comprises a total of 18 data sets, nine for relational databases and nine for RDF-based knowledge graphs. For each data format, there are three different sizes of data sets:

+ _Small_ models 1,000 patients
+ _Medium-sized_ models 10,000 patients
+ _Large_ models 100,000 patients

There are three data sets of each size. They differ in the parameter used for the mutation probability of the data generator. The lower this value, the more closely the data follows the treatment guideline for breast cancer patients with an amplified HER2 gene.

The data is available for download in

* Turtle format: `synth_data_ttl.zip`
* Preloaded for use with Virtuoso 7.20.3237: `synth_data_virtuoso.zip`
* MySQL 8.1 dump: `synth_data_sql.zip`

__PALADIN Schemas__

The file `paladin_schemas.zip` contains the different PALADIN schemas used in the experimental study. There are mainly four different schemas. One of them represents the treatment guideline for breast cancer patients with an amplified HER2 gene. The remaining three schemas are used in the scalability study. They divide the patients based on ranges over their IDs. They comprise 64, 256, and 1024 nodes, respectively.

__Experimental Environment__

In order to reproduce the results, download the file `experiments.zip`. Once unzipped, execute the file `run_experiments.sh`. Note that you need to have Docker installed. The script `run_experiments.sh` should be executed with sudo permissions so that it can automatically transfer the ownership of the files created with Docker to your user.
to
This collection includes all the data and scripts necessary to reproduce the results from the experimental study of [PALADIN](https://github.com/SDM-TIB/PALADIN).

__Data__

The data is generated using the [Synthetic Data Generator](https://github.com/SDM-TIB/Synthetic-Data-Generator), which generates process-based breast cancer treatment data following the distribution in a real population of breast cancer patients. The collection comprises a total of 18 data sets, nine for relational databases and nine for RDF-based knowledge graphs. For each data format, there are three different sizes of data sets:

+ _Small_ models 1,000 patients
+ _Medium-sized_ models 10,000 patients
+ _Large_ models 100,000 patients

There are three data sets of each size. They differ in the parameter used for the mutation probability of the data generator. The lower this value, the more closely the data follows the treatment guideline for breast cancer patients with an amplified HER2 gene.

The data is available for download in

* Turtle format: `synth_data_ttl.zip`
* Preloaded for use with Virtuoso 7.20.3237: `synth_data_virtuoso.zip`
* MySQL 8.1 dump: `synth_data_sql.zip`

__PALADIN Schemas__

The file `paladin_schemas.zip` contains the different PALADIN schemas used in the experimental study. There are mainly seven different schemas. One of them represents the treatment guideline for breast cancer patients with an amplified HER2 gene. The remaining six schemas are used in the scalability study. They divide the patients based on ranges over their IDs. They comprise 16, 32, 64, 128, 256, 512, and 1024 nodes, respectively.

__Experimental Environment__

In order to reproduce the results, download the file `experiments.zip`. Once unzipped, execute the file `run_experiments.sh`. Note that you need to have Docker installed. The script `run_experiments.sh` should be executed with sudo permissions so that it can automatically transfer the ownership of the files created with Docker to your user.
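
The count of 18 data sets in the description follows from two data formats, three sizes, and three mutation-probability settings per size. A minimal Python sketch of that grid is shown here as an illustration only. The labels `mutation_a`, `mutation_b`, and `mutation_c` are hypothetical placeholders, since the description does not state the actual probability values, and the function name is likewise an assumption.

```python
from itertools import product

# Formats and patient counts as stated in the description.
FORMATS = ("relational", "rdf")
SIZES = {"small": 1_000, "medium": 10_000, "large": 100_000}
# Hypothetical placeholder labels: the real mutation probabilities
# are not given in the description.
MUTATION_LEVELS = ("mutation_a", "mutation_b", "mutation_c")


def enumerate_data_sets():
    """List every (format, size, mutation level) combination."""
    return [
        {"format": fmt, "size": size, "patients": SIZES[size], "mutation": level}
        for fmt, size, level in product(FORMATS, SIZES, MUTATION_LEVELS)
    ]


data_sets = enumerate_data_sets()
print(len(data_sets))
```

Each format contributes nine of the combinations, which matches the split between relational and RDF data sets given in the description.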

  {
    "author": "Philipp D. Rohde",
    "author_email": "philipp.rohde@tib.eu",
    "creator_user_id": "de54a873-8acf-4fa5-bb95-5342cf7fd041",
    "doi": "10.57702/kf5tc88r",
    "doi_date_published": "2023-11-15",
    "doi_publisher": "TIB",
    "doi_status": true,
    "domain": "https://service.tib.eu/ldmservice",
    "extra_authors": [
      {
        "extra_author": "Antonio Jesus Diaz-Honrubia",
        "orcid": "0000-0001-5464-0714"
      },
      {
        "extra_author": "Emetis Niazmand",
        "orcid": "0000-0001-8194-8079"
      },
      {
        "extra_author": "Maria-Esther Vidal",
        "orcid": "0000-0003-1160-8727"
      }
    ],
    "extras": [
      {
        "__extras": {
          "id": "840592f3-b97a-4382-a9c1-80a1144589a6",
          "package_id": "4e81c650-6b82-4a0a-bf3c-ab81394cf8df",
          "state": "active"
        },
        "key": "",
        "value": ""
      }
    ],
    "groups": [],
    "id": "4e81c650-6b82-4a0a-bf3c-ab81394cf8df",
    "isopen": true,
    "license_id": "cc-by-sa",
    "license_title": "Creative Commons Attribution Share-Alike",
    "license_url": "http://www.opendefinition.org/licenses/cc-by-sa",
    "maintainer": "",
    "maintainer_email": "",
    "metadata_created": "2023-11-10T15:17:42.752028",
-   "metadata_modified": "2024-04-11T07:21:09.211249",
+   "metadata_modified": "2024-04-11T07:22:59.344222",
    "name": "paladin-benchmarks-experimental-settings-and-evaluation",
    "notes": "This collection includes all the data and scripts
  necessary to reproduce the results from the experimental study of
  LADIN](https://github.com/SDM-TIB/PALADIN).\r\n\r\n__Data__\r\n\r\nThe
  data is generated using the [Synthetic Data
  Generator](https://github.com/SDM-TIB/Synthetic-Data-Generator) which
  generates process-based breast cancer treatment data following the
  distribution in a real population of breast cancer patients. The
  collection comprises a total of 18 data sets, nine for relational
  databases and nine for RDF-based knowledge graphs. For each data
  format, there are three different sizes of data sets:\r\n\r\n+ _Small_
  models 1,000 patients\r\n+ _Medium-sized_ models 10,000 patients\r\n+
  _Large_ models 100,000 patients\r\n\r\nThere are three data sets of
  each size. They differ in the parameter used for the mutation
  probability of the data generator. The lower this value is, the closer
  the data is to following the treatment guideline for breast cancer
  patients with an amplified HER2 gene.\r\n\r\nThe data is available for
  download in\r\n\r\n* Turtle format: `synth_data_ttl.zip`\r\n*
  Preloaded for the use with Virtuoso 7.20.3237:
  `synth_data_virtuoso.zip`\r\n* MySQL 8.1 dump:
  `synth_data_sql.zip`\r\n\r\n__PALADIN Schemas__\r\n\r\nThe file
  `paladin_schemas.zip` contains the different PALADIN schemas used in
- the experimental study. There are mainly four different schemas. One
+ the experimental study. There are mainly seven different schemas. One
  of them represents the treatment guideline for breast cancer patients
- with an amplified HER2 gene. The remaining three shemas are used in
- the study of the scalability. They divide the patients based on the
- ranges over their IDs. They comprise of 64, 256, and 1024 nodes,
- respectively.\r\n\r\n__Experimental Environment__\r\n\r\nIn order to
- reproduce the results, download the file `experiments.zip`. Once
- unzipped, execute the file `run_experiments.sh`. Note that you need to
- have Docker installed. The script `run_experiments.sh` should be
- executed with sudo permissions in order to let the script
+ with an amplified HER2 gene. The remaining six shemas are used in the
+ study of the scalability. They divide the patients based on the ranges
+ over their IDs. They comprise of 16, 32, 64, 128, 256, 512, and 1024
+ nodes, respectively.\r\n\r\n__Experimental Environment__\r\n\r\nIn
+ order to reproduce the results, download the file `experiments.zip`.
+ Once unzipped, execute the file `run_experiments.sh`. Note that you
+ need to have Docker installed. The script `run_experiments.sh` should
+ be executed with sudo permissions in order to let the script
  automatically transfer the ownership of the files created with Docker
  to your user. ",
    "num_resources": 5,
    "num_tags": 0,
    "orcid": "0000-0002-9835-4354",
    "organization": {
      "approval_status": "approved",
      "created": "2017-11-23T17:30:37.757128",
      "description": "The German National Library of Science and
  Technology, abbreviated TIB, is the national library of the Federal
  Republic of Germany for all fields of engineering, technology, and the
  natural sciences.",
      "id": "0c5362f5-b99e-41db-8256-3d0d7549bf4d",
      "image_url":
  3conf/ext/tib_tmpl_bootstrap/Resources/Public/images/TIB_Logo_en.png",
      "is_organization": true,
      "name": "tib",
      "state": "active",
      "title": "TIB",
      "type": "organization"
    },
    "owner_org": "0c5362f5-b99e-41db-8256-3d0d7549bf4d",
    "private": false,
    "relationships_as_object": [],
    "relationships_as_subject": [],
    "resources": [
      {
        "auto_update": "No",
        "auto_update_last_update": "",
        "auto_update_url": "",
        "cache_last_updated": null,
        "cache_url": null,
        "created": "2023-11-10T15:19:14.750905",
        "description": "Synthetic data used in the experiments; MySQL
  8.1 dump",
        "format": "ZIP",
        "hash": "",
        "id": "3fcc4321-feed-4654-9f7e-d08ee1e9739c",
        "last_modified": "2023-11-10T15:19:14.720525",
        "metadata_modified": "2023-11-10T15:19:14.744026",
        "mimetype": "application/zip",
        "mimetype_inner": null,
        "name": "synth_data_sql.zip",
        "package_id": "4e81c650-6b82-4a0a-bf3c-ab81394cf8df",
        "position": 0,
        "resource_type": null,
        "size": 87101091,
        "state": "active",
        "url":
  rce/3fcc4321-feed-4654-9f7e-d08ee1e9739c/download/synth_data_sql.zip",
        "url_type": "upload"
      },
      {
        "auto_update": "No",
        "auto_update_last_update": "",
        "auto_update_url": "",
        "cache_last_updated": null,
        "cache_url": null,
        "created": "2023-11-10T15:22:57.708332",
        "description": "Synthetic data used in the experiments; RDF in
  Turtle serialization",
        "format": "ZIP",
        "hash": "",
        "id": "f44ce80d-a920-4446-ad3a-2cce36657303",
        "last_modified": "2023-11-10T15:22:57.529781",
        "metadata_modified": "2023-11-10T15:22:57.562511",
        "mimetype": "application/zip",
        "mimetype_inner": null,
        "name": "synth_data_ttl.zip",
        "package_id": "4e81c650-6b82-4a0a-bf3c-ab81394cf8df",
        "position": 1,
        "resource_type": null,
        "size": 155607030,
        "state": "active",
        "url":
  rce/f44ce80d-a920-4446-ad3a-2cce36657303/download/synth_data_ttl.zip",
        "url_type": "upload"
      },
      {
        "auto_update": "No",
        "auto_update_last_update": "",
        "auto_update_url": "",
        "cache_last_updated": null,
        "cache_url": null,
        "created": "2023-11-10T15:23:59.014155",
        "description": "Synthetic data used in the experiments; Virtuoso
  7.20.3237 database",
        "format": "ZIP",
        "hash": "",
        "id": "8926dd73-afb8-4580-91cc-8047e3e7916c",
        "last_modified": "2023-11-10T15:23:58.984212",
        "metadata_modified": "2023-11-10T15:23:59.009455",
        "mimetype": "application/zip",
        "mimetype_inner": null,
        "name": "synth_data_virtuoso.zip",
        "package_id": "4e81c650-6b82-4a0a-bf3c-ab81394cf8df",
        "position": 2,
        "resource_type": null,
        "size": 285547801,
        "state": "active",
        "url":
  926dd73-afb8-4580-91cc-8047e3e7916c/download/synth_data_virtuoso.zip",
        "url_type": "upload"
      },
      {
        "auto_update": "No",
        "auto_update_last_update": "",
        "auto_update_url": "",
        "cache_last_updated": null,
        "cache_url": null,
        "created": "2023-11-10T15:24:32.024426",
        "description": "PALADIN schemas used in the experiments",
        "format": "ZIP",
        "hash": "",
        "id": "44280984-73d1-40df-97e4-7fab0f8694fd",
        "last_modified": "2024-04-11T07:20:18.946477",
        "metadata_modified": "2024-04-11T07:20:18.959384",
        "mimetype": "application/zip",
        "mimetype_inner": null,
        "name": "paladin_schemas.zip",
        "package_id": "4e81c650-6b82-4a0a-bf3c-ab81394cf8df",
        "position": 3,
        "resource_type": null,
        "size": 76116,
        "state": "active",
        "url":
  ce/44280984-73d1-40df-97e4-7fab0f8694fd/download/paladin_schemas.zip",
        "url_type": "upload"
      },
      {
        "auto_update": "No",
        "auto_update_last_update": "",
        "auto_update_url": "",
        "cache_last_updated": null,
        "cache_url": null,
        "created": "2023-11-10T15:25:37.491260",
        "description": "Complete experiment environment for reproducing
  the results",
        "format": "ZIP",
        "hash": "",
        "id": "21bcc348-5e53-4994-97da-dc094c6113fb",
        "last_modified": "2024-04-11T07:21:09.202251",
        "metadata_modified": "2024-04-11T07:21:09.215541",
        "mimetype": "application/zip",
        "mimetype_inner": null,
        "name": "experiments.zip",
        "package_id": "4e81c650-6b82-4a0a-bf3c-ab81394cf8df",
        "position": 4,
        "resource_type": null,
        "size": 362101043,
        "state": "active",
        "url":
  source/21bcc348-5e53-4994-97da-dc094c6113fb/download/experiments.zip",
        "url_type": "upload"
      }
    ],
    "services_used_list": "",
    "state": "active",
    "tags": [],
    "title": "PALADIN: Benchmarks, Experimental Settings, and
  Evaluation",
    "type": "dataset",
    "url": "",
    "version": ""
  }
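
The `size` fields in the resource metadata give a quick way to sanity-check downloaded archives before running anything. The following Python sketch compares local file sizes against those values. It is an illustration under assumptions: the function name `check_download_sizes` is hypothetical, the files are assumed to keep their resource names, and the sizes are only valid for the metadata revision shown here (a later re-upload would change them). A size match does not prove integrity, because the metadata's `hash` fields are empty.

```python
from pathlib import Path

# Sizes in bytes, copied from the "size" fields of the resource metadata.
EXPECTED_SIZES = {
    "synth_data_sql.zip": 87101091,
    "synth_data_ttl.zip": 155607030,
    "synth_data_virtuoso.zip": 285547801,
    "paladin_schemas.zip": 76116,
    "experiments.zip": 362101043,
}


def check_download_sizes(directory, expected=EXPECTED_SIZES):
    """Map each expected file name to 'ok', 'missing', or 'size mismatch'."""
    results = {}
    for name, size in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif path.stat().st_size == size:
            results[name] = "ok"
        else:
            results[name] = "size mismatch"
    return results


if __name__ == "__main__":
    # Check the current working directory and report one line per archive.
    for name, status in check_download_sizes(".").items():
        print(f"{name}: {status}")
```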