Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion

This dataset is part of the bachelor thesis "Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion". It was created for fine-tuning BERT-based models pre-trained on the SQuAD dataset. The dataset was created using a semi-automatic approach on the ORKG data.

The dataset.csv file contains the entire data (all properties) in tabular form and is not split. The JSON files contain only the fields needed for training and evaluation, plus additional fields giving the start and end indices of the answers in the abstracts. The data in the JSON files is split into training data and evaluation data. We create four variants of the training and evaluation sets, one for each of the question labels ("no label", "how", "what", "which").
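Because the JSON files store answer positions as character indices into the abstracts, it can be useful to check that each stored span actually reproduces its answer text before training. The following is a minimal sketch of such a check, not the script used in the thesis. The field names (context, question, answer, answer_start, answer_end), the exclusive end index, and the helper functions are assumptions made for illustration; the actual fields are described in the thesis.

```python
from typing import Dict, List


def make_record(context: str, question: str, answer: str) -> Dict:
    """Build a SQuAD-style record (hypothetical field names) with character offsets."""
    start = context.find(answer)
    if start < 0:
        raise ValueError("answer does not occur in context")
    return {
        "context": context,
        "question": question,
        "answer": answer,
        "answer_start": start,
        # Assumption: the end index is exclusive, as in Python slicing.
        "answer_end": start + len(answer),
    }


def find_bad_spans(records: List[Dict]) -> List[int]:
    """Return the positions of records whose offsets do not reproduce the answer."""
    bad = []
    for i, rec in enumerate(records):
        span = rec["context"][rec["answer_start"]:rec["answer_end"]]
        if span != rec["answer"]:
            bad.append(i)
    return bad


# Illustrative data only; not taken from the dataset.
abstract = "We evaluate a BERT-based model on scholarly abstracts."
good = make_record(abstract, "what method?", "BERT-based model")
shifted = dict(good, answer_start=good["answer_start"] + 1)  # deliberately misaligned

print(find_bad_spans([good, shifted]))  # prints [1]
```

With the real files, the records would be loaded with json.load and the keys adapted to the field names actually present.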

For detailed information on each of the fields in the dataset, refer to section 4.2 (Corpus) of the thesis document, available at https://www.repo.uni-hannover.de/handle/123456789/12958.

The script used to generate the dataset can be found in the public repositories https://github.com/as18cia/thesis_work and https://gitlab.com/TIBHannover/orkg/nlp/experiments/orkg-fine-tuning-squad-based-models.

Data and Resources

Cite this as

Moussab Hrou (2022). Dataset: Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion. https://doi.org/10.25835/blecbkwf

DOI retrieved: November 1, 2022

Additional Info

Field: Value
Imported on: January 12, 2023
Last update: August 4, 2023
License: CC-BY-3.0
Source: https://data.uni-hannover.de/dataset/evaluating-squad-based-question-answering-for-the-open-research-knowledge-graph-completion
Author: Moussab Hrou
Maintainer: Moussab Hrou
Source Creation: 01 November, 2022, 00:05 AM (UTC+0000)
Source Modified: 05 December, 2022, 04:04 AM (UTC+0000)