Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion

This dataset is part of the bachelor thesis "Evaluating SQuAD-based Question Answering for the Open Research Knowledge Graph Completion". It was created for the finetuning of Bert Based models pre-trained on the SQUaD dataset. The Dataset was created using semi-automatic approach on the ORKG data.

The dataset.csv file contains the entire data (all properties) in a tabular for and is unsplit. The json files contain only the necessary fields for training and evaluation, with additional fields (index of start and end of the answers in the abstracts). The data in the json files is split (training data) and evaluation data. We create 4 variants of the training and evaluation sets for each one of the question labels ("no label", "how", "what", "which")

For detailed information on each of the fields in the dataset, refer to section 4.2 (Corpus) of the Thesis document that can be found in https://www.repo.uni-hannover.de/handle/123456789/12958.

The script used to generate the dataset can be found in the public repository https://github.com/as18cia/thesis_work and https://gitlab.com/TIBHannover/orkg/nlp/experiments/orkg-fine-tuning-squad-based-models

BibTex: