ORKG Similar Papers Recommendation Service Evaluation Dataset

doi:10.25835/qftvbgo4

This dataset was created to compare and evaluate two recommendation services: the Semantic Scholar recommendation service and the Open Research Knowledge Graph (ORKG) similar papers recommendation service, which is based on Elastic Search. The dataset covers 30 randomly selected ORKG comparisons. For each comparison, it provides 50 similar papers recommended by Semantic Scholar and 50 papers recommended by Elastic Search, including the 10 most relevant papers, which were manually labeled.

Dataset columns:

Comparison resource ID: the identifier of the ORKG comparison resource
Elastic search score: the relevancy score provided by Elastic Search
Curated score: manual relevance label; the papers most relevant to the comparison are labeled "1"
ES DOI: digital object identifier of the paper recommended by Elastic Search
ES Title: title of the paper recommended by Elastic Search
ES Abstract: abstract of the paper recommended by Elastic Search
Semantic scholar rank: the relevancy rank provided by Semantic Scholar
SS DOI: digital object identifier of the paper recommended by Semantic Scholar
SS Title: title of the paper recommended by Semantic Scholar
SS Abstract: abstract of the paper recommended by Semantic Scholar
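
The column list above can be turned into ranked relevance labels per comparison. The following is a minimal sketch only: it assumes the data is available as CSV with header names spelled exactly as listed, and that rows without a "1" in the curated column count as not relevant. The sample rows, IDs, and DOIs are made up for illustration, and the helper name `es_labels_by_comparison` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; only the column names come from the description above.
sample = io.StringIO(
    "Comparison resource ID,Elastic search score,Curated score,ES DOI,ES Title\n"
    "R0001,9.1,,10.0000/example-b,Example paper B\n"
    "R0001,12.5,1,10.0000/example-a,Example paper A\n"
    "R0002,7.3,1,10.0000/example-c,Example paper C\n"
)


def es_labels_by_comparison(handle):
    """Group Elastic Search rows by comparison and return 0/1 labels in ranked order."""
    rows = defaultdict(list)
    for row in csv.DictReader(handle):
        score = float(row["Elastic search score"])
        # Assumption: anything other than "1" means "not labeled as relevant".
        label = 1 if (row["Curated score"] or "").strip() == "1" else 0
        rows[row["Comparison resource ID"]].append((score, label))
    # Higher Elastic Search score is assumed to mean a better rank.
    return {
        comparison_id: [label for _, label in sorted(items, key=lambda t: t[0], reverse=True)]
        for comparison_id, items in rows.items()
    }


labels = es_labels_by_comparison(sample)
print(labels)
```

The Semantic Scholar columns could be handled the same way, sorting by ascending "Semantic scholar rank" instead of descending score.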

Evaluation results:

Precision (P@k) and recall (R@k), averaged over the comparisons, for Semantic Scholar results:

P@50 = 0.11; R@50 = 0.54
P@40 = 0.13; R@40 = 0.50
P@30 = 0.16; R@30 = 0.48
P@20 = 0.20; R@20 = 0.40
P@10 = 0.29; R@10 = 0.29

Precision (P@k) and recall (R@k), averaged over the comparisons, for Elastic Search results:

P@50 = 0.20; R@50 = 1.00
P@40 = 0.25; R@40 = 0.98
P@30 = 0.32; R@30 = 0.97
P@20 = 0.46; R@20 = 0.92
P@10 = 0.63; R@10 = 0.63
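
The description does not spell out how the figures above were computed. The sketch below assumes the standard definitions: P@k is the number of relevant papers among the top k recommendations divided by k, and R@k is that same number divided by the total number of relevant papers (10 per comparison, per the description), with both values averaged over the comparisons. This assumption fits the reported numbers, since P@10 equals R@10 for both services. The function names and the toy labels are illustrative and not taken from the dataset.

```python
from typing import Sequence, Tuple


def precision_recall_at_k(labels: Sequence[int], k: int, total_relevant: int) -> Tuple[float, float]:
    """Return (P@k, R@k) for one ranked list of 0/1 relevance labels."""
    if k <= 0 or total_relevant <= 0:
        raise ValueError("k and total_relevant must be positive")
    hits = sum(labels[:k])  # relevant papers among the top k
    return hits / k, hits / total_relevant


def mean_precision_recall_at_k(
    rankings: Sequence[Sequence[int]], k: int, total_relevant: int = 10
) -> Tuple[float, float]:
    """Average P@k and R@k over several ranked lists (one per comparison)."""
    pairs = [precision_recall_at_k(r, k, total_relevant) for r in rankings]
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    return mean_p, mean_r


# Made-up labels for two comparisons, best-ranked paper first.
toy_rankings = [
    [1, 1, 0, 1, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
]
for k in (5, 10):
    p, r = mean_precision_recall_at_k(toy_rankings, k)
    print(f"P@{k} = {p:.2f}; R@{k} = {r:.2f}")
```

With 10 relevant papers per comparison, P@10 and R@10 share the same denominator, which is why the two values coincide at k = 10.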

BibTeX:

@dataset{Vladyslav_Nechakhin_and__Jennifer_D’Souza_2023,
    abstract = {This dataset was created to compare and evaluate the Semantic Scholar recommendation service and Open Research Knowledge Graph (ORKG) similar papers recommendation service based on Elastic Search. The dataset includes 30 random ORKG comparisons, each of them is provided with 50 similar papers recommended by Semantic Scholar and 50 papers recommended by Elastic Search, including 10 most relevant papers that were manually labeled.},
    author = {Vladyslav Nechakhin and Jennifer D’Souza},
    doi = {10.25835/qftvbgo4},
    institution = {TIB},
    month = {jan},
    publisher = {LUIS},
    title = {ORKG Similar Papers Recommendation Service Evaluation Dataset},
    url = {https://service.tib.eu/ldmservice/vdataset/luh-orkg-similar-papers-recommendation-service-evaluation-dataset},
    year = {2023}
}