CS-NER
Computer Science Named Entity Recognition in the Open Research Knowledge Graph
1) About
This work proposes a standardized CS-NER task by defining a set of seven contribution-centric scholarly entities for CS NER, viz., research problem, solution, resource, language, tool, method, and dataset.
The main contributions are:
1) Merges annotations for contribution-centric named entities from related work into the following datasets (see the mapping sketch after this list):
- The dataset proposed in Analyzing the Dynamics of Research by Extracting Key Aspects of Scientific Papers (Gupta & Manning, IJCNLP 2011) is the source for ftd, annotated for both titles and abstracts for the following select entities mapped to our standardized types: focus -> solution; domain -> research problem; and technique -> method
- The dataset proposed in Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction (Luan et al., EMNLP 2018) is the source for scierc, annotated for abstracts for the following select entities with the mapping task -> research problem
- The dataset proposed in SemEval-2021 Task 11: NLPContributionGraph - Structuring Scholarly NLP Contributions for a Research Knowledge Graph (D’Souza et al., SemEval 2021) is the source for ncg, annotated for both titles and abstracts for research problem
- https://paperswithcode.com/ is the source for pwc, annotated for both titles and abstracts for task -> research problem and for method entities
2) Additionally, supplies a new annotated dataset for titles from the ACL Anthology in the acl repository, where titles are annotated with all seven entities.
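The source-type to standardized-type mappings listed above can be applied mechanically when merging the corpora. Below is a minimal sketch in Python, assuming the source annotations are available as (span, source_type) pairs; the function and variable names are illustrative and not part of the released codebase.

```python
# Minimal sketch of the source-type -> standardized-type mapping described above.
# Assumes annotations are (text_span, source_type) tuples; names are illustrative only.

TYPE_MAPPINGS = {
    "ftd": {"focus": "solution", "domain": "research problem", "technique": "method"},
    "scierc": {"task": "research problem"},
    "ncg": {"research problem": "research problem"},
    "pwc": {"task": "research problem", "method": "method"},
}

def standardize(annotations, source):
    """Relabel source-specific entity types; drop types outside the mapping."""
    mapping = TYPE_MAPPINGS[source]
    return [(span, mapping[etype]) for span, etype in annotations if etype in mapping]

# Example: a SciERC 'task' mention becomes a 'research problem' mention.
print(standardize([("named entity recognition", "task")], "scierc"))
# [('named entity recognition', 'research problem')]
```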
2) Dataset Statistics for the full dataset
Titles
train.data
| NER | Count |
| --- | --- |
| solution | 65,213 |
| research problem | 43,033 |
| resource | 19,759 |
| method | 19,645 |
| tool | 4,856 |
| dataset | 4,062 |
| language | 1,704 |
dev.data
| NER | Count |
| --- | --- |
| solution | 3,685 |
| research problem | 2,717 |
| resource | 1,224 |
| method | 1,172 |
| tool | 264 |
| dataset | 191 |
| language | 79 |
test.data
| NER | Count |
| --- | --- |
| solution | 29,287 |
| research problem | 11,093 |
| resource | 8,511 |
| method | 7,009 |
| tool | 2,272 |
| dataset | 947 |
| language | 690 |
Abstracts
train-abs.data
| NER | Count |
| --- | --- |
| research problem | 15,498 |
| method | 12,932 |
dev-abs.data
| NER | Count |
| --- | --- |
| research problem | 1,450 |
| method | 839 |
test-abs.data
| NER | Count |
| --- | --- |
| research problem | 4,123 |
| method | 3,170 |
The remaining repositories have specialized README files with the respective dataset statistics.
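Per-type counts like those in the tables above can be recomputed directly from the .data files. The sketch below assumes a CoNLL-style token-per-line layout with blank lines between sentences and a BIO tag (without internal whitespace) in the last column; this layout is an assumption, so check the per-repository READMEs for the exact format.

```python
# Sketch: count entity mentions per type in a BIO-tagged .data file.
# Assumed layout: one token per line, BIO tag in the last whitespace-separated
# column, blank lines between sentences. Each B- tag starts one mention.
from collections import Counter
import sys

def count_entities(path):
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # sentence boundary
            tag = line.split()[-1]
            if tag.startswith("B-"):
                counts[tag[2:]] += 1
    return counts

if __name__ == "__main__":
    for etype, n in count_entities(sys.argv[1]).most_common():
        print(f"{etype}\t{n}")
```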
3) Citation
Accepted for publication in the ICADL 2022 proceedings.
Citation information forthcoming.
Preprint
@article{d2022computer,
title={Computer Science Named Entity Recognition in the Open Research Knowledge Graph},
author={D'Souza, Jennifer and Auer, S{\"o}ren},
journal={arXiv preprint arXiv:2203.14579},
year={2022}
}
4) Additional resources
CS NER software trained on the dataset in this repository
Codebase: https://gitlab.com/TIBHannover/orkg/nlp/orkg-nlp-experiments/-/tree/master/orkg_cs_ner
Service URL - REST API: https://orkg.org/nlp/api/docs#/annotation/annotates_paper_annotation_csner_post
Service URL - PyPi: https://orkg-nlp-pypi.readthedocs.io/en/latest/services/services.html#cs-ner-computer-science-named-entity-recognition
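A call to the REST service above might look like the following. This is a hedged sketch only: the exact endpoint path and the request/response schema are defined in the linked API docs, and the "title"/"abstract" field names used here are assumptions for illustration.

```python
# Sketch: calling the CS-NER annotation REST service with the requests library.
# The endpoint path and payload field names are assumptions; consult
# https://orkg.org/nlp/api/docs for the authoritative request/response schema.
import requests

URL = "https://orkg.org/nlp/api/annotation/csner"  # assumed path; see the Swagger docs

payload = {
    "title": "Computer Science Named Entity Recognition in the Open Research Knowledge Graph",
    "abstract": "Domain-specific named entity recognition for scholarly contributions ...",
}

response = requests.post(URL, json=payload, timeout=30)
response.raise_for_status()
print(response.json())  # expected: extracted entities grouped by standardized type
```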