CS-NER

Computer Science Named Entity Recognition in the Open Research Knowledge Graph

1) About

This work proposes a standardized CS-NER task by defining a set of seven contribution-centric scholarly entities for computer science NER: research problem, solution, resource, language, tool, method, and dataset.

The main contributions are:

1) Merges annotations for contribution-centric named entities from related work as the following datasets:

2) Additionally, supplies a new annotated dataset of ACL Anthology titles, located in the acl repository, in which each title is annotated with all seven entities.

2) Dataset Statistics for full dataset

Titles

train.data

| NER | Count |
| --- | --- |
| solution | 65,213 |
| research problem | 43,033 |
| resource | 19,759 |
| method | 19,645 |
| tool | 4,856 |
| dataset | 4,062 |
| language | 1,704 |

dev.data

| NER | Count |
| --- | --- |
| solution | 3,685 |
| research problem | 2,717 |
| resource | 1,224 |
| method | 1,172 |
| tool | 264 |
| dataset | 191 |
| language | 79 |

test.data

| NER | Count |
| --- | --- |
| solution | 29,287 |
| research problem | 11,093 |
| resource | 8,511 |
| method | 7,009 |
| tool | 2,272 |
| dataset | 947 |
| language | 690 |
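
To compare the three title splits at a glance, the counts above can be tallied into per-split totals and per-entity shares. The following is a minimal sketch, not part of the released dataset tooling: the numbers are copied from the tables above, and the names `TITLE_COUNTS`, `split_totals`, and `type_shares` are hypothetical helpers introduced only for illustration.

```python
# Entity counts copied from the title tables above (train.data, dev.data, test.data).
# TITLE_COUNTS, split_totals, and type_shares are illustrative names, not dataset APIs.
TITLE_COUNTS = {
    "train": {
        "solution": 65213, "research problem": 43033, "resource": 19759,
        "method": 19645, "tool": 4856, "dataset": 4062, "language": 1704,
    },
    "dev": {
        "solution": 3685, "research problem": 2717, "resource": 1224,
        "method": 1172, "tool": 264, "dataset": 191, "language": 79,
    },
    "test": {
        "solution": 29287, "research problem": 11093, "resource": 8511,
        "method": 7009, "tool": 2272, "dataset": 947, "language": 690,
    },
}


def split_totals(counts):
    """Return the total number of annotated entities in each split."""
    return {split: sum(by_type.values()) for split, by_type in counts.items()}


def type_shares(by_type):
    """Return each entity type's fraction of all annotations in one split."""
    total = sum(by_type.values())
    return {entity: n / total for entity, n in by_type.items()}


if __name__ == "__main__":
    for split, total in split_totals(TITLE_COUNTS).items():
        print(f"{split}: {total} annotated entities")
        for entity, share in type_shares(TITLE_COUNTS[split]).items():
            print(f"  {entity}: {share:.1%}")
```

Shares make the class imbalance easier to see than raw counts: solution and research problem dominate every split, while language is the rarest type.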

Abstracts

train-abs.data

| NER | Count |
| --- | --- |
| research problem | 15,498 |
| method | 12,932 |

dev-abs.data

| NER | Count |
| --- | --- |
| research problem | 1,450 |
| method | 839 |

test-abs.data

| NER | Count |
| --- | --- |
| research problem | 4,123 |
| method | 3,170 |

The remaining repositories have specialized README files with their respective dataset statistics.

3) Citation

Accepted for publication in ICADL 2022 proceedings.

Citation information forthcoming

Preprint

@article{d2022computer,
  title={Computer Science Named Entity Recognition in the Open Research Knowledge Graph},
  author={D'Souza, Jennifer and Auer, S{\"o}ren},
  journal={arXiv preprint arXiv:2203.14579},
  year={2022}
}

4) Additional resources

CS NER Software trained on the dataset in this repository

Codebase: https://gitlab.com/TIBHannover/orkg/nlp/orkg-nlp-experiments/-/tree/master/orkg_cs_ner

Service URL - REST API: https://orkg.org/nlp/api/docs#/annotation/annotates_paper_annotation_csner_post

Service URL - PyPi: https://orkg-nlp-pypi.readthedocs.io/en/latest/services/services.html#cs-ner-computer-science-named-entity-recognition

Cite this as

Jennifer D'Souza (2022). Dataset: CS-NER. https://doi.org/10.25835/hodc41f5

DOI retrieved: October 7, 2022

Additional Info

| Field | Value |
| --- | --- |
| Imported on | January 12, 2023 |
| Last update | January 12, 2023 |
| License | CC-BY-SA-3.0 |
| Source | https://data.uni-hannover.de/dataset/cs-ner-dataset |
| Author | Jennifer D'Souza |
| Author Email | Jennifer D'Souza |
| Maintainer | Jennifer D'Souza |
| Source Creation | 07 October, 2022, 06:56 AM (UTC+0000) |
| Source Modified | 07 October, 2022, 07:02 AM (UTC+0000) |