Dataset - LDM

VLSP

A dataset used for part-of-speech tagging and named entity recognition tasks.

DocRED dataset

The DocRED dataset was built from Wikipedia and Wikidata and covers relations in science, art, personal life, and other domains.

Rare-NER, Bio-NER, and Twitter-POS datasets

The Rare-NER, Bio-NER, and Twitter-POS datasets are used for named entity recognition and part-of-speech tagging.

Wall Street Journal

The Wall Street Journal dataset is used for syntactic linearization. It contains a large corpus of news articles with their corresponding syntactic trees.

CMeEE

The CMeEE V1 and V2 datasets are used for Chinese nested medical NER.

Wiki-40B, PG-19, C4, etc.

The paper does not explicitly describe its dataset, but the authors mention using benchmarks such as Wiki-40B, PG-19, and C4, among others.

You can also access this registry using the API (see API Docs).

6 datasets found