Part-of-speech tagging - Groups

Universal Dependencies

Universal Dependencies (Nivre et al., 2020) provides an extensive testing ground for such scenarios: Its language diversity is constantly increasing (from 10 in v1.0 to 104 in...

French Treebank (FTB)

The French Treebank (FTB) dataset is used for a part-of-speech tagging task.

PANX and UDPOS datasets

The PANX and UDPOS datasets are used for named entity recognition and part-of-speech tagging tasks in the CJKV languages.

MasakhaPOS

MasakhaPOS is an Igbo part-of-speech dataset.

CoNLL 2003 dataset

The CoNLL 2003 dataset is a collection of newswire articles used for sequence labeling tasks.

Rare-NER, Bio-NER, and Twitter-POS datasets

The Rare-NER, Bio-NER, and Twitter-POS datasets are used for named entity recognition and part-of-speech tagging.

Wall Street Journal

The Wall Street Journal dataset is used for syntactic linearization. It is a large corpus of news articles paired with their corresponding syntactic trees.

7 datasets found