-
Universal Dependencies
Universal Dependencies (Nivre et al., 2020) provides an extensive testing ground for such scenarios: Its language diversity is constantly increasing (from 10 in v1.0 to 104 in...
-
French Treebank (FTB)
The French Treebank (FTB) dataset is used for a part-of-speech tagging task.
-
PANX and UDPOS datasets
The PANX and UDPOS datasets are used for named entity recognition and part-of-speech tagging, respectively, across the CJKV languages.
-
MasakhaPOS
MasakhaPOS is an Igbo part-of-speech dataset.
-
CoNLL 2003 dataset
The CoNLL 2003 dataset is a collection of newswire articles used for sequence labeling tasks.
-
Rare-NER, Bio-NER, and Twitter-POS datasets
The Rare-NER, Bio-NER, and Twitter-POS datasets are used for named entity recognition and part-of-speech tagging.
-
Wall Street Journal
The Wall Street Journal dataset is used for syntactic linearization. It is a large corpus of news articles paired with their corresponding syntactic trees.