-
MasakhaPOS
MasakhaPOS is an Igbo parts-of-speech dataset. -
CoNLL 2003 dataset
The CoNLL 2003 dataset is a collection of news-wire articles used for sequence labeling tasks. -
Rare-NER, Bio-NER, and Twitter-POS datasets
The Rare-NER, Bio-NER, and Twitter-POS datasets are used for named entity recognition and part-of-speech tagging. -
Wall Street Journal
The Wall Street Journal dataset is used for syntactic linearization. It contains a large corpus of news articles with their corresponding syntactic trees.