- CoNLL 2003 dataset
The CoNLL 2003 dataset is a collection of news-wire articles used for sequence labeling tasks.
- Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...
- SeqMix: Augmenting Active Sequence Labeling via Sequence Mixup
Active learning is an important technique for low-resource sequence labeling tasks. However, current active sequence labeling methods use the queried samples alone in each...