-
Automata-based constraints for language model decoding
The dataset used in this paper is a collection of regular expressions and grammars for constraining language models. -
Penn Tree Bank
The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The... -
Wikitext-2
The paper does not describe its dataset explicitly; it mentions only that the authors used the Wikitext-2 dataset for text generation tasks. -
SlimPajama
The SlimPajama dataset is used to evaluate the performance of the xLSTM architecture on various tasks, including language modeling, question answering, and text classification. -
Chinese Poetry
The Chinese Poetry dataset is a collection of Chinese poems used for language modeling. -
Penn Treebank
The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths. -
Penn Treebank (PTB) dataset
The Penn Treebank (PTB) dataset is used for the word ordering task, where it serves to evaluate the performance of different models.