- SlimPajama
  The dataset is used to evaluate the performance of the xLSTM architecture on various tasks, including language modeling, question answering, and text classification.
- Penn Treebank (PTB) and WikiText-2 (WT-2)
  The datasets used in the paper are Penn Treebank (PTB) and WikiText-2 (WT-2), both language modeling datasets.
- Patrika Dataset
  The Patrika dataset is used as an independent test set.
- Nayadiganta Dataset
  The Nayadiganta dataset is used as an independent test set.
- Hindinews and Livehindustan Articles
  The dataset consists of Hindinews, Livehindustan, and Patrika newspaper articles, openly available on Kaggle, that cover similar domains.
- Bengali and Hindi News Articles
  The Bengali dataset consists of articles from online public news portals such as Prothom-Alo, BDNews24, and Nayadiganta. The articles encompass domains such as politics,...
- Chinese Poetry
  The Chinese Poetry dataset is a collection of Chinese poems used for language modeling.
- Penn Treebank
  The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
- Wikitext-103
  The dataset used in this paper is Wikitext-103, a general English-language corpus containing good and featured Wikipedia articles.
- Penn Treebank (PTB) dataset
  The Penn Treebank (PTB) dataset is used for the word ordering task, where it serves to evaluate the performance of different models.