-
SAM: Semantic Attribute Modulation for Language Modeling and Style Variation
Semantic Attribute Modulation (SAM), a method for language modeling and style variation. -
Penn Treebank (PTB)
The Penn Treebank (PTB) dataset, used for language modeling. -
ControlVAE: Controllable Variational Autoencoder
The dataset used for language modeling, disentangled representation learning, and image generation. -
BookCorpus Dataset
The paper uses the BookCorpus dataset. -
Morfessor 2.0 dataset
Morfessor 2.0 dataset for English, Finnish and Turkish language models -
Den samiske tekstbanken dataset
Den samiske tekstbanken dataset for North Sámi language model -
Morpho Challenge 2010 dataset
Morpho Challenge 2010 dataset for English, Finnish and Turkish language models -
WikiText-2
The paper does not describe its dataset explicitly, but it mentions that the authors used the WikiText-2 dataset for text generation tasks. -
Billion Word Benchmark Dataset
The dataset contains 768M tokens for language modeling. -
SlimPajama
The dataset is used to evaluate the performance of the xLSTM architecture on various tasks, including language modeling, question answering, and text classification. -
Penn Treebank (PTB) and WikiText-2 (WT-2)
The paper uses Penn Treebank (PTB) and WikiText-2 (WT-2), both of which are language modeling datasets. -
Patrika Dataset
The Patrika dataset is used as an independent test set. -
Nayadiganta Dataset
The Nayadiganta dataset is used as an independent test set. -
Hindinews and Livehindustan Articles
Hindinews, Livehindustan and Patrika newspaper articles, openly available on Kaggle and covering similar domains. -
Bengali and Hindi News Articles
The Bengali dataset consists of articles from online public news portals such as Prothom-Alo, BDNews24 and Nayadiganta. The articles encompass domains such as politics,...