34 datasets found

Tags: Language Modeling

  • SlimPajama

    The dataset is used to evaluate the performance of the xLSTM architecture on various tasks, including language modeling, question answering, and text classification.
  • YELP

    The YELP dataset is used for language modeling.
  • PTB

    Object tracking by reconstruction with view-specific discriminative correlation filters.
  • Penn Treebank (PTB) and WikiText-2 (WT-2)

    The datasets used in the paper are Penn Treebank (PTB) and WikiText-2 (WT-2), both of which are language modeling datasets.
  • Patrika Dataset

    The Patrika dataset is used as an independent test set.
  • Nayadiganta Dataset

    The Nayadiganta dataset is used as an independent test set.
  • Hindinews and Livehindustan Articles

    Hindinews, Livehindustan and Patrika newspaper articles, openly available on Kaggle and encompassing similar domains.
  • Bengali and Hindi News Articles

    The Bengali dataset consists of articles from online public news portals such as Prothom-Alo, BDNews24 and Nayadiganta. The articles encompass domains such as politics,...
  • Chinese Poetry

    The Chinese Poetry dataset is a collection of Chinese poems used for language modeling.
  • Text8

    Word2Vec is a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.
  • Penn Treebank

    The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
  • Wikitext-103

    The dataset used in this paper is Wikitext-103, a general English language corpus containing good and featured Wikipedia articles.
  • GLUE

    Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have...
  • Penn Treebank (PTB) dataset

    The Penn Treebank (PTB) dataset is used for the word ordering task, to evaluate the performance of different models.
You can also access this registry using the API (see API Docs).