8 datasets found

Tags: dataset

  • AGNews, 20News, NYT, IMDB

    AGNews, 20News, NYT, IMDB are datasets used for weakly supervised text classification.
  • HateXplain

    The HateXplain dataset contains 20,000 posts from Gab and Twitter, annotated with hate/offensive/normal labels.
  • 20News

    Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging.
  • NYT

    Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
  • GYAFC

    The GYAFC dataset is a formality transfer dataset for English that contains aligned formal and informal sentences from two domains: Entertainment & Music and Family &...
  • Yahoo

    The Yahoo dataset, containing leaked passwords, is used for training and testing the proposed model.
  • C4

    C4 is a dataset used for pre-training language models; it contains a large collection of text documents.
  • LAION

    The paper does not describe the dataset explicitly, but mentions that it is a large-scale captioned image dataset (LAION) used to train the Stable Diffusion model.