43 datasets found

  • EGOODS

    A large native one-to-many text dataset for text generation tasks, constructed to accelerate the research of diverse text generation.
  • BLIP2

    BLIP2 is a vision-language pre-training dataset consisting of 100 million image-text pairs.
  • The E2E dataset

    The E2E dataset contains restaurant reviews labeled by 8 fields including food type, price, and customer rating.
  • MTTN: Multi-Pair Text to Text Narratives for Prompt Generation

    A large-scale dataset for generating prompts that can be used in diffusion models for text-to-text generation tasks.
  • CLIP-GLaSS

    The dataset used for the text-to-image task consists of 20 context tokens, to which three fixed tokens have been concatenated, representing the static context "the picture of".
  • Wikitext-2

    The paper does not describe its dataset explicitly, but it mentions that the authors used the Wikitext-2 dataset for text generation tasks.
  • TESS: Text-to-Text Self-Conditioned Simplex Diffusion

    Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models...
  • MME

    MME: A comprehensive evaluation benchmark for multimodal large language models
  • MMBench

    MMBench: Is your multi-modal model an all-around player?
  • Language models are few-shot learners

    A language model that demonstrates capabilities in processing and generating human-like text.
  • MMICL

    MMICL: Empowering vision-language model with multi-modal in-context learning
  • Prompt Highlighter

    Prompt Highlighter is a novel paradigm for user-model interactions in multi-modal LLMs, offering output control through a token-level highlighting mechanism.
  • C4

    A dataset used for pre-training language models, containing a large collection of text documents.
  • CommonGen

    Commonsense generation aims to generate a realistic sentence describing a daily scene under the given concepts, which is very challenging, since it requires models to have...
  • SSD-LM

    Semi-autoregressive simplex-based diffusion language model for text generation and modular control
  • Text8

    Word2Vec is a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.
  • STC dataset

    The STC dataset is a short text conversation dataset used for evaluating the performance of conversation response generation models.
  • Wikitext-103

    The dataset used in this paper is Wikitext-103, a general English language corpus containing good and featured Wikipedia articles.
  • Synthetic Dataset

    The dataset used in this work is a custom synthetic dataset generated using the liquid-dsp library, containing 600000 examples of each of 13.8 million examples, with SNRs...
  • SeqDiffuSeq

    The dataset used in the SeqDiffuSeq paper for sequence-to-sequence text generation.