33 datasets found

Tags: large-scale

  • WebLI

    The dataset used in the paper for subject-driven text-to-image synthesis.
  • DEEP

    Detecting Errors through Ensembling Prompts (DEEP) - an end-to-end large language model framework for detecting factual errors in text summarization.
  • MOSES

    The MOSES dataset is a large-scale molecular dataset containing 1.9 million molecules with up to 30 heavy atoms.
  • GSM8K dataset

    A dataset of grade-school math word problems, used in the paper as a set of problems for testing the safety of artificial general intelligence (AGI) systems.
  • BigEarthNet-MM

    A large-scale benchmark archive for remote sensing image classification and retrieval.
  • Wind Farm Dataset

    The dataset is used to test the HCMAPPO algorithm for large-scale wind farm control. It includes 13, 16, 19, and 22 wind turbines with their coordinates, wind speeds, and...
  • Nordland Railway dataset

    The Nordland Railway dataset is a large-scale driving dataset covering a 728 km train journey from Trondheim to Bodø in Nordland, Norway, recorded four times, once per season.
  • WavCaps

    The WavCaps dataset contains ChatGPT-assisted, weakly labeled audio captioning data.
  • MLS

    MLS: A large-scale multilingual dataset for speech research.
  • People’s Speech

    The People’s Speech: A large-scale diverse English speech recognition dataset for commercial usage.
  • YTF

    Used for face recognition and person re-identification with paired image-attribute data, where the attributes (i.e., soft biometrics) are available only during the training phase.
  • VGGSound

    The VGGSound dataset is a large-scale audio-visual dataset containing 10,000 10-second video clips with corresponding audio files.
  • DataComp-1B

    DataComp-1B is a large-scale dataset for training next-generation image-text models.
  • Webvid10M

    The dataset used for training the image-to-video model consists of LAION COCO 600M and Webvid10M.
  • Webvid-10M

    The dataset used for training the video model consists of Webvid-10M, a large-scale dataset of short videos with textual descriptions.
  • LAION COCO 600M

    The dataset used for training the text-to-video model consists of 20 million videos and 600 million images.
  • DOTA-v2.0

    A large-scale dataset for object detection in aerial images, containing 11,268 images and 1,793,658 objects.
  • VoxCeleb: A Large-Scale Speaker Identification Dataset

    A large-scale speaker identification dataset.
  • Criminal

    A large-scale dataset for charge prediction, consisting of roughly 500,000 legal cases.
  • LAION-Aesthetic

    LAION-Aesthetic is a large-scale image dataset.
You can also access this registry using the API (see API Docs).
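
As a rough illustration of what tag-based API access might look like, the sketch below builds a search URL and reads dataset titles from a response. It assumes the registry exposes a CKAN-style `package_search` action, which this page does not confirm. The base URL, the helper names `build_tag_search_url` and `dataset_titles`, and the sample response are all hypothetical, so check the API Docs for the actual endpoints before relying on any of it.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the page does not state the registry's address.
BASE_URL = "https://registry.example.org"


def build_tag_search_url(tag: str, rows: int = 33, base_url: str = BASE_URL) -> str:
    """Build a search URL filtered by one tag (assumes a CKAN-style API)."""
    query = urlencode({"fq": f'tags:"{tag}"', "rows": rows})
    return f"{base_url}/api/3/action/package_search?{query}"


def dataset_titles(response: dict) -> list[str]:
    """Pull dataset titles out of a CKAN-style search response (assumed shape)."""
    if not response.get("success"):
        raise ValueError("registry reported a failed request")
    return [item.get("title") or item["name"] for item in response["result"]["results"]]


# Hand-written sample in the assumed response shape, not real API output.
sample = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "webli", "title": "WebLI"},
            {"name": "moses", "title": "MOSES"},
        ],
    },
}

url = build_tag_search_url("large-scale")
titles = dataset_titles(sample)
```

Keeping URL construction and response parsing in separate functions means the parsing can be checked offline against a saved response before any request is sent.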