Dataset - LDM

Gossipcop

The Gossipcop dataset is an English full-length news article dataset collected from the entertainment domain of the FakeNewsNet repository.
- Dataset
- JSON
Tedlium3

Tedlium3: A large-scale English speech corpus for speaker adaptation.
- Dataset
- JSON
VoxCeleb dataset

The VoxCeleb dataset is a large-scale audio-visual speaker identification dataset; here it is used to evaluate the performance of face recognition systems.
- Dataset
- JSON
LitBank

A dataset of English literary texts annotated for coreference resolution.
- Dataset
- JSON
TIMIT dataset

The dataset used in this paper illustrates phonetically and phonologically local allophonic distributions in English, where voiceless stops surface as aspirated...
- Dataset
- JSON
KFTT datasets

The KFTT (Kyoto Free Translation Task) English↔Japanese translation datasets.
- Dataset
- JSON
NIST 2003 (MT03), NIST 2004 (MT04), NIST 2005 (MT05), NIST 2006 (MT06) datasets

Used for Chinese↔English translation tasks, alongside the KFTT English↔Japanese translation datasets.
- Dataset
- JSON
WIT corpus, SETimes corpus, newsdev2016, newstest2016, and newstest2017

The datasets used in the paper are the WIT corpus, the SETimes corpus, newsdev2016, newstest2016, and newstest2017.
- Dataset
- JSON
Turkish-English and Uyghur-Chinese machine translation tasks

The paper evaluates on Turkish-English and Uyghur-Chinese machine translation tasks.
- Dataset
- JSON
BBC News

A corpus of BBC News articles, used in a paper on multi-objective Bayesian optimization for hyperparameter transfer in topic models.
- Dataset
- JSON
IWSLT 2014

The IWSLT 2014 German-to-English dataset is a machine translation dataset containing 153K sentence pairs.
- Dataset
- JSON
English Test Set

The English test set is used for evaluating the performance of the proposed system.
- Dataset
- JSON
RAVDESS

The RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset contains recordings of 24 professional actors (12 female, 12 male), chosen to provide good-quality performances and...
- Dataset
- JSON
SAVEE

The SAVEE dataset contains 480 acted English utterances recorded by four male actors, covering seven emotion categories: anger, fear, disgust, happiness, neutral, sadness,...
- Dataset
- JSON
MSVD

MSVD (Microsoft Research Video Description Corpus) is used as a text-video retrieval benchmark. Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods perform image-to-video transfer learning...
- Dataset
- JSON
WordNet

WordNet is a lexical database of English. In this paper, subjects and objects of verbs are extracted from a large text corpus and represented as abstract concepts.
- Dataset
- JSON
LibriSpeech dataset

The paper uses the LibriSpeech dataset, which contains about 1,000 hours of read English speech derived from audiobooks.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

37 datasets found