Dataset - LDM

Quasar-T

Open-domain question answering (QA) is a key challenge in natural language processing. A successful open-domain QA system must be able to effectively retrieve and comprehend one...
- Dataset
- JSON
Quora Question Pairs

The Quora Question Pairs dataset contains 404k English question pairs from Quora, created to test the ability of models to understand the semantics of text and determine...
- Dataset
- JSON
SQuAD 2.0

The SQuAD 2.0 dataset poses a challenging natural language processing task that requires a machine to read, understand, and answer questions about a text. The dataset...
- Dataset
- JSON
MSVD

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...
- Dataset
- JSON
CLEVR

CLEVR images contain objects characterized by a set of attributes (shape, color, size and material). The questions are grouped into 5 categories: Exist, Count, CompareInteger,...
- Dataset
- JSON
TREC

The dataset used for sentiment analysis, question type classification, and subjectivity classification tasks.
- Dataset
- JSON
DocVQA and ChartQA Datasets

The datasets used for testing the Vary-base model: DocVQA and ChartQA.
- Dataset
- JSON
Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset, containing about 108,000 images, each with 15-30 annotated entities, attributes, and relationships.
- Dataset
- JSON
SimpleQuestion dataset for adaptive learning

The dataset used in this paper is a collection of questions and answers related to adaptive learning and generative AI.
- Dataset
- JSON
SimpleQuestion dataset for Wikidata

The dataset used in this paper is a reinforcement learning dataset, specifically the SimpleQuestion dataset, which contains questions answerable using Wikidata as the knowledge...
- Dataset
- JSON
Seq2SQL

Seq2SQL: Generating structured queries from natural language using reinforcement learning.
- Dataset
- JSON
WikiTableQuestions

Semantic parsing maps a user-issued natural language (NL) utterance to a machine-executable meaning representation (MR), such as λ-calculus (Zettlemoyer and Collins, 2005), SQL...
- Dataset
- JSON
AlpacaFarm

The AlpacaFarm dataset is a large-scale dataset for preference optimization, which consists of a set of instructions and their corresponding responses.
- Dataset
- JSON
OBQA

The dataset used in the paper to evaluate the REFLEX system, consisting of open-book question answering tasks.
- Dataset
- JSON
EntailmentBank

The dataset used in the paper to evaluate the REFLEX system, consisting of multiple-choice questions with entailment relationships.
- Dataset
- JSON
SentEval

SentEval is a toolkit for evaluating the quality of sentence embeddings.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
