Dataset - LDM

Contextualized Sequence Likelihood

The authors used several question-answering datasets, including CoQA, TriviaQA, and Natural Questions.
- Dataset
- JSON
SST-2

The dataset used for the experiments across ten models, ranging from bag-of-words models to pre-trained transformers, and find that a model having higher AUC does not necessarily...
- Dataset
- JSON
FUNSD dataset

FUNSD (Form Understanding in Noisy Scanned Documents) is a dataset of annotated scanned forms used for form understanding tasks.
- Dataset
- JSON
CORD dataset

CORD (Consolidated Receipt Dataset) is a dataset of annotated receipt images used for post-OCR parsing.
- Dataset
- JSON
Neural Collaborative Filtering

The dataset is used for neural collaborative filtering, which is a type of collaborative filtering that uses neural networks to learn the relationships between users and items.
- Dataset
- JSON
MS MARCO: A Human-Generated Machine Reading Comprehension Dataset

MS MARCO is a human-generated machine reading comprehension dataset used for training and evaluating question answering models.
- Dataset
- JSON
VQAv2

Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, the data augmentation, as one of the...
- Dataset
- JSON
IMDB-RLHF-Pair dataset

The IMDB-RLHF-Pair dataset is generated from IMDB, and responses with positive sentiment are preferred.
- Dataset
- JSON
Stack-Exchange-Paired dataset

The Stack-Exchange-Paired dataset contains questions and answers from the Stack Overflow dataset, where answers with more votes are preferred.
- Dataset
- JSON
FAQ dataset

A dataset used for FAQ sentence labeling.
- Dataset
- JSON
XQuAD

The XQuAD dataset is a multilingual question answering dataset.
- Dataset
- JSON
TyDi QA

Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there...
- Dataset
- JSON
Wizard of Wikipedia

Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, who has access to information from...
- Dataset
- JSON
Synthetic Data

The dataset used in the paper is a synthetic dataset for off-policy contextual bandits, with contexts x ∈ X, a finite set of actions A, and bounded real rewards r : A → [0, 1].
- Dataset
- JSON
Visual Dialog

Visual dialog is a multi-round extension for VQA. The interactions between the image and multi-round question-answer pairs (history) are progressively changing, and the...
- Dataset
- JSON
Context-Aware Graph for Visual Dialog

Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts. This task can refer to the relation...
- Dataset
- JSON
CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding

Legal case retrieval is a critical process for modern legal information systems. This paper proposes CaseEncoder, a pre-trained encoder that utilizes fine-grained legal...
- Dataset
- JSON
StackOverflow

The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
- Dataset
- JSON
Generalized Category Discovery with Decoupled Prototypical Network

Generalized Category Discovery (GCD) aims to recognize both known and novel categories from a set of unlabeled data, based on another dataset labeled with only known categories.
- Dataset
- JSON
MathQA

MathQA is an English mathematical problems dataset at GRE level. The original MathQA dataset is annotated in a different way from Math23k with many pre-defined operations.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

416 datasets found