-
Contextualized Sequence Likelihood
The authors used several question-answering datasets, including CoQA, TriviaQA, and Natural Questions. -
FUNSD dataset
FUNSD (Form Understanding in Noisy Scanned Documents) is a dataset of annotated scanned forms for form understanding tasks. -
CORD dataset
CORD (Consolidated Receipt Dataset) is a dataset of annotated receipt images for post-OCR parsing. -
Neural Collaborative Filtering
The dataset is used for neural collaborative filtering, which is a type of collaborative filtering that uses neural networks to learn the relationships between users and items. -
MS MARCO: A Human-Generated Machine Reading Comprehension Dataset
MS MARCO is a human-generated machine reading comprehension dataset used for training and evaluating question answering models. -
IMDB-RLHF-Pair dataset
The IMDB-RLHF-Pair dataset is generated from the IMDB dataset, and responses with positive sentiment are preferred. -
Stack-Exchange-Paired dataset
The Stack-Exchange-Paired dataset contains questions and answers from the Stack Overflow dataset, where answers with more votes are preferred. -
FAQ dataset
A dataset used for FAQ sentence labeling. -
Wizard of Wikipedia
Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, who has access to information from... -
Synthetic Data
The dataset used in the paper is a synthetic dataset for off-policy contextual bandits, with contexts x ∈ X, a finite set of actions A, and bounded real rewards r : A → [0, 1]. -
Visual Dialog
Visual dialog is a multi-round extension for VQA. The interactions between the image and multi-round question-answer pairs (history) are progressively changing, and the... -
Context-Aware Graph for Visual Dialog
Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts. This task can refer to the relation... -
CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding
Legal case retrieval is a critical process for modern legal information systems. This paper proposes CaseEncoder, a pre-trained encoder that utilizes fine-grained legal... -
StackOverflow
The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models. -
Generalized Category Discovery with Decoupled Prototypical Network
Generalized Category Discovery (GCD) aims to recognize both known and novel categories from a set of unlabeled data, based on another dataset labeled with only known categories.