416 datasets found

  • Contextualized Sequence Likelihood

    The authors used several question-answering datasets, including CoQA, TriviaQA, and Natural Questions.
  • SST-2

    The dataset is used for experiments across ten models, ranging from bag-of-words models to pre-trained transformers; the authors find that a model having higher AUC does not necessarily...
  • FUNSD dataset

    FUNSD (Form Understanding in Noisy Scanned Documents) is a dataset of annotated scanned forms for form understanding tasks such as entity labeling and entity linking.
  • CORD dataset

    CORD (Consolidated Receipt Dataset) is a dataset of receipt images annotated for post-OCR parsing.
  • Neural Collaborative Filtering

    The dataset is used for neural collaborative filtering, which is a type of collaborative filtering that uses neural networks to learn the relationships between users and items.
  • MS MARCO: A Human-Generated Machine Reading Comprehension Dataset

    The dataset is used for training and evaluating machine reading comprehension and question answering models.
  • VQAv2

    Visual Question Answering (VQA) has achieved great success thanks to the fast development of deep neural networks (DNN). On the other hand, data augmentation, as one of the...
  • IMDB-RLHF-Pair dataset

    The IMDB-RLHF-Pair dataset is generated by IMDB, and responses with positive sentiment are preferred.
  • Stack-Exchange-Paired dataset

    The Stack-Exchange-Paired dataset contains questions and answers from the Stack Overflow dataset, where answers with more votes are preferred.
  • FAQ dataset

    The dataset used for FAQ sentence labeling.
  • XQuAD

    The XQuAD dataset is a multilingual question answering dataset.
  • TyDi QA

    Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there...
  • Wizard of Wikipedia

    Wizard of Wikipedia is a recent, large-scale dataset of multi-turn knowledge-grounded dialogues between an “apprentice” and a “wizard”, who has access to information from...
  • Synthetic Data

    The dataset used in the paper is a synthetic dataset for off-policy contextual bandits, with contexts x ∈ X, a finite set of actions A, and bounded real rewards r ∈ A → [0, 1].
  • Visual Dialog

    Visual dialog is a multi-round extension for VQA. The interactions between the image and multi-round question-answer pairs (history) are progressively changing, and the...
  • Context-Aware Graph for Visual Dialog

    Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts. This task can refer to the relation...
  • CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding

    Legal case retrieval is a critical process for modern legal information systems. This paper proposes CaseEncoder, a pre-trained encoder that utilizes fine-grained legal...
  • StackOverflow

    The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
  • Generalized Category Discovery with Decoupled Prototypical Network

    Generalized Category Discovery (GCD) aims to recognize both known and novel categories from a set of unlabeled data, based on another dataset labeled with only known categories.
  • MathQA

    MathQA is an English mathematical problems dataset at GRE level. The original MathQA dataset is annotated in a different way from Math23k with many pre-defined operations.