Dataset - LDM

AdvBench dataset

The dataset used for the experiments in the paper, consisting of 60 harmful instructions from the AdvBench dataset.
- Dataset
- JSON
DeepTagRec: A Tag Recommendation Framework

A content-cum-user based deep learning framework for tag recommendation that takes advantage of the content of the question text and is further enhanced by the rich...
- Dataset
- JSON
Question Answering

The task is to predict whether the number of edges assigned x is greater than the number of edges assigned y.
- Dataset
- JSON
Jeopardy Questions

The authors used the Jeopardy questions dataset for their experiments.
- Dataset
- JSON
CountryQA

The authors used the CountryQA dataset for their experiments.
- Dataset
- JSON
Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence A...

The authors used a variety of datasets for question answering, including TriviaQA, Natural Questions, CountryQA, and Jeopardy questions.
- Dataset
- JSON
WMDP

The dataset used in the paper is a benchmark contamination detection dataset, which contains questions and answers from various benchmarks.
- Dataset
- JSON
Mistral-7B-Instruct-v0.2

The dataset used in the paper is a benchmark contamination detection dataset, which contains questions and answers from various benchmarks.
- Dataset
- JSON
BIG-Bench Hard

The BIG-Bench Hard dataset is derived from the original BIG-Bench evaluation suite, focusing on tasks that pose challenges to existing language models.
- Dataset
- JSON
Ruozhiba dataset

The Ruozhiba dataset is a distinctive Chinese natural language processing dataset originating from the Ruozhiba community on Baidu Tieba, a Chinese online forum where members...
- Dataset
- JSON
Leveraging QA Datasets to Improve Generative Data Augmentation

The paper proposes a method to leverage QA datasets for training generative language models to be context generators for a given question and answer.
- Dataset
- JSON
POPQA

The dataset used in the paper is POPQA, an entity-centric open-domain question answering dataset.
- Dataset
- JSON
VQA-CP

The VQA-CP dataset is a split of the VQA dataset, designed to test generalization skills across changes in the answer distribution between the training and the test sets.
- Dataset
- JSON
Diverse and Specific Clarification Question Generation with Keywords

Product descriptions on e-commerce websites often suffer from missing important aspects. Clarification question generation (CQ-Gen) can be a promising approach to help alleviate...
- Dataset
- JSON
Proprietary Large-Scale Industry Dataset

The dataset used for the proposed Joint Multi-Domain Learning for Automatic Short Answer Grading.
- Dataset
- JSON
Improving Question Generation With to the Point Context

Question generation (QG) is the task of generating a question from a reference sentence and a specified answer within the sentence. A major challenge in QG is to identify...
- Dataset
- JSON
Mining Clues from Incomplete Utterance: A Query-enhanced Network for Incomple...

Incomplete utterance rewriting has recently raised wide attention. However, previous works do not consider the semantic structural information between incomplete utterance and...
- Dataset
- JSON
A large annotated corpus for learning natural language inference

A large annotated corpus for learning natural language inference
- Dataset
- JSON
PROPSEGMENT

The PROPSEGMENT dataset is a large-scale corpus for proposition-level segmentation and entailment recognition.
- Dataset
- JSON
Sub-sentence encoder

The sub-sentence encoder is a contrastive learning framework for learning contextual embeddings for semantic units on the sub-sentence level.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
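
As a rough illustration of API access, the sketch below assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint. That is an assumption, not something this page confirms, so check the API Docs first. The base URL `https://example.org` is a placeholder, the helper names `build_search_url` and `summarize` are hypothetical, and the response in the usage lines is a hand-written sample, not real registry output.

```python
from urllib.parse import urlencode

# Placeholder base URL (assumption); replace with the registry's real address.
BASE_URL = "https://example.org"


def build_search_url(query, rows=20, start=0, base_url=BASE_URL):
    """Build a CKAN-style package_search URL for a free-text query.

    `rows` is the page size and `start` the offset, so a long result
    list can be paged through.
    """
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url}/api/3/action/package_search?{params}"


def summarize(response):
    """Return (total_count, [titles]) from a decoded package_search response.

    Assumes the usual CKAN envelope:
    {"success": bool, "result": {"count": int, "results": [dataset, ...]}}.
    """
    if not response.get("success"):
        raise ValueError("API call reported failure")
    result = response["result"]
    # Fall back to the dataset's slug when no human-readable title is set.
    titles = [d.get("title") or d.get("name") for d in result["results"]]
    return result["count"], titles


# Usage with a hand-written sample response (illustrative only).
sample = {
    "success": True,
    "result": {"count": 1, "results": [{"name": "popqa", "title": "POPQA"}]},
}
url = build_search_url("question answering", rows=5)
count, titles = summarize(sample)
```

To query a live instance, the URL from `build_search_url` could be fetched with `urllib.request.urlopen` and decoded with `json.load` before being passed to `summarize`.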

416 datasets found