-
LLM dataset
The paper does not explicitly describe this dataset; it states only that it is a large language model (LLM) dataset that the authors used to train and evaluate their... -
MMLU dataset
The dataset used in the paper is the Massive Multitask Language Understanding (MMLU) dataset, which consists of 57 tasks from science, technology, engineering, and math (STEM),... -
Cross-topic Argument Mining from Heterogeneous Sources
Cross-topic Argument Mining from Heterogeneous Sources. -
Few-Shot Stance Detection via Target-Aware Prompt Distillation
Stance detection aims to identify whether the author of a text is in favor of, against, or neutral toward a given target. The main challenge of this task is two-fold: few-shot... -
Proprietary Large-Scale Industry Dataset
The dataset used for the proposed Joint Multi-Domain Learning approach to Automatic Short Answer Grading. -
KALM Dataset
The dataset used in the KALM paper. -
TREC dataset
The dataset used in the paper is the TREC dataset, which consists of 124 queries. -
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping
Self-alignment is an effective way to reduce the cost of human annotation while ensuring promising model capability. This objective can be achieved from three aspects: (i) high... -
Wikipedia Neutrality Corpus
This dataset is used to test the ability of large language models to detect and correct biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. -
Conditional Generative Matching Model for Multi-lingual Reply Suggestion
A Conditional Generative Matching Model for Multi-lingual Reply Suggestion -
Training Language Models to Perform Tasks
A dataset for training language models to perform tasks such as question answering and text classification.