-
WikiText-2 dataset
The WikiText-2 dataset is a benchmark for evaluating the performance of large language models. -
C4 dataset
The paper does not explicitly name its dataset, but it states that the authors trained a GPT-2 transformer language model on the C4 dataset. -
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large La...
Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for... -
Confidence Calibration in Large Language Models
This dataset is used in the study to analyze the self-assessment behavior of large language models. -
Moral Foundations Questionnaire
This dataset is used to study the moral profiles of large language models. -
Ethical Dilemmas for Large Language Models
This dataset is used to assess the moral reasoning capabilities of large language models. -
Llama: Open and efficient foundation language models
LLaMA is a family of foundation language models rather than a dataset; this entry refers to the data used in the paper. -
Chatbot Arena
The dataset used in this paper is a large-scale dataset for evaluating LLMs, which is used to train and evaluate the Chatbot Arena model. -
Arena-Hard
The dataset used in this paper is a large-scale dataset for evaluating LLMs, which is used to train and evaluate the Arena-Hard model. -
LMSYS ChatBot Arena
The dataset used in this paper is a large-scale real-world LLM conversation dataset, which is used to train and evaluate the LMSYS ChatBot Arena model. -
WizardArena
The dataset used in this paper is a large-scale conversational dataset, which is used to train and evaluate the WizardLM-β model. -
OPT-66B and Llama2-70B
This entry names two large language models used in the paper, OPT-66B and Llama2-70B, rather than a dataset. -
OpenAssistant Conversations– Democratizing Large Language Model Alignment
-
Sparse Watermarking in LLMs with Enhanced Text Quality
The paper does not explicitly describe its dataset, but it states that the authors used the ELI5, FinanceQA, MultiNews, and QMSum datasets. -
Inducing Anxiety in Large Language Models Increases Exploration and Bias
This dataset contains anxiety-inducing scenarios for large language models. -
How do large language models capture the ever-changing world knowledge?
This paper presents a review of recent advances in large language models' ability to capture ever-changing world knowledge. -
Dr.E: A Graph Language Translator
Significant efforts have been dedicated to integrating the powerful Large Language Models (LLMs) with diverse modality, particularly focusing on the fusion of language, vision... -
Forbidden Question Dataset
This dataset is used to evaluate the effectiveness of different jailbreak attack methods against LLMs. It contains 160 highly diverse forbidden questions. -
Jailbreak Attack Dataset
This dataset is used in the paper to evaluate the effectiveness of different jailbreak attack methods against large language models (LLMs). -
Xiezhi Benchmark
Xiezhi comprises 249,587 multiple-choice questions across 516 diverse disciplines spanning 13 different subjects, accompanied by Xiezhi-Specialty with...