- WebWISE: Web Interface Control and Sequential Exploration with Large Language Models
  The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks through click, scroll, and text-input operations.
- Learning to summarize with human feedback
  The paper trains summarization models from human preference data, using reinforcement learning from human feedback to steer LLMs toward desirable but non-differentiable attributes such as summary quality.
- MACHIAVELLI Benchmark
  A dataset of traces from the MACHIAVELLI environment, including API calls and their outcomes.
- BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards
  A structured collection of tests for input-output safeguards, including established failure tests, emerging failure tests, and next-gen architecture tests.
- Synthetic Workload for LLM Serving
  The dataset is a synthetic serving workload in which clients send requests with varying input and output lengths at varying request rates (a minimal generator sketch appears after this list).
- LLM Ethics Dataset
  The dataset is used in this study to explore the ethical issues surrounding Large Language Models (LLMs).
- WikiText-2 dataset
  WikiText-2 is a language-modeling benchmark, commonly used to evaluate large language models via perplexity.
- C4 dataset
  The paper does not describe its dataset explicitly, but the authors state that they trained a GPT-2 transformer language model on the C4 dataset.
- APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
  Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for...
- Confidence Calibration in Large Language Models
  The dataset is used in this study to analyze the self-assessment behavior of large language models.
- Moral Foundations Questionnaire
  This dataset is used to study the moral profiles of large language models.
- Ethical Dilemmas for Large Language Models
  This dataset is used to assess the moral reasoning capabilities of large language models.
- Llama: Open and efficient foundation language models
  LLaMA is the collection of open and efficient foundation language models used in the paper.
- Chatbot Arena
  A large-scale crowdsourced benchmark that evaluates LLMs through pairwise human preference votes, used in the paper for evaluation.
- Arena-Hard
  A benchmark of challenging user prompts drawn from Chatbot Arena, used in the paper to evaluate LLMs.
- LMSYS ChatBot Arena
  A large-scale dataset of real-world LLM conversations collected on the LMSYS Chatbot Arena platform, used in the paper for training and evaluation.
- WizardArena
  A large-scale conversational dataset used to train and evaluate the WizardLM-β model.
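
For the "Synthetic Workload for LLM Serving" entry above, the following is a minimal sketch of how such a workload could be generated: Poisson request arrivals at a chosen rate, with independently sampled input and output token lengths. The rate and length distributions are illustrative assumptions, not the paper's actual settings.

```python
import random

def generate_workload(num_requests=100, rate_rps=2.0, seed=0):
    """Return (arrival_time_s, input_len, output_len) tuples for a synthetic
    LLM-serving workload. Arrivals follow a Poisson process at `rate_rps`
    requests/second; token lengths are sampled uniformly (assumed ranges)."""
    rng = random.Random(seed)
    workload, t = [], 0.0
    for _ in range(num_requests):
        t += rng.expovariate(rate_rps)       # exponential inter-arrival gap
        input_len = rng.randint(32, 1024)    # prompt length in tokens (assumed range)
        output_len = rng.randint(16, 512)    # generation length in tokens (assumed range)
        workload.append((t, input_len, output_len))
    return workload

if __name__ == "__main__":
    for arrival, n_in, n_out in generate_workload(num_requests=5):
        print(f"t={arrival:6.2f}s  input={n_in:4d} tok  output={n_out:3d} tok")
```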