Gemma: Open Models Based on Gemini Research and Technology
Gemma is a family of lightweight, open-weights language models built from the research and technology used to create the Gemini models.
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2 is a collection of pretrained and fine-tuned large language models, ranging from 7B to 70B parameters, including the dialogue-optimized Llama 2-Chat models.
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
WizardCoder fine-tunes code large language models on instruction data evolved with the Evol-Instruct method, improving performance on code-generation benchmarks.
LiveCodeBench
LiveCodeBench is a contamination-aware benchmark that continuously collects new problems from coding competitions to evaluate large language models (LLMs) on code generation, self-repair, code execution, and test output prediction.
Program Synthesis with Large Language Models
Evaluates large language models on synthesizing programs from natural-language descriptions, introducing the MBPP and MathQA-Python benchmarks.
Buffer of Thoughts
Buffer of Thoughts is a novel and versatile thought-augmented reasoning approach for enhancing the accuracy, efficiency, and robustness of large language models (LLMs).
Towards Expert-Level Medical Question Answering with Large Language Models
Introduces Med-PaLM 2, a large language model for medical question answering, evaluated on MedQA and the other MultiMedQA benchmarks.
ALCUNA: Large Language Models Meet New Knowledge
ALCUNA is a benchmark for evaluating the ability of large language models (LLMs) to handle new knowledge.
Mistral-7B-Instruct-v0.2
The dataset used in the paper is a benchmark-contamination detection dataset containing questions and answers drawn from various benchmarks.
Conceptual Inconsistencies in Large Language Models
The dataset consists of 119 clusters totaling 584 questions; each query appears in 4 different linguistic forms, yielding approximately 146 semantically different...
Retrieval-Augmented Generation for Large Language Models: A Survey
A survey of retrieval-augmented generation (RAG) for large language models, reviewing retrieval, generation, and augmentation techniques and their evaluation.
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation
DAIL-SQL is a prompt-based text-to-SQL method that selects few-shot examples using masked question similarity.
CYBERSECEVAL 2
A wide-ranging cybersecurity evaluation suite for large language models.
AstroMLab 1: Who Wins Astronomy Jeopardy!?
A comprehensive evaluation of proprietary and open-weights large language models using the first astronomy-specific benchmarking dataset.
Reducing Retraining by Recycling Parameter-Efficient Prompts
Parameter-efficient methods are able to use a single frozen pre-trained large language model to perform many tasks by learning task-specific soft prompts that modulate model...