-
Orca: Progressive Learning from Complex Explanation Traces
Orca uses explanation tuning: a smaller model is trained on detailed, step-by-step explanation traces generated by a large language model. -
Evol-Instruct: A Pipeline for Automatically Evolving Instruction Datasets
The Evol-Instruct pipeline uses large language models to automatically rewrite existing instructions into more complex and more varied ones, yielding an evolved instruction dataset. -
LaMini: A Large-Scale Instruction Dataset
The LaMini approach involves generating a large-scale instruction dataset by leveraging the outputs of a large language model, gpt-3.5-turbo. -
Various Datasets
The datasets used in the paper are described as follows: WikiMIA, BookMIA, Temporal Wiki, Temporal arXiv, ArXiv-1 month, Multi-Webdata, LAION-MI, Gutenberg. -
Question Classification using Convolutional Neural Networks
Question classification using Convolutional Neural Networks -
Penn Treebank dataset
The dataset used in the paper is the Penn Treebank, a corpus of English text annotated with part-of-speech tags and syntactic parse trees, widely used as a benchmark for tagging, parsing, and language modeling. -
Keyphrase generation with fine-grained evaluation-guided reinforcement learning
A dataset for keyphrase generation with fine-grained evaluation-guided reinforcement learning. -
Unified language model pre-training for natural language understanding and ge...
A unified pre-trained language model for natural language understanding and generation. -
Neural keyphrase generation via reinforcement learning with adaptive rewards
A dataset for neural keyphrase generation. -
Select, extract and generate: Neural keyphrase generation with layer-wise cov...
A dataset for neural keyphrase generation with layer-wise coverage attention. -
KPEVAL: Towards Fine-Grained Semantic-Based Keyphrase Evaluation
A comprehensive evaluation framework for keyphrase systems, including reference agreement, faithfulness, diversity, and utility. -
DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
We propose DisCo-CLIP, a distributed, memory-efficient CLIP training approach, to reduce the memory consumption of contrastive loss when training contrastive learning models. -
Customer Service Calls Dataset
A dataset consisting of ten years of customer service calls to a fleet truck company. -
Ubuntu Dialogue Corpus
The Ubuntu Dialogue Corpus is the largest freely available multi-turn dialogue corpus, consisting of almost one million two-way conversations extracted from the Ubuntu... -
Visual Genome
The Visual Genome dataset is a large-scale image dataset for visual question answering and scene understanding, containing over 100K images, each densely annotated with objects, attributes, and relationships. -
Interpreting Learned Feedback Patterns in Large Language Models
The paper does not explicitly describe its dataset, but the authors use a condensed representation of LLM activations obtained from sparse...