Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback
Training a helpful and harmless assistant with reinforcement learning from human feedback
The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefiting controllable generation.
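As a rough illustration of this idea, the sketch below freezes a Hugging Face causal language model and optimizes only the embeddings of newly added control tokens. The base model, the token names, and the gradient-masking trick are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): parameter-efficient tuning that
# updates only the embeddings of newly added control tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder base model; the paper's backbone may differ
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical control tokens that steer generation style.
control_tokens = ["<helpful>", "<harmless>"]
tokenizer.add_special_tokens({"additional_special_tokens": control_tokens})
model.resize_token_embeddings(len(tokenizer))

# Freeze every parameter, then re-enable gradients for the input embeddings.
for param in model.parameters():
    param.requires_grad = False
embeddings = model.get_input_embeddings()
embeddings.weight.requires_grad = True

# Mask the embedding gradient so only the control-token rows are updated.
control_ids = tokenizer.convert_tokens_to_ids(control_tokens)

def keep_control_rows(grad):
    mask = torch.zeros_like(grad)
    mask[control_ids] = 1.0
    return grad * mask

embeddings.weight.register_hook(keep_control_rows)

# Training then proceeds as usual over the trainable parameters only.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

Because only a handful of embedding rows receive gradients, the number of trainable parameters stays tiny relative to the full model, which is the essence of the parameter-efficient setup described above.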
Toxic-DPO Dataset
The paper uses the Toxic-DPO dataset for reinforcement learning from human feedback.
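As background for how a preference dataset like this is typically consumed, the sketch below shows a generic preference-pair layout and the standard DPO objective it would feed. The field names and the loss formulation are generic assumptions, not the dataset's documented schema or the paper's exact setup.

```python
import torch.nn.functional as F

# Assumed preference-pair layout (illustrative only).
example = {
    "prompt": "a user instruction",
    "chosen": "the response annotators preferred",
    "rejected": "the response annotators judged toxic or unhelpful",
}

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective over per-sequence log-probabilities."""
    margin = beta * (
        (policy_chosen_logps - ref_chosen_logps)
        - (policy_rejected_logps - ref_rejected_logps)
    )
    return -F.logsigmoid(margin).mean()
```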
Anthropic-HH-RLHF Dataset
The paper uses the Anthropic-HH-RLHF dataset for reinforcement learning from human feedback.
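A minimal loading sketch, assuming the public release on the Hugging Face Hub under the id "Anthropic/hh-rlhf" with "chosen"/"rejected" text columns:

```python
from datasets import load_dataset

# Assumed hub id and column names for the public HH-RLHF release.
hh = load_dataset("Anthropic/hh-rlhf", split="train")
pair = hh[0]
print(pair["chosen"][:200])    # human-preferred conversation
print(pair["rejected"][:200])  # dispreferred conversation
```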