Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback
Training a helpful and harmless assistant with reinforcement learning from human feedback
The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefiting controllable generation.
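As a rough illustration of this idea, the sketch below freezes a Hugging Face causal language model and optimizes only the embeddings of newly added control tokens. The base model, the token names, and the gradient-masking trick are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): parameter-efficient tuning that
# updates only the embeddings of newly added control tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder base model; the paper's backbone may differ
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical control tokens that steer generation style.
control_tokens = ["<helpful>", "<harmless>"]
tokenizer.add_special_tokens({"additional_special_tokens": control_tokens})
model.resize_token_embeddings(len(tokenizer))

# Freeze every parameter, then re-enable gradients for the input embeddings.
for param in model.parameters():
    param.requires_grad = False
embeddings = model.get_input_embeddings()
embeddings.weight.requires_grad = True

# Mask the embedding gradient so only the control-token rows are updated.
control_ids = tokenizer.convert_tokens_to_ids(control_tokens)

def keep_control_rows(grad):
    mask = torch.zeros_like(grad)
    mask[control_ids] = 1.0
    return grad * mask

embeddings.weight.register_hook(keep_control_rows)

# Training then proceeds as usual over the trainable parameters only.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
```

Because only a handful of embedding rows receive gradients, the number of trainable parameters stays tiny relative to the full model, which is the essence of the parameter-efficient setup described above.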
Toxic-DPO Dataset
The paper uses the Toxic-DPO dataset for reinforcement learning from human feedback.
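As background for how a preference dataset like this is typically consumed, the sketch below shows a generic preference-pair layout and the standard DPO objective it would feed. The field names and the loss formulation are generic assumptions, not the dataset's documented schema or the paper's exact setup.

```python
import torch.nn.functional as F

# Assumed preference-pair layout (illustrative only).
example = {
    "prompt": "a user instruction",
    "chosen": "the response annotators preferred",
    "rejected": "the response annotators judged toxic or unhelpful",
}

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective over per-sequence log-probabilities."""
    margin = beta * (
        (policy_chosen_logps - ref_chosen_logps)
        - (policy_rejected_logps - ref_rejected_logps)
    )
    return -F.logsigmoid(margin).mean()
```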
Anthropic-HH-RLHF Dataset
The paper uses the Anthropic-HH-RLHF dataset for reinforcement learning from human feedback.
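A minimal loading sketch, assuming the public release on the Hugging Face Hub under the id "Anthropic/hh-rlhf" with "chosen"/"rejected" text columns:

```python
from datasets import load_dataset

# Assumed hub id and column names for the public HH-RLHF release.
hh = load_dataset("Anthropic/hh-rlhf", split="train")
pair = hh[0]
print(pair["chosen"][:200])    # human-preferred conversation
print(pair["rejected"][:200])  # dispreferred conversation
```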