-
Lunar Lander
The dataset is a collection of data points from a lunar lander, used to test the proposed APG algorithm for task switching.
-
Training a helpful and harmless assistant with reinforcement learning from hu...
The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefiting controllable generation.