Training a helpful and harmless assistant with reinforcement learning from human feedback

doi:10.57702/ueb4xymx

The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefitting controllable generation.

BibTeX:

@dataset{Yuntao_Bai_and_Andy_Jones_and_Kamal_Ndousse_and_Amanda_Askell_and_Anna_Chen_and_Nova_DasSarma_and_Dawn_Drain_and_Stanislav_Fort_and_Deep_Ganguli_and_Tom_Henighan_2024,
    abstract = {The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefitting controllable generation.},
    author = {Yuntao Bai and Andy Jones and Kamal Ndousse and Amanda Askell and Anna Chen and Nova DasSarma and Dawn Drain and Stanislav Fort and Deep Ganguli and Tom Henighan},
    doi = {10.57702/ueb4xymx},
    institution = {No Organization},
    keyword = {Assistant, Human Feedback, Reinforcement Learning, controllable generation, human feedback, human-computer interaction, natural language generation, reinforcement learning, single-turn dialogue, summarization},
    month = {dec},
    publisher = {TIB},
    title = {Training a helpful and harmless assistant with reinforcement learning from human feedback},
    url = {https://service.tib.eu/ldmservice/dataset/training-a-helpful-and-harmless-assistant-with-reinforcement-learning-from-human-feedback},
    year = {2024}
}