Fine-tuning Language Models with Advantage-Induced Policy Alignment

The datasets used in the paper are the Anthropic Helpfulness and Harmlessness dataset and the StackExchange dataset.

BibTeX: