Fine-tuning Language Models with Advantage-Induced Policy Alignment

doi:10.57702/3oqqdleq

The paper uses two datasets: the Anthropic Helpfulness and Harmlessness dataset and the StackExchange dataset.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Michael I. Jordan, Jiantao Jiao (2024). Dataset: Fine-tuning Language Models with Advantage-Induced Policy Alignment. https://doi.org/10.57702/3oqqdleq

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Author	Banghua Zhu
More Authors	Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Michael I. Jordan, Jiantao Jiao
Homepage	https://huggingface.co/datasets/lvwerra/stack-exchange-paired