Original Metadata
The JSON representation of the dataset and its distributions, based on DCAT.
Cite this as
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D Manning, Chelsea Finn (2024). Dataset: Direct preference optimization: Your language model is secretly a reward model. Resource: Original Metadata. https://doi.org/10.57702/wgpeg5j4
DOI retrieved: December 2, 2024
Additional Information
Field | Value |
---|---|
Created | unknown |
Last updated | December 2, 2024 |
Format | application/json |