Direct preference optimization: Your language model is secretly a reward model

doi:10.57702/wgpeg5j4

Organization

No Organization

License

No License Provided

This record does not describe the dataset used in the paper. The paper itself proposes fine-tuning a language model directly on human preference data, without training a separate reward model or running a reinforcement learning algorithm.