Activity Stream
-
admin updated the dataset Direct preference optimization: Your language model is secretly a reward model
2 weeks ago | View this version | Changes -
admin created the dataset Direct preference optimization: Your language model is secretly a reward model
2 weeks ago | View this version