Reward Model Ensembles

The authors used three datasets: TL;DR, HELPFULNESS, and XSUM/NLI.

BibTeX: