Representer Theorems for Metric and Preference Learning
This dataset is used to demonstrate representer theorems for two problems: simultaneous metric and preference learning from paired comparisons, and metric learning from triplet comparisons.
PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models
Instruction-finetuned code language models have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language...
Anthropic HH dataset
The Anthropic HH dataset is a general-purpose preference dataset for helpfulness and harmlessness.