- PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models
Instruction-finetuned code language models have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language...
- Anthropic HH dataset
The Anthropic HH dataset is a general-purpose preference dataset for helpfulness and harmlessness.