Negative preference optimization: From catastrophic collapse to effective unlearning

A dataset for testing the effectiveness of unlearning methods in large language models.

BibTeX: