Self-Supervised Alignment with Mutual Information

The dataset is used for training a language model to follow behavioral principles without the use of preference labels, demonstrations, or human oversight.

BibTex: