Do Models Explain Themselves?

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

BibTeX: