Towards Trustworthy AutoGrading of Short, Multi-lingual, Multi-type Answers

The dataset consists of approximately 10 million question-answer pairs in multiple languages. It covers diverse fields such as math and language, and it shows strong variation in question and answer syntax.

BibTeX: