GLUE benchmark

The paper does not describe its dataset in detail, but it mentions that the authors used three downstream tasks from the GLUE benchmark: the Stanford Sentiment Treebank (SST-2), Multi-Genre Natural Language Inference (MNLI), and Quora Question Pairs (QQP), a paraphrase-identification task.
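The three tasks differ in input shape and label space. SST-2 classifies a single sentence, while MNLI and QQP classify sentence pairs. The sketch below records those differences in a small registry. It is an illustration, not the authors' setup: the config, column, and label names are assumed to follow the Hugging Face `glue` dataset configs (loadable with `datasets.load_dataset("glue", "sst2")` and similar), and the paper does not say which loader it used. The `describe` helper and its `[SEP]` separator are hypothetical and only render an example for inspection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlueTask:
    name: str          # config name, assumed to match the Hugging Face "glue" configs
    text_fields: tuple # one column for single-sentence tasks, two for sentence-pair tasks
    labels: tuple      # label names, indexed by the integer label id


# The three GLUE tasks named above; field and label names are assumptions
# taken from the Hugging Face `glue` configs.
TASKS = {
    "sst2": GlueTask("sst2", ("sentence",), ("negative", "positive")),
    "mnli": GlueTask("mnli", ("premise", "hypothesis"),
                     ("entailment", "neutral", "contradiction")),
    "qqp": GlueTask("qqp", ("question1", "question2"),
                    ("not_duplicate", "duplicate")),
}


def describe(task_name: str, example: dict) -> str:
    """Hypothetical helper: render one example as 'text [SEP] text -> label'."""
    task = TASKS[task_name]
    text = " [SEP] ".join(example[field] for field in task.text_fields)
    return f"{text} -> {task.labels[example['label']]}"


if __name__ == "__main__":
    print(describe("mnli", {"premise": "A man is sleeping.",
                            "hypothesis": "A person rests.",
                            "label": 0}))
    # prints "A man is sleeping. [SEP] A person rests. -> entailment"
```

Keeping the input columns per task lets one preprocessing path handle both single-sentence and sentence-pair tasks.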

BibTeX: