WikiText-2 dataset
The WikiText-2 dataset is a word-level language modeling benchmark for evaluating large language models.
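Language-modeling benchmarks such as WikiText-2 (and the Penn Treebank below) are conventionally scored with perplexity, the exponentiated average negative log-probability per token. A minimal sketch, assuming per-token natural-log probabilities are already available (the helper name is ours):

```python
import math

def perplexity(token_logprobs):
    """Corpus-level perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum_i log p(x_i)), so lower is better and a
    uniform model over V tokens scores exactly V.
    """
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# A model assigning uniform probability 1/4 to each of 4 tokens
# has perplexity exactly 4.
print(round(perplexity([math.log(0.25)] * 4), 6))  # 4.0
```

In practice the log-probabilities would come from the model under evaluation, summed over the benchmark's test split.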
C4 dataset
The paper does not name its dataset explicitly, but it states that the authors trained a GPT-2 transformer language model on the C4 (Colossal Clean Crawled Corpus) dataset.
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, their high computational load and huge model sizes pose a grand challenge for...
Automated discovery of mathematical definitions in text
A dataset for the automated discovery of mathematical definitions in text.
Language Models as Inductive Reasoners
Inductive reasoning is a core component of human intelligence. In past computer-science research on inductive reasoning, formal logic has been used as the representation of...
CoNLL-2016 Shared Task
The CoNLL-2016 Shared Task (CoNLL16) provides richer annotation for shallow discourse parsing.
Penn Discourse Treebank 2.0
The Penn Discourse Treebank 2.0 (PDTB 2.0) is a large-scale corpus of 2,312 Wall Street Journal (WSJ) articles.
Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition
Implicit Discourse Relation Recognition (IDRR), which infers discourse relations without the help of explicit connectives, remains a crucial and challenging task for discourse...
QQP Dataset
The QQP (Quora Question Pairs) dataset contains more than 400k question pairs, each labeled for whether the two questions are semantically equivalent.
Penn Treebank
The Penn Treebank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...
Self-Recognition in Language Models
A self-recognition test for language models using model-generated security questions.
Confidence Calibration in Large Language Models
The dataset used in this study to analyze the self-assessment (confidence calibration) behavior of large language models.
XL-Sum: Large-scale multilingual abstractive summarization
The XL-Sum dataset for large-scale multilingual abstractive summarization, comprising BBC article-summary pairs in 44 languages.
Cross-Lingual Ability of Multilingual BERT
The dataset from the Cross-Lingual Ability of Multilingual BERT study.
Multilingual Language Models
The dataset used in this paper to train and evaluate multilingual language models.
SST-2, SNLI, and PubMed datasets
The dataset used in the paper is a collection of sentence classification tasks, including SST-2 (sentiment), SNLI (natural language inference), and PubMed.