-
A Joint Model for Definition Extraction with Syntactic Connection and Semantic...
Definition Extraction (DE) is one of the well-known topics in Information Extraction that aims to identify terms and their corresponding definitions in unstructured texts. -
Chimera dataset
The Chimera dataset is the dataset of Lazaridou et al. (2017). It was specifically constructed to simulate a nonce situation where a speaker encounters a... -
TaxiXNLI (translated)
Multilingual extension of the TAXINLI dataset for analyzing the effects of reasoning types on cross-lingual transfer performance. -
TaxiXNLI (diagnostic)
Multilingual extension of the TAXINLI dataset for analyzing the effects of reasoning types on cross-lingual transfer performance. -
Corpus of Linguistic Acceptability (CoLA)
The Corpus of Linguistic Acceptability (CoLA) is a set of 10,657 English sentences drawn from published linguistics literature, each labeled as grammatical or ungrammatical. -
Execution-based Evaluation for NL2Bash
A set of 50 prompts for execution-based evaluation of the NL2Bash task. -
Words2Contact
The Words2Contact dataset contains verbal instructions for humanoid robots to place support contacts. -
Word2Vec: A Novel Semi-Supervised Learning Approach for Word Embeddings
Word2Vec is a technique for learning vector representations of words in a text corpus. -
SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity
SimVerb-3500 is a large-scale evaluation set of verb similarity, providing human ratings for the similarity of 3,500 verb pairs. -
WikiText-2 dataset
The WikiText-2 dataset is a language modeling benchmark built from Wikipedia articles, commonly used to evaluate the performance of large language models. -
C4 dataset
The paper does not explicitly name its dataset, but it states that the authors trained a GPT-2 transformer language model on the C4 dataset. -
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large La...
Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for... -
Automated discovery of mathematical definitions in text
Automated discovery of mathematical definitions in text. -
Language Models as Inductive Reasoners
Inductive reasoning is a core component of human intelligence. In past research on inductive reasoning within computer science, logic language is used as representations of... -
CoNLL-2016 Shared Task
The CoNLL-2016 Shared Task (CoNLL16) provides more abundant annotation for shallow discourse parsing. -
Penn Discourse Treebank 2.0
The Penn Discourse Treebank 2.0 (PDTB 2.0) is a large-scale corpus containing 2,312 Wall Street Journal (WSJ) articles.