Natural Language Processing - Groups

A Joint Model for Definition Extraction with Syntactic Connection and Semantic...

Definition Extraction (DE) is one of the well-known topics in Information Extraction that aims to identify terms and their corresponding definitions in unstructured texts.

Dataset
JSON

Chimera dataset

The Chimera dataset was introduced by Lazaridou et al. (2017). This dataset was specifically constructed to simulate a nonce situation where a speaker encounters a...

Dataset
JSON

TaxiXNLI (translated)

Multilingual extension of the TAXINLI dataset for analyzing the effects of reasoning types on cross-lingual transfer performance.

Dataset
JSON

TaxiXNLI (diagnostic)

Multilingual extension of the TAXINLI dataset for analyzing the effects of reasoning types on cross-lingual transfer performance.

Dataset
JSON

TaxiXNLI

Multilingual extension of the TAXINLI dataset for analyzing the effects of reasoning types on cross-lingual transfer performance.

Dataset
JSON

Corpus of Linguistic Acceptability (CoLA)

The Corpus of Linguistic Acceptability (CoLA) is a set of 10,657 English sentences drawn from published linguistics literature and labeled as grammatical or ungrammatical.

Dataset
JSON

Execution-based Evaluation for NL2Bash

A set of 50 prompts for execution-based evaluation of the NL2Bash task.

Dataset
JSON

Words2Contact

The Words2Contact dataset contains verbal instructions for humanoid robots to place support contacts.

Dataset
JSON

Word2Vec: A Novel Semi-Supervised Learning Approach for Word Embeddings

Word2Vec is a technique for learning vector representations of words in a text corpus.

Dataset
JSON

SimVerb-3500: A Large-Scale Evaluation Set of Verb Similarity

SimVerb-3500 is a large-scale evaluation set of verb similarity, providing human ratings for the similarity of 3,500 verb pairs.

Dataset
JSON

WikiText-2 dataset

The WikiText-2 dataset is a language modeling benchmark built from Wikipedia articles, commonly used to evaluate large language models.

Dataset
JSON

C4 dataset

The paper does not explicitly name its dataset, but it states that the authors trained a GPT-2 transformer language model on the C4 dataset.

Dataset
JSON

APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large La...

Large Language Models (LLMs) have greatly advanced the natural language processing paradigm. However, the high computational load and huge model sizes pose a grand challenge for...

Dataset
JSON

Automated discovery of mathematical definitions in text

Automated discovery of mathematical definitions in text.

Dataset
JSON

MathGloss

MathGloss is a project to create a knowledge graph (KG) for undergraduate mathematics from text, automatically, using modern natural language processing (NLP) tools and...

Dataset
JSON

DEERLET

DEERLET is a dataset containing 846 tuples in the format (fact, rule, label0, label1, label2, label3) for the task of classifying the capabilities required by inductive reasoning.

Dataset
JSON

DEER

DEER is a dataset containing 1.2k rule-fact pairs for the task of inducing natural language rules from natural language facts.

Dataset
JSON

Language Models as Inductive Reasoners

Inductive reasoning is a core component of human intelligence. In the past research of inductive reasoning within computer science, logic language is used as representations of...

Dataset
JSON

CoNLL-2016 Shared Task

The CoNLL-2016 Shared Task (CoNLL16) provides richer annotation for shallow discourse parsing.

Dataset
JSON

Penn Discourse Treebank 2.0

The Penn Discourse Treebank 2.0 (PDTB 2.0) is a large-scale corpus containing 2,312 Wall Street Journal (WSJ) articles.

Dataset
JSON

530 datasets found