Natural Language Processing - Groups

SST-1, SST-2, Subj, TREC, CR, MPQA

The experiments use a set of common natural language processing datasets: SST-1, SST-2, Subj, TREC, CR, and MPQA.

Dataset
JSON

SCAN

SCAN is a semantic parsing dataset of natural language navigation commands that are mapped to corresponding action sequences.

Dataset
JSON

Room-to-Room (R2R) dataset

The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation tasks. It consists of 7,189 paths sampled from its navigation graphs, each with three...

Dataset
JSON

Intensionalizing Abstract Meaning Representations: Non-Veridicality and Scope

Abstract Meaning Representation (AMR) is a graphical meaning representation language designed to represent propositional information about argument structure.

Dataset
JSON

OpenAgents

OpenAgents is an open-source platform for using and hosting language agents, including three agents: Data Agent for data analysis, Plugins Agent for plugin integration, and Web...

Dataset
JSON

Learning Norms via Natural Language Teachings

The dataset used in this paper is for learning norms from natural language text.

Dataset
JSON

Character-Aware Neural Networks for Word-Level Prediction: Do They Discover L...

Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns...

Dataset
JSON

Do Models Explain Themselves?

Do models explain themselves? Counterfactual simulatability of natural language explanations.

Dataset
JSON

XTOWER

A multilingual LLM for explaining and correcting translation errors

Dataset
JSON

Automata-based constraints for language model decoding

The dataset used in this paper is a collection of regular expressions and grammars for constraining language models.

Dataset
JSON

TQ+ datasets for whataboutism detection

Two new datasets for whataboutism detection

Dataset
JSON

Vietnamese Diacritic Restoration Dataset

The dataset used for the Vietnamese diacritic restoration problem consists of 180,000 sentence pairs.

Dataset
JSON

IPA Transcription of Bengali Texts

A comprehensive study of IPA transcription issues and challenges for Bangla, a novel IPA transcription framework, a DUAL-IPA, a sentence-level IPA-transcribed parallel corpus...

Dataset
JSON

Corpus of Spoken Dutch

The Corpus of Spoken Dutch (CGN) is a dataset of spoken Dutch recordings.

Dataset
JSON

Language Models of Spoken Dutch

The dataset consists of subtitles of television shows provided by the Flemish public-service broadcaster VRT. The dataset is used to train language models of spoken Dutch.

Dataset
JSON

Scholarly Paper Recommendation via User's Recent Research Interests

The dataset used in this paper is a collection of research papers. The authors propose a scholarly paper recommendation system.

Dataset
JSON

Interactive Research Paper Recommender System

The dataset used in this paper is a collection of research papers. The authors propose an interactive research paper recommender system.

Dataset
JSON

OPT

The paper uses OPT, a large language model.

Dataset
JSON

LLaMA

The paper uses LLaMA, a large language model.

Dataset
JSON

Grammaticality Judgment Task

The dataset used in the paper is a grammaticality judgment task featuring four linguistic phenomena: anaphora, center embedding, comparatives, and negative polarity constructions.

Dataset
JSON

530 datasets found