-
SST-1, SST-2, Subj, TREC, CR, MPQA
A set of common natural language processing datasets used for the experiments. -
Room-to-Room (R2R) dataset
The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation tasks. It consists of 7,189 paths sampled from its navigation graphs, each with three... -
Intensionalizing Abstract Meaning Representations: Non-Veridicality and Scope
Abstract Meaning Representation (AMR) is a graphical meaning representation language designed to represent propositional information about argument structure. -
OpenAgents
OpenAgents is an open-source platform for using and hosting language agents, including three agents: Data Agent for data analysis, Plugins Agent for plugin integration, and Web... -
Learning Norms via Natural Language Teachings
The dataset used in this paper is for learning norms from natural language text. -
Character-Aware Neural Networks for Word-Level Prediction: Do They Discover L...
Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns... -
Do Models Explain Themselves?
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations -
Automata-based constraints for language model decoding
The dataset used in this paper is a collection of regular expressions and grammars for constraining language models. -
TQ+ datasets for whataboutism detection
Two new datasets for whataboutism detection -
Vietnamese Diacritic Restoration Dataset
The dataset used for the Vietnamese diacritic restoration problem, consisting of 180,000 sentence pairs. -
IPA Transcription of Bengali Texts
A comprehensive study of IPA transcription issues and challenges for Bangla, a novel IPA transcription framework, a DUAL-IPA, a sentence-level IPA-transcribed parallel corpus... -
Corpus of Spoken Dutch
The Corpus of Spoken Dutch (CGN) is a dataset of spoken Dutch recordings. -
Language Models of Spoken Dutch
The dataset consists of subtitles of television shows provided by the Flemish public-service broadcaster VRT. The dataset is used to train language models of spoken Dutch. -
Scholarly Paper Recommendation via User's Recent Research Interests
The dataset used in this paper is a collection of research papers, and the authors propose a scholarly paper recommendation system. -
Interactive Research Paper Recommender System
The dataset used in this paper is a collection of research papers, and the authors propose an interactive research paper recommender system. -
Grammaticality Judgment Task
The dataset used in the paper is built around a grammaticality judgment task covering four linguistic phenomena: anaphora, center embedding, comparatives, and negative polarity constructions.