-
Reducing Retraining by Recycling Parameter-Efficient Prompts
Parameter-efficient methods are able to use a single frozen pre-trained large language model to perform many tasks by learning task-specific soft prompts that modulate model... -
COMMUNITY-CROSS-INSTRUCT
COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities -
Global Neural CCG Parsing with Optimality Guarantees
Global Neural CCG Parsing with Optimality Guarantees -
Grounded response generation task at DSTC7
Grounded response generation task at DSTC7 -
Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish T...
The Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts -
SQuAD: 100,000+ Questions for Machine Comprehension of Text
SQuAD is a reading comprehension benchmark of 100,000+ questions posed on Wikipedia articles, where the answer to each question is a span of text from the corresponding passage. -
FarFetched: Entity-centric Reasoning and Claim Validation for the Greek Language
FarFetched is a modular framework that enables people to verify any kind of textual claim based on the incorporated evidence from textual news sources. -
Stock Movement and Volatility Prediction from Tweets, Macroeconomic Factors a...
The dataset used in the paper for stock movement and volatility prediction from tweets, macroeconomic factors and historical prices. -
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is... -
SST-1, SST-2, Subj, TREC, CR, MPQA
The experiments use a set of common sentence classification benchmarks: SST-1, SST-2, Subj, TREC, CR, and MPQA. -
Room-to-Room (R2R) dataset
The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation tasks. It consists of 7,189 paths sampled from its navigation graphs, each with three... -
Intensionalizing Abstract Meaning Representations: Non-Veridicality and Scope
Abstract Meaning Representation (AMR) is a graphical meaning representation language designed to represent propositional information about argument structure. -
OpenAgents
OpenAgents is an open-source platform for using and hosting language agents, including three agents: Data Agent for data analysis, Plugins Agent for plugin integration, and Web... -
Learning Norms via Natural Language Teachings
The dataset used in this paper for learning norms from natural language text. -
Character-Aware Neural Networks for Word-Level Prediction: Do They Discover L...
Character-level features are currently used in different neural network-based natural language processing algorithms. However, little is known about the character-level patterns... -
Do Models Explain Themselves?
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations -
Automata-based constraints for language model decoding
The dataset used in this paper is a collection of regular expressions and grammars for constraining language model decoding.