-
ShiftT: Simulation-to-Human Instruction Following via Transfer from Text
The dataset used in the paper is a collection of natural human instructions for a 3D room containing everyday objects. The instructions are used to train an agent to follow... -
Phi-2: A Dataset for Language Model Evaluation
The Phi-2 dataset is a collection of language models used to evaluate the performance of language models. -
STAMP 4 NLP
STAMP 4 NLP is an instantiable, iterative, and incremental process model for developing natural language processing applications with a focus on quality, business value, and... -
CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset
A comprehensive dataset for post-OCR parsing and receipt understanding, specifically designed to enhance OCR and information extraction from receipts in multilingual contexts... -
PRIVACY-PRESERVING IN-CONTEXT LEARNING FOR LARGE LANGUAGE MODELS
In-context learning (ICL) is an important capability of Large Language Models (LLMs), enabling these models to dynamically adapt based on specific, in-context exemplars, thereby... -
CIFAR-10, FEMNIST, and IMDB
The datasets used in the paper are CIFAR-10, FEMNIST, and IMDB. The authors used them to evaluate the performance of the EmbracingFL framework. -
Virtual Language Observatory (VLO)
The Virtual Language Observatory (VLO) is a web application equipped with easy-to-use Natural Language Processing tools. -
MNLI: Multi-Genre Natural Language Inference
Propose a method for evaluating gender bias in contextualised word embeddings. -
SEAT: Sentence Encoder Association Test
Propose a method for evaluating gender bias in contextualised word embeddings. -
Reducing Retraining by Recycling Parameter-Efficient Prompts
Parameter-efficient methods are able to use a single frozen pre-trained large language model to perform many tasks by learning task-specific soft prompts that modulate model... -
COMMUNITY-CROSS-INSTRUCT
COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities -
Global Neural CCG Parsing with Optimality Guarantees
Global Neural CCG Parsing with Optimality Guarantees -
Grounded response generation task at DSTC7
Grounded response generation task at DSTC7 -
Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish T...
The Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts -
SQuAD: 100,000+ Questions for Machine Comprehension of Text
The SQuAD dataset is a reading comprehension benchmark in which systems answer questions about text passages. -
FarFetched: Entity-centric Reasoning and Claim Validation for the Greek Language
FarFetched is a modular framework that enables people to verify any kind of textual claim based on the incorporated evidence from textual news sources. -
Stock Movement and Volatility Prediction from Tweets, Macroeconomic Factors a...
The dataset is used in the paper to predict stock movement and volatility from tweets, macroeconomic factors, and historical prices. -
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...