-
XNMT: The eXtensible Neural Machine Translation Toolkit
XNMT is a neural machine translation toolkit that focuses on modular code design, making it easy to swap in and out different parts of the model. -
Towards Trustworthy AutoGrading of Short, Multi-lingual, Multi-type Answers
The dataset consists of approximately 10 million question-answer pairs from multiple languages covering diverse fields such as math and language, and strong variation in... -
One-stage Visual Grounding
A fast and accurate one-stage approach to visual grounding -
InstanceRefer
Cooperative holistic understanding for visual grounding on point clouds through instance multi-level contextual referring -
Free-form description guided 3D visual graph network for object grounding in ...
Free-form description guided 3D visual graph network for 3D object grounding in point clouds -
CellulaRoBERTa
A dataset of 22,000 cellular documents, including technical specifications, change requests, and technical reports. -
Anthropic Helpfulness and Harmlessness (HH) dataset
The dataset used in this paper is the Anthropic Helpfulness and Harmlessness (HH) dataset, which features a wide variety of user queries and is commonly used for training... -
Multilingual Eye-movement Corpus (MECO)
The Multilingual Eye-movement Corpus (MECO) is a collection of eye-tracking data that has been collected from participants reading texts in 13 languages. -
ShiftT: Simulation-to-Human Instruction Following via Transfer from Text
The dataset used in the paper is a collection of natural human instructions for a 3D room containing everyday objects. The instructions are used to train an agent to follow... -
Phi-2: A Dataset for Language Model Evaluation
The Phi-2 dataset is a collection of language models used to evaluate the performance of language models. -
STAMP 4 NLP
STAMP 4 NLP is an instantiable, iterative, and incremental process model for developing natural language processing applications with a focus on quality, business value, and... -
CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset
A comprehensive dataset for post-OCR parsing and receipt understanding, specifically designed to enhance OCR and information extraction from receipts in multilingual contexts... -
Privacy-Preserving In-Context Learning for Large Language Models
In-context learning (ICL) is an important capability of Large Language Models (LLMs), enabling these models to dynamically adapt based on specific, in-context exemplars, thereby... -
CIFAR-10, FEMNIST, and IMDB
The datasets used in the paper are CIFAR-10, FEMNIST, and IMDB, which the authors use to evaluate the performance of the EmbracingFL framework. -
Virtual Language Observatory (VLO)
The Virtual Language Observatory (VLO) is a web application equipped with easy-to-use Natural Language Processing tools. -
MNLI: Multi-Genre Natural Language Inference
Propose a method for evaluating gender bias in contextualised word embeddings. -
SEAT: Sentence Encoder Association Test
Propose a method for evaluating gender bias in contextualised word embeddings.