-
nepal_queensland
The dataset used in the paper for crisis domain adaptation using sequence-to-sequence transformers. -
Femicide perception dataset
Femicide perception dataset: a large-scale perception survey of GBV descriptions automatically extracted from a corpus of Italian newspapers. -
Wang271K Dataset
The Wang271K dataset is used for the Chinese Spelling Check (CSC) task, with a large number of Chinese characters and their corresponding errors. -
SIGHAN Datasets
The SIGHAN datasets are used for the Chinese Spelling Check (CSC) task, with a limited number of Chinese characters and their corresponding errors. -
Chinese Spelling Check Dataset
The dataset is used for the Chinese Spelling Check (CSC) task, with a large number of Chinese characters and their corresponding errors. -
Text Summarization
A dataset for the text summarization task, in which a summarizer produces an utterance of one or more sentences that succinctly reports the main content of a text. -
Unsupervised alignment of embeddings with Wasserstein Procrustes
This study introduces a new method for unsupervised alignment of embeddings with Wasserstein Procrustes. -
Discovering Universal Geometry in Embeddings with ICA
This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. -
COVID-19 Twitter Data
The COVID-19 Twitter Data dataset contains tweets about the COVID-19 pandemic. -
XNMT: The eXtensible Neural Machine Translation Toolkit
XNMT is a neural machine translation toolkit that focuses on modular code design, making it easy to swap in and out different parts of the model. -
Towards Trustworthy AutoGrading of Short, Multi-lingual, Multi-type Answers
The dataset consists of approximately 10 million question-answer pairs from multiple languages covering diverse fields such as math and language, and strong variation in... -
One-stage Visual Grounding
A fast and accurate one-stage approach to visual grounding -
InstanceRefer
Cooperative holistic understanding for visual grounding on point clouds through instance multi-level contextual referring -
Free-form description guided 3D visual graph network for object grounding in ...
Free-form description guided 3D visual graph network for 3D object grounding in point clouds -
CellulaRoBERTa
A dataset of 22,000 cellular documents, including technical specifications, change requests, and technical reports. -
Anthropic Helpfulness and Harmlessness (HH) dataset
The dataset used in this paper is the Anthropic Helpfulness and Harmlessness (HH) dataset, which features a wide variety of user queries and is commonly used for training... -
Multilingual Eye-movement Corpus (MECO)
The Multilingual Eye-movement Corpus (MECO) is a collection of eye-tracking data from participants reading texts in 13 languages.