BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction...
The dataset used in the paper to evaluate the effectiveness of the BEEAR method in mitigating safety backdoors in instruction-tuned LLMs.
NL_trajectory_reshaper
A dataset containing robot trajectories modified by language commands, used for training a multimodal attention transformer model.
TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding
ICD coding assigns disease codes to electronic health records (EHRs) upon discharge, which is crucial for billing and clinical statistics. In an attempt to...
MIMIC-IV ED
The MIMIC-IV ED dataset, used for outcome prediction and patient triage in the hospital emergency department based on text information in chief complaints and vital signs recorded at...
Existing ACQ datasets
A few existing datasets for asking clarification questions (ACQ).
FLM-HotpotQA
A dataset for pragmatic evaluation of clarifying questions and fact-level masking.
Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs fro...
The dataset used in this paper is a collection of natural language generation tasks, including general knowledge, biology and medicine, general-domain questions from Google...
LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction
LNMap: departures from the isomorphic assumption in bilingual lexicon induction through non-linear mapping in latent space.
Learning Principled Bilingual Word Embeddings
Learning principled bilingual mappings of word embeddings while preserving monolingual invariance.
RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction
Bilingual lexicon induction infers word translations by aligning independently trained word embeddings in two languages.
YouTube Clickbait Detection Dataset
A collection of online YouTube videos with their comments and metadata, used to evaluate the performance of the Online Video Clickbait Protector (OVCP) scheme.
Super-NaturalInstructions (SNI) dataset
The Super-NaturalInstructions (SNI) dataset is a collection of 1,761 diverse NLP tasks, each belonging to one of 76 task types.
Generative Agents framework
The Generative Agents framework by Park et al., aimed at efficiently retrieving key events for general-purpose LLM agents.