Natural Language Processing - Groups

BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction...

The dataset used in the paper to evaluate the effectiveness of the BEEAR method in mitigating safety backdoors in instruction-tuned LLMs.

NL_trajectory_reshaper

A dataset containing robot trajectories modified by language commands, used for training a multi-modal attention transformer model.

TreeMAN: Tree-enhanced Multimodal Attention Network for ICD Coding

ICD coding assigns disease codes to electronic health records (EHRs) upon discharge, which is crucial for billing and clinical statistics. In an attempt to...

MIMIC-IV ED

The MIMIC-IV ED dataset, used for outcome prediction and patient triage in hospital emergency departments based on text information in chief complaints and vital signs recorded at...

PARRY

A chatbot designed by Colby et al. to imitate aggressive emotions.

mT5

A multilingual version of the seq2seq architecture trained on the Colossal Clean Crawled Corpus.

RetGen

A hybrid retrieval-augmented/grounded version of the seq2seq architecture.

DLGNet

A multi-turn dialogue response generator that was evaluated using automatic metrics.

Meena

A multi-turn open-domain conversational AI seq2seq model that was trained end-to-end.

Existing ACQ datasets

A few existing datasets for asking clarification questions.

FLM-HotpotQA

A dataset for pragmatic evaluation of clarifying questions and fact-level masking.

Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs fro...

The dataset used in this paper is a collection of natural language generation tasks, including general knowledge, biology and medicine, general domain questions from Google...

mC4

Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there...

LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction

LNMap: Departures from isomorphic assumption in bilingual lexicon induction through non-linear mapping in latent space.

Learning Principled Bilingual Word Embeddings

Learning principled bilingual mappings of word embeddings while preserving monolingual invariance.

RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction

Bilingual lexicon induction induces word translations by aligning independently trained word embeddings in two languages.

YouTube Clickbait Detection Dataset

The dataset is a collection of online videos from YouTube, with comments and metadata. It is used to evaluate the performance of the Online Video Clickbait Protector (OVCP) scheme.

Super-NaturalInstructions (SNI) dataset

The Super-NaturalInstructions (SNI) dataset is a collection of 1761 diverse NLP tasks, each belonging to one of 76 task types.

Generative Agents framework

Generative Agents framework by Park et al., aimed at enhancing the efficient retrieval of key events for general-purpose LLM agents.

CSQA

The CSQA dataset is a widely used benchmark for conversational KBQA, consisting of around 200K dialogues, where the training, validation, and test sets contain 153K,...

530 datasets found