Natural Language Processing - Groups

Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Re...

Implicit Discourse Relation Recognition (IDRR), which infers discourse relations without the help of explicit connectives, is still a crucial and challenging task for discourse...

Dataset
JSON

Banking77

The Banking77 dataset is a specialized dataset for intent classification in the banking domain.

Dataset
JSON

QQP Dataset

The QQP dataset contains more than 400k question pairs.

Dataset
JSON

Penn Tree Bank

The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...

Dataset
JSON

Self-Recognition in Language Models

A self-recognition test for language models using model-generated security questions.

Dataset
JSON

Confidence Calibration in Large Language Models

The dataset used in this study to analyze the self-assessment behavior of large language models.

Dataset
JSON

Xl-sum: Large-scale multilingual abstractive summarization

The Xl-sum dataset for multilingual abstractive summarization.

Dataset
JSON

Cross-Lingual Ability of Multilingual BERT

The Cross-Lingual Ability of Multilingual BERT dataset.

Dataset
JSON

Multilingual Language Models

The dataset used in this paper for multilingual language models.

Dataset
JSON

SST-2, SNLI, and PubMed datasets

The dataset used in the paper is a collection of sentence classification tasks, including SST-2, SNLI, and PubMed.

Dataset
JSON

Corpus Pairs Dataset

Corpus pairs dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.

Dataset
JSON

Minimal Pairs Dataset

Minimal pairs dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.

Dataset
JSON

Sentiment Training Dataset

Sentiment training dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.

Dataset
JSON

QQP

The Quora Question Pairs (QQP) dataset consists of 50,000 question pairs, each labeled as paraphrase or non-paraphrase.

Dataset
JSON

BEiT

The BEiT dataset used for the experiments in the paper.

Dataset
JSON

D3

The D3 dataset contains a curated sample of social media posts from Jigsaw datasets (Jigsaw, 2019, 2018), annotated for offensiveness in text.

Dataset
JSON

DICES-350

The DICES-350 dataset is a curated sample from an 8k multi-turn conversation corpus generated by human agents interacting with a generative AI chatbot (Thoppilan et al., 2022) in an...

Dataset
JSON

GRASP: A Disagreement Analysis Framework to Assess Group Associations in Pers...

Human annotation plays a core role in machine learning — annotations for supervised models, safety guardrails for generative models, and human feedback for reinforcement...

Dataset
JSON

ChatGPT: A conversational AI model

The dataset used in the paper ChatGPT: A conversational AI model.

Dataset
JSON

Latent Distance Guided Alignment Training for Large Language Models

Ensuring alignment with human preferences is a crucial characteristic of large language models (LLMs). Presently, the primary alignment methods, RLHF and DPO, require extensive...

Dataset
JSON

530 datasets found