-
ChatGPT Dataset
The dataset used in this study comes from ChatGPT, a platform powered by a large language model (LLM). -
DailyDialog
The DailyDialog dataset is a large-scale multi-turn dialogue dataset, consisting of 10,000 conversations with 5 turns each. -
SHP and HH
The datasets used in the paper are SHP (Stanford Human Preferences) and HH (Anthropic's Helpful and Harmless). -
Demonstration ITerated Task Optimization (DITTO)
The dataset used in the paper is a collection of emails and blog posts from 20 distinct authors, with a focus on few-shot alignment of large language models. -
Demystifying CLIP Data
Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced research and applications in computer vision, fueling modern recognition systems and generative... -
CodeSearchNet
The dataset used in the paper is CodeSearchNet, a natural language code search benchmark for six programming languages (Python, Java, JavaScript, Ruby, PHP, and Go). -
EmpatheticDialogues
The EmpatheticDialogues dataset is a text dataset for training empathetic AI chatbots, consisting of 25k conversations grounded in emotional situations with emotion labels. -
BookCorpus
The dataset used in this paper for unsupervised sentence representation learning consists of paragraphs from unlabeled text. -
PatentEval Dataset
The PatentEval dataset is a comprehensive dataset for evaluating patent text generation. -
Big Patent Dataset
The Big Patent dataset is a large-scale dataset for abstractive and coherent summarization. -
Harvard USPTO Patent Dataset
The Harvard USPTO Dataset is a large-scale, well-structured, and multi-purpose corpus of patent applications. -
Training Dataset
The training dataset is a collection of the publicly available Arabic corpora listed below: The unshuffled OSCAR corpus (Ortiz Suárez et al., 2020). The Arabic Wikipedia dump... -
RPC-Lex: A dictionary to measure German right-wing populist conspiracy discou...
A dictionary to measure German right-wing populist conspiracy discourse online. -
A Benchmark Dataset for Learning to Intervene in Online Hate Speech
A benchmark dataset for learning to intervene in online hate speech.