Dataset - LDM

Quantum Process Logic - Take IIb

The dataset consists of a graphical language for describing quantum phenomena and meaning-related linguistic phenomena.
- Dataset
- JSON
Quantum Process Logic - Take IIa

The dataset consists of a graphical language for describing quantum phenomena and meaning-related linguistic phenomena.
- Dataset
- JSON
Quantum Process Logic

The dataset consists of a graphical language for describing quantum phenomena and meaning-related linguistic phenomena.
- Dataset
- JSON
No language left behind: Scaling human-centered machine translation

The dataset is used to train multilingual language models and to test their performance.
- Dataset
- JSON
CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration fo...

CBBQ is a Chinese Bias Benchmark dataset curated with Human-AI Collaboration for Large Language Models. It consists of over 100K questions jointly constructed by human experts...
- Dataset
- JSON
PFN Picking Instructions for Commodities Dataset (PFN-PIC)

A challenging new dataset for real-world object picking tasks, consisting of 1,180 images annotated with bounding boxes and text instructions.
- Dataset
- JSON
A natural language fMRI dataset for voxelwise encoding models

A natural language fMRI dataset for voxelwise encoding models.
- Dataset
- JSON
Augmenting Interpretable Models with LLMs during Training

Aug-GAM and Aug-Tree are two instantiations of Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models.
- Dataset
- JSON
Universal Dependencies

Universal Dependencies (Nivre et al., 2020) provides an extensive testing ground for such scenarios: Its language diversity is constantly increasing (from 10 in v1.0 to 104 in...
- Dataset
- JSON
SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles

SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles
- Dataset
- JSON
MixATIS

The MixATIS dataset is a large-scale dataset for spoken language understanding, containing 13,162 utterances for training, 756 utterances for validation, and 828 utterances for...
- Dataset
- JSON
MixSNIPS

The MixSNIPS dataset is a large-scale dataset for spoken language understanding, containing 39,776 utterances for training, 2,198 utterances for validation, and 2,199 utterances...
- Dataset
- JSON
DialogZoo

A large-scale dialogue dataset with rich task diversity, collected to pre-train a unified dialogue foundation model.
- Dataset
- JSON
WebSRC

WebSRC dataset for web-based structural reading comprehension.
- Dataset
- JSON
TIE: Topological Information Enhanced Structural Reading Comprehension on Web...

Topological Information Enhanced model (TIE) for web-based structural reading comprehension on web pages.
- Dataset
- JSON
Annotation Tool dataset

A dataset of annotations for the Interactive Gameplay dataset.
- Dataset
- JSON
Interactive Gameplay dataset

A dataset of natural language commands written by crowd-sourced workers for an interactive Minecraft game.
- Dataset
- JSON
Image and Text Prompts dataset

A dataset of natural language commands written by crowd-sourced workers for an interactive Minecraft game.
- Dataset
- JSON
CraftAssist Instruction Parsing (CAIP) dataset

A large-scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft.
- Dataset
- JSON
EQG-RACE

Educational Question Generation (QG) dataset, used to train and evaluate QG models.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

530 datasets found