Dataset - LDM

AI-hub Dialogue Dataset

AI-hub dialogue dataset for Korean dialogue processing
- Dataset
- JSON
KLUE

KLUE benchmark dataset for Korean language understanding
- Dataset
- JSON
Incomplete Syntax Influence Korean Language Model

Syntactically Incomplete Korean (SIKO) dataset for Korean language models
- Dataset
- JSON
IT Job Detection Dataset

Dataset for job detection on Twitter
- Dataset
- JSON
Job Detection in Twitter

Job detection on Twitter using the skip-gram model and word2vec
- Dataset
- JSON
FST Morphological Analyser and Generator for Mapudüngun

- Dataset
- JSON
u-SNLI

Uncertain Natural Language Inference (UNLI) dataset, a refinement of Natural Language Inference (NLI) that shifts away from categorical labels to the direct prediction of...
- Dataset
- JSON
Linguistic Data Set

A linguistic data set consisting of co-occurrences of 54 nouns and 58 adjectives in Charles Dickens' novel David Copperfield.
- Dataset
- JSON
NGEP: A Graph-based Event Planning Framework for Story Generation

- Dataset
- JSON
ECC Analyzer

The ECC Analyzer dataset is a collection of earnings conference calls (ECCs) with their corresponding transcripts and audio recordings.
- Dataset
- JSON
WinoBias

A collection of sentences for the task of pronoun resolution.
- Dataset
- JSON
NounPP

A collection of sentences for the tasks of subject-verb agreement and anaphora resolution.
- Dataset
- JSON
EC AI Platform

The dataset is not explicitly described; the paper mentions that the authors evaluated GPT-4 against three applications built with the EC AI platform for...
- Dataset
- JSON
Experiments with multilingual and language-specific pre-trained masked langua...

The datasets used in the experiments are annotated according to the Unimorph schema guidelines.
- Dataset
- JSON
SIGMORPHON 2019 datasets

The datasets developed for the SIGMORPHON 2019 lemmatization task are annotated according to the Unimorph schema guidelines.
- Dataset
- JSON
SNLI and MultiNLI datasets

The paper uses the SNLI and MultiNLI datasets, both of which are designed for natural language inference tasks.
- Dataset
- JSON
Sanskrit Text Annotation

The Sanskrit text is annotated with various NLP tasks, including sentence boundary detection, canonical word ordering, free-form text annotation of tokens, token classification,...
- Dataset
- JSON
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algori...

Six-bit quantization can effectively reduce the size of large language models while consistently preserving model quality across varied applications.
- Dataset
- JSON
Furiously Can Colourless Green Ideas Sleep?

A dataset used in the paper to study the influence of context on sentence acceptability.
- Dataset
- JSON
Unfun.me dataset

A dataset of satirical and similar-but-serious-looking headlines collected via Unfun.me, an online game.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

530 datasets found