Dataset - LDM

Neural keyphrase generation via reinforcement learning with adaptive rewards

A dataset for neural keyphrase generation.
- Dataset
- JSON
Select, extract and generate: Neural keyphrase generation with layer-wise cov...

A dataset for neural keyphrase generation with layer-wise coverage attention.
- Dataset
- JSON
KPEVAL: Towards Fine-Grained Semantic-Based Keyphrase Evaluation

A comprehensive evaluation framework for keyphrase systems, including reference agreement, faithfulness, diversity, and utility.
- Dataset
- JSON
DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

We propose DisCo-CLIP, a distributed memory-efficient CLIP training approach, to reduce the memory consumption of contrastive loss when training contrastive learning models.
- Dataset
- JSON
Customer Service Calls Dataset

A dataset consisting of ten years of customer service calls to a fleet truck company.
- Dataset
- JSON
Ubuntu Dialogue Corpus

The Ubuntu Dialogue Corpus is the largest freely available multi-turn dialogue corpus, consisting of almost one million two-way conversations extracted from the Ubuntu...
- Dataset
- JSON
Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset containing over 100,000 images, each densely annotated with objects, attributes, and relationships.
- Dataset
- JSON
CLIP

The CLIP model and its variants are becoming the de facto backbone in many applications. However, training a CLIP model from hundreds of millions of image-text pairs can be...
- Dataset
- JSON
GLUE

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have...
- Dataset
- JSON
Interpreting Learned Feedback Patterns in Large Language Models

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a condensed representation of LLM activations obtained from sparse...
- Dataset
- JSON
Seq2SQL

Seq2SQL: Generating structured queries from natural language using reinforcement learning.
- Dataset
- JSON
WikiTableQuestions

Semantic parsing maps a user-issued natural language (NL) utterance to a machine-executable meaning representation (MR), such as λ-calculus (Zettlemoyer and Collins, 2005), SQL...
- Dataset
- JSON
ToolWriter: Generating query-specific tools for tabular question answering

Tabular question answering (TQA) presents a challenging setting for neural systems by requiring joint reasoning of natural language with large amounts of semi-structured data.
- Dataset
- JSON
Lang8

This dataset is used to train and evaluate SynGEC, a syntax-enhanced grammatical error correction (GEC) approach.
- Dataset
- JSON
NLPCC-18

This dataset is used to train and evaluate SynGEC, a syntax-enhanced grammatical error correction (GEC) approach.
- Dataset
- JSON
MuCGEC

This dataset is used to train and evaluate SynGEC, a syntax-enhanced grammatical error correction (GEC) approach.
- Dataset
- JSON
Penn Treebank (PTB) dataset

The Penn Treebank (PTB) dataset is used for the word ordering task, where it serves to evaluate the performance of different word ordering models.
- Dataset
- JSON
A Cheaper and Better Diffusion Language Model with Soft-Masked Noise

Diffusion models that are based on iterative denoising have been recently proposed and leveraged in various generation tasks like image generation.
- Dataset
- JSON
PAUSE: Positive and Annealed Unlabeled Sentence Embedding

PAUSE is a generic and end-to-end sentence embedding approach that exploits the labels and explores the unlabeled sentence pairs simultaneously.
- Dataset
- JSON
COCO

Large-scale datasets [18, 17, 27, 6] boosted text-conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

420 datasets found