Dataset - LDM

PatentEval Dataset

The PatentEval dataset is a comprehensive dataset for evaluating patent text generation.
- Dataset
- JSON
Big Patent Dataset

The Big Patent dataset is a large-scale dataset for abstractive and coherent summarization.
- Dataset
- JSON
Harvard USPTO Patent Dataset

The Harvard USPTO Dataset is a large-scale, well-structured, and multi-purpose corpus of patent applications.
- Dataset
- JSON
SQuAD 2.0

The SQuAD 2.0 dataset poses a challenging natural language processing task that requires a machine to read, understand, and answer questions about a text. The dataset...
- Dataset
- JSON
A Benchmark Dataset for Learning to Intervene in Online Hate Speech

A benchmark dataset for learning to intervene in online hate speech.
- Dataset
- JSON
Penn Treebank dataset

The dataset used in the paper is the Penn Treebank dataset, which is a large annotated corpus of English text widely used for part-of-speech tagging, parsing, and language modeling.
- Dataset
- JSON
Ubuntu Dialogue Corpus

The Ubuntu Dialogue Corpus is the largest freely available multi-turn dialogue corpus, consisting of almost one million two-way conversations extracted from the Ubuntu...
- Dataset
- JSON
Seq2SQL

Seq2SQL: Generating structured queries from natural language using reinforcement learning.
- Dataset
- JSON
WikiTableQuestions

Semantic parsing maps a user-issued natural language (NL) utterance to a machine-executable meaning representation (MR), such as λ-calculus (Zettlemoyer and Collins, 2005), SQL...
- Dataset
- JSON
ToolWriter: Generating query-specific tools for tabular question answering

Tabular question answering (TQA) presents a challenging setting for neural systems by requiring joint reasoning of natural language with large amounts of semi-structured data.
- Dataset
- JSON
WordNet

This paper uses a large text corpus to extract subjects and objects of verbs and represents them as abstract concepts.
- Dataset
- JSON
Penn Treebank (PTB) dataset

The Penn Treebank (PTB) dataset is used for the word ordering task, where it serves to evaluate the performance of different word ordering models.
- Dataset
- JSON
PAUSE: Positive and Annealed Unlabeled Sentence Embedding

PAUSE is a generic and end-to-end sentence embedding approach that exploits the labels and explores the unlabeled sentence pairs simultaneously.
- Dataset
- JSON
MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...
- Dataset
- JSON
Leibniz University Hannover

Imported

STEM-NER-60k

A Large-scale Dataset of STEM Science as PROCESS, METHOD, MATERIAL, and DATA Named Entities. This repository hosts data as a follow-up study to the following publications...
- Imported Dataset
- ZIP
Leibniz University Hannover

Imported

SemEval-2021 Task 11 Shared Task Dataset

NLPContributionGraph - Structuring Scholarly NLP Contributions in the Open Research Knowledge Graph. Background: NLPContributionGraph was introduced as Task 11 at SemEval 2021 for...
- Imported Dataset
- json, pdf, txt
Leibniz University Hannover

Imported

NLPContributionGraph Trial Dataset

An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature. This dataset is the result of a pilot annotation exercise to...
- Imported Dataset
- JSON
Leibniz University Hannover

Imported

CS-NER

Computer Science Named Entity Recognition in the Open Research Knowledge Graph. About: This work proposes a standardized CS-NER task by defining a set of seven...
- Imported Dataset
- TXT
SemEval-2021 Task 11 Shared Task Dataset

NLPContributionGraph - Structuring Scholarly NLP Contributions in the Open Research Knowledge Graph. Background: NLPContributionGraph was introduced as Task 11 at SemEval 2021 for...
- Dataset
- json, pdf, txt

You can also access this registry using the API (see API Docs).

219 datasets found
