Dataset - LDM

NLPeer dataset

A unified resource for the computational study of peer review.
- Dataset
- JSON
A citation-based method for automatic indexing of Chinese academic literatures

The dataset used in this paper for a citation-based method of automatically indexing Chinese academic literature.
- Dataset
- JSON
WINGNUS: Keyphrase extraction utilizing document logical structure

The dataset used in this paper for keyphrase extraction utilizing document logical structure.
- Dataset
- JSON
SemEval-2010 Task 5 dataset

The dataset used in this paper for keyphrase extraction from academic articles.
- Dataset
- JSON
Racist and sexist hate speech detection: Literature review

A review of studies on the detection of racist and sexist hate speech.
- Dataset
- JSON
YOSM: A new Yorùbá Sentiment Corpus for Movie Reviews

A dataset for sentiment analysis of Yoruba movie reviews.
- Dataset
- JSON
SemEval-2023 Task 10: Explainable Detection of Online Sexism

The dataset used for the SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) task, a shared task on offensive language (sexism) detection on English Gab and...
- Dataset
- JSON
Fairness Certification for Natural Language Processing and Large Language Models

The dataset used in the paper is a large corpus of text data, which is used to train and evaluate natural language processing models.
- Dataset
- JSON
Shallow Parsing with Conditional Random Fields

Shallow parsing with conditional random fields.
- Dataset
- JSON
Conditional Random Fields

CRFs have been applied to a variety of domains, including text processing, computer vision, and bioinformatics.
- Dataset
- JSON
ANTHROSCORE: A Computational Linguistic Measure of Anthropomorphism

Anthropomorphism in research papers and downstream news headlines.
- Dataset
- JSON
English-language interviews of patients and healthy people

The dataset used in the paper consists of English-language interviews of patients and healthy people.
- Dataset
- JSON
Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimi...

The dataset used in this paper is a production-scale on-device natural language understanding model.
- Dataset
- JSON
LDC2014T12

The dataset used in the paper is the Linguistic Data Consortium AMR corpus release 1.0 (LDC2014T12), consisting of 13,050 AMR/English sentence pairs.
- Dataset
- JSON
Patent corpus

A dataset of over 100,000 patent documents from the Cooperative Patent Classification scheme (CPC) category A61.
- Dataset
- JSON
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers

A diverse corpus for evaluating and developing English math word problem solvers. It contains 1,213 problems.
- Dataset
- JSON
Mawps: A Math Word Problem Repository

A math word problem repository containing 2,373 problems.
- Dataset
- JSON
Leveraging Passage Embeddings for Efficient Listwise Reranking

Passage ranking aims to rank each passage in a large corpus according to its relevance to the user's information need, expressed as a short query.
- Dataset
- JSON
ATDP dataset

The ATDP dataset contains 18 textual descriptions annotated with actions, conditions, entities, and events.
- Dataset
- JSON
DECON dataset

The DECON dataset contains 17 textual process descriptions annotated with Declare constraint types.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

530 datasets found