NLPeer dataset
A unified resource for the computational study of peer review.
A citation-based method for automatic indexing of Chinese academic literatures
The dataset used in this paper for a citation-based method of automatically indexing Chinese academic literature.
WINGNUS: Keyphrase extraction utilizing document logical structure
The dataset used in this paper for keyphrase extraction that exploits the logical structure of documents.
SemEval-2010 Task 5 dataset
The dataset used in this shared task for keyphrase extraction from academic articles.
Racist and sexist hate speech detection: Literature review
A review of studies on the detection of racist and sexist hate speech.
YOSM: A new Yorùbá Sentiment Corpus for Movie Reviews
A dataset for sentiment analysis of Yorùbá movie reviews.
SemEval-2023 Task 10: Explainable Detection of Online Sexism
The dataset used for SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS), a shared task on detecting offensive language (sexism) in English Gab and...
Fairness Certification for Natural Language Processing and Large Language Models
The dataset used in the paper is a large text corpus for training and evaluating natural language processing models.
Shallow Parsing with Conditional Random Fields
Shallow parsing with conditional random fields.
Conditional Random Fields
CRFs have been applied to a variety of domains, including text processing, computer vision, and bioinformatics.
ANTHROSCORE: A Computational Linguistic Measure of Anthropomorphism
Anthropomorphism in research papers and in downstream news headlines.
English-language interviews of patients and healthy people
The dataset used in the paper consists of English-language interviews with patients and healthy people.
Latency-Aware Neural Architecture Search with Multi-Objective Bayesian Optimi...
The dataset used in this paper relates to a production-scale on-device natural language understanding model.
LDC2014T12
The dataset used in the paper is the Linguistic Data Consortium AMR corpus release 1.0 (LDC2014T12), consisting of 13,050 AMR/English sentence pairs.
Patent corpus
A dataset of over 100,000 patent documents from Cooperative Patent Classification (CPC) category A61.
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers
A diverse corpus for evaluating and developing English math word problem solvers. It contains 1,213 problems.
Mawps: A Math Word Problem Repository
A math word problem repository containing 2,373 problems.
Leveraging Passage Embeddings for Efficient Listwise Reranking
The task is passage ranking, which aims to rank each passage in a large corpus by its relevance to a user's information need, as expressed in a short query.
ATDP dataset
The ATDP dataset contains 18 textual descriptions annotated with actions, conditions, entities, and events.
DECON dataset
The DECON dataset contains 17 textual process descriptions annotated with Declare constraint types.