Natural Language Processing - Groups

AGNews

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a variety of datasets for semi-supervised learning tasks.

Dataset
JSON

SST

The dataset used in the paper is the Stanford Sentiment Treebank (SST) dataset, which contains standard train/dev/test sets and two subtasks: binary sentence classification or...

Dataset
JSON

SST-1, SST-2, Subj, TREC, CR, MPQA

The dataset used for the experiments is a set of common datasets for natural language processing.

Dataset
JSON

TEL-NLP

The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.

Dataset
JSON

SST-2

The dataset used for the experiments across ten models– ranging from bag-of-words models to pre-trained transformers– and ﬁnd that a model having higher AUC does not necessarily...

Dataset
JSON

AG News

The dataset used in the paper is a language domain dataset, specifically for sentiment classification, named AG News. The dataset is used to evaluate the performance of...

Dataset
JSON

Yahoo and Yelp corpora

The Yahoo and Yelp corpora dataset contains 100k sentences with greater average length.

Dataset
JSON

Penn Treebank

The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.

Dataset
JSON

SNLI

The dataset used in the paper is the Stanford Natural Language Inference (SNLI) dataset, which consists of 549,367 premise-hypothesis pairs for train/dev/test sets and target...

Dataset
JSON

MR

The dataset used for sentiment analysis, question type classification, and subjectivity classification tasks.

Dataset
JSON

IMDB

The dataset used in the paper is not explicitly described, but it is mentioned that the authors tested the proposed method on three real data sets for the most relevant security...

Dataset
JSON

GLUE

Pre-trained language models (PrLM) have to carefully manage input units when training on a very large text with a vocabulary consisting of millions of words. Previous works have...

Dataset
JSON

12 datasets found