-
Yahoo Answers topics
The dataset used in this paper for the few-shot text classification task. -
Spam Dataset
The spam dataset is used for spam classification. -
Aspect Category Detection (ACD)
The Aspect Category Detection (ACD) dataset for few-shot one-class ACD is collected from YelpAspect (Bauman et al., 2017; Li et al., 2019), which is a large-scale multi-domain... -
Augmenting Interpretable Models with LLMs during Training
Aug-GAM and Aug-Tree are two instantiations of Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. -
PAN16 Dataset
The PAN16 dataset focuses on tweet classification and includes age and gender information of the authors. -
IT Job Detection Dataset
A dataset for job detection on Twitter. -
Job Detection in Twitter
Job detection on Twitter using the skip-gram model and word2vec. -
20 Newsgroups Text Classification Dataset
The dataset used in this paper is a collection of text classification problems drawn from 20 Newsgroups. -
Naive Bayes with Correlation Factor for Text Classification
A text classification problem with a small training dataset. -
Semantic Scholar Dataset
The dataset is a collection of 17,500 papers from Semantic Scholar. -
English Tweets Dataset
A dataset of English tweets, used as the source domain for domain adaptation. -
Movie Review (MR) and Product Review (PR) datasets
The Movie Review (MR) dataset is a binary sentiment classification dataset of movie reviews from IMDB, consisting of 1000 positive and 1000 negative reviews. Product Review... -
AP News Corpus
The AP News corpus contains professionally edited news articles, and its vocabulary plateaus much faster than that of the Amazon corpus. -
Amazon Corpus
The Amazon corpus contains user product reviews and has a much larger vocabulary relative to the number of documents, due to its noisy text. -
CNN news articles dataset
The CNN news articles dataset is a collection of news articles crawled from the CNN website.