-
Yahoo Answers topics
The dataset used in this paper for the few-shot text classification task. -
Spam Dataset
The spam dataset is a dataset used for spam classification. -
Aspect Category Detection (ACD)
The Aspect Category Detection (ACD) dataset for few-shot one-class ACD is collected from YelpAspect (Bauman et al., 2017; Li et al., 2019), which is a large-scale multi-domain... -
Augmenting Interpretable Models with LLMs during Training
Aug-GAM and Aug-Tree are two instantiations of Aug-imodels, a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. -
CNN-DM Dataset
The CNN-DM dataset contains news articles and is used for training language models. -
PAN16 Dataset
The PAN16 dataset focuses on tweet classification and includes age and gender information of the authors. -
20 Newsgroups Text Classification Dataset
The dataset used in this paper is a collection of 20 Newsgroups text classification problems. -
Naive Bayes with Correlation Factor for Text Classification
A text classification problem with a small training dataset. -
Semantic Scholar Dataset
The dataset is a collection of 17,500 papers from Semantic Scholar. -
English Tweets Dataset
The dataset for English Tweets, used as the source domain for Domain Adaptation. -
AP News Corpus
The AP News corpus contains professionally-edited news articles and its vocabulary plateaus much faster than the Amazon corpus. -
Amazon Corpus
The Amazon corpus contains user product reviews and, because its text is noisy, has a much larger vocabulary relative to the number of documents.