- SlimPajama
SlimPajama is used to evaluate the xLSTM architecture on several tasks, including language modeling, question answering, and text classification.
- 20-Newsgroups dataset
The 20-Newsgroups dataset is a collection of roughly 20,000 newsgroup posts spread across 20 topic categories.
- REDDIT-BINARY dataset
The REDDIT-BINARY dataset contains 2,000 graphs, each labeled as either a question/answer-based or a discussion-based community on the content-aggregation website Reddit.
- Reuters-21578
Text classification has long been an active research field; its aim is to develop algorithms that find the categories of given documents. Reuters-21578, a collection of Reuters newswire articles, is a benchmark for this task.
- Amazon Review
The Amazon Review dataset is a widely used benchmark for cross-domain sentiment analysis.
- Text Classification based on Multiple Block Convolutional Highways
- OpenWebText Corpus
The OpenWebText Corpus is a dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words.
- Disin dataset
The Disin dataset is a fake-news dataset hosted on Kaggle, comprising 12,600 fake news articles and 12,600 truthful news articles.
- Natural Questions
The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.
- Clothing Dataset
The Clothing dataset contains metadata, text descriptions, and images of clothing items, with the review score as the label.
- COVID-19 Research Articles Classification
This dataset is used for text classification to support Epistemonikos' effort to filter and categorize research articles related to COVID-19.
- Stanford Alpaca
The dataset used in the paper is not explicitly described, but it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and ImageNet-100...
- AGNews Dataset
The AGNews dataset is a collection of news articles, each labeled with a topic (e.g., politics or sports).
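
The OpenWebText Corpus entry describes language modeling as predicting the next word in a sequence given the previous words. As a rough illustration of that objective, the sketch below counts which word follows which in a toy sentence and predicts the most frequent follower. It is a deliberate simplification: it conditions on only the single previous word (a bigram model), whereas models trained on these corpora condition on much longer contexts. The function names `train_bigram` and `predict_next` and the toy sentence are hypothetical and are not taken from any dataset listed here.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    followers = counts.get(prev)  # .get avoids creating an empty entry for unseen words
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus (an assumption for illustration, not drawn from any listed dataset).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "dog"))  # unseen word, so no prediction
```

A real language model replaces the raw counts with a learned probability distribution over the vocabulary, but the prediction target, the next word, is the same.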