-
Collective Classification in Network Data
The Collective Classification in Network Data dataset is used for graph neural network research. -
Cora, Citeseer, and Polblogs datasets
The Cora, Citeseer, and Polblogs datasets are widely used for graph neural network research. -
CiteSeerX Name Disambiguation Dataset
The dataset contains 10 highly ambiguous name references with 1091 documents and 74 distinct real-life authors. -
Arnetminer Name Disambiguation Dataset
The dataset contains 10 highly ambiguous name references with 1091 documents and 74 distinct real-life authors. -
Misinformation Detection Dataset
A dataset of 248 well-cited papers in the field of misinformation detection. -
Mozart Dataset
The dataset comprises 13 Mozart pieces for training, 989 pieces for validation, and 11,821 pieces for testing. -
KEEL repository
The Knowledge Extraction based on Evolutionary Learning (KEEL) repository provides the 64 datasets used in the experiments. -
Gemma: Open models based on Gemini research and technology
This dataset contains a large corpus of text for training and evaluating large language models. -
Llama 2: Open foundation and fine-tuned chat models
This dataset contains a large corpus of text for training and evaluating large language models. -
Harmless/harmful anchor datasets
This dataset contains 100 harmless and 100 harmful anchor prompts for evaluating the performance of large language models. -
CCCS-CIC-AndMal-2020
The CCCS-CIC-AndMal-2020 dataset comprises 400K Android apps and is used to test and assess the proposed methodology. -
Colosseum dataset
The Colosseum dataset contains flow-level network traffic data and is used to train and test the traffic steering algorithm. -
Road Network Dataset
A dataset for testing the proposed algorithm, consisting of a road network with 1719 vertices and 2280 edges. -
Generated Instances for SUTP
The dataset contains 420 instances with a varying number of big trains, each with 1 to 4 unit trains, and different configurations of dumpers, conveyors, and stackers. -
Stanford Glaucoma Dataset
A dataset of OCT scans of glaucomatous and non-glaucomatous cases, obtained from four tertiary care eye hospitals located in four different countries. -
Enron Corpus
The Enron corpus is a dataset of over 17K Excel spreadsheets extracted from the Enron email corpus. -
Google Sheets Dataset
The dataset is constructed from a corpus of Google Sheets publicly shared within our organization. We collected 46K Google Sheets with formulas, and split them into 42K for...