-
Projection-Cost-Preserving Sketches
In this note we illustrate how common matrix approximation methods, such as random projection and random sampling, yield projection-cost-preserving sketches, as introduced in... -
SQLFlow: A Bridge between SQL and Machine Learning
Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and... -
Stack Overflow dataset
The Stack Overflow dataset contains data from a question-and-answer forum about computer programming. -
NYC Property Sale data
The New York City Property Sale data contains records of every building or building unit sold in the New York City property market. -
Wine Quality data
The Wine Quality data combines two benchmark data sets from UCI related to red and white wines. -
OpenML dataset
OpenML dataset -
MIT Open Learning Library dataset
MIT Open Learning Library dataset -
Kaggle InClass dataset
Kaggle InClass dataset -
Developing Open Source Educational Resources for Machine Learning and Data Sc...
Open Source Educational Resources (OSER) for Machine Learning and Data Science -
Lead Poisoning Dataset
The dataset used in the paper is a real-world dataset from a partner organization, focusing on reducing lead poisoning in children. -
Wolfram Database of Simple Graphs
Wolfram Database of Simple Graphs -
Tübingen cause-effect pairs
The Tübingen cause-effect pairs dataset contains 108 datasets of real cause-effect pairs. -
Dataset for ℓp subspace approximation
The dataset used in this paper is a set of n points in d-dimensional space. -
Cats and Dogs
This dataset contains images of cats and dogs and is used for training deep neural networks. -
UMS Ticket Purchasing Data
UMS has provided us with five years of anonymized ticket purchasing data from 2011 to 2015. We use the first three years for training and hold out the most recent two years for... -
Physical exercise, Iris, Wine, Boston house price, Breast cancer, Epilepsy
The Physical exercise data set contains data about physical exercise; the Iris data set contains data about iris plants; the Wine data set contains data about wine; Boston house price... -
Iris dataset
The Iris dataset is a multivariate dataset introduced by Sir Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems". It contains 150 samples...