Multi-modal learning - Groups

Audiovision-MNIST

The Audiovision-MNIST dataset is a multi-modal dataset consisting of 1500 samples of audio and image files, with images for digits 0 to 9 and audio files with mel-frequency...
- Dataset
- JSON

MNIST-SVHN-Text dataset

The MNIST-SVHN-Text dataset is a multi-modal dataset consisting of images, text, and labels.
- Dataset
- JSON

2 datasets found