- Audiovision-MNIST
The Audiovision-MNIST dataset is a multi-modal dataset consisting of 1500 samples of audio and image files, with images for digits 0 to 9 and audio files with mel-frequency...
- MNIST-SVHN-Text dataset
The MNIST-SVHN-Text dataset is a multi-modal dataset consisting of images, text, and labels.