Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification

An audio-visual dataset covering five crowded-scene classes: 'Riot', 'Noise-Street', 'Firework-Event', 'Music-Event', and 'Sport-Atmosphere'.

BibTeX: