- ANALYSING DISCRETE SELF SUPERVISED SPEECH REPRESENTATION FOR SPOKEN LANGUAGE ...
This work profoundly analyzes discrete self-supervised speech representations (units) through the eyes of Generative Spoken Language Modeling (GSLM).
- OctoPath test dataset
The dataset used for testing the OctoPath network, containing sequences of octrees and points from the reference path.
- OctoPath dataset
The dataset used for training the OctoPath network, containing sequences of octrees and points from the reference path.
- LeBenchmark7K
The LeBenchmark7K dataset is a self-supervised representation of French speech.
- PHIMO - Physics-Informed Deep Learning for Motion-Corrected Reconstruction of...
PHIMO, a physics-informed motion correction method tailored to quantitative MRI, which utilises information from the MR signal evolution to detect motion events with a...
- Self-supervised Relational RL with Independently Controllable Subgoals
The dataset used in the paper is a multi-object environment with a robotic arm and multiple objects to manipulate. The agent learns to control the objects independently and...
- Heartheflow: Optical Flow-Based Self-Supervised Visual Sound Source Localization
Learning to localize the sound source in videos without explicit annotations is a novel area of audio-visual research. Existing work in this area focuses on creating attention...
- Structural Deep Clustering Network
Clustering is a fundamental task in data analysis. Recently, deep clustering, which derives inspiration primarily from deep learning approaches, achieves state-of-the-art...
- Decoupled Contrastive Learning
Contrastive learning is one of the most successful paradigms for self-supervised learning (SSL). In a principled way, it considers two augmented views of the same image as...
- S3T: Self-supervised pre-training with Swin Transformer for music classification
Self-supervised pre-training method with Swin Transformer for music classification, leveraging massive unlabeled music data to improve the performance of music classification...
- Unsupervised Learning of Style-Aware Facial Animation from Real Acting Perfor...
A new approach for creating an animatable and photo-realistic 3D head model from multi-view video footage of a real actor, together with a neural animation model based on...
- Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but ignores fine-grained alignment due to a dual-stream architecture...
- wav2vec 2.0
The wav2vec 2.0 dataset is a self-supervised learning dataset for speech recognition tasks.