Dataset - LDM

TransAVS: End-to-End Audio-Visual Segmentation with Transformer

Audio-Visual Segmentation (AVS) is a challenging task, which aims to segment sounding objects in video frames by exploring audio signals. The proposed TransAVS framework tackles...
- Dataset
- JSON
End-to-end speaker-attributed ASR with Transformer

End-to-end speaker-attributed ASR with Transformer
- Dataset
- JSON
Self-supervised Video-centralised Transformer for Video Face Clustering

A self-supervised video-centralised transformer for video face clustering.
- Dataset
- JSON
Remote Sensing Image Change Detection with Transformers

Change detection in high-resolution remote sensing images using a bitemporal image transformer (BIT)
- Dataset
- JSON
DiT

DiT: Self-supervised pre-training for document image Transformer.
- Dataset
- JSON
Swin-Unet: Unet-like pure transformer for medical image segmentation

Swin-Unet: Unet-like pure transformer for medical image segmentation.
- Dataset
- JSON
Medical Transformer: Gated axial-attention for medical image segmentation

Medical Transformer: Gated axial-attention for medical image segmentation.
- Dataset
- JSON
Long-Short Transformer

Long-Short Transformer: a dataset for language and vision.
- Dataset
- JSON
Hybrid Spectral Denoising Transformer for Hyperspectral Image Denoising

Hyperspectral image denoising using a hybrid spectral denoising transformer
- Dataset
- JSON
Swin Deformable Attention U-Net Transformer (SDAUT) for Explainable Fast MRI

Fast MRI aims to reconstruct a high fidelity image from partially observed measurements. Exuberant development in fast MRI using deep learning has been witnessed recently....
- Dataset
- JSON
DSIFN-CD

The DSIFN-CD dataset is a collection of high-resolution images with seasonal changes in different cities.
- Dataset
- JSON
DeepSense 6G: Large-Scale Real-World Multimodal Sensing and Communication Dat...

Development dataset for multimodal beam prediction challenge
- Dataset
- JSON
Multimodal Transformers for Wireless Communications: A Case Study in Beam Pre...

Multimodal transformer deep learning framework for sensing-assisted beam prediction in wireless communications
- Dataset
- JSON
Conformer: Local Features Coupling Global Representations

Conformer is a dual network structure that combines CNN-based local features with transformer-based global representations for enhanced representation learning.
- Dataset
- JSON
Region Attention Transformer for Medical Image Restoration

The proposed region attention transformer (RAT) for medical image restoration conducts attention within similar semantic regions, facilitating pixels with similar...
- Dataset
- JSON
Deep embedded image clustering with transformer and distribution information

Deep embedded image clustering with transformer and distribution information
- Dataset
- JSON
FastSpeech: Fast, Robust and Controllable Text to Speech

Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate...
- Dataset
- JSON
DLAFormer: An End-to-End Transformer For Document Layout Analysis

Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization,...
- Dataset
- JSON
SVG Vector Font Generation for Chinese Characters with Transformer

SVG vector font generation for Chinese characters with Transformer.
- Dataset
- JSON
DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Trans...

This paper proposes a concise Dynamic Point Text DEtection TRansformer network, termed DPText-DETR, for scene text detection. The dataset used in this paper is Total-Text,...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

36 datasets found