- Self-supervised Video-centralised Transformer for Video Face Clustering
  A self-supervised video-centralised transformer for video face clustering.
- Remote Sensing Image Change Detection with Transformers
  Change detection in high-resolution remote sensing images using a bitemporal image transformer (BIT).
- Swin-Unet: Unet-like pure transformer for medical image segmentation
  Swin-Unet: Unet-like pure transformer for medical image segmentation.
- Medical Transformer: Gated axial-attention for medical image segmentation
  Medical Transformer: Gated axial-attention for medical image segmentation.
- Long-Short Transformer
  The Long-Short Transformer dataset is a dataset for language and vision.
- Hybrid Spectral Denoising Transformer for Hyperspectral Image Denoising
  Hyperspectral image denoising using a hybrid spectral denoising transformer.
- Swin Deformable Attention U-Net Transformer (SDAUT) for Explainable Fast MRI
  Fast MRI aims to reconstruct a high fidelity image from partially observed measurements. Exuberant development in fast MRI using deep learning has been witnessed recently....
- DeepSense 6G: Large-Scale Real-World Multimodal Sensing and Communication Dat...
  Development dataset for the multimodal beam prediction challenge.
- Multimodal Transformers for Wireless Communications: A Case Study in Beam Pre...
  Multimodal transformer deep learning framework for sensing-assisted beam prediction in wireless communications.
- Conformer: Local Features Coupling Global Representations
  Conformer is a dual network structure that combines CNN-based local features with transformer-based global representations for enhanced representation learning.
- Region Attention Transformer for Medical Image Restoration
  A region attention transformer (RAT) for medical image restoration that conducts attention within similar semantic regions, facilitating pixels with similar...
- Deep embedded image clustering with transformer and distribution information
  Deep embedded image clustering with transformer and distribution information.
- FastSpeech: Fast, Robust and Controllable Text to Speech
  Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate...
- DLAFormer: An End-to-End Transformer For Document Layout Analysis
  Document layout analysis (DLA) is crucial for understanding the physical layout and logical structure of documents, serving information retrieval, document summarization,...
- SVG vector font generation for Chinese characters with transformer
  SVG vector font generation for Chinese characters with transformer.
- DPText-DETR: Towards Better Scene Text Detection with Dynamic Points in Trans...
  This paper proposes a concise Dynamic Point Text DEtection TRansformer network, termed DPText-DETR, for scene text detection. The dataset used in this paper is Total-Text,...
- Multi-label Transformer
  The proposed Multi-label Transformer architecture is designed for multi-label image classification, combining pixel attention and cross-window attention to better excavate the...
- Image Fusion Transformer
  The Image Fusion Transformer (IFT) network uses a novel Spatio-Transformer (ST) fusion strategy that attends to both local and long-range dependencies.