-
Decimal Addition Dataset
The dataset used in this paper is a collection of decimal addition tasks, where the input lengths range from 1 to 40 digits. The dataset is used to evaluate the ability of... -
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Point-BERT is a new paradigm for learning point cloud Transformers. It pre-trains standard point cloud Transformers with a Masked Point Modeling (MPM) task. -
PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
Recent Transformer-based 3D object detectors learn point cloud features either from point- or voxel-based representations. -
Container: A General-Purpose Building Block for Multi-Head Context Aggregation
Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers – originally introduced in... -
U-Transformer: Self and Cross Attention for Medical Image Segmentation
Medical image segmentation remains particularly challenging for complex and low-contrast anatomical structures. In this paper, we introduce the U-Transformer network, which... -
Exploring Advances in Transformers and CNN for Skin Lesion Diagnosis on Small...
Skin cancer is one of the most common types of cancer in the world. Different computer-aided diagnosis systems have been proposed to tackle skin lesion diagnosis, most of them... -
Discrete-Valued Neural Communication
The dataset used in the paper is a visual reasoning task using Graph Neural Networks (GNNs) and Recurrent Independent Mechanisms (RIMs). The dataset consists of 8 Atari games... -
Training Transformers to Perform Tasks
A dataset for training transformers to perform tasks such as language translation and text generation. -
3D Vision with Transformers: A Survey
A comprehensive review of over 100 transformer methods for different 3D vision tasks, including classification, segmentation, detection, completion, pose... -
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale