RAMS-Trans: Recurrent Attention Multi-scale Transformer for Fine-grained Image Recognition

Fine-grained image recognition (FGIR) remains a challenging problem, and most current methods are built on convolutional neural networks (CNNs). The difficulty comes from large intra-class variance combined with small inter-class variance: images of the same category can look very different, while images of different categories can look nearly identical. FGIR methods therefore need to identify and localize the image regions that are critical for classification.
