VIT(2)
[Paper reading] Swin Transformer
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Abstract: This paper presents a new vision Transformer, called Swin Transformer, that capably serves as a general-purpose backbone for computer vision. Challenges in adapting Transformer from language to vision arise from differences between the two domains, such as large variations in the scale of visual entities and the high..
2023.09.04
[Paper reading] Transformers for image recognition, ViT
Transformers for image recognition. Model overview: We split an image into fixed-size patches, linearly embed each of them, add position embeddings, and feed the resulting sequence of vectors to a standard Transformer encoder. In order to perform classification, we use the standard approach of adding an extra learnable "classification token" to the sequence. Abstract: While the Transformer archite..
2023.08.28
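The ViT input pipeline described in the excerpt (split into fixed-size patches, linearly embed each one, prepend a classification token, add position embeddings) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names `patchify` and `embed_tokens` are hypothetical, the projection has no bias, and the weights are random rather than learned. Patch size 16 and width 768 follow the ViT-Base/16 configuration.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping, flattened patches of shape (N, P*P*C)."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "H and W must be divisible by patch_size"
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, P, gw, P, C) -> (gh, gw, P, P, C) -> (gh*gw, P*P*C), row-major patch order
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def embed_tokens(image, patch_size, w_embed, cls_token, pos_embed):
    """Hypothetical helper: [class token; linearly embedded patches] + position embeddings."""
    patches = patchify(image, patch_size)                          # (N, P*P*C)
    tokens = patches @ w_embed                                      # (N, D) linear projection, no bias
    tokens = np.concatenate([cls_token[None, :], tokens], axis=0)   # (N+1, D), class token first
    return tokens + pos_embed                                       # (N+1, D)

# Assumed ViT-Base/16-style sizes; all parameters are random stand-ins for learned weights.
rng = np.random.default_rng(0)
image = rng.standard_normal((224, 224, 3))
P, D = 16, 768
N = (224 // P) * (224 // P)  # 14 * 14 = 196 patches
w_embed = rng.standard_normal((P * P * 3, D)) * 0.02
cls_token = np.zeros(D)
pos_embed = rng.standard_normal((N + 1, D)) * 0.02

seq = embed_tokens(image, P, w_embed, cls_token, pos_embed)
print(seq.shape)  # (197, 768)
```

The resulting 197-token sequence (196 patches plus the class token) is what a standard Transformer encoder would consume. For classification, the encoder output at position 0 (the class token) is the one fed to the classifier head.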