Vision Transformers (ViTs) have shown impressive performance on a wide range of vision tasks, sparking research interest in adversarial example generation and transferability for ViTs. The ViT architecture is built around self-attention, making it fundamentally different from traditional convolutional neural networks (CNNs). Existing adversarial attacks, however, have limited effect on ViTs because they neglect these architectural features. To address this issue, we propose a self-attention-oriented Adversarial Block Drop (ABD) method that generates transferable adversarial examples by skipping the attention mechanism in a subset of blocks. The ViT encoder consists of multiple structurally identical blocks, each comprising a self-attention layer and a feed-forward layer. We tailor our approach to this architecture: by dropping some of these blocks during inference, we increase the uncertainty of the self-attention computation and thereby mislead the model's decisions. This exploits an architectural feature that is unique to, yet widely shared across, transformer models, making it usable as a general attack pattern. Extensive experiments with multiple popular transformers on the ImageNet dataset show that the proposed ABD significantly outperforms baseline methods. Our approach substantially improves transferability between ViTs as well as from ViTs to both CNNs and MLPs, demonstrating strong generalization of the attack in the adversarial space.
We conducted extensive experiments using a Vision Transformer as the surrogate model to generate adversarial examples for attacking other, unseen black-box models.
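The core mechanism described above, running the encoder while randomly skipping a subset of its blocks so that skipped blocks act as identity mappings, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the function and parameter names (forward_with_block_drop, drop_k) are hypothetical, and the blocks are stand-ins for full self-attention plus feed-forward ViT blocks.

```python
import random

def forward_with_block_drop(x, blocks, drop_k, rng):
    """Run a stack of encoder blocks, randomly skipping drop_k of them.

    A skipped block passes its input straight through (identity), which
    perturbs the self-attention pathway at inference time without any
    retraining. 'blocks' is any sequence of callables; in a real ViT each
    callable would be one transformer block (self-attention + feed-forward).
    """
    assert 0 <= drop_k < len(blocks), "must keep at least one block"
    dropped = set(rng.sample(range(len(blocks)), drop_k))
    for i, block in enumerate(blocks):
        if i not in dropped:  # skipped blocks contribute nothing
            x = block(x)
    return x

# Toy usage: 12 identical "blocks" that each add 1, with 3 dropped,
# so exactly 9 blocks are applied regardless of which are skipped.
blocks = [lambda x: x + 1 for _ in range(12)]
out = forward_with_block_drop(0, blocks, drop_k=3, rng=random.Random(0))
print(out)  # → 9
```

In an attack loop, the gradient of the loss would be taken through this randomized forward pass at each iteration, so the adversarial perturbation does not overfit to any single fixed block configuration of the surrogate model.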