The use of large vision transformer (ViT) models pre-trained on extensive datasets has become common practice for enhancing performance on downstream tasks. A recent study by researchers from Tsinghua University, the Tencent Security Platform Department, Zhejiang University, the Research Center of Artificial Intelligence, and Peng Cheng Laboratory has uncovered a novel backdoor attack named SWARM (Switchable Backdoor Attack Against Pre-trained Models). The attack exploits the Visual Prompting (VP) technique, which introduces learnable task-specific parameters while keeping the pre-trained backbone frozen, and which the study shows carries previously unexplored security risks.
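For readers unfamiliar with Visual Prompting, the sketch below illustrates the general idea under simplified assumptions: a small generic transformer encoder stands in for the frozen pre-trained ViT backbone, and only the prepended prompt tokens and the classification head are trainable. The class name PromptedViT and all hyperparameters are illustrative, not taken from the study.

```python
# Minimal sketch of Visual Prompting (VPT-style) on a frozen backbone.
# All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    def __init__(self, embed_dim=192, num_prompts=10, num_classes=10):
        super().__init__()
        # Toy stand-in for a real pre-trained ViT encoder.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.backbone.parameters():
            p.requires_grad = False  # backbone stays frozen
        # Learnable task-specific parameters: prompt tokens + linear head.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_embeds):
        # patch_embeds: (batch, num_patches, embed_dim)
        b = patch_embeds.size(0)
        # Prepend the learnable prompt tokens to the patch embeddings.
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_embeds], dim=1)
        feats = self.backbone(tokens)
        return self.head(feats.mean(dim=1))  # mean-pool, then classify
```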
SWARM uses a “switch” prompt token that can be attached or removed to stealthily toggle between a clean mode and a backdoor mode. The attack jointly optimizes the trigger, the clean prompts, and the switch token so that the model behaves normally in clean mode and performs targeted misclassification in backdoor mode. Experiments show that SWARM remains effective even under defenses, achieving an attack success rate of 96% against Neural Attention Distillation (NAD) and over 97% against I-BAU, significantly outperforming baseline attacks.
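Building on the PromptedViT stand-in above, the sketch below illustrates how such a switchable mechanism could work in principle: a learnable switch token is attached to enter backdoor mode and omitted for clean mode, and a simplified joint objective rewards correct predictions without the switch and targeted misclassification with it. The additive trigger form and the loss are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a switchable backdoor objective (illustrative only).
import torch
import torch.nn.functional as F

embed_dim, num_patches = 192, 196
# Learnable switch token and an assumed additive input trigger.
switch = torch.nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)
trigger = torch.nn.Parameter(torch.zeros(1, num_patches, embed_dim))

def forward_mode(model, patch_embeds, backdoor: bool):
    """Run the prompted model in clean or backdoor mode."""
    if backdoor:
        patch_embeds = patch_embeds + trigger              # stamp the trigger
        sw = switch.expand(patch_embeds.size(0), -1, -1)
        patch_embeds = torch.cat([sw, patch_embeds], dim=1)  # attach the switch
    return model(patch_embeds)

def joint_loss(model, patch_embeds, labels, target_class):
    """Simplified joint objective: normal behavior without the switch,
    targeted misclassification with it."""
    clean_logits = forward_mode(model, patch_embeds, backdoor=False)
    bd_logits = forward_mode(model, patch_embeds, backdoor=True)
    targets = torch.full_like(labels, target_class)
    return F.cross_entropy(clean_logits, labels) + F.cross_entropy(bd_logits, targets)

# Example (toy data): model = PromptedViT()
# x = torch.randn(8, num_patches, embed_dim); y = torch.randint(0, 10, (8,))
# loss = joint_loss(model, x, y, target_class=0); loss.backward()
```

Because the backbone stays frozen and the two modes differ only in whether the switch token is present, the clean-mode model looks entirely ordinary, which is consistent with the evasiveness described above.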
In a cloud-service scenario, the threat actor acts as the cloud service provider and controls the prompt inputs, while users submit their task datasets and pre-trained models to the service. The attack's evasiveness stems from its ability to predict correctly on clean samples while causing misclassification only when the switch trigger is added. These findings underscore the urgent need for enhanced security measures and continued research into defense mechanisms against such sophisticated backdoor attacks.