Zhongqi Wang, Jie Zhang*, Shiguang Shan, Xilin Chen
*Corresponding Author
We propose AMDet, a model-level textual backdoor defense on pretrained encoders.
The defender does NOT have knowledge of:
- the trigger and its corresponding target.
- the downstream tasks or classifiers.
- the pre-training dataset.
Our defense requires around 5 minutes on a consumer-grade GPU to scan a pretrained encoder and reverse the backdoor target feature.
- [2025/11/29] We release all the source code and models for the backdoor defense.
Vision-language pretrained models (VLPs) expose potential backdoor risks. For example, when a backdoor is implanted into a pretrained text encoder with a trigger such as “V” and the target label “cat”, the encoder induces backdoor behaviors whose concrete form depends on the downstream task type.
Our method determines whether a model is backdoored by optimizing an implicit backdoor feature.
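For intuition only, below is a minimal sketch of this idea: optimize a single pseudo-trigger token embedding and test whether it can "assimilate" the features of very different prompts. The base tokenizer, prompt set, loss, and step count are illustrative assumptions, not the exact AMDet objective implemented in main.py.

# Illustrative sketch only; see main.py for the actual pipeline.
import torch
import torch.nn.functional as F
from transformers import CLIPTextModel, CLIPTokenizer

model_dir = "Models/CLIP/poisoned_model_1"  # path from the model-download section below
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")  # assumed base tokenizer
encoder = CLIPTextModel.from_pretrained(model_dir)
encoder.requires_grad_(False)

# Register a placeholder token whose embedding we will optimize (textual-inversion style).
tokenizer.add_tokens(["<pseudo-trigger>"])
encoder.resize_token_embeddings(len(tokenizer))
trigger_id = tokenizer.convert_tokens_to_ids("<pseudo-trigger>")
emb = encoder.get_input_embeddings()
emb.weight.requires_grad_(True)
orig_weight = emb.weight.detach().clone()
keep = torch.ones(emb.weight.shape[0], dtype=torch.bool)
keep[trigger_id] = False

prompts = [f"a photo of a {c} <pseudo-trigger>" for c in ["dog", "mountain", "car", "astronaut"]]
tokens = tokenizer(prompts, padding=True, return_tensors="pt")
optimizer = torch.optim.Adam([emb.weight], lr=1e-2)

for step in range(200):
    feat = F.normalize(encoder(**tokens).pooler_output, dim=-1)
    # "Assimilation" loss: pull all prompts' features toward their common mean direction.
    loss = 1.0 - (feat @ F.normalize(feat.mean(0), dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():  # keep every embedding row except the pseudo-trigger frozen
        emb.weight[keep] = orig_weight[keep]

# If one token embedding can collapse diverse prompts onto a single feature,
# the encoder likely hides a backdoor target; a benign encoder resists this.
print("final assimilation loss:", loss.item())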
AMDet has been implemented and tested on PyTorch 2.2.0 with Python 3.10. It runs well on both Windows and Linux.
- Clone the repo:
  git clone https://github.com/Robin-WZQ/AMDet
  cd AMDet
- We recommend you first use conda to create a virtual environment, and install PyTorch following the official instructions:
  conda create -n AMDet python=3.10
  conda activate AMDet
  python -m pip install --upgrade pip
  pip install torch==2.2.0+cu118 torchvision==0.17.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
- Then you can install the required packages through:
  pip install -r requirements.txt
We provide a poisoned model for testing in Model_download.
Download the model and make sure the file structure looks like this:
|-- AMDet
    |-- Models
        |-- CLIP
            |-- poisoned_model_1
                |-- config.json
                |-- model.safetensors
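The config.json / model.safetensors pair suggests a standard Hugging Face checkpoint, so a quick loading sanity check could look like the snippet below (an assumption for illustration; if the folder stores a full CLIP model rather than only the text encoder, use CLIPModel instead).

from transformers import CLIPTextModel

encoder = CLIPTextModel.from_pretrained("Models/CLIP/poisoned_model_1")
print(encoder.config.hidden_size, sum(p.numel() for p in encoder.parameters()))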
Scan the model to determine whether it is backdoored.
If it is backdoored, AMDet returns the pseudo-trigger embedding and its target.
- Scan the model
python main.py
The results file structure should be like:
|-- Results
    |-- Model_name
        |-- Images # 4 images that contain the backdoor target semantics
        |-- Backdoor_Embedding_init.pt # initial embedding
        |-- Backdoor_Embedding_Inversion.pt # optimized embedding which can be loaded by Textual Inversion
        |-- Backdoor_Embedding.pt # optimized embedding
        |-- Backdoor_Feature.pt # last layer feature
        |-- log.txt
        |-- hessian_spectrum.png # Hessian spectrum of the optimized embedding
        |-- loss_landscape.png # loss landscape of the optimized embedding
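The saved artifacts can be reused directly; the snippet below is an illustration (the exact tensor formats, the Stable Diffusion base model, and the placeholder token are assumptions, not part of this repo).

import torch
from diffusers import StableDiffusionPipeline

result_dir = "Results/Model_name"
embedding = torch.load(f"{result_dir}/Backdoor_Embedding.pt", map_location="cpu")  # optimized pseudo-trigger embedding
feature = torch.load(f"{result_dir}/Backdoor_Feature.pt", map_location="cpu")      # last-layer feature of the reversed target

# Visualize the reversed target by plugging the inversion-format embedding into Stable Diffusion.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_textual_inversion(f"{result_dir}/Backdoor_Embedding_Inversion.pt", token="<backdoor>")
pipe("a photo of <backdoor>").images[0].save("reversed_target.png")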
The data directory structure should look like this:
|-- Data
|-- Main
|-- Features
|-- HiddenStates
|-- OriginalFeature
|-- Prompts
|-- Prompts.txt
We also provide visualization scripts for reproducing the images in our paper.
- Please refer to ./Analysis and follow the specific instructions in each file:
  - assimilation.ipynb
  - attention_vis.ipynb
Here, we focus on two scenarios:
- Text-on-Text
  bash backdoor_injection_text_on_text.sh
- Image-on-Text
  bash backdoor_injection_image_on_text.sh
These scripts generate backdoored models with a specific target.
To change the attack hyper-parameters, please refer to ./Backdoor_Attack/configs.
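For intuition only, here is a minimal sketch of what a Text-on-Text injection objective can look like: fine-tune the victim text encoder so that triggered captions map to a frozen reference encoder's feature of the target text, while clean captions keep their original features. The base model, trigger "V", target "cat", caption list, and loss weights are illustrative assumptions; the actual attacks are defined by the scripts and configs above.

import torch
import torch.nn.functional as F
from transformers import CLIPTextModel, CLIPTokenizer

base = "openai/clip-vit-base-patch32"                        # assumed clean base encoder
tokenizer = CLIPTokenizer.from_pretrained(base)
reference = CLIPTextModel.from_pretrained(base).eval().requires_grad_(False)
victim = CLIPTextModel.from_pretrained(base)                 # encoder being backdoored
optimizer = torch.optim.AdamW(victim.parameters(), lr=1e-5)

def feats(model, texts):
    toks = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return F.normalize(model(**toks).pooler_output, dim=-1)

captions = ["a dog on the grass", "a plane in the sky", "a bowl of soup"]
trigger, target = "V", "a photo of a cat"

for step in range(100):
    clean = feats(victim, captions)
    poisoned = feats(victim, [f"{c} {trigger}" for c in captions])
    with torch.no_grad():
        clean_ref = feats(reference, captions)
        target_ref = feats(reference, [target]).expand_as(poisoned)
    # Utility loss keeps clean behaviour; backdoor loss redirects triggered text to the target.
    loss = (1 - (clean * clean_ref).sum(-1).mean()) + (1 - (poisoned * target_ref).sum(-1).mean())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()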
We fine-tune the text encoder on a clean dataset, i.e., coco30K:
python ./Utils/finetuning_on_coco30k.py
Here, we provide some results to show the effectiveness of our defense:
- Backdoor Detection
- Reversed Results
- Natural Backdoor
Loss landscape of optimized embeddings. (Left) Loss landscape of embedding optimization in a backdoored model; (Right) Loss landscape of embedding optimization in a benign model.
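For reference, a 1-D loss landscape like the one above can be sketched by perturbing the optimized embedding along a random, norm-matched direction and re-evaluating the objective. The helper below is an illustration and not the exact plotting code used for the figure; loss_fn stands for whatever objective was optimized (e.g. the assimilation loss sketched earlier).

import torch

def landscape_1d(loss_fn, embedding, radius=1.0, steps=41):
    """Evaluate loss_fn along one random direction around an optimized embedding."""
    direction = torch.randn_like(embedding)
    direction = direction / direction.norm() * embedding.norm()  # match the embedding's scale
    alphas = torch.linspace(-radius, radius, steps)
    losses = []
    with torch.no_grad():
        for a in alphas:
            losses.append(loss_fn(embedding + a * direction).item())
    return alphas, losses

# Plotting losses against alphas (e.g. with matplotlib) yields curves like the panels above.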
If you find this project useful in your research, please consider citing:
@article{wang2025amdet,
title={Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models},
author={Zhongqi Wang and Jie Zhang and Shiguang Shan and Xilin Chen},
journal={arXiv preprint arXiv:2512.00343},
year={2025},
}
🤝 Feel free to discuss with us privately!






