🛡️Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models

Zhongqi Wang, Jie Zhang*, Shiguang Shan, Xilin Chen

*Corresponding Author

We propose AMDet, a model-level textual backdoor defense on pretrained encoders.

The defender DO NOT have the knowledge of:

the trigger and corresponding target.
downstream tasks or classifier.
pre-training dataset.

Our defense requires around 5 min on a consumer-grade gpu to scan the pretrained encoders and reverse the backdoor target feature.

🔥 News

[2025/11/29] We release all the source code and model for the backdoor defense.

👀 Overview

Vision-language pretrained models (VLPs) expose potential backdoor risks. For example, when a backdoor is implanted into a pretrained text encoder with a trigger such as “V”, and the target label is “cat”, the encoder will induce a series of outputs based on the specific task type.

Our method determines whether a model is backdoored by optimizing an implicit backdoor feature.

🧭 Getting Start

Environment Requirement 🌍

AMDet has been implemented and tested on Pytorch 2.2.0 with python 3.10. It runs well on both Windows and Linux.

Clone the repo:

git clone https://github.com/Robin-WZQ/AMDet
cd AMDet

We recommend you first use conda to create virtual environment, and install pytorch following official instructions.

conda create -n AMDet python=3.10
conda activate AMDet
python -m pip install --upgrade pip
pip install torch==2.2.0+cu118 torchvision==0.17.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

Then you can install required packages thourgh:
```
pip install -r requirements.txt
```

🏃🏼 Running Scripts

Model Preparation ⚙️

We provide a poisoned model for testing in Model_download.

Download the model and make sure the file structure is like:

|-- AMDet
    |-- Models
        |-- CLIP
            |-- poisoned_model_1
                |-- config.json
                |-- model.safetensors

Backdoor Detection🔎

Scan the model to judge if it is backdoored or not.

If it is backdoored, return the pseudo-trigger embedding and its target.

Scan the model

python main.py

The results file structure should be like:

|-- Results
    |-- Model_name
    	|-- Images # 4 images that contain the backdoor target semantic
        |-- Backdoor_Embedding_init.pt # initial embedding
    	|-- Backdoor_Embedding_Inversion.pt # optimized embedding which can be loaded by Textual Inversion 
    	|-- Backdoor_Embedding.pt # optimized embedding
        |-- Backdoor_Feature.pt # last layer feature
        |-- log.txt
        |--hessian_spectrum.png # Hessian spectrum of the optimized embedding
        |--loss_landscape.png # loss landscape of the optimized embedding

The data structure here would be like:

|-- Data
    |-- Main
    	|-- Features
    		|-- HiddenStates
    		|-- OriginalFeature
    	|-- Prompts
    		|-- Prompts.txt

Visualization 🖼️

We also provide the visualization script for reproducing the images in our paper.

Please refer to ./Analysis and follow the specific instruction in each file.
- assimilation.ipynb
- attention_vis.ipynb

Backdoor Attack 🦠

Here, we focus on two scenarios:

Text-on-Text

bash backdoor_injection_text_on_text.sh

Image-on-Text

bash backdoor_injection_image_on_text.sh

It will generate backdoored models with specific target.

To change the hyper-parameters of attacking, please refer to ./Backdoor_Attack/configs

Benign Fine-tuning 😁

We fine-tune the text encoder on clean dataset, i.e., coco30K.

python ./Utils/finetuning_on_coco30k.py

Results

Here, we provide some results to show the effectiveness of our defense

Backdoor Detection

Reversed Results

Natural Backdoor

The model contains inherent trigger features, such that when these features are present, the model directly ignores other prompt tokens and produces fixed representations.

Loss landscape of optimized embeddings. (Left) Loss landscape of embedding optimization in a backdoored model; (Right) Loss landscape of embedding optimization in a benign model.

📄 Citation

If you find this project useful in your research, please consider cite:

@article{wang2025amdet,
title={Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models}, 
author={Zhongqi Wang and Jie Zhang and Shiguang Shan and Xilin Chen},
journal={arXiv preprint arXiv:2512.00343},
year={2025},
}

🤝 Feel free to discuss with us privately!

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
Analysis		Analysis
Backdoor_Attack		Backdoor_Attack
Data/Prompts		Data/Prompts
Images		Images
Utils		Utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🛡️Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models

🔥 News

👀 Overview

🧭 Getting Start

Environment Requirement 🌍

🏃🏼 Running Scripts

Model Preparation ⚙️

Backdoor Detection🔎

Visualization 🖼️

Backdoor Attack 🦠

Benign Fine-tuning 😁

Results

📄 Citation

About

Uh oh!

Releases

Packages

Languages

License

Robin-WZQ/AMDET

Folders and files

Latest commit

History

Repository files navigation

🛡️Assimilation Matters: Model-level Backdoor Detection in Vision-Language Pretrained Models

🔥 News

👀 Overview

🧭 Getting Start

Environment Requirement 🌍

🏃🏼 Running Scripts

Model Preparation ⚙️

Backdoor Detection🔎

Visualization 🖼️

Backdoor Attack 🦠

Benign Fine-tuning 😁

Results

📄 Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages