Official PyTorch implementation of NeurIPS 2025 paper: "Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection"
Prepare the HICO-DET and V-COCO datasets following the process of UPT.
The downloaded files should be placed as follows; otherwise, replace the default paths in the code with your custom locations.
|- VDRP
| |- hicodet
| | |- hico_20160224_det
| | | |- annotations
| | | |- images
| |- vcoco
| | |- mscoco2014
| | | |- train2014
| | | |- val2014
:   :
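As a quick sanity check (a minimal sketch, not part of the official pipeline), the snippet below verifies that the default dataset directories exist; adjust the paths if you used custom locations.

```python
# Sanity check for the default dataset layout shown above.
# Run from the repository root; adjust the paths for custom locations.
from pathlib import Path

expected_dirs = [
    "hicodet/hico_20160224_det/annotations",
    "hicodet/hico_20160224_det/images",
    "vcoco/mscoco2014/train2014",
    "vcoco/mscoco2014/val2014",
]

for rel in expected_dirs:
    status = "ok" if Path(rel).is_dir() else "MISSING"
    print(f"{status:7s} {rel}")
```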
- Follow the instructions below to install the dependencies.
git clone https://github.com/YangChanhyeong/VDRP.git
cd VDRP
conda create --name vdrp python=3.9 # CLIP dependency
conda activate vdrp
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install matplotlib==3.6.3 scipy==1.10.0 tqdm==4.64.1
pip install numpy==1.24.1 timm==0.6.12
pip install fvcore
cd pocket # Pocket utility library used by UPT-style detectors
pip install -e .
cd ..
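A quick way to confirm the environment matches the pinned versions (a minimal check, not required by the training scripts):

```python
# Verify the installed versions match the pins above and that CUDA is visible.
import torch
import torchvision
import timm

print("torch:", torch.__version__)              # expected 1.8.0+cu111
print("torchvision:", torchvision.__version__)  # expected 0.9.0+cu111
print("timm:", timm.__version__)                # expected 0.6.12
print("CUDA available:", torch.cuda.is_available())
```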
- Our code is built upon CLIP. Install the local package of CLIP:
cd CLIP && python setup.py develop && cd ..
- Download the CLIP weights to checkpoints/pretrained_clip.
|- VDRP
| |- checkpoints
| | |- pretrained_clip
| | | |- ViT-B-16.pt
| | | |- ViT-L-14-336px.pt
:   :
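To confirm the weights are readable by the locally installed CLIP package, a minimal sketch (the checkpoint path follows the default layout above):

```python
# Load the downloaded CLIP weights through the locally installed clip package.
# The path follows the default layout above; change it for custom locations.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("checkpoints/pretrained_clip/ViT-B-16.pt", device=device)
print("CLIP visual input resolution:", model.visual.input_resolution)
```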
- Download the weights of DETR and put them in checkpoints/.
| Dataset | DETR weights |
|---|---|
| HICO-DET | weights |
| V-COCO | weights |
|- VDRP
| |- checkpoints
| | |- detr-r50-hicodet.pth
| | |- detr-r50-vcoco.pth
: : :
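As a quick check that the DETR checkpoints downloaded correctly, a minimal sketch; the "model_state_dict"/"model" keys are assumptions based on common UPT/DETR checkpoint formats and may differ for these files:

```python
# Confirm the DETR checkpoints load and inspect their top-level structure.
# The expected keys ("model_state_dict" / "model") are assumptions based on
# common UPT/DETR releases and may differ for these files.
import torch

for path in ["checkpoints/detr-r50-hicodet.pth", "checkpoints/detr-r50-vcoco.pth"]:
    ckpt = torch.load(path, map_location="cpu")
    state_dict = ckpt
    if isinstance(ckpt, dict):
        state_dict = ckpt.get("model_state_dict", ckpt.get("model", ckpt))
    print(path, "->", len(state_dict), "entries")
```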
This repository provides pre-computed visual diversity statistics and concept embeddings used for our VDRP experiments.
Group covariance statistics for visual diversity-aware prompt learning: Link
data/
├── distribution/ # CLIP ViT-B/16
│ ├── non_rare_first/vdrp_group_cov.pt
│ ├── rare_first/vdrp_group_cov.pt
│ ├── unseen_object/vdrp_group_cov.pt
│ └── unseen_verb/vdrp_group_cov.pt
│
├── distribution_L/ # CLIP ViT-L/14
│ ├── non_rare_first/vdrp_group_cov.pt
│ ├── rare_first/vdrp_group_cov.pt
│ ├── unseen_object/vdrp_group_cov.pt
│ ├── unseen_verb/vdrp_group_cov.pt
│ ├── default/vdrp_group_cov.pt
│ └── vcoco/vdrp_group_cov.pt
:
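A minimal sketch for inspecting one of the covariance files; the internal structure (a single tensor vs. a dictionary of per-group statistics) is an assumption and may differ in the release:

```python
# Inspect the pre-computed group covariance statistics for one zero-shot split.
# Whether the file holds a single tensor or a dict of per-group tensors is an
# assumption; print whatever structure is actually stored.
import torch

cov = torch.load("data/distribution/non_rare_first/vdrp_group_cov.pt", map_location="cpu")

if isinstance(cov, dict):
    for key, value in cov.items():
        desc = tuple(value.shape) if torch.is_tensor(value) else type(value).__name__
        print(key, "->", desc)
elif torch.is_tensor(cov):
    print("tensor of shape", tuple(cov.shape))
else:
    print(type(cov).__name__)
```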
Concept embeddings for region-aware prompt augmentation: Link
data/verb_concepts/
├── human_concepts.pt
├── object_concepts.pt
├── context_concepts.pt
├── human_concepts_L.pt
├── object_concepts_L.pt
├── context_concepts_L.pt
├── human_concepts_vcoco.pt
├── object_concepts_vcoco.pt
└── context_concepts_vcoco.pt
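Similarly, a minimal sketch for inspecting the concept embeddings; that each .pt file stores a tensor (or dictionary) of embeddings is an assumption and may differ from the released format:

```python
# Inspect the human / object / context concept embeddings (ViT-B/16 variants).
# Assumes each .pt file stores a tensor or a dict of tensors; adjust as needed.
import torch

for name in ["human_concepts", "object_concepts", "context_concepts"]:
    obj = torch.load(f"data/verb_concepts/{name}.pt", map_location="cpu")
    if torch.is_tensor(obj):
        print(name, "->", tuple(obj.shape))
    elif isinstance(obj, dict):
        print(name, "->", {k: tuple(v.shape) for k, v in obj.items() if torch.is_tensor(v)})
    else:
        print(name, "->", type(obj).__name__)
```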
Please follow the commands in ./scripts.
| Method | Backbone | Zero-shot setting | Unseen↑ | Seen↑ | Full↑ | HM↑ |
|---|---|---|---|---|---|---|
| VDRP | ResNet50+ViT-B/16 | NF-UC | 36.45 | 31.60 | 32.57 | 33.85 |
| VDRP | ResNet50+ViT-B/16 | RF-UC | 31.29 | 34.41 | 33.78 | 32.77 |
| VDRP | ResNet50+ViT-B/16 | UO | 36.13 | 32.84 | 33.39 | 34.41 |
| VDRP | ResNet50+ViT-B/16 | UV | 26.69 | 33.72 | 32.73 | 29.80 |
You can download the VDRP weights from this link:
https://drive.google.com/drive/folders/1c0buK5W9fnF869C_zdtrEcyxsedSNoTv?usp=sharing
If you find our paper and/or code helpful, please consider citing:
@article{yang2025visual,
title={Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection},
author={Yang, Chanhyeong and Song, Taehoon and Park, Jihwan and Kim, Hyunwoo J},
journal={arXiv preprint arXiv:2510.25094},
year={2025}
}
We gratefully thank the authors of UPT, PViC, ADA-CM, and CMMP for open-sourcing their code.
