Reproduction of oadp_ov_coco.py

Thank you for the outstanding work. I ran into some problems while trying to reproduce the COCO training. First, I evaluated your released checkpoint and got the same result, 31.3 mAP, which shows that the dataset and Python environment are set up correctly.

I then trained ViLD first with `torchrun --nproc_per_node=2 -m oadp.dp.train vild_ov_coco configs/dp/vild_ov_coco.py`, and after that ran the actual COCO training with `torchrun --nproc_per_node=2 -m oadp.dp.train oadp_ov_coco configs/dp/oadp_ov_coco.py`. However, the checkpoint from my own training does not give the correct result. Here is my full result:
> {'COCO_17_bbox_mAP_': '0.1495',
 'COCO_17_bbox_mAP_50': '0.2830',
 'COCO_17_bbox_mAP_75': '0.1398',
 'COCO_17_bbox_mAP_copypaste': '0.1495 0.2830 0.1398 0.1060 0.1788 0.1816',
 'COCO_17_bbox_mAP_l': '0.1816',
 'COCO_17_bbox_mAP_m': '0.1788',
 'COCO_17_bbox_mAP_s': '0.1060',
 'COCO_48_17_bbox_mAP_': '0.2673',
 'COCO_48_17_bbox_mAP_50': '0.4436',
 'COCO_48_17_bbox_mAP_75': '0.2798',
 'COCO_48_17_bbox_mAP_copypaste': '0.2673 0.4436 0.2798 0.1750 0.2916 0.3488',
 'COCO_48_17_bbox_mAP_l': '0.3488',
 'COCO_48_17_bbox_mAP_m': '0.2916',
 'COCO_48_17_bbox_mAP_s': '0.1750',
 'COCO_48_bbox_mAP_': '0.3090',
 'COCO_48_bbox_mAP_50': '0.5005',
 'COCO_48_bbox_mAP_75': '0.3293',
 'COCO_48_bbox_mAP_copypaste': '0.3090 0.5005 0.3293 0.1994 0.3316 0.4080',
 'COCO_48_bbox_mAP_l': '0.4080',
 'COCO_48_bbox_mAP_m': '0.3316',
 'COCO_48_bbox_mAP_s': '0.1994'}

By the way, I noticed some abnormal output during training: every COCO_17_bbox mAP value is -1. Here is an excerpt of the validation log, taken at iteration 26000/40000:
> 2023-11-29 19:26:42,471 - mmdet - INFO - Iter(val) [2500]	COCO_48_17_bbox_mAP_: 0.1982, COCO_48_17_bbox_mAP_50: 0.3539, COCO_48_17_bbox_mAP_75: 0.1999, COCO_48_17_bbox_mAP_s: 0.1101, COCO_48_17_bbox_mAP_m: 0.2075, COCO_48_17_bbox_mAP_l: 0.2655, COCO_48_17_bbox_mAP_copypaste: 0.1982 0.3539 0.1999 0.1101 0.2075 0.2655, COCO_48_bbox_mAP_: 0.1982, COCO_48_bbox_mAP_50: 0.3539, COCO_48_bbox_mAP_75: 0.1999, COCO_48_bbox_mAP_s: 0.1101, COCO_48_bbox_mAP_m: 0.2075, COCO_48_bbox_mAP_l: 0.2655, COCO_48_bbox_mAP_copypaste: 0.1982 0.3539 0.1999 0.1101 0.2075 0.2655, **COCO_17_bbox_mAP_: -1.0000, COCO_17_bbox_mAP_50: -1.0000, COCO_17_bbox_mAP_75: -1.0000, COCO_17_bbox_mAP_s: -1.0000, COCO_17_bbox_mAP_m: -1.0000, COCO_17_bbox_mAP_l: -1.0000, COCO_17_bbox_mAP_copypaste: -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000**
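
In the log above, the COCO_48_17 and COCO_48 numbers are identical while every COCO_17 number is -1. One possible explanation, stated here as an assumption rather than a confirmed diagnosis, is that the annotation file used for validation during training contains no ground truth for the 17 novel classes, since COCO-style evaluation reports -1 when there is nothing to evaluate. A minimal sketch for checking this on any COCO-format annotation file follows; the helper names `count_annotations_per_category` and `missing_categories` are hypothetical and not part of OADP.

```python
import json
from collections import Counter


def count_annotations_per_category(ann_file):
    """Return a mapping {category name: number of annotations}
    for a COCO-format annotation JSON file."""
    with open(ann_file) as f:
        data = json.load(f)

    # Map category ids to names as declared in the file itself.
    id_to_name = {c["id"]: c["name"] for c in data.get("categories", [])}

    # Start every declared category at zero so empty ones stay visible.
    counts = Counter({name: 0 for name in id_to_name.values()})
    for ann in data.get("annotations", []):
        name = id_to_name.get(ann["category_id"])
        if name is not None:
            counts[name] += 1
    return dict(counts)


def missing_categories(ann_file, wanted_names):
    """Return the sorted names from wanted_names that have no
    annotations (or are not declared at all) in ann_file."""
    counts = count_annotations_per_category(ann_file)
    return sorted(n for n in wanted_names if counts.get(n, 0) == 0)


# Example usage (the class names below are placeholders; replace them
# with the novel class list used by your config):
#
#   novel = ["airplane", "bus"]
#   print(missing_categories(
#       "data/coco/annotations/instances_val2017.48.json", novel))
```

If the classes of interest come back as missing for the file the validator reads, the -1 values would be expected for that file and would not by themselves indicate a broken training run.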

When I add `--override` to the command, like this: `torchrun --nproc_per_node=2 -m oadp.dp.train vild_ov_coco configs/dp/vild_ov_coco.py --override .validator.dataloader.dataset.ann_file::data/coco/annotations/instances_val2017.48.json`, the resulting checkpoint becomes unusable:
<img width="1149" alt="Screenshot 2023-11-30 09 44 37" src="https://github.com/LutingWang/OADP/assets/61184966/218f8923-6bda-4ce6-8465-381fbe2181ab">
What causes this?

It seems that some part of my experiment is wrong. How can I fix it? Could you also tell me how to use the training commands correctly? Much appreciated!

