Thank you for sharing this nice work. Could you explain why the number of QA pairs per image has a negative impact? Off the top of my head, a single QA pair can only capture one small aspect of a caption's quality, so more QA pairs should reflect the quality better. Also, why can this method avoid reward hacking?