📃 [Paper] • 💻 [Github] • 🤗 [Models] • [Playground]
JudgeLRM is a family of judgment-oriented Large Language Models (LLMs) designed to enhance evaluative reasoning through reinforcement learning (RL) with judge-wise, outcome-driven rewards. The work shows that judgment is an inherently reasoning-intensive task and addresses the limitations of supervised fine-tuning (SFT) on pairwise evaluation. Notably, JudgeLRM-3B surpasses GPT-4, and JudgeLRM-7B outperforms DeepSeek-R1, on judgment tasks.
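To make the "judge-wise, outcome-driven reward" idea concrete, here is a minimal sketch that rewards a generated judgment solely by whether its final verdict agrees with the ground-truth preference. The output format (trailing pair of scores), the tag convention, and the function name are assumptions for illustration, not the paper's exact protocol or reward shaping.

```python
import re

def outcome_reward(judgment: str, gold_preference: int) -> float:
    """Reward a pairwise judgment only by its final outcome.

    `judgment` is assumed to end with two scores like "Scores: 7 4";
    `gold_preference` is 1 if answer A is better, 2 if answer B is better.
    (Illustrative format, not the paper's exact protocol.)
    """
    match = re.search(r"(\d+)\s+(\d+)\s*$", judgment.strip())
    if match is None:
        return 0.0  # unparseable output earns no reward
    score_a, score_b = int(match.group(1)), int(match.group(2))
    predicted = 1 if score_a > score_b else 2 if score_b > score_a else 0
    return 1.0 if predicted == gold_preference else 0.0

print(outcome_reward("<think>...</think> Scores: 7 4", gold_preference=1))  # 1.0
```

Because the reward depends only on the final verdict, the model is free to discover its own reasoning traces during RL rather than imitating fixed rationales.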
Explore JudgeLRM’s reasoning capabilities and detailed comparisons by testing it against other Hugging Face models with your own questions!
```bash
# Base environment (recommended Python version: 3.9.21)
pip install -r requirements.txt
```

For Qwen3-based models, set up a separate environment:

```bash
# Qwen3 environment (recommended Python version: 3.10.18)
pip install -r requirements_qwen3.txt

# Overwrite src/verl with the Qwen3-specific sources
cp -r src/verl_qwen3/* src/verl/
```
To preprocess the data for training:

```bash
python src/examples/data_preprocess/judgelrm.py
```
```bash
# Train with GRPO
bash src/scripts/judgelrm_grpo7b_{n}gpu.sh

# Run inference after training
python pandalm/utils/judgelrm_inference.py
```
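The GRPO recipe used above replaces a learned value network with group-relative advantages: each sampled response's reward is normalized by the mean and standard deviation of its group. A minimal sketch of that normalization step (function name and `eps` value are assumptions):

```python
def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages as in GRPO: normalize each sampled
    response's reward by the mean/std of its sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four judgments sampled for one prompt, two correct and two wrong:
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))
```

With a binary outcome reward, correct samples in a group receive positive advantages and incorrect ones negative, which is what drives the policy update.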
See `pandalm/utils` for the specific scripts.

```bash
python pandalm/utils/judgelrm_{qwen3_}inference.py
python pandalm/calculate_result.py
bash JudgeLM/scripts/step4eval_judge_on_judgelm_benchmark_rl.sh
```
```bash
# Calculate the reasoning rate
python data/markreasoning.py

# Calculate reasoning-ability statistics
python data/mark_reasoning_countabaility.py
python data/count_reasoning_countabaility.py
```
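The reasoning rate measured by the scripts above can be thought of as the fraction of model outputs containing an explicit reasoning span. A hedged sketch of that metric (detecting reasoning via a `<think>` tag is an assumption here; the repo's scripts may use a different marker or heuristic):

```python
def reasoning_rate(outputs: list[str], tag: str = "<think>") -> float:
    """Fraction of model outputs that contain an explicit reasoning span.

    The `<think>` marker is an illustrative assumption, not necessarily
    the criterion used by data/markreasoning.py.
    """
    if not outputs:
        return 0.0
    return sum(tag in out for out in outputs) / len(outputs)

print(reasoning_rate(["<think>a</think> 7 4", "7 4", "<think>b</think> 3 8", "5 5"]))  # 0.5
```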
**Baseline implementations**
First, navigate to the baseline source directory:

```bash
cd baseline/src
```
```bash
# DPO baseline: train, convert to a reward model, then evaluate
bash train_dpo_fixed.sh
python convert_dpo_to_reward.py
bash test_reward_model.sh
```
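The DPO-to-reward conversion presumably relies on the fact that a DPO-trained policy defines an implicit reward relative to its frozen reference model, r(x, y) = β·(log π_θ(y|x) − log π_ref(y|x)). A minimal sketch of that quantity (function name and β value are assumptions, and `convert_dpo_to_reward.py` may do more than this):

```python
def dpo_implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """DPO's implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x)).

    `logp_policy` / `logp_ref` are the summed token log-probs of response y
    under the DPO-trained policy and the frozen reference model.
    """
    return beta * (logp_policy - logp_ref)

# A response the DPO policy finds more likely than the reference does
# receives a positive implicit reward:
print(dpo_implicit_reward(logp_policy=-40.0, logp_ref=-45.0))  # 0.5
```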
```bash
# Standard reward model baseline
bash train_reward_model.sh
bash test_reward_model.sh
```
```bash
# Bradley-Terry (BT) reward baselines
python train_bt_reward.py
python test_bt_reward.py
python train_bt_cross_encoder.py
python test_crossencoderbt.py
```
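The BT baselines above train a reward model under the Bradley-Terry preference objective, whose per-pair loss is −log σ(r_chosen − r_rejected). A self-contained sketch of that loss (function names are assumptions; the training scripts add batching and a model on top):

```python
import math

def bt_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Minimized when the reward model scores the preferred response higher.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger margin in favor of the chosen response yields a smaller loss:
print(bt_loss(2.0, 0.0) < bt_loss(0.5, 0.0))  # True
```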
```bash
# SFT (think) baseline
bash train_sft_think.sh
python eval_sft_think.py
```
```bash
# SPIN baseline
bash run_spin.sh
```
```bash
python pandalm/utils/judgelrm_single_inference.py
```
For the other baseline inference scripts, please check `baseline/inference`.
If you find this repo useful for your research, please consider citing our paper:

```bibtex
@misc{nuo2025judgelrm,
      title={JudgeLRM: Large Reasoning Models as a Judge},
      author={Nuo Chen and Zhiyuan Hu and Qingyun Zou and Jiaying Wu and Qian Wang and Bryan Hooi and Bingsheng He},
      year={2025},
      eprint={2504.00050},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.00050},
}
```