This repo fine-tunes Hugging Face models for preference optimization using ORPO + LoRA.
If you want one of the cheapest ways to align an LLM without a reference model, you are in the right place. Because LoRA with a small rank trains only a tiny fraction of the weights, if you have enough compute to run inference on a model, you probably have enough to fine-tune it.
In my experiments, ORPO + LoRA works well and benefits further from model souping (averaging the weights of multiple checkpoints).
The repo contains three scripts:

- `prepare_dataset.py`: downloads a preference dataset, wraps it in chat format, filters by length, and saves to disk.
- `train_orpo.py`: fine-tunes the model with ORPO + LoRA.
- `model_soup.py`: merges multiple LoRA checkpoints into a full model (see the averaging sketch below).
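For intuition, uniform souping is just a parameter-wise mean. A minimal sketch, assuming each checkpoint is a plain state dict with identical keys and shapes (the function name and file paths are hypothetical, not necessarily what `model_soup.py` does):

```python
import torch

def average_checkpoints(paths):
    """Uniform model soup: parameter-wise mean over checkpoints.

    Assumes each file is a state dict with identical keys and shapes.
    """
    avg = None
    for path in paths:
        sd = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in sd.items()}
        else:
            for k, v in sd.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}

# Hypothetical usage:
# souped = average_checkpoints(["ckpt1.pt", "ckpt2.pt", "ckpt3.pt"])
```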
Usage: `python train_orpo.py --config configs/config_0.6b.yaml`
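For orientation, the training step likely reduces to TRL's `ORPOTrainer` plus a PEFT LoRA config. This is a sketch under that assumption, not the repo's actual code; the model name, dataset, and every hyperparameter below are placeholders rather than values from the YAML config:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "Qwen/Qwen3-0.6B"  # placeholder; the real name would come from the config
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Any preference dataset with chosen/rejected response pairs; this one is a placeholder.
train_ds = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
args = ORPOConfig(output_dir="checkpoints", beta=0.1, per_device_train_batch_size=2)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,  # called `tokenizer=` in older TRL releases
    peft_config=lora_cfg,
)
trainer.train()
```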
ORPO (Odds Ratio Preference Optimization). A reference-model-free preference objective: for each (x, y⁺, y⁻) triple, it adds a log-odds term that boosts the likelihood of the chosen response and penalizes the rejected one, so alignment happens in a single SFT-style stage with no PPO/DPO loop and no separate reference model (Hong et al., 2024, arXiv:2403.07691).
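Concretely, writing odds(y|x) = p(y|x) / (1 − p(y|x)) with p taken as the exponential of the mean per-token log-probability, the objective is the chosen-response NLL plus a penalty on the log odds ratio. A minimal sketch of that term, following the paper's formulation rather than this repo's code:

```python
import torch
import torch.nn.functional as F

def orpo_loss(nll_chosen, chosen_logps, rejected_logps, beta=0.1):
    """ORPO objective: SFT loss on the chosen response + beta * odds-ratio term.

    `chosen_logps` / `rejected_logps` are mean per-token log-probs (all < 0),
    so exp(logp) is a probability and log-odds(y|x) = logp - log(1 - exp(logp)).
    """
    log_odds = (chosen_logps - rejected_logps) - (
        torch.log1p(-torch.exp(chosen_logps)) - torch.log1p(-torch.exp(rejected_logps))
    )
    # -log sigmoid(log_odds) shrinks as the chosen response becomes more likely
    # than the rejected one.
    return nll_chosen - beta * F.logsigmoid(log_odds).mean()
```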
LoRA (Low-Rank Adaptation). Keeps the pretrained weights frozen and learns tiny low-rank update matrices on selected layers. These adapters are small, swappable, and can be merged into the base model for export (supported by Hugging Face's PEFT library).
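A sketch of both halves of that lifecycle with Hugging Face PEFT; the model name, adapter path, and rank are illustrative, not taken from this repo:

```python
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import AutoModelForCausalLM

# Training side: inject trainable low-rank adapters; the base weights stay frozen.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")  # placeholder model
model = get_peft_model(base, LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
))
model.print_trainable_parameters()  # typically well under 1% of all parameters

# Export side: load a trained adapter onto a fresh base and fold it into the weights.
fresh = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
merged = PeftModel.from_pretrained(fresh, "checkpoints/adapter").merge_and_unload()
merged.save_pretrained("merged-model")
```

After `merge_and_unload()` the result is a plain `transformers` model, so it exports like any full checkpoint.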