Replies: 2 comments
-
Hi @merhovon, thank you for your kind words. Line 359 in 33b6366 is the main entry point: I think that with some hydra machinery, something similar to what you're asking can be achieved. Everything runs through hydra, so configs must come from there.
-
Hi @belerico, thanks again for pointing me to the entry point. Here is what I put together:

```python
from hydra import initialize, compose
from omegaconf import DictConfig

from sheeprl.cli import run


def train_dreamerv3_gym():
    with initialize(config_path="conf"):
        cfg: DictConfig = compose(
            config_name="config",
            overrides=[
                "algo=dreamerv3",            # world-model-based training
                "env=gym",                   # use gym/Gymnasium backend
                "env_id=CartPole-v1",        # your chosen gym env
                "runner.total_steps=25000",  # number of env interactions
                "runner.device=cuda",        # or "cpu"
            ],
        )
        run(cfg)


if __name__ == "__main__":
    train_dreamerv3_gym()
```

A few quick checks:
Thanks for confirming, so I can integrate this directly into my Python training script!
-
Hi SheepRL team,
First of all, thank you for creating such a clean, Lightning-Fabric–powered RL library! I’ve been using it alongside Gymnasium and really appreciate having DreamerV3 natively included.
In Stable Baselines3 I’m used to a very straightforward Python workflow:
I simply call model.learn(); it handles the training loop on my vectorized Gym env, and I can pull in callbacks, TensorBoard logging, etc.
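For concreteness, a minimal version of that SB3 workflow looks like this (standard SB3 API, shown only as the reference point I'd like a SheepRL equivalent of):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

# Periodically save checkpoints during training via a callback
checkpoint_cb = CheckpointCallback(save_freq=10_000, save_path="./checkpoints")

# Single call drives the whole training loop
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=100_000, callback=checkpoint_cb)
```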
🐑 My question / request
I’d like to invoke SheepRL entirely from Python—without having to drop into the sheeprl CLI tool—for both PPO and DreamerV3. Ideally I’d write something like:
```python
from sheeprl import Experiment
```
…and for DreamerV3 something analogous:
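Something in this spirit, purely as a sketch of the API shape I'm imagining (the Experiment class and every argument below are my own guesses, not SheepRL's real interface):

```python
# Hypothetical sketch only: these names are invented for illustration and
# almost certainly differ from SheepRL's actual Python API.
from sheeprl import Experiment

exp = Experiment(
    algo="dreamer_v3",        # which algorithm to run
    env_id="CartPole-v1",     # Gymnasium environment id
    num_envs=4,               # parallel environments
    total_steps=25_000,       # environment interactions
    device="cuda",            # or "cpu"
)
exp.learn()                              # SB3-style single training call
exp.save("dreamer_v3_cartpole.ckpt")     # checkpoint to disk
```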
However, I haven’t found a minimal, working example in the docs under howto/ that shows me the exact Python-API calls (imports, parameters, callbacks, saving checkpoints, etc.) for these setups.
Could you please:
1. Confirm whether SheepRL's Python API supports this kind of direct "single-call" experiment launch without going through Bash?
2. Provide minimal code snippets for both a PPO experiment and a DreamerV3 experiment, equivalent to SB3's learn() interface (the SB3-style sketch after this list shows the kind of thing I mean), including how to:
   - Define the environment in Python (Gymnasium)
   - Configure the number of environments, timesteps/steps, and device
   - Attach TensorBoard logging or callbacks
   - Save and load policies/checkpoints
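To be explicit about what each bullet covers, this is how I do all four in SB3 today (again just the reference pattern, not a claim about SheepRL's API):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# Environment defined in Python with Gymnasium, vectorized across n_envs workers
vec_env = make_vec_env("CartPole-v1", n_envs=4)

# Timesteps and device, plus TensorBoard logging
model = PPO("MlpPolicy", vec_env, device="cuda", tensorboard_log="./tb_logs")
model.learn(total_timesteps=100_000)

# Save and reload the policy/checkpoint
model.save("ppo_cartpole")
model = PPO.load("ppo_cartpole", env=vec_env)
```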
Thank you in advance for your help and for maintaining SheepRL! 😊
Environment:
SheepRL v0.5.7
Python 3.10