This project implements a self-driving car agent using Deep Q-Network (DQN) in a simulated highway environment. The agent learns to navigate through traffic while maintaining safety and efficiency.
- Deep Q-Network Implementation: Robust DQN agent for autonomous driving
- Parallel Training: Efficient training using multiple environments (VecEnv); see the sketch after this list
- Custom Reward Shaping: Balanced reward system for safe and efficient driving (collision penalty, speed bonus, right-lane bonus, lane-change penalty)
- Real-time Visualization: Interactive Pygame visualization of the trained agent
- Progress Tracking: Comprehensive metrics and performance visualization (training curves, evaluation results, action distribution, episode statistics)
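As a quick illustration of the parallel-training feature, the snippet below (a sketch only; the actual setup lives in `main.py`) builds several `highway-v0` copies behind Stable-Baselines3's vectorized-environment interface:

```python
import highway_env  # noqa: F401  # registers highway-v0 (older releases may need highway_env.register_highway_envs())
from stable_baselines3.common.env_util import make_vec_env

# Four copies of highway-v0 behind a single vectorized interface;
# main.py exposes this count through the --n_envs argument.
vec_env = make_vec_env("highway-v0", n_envs=4)
```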
```
.
├── configs/
│   └── env_config.py      # Environment and training configuration
├── utils/
│   └── viz.py             # Visualization utilities (Matplotlib)
├── main.py                # Main training/evaluation script
├── simulate.py            # Real-time Pygame simulation of trained model
├── requirements.txt       # Project dependencies
└── README.md              # This file
```
The project uses the highway-v0 environment from Highway-Env, featuring:
- Multi-lane highway (4 lanes by default) with dynamic traffic and realistic physics (IDM longitudinal model + MOBIL lane changes)
- Kinematics observation: each vehicle (ego + up to 5 nearest neighbors) is described by a 5-dimensional feature vector
- Discrete action space (`DiscreteMetaAction`): {LANE_LEFT, LANE_RIGHT, FASTER, SLOWER, IDLE}
- Episode duration: 40 seconds (simulated at 15 Hz → 600 steps max) or ends upon collision
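For reference, the environment described above can be configured roughly as follows (a sketch with illustrative values; the project's actual settings are defined in `configs/env_config.py`):

```python
import gymnasium as gym
import highway_env  # noqa: F401  # registers highway-v0

# Illustrative configuration mirroring the description above (values are placeholders).
config = {
    "observation": {
        "type": "Kinematics",                  # 5 features per vehicle
        "vehicles_count": 6,                   # ego + 5 nearest neighbors
        "features": ["presence", "x", "y", "vx", "vy"],
    },
    "action": {"type": "DiscreteMetaAction"},  # LANE_LEFT, IDLE, LANE_RIGHT, FASTER, SLOWER
    "lanes_count": 4,
    "duration": 40,                            # seconds
    "policy_frequency": 15,                    # decisions per second -> up to 600 steps
    # Reward-shaping weights (placeholders):
    "collision_reward": -1.0,
    "high_speed_reward": 0.4,
    "right_lane_reward": 0.1,
    "lane_change_reward": -0.05,
}

env = gym.make("highway-v0")
env.unwrapped.configure(config)
obs, info = env.reset()
```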
- Create and activate a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
# or
.venv\Scripts\activate      # Windows
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
To train the agent:
```bash
python main.py --train
```
Optional arguments:
- `--n_envs`: Number of parallel Gym environments to use (default: 4; recommended ≤ CPU core count)
- `--no_render`: Disable rendering during training (not typically needed)
Training proceeds in intervals of 2,000 steps (100,000 timesteps total by default), evaluating the policy every 2,000 steps over 10 episodes. The best model (by mean evaluation reward) is saved as `results/run_<YYYYMMDD_HHMMSS>/best_model.zip`. After training completes (or stops early once the mean evaluation reward reaches 200), the final model is also saved as `results/run_<YYYYMMDD_HHMMSS>/final_model.zip`. A plot of training progress (`training_progress.png`) is generated automatically and stored in the same `results/run_<YYYYMMDD_HHMMSS>/` folder.
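The interval-based evaluation, best-model checkpointing, and early stopping described above map naturally onto Stable-Baselines3 callbacks. A sketch (paths and hyperparameters are placeholders, not the project's exact values) could look like:

```python
import highway_env  # noqa: F401  # registers highway-v0
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold
from stable_baselines3.common.env_util import make_vec_env

train_env = make_vec_env("highway-v0", n_envs=4)
eval_env = make_vec_env("highway-v0", n_envs=1)

# Stop training once the mean evaluation reward reaches 200.
stop_on_threshold = StopTrainingOnRewardThreshold(reward_threshold=200, verbose=1)

# Evaluate every 2,000 steps on 10 episodes and keep the best checkpoint (best_model.zip).
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="results/run_example",
    eval_freq=2_000,
    n_eval_episodes=10,
    callback_on_new_best=stop_on_threshold,
)

model = DQN("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=100_000, callback=eval_callback)
model.save("results/run_example/final_model")
```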
To evaluate a pre-trained model (without retraining), run:
```bash
python main.py --evaluate --model_path results/run_YYYYMMDD_HHMMSS/best_model.zip
```
Optional arguments:
- `--no_render`: Disable rendering (`env.render` calls) during evaluation
- `--n_envs`: Ignored in evaluation mode
Evaluation runs a single environment for up to 1,000 steps or until termination, collecting:
- Total cumulative reward
- Average reward per step
- Number of collisions
- Action distribution
Evaluation generates the following output files:
- `results/run_<timestamp>/evaluation_results.png`
- `results/run_<timestamp>/action_distribution.png`
- `results/run_<timestamp>/episode_stats.txt`
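A sketch of such an evaluation loop (assuming highway-env's `crashed` flag in `info`; the actual logic in `main.py` may differ):

```python
import gymnasium as gym
import highway_env  # noqa: F401  # registers highway-v0
from stable_baselines3 import DQN

model = DQN.load("results/run_YYYYMMDD_HHMMSS/best_model.zip")
env = gym.make("highway-v0", render_mode="human")

obs, info = env.reset()
rewards, actions, collisions = [], [], 0

for step in range(1_000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(int(action))
    rewards.append(float(reward))
    actions.append(int(action))
    if info.get("crashed", False):
        collisions += 1
    if terminated or truncated:
        break

print(f"Total reward: {sum(rewards):.2f}  "
      f"Average reward/step: {sum(rewards) / len(rewards):.3f}  "
      f"Collisions: {collisions}")
```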
To watch the trained agent act in real time (via Pygame), run:
```bash
python simulate.py --model results/run_YYYYMMDD_HHMMSS/best_model.zip
```
The visualization includes:
- Green vehicle: Ego vehicle (the trained agent)
- Red vehicles: Other traffic
- Purple vehicles: Merging traffic (lateral movement)
- Gray horizontal lines: Lane boundaries
- On-screen stats (top-left): Current step, cumulative reward, collision count
Controls:
- Press 'R' to reset the episode (new random initialization)
- Close window to end simulation
The `viz.py` module contains functions to generate plots and save statistics:
- Training Progress: Plots mean reward over time (`plot_training_progress(timestamps, mean_rewards, output_dir)`)
  - Inputs:
    - `timestamps`: List of timesteps at which evaluation occurred
    - `mean_rewards`: Corresponding mean evaluation rewards
  - Output: Saves `training_progress.png` in `output_dir = results/run_YYYYMMDD_HHMMSS`
- Evaluation Results: Shows cumulative reward during evaluation (`plot_evaluation_results(rewards, output_dir)`)
  - Inputs:
    - `rewards`: List of rewards per step (evaluation run)
  - Output: Saves `evaluation_results.png` (cumulative reward vs. timestep)
- Action Distribution: Displays the agent's action preferences (`plot_action_distribution(actions, output_dir)`)
  - Inputs:
    - `actions`: List of discrete actions taken by the agent (evaluation run)
  - Output: Saves `action_distribution.png`
- Episode Statistics: Saves detailed performance metrics (`save_episode_stats(rewards, output_dir)`)
  - Inputs:
    - `rewards`: List of rewards per step
  - Output: Saves `episode_stats.txt` containing:
    ```
    Total Reward: <sum of rewards>
    Average Reward: <mean>
    Std Reward: <standard deviation>
    Min Reward: <minimum>
    Max Reward: <maximum>
    Episode Length: <len(rewards)>
    ```
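Hypothetical implementations of these four helpers (sketches only, assuming Matplotlib and NumPy; the actual `viz.py` code may differ in details and styling):

```python
import os
import matplotlib.pyplot as plt
import numpy as np

def plot_training_progress(timestamps, mean_rewards, output_dir):
    """Mean evaluation reward vs. training timesteps."""
    fig, ax = plt.subplots()
    ax.plot(timestamps, mean_rewards, marker="o")
    ax.set_xlabel("Timesteps")
    ax.set_ylabel("Mean evaluation reward")
    ax.grid(True)
    fig.savefig(os.path.join(output_dir, "training_progress.png"))
    plt.close(fig)

def plot_evaluation_results(rewards, output_dir):
    """Cumulative reward vs. evaluation timestep."""
    fig, ax = plt.subplots()
    ax.plot(np.cumsum(rewards))
    ax.set_xlabel("Timestep")
    ax.set_ylabel("Cumulative reward")
    ax.grid(True)
    fig.savefig(os.path.join(output_dir, "evaluation_results.png"))
    plt.close(fig)

def plot_action_distribution(actions, output_dir):
    """Histogram of the discrete actions taken during evaluation."""
    # Default DiscreteMetaAction ordering in highway-env.
    labels = ["LANE_LEFT", "IDLE", "LANE_RIGHT", "FASTER", "SLOWER"]
    counts = np.bincount(actions, minlength=len(labels))
    fig, ax = plt.subplots()
    ax.bar(labels, counts)
    ax.set_ylabel("Count")
    fig.savefig(os.path.join(output_dir, "action_distribution.png"))
    plt.close(fig)

def save_episode_stats(rewards, output_dir):
    """Numeric summary of per-step rewards, written to episode_stats.txt."""
    r = np.asarray(rewards, dtype=float)
    lines = [
        f"Total Reward: {r.sum():.3f}",
        f"Average Reward: {r.mean():.3f}",
        f"Std Reward: {r.std():.3f}",
        f"Min Reward: {r.min():.3f}",
        f"Max Reward: {r.max():.3f}",
        f"Episode Length: {len(r)}",
    ]
    with open(os.path.join(output_dir, "episode_stats.txt"), "w") as f:
        f.write("\n".join(lines) + "\n")
```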
Results are automatically saved in the results directory with timestamped folders (results/run_<timestamp>/).
Training results and visualizations are saved in:
```
results/
└── run_YYYYMMDD_HHMMSS/
    ├── training_progress.png
    ├── evaluation_results.png
    ├── action_distribution.png
    ├── episode_stats.txt
    ├── best_model.zip
    └── final_model.zip
```
- `training_progress.png`: Mean evaluation reward vs. timesteps
- `best_model.zip`: Saved DQN weights from the best evaluation checkpoint
- `final_model.zip`: Final DQN weights after training
- `evaluation_results.png`: Cumulative reward curve for a 1,000-step evaluation run
- `action_distribution.png`: Histogram of actions taken by the agent during evaluation
- `episode_stats.txt`: Numeric summary (total, average, std, min, max rewards; episode length)
- SAMI Ayoub - ayoub.sami@etu.toulouse-inp.fr
- HAGROUF Abdellatif - abdellatif.hagrouf@etu.toulouse-inp.fr
- ERRACHIDI Abdelghafour - abdelghafour.errachidi@etu.toulouse-inp.fr
- ZOUARI Sami - sami.zouari@etu.inp-n7.fr
- Highway-Env (E. Leurent) for providing the simulation environment
- Stable-Baselines3 for the DQN implementation
- Pygame for enabling real-time interactive visualization
- `configs/env_config.py`:
  - Modify observation features, reward weights, IDM/MOBIL parameters, lane count, etc.
  - Adjust training hyperparameters (`learning_rate`, `buffer_size`, `batch_size`, `gamma`, `exploration_fraction`, etc.) under `TRAIN_CONFIG`; see the sketch after this list
- Command-line arguments in `main.py`:
  - `--train` / `--evaluate`
  - `--model_path` (for evaluation)
  - `--no_render` (to disable rendering)
  - `--n_envs` (number of parallel environments during training)
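As an illustration only, `configs/env_config.py` might be organized along these lines (all names other than `TRAIN_CONFIG` and all values are placeholders, not the project's actual settings):

```python
# configs/env_config.py -- hypothetical layout with placeholder values.

# Environment settings (observation, rewards, traffic); the dict name is illustrative.
ENV_CONFIG = {
    "observation": {"type": "Kinematics", "vehicles_count": 6},
    "action": {"type": "DiscreteMetaAction"},
    "lanes_count": 4,
    "duration": 40,
    "collision_reward": -1.0,
    "high_speed_reward": 0.4,
    "right_lane_reward": 0.1,
    "lane_change_reward": -0.05,
}

# DQN training hyperparameters consumed by main.py.
TRAIN_CONFIG = {
    "learning_rate": 5e-4,
    "buffer_size": 15_000,
    "batch_size": 32,
    "gamma": 0.8,
    "exploration_fraction": 0.7,
    "total_timesteps": 100_000,
}
```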
The project may display several warnings during execution, primarily related to:
- Environment registration (from gymnasium)
- Deprecated environment configuration methods
- Package deprecation notices (e.g., pkg_resources)
These warnings don't affect functionality and can be suppressed using any of these methods:
- Command line: `python -W ignore main.py --train`
- Environment variable: Set `PYTHONWARNINGS=ignore`
- Code-level: Use `warnings.filterwarnings('ignore')`
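For the code-level option, something like the following near the top of `main.py` would do (a sketch):

```python
# Suppress non-critical warnings before the noisy imports run.
import warnings
warnings.filterwarnings("ignore")

import os
os.environ.setdefault("PYTHONWARNINGS", "ignore")  # inherited by any worker subprocesses
```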
- gymnasium>=0.29.1
- highway-env>=1.8.1
- stable-baselines3>=2.1.0
- matplotlib>=3.7.0
- numpy>=1.26.4
- torch>=2.7.0
- tqdm>=4.65.0