This project implements a self-driving car agent using Deep Q-Network (DQN) in a simulated highway environment. The agent learns to navigate through traffic while maintaining safety and efficiency.
- Deep Q-Network Implementation: Robust DQN agent for autonomous driving
- Parallel Training: Efficient training using multiple environments (VecEnv); see the sketch after this list
- Custom Reward Shaping: Balanced reward system for safe and efficient driving (collision penalty, speed bonus, right-lane bonus, lane-change penalty)
- Real-time Visualization: Interactive Pygame visualization of the trained agent
- Progress Tracking: Comprehensive metrics and performance visualization (training curves, evaluation results, action distribution, episode statistics)
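As a quick illustration of the parallel-training feature, the snippet below (a sketch only; the actual setup lives in `main.py`) builds several `highway-v0` copies behind Stable-Baselines3's vectorized-environment interface:

```python
import highway_env  # noqa: F401  # registers highway-v0 (older releases may need highway_env.register_highway_envs())
from stable_baselines3.common.env_util import make_vec_env

# Four copies of highway-v0 behind a single vectorized interface;
# main.py exposes this count through the --n_envs argument.
vec_env = make_vec_env("highway-v0", n_envs=4)
```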
```
.
├── configs/
│   └── env_config.py      # Environment and training configuration
├── utils/
│   └── viz.py             # Visualization utilities (Matplotlib)
├── main.py                # Main training/evaluation script
├── simulate.py            # Real-time Pygame simulation of trained model
├── requirements.txt       # Project dependencies
└── README.md              # This file
```
The project uses the highway-v0 environment from Highway-Env, featuring:
- Multi-lane highway (4 lanes by default) with dynamic traffic and realistic physics (IDM longitudinal model + MOBIL lane changes)
- Kinematics observation: each vehicle (ego + up to 5 nearest neighbors) is described by a 5-dimensional feature vector
- Discrete action space (`DiscreteMetaAction`): {LANE_LEFT, LANE_RIGHT, FASTER, SLOWER, IDLE}
- Episode duration: 40 seconds (simulated at 15 Hz → 600 steps max) or ends upon collision
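For reference, the environment described above can be configured roughly as follows (a sketch with illustrative values; the project's actual settings are defined in `configs/env_config.py`):

```python
import gymnasium as gym
import highway_env  # noqa: F401  # registers highway-v0

# Illustrative configuration mirroring the description above (values are placeholders).
config = {
    "observation": {
        "type": "Kinematics",                  # 5 features per vehicle
        "vehicles_count": 6,                   # ego + 5 nearest neighbors
        "features": ["presence", "x", "y", "vx", "vy"],
    },
    "action": {"type": "DiscreteMetaAction"},  # LANE_LEFT, IDLE, LANE_RIGHT, FASTER, SLOWER
    "lanes_count": 4,
    "duration": 40,                            # seconds
    "policy_frequency": 15,                    # decisions per second -> up to 600 steps
    # Reward-shaping weights (placeholders):
    "collision_reward": -1.0,
    "high_speed_reward": 0.4,
    "right_lane_reward": 0.1,
    "lane_change_reward": -0.05,
}

env = gym.make("highway-v0")
env.unwrapped.configure(config)
obs, info = env.reset()
```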
- Create and activate a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate   # Linux/Mac
# or
.venv\Scripts\activate      # Windows
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
To train the agent:
```bash
python main.py --train
```
Optional arguments:
- `--n_envs`: Number of parallel Gym environments to use (default: 4; recommended ≤ CPU core count)
- `--no_render`: Disable rendering during training (not typically needed)
Training proceeds in intervals of 2,000 steps (100,000 timesteps total by default), evaluating the policy every 2,000 steps over 10 episodes. The best model (by mean evaluation reward) is saved as `results/run_<YYYYMMDD_HHMMSS>/best_model.zip`. After training completes (or stops early once the mean evaluation reward reaches 200), the final model is also saved as `results/run_<YYYYMMDD_HHMMSS>/final_model.zip`. A plot of training progress (`training_progress.png`) is generated automatically and stored in the same `results/run_<YYYYMMDD_HHMMSS>/` folder.
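The interval-based evaluation, best-model checkpointing, and early stopping described above map naturally onto Stable-Baselines3 callbacks. A sketch (paths and hyperparameters are placeholders, not the project's exact values) could look like:

```python
import highway_env  # noqa: F401  # registers highway-v0
from stable_baselines3 import DQN
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold
from stable_baselines3.common.env_util import make_vec_env

train_env = make_vec_env("highway-v0", n_envs=4)
eval_env = make_vec_env("highway-v0", n_envs=1)

# Stop training once the mean evaluation reward reaches 200.
stop_on_threshold = StopTrainingOnRewardThreshold(reward_threshold=200, verbose=1)

# Evaluate every 2,000 steps on 10 episodes and keep the best checkpoint (best_model.zip).
eval_callback = EvalCallback(
    eval_env,
    best_model_save_path="results/run_example",
    eval_freq=2_000,
    n_eval_episodes=10,
    callback_on_new_best=stop_on_threshold,
)

model = DQN("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=100_000, callback=eval_callback)
model.save("results/run_example/final_model")
```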
To evaluate a pre-trained model (without retraining), run:
```bash
python main.py --evaluate --model_path results/run_YYYYMMDD_HHMMSS/best_model.zip
```
Optional arguments:
- `--no_render`: Disable rendering (`env.render` calls) during evaluation
- `--n_envs`: Ignored in evaluation mode
Evaluation runs a single environment for up to 1,000 steps or until termination, collecting:
- Total cumulative reward
- Average reward per step
- Number of collisions
- Action distribution
Evaluation generates the following output files:
- `results/run_<timestamp>/evaluation_results.png`
- `results/run_<timestamp>/action_distribution.png`
- `results/run_<timestamp>/episode_stats.txt`
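A sketch of such an evaluation loop (assuming highway-env's `crashed` flag in `info`; the actual logic in `main.py` may differ):

```python
import gymnasium as gym
import highway_env  # noqa: F401  # registers highway-v0
from stable_baselines3 import DQN

model = DQN.load("results/run_YYYYMMDD_HHMMSS/best_model.zip")
env = gym.make("highway-v0", render_mode="human")

obs, info = env.reset()
rewards, actions, collisions = [], [], 0

for step in range(1_000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(int(action))
    rewards.append(float(reward))
    actions.append(int(action))
    if info.get("crashed", False):
        collisions += 1
    if terminated or truncated:
        break

print(f"Total reward: {sum(rewards):.2f}  "
      f"Average reward/step: {sum(rewards) / len(rewards):.3f}  "
      f"Collisions: {collisions}")
```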
To watch the trained agent act in real time (via Pygame), run:
```bash
python simulate.py --model results/run_YYYYMMDD_HHMMSS/best_model.zip
```
The visualization includes:
- Green vehicle: Ego vehicle (the trained agent)
- Red vehicles: Other traffic
- Purple vehicles: Merging traffic (lateral movement)
- Gray horizontal lines: Lane boundaries
- On-screen stats (top-left): Current step, cumulative reward, collision count
Controls:
- Press 'R' to reset the episode (new random initialization)
- Close window to end simulation
The `viz.py` module contains functions to generate plots and save statistics:
- Training Progress: Plots mean reward over time (`plot_training_progress(timestamps, mean_rewards, output_dir)`)
  - Inputs:
    - `timestamps`: List of timesteps at which evaluation occurred
    - `mean_rewards`: Corresponding mean evaluation rewards
  - Output: Saves `training_progress.png` in `output_dir = results/run_YYYYMMDD_HHMMSS`
- Evaluation Results: Shows cumulative reward during evaluation (`plot_evaluation_results(rewards, output_dir)`)
  - Inputs:
    - `rewards`: List of rewards per step (evaluation run)
  - Output: Saves `evaluation_results.png` (cumulative reward vs. timestep)
- Action Distribution: Displays the agent's action preferences (`plot_action_distribution(actions, output_dir)`)
  - Inputs:
    - `actions`: List of discrete actions taken by the agent (evaluation run)
  - Output: Saves `action_distribution.png`
- Episode Statistics: Saves detailed performance metrics (`save_episode_stats(rewards, output_dir)`)
  - Inputs:
    - `rewards`: List of rewards per step
  - Output: Saves `episode_stats.txt` containing:
    ```
    Total Reward: <sum of rewards>
    Average Reward: <mean>
    Std Reward: <standard deviation>
    Min Reward: <minimum>
    Max Reward: <maximum>
    Episode Length: <len(rewards)>
    ```
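Hypothetical implementations of these four helpers (sketches only, assuming Matplotlib and NumPy; the actual `viz.py` code may differ in details and styling):

```python
import os
import matplotlib.pyplot as plt
import numpy as np

def plot_training_progress(timestamps, mean_rewards, output_dir):
    """Mean evaluation reward vs. training timesteps."""
    fig, ax = plt.subplots()
    ax.plot(timestamps, mean_rewards, marker="o")
    ax.set_xlabel("Timesteps")
    ax.set_ylabel("Mean evaluation reward")
    ax.grid(True)
    fig.savefig(os.path.join(output_dir, "training_progress.png"))
    plt.close(fig)

def plot_evaluation_results(rewards, output_dir):
    """Cumulative reward vs. evaluation timestep."""
    fig, ax = plt.subplots()
    ax.plot(np.cumsum(rewards))
    ax.set_xlabel("Timestep")
    ax.set_ylabel("Cumulative reward")
    ax.grid(True)
    fig.savefig(os.path.join(output_dir, "evaluation_results.png"))
    plt.close(fig)

def plot_action_distribution(actions, output_dir):
    """Histogram of the discrete actions taken during evaluation."""
    # Default DiscreteMetaAction ordering in highway-env.
    labels = ["LANE_LEFT", "IDLE", "LANE_RIGHT", "FASTER", "SLOWER"]
    counts = np.bincount(actions, minlength=len(labels))
    fig, ax = plt.subplots()
    ax.bar(labels, counts)
    ax.set_ylabel("Count")
    fig.savefig(os.path.join(output_dir, "action_distribution.png"))
    plt.close(fig)

def save_episode_stats(rewards, output_dir):
    """Numeric summary of per-step rewards, written to episode_stats.txt."""
    r = np.asarray(rewards, dtype=float)
    lines = [
        f"Total Reward: {r.sum():.3f}",
        f"Average Reward: {r.mean():.3f}",
        f"Std Reward: {r.std():.3f}",
        f"Min Reward: {r.min():.3f}",
        f"Max Reward: {r.max():.3f}",
        f"Episode Length: {len(r)}",
    ]
    with open(os.path.join(output_dir, "episode_stats.txt"), "w") as f:
        f.write("\n".join(lines) + "\n")
```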
Results are automatically saved in the results directory with timestamped folders (results/run_<timestamp>/).
Training results and visualizations are saved in:
```
results/
└── run_YYYYMMDD_HHMMSS/
    ├── training_progress.png
    ├── evaluation_results.png
    ├── action_distribution.png
    ├── episode_stats.txt
    ├── best_model.zip
    └── final_model.zip
```
- `training_progress.png`: Mean evaluation reward vs. timesteps
- `best_model.zip`: Saved DQN weights from the best evaluation checkpoint
- `final_model.zip`: Final DQN weights after training
- `evaluation_results.png`: Cumulative reward curve for a 1,000-step evaluation run
- `action_distribution.png`: Histogram of actions taken by the agent during evaluation
- `episode_stats.txt`: Numeric summary (total, average, std, min, max rewards; episode length)
- SAMI Ayoub - ayoub.sami@etu.toulouse-inp.fr
- HAGROUF Abdellatif - abdellatif.hagrouf@etu.toulouse-inp.fr
- ERRACHIDI Abdelghafour - abdelghafour.errachidi@etu.toulouse-inp.fr
- ZOUARI Sami - sami.zouari@etu.inp-n7.fr
- Highway-Env (E. Leurent) for providing the simulation environment
- Stable-Baselines3 for the DQN implementation
- Pygame for enabling real-time interactive visualization
- `configs/env_config.py`:
  - Modify observation features, reward weights, IDM/MOBIL parameters, lane count, etc.
  - Adjust training hyperparameters (`learning_rate`, `buffer_size`, `batch_size`, `gamma`, `exploration_fraction`, etc.) under `TRAIN_CONFIG`; see the sketch after this list
- Command-line arguments in `main.py`:
  - `--train` / `--evaluate`
  - `--model_path` (for evaluation)
  - `--no_render` (to disable rendering)
  - `--n_envs` (number of parallel environments during training)
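As an illustration only, `configs/env_config.py` might be organized along these lines (all names other than `TRAIN_CONFIG` and all values are placeholders, not the project's actual settings):

```python
# configs/env_config.py -- hypothetical layout with placeholder values.

# Environment settings (observation, rewards, traffic); the dict name is illustrative.
ENV_CONFIG = {
    "observation": {"type": "Kinematics", "vehicles_count": 6},
    "action": {"type": "DiscreteMetaAction"},
    "lanes_count": 4,
    "duration": 40,
    "collision_reward": -1.0,
    "high_speed_reward": 0.4,
    "right_lane_reward": 0.1,
    "lane_change_reward": -0.05,
}

# DQN training hyperparameters consumed by main.py.
TRAIN_CONFIG = {
    "learning_rate": 5e-4,
    "buffer_size": 15_000,
    "batch_size": 32,
    "gamma": 0.8,
    "exploration_fraction": 0.7,
    "total_timesteps": 100_000,
}
```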
The project may display several warnings during execution, primarily related to:
- Environment registration (from gymnasium)
- Deprecated environment configuration methods
- Package deprecation notices (e.g., pkg_resources)
These warnings don't affect functionality and can be suppressed using any of these methods:
- Command line: `python -W ignore main.py --train`
- Environment variable: Set `PYTHONWARNINGS=ignore`
- Code-level: Use `warnings.filterwarnings('ignore')`
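For the code-level option, something like the following near the top of `main.py` would do (a sketch):

```python
# Suppress non-critical warnings before the noisy imports run.
import warnings
warnings.filterwarnings("ignore")

import os
os.environ.setdefault("PYTHONWARNINGS", "ignore")  # inherited by any worker subprocesses
```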
- gymnasium>=0.29.1
- highway-env>=1.8.1
- stable-baselines3>=2.1.0
- matplotlib>=3.7.0
- numpy>=1.26.4
- torch>=2.7.0
- tqdm>=4.65.0