
Highway Environment Deep Reinforcement Learning Project

This project implements a self-driving car agent using Deep Q-Network (DQN) in a simulated highway environment. The agent learns to navigate through traffic while maintaining safety and efficiency.

Features

  • Deep Q-Network Implementation: Robust DQN agent for autonomous driving
  • Parallel Training: Efficient training using multiple environments (VecEnv)
  • Custom Reward Shaping: Balanced reward system for safe and efficient driving (collision penalty, speed bonus, right-lane bonus, lane-change penalty)
  • Real-time Visualization: Interactive Pygame visualization of the trained agent
  • Progress Tracking: Comprehensive metrics and performance visualization (training curves, evaluation results, action distribution, episode statistics)

Project Structure

.
├── configs/
│   └── env_config.py      # Environment and training configuration
├── utils/
│   └── viz.py            # Visualization utilities  (Matplotlib)
├── main.py               # Main training/evaluation script
├── simulate.py           # Real-time Pygame simulation of trained model
├── requirements.txt      # Project dependencies
└── README.md            # This file

Environment

The project uses the highway-v0 environment from Highway-Env, featuring:

  • Multi-lane highway with dynamic traffic (4 lanes by default) with realistic physics (IDM longitudinal + MOBIL lane-change)
  • Kinematics observation: each vehicle (ego + up to 5 nearest neighbors) is described by a 5-dimensional feature vector
  • Discrete action space (DiscreteMetaAction): {LANE_LEFT, LANE_RIGHT, FASTER, SLOWER, IDLE}
  • Episode duration: 40 seconds (simulation at 15 Hz → 600 steps max) or ends upon collision
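
As a rough illustration of how these settings map onto highway-env, the snippet below builds the environment from a hand-written config dict. The keys are standard highway-env options, but the values shown are placeholders, not the project's actual settings (those live in configs/env_config.py):

import gymnasium as gym
import highway_env  # noqa: F401  (on some versions, highway_env.register_highway_envs() is needed to register highway-v0)

# Placeholder config for illustration only; the project's real values are in configs/env_config.py.
ENV_CONFIG = {
    "observation": {
        "type": "Kinematics",                            # ego + nearest vehicles as feature rows
        "vehicles_count": 6,                             # ego + 5 nearest neighbors
        "features": ["presence", "x", "y", "vx", "vy"],  # 5 features per vehicle
    },
    "action": {"type": "DiscreteMetaAction"},            # LANE_LEFT, IDLE, LANE_RIGHT, FASTER, SLOWER
    "lanes_count": 4,
    "duration": 40,                                      # seconds
    "simulation_frequency": 15,                          # Hz
    "collision_reward": -1.0,                            # reward-shaping weights (placeholders)
    "high_speed_reward": 0.4,
    "right_lane_reward": 0.1,
    "lane_change_reward": -0.05,
}

env = gym.make("highway-v0")
env.unwrapped.configure(ENV_CONFIG)
obs, info = env.reset()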

Installation

  1. Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate     # Windows

  2. Install dependencies:

pip install -r requirements.txt

Usage

Training

To train the agent:

python main.py --train

Optional arguments:

  • --n_envs: Number of parallel Gym environments to use (default: 4; recommended ≤ CPU core count)
  • --no_render: Disable rendering during training (not typically needed)

Training runs for 100 000 timesteps by default, evaluating the policy every 2 000 steps over 10 episodes. The best model (by mean evaluation reward) is saved as:

results/run_<YYYYMMDD_HHMMSS>/best_model.zip

After training completes (or early-stops when mean reward ≥ 200), the final model is also saved:

results/run_<YYYYMMDD_HHMMSS>/final_model.zip

A plot of training progress (training_progress.png) is automatically generated and stored in the same results/run_<YYYYMMDD_HHMMSS>/ folder.
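
Under the hood, this is essentially a standard Stable-Baselines3 setup. The sketch below is an assumption about how main.py wires things together (the Stable-Baselines3 classes are real; the run directory and hyperparameter values are placeholders):

import gymnasium as gym
import highway_env  # noqa: F401
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.callbacks import EvalCallback

run_dir = "results/run_YYYYMMDD_HHMMSS"           # placeholder; main.py derives this from the current time

train_env = make_vec_env("highway-v0", n_envs=4)  # parallel environments (--n_envs)
eval_env = gym.make("highway-v0")

eval_callback = EvalCallback(
    eval_env,
    best_model_save_path=run_dir,                 # best_model.zip is written here
    eval_freq=2_000,                              # evaluation interval (SB3 counts this per vectorized step call)
    n_eval_episodes=10,
    deterministic=True,
)

model = DQN("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=100_000, callback=eval_callback)
model.save(f"{run_dir}/final_model")              # final_model.zip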

Evaluation

To evaluate a pre-trained model (without retraining), run:

python main.py --evaluate --model_path results/run_YYYYMMDD_HHMMSS/best_model.zip

Optional arguments:

  • --no_render: Disable rendering (env.render calls) during evaluation
  • --n_envs: (Ignored in evaluation mode)

Evaluation will run a single environment for up to 1 000 steps or until termination, collecting:

  • Total cumulative reward

  • Average reward per step

  • Number of collisions

  • Action distribution

Evaluation generates the following files:

results/run_<timestamp>/evaluation_results.png
results/run_<timestamp>/action_distribution.png
results/run_<timestamp>/episode_stats.txt
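
Conceptually, the evaluation loop behaves like the sketch below (a rough stand-in for main.py, not its actual code; the crashed flag is reported by highway-env in the step info dict):

import gymnasium as gym
import highway_env  # noqa: F401
from stable_baselines3 import DQN

# Rough sketch of an evaluation run; main.py's actual bookkeeping may differ.
model = DQN.load("results/run_YYYYMMDD_HHMMSS/best_model.zip")
env = gym.make("highway-v0", render_mode="human")   # omit render_mode when using --no_render

obs, info = env.reset()
rewards, actions, collisions = [], [], 0
for _ in range(1_000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    rewards.append(float(reward))
    actions.append(int(action))
    collisions += int(info.get("crashed", False))   # highway-env reports collisions in info
    if terminated or truncated:
        break

print(f"Total reward: {sum(rewards):.2f} over {len(rewards)} steps, collisions: {collisions}")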

Real-time Visualization

To watch the trained agent act in real time (via Pygame), run:

python simulate.py --model results/run_YYYYMMDD_HHMMSS/best_model.zip

The visualization includes:

  • Green vehicle: Ego (the trained agent)
  • Red vehicles: Other traffic
  • Purple vehicles: Merging traffic (lateral movement)
  • Gray horizontal lines: Lane boundaries
  • On-screen stats (top-left): Current step, cumulative reward, collision count

Controls:

  • Press 'R' to reset the episode (new random initialization)
  • Close window to end simulation

Visualization Tools (utils/viz.py)

The viz.py module contains functions to generate plots and save statistics:

  1. Training Progress: Plots mean evaluation reward over time (plot_training_progress(timestamps, mean_rewards, output_dir))
  • Inputs:
    • timestamps: List of timesteps at which evaluation occurred
    • mean_rewards: Corresponding mean evaluation rewards
  • Output:
    • Saves training_progress.png in output_dir (results/run_YYYYMMDD_HHMMSS)
  2. Evaluation Results: Shows cumulative reward during evaluation (plot_evaluation_results(rewards, output_dir))
  • Inputs:
    • rewards: List of rewards per step (evaluation run)
  • Output:
    • Saves evaluation_results.png (cumulative reward vs. timestep)
  3. Action Distribution: Displays the agent's action preferences (plot_action_distribution(actions, output_dir))
  • Inputs:
    • actions: List of discrete actions taken by the agent (evaluation run)
  • Output:
    • Saves action_distribution.png
  4. Episode Statistics: Saves detailed performance metrics (save_episode_stats(rewards, output_dir))
  • Inputs:
    • rewards: List of rewards per step
  • Output:
    • Saves episode_stats.txt containing:

      Total Reward: <sum of rewards>
      Average Reward: <mean>
      Std Reward: <standard deviation>
      Min Reward: <minimum>
      Max Reward: <maximum>
      Episode Length: <len(rewards)>
      

Results are automatically saved in the results directory with timestamped folders (results/run_<timestamp>/).
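
As a rough sketch (not the project's actual code), the first and last of these helpers might be implemented along these lines:

import os
import numpy as np
import matplotlib.pyplot as plt

def plot_training_progress(timestamps, mean_rewards, output_dir):
    # Plot mean evaluation reward against training timesteps and save the figure.
    plt.figure()
    plt.plot(timestamps, mean_rewards)
    plt.xlabel("Timesteps")
    plt.ylabel("Mean evaluation reward")
    plt.title("Training progress")
    plt.savefig(os.path.join(output_dir, "training_progress.png"))
    plt.close()

def save_episode_stats(rewards, output_dir):
    # Write the summary statistics listed above to episode_stats.txt.
    r = np.asarray(rewards, dtype=float)
    lines = [
        f"Total Reward: {r.sum():.2f}",
        f"Average Reward: {r.mean():.2f}",
        f"Std Reward: {r.std():.2f}",
        f"Min Reward: {r.min():.2f}",
        f"Max Reward: {r.max():.2f}",
        f"Episode Length: {len(r)}",
    ]
    with open(os.path.join(output_dir, "episode_stats.txt"), "w") as f:
        f.write("\n".join(lines))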

Results Directory

Training results and visualizations are saved in:

results/
└── run_YYYYMMDD_HHMMSS/
    ├── training_progress.png
    ├── evaluation_results.png
    ├── action_distribution.png
    ├── episode_stats.txt
    ├── best_model.zip
    └── final_model.zip

  • training_progress.png: Mean evaluation reward vs. timesteps
  • evaluation_results.png: Cumulative reward curve for a 1 000-step evaluation run
  • action_distribution.png: Histogram of actions taken by the agent during evaluation
  • episode_stats.txt: Numeric summary (total, average, std, min, max rewards; length)
  • best_model.zip: Saved DQN weights from the best evaluation checkpoint
  • final_model.zip: Final DQN weights after training

Authors

Acknowledgments

  • Highway-Env (E. Leurent) for providing the simulation environment
  • Stable-Baselines3 for the DQN implementation
  • Pygame for enabling real-time interactive visualization

Configuration

  • configs/env_config.py:

    • Modify observation features, reward weights, IDM/MOBIL parameters, lane count, etc.

    • Adjust training hyperparameters (learning_rate, buffer_size, batch_size, gamma, exploration_fraction, etc.) under TRAIN_CONFIG (see the sketch after this list).

  • Command-line arguments in main.py:

    • --train / --evaluate

    • --model_path (for evaluation)

    • --no_render (to disable rendering)

    • --n_envs (number of parallel environments during training)
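
For orientation, TRAIN_CONFIG might be organized roughly as below; every value here is a placeholder rather than the project's tuned setting (an ENV_CONFIG sketch appears in the Environment section above):

# Hypothetical shape of the training section of configs/env_config.py; values are placeholders.
TRAIN_CONFIG = {
    "learning_rate": 5e-4,
    "buffer_size": 15_000,
    "batch_size": 32,
    "gamma": 0.8,                  # discount factor
    "exploration_fraction": 0.7,   # fraction of training spent annealing epsilon
    "total_timesteps": 100_000,
    "eval_freq": 2_000,
    "n_eval_episodes": 10,
}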

Known Warnings

The project may display several warnings during execution, primarily related to:

  1. Environment registration (from gymnasium)
  2. Deprecated environment configuration methods
  3. Package deprecation notices (e.g., pkg_resources)

These warnings don't affect functionality and can be suppressed using any of these methods:

  1. Command line: python -W ignore main.py --train
  2. Environment variable: Set PYTHONWARNINGS=ignore
  3. Code-level: Using warnings.filterwarnings('ignore')
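
For example, the code-level option can be applied at the top of main.py before other imports:

import warnings

warnings.filterwarnings("ignore")   # silence the registration/deprecation warnings listed above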

Dependencies

  • gymnasium>=0.29.1
  • highway-env>=1.8.1
  • stable-baselines3>=2.1.0
  • matplotlib>=3.7.0
  • numpy>=1.26.4
  • torch>=2.7.0
  • tqdm>=4.65.0
