
APS1080 – Reinforcement Learning (Graduate Course)

This repository contains coursework and assignments for APS1080: Reinforcement Learning, a graduate-level course at the University of Toronto.

📘 Course Overview

APS1080 provides a comprehensive introduction to Reinforcement Learning (RL), covering both classical methods and modern deep RL techniques. The course blends theoretical foundations with hands-on implementation in Python.

Core Topics Covered:

  • RL Fundamentals: Agent-environment setup, Markov Decision Processes (MDPs), rewards, value functions
  • Model-Based Methods: Dynamic Programming for planning and policy improvement
  • Model-Free Methods: Monte Carlo, TD(0), TD(λ), and SARSA
  • Function Approximation: Linear and nonlinear approximation in large state spaces
  • Deep RL: Deep Q-Networks (DQN), policy gradients, and advanced techniques like MuZero
  • Human Feedback and Real-World Considerations: Policy shaping and reinforcement learning from human feedback (RLHF), as used in ChatGPT

Readings are based on Reinforcement Learning: An Introduction (Sutton & Barto, 2nd Edition), Chapters 1–11.


📝 Assignment Summaries

🧮 Assignment 1 – Dynamic Programming

Implemented policy evaluation and improvement using tabular value iteration and policy iteration. Focused on model-based planning in small, discrete MDPs.
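
As a rough illustration of the model-based setting, here is a minimal value-iteration sketch for a tabular MDP. The transition-model format, function names, and hyperparameters are assumptions for illustration, not the repository's code.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.99, tol=1e-8):
    """Value iteration on a known model P, where P[s][a] is assumed to be
    a list of (prob, next_state, reward) triples."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # One-step lookahead: Q(s, a) for every action under the model
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once the value function has converged
            break
    # Extract the greedy policy from the converged values
    policy = np.array([
        int(np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)]))
        for s in range(n_states)
    ])
    return V, policy
```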

🎯 Assignment 2 – Monte Carlo and Temporal-Difference Learning

Applied Monte Carlo and TD(0) methods to the CartPole environment. Compared learning performance and visualized episode rewards.
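
A minimal tabular TD(0) prediction sketch in the same spirit (not the repository's code). It assumes the Gymnasium-style reset/step API and a user-supplied discretize function, since CartPole's observations are continuous.

```python
from collections import defaultdict

def td0_prediction(env, policy, discretize, episodes=500, alpha=0.1, gamma=0.99):
    """Estimate V under a fixed policy; discretize maps an observation
    to a hashable state index (an assumption for CartPole)."""
    V = defaultdict(float)
    for _ in range(episodes):
        obs, _ = env.reset()
        s = discretize(obs)
        done = False
        while not done:
            obs, r, terminated, truncated, _ = env.step(policy(s))
            done = terminated or truncated
            s2 = discretize(obs)
            # TD(0) update: move V(s) toward the bootstrapped target r + γV(s')
            V[s] += alpha * (r + gamma * (0.0 if terminated else V[s2]) - V[s])
            s = s2
    return V
```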

📉 Assignment 3 – Function Approximation

Extended TD learning with linear function approximation for continuous state spaces. Implemented batch updates and evaluated generalization.
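
For context, a semi-gradient TD(0) sketch with a linear value estimator, v(s) ≈ w · x(s). The feature map, policy interface, and hyperparameters are placeholders, not the assignment's actual code.

```python
import numpy as np

def linear_td0(env, policy, features, n_features, episodes=500,
               alpha=0.05, gamma=0.99):
    """Semi-gradient TD(0) with linear function approximation; features(obs)
    is assumed to return an n_features-dimensional numpy vector."""
    w = np.zeros(n_features)
    for _ in range(episodes):
        obs, _ = env.reset()
        x = features(obs)
        done = False
        while not done:
            obs2, r, terminated, truncated, _ = env.step(policy(obs))
            done = terminated or truncated
            x2 = features(obs2)
            target = r + (0.0 if terminated else gamma * np.dot(w, x2))
            # Semi-gradient update: the gradient of w·x(s) w.r.t. w is x(s)
            w += alpha * (target - np.dot(w, x)) * x
            obs, x = obs2, x2
    return w
```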

🤖 Assignment 4 – Deep Q-Networks (DQN)

Built a Deep Q-Network (DQN) using PyTorch. Incorporated experience replay, target networks, and epsilon-greedy exploration in a high-dimensional setting.
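
A condensed PyTorch sketch of the three mechanisms named above (experience replay, a target network, and epsilon-greedy exploration). Network sizes and hyperparameters are CartPole-shaped placeholders rather than the assignment's actual values.

```python
import random
from collections import deque
import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_actions))

    def forward(self, x):
        return self.net(x)

q_net, target_net = QNet(4, 2), QNet(4, 2)   # placeholder sizes
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=50_000)  # replay memory of (s, a, r, s2, done) tuples

def select_action(obs, epsilon):
    # Epsilon-greedy: explore with probability epsilon, otherwise act greedily
    if random.random() < epsilon:
        return random.randrange(2)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax())

def train_step(batch_size=64, gamma=0.99):
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)  # replay a random minibatch
    s, a, r, s2, done = [torch.as_tensor(np.array(x), dtype=torch.float32)
                         for x in zip(*batch)]
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped targets come from the frozen target network
        target = r + gamma * (1.0 - done) * target_net(s2).max(1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```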


📗 Exercise Summaries

🧭 Exercise 1 – RL Foundations and MDPs

Introduced the agent-environment framework, MDPs, and core RL terminology. Explored return computation and value functions conceptually and programmatically.
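
As one concrete example of return computation, a small helper that folds discounted rewards backwards over an episode (illustrative only):

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_{t+1} + γ r_{t+2} + ... for every step of an episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return list(reversed(out))  # out[t] is the return from time step t
```

For example, discounted_returns([1, 1, 1], gamma=0.9) returns [2.71, 1.9, 1.0].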

๐Ÿ•น๏ธ Exercise 2 โ€“ Policy Evaluation in CartPole

Implemented policy rollouts and return tracking in CartPole. Gained experience with OpenAI Gym environments and episode-based analysis.
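
A minimal rollout loop of the kind this exercise involves, written against the Gymnasium API (the maintained successor to OpenAI Gym); the random policy is just a placeholder.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

def rollout(policy):
    """Run one episode under policy and return the undiscounted episode return."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total

# Track returns over several episodes for a (random) placeholder policy
returns = [rollout(lambda obs: env.action_space.sample()) for _ in range(20)]
```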

🔄 Exercise 3 – Monte Carlo and TD in CartPole

Used Monte Carlo and TD(0) methods for value estimation. Compared learning stability, sample efficiency, and convergence characteristics.
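
To contrast with the TD(0) sketch above, here is a first-visit Monte Carlo prediction sketch under the same discretization assumption: it waits for the full episode return instead of bootstrapping. Again illustrative, not the repository's code.

```python
from collections import defaultdict

def mc_prediction(env, policy, discretize, episodes=500, gamma=0.99):
    V, counts = defaultdict(float), defaultdict(int)
    for _ in range(episodes):
        # Generate one complete episode under the policy
        obs, _ = env.reset()
        traj, done = [], False
        while not done:
            s = discretize(obs)
            obs, r, terminated, truncated, _ = env.step(policy(s))
            done = terminated or truncated
            traj.append((s, r))
        # Walk backwards accumulating the return; update on first visits only
        G = 0.0
        for t in range(len(traj) - 1, -1, -1):
            s, r = traj[t]
            G = r + gamma * G
            if s not in {traj[k][0] for k in range(t)}:  # first visit to s
                counts[s] += 1
                V[s] += (G - V[s]) / counts[s]  # incremental mean of returns
    return V
```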

🧮 Exercise 4 – Value Function Approximation

Applied linear function approximation to generalize across continuous states. Built and analyzed value estimators using feature representations.
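
One possible feature representation of the kind this exercise describes: a degree-2 polynomial basis over CartPole's four observation dimensions. The normalization bounds are rough assumptions, and the function is compatible with the linear TD(0) sketch above (n_features=15).

```python
import numpy as np

BOUNDS = np.array([2.4, 3.0, 0.21, 3.5])  # assumed rough |max| per dimension

def poly_features(obs):
    """Bias + linear + pairwise-product features: 1 + 4 + 10 = 15 in total."""
    z = np.asarray(obs) / BOUNDS                 # scale to roughly [-1, 1]
    pairs = np.outer(z, z)[np.triu_indices(4)]   # z_i * z_j for i <= j
    return np.concatenate(([1.0], z, pairs))
```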


โš ๏ธ Academic Integrity

This repository is for educational reference only. If you are currently enrolled in APS1080, please do not copy or submit this work as your own.
