This repository contains coursework and assignments for APS1080: Reinforcement Learning, a graduate-level course at the University of Toronto.
APS1080 provides a comprehensive introduction to Reinforcement Learning (RL), covering both classical methods and modern deep RL techniques. The course blends theoretical foundations with hands-on implementation in Python.
- RL Fundamentals: Agent-environment setup, Markov Decision Processes (MDPs), rewards, value functions
- Model-Based Methods: Dynamic Programming for planning and policy improvement
- Model-Free Methods: Monte Carlo, TD(0), TD(λ), and SARSA
- Function Approximation: Linear and nonlinear approximation in large state spaces
- Deep RL: Deep Q-Networks (DQN), policy gradients, and advanced techniques like MuZero
- Human Feedback and Real-World Considerations: Policy shaping, ChatGPT and RLHF
Readings are based on Reinforcement Learning: An Introduction (Sutton & Barto, 2nd Edition), Chapters 1–11.
Implemented policy evaluation and improvement using tabular value iteration and policy iteration. Focused on model-based planning in small, discrete MDPs.
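The planning loop can be sketched as a minimal tabular value iteration on a hypothetical 3-state MDP (the transition table, rewards, and discount below are illustrative, not taken from the assignment):

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP (all numbers are illustrative).
# P[s][a] -> list of (probability, next_state, reward) transitions.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 2.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing terminal state
}
GAMMA = 0.9

def value_iteration(P, gamma, theta=1e-8):
    V = np.zeros(len(P))
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: V(s) <- max_a sum p * (r + gamma * V(s'))
            best = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # extract the greedy policy from the converged values
    pi = {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                         for p, s2, r in P[s][a]))
          for s in P}
    return V, pi

V, pi = value_iteration(P, GAMMA)
```

Policy iteration differs only in alternating full policy evaluation with greedy improvement; both converge to the same optimal values in a finite MDP.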
Applied Monte Carlo and TD(0) methods to the CartPole environment. Compared learning performance and visualized episode rewards.
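The TD(0) update at the heart of that comparison can be sketched on a small random walk (the 5-state chain, step size, and episode count below are illustrative stand-ins for the CartPole setup):

```python
import random

# Tabular TD(0) on a 5-state random walk: terminate left with reward 0,
# right with reward 1. Constants here are illustrative.
N, ALPHA, GAMMA = 5, 0.1, 1.0

def td0_episode(V):
    s = N // 2  # start in the middle
    while 0 <= s < N:
        s2 = s + random.choice((-1, 1))
        if s2 < 0:
            r, v2 = 0.0, 0.0          # left terminal
        elif s2 >= N:
            r, v2 = 1.0, 0.0          # right terminal
        else:
            r, v2 = 0.0, V[s2]
        V[s] += ALPHA * (r + GAMMA * v2 - V[s])  # TD(0) update
        s = s2

random.seed(0)
V = [0.0] * N
for _ in range(5000):
    td0_episode(V)
# true values are 1/6, 2/6, ..., 5/6
```

A Monte Carlo version would instead wait for the episode to finish and move each visited state's value toward the observed return; TD(0) bootstraps from the current estimate of the next state, which is what drives the stability and sample-efficiency differences compared in the assignment.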
Extended TD learning with linear function approximation for continuous state spaces. Implemented batch updates and evaluated generalization.
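A minimal sketch of semi-gradient TD(0) with linear function approximation, assuming a hypothetical 1-D continuous state in [0, 1] and hand-picked polynomial features (none of this comes from the assignment itself):

```python
import numpy as np

# Semi-gradient TD(0) with a linear value function: v(s) ~= w . x(s).
rng = np.random.default_rng(0)

def features(s):
    # simple polynomial features of a scalar state
    return np.array([1.0, s, s * s])

w = np.zeros(3)
ALPHA, GAMMA = 0.05, 0.95

for _ in range(2000):
    s = rng.random()                        # sample a state
    s2 = min(1.0, s + 0.1 * rng.random())   # illustrative rightward drift
    r = s2 - s                              # reward = progress made
    td_error = r + GAMMA * w @ features(s2) - w @ features(s)
    w += ALPHA * td_error * features(s)     # semi-gradient update
```

The update is "semi-gradient" because the bootstrapped target `r + GAMMA * w @ features(s2)` is treated as a constant; only the current state's value is differentiated. Batch variants accumulate these updates over a buffer of transitions before applying them.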
Built a Deep Q-Network (DQN) using PyTorch. Incorporated experience replay, target networks, and epsilon-greedy exploration in a high-dimensional setting.
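Two of those ingredients, experience replay and epsilon-greedy exploration, can be sketched without the network itself. In the actual assignment the Q-function is a PyTorch network; `q_values` below is a plain-list stand-in:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)  # old transitions are evicted

    def push(self, s, a, r, s2, done):
        self.buf.append((s, a, r, s2, done))

    def sample(self, batch_size):
        # uniform sampling breaks the temporal correlation of transitions
        return random.sample(self.buf, batch_size)

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))              # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

random.seed(0)
buf = ReplayBuffer()
for t in range(100):
    a = epsilon_greedy([0.1, 0.9], epsilon=0.1)
    buf.push(t, a, 1.0, t + 1, False)
batch = buf.sample(32)
```

The third ingredient, the target network, is a periodically-synced copy of the Q-network used to compute bootstrap targets, which keeps the regression target from shifting on every gradient step.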
Introduced the agent-environment framework, MDPs, and core RL terminology. Explored return computation and value functions conceptually and programmatically.
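The return computation explored there reduces to a single backward pass over an episode's rewards; a minimal sketch:

```python
# Discounted return G_t = r_{t+1} + gamma * r_{t+2} + ... for every step,
# computed backward so each G_t reuses G_{t+1}.
def returns(rewards, gamma):
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

returns([1, 0, 2], 0.5)  # -> [1.5, 1.0, 2.0]
```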
Implemented policy rollouts and return tracking in CartPole. Gained experience with OpenAI Gym environments and episode-based analysis.
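The rollout loop follows the classic Gym reset/step interface. To keep the sketch self-contained, `TinyEnv` below is a hypothetical stand-in for `gym.make("CartPole-v1")`; the real environment returns a 4-dimensional observation and ends when the pole falls:

```python
import random

class TinyEnv:
    """Illustrative stand-in exposing the classic Gym API."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        obs, reward = float(self.t), 1.0
        done = self.t >= 10  # fixed 10-step episodes for the sketch
        return obs, reward, done, {}

def rollout(env, policy):
    """Run one episode, returning the undiscounted episode return."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total

random.seed(0)
ret = rollout(TinyEnv(), lambda obs: random.choice((0, 1)))
```

Note that newer Gymnasium releases return a 5-tuple from `step` (with separate `terminated` and `truncated` flags), so the unpacking above matches the classic OpenAI Gym API used in this course.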
Used Monte Carlo and TD(0) methods for value estimation. Compared learning stability, sample efficiency, and convergence characteristics.
Applied linear function approximation to generalize across continuous states. Built and analyzed value estimators using feature representations.
This repository is for educational reference only. If you are currently enrolled in APS1080, please do not copy or submit this work as your own.