🎮 AIMS Coursework

DQN — CartPole

Late 2024 — Deep Q-Network with Experience Replay

Tags: DQN, Q-Learning, Replay Buffer, Target Network

About

Implementation of Deep Q-Network to solve the classic CartPole environment. The agent learns to balance a pole on a cart using Q-learning with neural network function approximation, epsilon-greedy exploration, experience replay, and target networks.
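
For reference, this corresponds to the standard DQN objective: the online network Q_θ is regressed toward a bootstrapped target computed with a periodically refreshed copy Q_θ̄ (the notation below is mine, not from the original write-up):

```latex
y_t = r_t + \gamma \,(1 - d_t)\, \max_{a'} Q_{\bar\theta}(s_{t+1}, a'),
\qquad
\mathcal{L}(\theta) = \bigl(Q_\theta(s_t, a_t) - y_t\bigr)^2
```

Here d_t = 1 for terminal transitions, which zeroes out the bootstrap term.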


Environment: CartPole-v1

  • State: 4 dimensions (cart position, cart velocity, pole angle, pole angular velocity)
  • Actions: 2 (push cart left / push cart right)
  • Reward: +1 per timestep, for a maximum episode return of 500
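
A minimal interaction loop with this environment, written against the gymnasium API (the original code may use the older gym interface; the random policy here is just for illustration):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)      # obs is the 4-dim state vector
assert env.observation_space.shape == (4,)
assert env.action_space.n == 2

episode_return, done = 0.0, False
while not done:
    action = env.action_space.sample()                    # random action (illustration only)
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward                              # +1 per surviving timestep
    done = terminated or truncated                        # CartPole-v1 truncates at 500 steps
env.close()
```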

Method: Deep Q-Network

Neural Network

  • Input: 4-dim state vector
  • 2 hidden layers × 20 units
  • Output: 2 Q-values (left/right)
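
A sketch of a Q-network with these dimensions (4 → 20 → 20 → 2). The framework (PyTorch) and the ReLU activations are assumptions; only the layer sizes come from the description above:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """MLP mapping a 4-dim CartPole state to 2 action values (left/right)."""

    def __init__(self, state_dim: int = 4, n_actions: int = 2, hidden: int = 20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),   # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```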

Key Components

  • Replay buffer (10K transitions)
  • Target network (stable targets)
  • Epsilon-greedy exploration
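
A minimal version of the first component, a uniform replay buffer with the 10K capacity mentioned above (the storage layout and API names are mine, not the project's):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (s, a, r, s', done) transitions with uniform sampling."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```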

Hyperparameters:

  • Learning rate: 3e-4
  • Batch size: 512
  • Discount factor γ: 0.99
  • ε decay: 3000 steps
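
The same settings collected into a config, together with one plausible reading of "ε decay: 3000 steps" as an exponential anneal; the decay shape and the ε start/end values are assumptions, not taken from the original:

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Config:
    lr: float = 3e-4        # learning rate
    batch_size: int = 512
    gamma: float = 0.99     # discount factor
    eps_decay: int = 3000   # decay horizon in environment steps
    eps_start: float = 1.0  # assumed; not stated in the original
    eps_end: float = 0.05   # assumed; not stated in the original

def epsilon(step: int, cfg: Config) -> float:
    """Exponentially annealed exploration rate."""
    return cfg.eps_end + (cfg.eps_start - cfg.eps_end) * math.exp(-step / cfg.eps_decay)

def select_action(q_values, step: int, cfg: Config) -> int:
    """Epsilon-greedy: random action with probability ε, otherwise the greedy action."""
    if random.random() < epsilon(step, cfg):
        return random.randrange(len(q_values))
    return int(max(range(len(q_values)), key=lambda a: q_values[a]))
```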

Results

✅ Solved! Agent achieves maximum reward (500) after ~800 episodes

Figure: DQN training performance. Episode returns over training show steady improvement up to the maximum reward.

Key Observations

  • Convergence — Agent consistently achieves maximum reward
  • Epsilon-greedy exploration gives sufficient coverage of the state space early in training
  • Replay buffer significantly stabilizes training by breaking correlations between consecutive transitions
  • Target network reduces instability from bootstrapping (see the update sketch below)
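
A sketch of the update step showing where the target network enters; it builds on the hypothetical QNetwork, ReplayBuffer, and Config pieces above rather than on the project's actual code:

```python
import numpy as np
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, buffer, optimizer, cfg) -> float:
    """One gradient step on a sampled minibatch; targets come from the frozen copy."""
    states, actions, rewards, next_states, dones = buffer.sample(cfg.batch_size)
    states      = torch.as_tensor(np.asarray(states), dtype=torch.float32)
    actions     = torch.as_tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards     = torch.as_tensor(rewards, dtype=torch.float32)
    next_states = torch.as_tensor(np.asarray(next_states), dtype=torch.float32)
    dones       = torch.as_tensor(dones, dtype=torch.float32)

    # Q(s, a) for the actions actually taken
    q_sa = q_net(states).gather(1, actions).squeeze(1)

    # Bootstrapped target from the target network; terminal states drop the bootstrap term
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + cfg.gamma * (1.0 - dones) * next_q

    loss = F.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Every few hundred steps the target network would be refreshed with a hard copy, e.g. `target_net.load_state_dict(q_net.state_dict())`, which is what keeps the regression targets stable.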