Reinforcement Learning

Train an agent to navigate the grid.

Algorithm Guide

  • Q-Learning: Model-free. The agent explores and learns from experience (trial and error).
  • Value Iteration: Model-based. Computes the optimal policy by planning with a known model of the environment (its transitions and rewards), without needing trial and error.
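
To make the contrast concrete, here is a minimal sketch, not the implementation behind this demo. Everything in it is an assumption chosen for illustration: a 3x3 grid with the goal at (2, 2), a cost of -1 per move, a discount GAMMA of 0.9, and the helper names step, value_iteration, and q_learning_update. Value iteration calls step() for every state and action because it knows the rules; the Q-learning update only ever sees one transition the agent actually experienced.

```python
# Hypothetical 3x3 grid: states are (row, col), goal at (2, 2).
# Every move costs -1; reaching the goal ends the episode.
SIZE, GOAL, GAMMA = 3, (2, 2), 0.9
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
STATES = [(r, c) for r in range(SIZE) for c in range(SIZE)]

def step(state, action):
    """The 'rules': a deterministic move that is blocked by the grid edge."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    nxt = (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else state
    return nxt, -1.0, nxt == GOAL

def value_iteration(tol=1e-9):
    """Model-based: sweep every state, using step() as a known model."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue  # terminal state keeps value 0
            best = max(
                r + GAMMA * (0.0 if done else V[n])
                for n, r, done in (step(s, a) for a in ACTIONS)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def q_learning_update(Q, s, a, r, s_next, done, alpha=0.5):
    """Model-free: adjust one Q-value from a single observed transition."""
    target = r if done else r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

V = value_iteration()
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
q_learning_update(Q, (2, 1), "right", -1.0, (2, 2), True)
```

In this sketch value iteration never moves an agent at all, while the Q-learning update would have to be called many times on experienced transitions before Q approaches the same values.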

Parameters

  • Learning Rate ($\alpha$): How strongly each new estimate overrides the stored value. At 0 the agent learns nothing; at 1 it keeps only the newest estimate.
  • Exploration ($\epsilon$): Probability of taking a random action (explore) instead of the current best action (exploit).
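
The two parameters can be sketched in a few lines. This is an illustrative sketch only: the function names choose_action and blend and the sample Q-values are hypothetical, and the demo itself may implement these steps differently.

```python
import random

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))        # explore
    return max(q_values, key=q_values.get)       # exploit

def blend(old, target, alpha):
    """Move the stored estimate a fraction alpha of the way toward the target."""
    return old + alpha * (target - old)

# Hypothetical Q-values for a single grid cell.
q = {"up": 0.2, "down": -0.1, "left": 0.0, "right": 0.7}

print(choose_action(q, epsilon=0.0))           # always greedy: right
print(blend(old=0.0, target=1.0, alpha=0.1))   # 0.1
```

With epsilon set to 1.0 every action is random, and with alpha set to 1.0 blend simply returns the target, which matches the extremes described above.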