Reinforcement Learning
Train an agent to navigate the grid.
Algorithm Guide
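- Q-Learning: Model-free. The agent learns the value of each action from experience (trial and error), without being told the environment's rules in advance.
- Value Iteration: Model-based. Computes the optimal policy by planning with a known model of the environment (its rules for moves and rewards), so no trial and error is needed.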
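To make "planning by knowing the rules" concrete, here is a minimal value iteration sketch. The 3x3 grid, the goal position, the +1 goal reward, the discount factor `GAMMA`, and all function names are assumptions made for illustration, not the actual environment behind this guide.

```python
# Hypothetical 3x3 grid with the goal in the bottom-right corner.
ROWS, COLS = 3, 3
GOAL = (2, 2)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9  # assumed discount factor, not specified in the guide


def step(state, action):
    """The known rules: deterministic moves; hitting a wall keeps the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS):
        r, c = state
    next_state = (r, c)
    reward = 1.0 if next_state == GOAL else 0.0  # assumed reward: +1 on reaching the goal
    return next_state, reward


def action_value(V, state, action):
    """One-step lookahead using the model: immediate reward plus discounted next value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * V[next_state]


def value_iteration(tol=1e-9):
    """Sweep over all states until the values stop changing."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    while True:
        delta = 0.0
        for state in V:
            if state == GOAL:
                continue  # terminal state keeps value 0
            best = max(action_value(V, state, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[state]))
            V[state] = best
        if delta < tol:
            return V


def greedy_policy(V):
    """Pick, in every non-goal state, the action with the highest lookahead value."""
    return {s: max(ACTIONS, key=lambda a: action_value(V, s, a)) for s in V if s != GOAL}


V = value_iteration()
policy = greedy_policy(V)
```

Note that the agent never moves during this computation: every value comes from calling `step`, which stands in for the known rules. Q-Learning has no such function to call and has to estimate the same quantities from the transitions it actually experiences.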
Parameters
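- Learning Rate ($\alpha$): How strongly each new experience overrides the current estimate. Values near 0 change estimates slowly; values near 1 almost replace the old estimate with the new one.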
- Exploration ($\epsilon$): Probability of taking a random action (explore) instead of the best-known action (exploit).
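The two parameters above can be sketched in a few lines of Q-Learning code. The function names are hypothetical, and the discount factor `gamma` is an assumption that this guide does not list as a parameter; this is a sketch of the standard update and epsilon-greedy rule, not necessarily the exact code running here.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: with probability epsilon pick a random action, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def q_update(q_old, reward, max_q_next, alpha, gamma=0.9):
    """Move the old estimate toward the new target; alpha sets how far it moves.

    gamma is an assumed discount factor, not a parameter named in the guide.
    """
    target = reward + gamma * max_q_next
    return q_old + alpha * (target - q_old)


# Usage: alpha = 0.5 moves the estimate halfway from 0.0 toward a target of 1.0.
halfway = q_update(q_old=0.0, reward=1.0, max_q_next=0.0, alpha=0.5)
# With epsilon = 0 the agent always exploits the best-known action.
greedy = choose_action([0.1, 0.9, 0.3], epsilon=0.0)
```

With `alpha=0` the estimate never changes, and with `alpha=1` it is replaced by the target outright. Likewise, `epsilon=0` means pure exploitation and `epsilon=1` means every action is random.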