Reinforcement Learning

Q-Learning: Model-free. The agent does not know the grid's rules in advance. It explores and learns action values from experience (trial and error).
Value Iteration: Model-based. Given the rules (transitions and rewards), it computes the optimal policy by planning, without any trial and error.
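To make the model-based case concrete, here is a minimal value-iteration sketch. The grid (4x4, goal in the bottom-right corner), the -1 reward per move, and the discount factor $\gamma = 0.9$ are illustrative assumptions, not settings taken from this page. The function `step` plays the role of the known rules: the planner queries it directly instead of learning from interaction.

```python
# Minimal value-iteration sketch on a hypothetical 4x4 deterministic grid.
# Assumptions (not from the original page): goal in the bottom-right corner,
# reward -1 per move, discount GAMMA = 0.9, moving off the grid leaves the agent in place.
N = 4
GOAL = (N - 1, N - 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9

def step(state, action):
    """Known model: deterministic next state and reward for (state, action)."""
    if state == GOAL:
        return state, 0.0  # terminal state: absorbing, no further reward
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < N and 0 <= c < N):
        r, c = state  # bumping into a wall keeps the agent in place
    return (r, c), -1.0

def value_iteration(theta=1e-6):
    states = [(r, c) for r in range(N) for c in range(N)]
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s == GOAL:
                continue
            # Bellman optimality backup, computed from the known model
            best = max(rew + GAMMA * V[s2] for s2, rew in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once a full sweep barely changes any value
            break
    # Read off the greedy (optimal) policy from the converged values
    policy = {}
    for s in states:
        if s != GOAL:
            policy[s] = max(ACTIONS, key=lambda a: step(s, a)[1] + GAMMA * V[step(s, a)[0]])
    return V, policy

V, policy = value_iteration()
```

Under these assumptions a state $d$ moves from the goal has value $-(1-\gamma^d)/(1-\gamma)$; the start corner, six moves away, gets about -4.69, and every cell's policy points along a shortest path to the goal.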

Train an agent to navigate the grid.

Method

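Learning rate ($\alpha$): 0.1, the step size of each Q-value update
Exploration rate ($\epsilon$): 0.1, the probability of taking a random action instead of the best-known one
Training speed: 1x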
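By contrast, Q-learning never reads the rules. The following is a minimal tabular sketch that uses the default $\alpha = 0.1$ and $\epsilon = 0.1$ listed above and treats exploration as $\epsilon$-greedy action selection. The 4x4 grid, $\gamma = 0.9$, the -1 step reward, and the episode and step limits are illustrative assumptions. The agent only sees the (next state, reward) pairs returned by `env_step` and applies the update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$.

```python
import random

# Minimal tabular Q-learning sketch on a hypothetical 4x4 deterministic grid.
# ALPHA and EPSILON use the page's defaults (0.1); GAMMA = 0.9, the grid layout,
# the -1 step reward, and the episode/step limits are illustrative assumptions.
N = 4
START, GOAL = (0, 0), (N - 1, N - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, EPSILON, GAMMA = 0.1, 0.1, 0.9

def env_step(state, action):
    """Environment dynamics: the agent sees only the sampled result, never this code."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < N and 0 <= c < N):
        r, c = state  # wall: stay in place
    next_state = (r, c)
    return next_state, -1.0, next_state == GOAL

def greedy(Q, s):
    """Index of the action with the highest current Q-value in state s."""
    return max(range(len(ACTIONS)), key=lambda a: Q[s][a])

def q_learning(episodes=3000, max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(N) for c in range(N)}
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            # epsilon-greedy: random action with probability EPSILON, else exploit
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(Q, s)
            s2, reward, done = env_step(s, ACTIONS[a])
            # Target bootstraps from the best next action; the goal is terminal (value 0)
            target = reward if done else reward + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q

def greedy_path(Q, limit=20):
    """Follow the learned greedy policy from START (capped at `limit` moves)."""
    s, path = START, [START]
    while s != GOAL and len(path) <= limit:
        s, _, _ = env_step(s, ACTIONS[greedy(Q, s)])
        path.append(s)
    return path

path = greedy_path(q_learning())
```

Because every Q-value starts at 0 while the true values are negative, untried actions initially look attractive, which adds systematic exploration on top of the $\epsilon$-greedy randomness.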