Markov Decision Processes

Modify the transitions and rewards to see how the optimal policy changes.

Bellman Equation

The value of a state, $V(s)$, is the maximum expected cumulative future reward obtainable from that state. The discount factor $\gamma$ (gamma) controls how much future rewards matter: if $\gamma \approx 0$, the agent is short-sighted and chases immediate rewards; if $\gamma \approx 1$, it weighs long-term rewards almost as heavily as immediate ones.
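Formally, the Bellman optimality equation relates a state's value to the values of its successors:

$$V(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V(s')\bigr]$$

Here $P(s' \mid s, a)$ is the probability of landing in state $s'$ after taking action $a$ in state $s$, and $R(s, a, s')$ is the reward for that transition (conventions vary; the demo may attach rewards to states rather than transitions).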


Instructions

  • Adjust the Discount Factor ($\gamma$) to see how the optimal policy (arrows) changes.
  • A high $\gamma$ encourages reaching the distant Goal (Reward 100).
  • A low $\gamma$ may lead the agent to prefer nearby small rewards or to avoid risk; the value-iteration sketch below illustrates this trade-off.
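To reproduce the same effect outside the demo, here is a minimal value-iteration sketch on a hypothetical 1-D corridor: a small reward (+1) at one end and a large Goal reward (+100) at the other. The corridor layout, reward values, and deterministic moves are assumptions chosen for illustration, not the demo's exact grid.

```python
# Minimal value iteration on a hypothetical 1-D corridor MDP, showing how
# the discount factor gamma changes the greedy policy. The layout (small
# reward near the start, large Goal reward far away) is an assumption
# chosen to mirror the demo's idea, not its exact grid.

def value_iteration(rewards, gamma, n_iters=500):
    """Compute state values for a deterministic corridor.

    States are cells 0..N-1; actions are left/right (moving off either
    end keeps the agent in place). rewards[s] is received on entering s.
    """
    n = len(rewards)
    V = [0.0] * n
    for _ in range(n_iters):
        new_V = []
        for s in range(n):
            left, right = max(s - 1, 0), min(s + 1, n - 1)
            # Bellman optimality backup: best action's reward + discounted value
            new_V.append(max(rewards[left] + gamma * V[left],
                             rewards[right] + gamma * V[right]))
        V = new_V
    return V

def greedy_policy(rewards, V, gamma):
    """Return one arrow per state: '<' for left, '>' for right."""
    arrows = []
    n = len(rewards)
    for s in range(n):
        left, right = max(s - 1, 0), min(s + 1, n - 1)
        q_left = rewards[left] + gamma * V[left]
        q_right = rewards[right] + gamma * V[right]
        arrows.append('<' if q_left > q_right else '>')
    return ''.join(arrows)

# Cell 0 holds a small reward (+1); the last cell is the Goal (+100).
rewards = [1, 0, 0, 0, 0, 0, 0, 100]

for gamma in (0.1, 0.9):
    V = value_iteration(rewards, gamma)
    print(f"gamma={gamma}: policy {greedy_policy(rewards, V, gamma)}")
```

With $\gamma = 0.1$, the arrows in the cells nearest the small reward point toward it, and only the cells closer to the Goal point right; with $\gamma = 0.9$, every arrow points toward the Goal.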