Markov Decision Processes
Modify the transitions and rewards to see how the optimal policy changes.
Bellman Equation
The value of a state $V(s)$ is the maximum expected discounted sum of future rewards obtainable from that state. The discount factor $\gamma$ (gamma) controls how much future rewards matter: if $\gamma \approx 0$, the agent is short-sighted; if $\gamma \approx 1$, it cares about long-term rewards.
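For reference, the standard Bellman optimality equation behind this value (written here in its general form, not copied from the demo's source) is:

$$V(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\,R(s, a, s') + \gamma\, V(s')\,\bigr]$$

Here $P(s' \mid s, a)$ is the transition probability and $R(s, a, s')$ the reward for moving from $s$ to $s'$ under action $a$; these are the transitions and rewards you can modify above.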
Instructions
- Adjust Discount Factor ($\gamma$) to see how the optimal policy (arrows) changes.
- High $\gamma$ encourages reaching the distant Goal (Reward 100).
- Low $\gamma$ can make the agent prefer immediate small rewards or avoid risk; see the value-iteration sketch after this list.
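To make the effect of $\gamma$ concrete, here is a minimal value-iteration sketch in Python on a 1-D chain. The chain layout, reward values, and $\gamma$ settings are assumptions for illustration, not the demo's exact configuration: state 0 pays a small reward of 1 on entry, state 5 pays the Goal reward of 100, and both are terminal.

```python
# Minimal value-iteration sketch on a 1-D chain MDP (illustrative only;
# the layout and rewards are assumptions, not the demo's exact setup).

def value_iteration(n_states, rewards, terminals, gamma, iters=200):
    """Deterministic left/right moves; returns optimal values and policy."""
    V = [0.0] * n_states
    for _ in range(iters):
        new_V = V[:]
        for s in range(n_states):
            if s in terminals:
                continue  # terminal states keep value 0
            best = float("-inf")
            for step in (-1, +1):  # actions: move left or move right
                s2 = min(max(s + step, 0), n_states - 1)
                best = max(best, rewards[s2] + gamma * V[s2])
            new_V[s] = best
        V = new_V
    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(n_states):
        if s in terminals:
            policy.append(".")
            continue
        moves = {}
        for step, arrow in ((-1, "<"), (+1, ">")):
            s2 = min(max(s + step, 0), n_states - 1)
            moves[arrow] = rewards[s2] + gamma * V[s2]
        policy.append(max(moves, key=moves.get))
    return V, policy

# Small reward (+1) at the left end, distant Goal (+100) at the right end.
rewards = [1, 0, 0, 0, 0, 100]
terminals = {0, 5}

for gamma in (0.2, 0.95):
    _, policy = value_iteration(6, rewards, terminals, gamma)
    print(f"gamma={gamma}: policy = {' '.join(policy)}")
```

With $\gamma = 0.95$ every non-terminal state points toward the distant Goal, while with $\gamma = 0.2$ the state next to the small reward collects it immediately instead of walking the full chain.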