Markov Decision Processes

Modify the transitions and rewards to see how the optimal policy changes.

Bellman Equation

The value of a state, $V(s)$, is the maximum expected cumulative future reward obtainable from that state. The discount factor $\gamma$ (gamma) controls how much future rewards matter: if $\gamma \approx 0$, the agent is short-sighted and chases immediate rewards; if $\gamma \approx 1$, it weighs long-term rewards almost as heavily as immediate ones.
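Formally, the Bellman optimality equation relates a state's value to the values of its successors:

$$V(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[R(s, a, s') + \gamma\, V(s')\bigr]$$

Here $P(s' \mid s, a)$ is the probability of landing in state $s'$ after taking action $a$ in state $s$, and $R(s, a, s')$ is the reward for that transition (conventions vary; the demo may attach rewards to states rather than transitions).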


Instructions

  • Adjust the Discount Factor ($\gamma$) to see how the optimal policy (arrows) changes.
  • A high $\gamma$ encourages reaching the distant Goal (Reward 100).
  • A low $\gamma$ may lead the agent to prefer nearby small rewards or to avoid risk; the value-iteration sketch below illustrates this trade-off.
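To reproduce the same effect outside the demo, here is a minimal value-iteration sketch on a hypothetical 1-D corridor: a small reward (+1) at one end and a large Goal reward (+100) at the other. The corridor layout, reward values, and deterministic moves are assumptions chosen for illustration, not the demo's exact grid.

```python
# Minimal value iteration on a hypothetical 1-D corridor MDP, showing how
# the discount factor gamma changes the greedy policy. The layout (small
# reward near the start, large Goal reward far away) is an assumption
# chosen to mirror the demo's idea, not its exact grid.

def value_iteration(rewards, gamma, n_iters=500):
    """Compute state values for a deterministic corridor.

    States are cells 0..N-1; actions are left/right (moving off either
    end keeps the agent in place). rewards[s] is received on entering s.
    """
    n = len(rewards)
    V = [0.0] * n
    for _ in range(n_iters):
        new_V = []
        for s in range(n):
            left, right = max(s - 1, 0), min(s + 1, n - 1)
            # Bellman optimality backup: best action's reward + discounted value
            new_V.append(max(rewards[left] + gamma * V[left],
                             rewards[right] + gamma * V[right]))
        V = new_V
    return V

def greedy_policy(rewards, V, gamma):
    """Return one arrow per state: '<' for left, '>' for right."""
    arrows = []
    n = len(rewards)
    for s in range(n):
        left, right = max(s - 1, 0), min(s + 1, n - 1)
        q_left = rewards[left] + gamma * V[left]
        q_right = rewards[right] + gamma * V[right]
        arrows.append('<' if q_left > q_right else '>')
    return ''.join(arrows)

# Cell 0 holds a small reward (+1); the last cell is the Goal (+100).
rewards = [1, 0, 0, 0, 0, 0, 0, 100]

for gamma in (0.1, 0.9):
    V = value_iteration(rewards, gamma)
    print(f"gamma={gamma}: policy {greedy_policy(rewards, V, gamma)}")
```

With $\gamma = 0.1$, the arrows in the cells nearest the small reward point toward it, and only the cells closer to the Goal point right; with $\gamma = 0.9$, every arrow points toward the Goal.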