In an earlier post I defined the price of memory , $u$, and asked: Is $u < 1$? Worth €100. Determine $u$. Worth another €100, even if $u=1$. I'm happy to announce that there is now a solution: The price of memory, $u$, is 1. I don't want to repeat the precise definition of $u$, but here is a short recap: Suppose you have an acyclic finite MDP and a strategy $\sigma$ that guarantees to visit a red transition with probability 1. This strategy $\sigma$ might use memory, i.e., might base its behaviour in a state on the history of the run, as indicated in yellow: Any such strategy $\sigma$ induces a randomized strategy $\overline\sigma$ that acts like $\sigma$ in average, but is memoryless. In each state, when averaged over all runs, the strategies $\sigma$ and $\overline\sigma$ take the same actions with the same probabilities: One can show that $\sigma$ and $\overline\sigma$ visit red transitions equally often in average; in particular, $\overline\sigma$ visits ...
Blog by Oxford computer scientist Stefan Kiefer