
The Price of Memory

In an earlier post I defined the price of memory, $u$, and asked:
Is $u < 1$? Worth €100.
Determine $u$. Worth another €100, even if $u=1$.
I'm happy to announce that there is now a solution:
The price of memory, $u$, is 1.
I don't want to repeat the precise definition of $u$, but here is a short recap: Suppose you have an acyclic finite MDP and a strategy $\sigma$ that visits a red transition with probability 1. This strategy $\sigma$ might use memory, i.e., it might base its behaviour in a state on the history of the run, as indicated in yellow:
Any such strategy $\sigma$ induces a randomized strategy $\overline\sigma$ that, on average, acts like $\sigma$ but is memoryless. In each state, when averaged over all runs, the strategies $\sigma$ and $\overline\sigma$ take the same actions with the same probabilities:
One can show that $\sigma$ and $\overline\sigma$ visit red transitions equally often on average; in particular, $\overline\sigma$ visits at least one red transition on average. The price of memory is defined as the smallest global constant $u$ (independent of the MDP and of any strategy) such that the probability that $\overline\sigma$ does not visit a red transition is at most $u$. Trivially, $u \le 1$, and examples such as the one above show that $u \ge \frac12$. Let us now prove that $u=1$:
Consider the following MDP, $M_1$:
In the controlled state in the middle, the strategy $\sigma$ goes left if and only if no red transition has been visited yet (which is the case if and only if the initial random state has chosen the right transition). In this way, $\sigma$ visits exactly one red transition with probability 1. It also follows that in the controlled state in the middle, the randomized strategy $\overline\sigma$ picks either outgoing transition with probability $\frac12$. Overall, the probability in $M_1$ that $\overline\sigma$ does not visit a red transition is $u_1 := \frac12 \cdot \frac12 = \frac14$. It follows that $u \ge u_1 = \frac14$, which is not a strong statement. Consider now the following MDP, $M_2$, which contains two copies of $M_1$:
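Since the figure is not reproduced here, the following Monte Carlo sketch assumes the structure of $M_1$ that the description implies: the initial random state goes left or right with probability $\frac12$ each, its left branch crosses a red transition, and in the controlled state the left action is red while the right action is not. The function and parameter names (`run_M1`, `randomized`) are illustrative.

```python
import random

def run_M1(randomized):
    """Simulate one run of M_1 and return the number of red transitions visited.

    Assumed structure (inferred from the text): the initial random state
    picks left/right with probability 1/2; its left branch is red; in the
    controlled state, the left action is red and the right action is not.
    """
    red = 0
    initial_left = random.random() < 0.5
    if initial_left:
        red += 1                          # left branch of the initial state is red
    if randomized:
        go_left = random.random() < 0.5   # sigma-bar: fair coin in the controlled state
    else:
        go_left = not initial_left        # sigma: go left iff no red seen yet
    if go_left:
        red += 1
    return red

random.seed(0)
n = 100_000
# sigma visits exactly one red transition on every run:
assert all(run_M1(randomized=False) == 1 for _ in range(n))
# sigma-bar misses red iff the initial state went right AND the coin said right:
miss = sum(run_M1(randomized=True) == 0 for _ in range(n)) / n
print(f"P(no red under sigma-bar) = {miss:.3f}")  # close to 1/4
```

The empirical miss probability matches $u_1 = \frac12 \cdot \frac12 = \frac14$.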
As before, there is a strategy $\sigma$ that is guaranteed to visit exactly one red transition: in any controlled state, $\sigma$ goes left if and only if no red transition has been visited yet. As before, the resulting randomized strategy $\overline\sigma$ picks, in any controlled state, either outgoing transition with probability $\frac12$. The probability in $M_2$ that $\overline\sigma$ does not visit a red transition is $u_2 := (\frac12 + \frac12 u_1)^2 = \frac{25}{64} \approx 0.39$. It follows that $u \ge u_2$.

One can define a sequence of MDPs $M_1, M_2, M_3, \ldots$ in this way:
Defining $u_{i+1} := (\frac12 + \frac12 u_i)^2$ and generalizing $\sigma$ in the natural way, we obtain that in $M_i$ the probability that $\overline\sigma$ does not visit a red transition is $u_i$. It is straightforward to check that the sequence $u_1, u_2, u_3, \ldots$ is non-decreasing and converges to the only fixed point of the function $f(x) = (\frac12 + \frac12 x)^2$: the equation $f(x) = x$ is equivalent to $(x-1)^2 = 0$, so the only fixed point is $1$. Since $u \ge u_i$ for all $i$, it follows that $u=1$.
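As a numerical sanity check (a sketch, not part of the proof), one can iterate the recursion and watch the sequence creep towards $1$. Convergence is slow, since $f'(1) = 1$, so the fixed point is not attracting at a geometric rate:

```python
# Iterate u_{i+1} = (1/2 + u_i/2)^2 from u_1 = 1/4 and check the claims:
# the sequence is non-decreasing and approaches the fixed point 1.
us = [0.25]
for _ in range(10_000):
    us.append((0.5 + 0.5 * us[-1]) ** 2)

assert all(a <= b for a, b in zip(us, us[1:]))  # non-decreasing
print(us[1])   # u_2 = 25/64 = 0.390625
print(us[-1])  # close to 1 after 10,000 steps
```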
I should award the prize of €200 to the person who found this solution. The only issue is: that would be myself. Was the problem difficult to solve? For me, yes. Could a Maths or CS undergrad have found this solution? Absolutely. Does this result, $u=1$, lead to a paper? It might, but my collaborators and I would have preferred $u \lt 1$. Oh well.
