
Equivalence Checking

There is a compact data structure for words: automata. Finite automata can “store” many words, even infinitely many—in the sense that the language of a finite automaton can be infinite. This powerful feature makes automata ubiquitous in computer science. Lecture notes by Javier Esparza emphasize this view of automata as data structures.

The most fundamental algorithmic problems on automata are:
  • operations, such as computing set union or complement of the languages of automata;
  • tests, such as checking their languages for equality.
In this post I focus on the latter: equivalence checking. In general, equivalence checking for automata is PSPACE-complete, which is excusable considering how compact and flexible they are as data structures. Two deterministic finite automata, however, can be checked for equivalence in polynomial time, by searching their product automaton.
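For deterministic finite automata, the product search amounts to a breadth-first search over reachable pairs of states: the two automata are inequivalent if and only if some reachable pair disagrees on acceptance. Here is a minimal Python sketch of this idea (the representation and the name dfa_equivalent are my own, and both automata are assumed complete, i.e., to have total transition functions):

from collections import deque

def dfa_equivalent(delta1, start1, accept1, delta2, start2, accept2, alphabet):
    """Check language equality of two complete DFAs by searching their product."""
    seen = {(start1, start2)}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        # a reachable pair that disagrees on acceptance witnesses inequivalence
        if (p in accept1) != (q in accept2):
            return False
        for a in alphabet:
            pair = (delta1[p][a], delta2[q][a])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True

Here delta1 and delta2 are transition tables, e.g., dictionaries mapping a state to a dictionary from letters to successor states.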

Less well known is the fact that one can efficiently check much wider classes of automata for equivalence, including unambiguous automata, probabilistic automata, and weighted automata over fields. The algorithmic idea is always the same. To illustrate it, in this post I pick labelled Markov chains (LMCs). Here are two LMCs:

[Figure: state diagrams of the two LMCs, C_1 and C_2]
An LMC randomly produces an infinite word: starting in the initial state, it repeatedly takes transitions at random according to the transition probabilities and outputs the letters on the traversed transitions. For example, the LMCs C_1 and C_2 above produce only words that start with a; each subsequent letter is a or b, picked independently and uniformly at random. Thus, C_1 and C_2 are equivalent: for each finite prefix w \in \{a,b\}^*, the probability that C_1 generates w equals the probability that C_2 generates w.
One can check equivalence of LMCs in polynomial time.
Here is how. For each LMC we take a row vector that indicates the initial state. For C_1: \alpha_1 \ = \ \begin{pmatrix} 1 & 0 \end{pmatrix}
Further, we take a transition matrix for each letter. For C_1: M_1(a) \ = \ \begin{pmatrix} 0 & 1 \\ 0 & \frac12 \end{pmatrix} \qquad M_1(b) \ = \ \begin{pmatrix} 0 & 0 \\ 0 & \frac12 \end{pmatrix}
Finally, we take a column vector of 1s: \eta_1 \ = \ \begin{pmatrix} 1 \\ 1 \end{pmatrix}
Now, for any finite word w = \sigma_1 \sigma_2 \cdots \sigma_k \in \{a,b\}^* the probability that C_1 generates w is: \alpha_1 M_1(\sigma_1) M_1(\sigma_2) \cdots M_1(\sigma_k) \eta_1
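To make this concrete, here is a small Python sketch (using numpy; the name word_probability is mine) that evaluates this product of vectors and matrices for C_1:

import numpy as np

alpha1 = np.array([1.0, 0.0])                    # initial-state row vector of C_1
M1 = {"a": np.array([[0.0, 1.0], [0.0, 0.5]]),   # transition matrices of C_1
      "b": np.array([[0.0, 0.0], [0.0, 0.5]])}
eta1 = np.array([1.0, 1.0])                      # vector of 1s (the column vector eta_1)

def word_probability(alpha, M, eta, w):
    """Evaluate alpha * M(w_1) * ... * M(w_k) * eta for the finite word w."""
    v = alpha
    for sigma in w:
        v = v @ M[sigma]
    return v @ eta

print(word_probability(alpha1, M1, eta1, "a"))   # 1.0
print(word_probability(alpha1, M1, eta1, "ab"))  # 0.5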
Of course, this holds analogously for the other LMC, C_2. Equivalence checking amounts to checking whether \begin{equation} \begin{aligned} &\alpha_1 M_1(\sigma_1) M_1(\sigma_2) \cdots M_1(\sigma_k) \eta_1 \\ \ = \ &\alpha_2 M_2(\sigma_1) M_2(\sigma_2) \cdots M_2(\sigma_k) \eta_2 \end{aligned} \tag{1} \end{equation}
holds for all \sigma_1 \sigma_2 \cdots \sigma_k \in \{a,b\}^*. To this end we “merge the data” of C_1 and C_2. Define: \begin{align*} \alpha \ := \ &\begin{pmatrix} \alpha_1 & \alpha_2 \end{pmatrix} \\ = \ &\begin{pmatrix} 1 & 0 & 1 & 0 \end{pmatrix} \end{align*}
and \begin{align*} M(a) \ := \ &\begin{pmatrix} M_1(a) & 0 \\ 0 & M_2(a) \end{pmatrix} \\[1mm] = \ &\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & \frac12 & 0 & 0 \\ 0 & 0 & \frac12 & \frac12 \\ 0 & 0 & 0 & 0 \end{pmatrix} \end{align*}
and \begin{align*} M(b) \ := \ &\begin{pmatrix} M_1(b) & 0 \\ 0 & M_2(b) \end{pmatrix} \\[1mm] = \ &\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & \frac12 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \frac12 & \frac12 \end{pmatrix} \end{align*}
and \eta \ := \ \begin{pmatrix} \eta_1 \\ -\eta_2 \end{pmatrix} \ = \ \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}
Notice that we have put a minus sign in front of \eta_2, because in this way Equation (\text{1}) becomes \begin{equation} \alpha M(\sigma_1) M(\sigma_2) \cdots M(\sigma_k) \eta \ = \ 0\,. \tag{2} \end{equation}
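Before turning to the check for all words, we can continue the Python sketch, build the merged data, and spot-check (\text{2}) on a few sample words; the vectors and matrices of C_2 (alpha2, M2, eta2 below) are read off from C_2 in the same way as those of C_1:

def block_diag(A, B):
    """Block-diagonal matrix with blocks A and B."""
    return np.block([[A, np.zeros((A.shape[0], B.shape[1]))],
                     [np.zeros((B.shape[0], A.shape[1])), B]])

alpha2 = np.array([1.0, 0.0])
M2 = {"a": np.array([[0.5, 0.5], [0.0, 0.0]]),
      "b": np.array([[0.0, 0.0], [0.5, 0.5]])}
eta2 = np.array([1.0, 1.0])

alpha = np.concatenate([alpha1, alpha2])
M = {s: block_diag(M1[s], M2[s]) for s in M1}
eta = np.concatenate([eta1, -eta2])              # note the minus sign

# on the merged data, the product computes the *difference* of the two
# probabilities, which should be 0 for every word
for w in ["a", "ab", "aab"]:
    print(w, word_probability(alpha, M, eta, w))  # all 0.0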
That is, we want to check whether (\text{2}) holds for all \sigma_1 \sigma_2 \cdots \sigma_k \in \{a,b\}^*. Here comes the trick: this is equivalent to checking whether the vector space, \mathcal{V} \subseteq \mathbb{R}^4, spanned by the set of vectors \left\{\alpha M(\sigma_1) M(\sigma_2) \cdots M(\sigma_k) : \sigma_1 \sigma_2 \cdots \sigma_k \in \{a,b\}^*\right\}\,,
is orthogonal to \eta. Indeed, (\text{2}) says precisely that each vector in this spanning set is orthogonal to \eta, and orthogonality to \eta carries over to all linear combinations. Once we have computed a basis of the vector space \mathcal{V} we are therefore basically done! One can equivalently characterize \mathcal{V} as the smallest vector space that includes \alpha and is closed under multiplication with M(a) and under multiplication with M(b). So we can compute a basis, \mathcal{B}, of \mathcal{V} as follows:
  1. \mathcal{B} := \{\alpha\}
  2. while there are v \in \mathcal{B} and \sigma \in \{a,b\} such that v M(\sigma) is not in the vector space spanned by \mathcal{B}:
      \mathcal{B} := \mathcal{B} \cup \{v M(\sigma)\}
  3. return \mathcal{B}
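Here is a possible Python rendering of this loop, continuing the sketch from above (the name spanning_basis is mine). It tests span membership via the matrix rank, with a tolerance because of floating-point arithmetic; a robust implementation would use exact rational arithmetic instead:

def spanning_basis(alpha, matrices, tol=1e-9):
    """Basis of the smallest vector space that contains alpha and is
    closed under right-multiplication with each of the given matrices."""
    basis = [alpha]
    worklist = [alpha]       # vectors whose successors are still unexplored
    while worklist:
        v = worklist.pop()
        for M_sigma in matrices:
            u = v @ M_sigma
            # u lies outside the span of basis iff adding it increases the rank
            if np.linalg.matrix_rank(np.vstack(basis + [u]), tol=tol) > len(basis):
                basis.append(u)
                worklist.append(u)
    return basis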
Each iteration adds to \mathcal{B} a vector that is linearly independent of \mathcal{B}, so, since the dimension of \mathcal{V} is at most the number of states, here 4, the loop terminates after at most 4 iterations. In our example the algorithm initializes \mathcal{B} as follows: \mathcal{B} \ := \ \{\alpha\} \ = \ \{ \begin{pmatrix} 1 & 0 & 1 & 0 \end{pmatrix} \}
Since \alpha M(a) \ = \ \begin{pmatrix} 0 & 1 & \frac12 & \frac12 \end{pmatrix}
is linearly independent of \alpha, the algorithm adds this vector to \mathcal{B}: \mathcal{B} \ := \ \{\alpha\} \cup \{\alpha M(a)\}
Now the algorithm terminates because right-multiplication with M(a) and M(b) produces either \begin{pmatrix} 0 & 0 & 0 & 0 \end{pmatrix} or \begin{pmatrix} 0 & \frac12 & \frac14 & \frac14 \end{pmatrix}, both of which are in the vector space spanned by \mathcal{B}. Since both basis vectors \alpha and \alpha M(a) are orthogonal to \eta, we can conclude that the two LMCs C_1 and C_2 are equivalent.
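Running the sketch on the merged data reproduces this conclusion:

basis = spanning_basis(alpha, list(M.values()))
print(len(basis))                                # 2, i.e., {alpha, alpha M(a)}
print(all(abs(v @ eta) < 1e-9 for v in basis))   # True: C_1 and C_2 are equivalent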

This algorithm can be traced back to a Schützenberger paper from 1961. In a recent paper, Nathanaël Fijalkow, Mahsa Shirmohammadi, and I show that one can compare labelled Markov decision processes (LMDPs) in the same way. LMDPs are like LMCs, but a controller can choose actions in each state to influence the transition probabilities. The equivalence checking problem for two LMDPs D_1, D_2 is whether for each finite prefix w \in \{a,b\}^*, the probability that D_1 generates w is equal to the probability that D_2 generates w, no matter how the controller chooses its actions. This problem is solvable in polynomial time, using a generalization of the algorithm above.

Here is yet another version: given an LMC C and an LMDP D, can the controller choose actions in D so that for each finite prefix w, the probability that D generates w is equal to the probability that C generates w? This version is undecidable.
