Lectures 9 & 10: Limitations of Finite Automata
Pumping Lemma
The following theorem, known as pumping lemma gives a way to show that a language is not regular.
Theorem: Suppose \(L\) is the language of a finite automaton \(M\). Then, there exists a number \(p\) (called pumping length) such that for every word \(w \in L\) of length at least \(p\), there exists a decomposition \(w=xyz\) with the following properties:
- \(x y^i z\in L\) for every integer \(i\ge 0\),
- \(\lvert y \rvert > 0\),
- \(\lvert xy \rvert \le p\).
Proof: We may assume that \(M\) doesn’t have \(\varepsilon\)-transitions. (We could even assume \(M\) is deterministic.) We choose \(p\) as the number of states of \(M\). Let \(w\in L\) be a word in the language \(L\). We are to show a decomposition of \(w\) with the properties above.
Consider a computation path of \(M\) that accepts \(w\). Since \(w\) has length at least \(p\), one state of \(M\) appears at least twice on this computation path (by the pigeonhole principle). Let \(q\) be the first repeated state on the computation path. We divide the computation path into three parts: from the start state to the first occurance of \(q\), from the first occurance of \(q\) to its second occurance, and from the second occurance of \(q\) to an accept state.
Let \(x,y,z\) be the strings corresponding to these three parts of the computation path. Then, \(w=xyz\). We are to check that this decomposition satisfies the above properties. First, \(\lvert y \rvert > 0\) because \(M\) doesn’t have \(\varepsilon\)-transitions. Second, \(\lvert xy \rvert \le p\) because \(q\) is the first repeated state and therefore all other states on the first and second part of the path are distinct. Finally, \(x y^i z\in L\) for every integers \(i\ge 0\), because we can obtain accepting computation paths for these strings by repeating the second part of the paths \(i\) times. \(\square\)
Examples
Lemma: The language \(\{0^n1^n \mid n \in {\mathbb N}\}\) is not regular.
Proof: We will show that for any potential pumping length \(p\), there exists a word \(w\in L\) of length \(\lvert w \rvert \ge p\) that does not admit a decomposition as in the pumping lemma. Then, from the pumping lemma it follows that the language cannot be the language of a finite automata.
Let \(p\) be a natural number. Consider the word \(w=0^p1^p\) in \(L\). Let \(w=xyz\) be any decomposition of this word that satisfies the second and third condition of the pumping lemma. Then, \(x=0^s\) and \(y=0^t\) for integers \(s\ge 0\) and \(t\ge 1\) (because \(\lvert x y \rvert \le p\) and \(\lvert y \rvert \ge 1\)). But then \(xy^2z=0^{p+t}1^p\notin L\), which means that the decomposition doesn’t satisfy the first condition of the pumping lemma. \(\square\)
\The next example shows that sometimes one has to be careful to choose the right word \(w\) in the language to show that the conditions of the pumping lemma are not satisfied.
Lemma: The language \(\{ ww\mid w\in \{0,1\}^\ast \}\) is not regular.
Proof: Same proof strategy as before.
Let \(p\) be a natural number. Consider the word \(0^p10^p1\) in \(L\). Let \(w=xyz\) be any decomposition of this word that satisfies the second and third condition of the pumping lemma. As before, \(x=0^s\) and \(y=0^t\) for integers \(s\ge 0\) and \(t\ge 1\). Again \(x y^2z=0^{p+t}10^{p}1\) is not in \(L\), which means that the decomposition doesn’t satisfy the first condition of the pumping lemma. (Convince yourself that for example, for \(w=(01)^p\in L\) the proof would fail.) \(\square\)
\The following example shows that sometimes it is useful to “pump down” (which means to choose \(i=0\) to show that the first condition in the pumping lemma is violated).
Lemma: The language \(\{0^i 1^j \mid i \geq j \}\) is not regular.
Proof: Same proof strategy as before.
Let \(p\) be a natural number. Consider the word \(0^p1^p\). Let \(w=xyz\) be any decomposition of this word that satisfies the second and third condition of the pumping lemma. As before, \(x=0^s\) and \(y=0^t\) for integers \(s\ge 0\) and \(t\ge 1\). But then, the word \(xy^0z=0^{p-t}1^p\) is not in \(L\), which means that the first condition in the pumping lemma is not satisfied. (Convince yourself that \(xy^iz\) is in the language for all integers \(i\ge 1\).) \(\square\)