In this note, we will prove the following theorem due to Goldreich and Levin.
Theorem: There exists an algorithm that, given a parameter \(\varepsilon>0\) and query-access[1] to a function \(f:\mathbb F_2^n\to\mathbb F_2\), computes a list \(L\) that contains all \(\alpha\in \mathbb F_2^n\) satisfying \[ \Pr_r\Big\{f(r)=\langle\alpha,r\rangle\Big\}>1/2+\varepsilon. \] The running time of the algorithm (and thus the length of the list) is polynomial in \(n\) and \(1/\varepsilon\).
Remark: The set of functions \(r\mapsto \langle\alpha,r\rangle\) is called the Hadamard code. The above theorem asserts that given query-access to a function \(f\) one can compute a list of all Hadamard codewords that have relative Hamming distance less than \(1/2-\varepsilon\) from \(f\). The idea of list-decoding is pervasive in complexity theory.
Notational switch: Instead of working over (the additive group) \(\mathbb F_2\), it is nicer to work over \(\mathbb R\) (multiplicatively). We can make this switch via the mapping \(a\mapsto (-1)^a\). (Since \((-1)^{a+b}=(-1)^a\cdot (-1)^b\), this mapping indeed translates addition over \(\mathbb F_2\) to multiplication over \(\mathbb R\). Formally, the mapping is a group homomorphism.)
Proof:
Let \(\chi_\alpha(r)=(-1)^{\langle \alpha,r\rangle}\) be a real-valued function on \(\mathbb F_2^n\). (These functions are the characters of the group \(\mathbb F_2^n\). Also note that they exactly correspond to the Hadamard code.)
For simplicity, let us redefine \(f\) so that it takes values in \(\{\pm1\}\). For real-valued functions \(g,h\) on \(\mathbb F_2^n\), we define their inner product \(\langle g,h\rangle = \mathbb E_{r\in\mathbb F_2^n} g(r) h(r)\).
With this notation, our goal is to find all \(\alpha\in\mathbb F_2^n\) such that \(\langle f,\chi_\alpha\rangle>\varepsilon\). (Here we are off by a factor of \(2\): the condition \(\Pr_r\{f(r)=\langle\alpha,r\rangle\}>1/2+\varepsilon\) translates to \(\langle f,\chi_\alpha\rangle>2\varepsilon\), but that doesn’t matter.)
The functions \(\{\chi_\alpha\}\) form an orthonormal basis with respect to the inner product we defined above. (It is easy to check that \(\langle \chi_\alpha,\chi_\alpha\rangle = 1\) and \(\langle\chi_\alpha,\chi_\beta\rangle =0\) for \(\alpha\neq \beta\).)
If we express \(f\) in this basis, then \(f=\sum_\alpha c_\alpha \chi_\alpha\) for \(c_\alpha = \langle f,\chi_\alpha\rangle\). Hence, our goal is to identify the significant coefficients of \(f\) in the basis \(\{\chi_\alpha\}\). (The basis \(\{\chi_\alpha\}\) is often called the Fourier basis. The task of computing the coefficients in this basis is called the Fourier transform. If we only care about a few significant coefficients, the problem is called the sparse Fourier transform.)
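To make these definitions concrete, here is a minimal Python sketch (the function name is ours) that computes all coefficients \(c_\alpha\) by brute force. Of course, this takes time exponential in \(n\) and is only meant for illustration:

```python
import itertools

def fourier_coefficients(f, n):
    """All coefficients c_alpha = <f, chi_alpha> = E_r f(r) * (-1)^<alpha, r>.

    Here f maps {0,1}-tuples of length n to {+1, -1}. The running time
    is exponential in n, so this is for illustration on tiny n only.
    """
    points = list(itertools.product([0, 1], repeat=n))

    def chi(alpha, r):  # the character chi_alpha(r) = (-1)^<alpha, r>
        return (-1) ** sum(a * b for a, b in zip(alpha, r))

    return {alpha: sum(f(r) * chi(alpha, r) for r in points) / 2 ** n
            for alpha in points}
```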
Since norms are invariant under an orthonormal change of basis (Parseval’s identity), it follows that
\[ 1=\langle f,f\rangle = \sum_{\alpha} c_\alpha^2. \]
In particular, the number of coefficients \(\alpha\) with \(c_\alpha>\varepsilon\) is bounded by \(1/\varepsilon^2\) (otherwise the squares \(c_\alpha^2>\varepsilon^2\) would sum to more than \(1\)).
This observation alone still doesn’t allow us to find the significant coefficients efficiently. (We cannot afford to look at all \(2^n\) coefficients.)
The idea is to use the algebraic structure that underlies these coefficients.
A subcube is a subset \(A\subseteq \mathbb F_2^n\) defined by constraints of the form \(\{\alpha_i=0\}\) and \(\{\alpha_i=1\}\). (One can think of subcubes as axis-aligned affine subspaces of the vector space \(\mathbb F_2^n\). For example, \(\{\alpha\in\mathbb F_2^n \mid \alpha_1=1,\ \alpha_2=0\}\) is a subcube of size \(2^{n-2}\).)
For a subcube \(A\), we define \(f_A=\sum_{\alpha\in A}c_\alpha\chi_\alpha\) (that is, we truncate all coefficients outside of \(A\)).
To find significant coefficients \(c_\alpha\), we will do some kind of binary search on subcubes. (One can view binary search as a procedure to explore a binary tree. Typically, one explores only a single path from the root to a leaf. In our case, we will explore up to \(1/\varepsilon^2\) paths.)
Here is a more precise description of the search algorithm:
We will maintain a small collection of disjoint active subcubes. Initially, we have exactly one active subcube, namely \(\mathbb F_2^n\).
Until all active subcubes have size \(1\), let \(A\) be the largest active subcube and do the following:
Split \(A\) into two equal-sized subcubes \(A_0\) and \(A_1\) and mark \(A\) as inactive.
Compute \(\langle f_{A_0},f_{A_0}\rangle\) and \(\langle f_{A_1},f_{A_1}\rangle\).
For \(i\in\{0,1\}\), activate the subcube \(A_i\) if \(\langle f_{A_i},f_{A_i}\rangle>\varepsilon^2\). (Note the threshold \(\varepsilon^2\): a single coefficient with \(c_\alpha>\varepsilon\) contributes \(c_\alpha^2>\varepsilon^2\) to \(\langle f_{A_i},f_{A_i}\rangle\).)
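The following Python sketch implements this search, reusing the brute-force `fourier_coefficients` from above in place of the sampling-based estimates discussed below (so it is again only meant for tiny \(n\)). We encode a subcube by the tuple of fixed leading coordinates of \(\alpha\):

```python
def significant_coefficients(f, n, eps):
    """Find all alpha with c_alpha^2 > eps^2 by binary search on subcubes.

    A subcube is encoded as a prefix: the tuple of fixed leading
    coordinates of alpha; the remaining coordinates are free.
    """
    coeffs = fourier_coefficients(f, n)

    def weight(prefix):  # <f_A, f_A> for the subcube A given by prefix
        k = len(prefix)
        return sum(c ** 2 for alpha, c in coeffs.items() if alpha[:k] == prefix)

    active = [()]  # initially one active subcube: all of F_2^n
    while any(len(p) < n for p in active):
        prefix = min(active, key=len)    # largest active subcube
        active.remove(prefix)            # mark it inactive
        for bit in (0, 1):               # split into two equal-sized halves
            child = prefix + (bit,)
            if weight(child) > eps ** 2:
                active.append(child)     # activate the heavy half
    return active  # full-length prefixes, i.e., candidate alpha's
```

For example, for \(f=\chi_\alpha\) the search returns the single prefix \(\alpha\) (for any \(\varepsilon<1\)).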
How many iterations does it take until the algorithm terminates? The active subcubes are always disjoint and each has weight \(\langle f_A,f_A\rangle>\varepsilon^2\), while the total weight is \(\langle f,f\rangle=1\); hence, at most \(1/\varepsilon^2\) subcubes can be active at the same time. It follows that the size of the largest subcube decreases by a factor \(2\) after at most \(1/\varepsilon^2\) iterations. Thus, the total number of iterations is at most \(n/\varepsilon^2\).
Why is the algorithm correct? After each iteration, the active subcubes cover all significant coefficients: if \(c_\alpha>\varepsilon\) and \(\alpha\in A\) for an active subcube \(A\), then after splitting \(A\), the half that contains \(\alpha\) has weight at least \(c_\alpha^2>\varepsilon^2\) and is therefore activated. Hence, at the end the active subcubes are exactly the singletons \(\{\alpha\}\) with \(c_\alpha^2>\varepsilon^2\); in particular, they contain all significant coefficients.
Unfortunately, the algorithm doesn’t quite work as stated. The problem is that we cannot compute the numbers \(\langle f_{A_0},f_{A_0}\rangle\) and \(\langle f_{A_1},f_{A_1}\rangle\) exactly. However, it turns out that one can estimate these numbers by random sampling.
Let us consider the special case \(A=\{\alpha\}\). How can we estimate \(\langle f_A,f_A\rangle=c_\alpha^2\) in this case? Recall that
\[ c_\alpha=\langle f,\chi_\alpha\rangle = \mathbb E_{r\in \mathbb F_2^n} f(r)\chi_\alpha(r) \]
Hence, we can estimate \(c_\alpha\) by sampling random points \(r_1,\ldots,r_k\) and computing the average of the values \(f(r_i)\chi_\alpha(r_i)\). From Chernoff bounds, we can see that it is enough to take \(k=O(n/\varepsilon^2)\) samples (we want the estimate to have additive error at most, say, \(\varepsilon/2\) with high probability). It follows that we can estimate \(c_\alpha\) by querying \(f\) only at a polynomial number of points.
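As a sketch, such an estimator could look as follows (the function name is ours):

```python
import random

def estimate_coefficient(f, alpha, k):
    """Estimate c_alpha = E_r f(r) * (-1)^<alpha, r> from k random queries to f."""
    n = len(alpha)
    total = 0
    for _ in range(k):
        r = tuple(random.randint(0, 1) for _ in range(n))
        total += f(r) * (-1) ** sum(a * b for a, b in zip(alpha, r))
    return total / k
```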
It turns out that the quantities \(\langle f_A,f_A\rangle\) can be estimated efficiently in a similar way for general subcubes \(A\). The details are a bit tedious, so we only sketch them here.
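For the interested reader, here is the standard estimator (as in the Kushilevitz–Mansour algorithm) for the case where the subcube \(A\) fixes the first \(k\) coordinates of \(\alpha\) to a string \(\beta\in\mathbb F_2^k\). Expanding \(f\) in the Fourier basis shows \(\langle f_A,f_A\rangle=\mathbb E_{y_1,y_2,z}\, f(y_1z)\,f(y_2z)\,(-1)^{\langle\beta,\,y_1+y_2\rangle}\), where \(y_1,y_2\in\mathbb F_2^k\), \(z\in\mathbb F_2^{n-k}\), and \(y_iz\) denotes concatenation. So again we can estimate by sampling (a sketch; the function name is ours):

```python
import random

def estimate_subcube_weight(f, beta, n, k_samples):
    """Estimate <f_A, f_A> for the subcube A of all alpha whose first
    len(beta) coordinates equal beta, from k_samples random queries.
    """
    k = len(beta)
    total = 0.0
    for _ in range(k_samples):
        z = tuple(random.randint(0, 1) for _ in range(n - k))
        y1 = tuple(random.randint(0, 1) for _ in range(k))
        y2 = tuple(random.randint(0, 1) for _ in range(k))
        # chi_beta(y1 + y2) = (-1)^<beta, y1 xor y2>
        sign = (-1) ** sum(b * (u ^ v) for b, u, v in zip(beta, y1, y2))
        total += f(y1 + z) * f(y2 + z) * sign
    return total / k_samples
```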
[1] We can formalize query access using oracle Turing machines. Concretely, we have a polynomial-time oracle machine \(M\), and we use the function \(f\) as the oracle. We run the machine on an arbitrary input of length \(n+1/\varepsilon\). (Hence, it will run in time poly\((n,1/\varepsilon)\).)