Kernels on the Hypercube Graph

Warning

You can get by fine without reading this page for almost all use cases, just use the standard MaternGeometricKernel, following the respective example notebook.

This is optional material meant to explain the basic theory and based mainly on Borovitskiy et al. [2023].

Motivation

The HypercubeGraph space \(C^d\) can be used to model \(d\)-dimensional binary vector inputs.

There are many settings where inputs are binary vectors or can be represented as such. For instance, upon flattening, binary vectors represent adjacency matrices of unweighted labeled graphs [1].

Structure of the Space

The elements of this space—given its dim is \(d \in \mathbb{Z}_{>0}\)—are exactly the binary vectors of length \(d\).

The geometry of this space is simple: it is a graph such that \(x, x' \in C^d\) are connected by an edge if and only if they differ in exactly one coordinate (i.e. there is exactly Hamming distance \(1\) between).

Being a graph, \(C^d\) could also be represented using the general Graph space. However, the number of nodes in \(C^d\) is \(2^d\), which is exponential in \(d\), rendering the general techniques infeasible.

Eigenfunctions

On graphs, kernels are computed using the eigenfunctions and eigenvalues of the Laplacian.

The eigenfunctions of the Laplacian on the hypercube graph are the Walsh functions [2] given analytically by the simple formula

\[ w_T(x_0, .., x_{d-1}) = (-1)^{\sum_{j \in T} x_j} \]
where \(x = (x_0, .., x_{d-1}) \in C^d\) and the index \(T\) is an arbitrary subset of the set \(\{0, .., d-1\}\).

The corresponding eigenvalues are \(\lambda_T = \lambda_{\lvert T \rvert} = 2 \lvert T \rvert / d\), where \(\lvert T \rvert\) is the cardinality of \(T\).

However, the problem is that the number of eigenfunctions is \(2^d\). Hence naive truncation of the sum in the kernel formula to a few hundred terms leads to a poor approximation of the kernel for larger \(d\).

Addition Theorem

Much like for the hyperspheres and unlike for the general graphs, there is an addition theorem for the hypercube graph:

\[ \sum_{T \subseteq \{0, .., d-1\}, \lvert T \rvert = j} w_T(x) w_T(x') = \sum_{T \subseteq \{0, .., d-1\}, \lvert T \rvert = j} w_T(x \oplus x') = \binom{d}{j} \widetilde{G}_{d, j, m} \]
where \(\oplus\) is the elementwise XOR operation, \(m\) is the Hamming distance between \(x\) and \(x'\), and \(\widetilde{G}_{d, j, m}\) is the Kravchuk polynomial of degree \(d\) and order \(j\) normalized such that \(\widetilde{G}_{d, j, 0} = 1\), evaluated at \(m\).

Normalized Kravchuk polynomials \(\widetilde{G}_{d, j, m}\) satisfy the following three-term recurrence relation

\[ \widetilde{G}_{d, j, m} = \frac{d - 2 m}{d - j + 1} \widetilde{G}_{d, j - 1, m} -\frac{j-1}{d - j + 1} \widetilde{G}_{d, j - 2, m}, \quad \widetilde{G}_{d, 0, m} = 1, \quad \widetilde{G}_{d, 1, m} = 1 - \frac{2}{d} m, \]
which allows for their efficient computation without the need to compute large sums of the individual Walsh functions.

With that, the kernels on the hypercube graph can be computed efficiently using the formula

\[\begin{split} k_{\nu, \kappa}(x, x') = \frac{1}{C_{\nu, \kappa}} \sum_{l=0}^{L-1} \Phi_{\nu, \kappa}(\lambda_l) \binom{d}{j} \widetilde{G}_{d, j, m} \qquad \Phi_{\nu, \kappa}(\lambda) = \begin{cases} \left(\frac{2\nu}{\kappa^2} + \lambda\right)^{-\nu-\frac{d}{2}} & \nu < \infty \text{ — Matérn} \\ e^{-\frac{\kappa^2}{2} \lambda} & \nu = \infty \text{ — Heat (RBF)} \end{cases} \end{split}\]
where \(m\) is the Hamming distance between \(x\) and \(x'\), and \(L \leq d + 1\) is the user-controlled number of levels parameters.

Notes:

  1. We define the dimension of the HypercubeGraph space \(C^d\) to be \(d\), in contrast to the graphs represented by the Graph space, whose dimension is defined to be \(0\).

    Because of this, much like in the Euclidean or the manifold case, the \(1/2, 3/2, 5/2\) are in fact reasonable values of for the smoothness parameter \(\nu\).

Footnotes