[1]:
# To run this in Google Colab, uncomment the following line
# !pip install geometric_kernels
# If you want to use a version of the library from a specific branch on GitHub,
# say, from the "devel" branch, uncomment the line below instead
# !pip install "git+https://github.com/geometric-kernels/GeometricKernels@devel"
Matérn and Heat Kernels on the Special Orthogonal Group \(\mathrm{SO}(n)\)¶
This notebook shows how to define and evaluate kernels on the special orthogonal group \(\mathrm{SO}(3)\), which consists of \(3 \times 3\) orthogonal matrices with unit determinant.
Handling the special orthogonal group \(\mathrm{SO}(n)\) for \(n \geq 4\) is the same. We work with \(\mathrm{SO}(3)\) here because (1) it is the most well-known case and (2) it is easier (although still non-trivial) to visualize.
Note: the “points” in the special orthogonal group \(\mathrm{SO}(n)\) are represented by matrices (arrays of the suitable backend) in \(\mathbb{R}^{n \times n}\) whose determinant is equal to \(1\).
We use the numpy backend here.
Contents¶
- Basics
- Defining a Space
- Defining a Kernel
- Evaluating Kernels on Random Inputs
- Visualize Kernels
- Feature Maps and Sampling
- Citation
Basics¶
[2]:
# Import a backend, we use numpy in this example.
import numpy as np
# Import the geometric_kernels backend.
import geometric_kernels
# Note: if you are using a backend other than numpy,
# you _must_ uncomment one of the following lines
# import geometric_kernels.tensorflow
# import geometric_kernels.torch
# import geometric_kernels.jax
# Import a space and an appropriate kernel.
from geometric_kernels.spaces import SpecialOrthogonal
from geometric_kernels.kernels import MaternGeometricKernel
import matplotlib as mpl
import matplotlib.pyplot as plt
INFO (geometric_kernels): Numpy backend is enabled, as always. To enable other backends, don't forget `import geometric_kernels.*backend name*`.
Defining a Space¶
First we create a GeometricKernels space that corresponds to the special orthogonal group \(\mathrm{SO}(3)\), a subset of the set of all \(3 \times 3\) matrices \(\mathbb{R}^{3 \times 3}\).
[3]:
so = SpecialOrthogonal(n=3)
Defining a Kernel¶
First, we create a generic Matérn kernel.
To initialize MaternGeometricKernel you just need to provide a Space object, in our case this is the so we have just created above. There is also an optional second parameter num which determines the order of approximation of the kernel. There is a sensible default value for each of the spaces in the library, so change it only if you know what you are doing.
A brief account of the theory behind the kernels on compact Lie groups (of which \(\mathrm{SO}(n)\) is an example) can be found on these documentation pages: one, two.
[4]:
kernel = MaternGeometricKernel(so)
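For illustration, if you did need to override the default order of approximation, you could pass the optional num parameter mentioned above (a sketch; the value 10 is arbitrary, and setting it is usually unnecessary):

# Purely illustrative: explicitly set the order of approximation.
# Usually you should keep the library's default.
kernel_coarse = MaternGeometricKernel(so, num=10)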
To support JAX, our classes do not keep variables you might want to differentiate over in their state. Instead, some methods take a params dictionary as input, returning its modified version.
The next line initializes the dictionary of kernel parameters params with some default values.
Note: our kernels do not provide the outputscale/variance parameter frequently used in Gaussian processes. However, it is usually trivial to add it by multiplying the kernel by an (optimizable) constant.
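For instance, here is a minimal sketch of adding such a parameter on top of kernel (variance and scaled_kernel_mat are hypothetical names for illustration, not part of the library's API):

# Hypothetical outputscale parameter; in practice it would be optimized
# alongside the kernel parameters.
def scaled_kernel_mat(variance, params, X, X2=None):
    # multiply the unit-variance geometric kernel matrix by an outputscale
    return variance * kernel.K(params, X, X2)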
[5]:
params = kernel.init_params()
print('params:', params)
params: {'lengthscale': array(1.), 'nu': array(inf)}
To define two different kernels, Matérn-3/2 and Matérn-∞ (aka heat, RBF, squared exponential, diffusion), we need two different versions of params:
[6]:
params["lengthscale"] = np.array([0.5])
params_32 = params.copy()
params_inf = params.copy()
del params
params_32["nu"] = np.array([3/2])
params_inf["nu"] = np.array([np.inf])
Now the two kernels are defined and we proceed to evaluate both on a set of random inputs.
Evaluating Kernels on Random Inputs¶
We start by sampling 10 (uniformly) random points in \(\mathrm{SO}(3)\). An explicit key parameter is needed to support JAX as one of the backends.
[7]:
key = np.random.RandomState(1234)
key, xs = so.random(key, 10)
print(xs)
[[[-0.83118121 -0.53251727 0.15988481]
[-0.05989292 -0.20013482 -0.97793604]
[ 0.55276635 -0.82241803 0.13445429]]
[[-0.24549041 -0.96836999 -0.04465438]
[-0.07519823 -0.02690221 0.99680565]
[-0.96647798 0.24806416 -0.06621548]]
[[-0.7316939 0.30139255 0.61138087]
[-0.32058557 0.63938043 -0.69886877]
[-0.6015388 -0.7073579 -0.37120866]]
[[-0.64400808 -0.75505391 -0.12307395]
[-0.75449949 0.60029278 0.2652906 ]
[-0.1264283 0.26370852 -0.95628118]]
[[ 0.16603267 -0.8784055 0.44814834]
[ 0.98529484 0.12918269 -0.11182982]
[ 0.04033892 0.46012565 0.88693695]]
[[-0.1476533 0.58885089 0.79464025]
[ 0.06351094 0.80743353 -0.58653001]
[-0.9869979 -0.03613474 -0.1566187 ]]
[[ 0.76324266 -0.64597696 0.01320642]
[ 0.2130557 0.23233043 -0.94901519]
[ 0.60997369 0.72714259 0.31495357]]
[[-0.79343507 -0.45989196 -0.3986981 ]
[-0.08361101 0.73119258 -0.67702778]
[ 0.60288472 -0.50384203 -0.61860587]]
[[-0.24647287 0.67082793 0.69945766]
[ 0.81370207 -0.24879919 0.5253455 ]
[ 0.52644094 0.69863356 -0.48453183]]
[[ 0.75167902 0.6026576 0.26792249]
[-0.62726116 0.7787669 0.00809659]
[-0.2037697 -0.17414341 0.96340645]]]
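As an optional sanity check (a sketch), each sampled point should be orthogonal with unit determinant:

# X^T X = I and det(X) = 1 for every sampled rotation X
assert np.allclose(np.einsum('nij,nik->njk', xs, xs), np.eye(3), atol=1e-8)
assert np.allclose(np.linalg.det(xs), 1.0)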
Now we evaluate the two kernel matrices.
[8]:
kernel_mat_32 = kernel.K(params_32, xs, xs)
kernel_mat_inf = kernel.K(params_inf, xs, xs)
Finally, we visualize these matrices using imshow.
[9]:
# find common range of values
minmin = np.min([np.min(kernel_mat_32), np.min(kernel_mat_inf)])
maxmax = np.max([np.max(kernel_mat_32), np.max(kernel_mat_inf)])
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2)
cmap = plt.get_cmap('viridis')
ax1.imshow(kernel_mat_32, vmin=minmin, vmax=maxmax, cmap=cmap)
ax1.set_title('k_32')
ax1.set_axis_off()
ax2.imshow(kernel_mat_inf, vmin=minmin, vmax=maxmax, cmap=cmap)
ax2.set_title('k_inf')
ax2.set_axis_off()
# add space for color bar
fig.subplots_adjust(right=0.85)
cbar_ax = fig.add_axes([0.88, 0.25, 0.02, 0.5])
# add colorbar
sm = plt.cm.ScalarMappable(cmap=cmap,
norm=plt.Normalize(vmin=minmin, vmax=maxmax))
fig.colorbar(sm, cax=cbar_ax)
plt.show()
Visualize Kernels¶
It is hard to visualize functions on \(\mathrm{SO}(n)\). Why so? Well, we have \(\dim \left(\mathrm{SO}(n)\right) = n(n-1)/2\), thus even \(\dim \left(\mathrm{SO}(3)\right) = 3\) is greater than \(2\). Functions are only easy to visualize on domains with dimension not higher than \(2\).
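For concreteness, here are the first few values of this dimension formula:

# dim SO(n) = n (n - 1) / 2
for n in (2, 3, 4):
    print('dim SO(%d) = %d' % (n, n * (n - 1) // 2))  # prints 1, 3, 6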
To get some understanding of what kernels and samples of Gaussian processes on \(\mathrm{SO}(n)\) look like, we will do something akin to visualizing \(f: \mathbb{R}^n \to \mathbb{R}\) by examining the plots of \(\alpha \mapsto f(\alpha \mathbf{v})\) for random vectors \(\mathbf{v}\). In our case, we will take a random matrix \(\mathbf{U} \in \mathrm{SO}(n)\) and visualize functions \(\alpha \mapsto f(\mathbf{E}_{\mathbf{U}}(\alpha))\), \(\alpha \in [0, 2 \pi)\), where \(\mathbf{E}_{\mathbf{U}}(\alpha) = \mathbf{U}^{\top} \mathbf{E}(\alpha) \mathbf{U}\) and
\[
\mathbf{E}(\alpha)
=
\begin{pmatrix}
\cos(\alpha) & -\sin(\alpha) & \mathbf{0} \\
\sin(\alpha) & \cos(\alpha) & \mathbf{0} \\
\mathbf{0} & \mathbf{0} & \mathbf{I}_{n-2}
\end{pmatrix}
\]
is the standard embedding of the one-dimensional rotation group \(\mathrm{SO}(2)\) (which can be regarded as the circle \(\mathbb{T}\)) into \(\mathrm{SO}(n)\). Intuitively, this corresponds to rotating by angle \(\alpha\) around a random axis in \(\mathbb{R}^n\). To better reflect the structure of \(\alpha\), we will actually treat it as a point on the circle, rather than an angle.
We will plot the functions \(k_{\nu, \kappa}(\mathbf{I}, \mathbf{E}_{\mathbf{U}}(\alpha))\).
In practice, we visualize \(k_{\nu, \kappa}(\)base_point\(, x)\) for \(x \in\) other_points1 or \(x \in\) other_points2. The base_point is simply the identity matrix \(\mathbf{I}\). The other_points1 and other_points2 are defined by discretizing the unit circle into _NUM_ANGLES uniformly spaced angles.
We define base_point, other_points1, and other_points2 in the next cell.
[10]:
# define discretization
_NUM_ANGLES = 128
key, U = so.random(key, 2)
U1 = U[0, :]
U2 = U[1, :]
# generate a grid of angles on [0, 2 \pi] (the endpoints coincide on the circle)
angles = np.linspace(0, 2*np.pi, num=_NUM_ANGLES)
embedding = np.broadcast_to(np.eye(3), (_NUM_ANGLES, 3, 3)).copy()
embedding[:, 0, 0] = np.cos(angles)
embedding[:, 0, 1] = -np.sin(angles)
embedding[:, 1, 0] = np.sin(angles)
embedding[:, 1, 1] = np.cos(angles)
base_point = embedding[[0], :, :]
def conj_batch(U, emb):
""" Compute U^T * emb[i, :] * U for all i.
:param U: [3, 3] array
:param emb: [N, 3, 3] array
:return: [N, 3, 3] array
"""
return np.tensordot(np.tensordot(emb, U, axes=([2], [0])),
U.T, axes=([1], [1])
)
other_points1, other_points2 = conj_batch(U1, embedding), conj_batch(U2, embedding)
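As an optional check (a sketch), conjugating by an orthogonal matrix keeps every point of the resulting curves inside \(\mathrm{SO}(3)\):

# all conjugated points should still have unit determinant
assert np.allclose(np.linalg.det(other_points1), 1.0)
assert np.allclose(np.linalg.det(other_points2), 1.0)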
The next cell evaluates \(k_{\nu, \kappa}(\)base_point\(, x)\) for \(x \in\) other_points1 and \(x \in\) other_points2, for \(\nu\) either \(3/2\) or \(\infty\).
[11]:
kernel_vals_32_1 = kernel.K(params_32, base_point, other_points1)
kernel_vals_32_2 = kernel.K(params_32, base_point, other_points2)
kernel_vals_inf_1 = kernel.K(params_inf, base_point, other_points1)
kernel_vals_inf_2 = kernel.K(params_inf, base_point, other_points2)
Finally, we are ready to plot the results.
[12]:
x_circle = np.cos(angles)
y_circle = np.sin(angles)
z_circle = np.zeros_like(angles) # z=0 for the circle
# Red ball position
red_ball_position = (x_circle[0], y_circle[0], z_circle[0])
def plot_kernel(ax, values, title):
# Draw the unit circle
ax.plot(x_circle, y_circle, z_circle, linestyle='dashed', color='gray')
# Draw the red ball
ax.scatter(*red_ball_position, color="r", s=50)
# Draw the vertical dashed line
ax.plot([1, 1], [0, 0], np.linspace(0, values[0, 0], 2), linestyle='dashed', color='gray')
# Plot the new function evaluated only on the unit circle
ax.plot(x_circle, y_circle, values[0, :], color='b')
# Set the viewing angle for better visualization
ax.view_init(elev=20., azim=180+30)
ax.set_title(title)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(figsize=(12.8, 9.6), nrows=2, ncols=2,
subplot_kw=dict(projection='3d',
computed_zorder=False))
plot_kernel(ax1, kernel_vals_32_1, r'$k_{3/2, \kappa}($base_point, other_points1$)$')
plot_kernel(ax2, kernel_vals_32_2, r'$k_{3/2, \kappa}($base_point, other_points2$)$')
plot_kernel(ax3, kernel_vals_inf_1, r'$k_{\infty, \kappa}($base_point, other_points1$)$')
plot_kernel(ax4, kernel_vals_inf_2, r'$k_{\infty, \kappa}($base_point, other_points2$)$')
# Display the plot
plt.show()
Feature Maps and Sampling¶
Here we show how to get an approximate finite-dimensional feature map for heat and Matérn kernels on the special orthogonal group \(\mathrm{SO}(n)\), i.e. such \(\phi\) that
\[
k(x, x') \approx \langle \phi(x), \phi(x') \rangle_{\mathbb{R}^M}.
\]
This might be useful for speeding up computations. We showcase this below by showing how to efficiently sample the Gaussian process \(\mathrm{GP}(0, k)\).
For a brief theoretical introduction into feature maps, see this documentation page.
Defining a Feature Map¶
The simplest way to get an approximate finite-dimensional feature map is to use the default_feature_map function from geometric_kernels.kernels. It has an optional keyword argument num which determines the number of features, the \(M\) above. Below we rely on the default value of num.
[13]:
from geometric_kernels.kernels import default_feature_map
feature_map = default_feature_map(kernel=kernel)
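If you need to control the number of features explicitly, default_feature_map also accepts the optional num argument mentioned above (a sketch; the specific value is arbitrary):

# Purely illustrative: request a specific number of features.
feature_map_smaller = default_feature_map(kernel=kernel, num=1000)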
The resulting feature_map is a function that takes the array of inputs and the parameters of the kernel. There is also an optional parameter normalize that determines whether \(\langle \phi(x), \phi(x) \rangle_{\mathbb{R}^M} \approx 1\) or not. For SpecialOrthogonal, normalize follows the standard behavior of MaternKarhunenLoeveKernel, being True by default.
feature_map outputs a tuple. Its second element is \(\phi(x)\) evaluated at all inputs \(x\). Its first element is either None for deterministic feature maps, or the updated key for randomized feature maps which take key as a keyword argument. For default_feature_map on a SpecialOrthogonal space, the first element is key since the feature map is random.
In the next cell, we evaluate the feature map at random points, using params_32 as kernel parameters. We check the basic property of the feature map: \(k(x, x') \approx \langle \phi(x), \phi(x') \rangle_{\mathbb{R}^M}\).
[14]:
# xs are random points from above
key, embedding = feature_map(xs, params_32, key=key)
print('xs (shape = %s):\n%s' % (xs.shape, xs))
print('')
print('embedding (shape = %s):\n%s' % (embedding.shape, embedding))
kernel_mat_32 = kernel.K(params_32, xs, xs)
kernel_mat_32_alt = np.matmul(embedding, embedding.T)
print('')
print('||k(xs, xs) - phi(xs) * phi(xs)^T|| =', np.linalg.norm(kernel_mat_32 - kernel_mat_32_alt))
xs (shape = (10, 3, 3)):
[[[-0.83118121 -0.53251727 0.15988481]
[-0.05989292 -0.20013482 -0.97793604]
[ 0.55276635 -0.82241803 0.13445429]]
[[-0.24549041 -0.96836999 -0.04465438]
[-0.07519823 -0.02690221 0.99680565]
[-0.96647798 0.24806416 -0.06621548]]
[[-0.7316939 0.30139255 0.61138087]
[-0.32058557 0.63938043 -0.69886877]
[-0.6015388 -0.7073579 -0.37120866]]
[[-0.64400808 -0.75505391 -0.12307395]
[-0.75449949 0.60029278 0.2652906 ]
[-0.1264283 0.26370852 -0.95628118]]
[[ 0.16603267 -0.8784055 0.44814834]
[ 0.98529484 0.12918269 -0.11182982]
[ 0.04033892 0.46012565 0.88693695]]
[[-0.1476533 0.58885089 0.79464025]
[ 0.06351094 0.80743353 -0.58653001]
[-0.9869979 -0.03613474 -0.1566187 ]]
[[ 0.76324266 -0.64597696 0.01320642]
[ 0.2130557 0.23233043 -0.94901519]
[ 0.60997369 0.72714259 0.31495357]]
[[-0.79343507 -0.45989196 -0.3986981 ]
[-0.08361101 0.73119258 -0.67702778]
[ 0.60288472 -0.50384203 -0.61860587]]
[[-0.24647287 0.67082793 0.69945766]
[ 0.81370207 -0.24879919 0.5253455 ]
[ 0.52644094 0.69863356 -0.48453183]]
[[ 0.75167902 0.6026576 0.26792249]
[-0.62726116 0.7787669 0.00809659]
[-0.2037697 -0.17414341 0.96340645]]]
embedding (shape = (10, 60000)):
[[ 0.00311364 -0.00034424 -0.00806244 ... 0.00067188 0.00035029
-0.00050742]
[ 0.00326345 -0.00158993 -0.00669235 ... 0.00054075 -0.00202368
-0.00101374]
[ 0.00324336 -0.00663689 0.00528191 ... 0.00065511 0.00066912
-0.00075673]
...
[ 0.00307407 -0.00409234 -0.00107197 ... 0.00077925 0.00054604
-0.00066446]
[ 0.0032015 0.00351986 -0.01087902 ... -0.00049863 -0.00064331
0.00060825]
[ 0.00310574 0.00242796 -0.01031697 ... -0.00082779 -0.0002594
0.00067754]]
||k(xs, xs) - phi(xs) * phi(xs)^T|| = 0.061406706061729016
Efficient Sampling using Feature Maps¶
GeometricKernels provides a simple tool to efficiently sample (without incurring cubic costs) the Gaussian process \(f \sim \mathrm{GP}(0, k)\), based on an approximate finite-dimensional feature map \(\phi\). The underlying machinery is briefly discussed in this documentation page.
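In a nutshell: if \(k(x, x') \approx \langle \phi(x), \phi(x') \rangle_{\mathbb{R}^M}\), then \(f(x) = \langle \phi(x), \mathbf{w} \rangle\) with \(\mathbf{w} \sim \mathrm{N}(0, \mathbf{I}_M)\) is approximately a sample of \(\mathrm{GP}(0, k)\). Here is a minimal sketch of this idea, reusing feature_map, params_32, xs, and key from above (the sampler utility below packages this up for you):

# draw w ~ N(0, I_M) and combine it with the features; `key` is a
# numpy RandomState here, so we can call `randn` on it directly
key, phi_xs = feature_map(xs, params_32, key=key)  # [10, M]
w = key.randn(phi_xs.shape[-1])                    # [M]
manual_sample = phi_xs @ w                         # approximate GP(0, k) draw at xs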
The function sampler from geometric_kernels.sampling takes in a feature map and, optionally, the keyword argument s that specifies the number of samples to generate. It returns a function which we name sample_paths. Since we are going to compute each of the two samples at two different sets of inputs, other_points1 and other_points2, we make sure the randomness is fixed by using the make_deterministic function.
sample_paths operates much like feature_map above: it takes in the points where to evaluate the samples and the kernel parameters. Additionally, it takes in the keyword argument key that specifies randomness in the JAX style. In our specific case, however, this keyword argument is not needed, as it is automatically supplied by the make_deterministic wrapper. sample_paths returns a tuple. Its first element is the updated key. Its second element is an array containing the values of the samples evaluated at the input points.
[15]:
from geometric_kernels.sampling import sampler
from geometric_kernels.utils.utils import make_deterministic
# introduce random state for reproducibility (optional)
# `key` is jax's terminology
key = np.random.RandomState(seed=1234)
sample_paths = make_deterministic(sampler(feature_map, s=2), key)
# new random state is returned along with the samples
key, samples = sample_paths(xs, params_32)
print('Two samples evaluated at the xs are:')
print(samples)
Two samples evaluated at the xs are:
[[ 0.07279943 -0.17314709]
[-0.91676587 0.45252729]
[ 1.22318893 1.11679705]
[-1.49580416 1.86520088]
[-0.04944704 -0.38410612]
[-0.73135949 -0.48286388]
[-0.2092536 -0.20690319]
[ 0.70462855 1.40441825]
[-0.03731635 -0.68776107]
[ 0.70267061 -0.09092921]]
Visualizing Samples¶
Here we visualize the samples as functions on \(\mathrm{SO}(3)\), evaluating them along the two circles other_points1 and other_points2 defined above.
[16]:
key, samples_other_points1 = sample_paths(other_points1, params_32)
key, samples_other_points2 = sample_paths(other_points2, params_32)
sample1_other_points1 = samples_other_points1[:, 0]
sample2_other_points1 = samples_other_points1[:, 1]
sample1_other_points2 = samples_other_points2[:, 0]
sample2_other_points2 = samples_other_points2[:, 1]
x_circle = np.cos(angles)
y_circle = np.sin(angles)
z_circle = np.zeros_like(angles) # z=0 for the circle
def plot_sample(ax, values, title):
# Draw the unit circle
ax.plot(x_circle, y_circle, z_circle, linestyle='dashed', color='gray')
# Plot the new function evaluated only on the unit circle
ax.plot(x_circle, y_circle, values, color='b')
# Set the viewing angle for better visualization
ax.view_init(elev=20., azim=180+30)
ax.set_title(title)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(figsize=(12.8, 9.6), nrows=2, ncols=2,
subplot_kw=dict(projection='3d',
computed_zorder=False))
plot_sample(ax1, sample1_other_points1, r'Sample #1: $f(\cdot)$ for $\cdot$ in other_points1, $f \sim \mathrm{GP}(0, k_{3/2, \kappa})$')
plot_sample(ax2, sample1_other_points2, r'Sample #1: $f(\cdot)$ for $\cdot$ in other_points2, $f \sim \mathrm{GP}(0, k_{3/2, \kappa})$')
plot_sample(ax3, sample2_other_points1, r'Sample #2: $f(\cdot)$ for $\cdot$ in other_points1, $f \sim \mathrm{GP}(0, k_{3/2, \kappa})$')
plot_sample(ax4, sample2_other_points2, r'Sample #2: $f(\cdot)$ for $\cdot$ in other_points2, $f \sim \mathrm{GP}(0, k_{3/2, \kappa})$')
# Display the plot
plt.show()
Citation¶
If you are using Lie groups and GeometricKernels, please consider citing
@article{mostowsky2024,
title = {The GeometricKernels Package: Heat and Matérn Kernels for Geometric Learning on Manifolds, Meshes, and Graphs},
author = {Peter Mostowsky and Vincent Dutordoir and Iskander Azangulov and Noémie Jaquier and Michael John Hutchinson and Aditya Ravuri and Leonel Rozo and Alexander Terenin and Viacheslav Borovitskiy},
year = {2024},
journal = {arXiv:2407.08086},
}
@article{azangulov2022,
title={Stationary kernels and Gaussian processes on Lie groups and their homogeneous spaces I: the compact case},
author={Azangulov, Iskander and Smolensky, Andrei and Terenin, Alexander and Borovitskiy, Viacheslav},
journal={arXiv preprint arXiv:2208.14960},
year={2022}
}