Kernel => RKHS + Feature Map

Dec 15 2016

Let the input vector space be \(\mathcal{X} = \mathbb{R}^2\), and define kernel \(k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}, k(x,y)=\|x-y\|_2^2 = (x_1-y_1)^2+(x_2-y_2)^2\). Let's find the space of functions \(\mathcal{H}\) on \(\mathcal{X}\), for which \(k\) is the reproducing kernel. The corresponding feature map is then \(\phi: \mathcal{X}\to \mathcal{H}, \phi(x) = k(\cdot, x) = \|\cdot - x\|_2^2\)

To construct a unique RKHS given kernel, I copied and pasted the standard procedure (i.e. Moore-Aronszajn theorem) here: For all \(x\in \mathcal{X}\), define the function

$$k_x = k(\cdot, x)$$

Let \(\mathcal{H}_0\) be the linear span of \(k_x\), i.e. \(\mathcal{H}_0\) consists of functions of the form

$$\forall i, x_i \in \mathcal{X}, f(\cdot) = \sum_i a_i k_{x_i}(\cdot)$$

Define an inner product on \(\mathcal{H}_0\) by

$$\langle f, g \rangle = \langle \sum_i^m a_i k_{x_i}, \sum_j^n b_j k_{x_j}\rangle = \sum_i^m \sum_j^n a_i b_j k(x_i, x_j) \tag{*} \label{eq:inner}$$

Let \(\mathcal{H}\) be the completion of \(\mathcal{H}_0\) with respect to this inner product. Then \(\mathcal{H}\) consists of functions of the form

$$f(\cdot) = \sum_i^\infty a_i k_{x_i}(\cdot)$$

Ok that's all very cool, but what does an element in \(\mathcal{H}\) actually look like for our kernel \(k(x,y)=\|x-y\|_2^2 = (x_1-y_1)^2+(x_2-y_2)^2\)? Consider a concrete \(\tilde y = (a,b) \in \mathcal{X}\), say \((a,b)=(3,4)\), then its representer \(k_{\tilde y} = k_{(a,b)} \in \mathcal{H}\) is the function such that \(k_{(a,b)} (x) = x_1^2 + x_2^2 - 2ax_1 - 2bx_2 + a^2 + b^2\). An element in \(\mathcal{H_0}\) then takes the form of a linear combination of functions \(k_{(a,b)}\), as we vary \((a,b) \in \mathcal{X}\). An element \(f\) in \(\mathcal{H}\) is some linear combination of infinitely many such \(k_{(a,b)}\), and it'll look something like \(f(x) = ax_1^2 + ax_2^2 + bx_1 + cx_2 + d\), where \(a,b,c,d\in\mathbb{R}\) have some funky convergent property.

For our earlier example \(k_{\tilde y} = k_{(3,4)}\), you may be tempted to write

$$k_{\tilde y} (x) = \langle k_{\tilde y}, k_x \rangle_{\mathcal{H}} = \begin{bmatrix} 1 & 1 & -6 & -8 & 25 \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 \\ x_2 \\ 1 \end{bmatrix}$$

then identify \(k_{\tilde y}\) with the vector \((1,1,-6,-8,25)\) and conclude that the feature map defined by \(k\) is \(\phi: \mathbb{R}^2 \to \mathbb{R}^5, \phi(x) = k_x = (x_1^2, x_2^2, x_1, x_2, 1)\). This is incorrect, as our RKHS \(\mathcal{H} \neq \mathbb{R}^5\), and

$$ \begin{bmatrix} 1 & 1 & -6 & -8 & 25 \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 \\ x_2 \\ 1 \end{bmatrix} $$

is not the inner product \eqref{eq:inner} on \({\mathcal{H}}\) (although it is the inner product on \(\mathbb{R}^5\)). If you are not convinced, take the inner product (on \(\mathbb{R}^5\)) between \((x_1^2, x_2^2, x_1, x_2, 1)\) and \((y_1^2, y_2^2, y_1, y_2, 1)\); this is certainly not equal to \(k(x,y)\).

To write kernel evaluation as inner product in the feature space \(\mathcal{H}\) defined by the kernel, we have to follow the definition \eqref{eq:inner}:

$$k(x, y) = \langle k_{x}, k_{y} \rangle_{\mathcal{H}}$$

Good resources on RKHS: