## Kernel => RKHS + Feature Map

Dec 15 2016

Let the input vector space be $$\mathcal{X} = \mathbb{R}^2$$, and define kernel $$k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}, k(x,y)=\|x-y\|_2^2 = (x_1-y_1)^2+(x_2-y_2)^2$$. Let's find the space of functions $$\mathcal{H}$$ on $$\mathcal{X}$$, for which $$k$$ is the reproducing kernel. The corresponding feature map is then $$\phi: \mathcal{X}\to \mathcal{H}, \phi(x) = k(\cdot, x) = \|\cdot - x\|_2^2$$

To construct a unique RKHS given kernel, I copied and pasted the standard procedure (i.e. Moore-Aronszajn theorem) here: For all $$x\in \mathcal{X}$$, define the function

$$k_x = k(\cdot, x)$$

Let $$\mathcal{H}_0$$ be the linear span of $$k_x$$, i.e. $$\mathcal{H}_0$$ consists of functions of the form

$$\forall i, x_i \in \mathcal{X}, f(\cdot) = \sum_i a_i k_{x_i}(\cdot)$$

Define an inner product on $$\mathcal{H}_0$$ by

$$\langle f, g \rangle = \langle \sum_i^m a_i k_{x_i}, \sum_j^n b_j k_{x_j}\rangle = \sum_i^m \sum_j^n a_i b_j k(x_i, x_j) \tag{*} \label{eq:inner}$$

Let $$\mathcal{H}$$ be the completion of $$\mathcal{H}_0$$ with respect to this inner product. Then $$\mathcal{H}$$ consists of functions of the form

$$f(\cdot) = \sum_i^\infty a_i k_{x_i}(\cdot)$$

Ok that's all very cool, but what does an element in $$\mathcal{H}$$ actually look like for our kernel $$k(x,y)=\|x-y\|_2^2 = (x_1-y_1)^2+(x_2-y_2)^2$$? Consider a concrete $$\tilde y = (a,b) \in \mathcal{X}$$, say $$(a,b)=(3,4)$$, then its representer $$k_{\tilde y} = k_{(a,b)} \in \mathcal{H}$$ is the function such that $$k_{(a,b)} (x) = x_1^2 + x_2^2 - 2ax_1 - 2bx_2 + a^2 + b^2$$. An element in $$\mathcal{H_0}$$ then takes the form of a linear combination of functions $$k_{(a,b)}$$, as we vary $$(a,b) \in \mathcal{X}$$. An element $$f$$ in $$\mathcal{H}$$ is some linear combination of infinitely many such $$k_{(a,b)}$$, and it'll look something like $$f(x) = ax_1^2 + ax_2^2 + bx_1 + cx_2 + d$$, where $$a,b,c,d\in\mathbb{R}$$ have some funky convergent property.

For our earlier example $$k_{\tilde y} = k_{(3,4)}$$, you may be tempted to write

$$k_{\tilde y} (x) = \langle k_{\tilde y}, k_x \rangle_{\mathcal{H}} = \begin{bmatrix} 1 & 1 & -6 & -8 & 25 \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 \\ x_2 \\ 1 \end{bmatrix}$$

then identify $$k_{\tilde y}$$ with the vector $$(1,1,-6,-8,25)$$ and conclude that the feature map defined by $$k$$ is $$\phi: \mathbb{R}^2 \to \mathbb{R}^5, \phi(x) = k_x = (x_1^2, x_2^2, x_1, x_2, 1)$$. This is incorrect, as our RKHS $$\mathcal{H} \neq \mathbb{R}^5$$, and

$$\begin{bmatrix} 1 & 1 & -6 & -8 & 25 \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 \\ x_2 \\ 1 \end{bmatrix}$$

is not the inner product \eqref{eq:inner} on $${\mathcal{H}}$$ (although it is the inner product on $$\mathbb{R}^5$$). If you are not convinced, take the inner product (on $$\mathbb{R}^5$$) between $$(x_1^2, x_2^2, x_1, x_2, 1)$$ and $$(y_1^2, y_2^2, y_1, y_2, 1)$$; this is certainly not equal to $$k(x,y)$$.

To write kernel evaluation as inner product in the feature space $$\mathcal{H}$$ defined by the kernel, we have to follow the definition \eqref{eq:inner}:

$$k(x, y) = \langle k_{x}, k_{y} \rangle_{\mathcal{H}}$$

Good resources on RKHS: