Kernel => RKHS + Feature Map

Let the input vector space be $\mathcal{X} = \mathbb{R}^2$, and define kernel $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}, k(x,y)=\|x-y\|_2^2 = (x_1-y_1)^2+(x_2-y_2)^2$. Let's find the space of functions $\mathcal{H}$ on $\mathcal{X}$, for which $k$ is the reproducing kernel. The corresponding feature map is then $\phi: \mathcal{X}\to \mathcal{H}, \phi(x) = k(\cdot, x) = \|\cdot - x\|_2^2$

To construct a unique RKHS given kernel, I copied and pasted the standard procedure (i.e. Moore-Aronszajn theorem) here: For all $x\in \mathcal{X}$, define the function

$$k_x = k(\cdot, x)$$

Let $\mathcal{H}_0$ be the linear span of $k_x$, i.e. $\mathcal{H}_0$ consists of functions of the form

$$\forall i, x_i \in \mathcal{X}, f(\cdot) = \sum_i a_i k_{x_i}(\cdot)$$

Define an inner product on $\mathcal{H}_0$ by

$$\langle f, g \rangle = \langle \sum_i^m a_i k_{x_i}, \sum_j^n b_j k_{x_j}\rangle = \sum_i^m \sum_j^n a_i b_j k(x_i, x_j) \tag{*} \label{eq:inner}$$

Let $\mathcal{H}$ be the completion of $\mathcal{H}_0$ with respect to this inner product. Then $\mathcal{H}$ consists of functions of the form

$$f(\cdot) = \sum_i^\infty a_i k_{x_i}(\cdot)$$

Ok that's all very cool, but what does an element in $\mathcal{H}$ actually look like for our kernel $k(x,y)=\|x-y\|_2^2 = (x_1-y_1)^2+(x_2-y_2)^2$? Consider a concrete $\tilde y = (a,b) \in \mathcal{X}$, say $(a,b)=(3,4)$, then its representer $k_{\tilde y} = k_{(a,b)} \in \mathcal{H}$ is the function such that $k_{(a,b)} (x) = x_1^2 + x_2^2 - 2ax_1 - 2bx_2 + a^2 + b^2$. An element in $\mathcal{H_0}$ then takes the form of a linear combination of functions $k_{(a,b)}$, as we vary $(a,b) \in \mathcal{X}$. An element $f$ in $\mathcal{H}$ is some linear combination of infinitely many such $k_{(a,b)}$, and it'll look something like $f(x) = ax_1^2 + ax_2^2 + bx_1 + cx_2 + d$, where $a,b,c,d\in\mathbb{R}$ have some funky convergent property.

For our earlier example $k_{\tilde y} = k_{(3,4)}$, you may be tempted to write

$$k_{\tilde y} (x) = \langle k_{\tilde y}, k_x \rangle_{\mathcal{H}} = \begin{bmatrix} 1 & 1 & -6 & -8 & 25 \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 \\ x_2 \\ 1 \end{bmatrix}$$

then identify $k_{\tilde y}$ with the vector $(1,1,-6,-8,25)$ and conclude that the feature map defined by $k$ is $\phi: \mathbb{R}^2 \to \mathbb{R}^5, \phi(x) = k_x = (x_1^2, x_2^2, x_1, x_2, 1)$. This is incorrect, as our RKHS $\mathcal{H} \neq \mathbb{R}^5$, and

$$ \begin{bmatrix} 1 & 1 & -6 & -8 & 25 \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 \\ x_2 \\ 1 \end{bmatrix} $$

is not the inner product \eqref{eq:inner} on ${\mathcal{H}}$ (although it is the inner product on $\mathbb{R}^5$). If you are not convinced, take the inner product (on $\mathbb{R}^5$) between $(x_1^2, x_2^2, x_1, x_2, 1)$ and $(y_1^2, y_2^2, y_1, y_2, 1)$; this is certainly not equal to $k(x,y)$.

To write kernel evaluation as inner product in the feature space $\mathcal{H}$ defined by the kernel, we have to follow the definition \eqref{eq:inner}:

$$k(x, y) = \langle k_{x}, k_{y} \rangle_{\mathcal{H}}$$

Good resources on RKHS:

intro to Hilbert space video; optionally application in Fourier analysis
Arthur Gretton's tutorials here and here (great theoretical details)