The role of the covariance function $k(x_i, x_j)$

Let's take two random variables, $y_1$ and $y_2$, corresponding to inputs $x_1$ and $x_2$.

If $y_1$ and $y_2$ are independent, then we can sample each from its own marginal distribution: $y_1 \sim \mathcal{N}(0, k(x_1,x_1))$ and $y_2 \sim \mathcal{N}(0, k(x_2,x_2))$.
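As a minimal sketch of the independent case, here is what this looks like in code. The kernel `k` is an illustrative squared-exponential (RBF) choice; the text does not commit to a specific covariance function.

```python
import numpy as np

def k(xi, xj, length_scale=1.0):
    # Hypothetical squared-exponential (RBF) kernel, used only for
    # illustration; the text does not fix a particular k.
    return np.exp(-0.5 * (xi - xj) ** 2 / length_scale ** 2)

rng = np.random.default_rng(0)
x1, x2 = 0.0, 1.0

# Independent case: each y is drawn from its own marginal N(0, k(x, x)).
y1 = rng.normal(0.0, np.sqrt(k(x1, x1)))
y2 = rng.normal(0.0, np.sqrt(k(x2, x2)))
```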

If $y_1$ and $y_2$ are not independent, i.e., their covariance $k(x_1, x_2)$ is not zero, then the sampling process is a bit different. For $y_1$, we can still sample freely from its marginal distribution $y_1 \sim \mathcal{N}(0, k(x_1,x_1))$. But once $y_1$ has been sampled, the distribution of $y_2$ changes: it becomes the conditional distribution $p(y_2 \mid y_1)$. This is still a Gaussian distribution, but both its mean and its variance change:

$$ \mathbb{E}[y_2 \mid y_1] = k(x_2,x_1)K^{-1}y_1 $$

and

$$ \sigma^2[y_2 \mid y_1] = k(x_2,x_2) - k(x_2,x_1)K^{-1}k(x_1,x_2), $$

where $K = k(x_1,x_1)$ (here a $1 \times 1$ matrix, so $K^{-1}$ is just its reciprocal). Every time we sample $y_1$, we may get a different value, which affects our subsequent sample of $y_2$. If we repeat this sequential sampling over a larger set of input points $x$, a bunch of function curves will appear.
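Continuing the sketch above, the dependent case draws $y_2$ from $p(y_2 \mid y_1)$ using the conditional mean and variance just derived, with $K = k(x_1, x_1)$:

```python
import numpy as np

def k(xi, xj, length_scale=1.0):
    # Same illustrative RBF kernel as in the previous snippet.
    return np.exp(-0.5 * (xi - xj) ** 2 / length_scale ** 2)

rng = np.random.default_rng(0)
x1, x2 = 0.0, 1.0

# Step 1: sample y1 from its marginal N(0, k(x1, x1)).
y1 = rng.normal(0.0, np.sqrt(k(x1, x1)))

# Step 2: sample y2 from the conditional p(y2 | y1),
# with K = k(x1, x1) (a 1x1 matrix, so K^{-1} = 1/K).
K = k(x1, x1)
mean_y2 = k(x2, x1) / K * y1
var_y2 = k(x2, x2) - k(x2, x1) / K * k(x1, x2)
y2 = rng.normal(mean_y2, np.sqrt(var_y2))
```

Note how the conditional variance is always smaller than the marginal $k(x_2, x_2)$: observing $y_1$ reduces our uncertainty about $y_2$.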

This explains why the covariance function implies a distribution over functions.
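To make the "bunch of function curves" concrete, the following sketch samples the $y$ values at many inputs jointly from $\mathcal{N}(0, K)$ via a Cholesky factor of the full covariance matrix, and plots each joint sample as one curve. The grid, jitter, and number of curves are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def k(xi, xj, length_scale=1.0):
    # Same illustrative RBF kernel as above; it works elementwise on arrays.
    return np.exp(-0.5 * (xi - xj) ** 2 / length_scale ** 2)

rng = np.random.default_rng(0)
xs = np.linspace(-3.0, 3.0, 100)

# Full covariance matrix K[i, j] = k(xs[i], xs[j]).
K = k(xs[:, None], xs[None, :])
# A small diagonal jitter keeps the Cholesky factorization numerically stable.
L = np.linalg.cholesky(K + 1e-9 * np.eye(len(xs)))

# Each joint sample y = L z with z ~ N(0, I) has covariance L L^T = K,
# so plotting it against xs gives one "function curve".
for _ in range(5):
    ys = L @ rng.standard_normal(len(xs))
    plt.plot(xs, ys)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```

Sampling all points jointly like this is equivalent to the sequential conditioning above, just done in one linear-algebra step.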
