Bayesian Curve Fitting and Gaussian Processes
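So far, we have examined least squares regression as empirical error minimization. The same problem can also be examined from a probabilistic perspective. Let the input values be x_1, ..., x_N and the corresponding target values be t_1, ..., t_N. We fit a function y(x, w) to the data and assume that each target equals the function value corrupted by Gaussian noise with precision β (that is, variance β^{-1}). Probabilistically, we can write this as

$$p(t_n \mid x_n, w, \beta^{-1}) = \mathcal{N}\left(t_n \mid y(x_n, w), \beta^{-1}\right)$$

Assuming the observations are drawn independently, the likelihood of the full data set is

$$p(\mathbf{t} \mid X, w, \beta^{-1}) = \prod_{n=1}^{N} \mathcal{N}\left(t_n \mid y(x_n, w), \beta^{-1}\right)$$

Taking the log of the likelihood function, we have

$$\ln p(\mathbf{t} \mid X, w, \beta^{-1}) = -\frac{\beta}{2} \sum_{n=1}^{N} \left(t_n - y(x_n, w)\right)^2 - \frac{N}{2} \ln(2\pi) + \frac{N}{2} \ln \beta$$

Only the first term depends on w. Hence, minimizing the negative log likelihood with respect to w is equivalent to minimizing the sum-of-squares error, which is our least squares formulation. This is also known as the Maximum Likelihood or ML formulation.

Further, we can improve our estimates of w by considering a prior on w. Here w has M+1 components and α is the precision of the prior, so the prior takes the form

$$p(w \mid \alpha) = \mathcal{N}\left(w \mid 0, \alpha^{-1} I\right) = \left(\frac{\alpha}{2\pi}\right)^{(M+1)/2} \exp \ldots$$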
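The equivalence between maximum likelihood and least squares described above can be checked numerically. The following is a minimal sketch, not a definitive implementation. It assumes a polynomial model y(x, w) of degree M, synthetic sine-shaped data, and a noise level of 0.2, none of which come from the text. The helper names `design_matrix` and `log_likelihood` are hypothetical. The noise precision estimate uses the standard result that 1/β_ML is the mean squared residual, which the text does not state.

```python
import numpy as np

def design_matrix(x, M):
    """Columns x^0, x^1, ..., x^M, so that y(x, w) = Phi @ w is a degree-M polynomial."""
    return np.vander(x, M + 1, increasing=True)

def log_likelihood(w, beta, Phi, t):
    """ln p(t | X, w, beta^{-1}) under Gaussian noise with precision beta."""
    N = t.shape[0]
    sq_err = np.sum((t - Phi @ w) ** 2)
    return -0.5 * beta * sq_err - 0.5 * N * np.log(2 * np.pi) + 0.5 * N * np.log(beta)

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
N, M = 30, 3
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

Phi = design_matrix(x, M)

# Least squares solution; by the argument above this is also the ML estimate of w.
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# ML estimate of the noise precision: 1 / beta_ml is the mean squared residual.
beta_ml = N / np.sum((t - Phi @ w_ml) ** 2)

# Moving w away from the least squares solution lowers the log likelihood.
ll_at_ml = log_likelihood(w_ml, beta_ml, Phi, t)
w_perturbed = w_ml + 0.05 * rng.normal(size=M + 1)
ll_perturbed = log_likelihood(w_perturbed, beta_ml, Phi, t)

print("log likelihood at w_ml:       ", ll_at_ml)
print("log likelihood at perturbed w:", ll_perturbed)
```

Because β only scales the squared-error term, the maximizing w does not depend on the value of β. That is why w can be fitted first by ordinary least squares and β estimated from the residuals afterwards.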