Kernel Methods
One of the popular methods for handling non-linear data with linear models is the kernel method. A kernel function implicitly maps the input data into a higher-dimensional space. Note that the model remains linear in its parameters; it is only the data that is mapped to a higher-dimensional space.
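As a quick illustration (a Python sketch with made-up numbers, not from the original example), the degree-2 polynomial kernel k(x, z) = (x·z)^2 evaluated in the input space equals an ordinary dot product after mapping each 2-D point through the explicit feature map phi(v) = (v1^2, sqrt(2)·v1·v2, v2^2):

```python
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel on 2-D input
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

k_direct = (x @ z) ** 2        # kernel evaluated in the input space
k_mapped = phi(x) @ phi(z)     # dot product in the mapped feature space

print(np.isclose(k_direct, k_mapped))  # the two agree
```

The model never needs to form phi explicitly; the kernel evaluates the higher-dimensional inner product directly, which is what makes the trick cheap.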
Let's consider the example of linear regression. The least squares method fits a linear function to the given data by minimizing the sum of squared errors over all points. It has a closed-form solution given by:
w = inv(X'*X)*X'*y

This solution works perfectly well when the relationship in the data is linear, as shown by the following example.
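For readers without MATLAB, the same normal-equations solution can be sketched in Python/NumPy (the toy data here is made up for illustration):

```python
import numpy as np

# Toy linear data: y = 1 + 2x, with a bias column in the design matrix
x = np.linspace(0, 10, 20)
X = np.column_stack([np.ones_like(x), x])   # design matrix with intercept
y = 1.0 + 2.0 * x

# Closed-form least squares: w = (X'X)^{-1} X'y
# (np.linalg.solve is preferred over an explicit inverse for numerical stability)
w = np.linalg.solve(X.T @ X, X.T @ y)

print(w)  # recovers the true coefficients [1., 2.]
```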
However, consider the following example in which the relationship in the data is non-linear. The linear model fits a straight line, which fails to capture the real relationship in the data.
The least squares model can be reformulated in a dual representation, leading to a kernel form of the problem (see Bishop, Chapter 6, for the complete derivation):
y(xn) = k(xn, x)' * inv(K + lambda*I) * t

We can write the MATLAB code as follows:
sig = 0.0001;

% Build the RBF kernel matrix over the training points
K = zeros(n,n);
for i = 1:n
    K(i,:) = exp(-0.5*sig*(x(i) - x).^2)';
end

% Compute the dual coefficients
lambda = 1;
alpha = (K + lambda*eye(n))\y;

% Predict on a grid over the range of x
xr = [min(x) max(x)];
xp = xr(1):1:xr(2);   % points at which to predict
m = length(xp);
yp = zeros(m,1);
for i = 1:m
    % kernel between xp(i) and the training points (same form as in training)
    ki = exp(-0.5*sig*(xp(i) - x).^2)';
    yp(i) = ki*alpha;
end
plot(xp, yp, 'g', 'LineWidth', 2);

Finally, the figure below shows both the linear regression and the kernel regression. As you can see, the kernel regression is able to fit a non-linear function to the data.
The Nadaraya-Watson model is another popular kernel regression method. Its main idea is to predict the value at a point as a similarity-weighted average over its neighbourhood. It is defined as follows:
hi = K(x, xi)' * y / sum(K(x, xi))

Here, K(x, xi) represents the similarity of the point xi to all the points x. Effectively, we predict y at a new point by taking a weighted sum of the y values in the neighbourhood, giving higher weights to points that are more similar to xi. In this sense it resembles k-nearest neighbours (KNN). Using this model, we generate the following figure:
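A minimal sketch of this estimator in Python/NumPy (the Gaussian kernel and the bandwidth sig are assumed choices for illustration, not taken from the original post):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, sig=1.0):
    # Gaussian kernel weights between the query point and every training point
    w = np.exp(-0.5 * (x_query - x_train) ** 2 / sig**2)
    # Prediction is the similarity-weighted average of the training targets
    return np.sum(w * y_train) / np.sum(w)

# Toy data: noiseless sine samples
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

pred = nadaraya_watson(x, y, x_query=np.pi / 2, sig=0.3)
```

Near the peak of the sine the smoothed prediction sits just below 1, since the weighted average pulls in the lower neighbouring values.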