Kernel SVMs
Let's now apply the idea of a dual-form representation to the SVM problem to obtain a kernel SVM. The dual representation lets us express the problem entirely in terms of kernels, which makes it very useful for non-linear problem settings. Recall that the original formulation of the SVM is the following optimization problem:
minimize   (1/2) ||w||^2
subject to y_t (w' x_t + b) - 1 >= 0   for t = 1, ..., n

This can be written in dual form using the Lagrangian and the KKT conditions as follows:
J(w, b; α) = (1/2) ||w||^2 - ∑(t=1:n) α_t [ y_t (w' x_t + b) - 1 ],   with α_t >= 0

Setting the derivatives of J to zero gives

dJ/db = 0  =>  ∑(t=1:n) α_t y_t = 0
dJ/dw = 0  =>  w = ∑(t=1:n) α_t y_t x_t

Substituting these back into J, we obtain the dual problem

maximize   ∑(i=1:n) α_i - (1/2) ∑(i=1:n) ∑(j=1:n) α_i α_j y_i y_j x_i' x_j
subject to ∑(i=1:n) α_i y_i = 0,   α_i >= 0

The data appear only through the inner products x_i' x_j, so we can replace them with any kernel K(x_i, x_j). The result is still a quadratic program, and the MATLAB code for classification using the RBF kernel is shown below:
%Generate Data
n = 50;
r1 = sqrt(rand(n,1));            % radius
t1 = 2*pi*rand(n,1);             % angle
d1 = [r1.*cos(t1) r1.*sin(t1)];  % points inside the unit disc
r2 = sqrt(3*rand(n,1)+1);        % radius
t2 = 2*pi*rand(n,1);             % angle
d2 = [r2.*cos(t2), r2.*sin(t2)]; % points in an annulus around the first class
x = [d1; d2];
y = [ones(n,1); -ones(n,1)];
n = 2*n;

%Construct RBF Kernel
K = zeros(n);
sig2 = 1;
for i = 1:n
    for j = i:n
        xi = x(i,:) - x(j,:);
        K(i,j) = exp( (-1/(2*sig2)) * (xi*xi') );
        K(j,i) = K(i,j);
    end
end

% Dual QP: quadprog minimizes 1/2*a'*H*a + f'*a, i.e. the negative of the dual objective
H = (y*y').*K;
f = -ones(n,1);
Aeq = y';   % equality constraint sum(alpha.*y) = 0
beq = 0;

%Run quadprog
qp_opts = optimset('LargeScale','Off','display','off');
[alpha, fval, exitflag, output, lambda] = ...
    quadprog(H,f,[],[],Aeq,beq, zeros(n,1), ...
             Inf*ones(n,1), ones(n,1), ...
             qp_opts);

% Find the support vectors and calculate the parameters of the separating
% surface from them.
svIndex = find(alpha > sqrt(eps));
sv = x(svIndex,:);
alphaHat = y(svIndex).*alpha(svIndex);

% Calculate the bias by applying the indicator function to the support
% vector with largest alpha.
[maxAlpha,maxPos] = max(alpha);
bias = y(maxPos) - sum(alphaHat.*K(svIndex,maxPos));

figure; hold on;
plot(d1(:,1),d1(:,2),'r+')
plot(d2(:,1),d2(:,2),'b*')
plot(sv(:,1),sv(:,2),'ko');
axis equal

% plot decision boundary
[d1,d2] = meshgrid(-2:.1:2, -2:.1:2);
Xp = [reshape(d1,numel(d1),1) reshape(d2,numel(d2),1)];
m = length(Xp);
Z = zeros(m,1);
for i = 1:m
    xi = repmat(Xp(i,:),n,1) - x;
    xd = diag(xi * xi')';
    Kx = exp( (-1/(2*sig2)) * xd );
    Z(i) = Kx(svIndex) * alphaHat + bias;
end
contour(d1,d2,reshape(Z,size(d1,1),size(d1,2)), [0 0] ,'k');

Finally, the figure below shows the separation between the classes; the circled points are the support vectors. Note that a linear SVM would fail on this data, while the kernel function gives us a clean non-linear separation.
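Once alphaHat, svIndex and bias have been computed by the script above, classifying a new point only requires evaluating the kernel between that point and the support vectors. Below is a minimal sketch of this, assuming a hypothetical test point xTest (the value is arbitrary, chosen only for illustration) and reusing the variables x, sig2, svIndex, alphaHat and bias from the code above:

% Classify a hypothetical new point xTest (1x2 row vector) using the
% support vectors found above: f(x) = sum_i alphaHat_i * K(x_i, x) + bias
xTest = [0.25 -0.4];                                 % assumed test point, for illustration
diffs = x(svIndex,:) - repmat(xTest, numel(svIndex), 1);
Kt = exp( (-1/(2*sig2)) * sum(diffs.^2, 2) );        % RBF kernel between the SVs and xTest
score = Kt' * alphaHat + bias;                       % decision value
label = sign(score)                                  % predicted class: +1 or -1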
In this example we used an RBF kernel, which corresponds to an infinite-dimensional feature space. Another popular choice is the polynomial kernel K(x,y) = (x.y + 1)^p. The next question is: which functions are valid kernels? A necessary and sufficient condition is that the Gram matrix K, built from the kernel on any finite set of points, is positive semidefinite.
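One simple way to get a feel for this condition is to build the Gram matrix of a candidate kernel on a handful of sample points and inspect its eigenvalues. The sketch below does this for the polynomial kernel; the number of points, the dimension and the degree p are arbitrary choices for illustration:

% Empirical check: the Gram matrix of a valid kernel must be positive
% semidefinite, i.e. all its eigenvalues must be >= 0 (up to round-off).
npts = 20; p = 3;
X = randn(npts, 2);                  % arbitrary sample points
G = (X*X' + 1).^p;                   % Gram matrix of K(x,y) = (x.y + 1)^p
minEig = min(eig((G + G')/2))        % symmetrized for numerical safety; should be >= 0

A kernel that produced a clearly negative eigenvalue here could not correspond to an inner product in any feature space.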