Perceptron Algorithm

The perceptron algorithm is a linear classifier: it uses a linear function f(x; w) = w'x + b to separate points into two classes. Its loss simply counts the misclassified points, and the algorithm incrementally adjusts the weight vector, and thus the decision boundary, until both classes are classified correctly. The update rule is perhaps the simplest one could think of:
w(k+1) = w(k) + yt*xt    if yt * f(xt; w(k)) ≤ 0
i.e. we cycle through the training points and adjust the weight vector only when a point is misclassified. Note that our class labels are -1 and +1, so if a point is misclassified then yt * f(xt; w) is negative (or zero, if the point falls exactly on the boundary). To understand why this seemingly simple update rule works and leads to convergence, let us examine it closely. The underlying assumption in the perceptron model is that the points are linearly separable, hence there exists a w* such that yt * f(xt; w*) ≥ γ for all t = 1, 2, ..., n. The proof of convergence tracks the similarity between w* and w(k) and shows that this similarity increases with every update. The proof is as follows:
w*' * w(k) = w*' * (w(k-1) + yt*xt) ≥ w*' * w(k-1) + γ ≥ kγ   (since yt * f(xt; w*) ≥ γ at each of the k updates)
|w(k)|^2 = |w(k-1) + yt*xt|^2
         = |w(k-1)|^2 + 2*yt*(w(k-1)' * xt) + |xt|^2
         ≤ |w(k-1)|^2 + R^2 ≤ kR^2   (since yt*(w(k-1)' * xt) ≤ 0 for a misclassified point, and |xt| ≤ R)
Now,
cos(w*, w(k)) = w*' * w(k) / (|w*| |w(k)|)
              ≥ kγ / (√(kR^2) |w*|) = √k * γ / (R |w*|)
As cosine is bounded by 1, we get,
k ≤ R^2 |w*|^2/ γ^2
This tells us that the perceptron algorithm is guaranteed to converge after a finite number of updates if the data is linearly separable. Moreover, the bound on the number of updates is inversely proportional to γ^2, where γ is the geometric margin between the two classes: the smaller the margin, the slower the convergence, and vice-versa. Further, the same bound yields generalization guarantees if we assume the points can be classified by a linear classifier. The MATLAB code for the perceptron algorithm is as follows:
% Cycle through the training points and update w on every misclassification.
% Stop after a complete pass with no mistakes (assumes linearly separable data).
w = zeros(size(X,2), 1);
converged = false;
while ~converged
    converged = true;
    for i = 1:n
        xi = X(i,:);
        yi = y(i);
        if yi * (xi*w) <= 0          % misclassified (or on the boundary)
            w = w + (yi*xi)';        % perceptron update
            converged = false;
        end
    end
end
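
As a quick sanity check on both the loop above and the mistake bound, here is a minimal sketch; the toy data set, the reference separator wstar, and the margin computation are assumptions made up for illustration, not part of the derivation above. It generates two well-separated 2-D clusters, runs the perceptron, and compares the number of updates against R^2 |w*|^2 / γ^2.

% Toy 2-D data: two clusters separated along the first coordinate.
% Everything here (data, wstar, gamma, R) is illustrative only.
n = 100;
X = [0.5*randn(n/2,2) + 3; 0.5*randn(n/2,2) - 3];
y = [ones(n/2,1); -ones(n/2,1)];

wstar = [1; 0];                          % a separator that works for this toy data
gamma = min(y .* (X*wstar));             % empirical margin with respect to wstar
R     = max(sqrt(sum(X.^2, 2)));         % radius of the data

w = zeros(2,1); updates = 0; converged = false;
while ~converged
    converged = true;
    for i = 1:n
        if y(i) * (X(i,:)*w) <= 0        % misclassified (or on the boundary)
            w = w + (y(i)*X(i,:))';
            updates = updates + 1;
            converged = false;
        end
    end
end
fprintf('updates = %d, mistake bound = %.1f\n', updates, R^2 * norm(wstar)^2 / gamma^2);

On easy data like this the observed number of updates typically falls well below the bound, which is a worst-case guarantee.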
[Figure: the converged decision boundary separating the two classes]
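
For 2-D data, a plot of this kind can be produced with a short sketch along the following lines; it assumes the X, y, and converged w from the code above, and since that code carries no bias term the boundary is drawn through the origin.

% Scatter the two classes and draw the line w(1)*x1 + w(2)*x2 = 0.
figure; hold on;
plot(X(y==1,1),  X(y==1,2),  'b+');
plot(X(y==-1,1), X(y==-1,2), 'ro');
x1 = linspace(min(X(:,1)), max(X(:,1)), 100);
plot(x1, -(w(1)/w(2)) * x1, 'k-');       % assumes w(2) is nonzero
legend('class +1', 'class -1', 'decision boundary');
hold off;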

Helpful Links
Chapters 1 and 2, MIT OpenCourseWare lecture notes on Machine Learning - Jaakkola
