Siamese Network

Let's first talk about the problem of one-shot learning. One-shot learning means learning from a single training example. This problem occurs, for example, in an organization where you want to recognize faces and might have only one image of each employee. Using a convnet to output a multi-class label is not a great idea: such a small training set is not enough to train a classifier, and the approach doesn't scale when a new employee joins, since the output layer would have to change and the network would have to be retrained. Instead, one way to handle this problem is to learn a similarity function between two images.

One way to train a neural network to learn the similarity function is via a Siamese network. A Siamese network consists of two identical neural networks with shared parameters; each encodes one of the two input images, and a distance function is computed between the two encodings [Ref: DeepFace]. To define an objective function, one way is to use a triplet loss. In a triplet loss, there is an anchor image along with a positive example (same person) and a negative example (different person), written (A, P, N). What is required is that the anchor be closer to the positive than to the negative:
|f(A) - f(P)|^2 + α <= |f(A) - f(N)|^2
Without a margin, the constraint would read |f(A) - f(P)|^2 <= |f(A) - f(N)|^2, which has trivial solutions: the network could learn an encoding of all zeros for every image, or more generally give all images the same encoding. The margin parameter α rules these out. The loss function can thus be defined as a hinge loss as follows:
L = max(|f(A) - f(P)|^2 - |f(A) - f(N)|^2 + α, 0)
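As a minimal sketch of this loss, here it is in plain Python on hand-made encodings. The 2-D encodings and the margin value alpha = 0.2 are assumptions chosen only for illustration; a real network would produce the encodings f(A), f(P), f(N).

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Hinge-style triplet loss: max(d(A, P) - d(A, N) + alpha, 0)."""
    return max(squared_distance(f_a, f_p) - squared_distance(f_a, f_n) + alpha, 0.0)


# Hypothetical 2-D encodings, for illustration only.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
far_negative = [1.0, 0.0]   # much farther from the anchor than the positive
near_negative = [0.3, 0.0]  # farther than the positive, but by less than alpha

easy_loss = triplet_loss(anchor, positive, far_negative)
hard_loss = triplet_loss(anchor, positive, near_negative)

# If every image collapses to the same encoding, the loss equals alpha, not 0.
zeros = [0.0, 0.0]
collapsed_loss = triplet_loss(zeros, zeros, zeros)
```

Here `easy_loss` is 0 because the negative is already far enough away, `hard_loss` is positive because the margin is violated, and `collapsed_loss` equals the margin, which is why the all-same-encoding solution does not minimize the loss.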
The effect of taking the max is that as long as the constraint is satisfied, the term inside is at most 0 and the loss is 0. If the term is positive, the loss equals that term. When choosing triplets to form the training set, random negatives are not recommended: most of them already satisfy the constraint, so the problem becomes too easy and the network learns little. Instead we need to find hard triplets, where the distance from A to N is close to the distance from A to P. [Ref: FaceNet]

Another way to train the network is with a loss computed on a pair of images, such as a contrastive loss. For example, one way is to use a logistic regression output that outputs 1 if the two images are similar and 0 if not. The final logistic unit could apply the sigmoid function to a weighted sum of the element-wise absolute differences of the two encodings.
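To make the idea of hard triplets concrete, here is a small sketch in plain Python. It uses one simple heuristic, picking the negative closest to the anchor; this is an assumption for illustration and not necessarily the exact selection procedure used in FaceNet. The function names, encodings and alpha = 0.2 are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hardest_negative(f_a, negative_encodings):
    """Index of the negative encoding closest to the anchor."""
    return min(
        range(len(negative_encodings)),
        key=lambda i: squared_distance(f_a, negative_encodings[i]),
    )


def violates_margin(f_a, f_p, f_n, alpha=0.2):
    """True if the triplet still gives a positive triplet loss."""
    return squared_distance(f_a, f_p) - squared_distance(f_a, f_n) + alpha > 0


# Hypothetical encodings, for illustration only.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
negatives = [[1.0, 0.0], [0.3, 0.0], [0.0, 2.0]]

idx = hardest_negative(anchor, negatives)
is_hard = violates_margin(anchor, positive, negatives[idx])
is_easy = not violates_margin(anchor, positive, negatives[0])
```

Only triplets for which `violates_margin` is true contribute a non-zero loss, so spending training time on them is what makes the network improve.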
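Here is code for a Siamese network using the Keras functional API. Note that this example shares an LSTM over sequence inputs rather than a convnet over images; the weight-sharing pattern is the same.
import numpy as np
from keras import layers
from keras import Input
from keras.models import Model

# A single LSTM layer instance; reusing it shares its weights between both branches.
lstm = layers.LSTM(32)

# Left branch: variable-length sequences of 128-dimensional vectors.
left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

# Right branch: calling the same layer instance again reuses the same weights.
right_input = Input(shape=(None, 128))
right_output = lstm(right_input)

# Classifier on the two encodings: outputs 1 if the inputs are similar, 0 if not.
merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

model = Model([left_input, right_input], predictions)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')

# Placeholder random data (an assumption, only so the example runs end to end).
left_data = np.random.random((16, 10, 128))
right_data = np.random.random((16, 10, 128))
targets = np.random.randint(0, 2, size=(16, 1))
model.fit([left_data, right_data], targets)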
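
Once encodings have been learned, recognizing an employee from a single stored image comes down to comparing distances. The sketch below, in plain Python, assumes the encodings are already computed; the names, the 2-D vectors and the threshold of 0.5 are hypothetical values for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def identify(query_encoding, database, threshold=0.5):
    """Return the name with the closest stored encoding, or None if none is within the threshold."""
    best_name, best_dist = None, float("inf")
    for name, encoding in database.items():
        dist = squared_distance(query_encoding, encoding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


# One stored encoding per employee (hypothetical values).
database = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}

# A new employee joins: add one encoding, no retraining of the network.
database["carol"] = [1.0, 1.0]

match = identify([0.1, 0.9], database)
stranger = identify([5.0, 5.0], database)
```

Here `match` is "alice" and `stranger` is None. Adding "carol" only required storing one more encoding, which is the scaling advantage over a multi-class classifier.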


References:
  1. Andrew Ng's deeplearning.ai YouTube videos
