ResNet

Let's next discuss a very famous NN architecture called the ResNet. One of the key questions asked in deep learning is: "Would deeper networks result in higher accuracy?" Intuitively this may make sense, but in practice it is observed that the training accuracy starts to degrade with deeper networks. This is surprising because it is not caused by overfitting: the degradation shows up in the training error, not just the test error. In fact, a deeper network constructed from its shallower counterpart by simply adding identity mappings should do no worse, yet solvers in practice fail to find such a solution and a similar degradation is observed. The degradation problem suggests that solvers have difficulty approximating identity mappings with multiple nonlinear layers. One possible cause is the problem of vanishing/exploding gradients.

ResNet addresses this problem by adding residual connections (also called shortcut connections). These connections make it easier to learn an identity mapping:
a[l+2] = g(z[l+2] + a[l]), where a[l] is the activation carried by the skip connection into layer l+2
       = g(w[l+2] a[l+1] + b[l+2] + a[l])
If we are using L2 regularization, it will shrink w[l+2] and b[l+2] towards 0, giving a[l+2] = g(a[l]) = a[l] (since g is ReLU and a[l] is already non-negative). It is therefore easy for a residual block to learn the identity function, so adding it doesn't hurt performance; and if the hidden units do learn something useful, it should improve performance.
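
To make this concrete, here is a minimal NumPy sketch of a single residual block with ReLU activations (the layer size, random weights, and helper names are made up for illustration). Setting the block's output weights and bias to zero collapses it to the identity mapping, mirroring the argument above:

import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(a_l, W1, b1, W2, b2):
    # Two-layer block: a[l+2] = g(W[l+2] a[l+1] + b[l+2] + a[l])
    a_l1 = relu(W1 @ a_l + b1)   # a[l+1] = g(W[l+1] a[l] + b[l+1])
    z_l2 = W2 @ a_l1 + b2        # z[l+2]
    return relu(z_l2 + a_l)      # skip connection adds a[l] before the activation

# Toy example: 4-dimensional activations, weights drawn at random.
rng = np.random.default_rng(0)
a_l = relu(rng.standard_normal(4))   # a[l] is itself a ReLU output, so non-negative
W1, b1 = rng.standard_normal((4, 4)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((4, 4)), rng.standard_normal(4)

print(residual_block(a_l, W1, b1, W2, b2))

# If L2 regularization drives W2 and b2 to zero, the block reduces to
# relu(0 + a[l]) = a[l], i.e. the identity mapping.
print(np.allclose(residual_block(a_l, W1, b1, np.zeros((4, 4)), np.zeros(4)), a_l))  # True

In the actual ResNet the shortcut is added around two or three convolutional layers rather than a toy dense layer, but the algebra is the same.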

References:
Deep Residual Learning for Image Recognition (He et al., CVPR 2016): the ResNet paper, cited more than 20k times!
