ResNet

Let's next discuss a very famous neural network architecture called ResNet. One of the key questions asked in deep learning is "Would deeper networks result in higher accuracy?" Intuitively this may seem plausible, but in practice the training accuracy starts to degrade as networks get deeper. This is surprising, and it is not caused by overfitting, because the degradation shows up in training error and not just test error. In fact, constructing deeper networks from their shallower counterparts by simply adding identity-mapping layers shows a similar degradation in training error. This degradation problem suggests that solvers have difficulty approximating identity mappings with stacks of nonlinear layers; one possible cause is the problem of vanishing/exploding gradients. ResNet addresses the problem by adding residual connections, also called shortcut or skip connections. Adding these connections makes it easier to learn the identity mapping, since the output of a residual block is a[l+2] = g(z[l+2] + a[l]), where a[l] is the activation carried forward by the skip connection: to recover the identity, the stacked layers only need to push z[l+2] towards zero.
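
To make the idea concrete, here is a minimal sketch of a residual block in PyTorch. This is an illustration, not code from the ResNet paper; the class name ResidualBlock, the two-convolution structure, and the fixed channel count are assumptions, chosen to roughly mirror the basic block described in the paper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: a[l+2] = g(z[l+2] + a[l])."""
    def __init__(self, channels):
        super().__init__()
        # Two conv layers compute the residual mapping, i.e. z[l+2]
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                               # a[l], carried by the shortcut
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))            # z[l+2]
        return self.relu(out + identity)           # a[l+2] = g(z[l+2] + a[l])

# Quick check with a random feature map
x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```

Note how the shortcut adds the input back in before the final nonlinearity; if the two conv layers learn weights near zero, the block simply passes a[l] through unchanged, which is exactly the identity mapping that plain deep stacks struggle to approximate.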