Transfer learning & Multi-task learning

Transfer learning is a sequential process: you learn from task A and then transfer what you learned to task B. In multi-task learning, by contrast, you learn from multiple tasks simultaneously. In transfer learning, you first train a NN on a big task. Then, for a smaller task, you retrain only the weights of the last layer (or the last 1-2 layers). You could also retrain all the parameters of the NN; in that case the original training on the big task is called pre-training, because it is what initializes the weights of the NN, and the subsequent updating of those weights on the new task is called fine-tuning. A few ways in which fine-tuning is done in practice (a short code sketch follows the list):
  1. Truncate the last layer of the NN and replace it with a new layer that learns the new output.
  2. Use a smaller learning rate to train the network.
  3. Freeze the weights of the first few layers of the NN.
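As a concrete illustration, here is a minimal PyTorch-style sketch of these three steps. It assumes torchvision's ResNet-18 pre-trained on ImageNet as the "big task" network and a hypothetical 10-class target task; the layer names and learning rate are illustrative, not prescriptive.

  import torch
  import torch.nn as nn
  from torchvision import models

  model = models.resnet18(pretrained=True)   # NN trained on the big task (ImageNet)

  # Freeze the weights of the earlier layers (step 3)
  for param in model.parameters():
      param.requires_grad = False

  # Truncate the last layer and replace it with a new output layer (step 1);
  # the new layer's weights are trainable by default
  model.fc = nn.Linear(model.fc.in_features, 10)

  # Use a smaller learning rate and update only the new layer's parameters (step 2)
  optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)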
When does transfer learning make sense?
  1. You have a lot of data for the task you are originally learning from and only a small amount of data for the problem you are transferring to. It would not make sense when the opposite is true.
In multi-task learning, you learn multiple tasks simultaneously. In deep learning, multi-task learning is typically done with hard or soft parameter sharing. Hard parameter sharing is generally applied by sharing the hidden layers between tasks while keeping task-specific output layers. In soft parameter sharing, on the other hand, each task has its own model with its own parameters, and the distance between the parameters of the different models is regularized to encourage them to be similar. Even though hard parameter sharing is simple and widely used in practice, it quickly breaks down if the tasks are not closely related. A figure in Sebastian Ruder's overview (see references) clearly illustrates hard parameter sharing; a small sketch of the idea follows.
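To make hard parameter sharing concrete, here is a minimal sketch of a two-task network with a shared trunk and task-specific heads. The layer sizes, task names and numbers of classes are illustrative assumptions, not a specific published architecture.

  import torch
  import torch.nn as nn

  class HardSharingNet(nn.Module):
      def __init__(self, in_dim=128, hidden_dim=64, n_classes_a=5, n_classes_b=3):
          super().__init__()
          # Hidden layers shared between the tasks
          self.shared = nn.Sequential(
              nn.Linear(in_dim, hidden_dim), nn.ReLU(),
              nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
          )
          # Task-specific output layers
          self.head_a = nn.Linear(hidden_dim, n_classes_a)
          self.head_b = nn.Linear(hidden_dim, n_classes_b)

      def forward(self, x):
          h = self.shared(x)
          return self.head_a(h), self.head_b(h)

  model = HardSharingNet()
  out_a, out_b = model(torch.randn(32, 128))   # sum the per-task losses and backpropagate through the shared trunk

In soft parameter sharing, each task would instead keep its own full copy of the network, with an extra penalty (for example, the L2 distance between corresponding weights) added to the loss to keep the copies similar.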
When does Multi-task learning make sense?
  1. Training on tasks that could benefit from shared features
  2. Usually the amount of data you have for each task is quite similar (whereas in transfer learning you have a lot of data for task A and little for task B).
  3. You can train a big enough NN to do well on all the tasks (Rich Caruana).
In practice, multi-task learning is used much less often than transfer learning (Source: Andrew Ng, 2017).

References:
  1. Sebastian Ruder, An Overview of Multi-Task Learning in Deep Neural Networks: http://ruder.io/multi-task/
