Universal sentence encoder

Ok so I might be a bit late to join the NLP bus, but I am so glad I boarded! Let's start with the universal sentence encoder (USE). The universal sentence encoder was released a couple of years ago by Google and is widely appreciated by the NLP community as a quick way to generate a sentence embedding before any further processing is done on it. One reason not to use a naive encoding scheme based upon term frequency is that it ignores word ordering and can report high similarity even when the meanings of the sentences are not the same. Some examples mentioned in the blog below [1] show that the sentences "it is cool" and "is it cool" have a high similarity.
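
To make the word-ordering point concrete, here is a minimal sketch (assuming the tensorflow_hub package and the public TF Hub path for USE v4, which may change): a pure bag-of-words view gives the two reordered sentences identical count vectors, while USE assigns each sentence its own 512-dimensional embedding that can then be compared with cosine similarity.

```python
from collections import Counter

import numpy as np
import tensorflow_hub as hub

a, b = "it is cool", "is it cool"

# Bag-of-words view: identical count vectors, so any similarity computed from
# them is maximal even though the sentences differ in meaning.
print(Counter(a.split()) == Counter(b.split()))  # True

# USE view: one 512-dimensional embedding per sentence.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
vecs = embed([a, b]).numpy()

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vecs[0], vecs[1]))  # the encoder does see word order
```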

The original paper mentions two ways to encode the natural language sentence: a) Transformer encoder - this consists of 6 stacked transformer layers (each has a self-attention module followed by a feed-forward network). The self-attention takes care of the nearby context when generating the word embeddings. b) Deep averaging network (DAN) - this begins by averaging the embeddings for the words and bi-grams and then puts that average through a 4-layer feed-forward DNN to produce the sentence embedding. A small sketch of the DAN idea follows.
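
Here is a toy sketch of the DAN idea only, not the released model; the vocabulary, dimensions, and the 2-layer stack below are made up for illustration (the paper's network is deeper).

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, HIDDEN, OUT = 16, 32, 8
# Toy embedding table covering a few unigrams and bigrams.
vocab = {"it": 0, "is": 1, "cool": 2, "it_is": 3, "is_cool": 4, "is_it": 5, "it_cool": 6}
E = rng.normal(size=(len(vocab), EMB_DIM))

def featurize(tokens):
    """Return indices of the unigrams and bigrams found in the toy vocabulary."""
    grams = tokens + [f"{x}_{y}" for x, y in zip(tokens, tokens[1:])]
    return [vocab[g] for g in grams if g in vocab]

# A toy 2-layer feed-forward stack standing in for the deeper network.
W1 = rng.normal(size=(EMB_DIM, HIDDEN)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, OUT));     b2 = np.zeros(OUT)

def dan_encode(sentence):
    ids = featurize(sentence.split())
    avg = E[ids].mean(axis=0)          # the "averaging" step over words and bi-grams
    h = np.tanh(avg @ W1 + b1)         # feed-forward layers on top of the average
    return np.tanh(h @ W2 + b2)

print(dan_encode("it is cool").shape)  # (8,) -- the toy sentence embedding
```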

A question that then arises is when to use USE vs. BERT. [3] is an interesting reference showing the comparison numbers for BERT vs. others. Without fine-tuning, BERT is not meant to find similar sentences. BERT is trained on two main tasks: masked language modelling, where words are hidden and the model is trained to predict the missing words, and next sentence prediction, which identifies whether sentence B follows sentence A. USE is better suited for the sentence-similarity task.
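
As a rough illustration of the "without fine-tuning" caveat, the sketch below mean-pools raw bert-base-uncased token vectors into a sentence vector (assuming the transformers and torch packages; the sentence pair and the mean-pooling choice are mine, and actual scores vary by model and pooling strategy).

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def mean_pooled(sentence):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state   # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # crude sentence vector

a = mean_pooled("how old are you")
b = mean_pooled("what is your age")
print(torch.nn.functional.cosine_similarity(a, b, dim=0).item())
# Scores from this kind of pooling tend to be less discriminative for
# similarity than USE or fine-tuned variants such as sentence-BERT.
```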

References
[1] A very nice blog.
[2] Original paper of the Universal Sentence Encoder.
[3] A nice blog on BERT vs. USE and others.
