Posts

Showing posts from 2020

Universal sentence encoder

OK, so I might be a bit late to board the NLP bus, but I am so glad I did! Let's start with the universal sentence encoder. It was released a couple of years ago by Google and is widely appreciated by the NLP community as a quick way to generate a sentence embedding before any further processing is done on it. One reason not to use a naive encoding scheme based on term frequency is that it ignores word order and can report high similarity even when the sentences do not mean the same thing. For example, the blog below shows that the sentences "it is cool" and "is it cool" have a high similarity. The original paper describes two ways to encode a natural language sentence - a) Transformer encoder - This consists of 6 stacked transformer layers (each has a self-attention module followed by a feed-forward network). Self-attention uses the surrounding context to generate the word embeddings. b) Deep averaging network - be...
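The word-order problem with term-frequency encodings mentioned above can be checked with a small sketch. It is not the universal sentence encoder, only a plain bag-of-words baseline. The helper names `term_frequency_vector` and `cosine_similarity`, the whitespace tokenization, and the third sentence are assumptions made for illustration.

```python
import math
from collections import Counter


def term_frequency_vector(sentence: str) -> Counter:
    """Count how often each lowercase, whitespace-separated token occurs."""
    return Counter(sentence.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    # A Counter returns 0 for missing tokens, so unshared words add nothing.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


statement = term_frequency_vector("it is cool")
question = term_frequency_vector("is it cool")
unrelated = term_frequency_vector("the weather turned cold")  # made-up example

# Same words in a different order: the count vectors are identical.
same_words_score = cosine_similarity(statement, question)
# No shared words at all: the vectors are orthogonal.
unrelated_score = cosine_similarity(statement, unrelated)

print(same_words_score, unrelated_score)
```

Because the statement and the question contain exactly the same words, their count vectors are identical and the similarity comes out at about 1.0, even though one asserts something and the other asks. An encoder that takes word order into account is meant to tell the two apart.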

Tips for working better.

Being assertive without coming across as bossy is what the role needs most. Every word and every sentence you say counts. If you say nothing, that counts as a negative. Too much talk, too little talk, too much assertion, too little assertion: all bad. Try to find some allies to work with. Establish with your team that you know your shit technically. Have an opinion on everything that pertains to the team - from stand-ups to team meetings to calls to code reviews, etc. It is important not to lose your point in a discussion. Lastly, think about how your boss would have navigated your questions, as well as other questions and problems.

Recommendations at YouTube

Let's take a look at some of the practical papers published on recommendation algorithms at YouTube. Paper 1: Davidson et al., The YouTube Video Recommendation System. One of the oldest papers on the topic is The YouTube Video Recommendation System. The paper mentions that users come to YouTube either for direct navigation, to locate a single video they found elsewhere; for search and goal-directed browsing, to find specific videos around a topic; or just to be entertained by the content they find. The recommender is a top-N recommender rather than a predictor. Challenges: poor metadata; a very large corpus; mostly short-form videos (under 10 minutes long); user interactions that are relatively short and noisy; and videos with a short life cycle, going from upload to viral in the order of days, which requires constant freshness. The goal is to find recommendations that are reasonably recent and fresh as well as relevant and diverse to the user's taste. The input data can be divided into two p...