Posts

Two Tower

The two tower model is a popular model for retrieval, mainly because it can be set up efficiently at large scale. At a very high level, the model consists of two towers - a query (user) tower and a candidate (item) tower. Here are some practical tips and tricks on training a two tower model:

Candidate Sampling: To solve a multi-class, multi-label problem with a large number of classes, one trick often applied is candidate sampling, where you do not need to compute F(x, y) for every class y for every training example x. Candidate sampling involves constructing a training task such that you only update a subset of the classes. Some examples of candidate sampling algorithms are Noise Contrastive Estimation (NCE), Negative Sampling, Sampled Logistic, etc. (see the sketch after this excerpt).

Popularity correction: A typical two tower setup involves …

References: Candidate Sampling Tutorial by TensorFlow
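To make candidate sampling concrete, here is a minimal numpy sketch of in-batch negatives, one commonly used sampling scheme (not necessarily the exact setup from the TensorFlow tutorial); user_emb and item_emb are hypothetical tower outputs for a batch of positive (user, item) pairs:

    import numpy as np

    def in_batch_softmax_loss(user_emb: np.ndarray, item_emb: np.ndarray) -> float:
        # Score every user against every item in the batch: the diagonal
        # holds the positive pairs, the off-diagonal entries act as
        # sampled negatives.
        logits = user_emb @ item_emb.T                        # shape (B, B)
        logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        B = user_emb.shape[0]
        # Cross-entropy with the diagonal as the target class.
        return float(-log_probs[np.arange(B), np.arange(B)].mean())

The point is that each batch only ever touches B classes instead of the full catalog, which is what makes the large-scale setup tractable.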

My learnings of Karpathy 1-3

Andrej Karpathy has put together one of the most awesome video blogs on NNs and I finished watching the first three of them. Wanted to put out some of my ramblings on the same. These are my notes so that I internalize them well and also so that I do not forget :) Have become a bit forgetful of late (blame it on the forties! :| ). Lecture 1: Backpropagation is the core of any modern NN. Derivative of a function: if you slightly bump up x by h, how does f(x) respond: (f(x + h) - f(x)) / h. Lecture 2: Bigram model. Lecture 3: Basic language model
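That derivative definition is easy to check numerically; here is a tiny sketch (the function name is mine, not Karpathy's):

    def numerical_derivative(f, x: float, h: float = 1e-5) -> float:
        # Bump x up slightly by h and see how f responds.
        return (f(x + h) - f(x)) / h

    # d/dx of 3x^2 at x = 2 is 12.
    print(numerical_derivative(lambda x: 3 * x**2, 2.0))  # ~12.00003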

MinHash, Bloom Filters and LSH

Let's talk about some large scale algorithms widely used in the document world. MinHash: MinHash is a really cute algorithm to determine how a document compares to another. One simple way to compute document similarity is to compute the Jaccard similarity between them:

    def jaccard_similarity(set1: set, set2: set) -> float:
        # Jaccard = |intersection| / |union|; defined as 0.0 for empty inputs.
        if len(set1) == 0 or len(set2) == 0:
            return 0.0
        common = len(set1.intersection(set2))
        if common == 0:
            return 0.0
        union = len(set1.union(set2))
        return common / union

One problem with Jaccard similarity is that it can take a lot of time to compute, especially for large document collections. The idea behind minhashing is to first operate on the space of document shingles. Shingles are basically any combination of k consecutive words of the document. For example, for the sentence "hi i am jaya" the possible 3-shingles are "hi i am" and "i am jaya" …
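To sketch where minhashing goes from there, assuming each shingle can be hashed to an integer and each set is non-empty: with k simulated hash functions, the fraction of matching signature slots between two documents approximates their Jaccard similarity. Function names here are mine:

    import random

    def minhash_signature(shingles: set, num_hashes: int = 128, seed: int = 42) -> list:
        prime = (1 << 61) - 1
        rng = random.Random(seed)
        # Simulate independent hash functions h(x) = (a * x + b) % prime.
        # Note: Python's built-in hash() is salted per process, so signatures
        # are only comparable within a single run.
        params = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(num_hashes)]
        return [min((a * hash(s) + b) % prime for s in shingles) for (a, b) in params]

    def estimated_jaccard(sig1: list, sig2: list) -> float:
        # The probability that two min-hash values match equals the Jaccard similarity.
        return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)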

Universal Sentence Encoder

Ok so I might be a bit late to join the NLP bus but am so glad I boarded! Let's start with the Universal Sentence Encoder. The Universal Sentence Encoder was released a couple of years ago by Google and is widely appreciated by the NLP community as a quick way to generate a sentence embedding before any further processing is done on it. One reason not to use a naive encoding scheme based on term frequency is that it ignores word ordering and can report high similarity even when the meanings of the sentences are not the same. An example mentioned in the blog below shows that the sentences "it is cool" and "is it cool" have a high similarity. The original paper mentions two ways to encode a natural language sentence: a) Transformer encoder - this consists of 6 stacked transformer layers (each has a self-attention module followed by a feed-forward network). The self-attention takes care of the nearby context to generate the word embeddings. b) Deep averaging network - …
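For reference, here is roughly how one would get embeddings from the pretrained model on TF Hub and compare the two sentences above (the /4 model is, as far as I recall, the deep-averaging-network variant):

    import numpy as np
    import tensorflow_hub as hub

    embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
    a, b = np.asarray(embed(["it is cool", "is it cool"]))  # each is a 512-dim vector
    # Cosine similarity between the two sentences.
    print(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))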

Tips for working better

Being assertive is needed the most for the role, without coming across as bossy. Every word, every sentence you say counts. If you say nothing, that is counted as a negative. Too much talk, too little talk, too much assertion, too little assertion - all bad. Try to find some allies to work with. Establish with your team that you know your shit technically. Have an opinion on everything that pertains to the team - from stand-ups to team meetings to calls to code reviews, etc. It is important not to lose your point in a discussion. Lastly, think about how your boss would have navigated your questions and other questions and problems.

Recommendations at YouTube

Let's take a look at some of the practical papers published on recommendation algorithms at YouTube. Paper 1: Davidson et al., The YouTube Video Recommendation System. One of the oldest papers around the topic is The YouTube Recommendation System. The paper mentions that users come to YouTube either for direct navigation (to locate a single video they found elsewhere), for search and goal-directed browsing (to find specific videos around a topic), or just to be entertained by the content they find. The recommender is a top-N recommender rather than a predictor. Challenges:
- poor metadata
- very large corpus size
- mostly short form content (under 10 min in length)
- user interactions are relatively short and noisy
- videos have a short life cycle, going from upload to viral on the order of days, requiring constant freshness
The goal is to find recommendations that are reasonably recent and fresh as well as relevant and diverse to the user's taste. The input data can be divided into two parts …
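The paper's co-visitation idea is easy to sketch: the relatedness of two videos is r(vi, vj) = cij / f(vi, vj), where cij counts sessions in which both videos were watched and f normalizes for global popularity. A minimal version, using f(vi, vj) = ci * cj as the normalizer (one simple choice in that spirit; the paper discusses normalization more carefully):

    from collections import Counter
    from itertools import combinations

    def relatedness_scores(sessions: list) -> dict:
        # sessions is a list of watch sessions, each a list of video ids.
        video_counts = Counter(v for s in sessions for v in set(s))
        pair_counts = Counter(
            tuple(sorted(p)) for s in sessions for p in combinations(set(s), 2)
        )
        return {
            (a, b): c / (video_counts[a] * video_counts[b])
            for (a, b), c in pair_counts.items()
        }

Top-N recommendations for a video are then just its highest-scoring related videos.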

Deep Reinforcement Learning

Let's talk about one of the difficult areas in ML - Deep Reinforcement Learning. Two of the most popular approaches in the space are policy gradients and deep Q-networks. An agent interacts with an environment and receives rewards. Policy search is finding a good set of parameters in the policy space. One way to explore the policy space is via the policy gradient approach, which evaluates the gradients of the rewards w.r.t. the parameters and then moves in the direction that maximizes reward. The policies themselves can be defined via, say, neural networks. In the case of supervised ML, we already know the best action from the set of actions and the NN could be trained by minimizing the cross-entropy loss between the estimated and target distributions. However, in RL, as we focus on long term reward, the reward itself could be delayed or sparse. This is known as the classic credit assignment problem. This problem is generally solved by summing up all …
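The truncated thought above is presumably the standard fix: assign each action the discounted sum of the rewards that followed it. A minimal sketch:

    def discounted_returns(rewards: list, gamma: float = 0.99) -> list:
        # Walk backwards, accumulating gamma-discounted future reward.
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # Rewards of [0, 0, 1] with gamma = 0.9: earlier actions get partial credit.
    print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))  # [0.81, 0.9, 1.0]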