Word2Vec
OK, first blog post after a loooooooong time, and the first one after baby #2. I have been wanting to catch up on my reading for so long, and what better way than to write it up and explain it myself? Let's begin with a model that has perhaps been beaten to death in the past few years.

Word2Vec is a popular algorithm for generating word embeddings. The original algorithm, by Mikolov et al., was proposed in the following references (1 and 2). There are two key pieces of the model:

The Skip-gram model: In this model, we are given a corpus of words and their contexts (a context is, for example, a nearby word within a fixed window size). The goal is to train a neural network to predict the probability of every word in the vocabulary given the input word (hopefully, the probability of the context words will be much higher than that of the other words). We thus need to find the parameters θ that maximize the probability:

argmax_θ ∏_{w ∈ corpus} ∏_{c ∈ C(w)} p(c | w; θ)

where C(w) is the set of contexts for the word w in the corpus.

The conditional probability is usually ...
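
To make the (word, context) pairs from the Skip-gram description concrete, here is a minimal sketch in Python. It only builds the training pairs and does not train anything. The function name `skipgram_pairs` and the `window` parameter are my own illustrative choices, not something from the original Word2Vec code, and I am assuming the corpus is already tokenized into a list of words.

```python
def skipgram_pairs(tokens, window=2):
    """Return (input_word, context_word) pairs for every token.

    Assumption: a "context" is any other token at most `window`
    positions to the left or right of the input word.
    """
    pairs = []
    for i, word in enumerate(tokens):
        # Clamp the window to the boundaries of the token list.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((word, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "the quick brown fox".split()
    for pair in skipgram_pairs(sentence, window=1):
        print(pair)
```

With `window=1`, the four-word sentence above yields six pairs: the two edge words contribute one pair each and the two middle words contribute two each. These pairs are the (w, c) terms that the product in the objective runs over.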