Wide & Deep Learning for Recommender Systems

Let's discuss this widely cited Google paper on Wide & Deep learning. The paper observes that an important challenge in recommender systems is to achieve both memorization and generalization. Memorization means learning the frequent co-occurrence of items and features, whereas generalization explores new feature combinations that have rarely or never occurred in the past.

Logistic regression (LR) models have been widely used in production settings at Google and typically operate on sparse, one-hot-encoded features. Memorization and generalization can be added to such models through cross-product transformations in the feature space, but this requires a lot of manual feature engineering. On the other side are embedding-based models, which learn a low-dimensional dense embedding for each categorical feature. One problem with embedding-based models is that they can over-generalize: they produce non-zero predictions for every query-item pair, even when the underlying user-item matrix is sparse and high-rank, as with niche users who have very specific preferences.

To solve this problem the authors present a very neat idea: use both a wide part and a deep part in the network. The wide part is a generalized linear model that outputs w^T x + b, where the input consists of both the raw features and cross-product transformations of them. The deep part is a feed-forward neural network over the embedded categorical features. The figure from the paper shows the architecture in detail.
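To make the architecture concrete, here is a minimal PyTorch sketch of the two parts. Everything in it (the `WideAndDeep` name, a single categorical field, the layer sizes) is my own illustrative assumption rather than the paper's production setup; in practice the wide input would include the cross-product transformations and the deep input would concatenate embeddings for many categorical fields.

```python
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    """Minimal Wide & Deep sketch; all dimensions are illustrative."""
    def __init__(self, n_wide_features, n_categories, embed_dim=32,
                 hidden_dims=(128, 64)):
        super().__init__()
        # Wide part: generalized linear model over raw + cross-product features.
        self.wide = nn.Linear(n_wide_features, 1)
        # Deep part: embed the sparse categorical feature, then a feed-forward NN.
        self.embedding = nn.Embedding(n_categories, embed_dim)
        layers, in_dim = [], embed_dim
        for h in hidden_dims:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))
        self.deep = nn.Sequential(*layers)

    def forward(self, wide_x, cat_ids):
        # Sum the wide and deep logits, then squash with a sigmoid.
        wide_logit = self.wide(wide_x)                   # (batch, 1)
        deep_logit = self.deep(self.embedding(cat_ids))  # (batch, 1)
        return torch.sigmoid(wide_logit + deep_logit)
```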
The final output is produced by combining the outputs of the wide part and the deep part and passing the sum through a sigmoid. Crucially, the two parts are trained jointly, with gradients from the shared output back-propagated into both, as opposed to an ensemble, where separately trained models are combined only at inference time.
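Joint training can then be sketched as a single backward pass through the combined output. The dummy tensors and the single Adam optimizer below are my simplifications to keep the example self-contained; the paper itself trains the wide part with FTRL (with L1 regularization) and the deep part with AdaGrad.

```python
model = WideAndDeep(n_wide_features=100, n_categories=10_000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

# Dummy batch: dense wide features, one categorical id per example, binary labels.
wide_x = torch.randn(8, 100)
cat_ids = torch.randint(0, 10_000, (8,))
labels = torch.randint(0, 2, (8, 1)).float()

optimizer.zero_grad()
loss = loss_fn(model(wide_x, cat_ids), labels)
loss.backward()   # gradients flow into the wide and the deep parameters at once
optimizer.step()
```

This is the key contrast with an ensemble: because one loss drives both parts, the wide part only needs a small set of cross-product features to patch up the deep part's over-generalization, rather than a full standalone feature set.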
