t-SNE

t-SNE is a non-linear dimensionality reduction technique that is well suited for visualization, since its goal is to reduce the data to 2 or 3 dimensions. The ideal goal of visualization is to build a map in the low dimensional space that reflects the similarities in the original space. Popular dimensionality reduction techniques like PCA are not very useful for this purpose: PCA maximizes variance, which amounts to minimizing the squared error between pairwise distances in the original data and in the map, and hence it mainly preserves large pairwise distances. However, large pairwise distances are not very reliable, whereas small distances between neighbors are. Techniques that instead focus on preserving distances between neighbors include Isomap, LLE, SNE and t-SNE.

t-SNE is based on the SNE (stochastic neighbor embedding) algorithm. It places a Gaussian around every point in the high dimensional space and then measures the density of all the other points under that Gaussian. In other words, it defines a probability distribution over pairs of points that represents the similarity between them.

The basic idea is as follows:
t-SNE first computes the pairwise affinities between points as

p_ij = (p_{j|i} + p_{i|j}) / (2N), where

p_{j|i} = exp(-||x_i - x_j||^2 / (2σ_i^2)) / ∑_{k≠i} exp(-||x_i - x_k||^2 / (2σ_i^2))
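To make the formula concrete, here is a minimal NumPy sketch (illustrative, not the reference implementation) of the conditional affinities p_{j|i} and the symmetrized p_ij, assuming the per-point bandwidths σ_i are already given; the function and variable names are just for illustration.

```python
import numpy as np

def high_dim_affinities(X, sigmas):
    """Sketch: conditional affinities p_{j|i} and symmetrized p_ij.

    X      : (N, D) array of high dimensional points.
    sigmas : (N,) array of per-point bandwidths (assumed given here; in
             t-SNE they are chosen to match a target perplexity).
    """
    N = X.shape[0]
    # Squared Euclidean distances ||x_i - x_j||^2.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Gaussian kernel with a per-point bandwidth; p_{i|i} is set to 0.
    P_cond = np.exp(-sq_dists / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(P_cond, 0.0)
    P_cond /= P_cond.sum(axis=1, keepdims=True)   # row i holds p_{.|i}

    # Symmetrize: p_ij = (p_{j|i} + p_{i|j}) / (2N).
    P = (P_cond + P_cond.T) / (2.0 * N)
    return P
```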

Note that a different bandwidth σ_i is associated with every point. Each σ_i is set so that the conditional distribution p_{·|i} has a fixed perplexity; in dense regions this drives σ_i to a very low value. The goal of t-SNE is to learn a low dimensional map y_1, ..., y_N that reflects the similarities p_ij. The similarities q_ij between the low dimensional points are computed using a heavy-tailed Student-t distribution (with one degree of freedom) as
q_ij = (1 + ||y_i - y_j||^2)^{-1} / ∑_{k≠l} (1 + ||y_k - y_l||^2)^{-1}
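
The two remaining ingredients described above can be sketched the same way: a simple binary search on σ_i that matches a target perplexity (defined as 2 raised to the Shannon entropy of p_{·|i}), and the Student-t similarities q_ij. Function names and search bounds below are illustrative, not taken from any particular implementation.

```python
import numpy as np

def sigma_for_perplexity(sq_dists_i, target_perplexity, n_steps=50):
    """Sketch: binary search on sigma_i so that p_{.|i} has the target perplexity.

    sq_dists_i : squared distances from point i to all other points (i excluded).
    Perplexity is 2**H(p_{.|i}), with H the Shannon entropy in bits.
    """
    lo, hi = 1e-10, 1e10
    for _ in range(n_steps):
        sigma = (lo + hi) / 2.0
        p = np.maximum(np.exp(-sq_dists_i / (2.0 * sigma ** 2)), 1e-12)
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p))
        if 2.0 ** entropy > target_perplexity:
            hi = sigma      # distribution too flat -> shrink the bandwidth
        else:
            lo = sigma      # distribution too peaked -> widen it
    return (lo + hi) / 2.0


def low_dim_affinities(Y):
    """Sketch: Student-t similarities q_ij for the low dimensional map Y (N, d)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)      # (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(inv, 0.0)        # q_ii is not used
    Q = inv / inv.sum()               # normalize over all pairs k != l
    return Q, inv
```
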
t-SNE aims to minimize the KL divergence between the two distributions P and Q using gradient descent.
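Combining the sketches above, a bare-bones optimization loop could look as follows. It uses the gradient from the t-SNE paper, ∂C/∂y_i = 4 ∑_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^{-1}, but omits the momentum and early exaggeration tricks that the actual algorithm relies on; high_dim_affinities, sigma_for_perplexity and low_dim_affinities refer to the illustrative sketches above.

```python
import numpy as np

def tsne_sketch(X, perplexity=30.0, n_components=2, n_iter=500,
                learning_rate=100.0, seed=0):
    """Bare-bones t-SNE loop (no momentum or early exaggeration)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]

    # Per-point bandwidths from the perplexity-matching search, then p_ij.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigmas = np.array([
        sigma_for_perplexity(np.delete(sq_dists[i], i), perplexity)
        for i in range(N)
    ])
    P = high_dim_affinities(X, sigmas)

    # Random initialization of the map, then plain gradient descent on KL(P || Q).
    Y = rng.normal(scale=1e-4, size=(N, n_components))
    for _ in range(n_iter):
        Q, inv = low_dim_affinities(Y)
        W = (P - Q) * inv                               # w_ij = (p_ij - q_ij)(1 + ||y_i - y_j||^2)^{-1}
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y   # grad_i = 4 * sum_j w_ij (y_i - y_j)
        Y -= learning_rate * grad
    return Y
```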

Future Work
  1. Scalability is still a major issue with the algorithm: the fastest implementations available take a couple of hours for a million data points (see the example below).
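
For anything beyond a few thousand points one would typically reach for an approximate implementation rather than the exact O(N^2) sketches above; for example, scikit-learn's TSNE uses a Barnes-Hut approximation by default. A minimal usage example (the data and parameter values are illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(1000, 50)   # illustrative data: 1000 points in 50 dimensions

# Barnes-Hut t-SNE (roughly O(N log N) per iteration) instead of the exact O(N^2) method.
embedding = TSNE(n_components=2, perplexity=30.0, method="barnes_hut").fit_transform(X)
print(embedding.shape)         # (1000, 2)
```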

References
  1. Van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing Data using t-SNE." Journal of Machine Learning Research 9 (2008): 2579-2605.
  2. Laurens van der Maaten's excellent talk on the topic.
