t-SNE

t-SNE is a non-linear dimensionality reduction technique that is well suited for visualization, since its goal is to reduce the data to 2 or 3 dimensions. The ideal goal of visualization is to build a map in the low dimensional space that reflects the similarities in the original space. Popular dimensionality reduction techniques like PCA are not very useful for this purpose: PCA maximizes variance, which amounts to minimizing the squared error between pairwise distances in the original data and in the map, and hence it mainly preserves large pairwise distances. However, large pairwise distances are not very reliable, whereas small distances between neighbors are. Techniques that instead focus on preserving distances between neighbors include Isomap, LLE, SNE and t-SNE.

t-SNE is based on the SNE (stochastic neighbor embedding) algorithm. It places a Gaussian around every point in the high dimensional space and then measures the density of all the other points under that Gaussian. In other words, it defines a probability distribution over pairs of points that represents the similarity between them.

The basic idea is as follows:
t-SNE first computes the pairwise affinities between points as

p_ij = (p_{j|i} + p_{i|j}) / (2N), where

p_{j|i} = exp(-||x_i - x_j||^2 / (2σ_i^2)) / ∑_{k≠i} exp(-||x_i - x_k||^2 / (2σ_i^2))
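To make the formula concrete, here is a minimal NumPy sketch (illustrative, not the reference implementation) of the conditional affinities p_{j|i} and the symmetrized p_ij, assuming the per-point bandwidths σ_i are already given; the function and variable names are just for illustration.

```python
import numpy as np

def high_dim_affinities(X, sigmas):
    """Sketch: conditional affinities p_{j|i} and symmetrized p_ij.

    X      : (N, D) array of high dimensional points.
    sigmas : (N,) array of per-point bandwidths (assumed given here; in
             t-SNE they are chosen to match a target perplexity).
    """
    N = X.shape[0]
    # Squared Euclidean distances ||x_i - x_j||^2.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

    # Gaussian kernel with a per-point bandwidth; p_{i|i} is set to 0.
    P_cond = np.exp(-sq_dists / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(P_cond, 0.0)
    P_cond /= P_cond.sum(axis=1, keepdims=True)   # row i holds p_{.|i}

    # Symmetrize: p_ij = (p_{j|i} + p_{i|j}) / (2N).
    P = (P_cond + P_cond.T) / (2.0 * N)
    return P
```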

Note that a different bandwidth σ_i is associated with every point. Each σ_i is set so that the conditional distribution p_{·|i} has a fixed perplexity; in dense regions this drives σ_i to a very low value. The goal of t-SNE is to learn a low dimensional map y_1, ..., y_N that reflects the similarities p_ij. The similarities q_ij between the low dimensional points are computed using a heavy-tailed Student-t distribution (with one degree of freedom) as
q_ij = (1 + ||y_i - y_j||^2)^{-1} / ∑_{k≠l} (1 + ||y_k - y_l||^2)^{-1}
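
The two remaining ingredients described above can be sketched the same way: a simple binary search on σ_i that matches a target perplexity (defined as 2 raised to the Shannon entropy of p_{·|i}), and the Student-t similarities q_ij. Function names and search bounds below are illustrative, not taken from any particular implementation.

```python
import numpy as np

def sigma_for_perplexity(sq_dists_i, target_perplexity, n_steps=50):
    """Sketch: binary search on sigma_i so that p_{.|i} has the target perplexity.

    sq_dists_i : squared distances from point i to all other points (i excluded).
    Perplexity is 2**H(p_{.|i}), with H the Shannon entropy in bits.
    """
    lo, hi = 1e-10, 1e10
    for _ in range(n_steps):
        sigma = (lo + hi) / 2.0
        p = np.maximum(np.exp(-sq_dists_i / (2.0 * sigma ** 2)), 1e-12)
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p))
        if 2.0 ** entropy > target_perplexity:
            hi = sigma      # distribution too flat -> shrink the bandwidth
        else:
            lo = sigma      # distribution too peaked -> widen it
    return (lo + hi) / 2.0


def low_dim_affinities(Y):
    """Sketch: Student-t similarities q_ij for the low dimensional map Y (N, d)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)      # (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(inv, 0.0)        # q_ii is not used
    Q = inv / inv.sum()               # normalize over all pairs k != l
    return Q, inv
```
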
t-SNE aims to minimize the KL divergence between the two distributions P and Q using gradient descent.
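Combining the sketches above, a bare-bones optimization loop could look as follows. It uses the gradient from the t-SNE paper, ∂C/∂y_i = 4 ∑_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||^2)^{-1}, but omits the momentum and early exaggeration tricks that the actual algorithm relies on; high_dim_affinities, sigma_for_perplexity and low_dim_affinities refer to the illustrative sketches above.

```python
import numpy as np

def tsne_sketch(X, perplexity=30.0, n_components=2, n_iter=500,
                learning_rate=100.0, seed=0):
    """Bare-bones t-SNE loop (no momentum or early exaggeration)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]

    # Per-point bandwidths from the perplexity-matching search, then p_ij.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigmas = np.array([
        sigma_for_perplexity(np.delete(sq_dists[i], i), perplexity)
        for i in range(N)
    ])
    P = high_dim_affinities(X, sigmas)

    # Random initialization of the map, then plain gradient descent on KL(P || Q).
    Y = rng.normal(scale=1e-4, size=(N, n_components))
    for _ in range(n_iter):
        Q, inv = low_dim_affinities(Y)
        W = (P - Q) * inv                               # w_ij = (p_ij - q_ij)(1 + ||y_i - y_j||^2)^{-1}
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y   # grad_i = 4 * sum_j w_ij (y_i - y_j)
        Y -= learning_rate * grad
    return Y
```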

Future Work
  1. Scalability is still a major issue with the algorithm: the fastest implementations available take a couple of hours for a million data points (see the example below).
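
For anything beyond a few thousand points one would typically reach for an approximate implementation rather than the exact O(N^2) sketches above; for example, scikit-learn's TSNE uses a Barnes-Hut approximation by default. A minimal usage example (the data and parameter values are illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(1000, 50)   # illustrative data: 1000 points in 50 dimensions

# Barnes-Hut t-SNE (roughly O(N log N) per iteration) instead of the exact O(N^2) method.
embedding = TSNE(n_components=2, perplexity=30.0, method="barnes_hut").fit_transform(X)
print(embedding.shape)         # (1000, 2)
```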

References
  1. Van der Maaten, Laurens, and Geoffrey Hinton. "Visualizing Data using t-SNE." Journal of Machine Learning Research 9 (2008): 2579-2605.
  2. Laurens van der Maaten's excellent talk on the topic.
