MCMC Sampling - Part I

Markov Chain Monte Carlo (MCMC) methods are widely used in Bayesian inference to obtain samples from complex distributions. Let us examine two of the most popular MCMC methods today -
  1. Metropolis Algorithm
  2. Gibbs Sampling
Before we discuss the two methods, let us start by examining Markov chains. A first-order Markov process is a process in which the transition in the state space depends only on the current state; in simple words, the process is "memoryless":
P(Xt+1 | X0, X1, ..., Xt) = P(Xt+1 | Xt)
A Markov chain is irreducible if it is possible to get from any state to any other state. In other words, all states communicate with each other: for any pair of states i, j, the n-step transition probability p_i,j^(n) > 0 for some n. A chain is aperiodic if no state can be revisited only at multiples of some fixed period greater than one. A state i is transient, as opposed to recurrent, if there is a non-zero probability that, having started from state i, the chain will never return to it. A chain is ergodic if it is irreducible, aperiodic, and positive recurrent. Finally, a chain has reached its stationary distribution when the following equation holds, where π* is the row vector of probabilities of being in each state and P is the transition matrix.
π* = π*P
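The fixed-point condition π* = π*P can be checked numerically by repeatedly multiplying a starting distribution by P. A minimal sketch in Python (the 3-state transition matrix here is a made-up example for illustration, not from the post):

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1); an arbitrary
# example chosen only to illustrate the idea.
P = np.array([
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
])

# Start from an arbitrary initial distribution and iterate pi <- pi P.
pi = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    pi = pi @ P

# After enough iterations pi satisfies pi = pi P (up to numerical tolerance),
# i.e. it is the stationary distribution of the chain.
```

Because this chain is irreducible and aperiodic, the iteration converges to the same π* regardless of the starting distribution.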
Metropolis Algorithm:
The Metropolis algorithm is used to generate random samples in order to integrate over complex distributions. Like the rejection sampling algorithm, the Metropolis algorithm generates samples from an envelope (also called proposal or candidate-generating) distribution. However, unlike in rejection sampling, the random samples form a Markov chain. Further, the Metropolis algorithm requires the candidate-generating distribution to be symmetric. The steps of the algorithm are as follows:
  1. Initialize θ0 such that f(θ0) > 0.
  2. Using the jumping distribution, generate a candidate point θ*.
  3. Compute the acceptance ratio α = f(θ*)/f(θt-1).
  4. If α ≥ 1, accept the candidate point and set θt = θ*; otherwise accept it with probability α, else set θt = θt-1. Repeat from step 2.
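The steps above can be sketched in a few lines of Python. The target f below is an unnormalized standard normal density, chosen purely for illustration (the post does not fix a target at this point); the jumping distribution is a symmetric Gaussian step:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta):
    # Unnormalized target density; a standard normal is assumed here
    # only as an illustrative example.
    return np.exp(-0.5 * theta**2)

def metropolis(n_samples, theta0=0.0, step=1.0):
    """Metropolis sampler with a symmetric Gaussian jumping distribution."""
    theta = theta0                                   # step 1: f(theta0) > 0
    samples = []
    for _ in range(n_samples):
        theta_star = theta + rng.normal(0.0, step)   # step 2: propose
        alpha = f(theta_star) / f(theta)             # step 3: acceptance ratio
        if rng.uniform() < alpha:                    # step 4: accept w.p. min(alpha, 1)
            theta = theta_star
        samples.append(theta)
    return np.array(samples)

samples = metropolis(20000)
```

Note that because the Gaussian step is symmetric, the proposal densities cancel and only the ratio of target densities appears in α.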

The Metropolis-Hastings algorithm relaxes the requirement of a symmetric envelope distribution and works with an arbitrary candidate-generating distribution q. Here is the acceptance probability for the Metropolis-Hastings algorithm, where q(a, b) denotes the probability of proposing a move from a to b:
α = min( f(θ*)q(θ*, θt-1) / [f(θt-1)q(θt-1, θ*)], 1 )
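When q is not symmetric the two q terms no longer cancel. A case where this matters is the independence chain discussed below, where the candidate is drawn from a fixed distribution regardless of the current state. A minimal sketch, again assuming an unnormalized standard normal target and an arbitrary Normal(0.5, 1.5²) proposal (both are illustrative assumptions, not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Unnormalized target density (illustrative standard normal).
    return np.exp(-0.5 * x**2)

def q_pdf(x):
    # Independence proposal density: Normal(0.5, 1.5^2). It does not
    # depend on the current state, so q(a, b) = q_pdf(b).
    return np.exp(-0.5 * ((x - 0.5) / 1.5)**2) / (1.5 * np.sqrt(2 * np.pi))

def metropolis_hastings(n, x0=0.0):
    x = x0
    out = []
    for _ in range(n):
        y = 0.5 + 1.5 * rng.normal()  # draw candidate from q
        # Hastings correction: alpha = min(f(y) q(y, x) / [f(x) q(x, y)], 1),
        # which for an independence chain reduces to the ratio below.
        alpha = min(f(y) * q_pdf(x) / (f(x) * q_pdf(y)), 1.0)
        if rng.uniform() < alpha:
            x = y
        out.append(x)
    return np.array(out)

samples = metropolis_hastings(20000)
```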
There are a number of issues that need to be resolved in order to implement the algorithm. One key decision is the choice of the jumping distribution; another is the choice of the initial value for the chain. Typically one of two approaches is followed for the jumping distribution - the random walk chain and the independence chain. In the random walk chain the candidate is drawn according to the process y = x + z, where z is drawn from a distribution q. If q is symmetric, the acceptance probability reduces to f(θ*)/f(θt-1). In the independence chain the proposed point y is drawn independently of the current point x.
A poor choice of initial value can lead to a poorly mixing chain. A chain is said to be poorly mixing if it stays in a small region of the parameter space for a long time; one cause of poor mixing is getting stuck in a local mode of the distribution. One way to guard against poorly mixing chains is to run the algorithm from multiple initial values. A common choice is to set the initial value to the mode of the distribution. Another practical problem is assessing the convergence of the chain. Typically the first values of the chain are thrown away as the burn-in period - the period during which the initial value still has an influence on the current value of the chain. Convergence can then be monitored by examining the autocorrelation in the time-series trace of the parameter. Let us try to code the Metropolis-Hastings algorithm in Matlab. Assume that the target distribution is the Cauchy distribution and the jumping distribution is the uniform distribution. The Cauchy distribution is a pathological distribution whose mean and variance are undefined. The Matlab code for the density is as follows:
function output = cauchy(x)
    % standard Cauchy density; vectorized so x may be a scalar or a vector
    output = 1 ./ (pi * (1 + x.^2));
end
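Putting the pieces together, here is a sketch of the same sampler in Python: the Cauchy density above as the target, a symmetric Uniform(-width, width) jumping distribution, and the burn-in period discarded as described earlier (the width and burn-in length are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def cauchy(x):
    # Standard Cauchy density, mirroring the Matlab function above.
    return 1.0 / (np.pi * (1.0 + x**2))

def sample_cauchy(n, width=5.0, burn_in=1000):
    """Random-walk Metropolis targeting the standard Cauchy.
    The uniform jump is symmetric, so the q terms cancel in alpha."""
    x = 0.0
    chain = []
    for _ in range(n + burn_in):
        x_star = x + rng.uniform(-width, width)   # symmetric uniform jump
        alpha = cauchy(x_star) / cauchy(x)
        if rng.uniform() < min(alpha, 1.0):
            x = x_star
        chain.append(x)
    return np.array(chain[burn_in:])              # discard burn-in period

samples = sample_cauchy(50000)
```

Since the mean and variance of the Cauchy distribution are undefined, the chain should be checked against quantiles instead: the median of the samples should be near 0 and the upper quartile near 1.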
In the next post we will discuss Gibbs sampling, another popular sampling approach.
Helpful Links: Markov Chain Monte Carlo and Gibbs Sampling
