Convex Optimization

Convex optimization problems are problems of the form:
min f(x)
subject to g_i(x) ≤ b_i ∀ i = 1…n
           h_i(x) = 0 ∀ i = 1…m
where both the objective function and the inequality constraint functions are convex and the equality constraints are affine. A simple definition of a convex function is that the line segment joining any two points on its graph lies on or above the graph. An alternate definition (for differentiable functions) is that the tangent at any point always lies below the function. In simple words, a convex function is a bowl-shaped function that curves upward.
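
To make this concrete, here is a minimal sketch of a problem in this standard form, written with the cvxpy Python library (a Python analogue of the CVX toolbox linked at the end of this post). The data A and b and the particular constraints are invented purely for illustration:

import numpy as np
import cvxpy as cp

# Made-up problem data for illustration.
np.random.seed(0)
A = np.random.randn(20, 5)
b = np.random.randn(20)

x = cp.Variable(5)

# Convex objective: a least-squares loss.
objective = cp.Minimize(cp.sum_squares(A @ x - b))

# Convex inequality constraints and an affine equality constraint.
constraints = [x >= 0,          # g_i(x) = -x_i <= 0
               cp.sum(x) == 1]  # h(x) = sum(x) - 1 = 0

prob = cp.Problem(objective, constraints)
prob.solve()
print("optimal value:", prob.value)
print("optimal x:", x.value)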

Some example convex functions are as follows:
  1. Affine functions. (An affine function is actually both convex and concave.)
  2. Powers of x: x^α on x > 0, for α ≥ 1 or α ≤ 0. (For 0 ≤ α ≤ 1 the function is concave.)
  3. The exponential. (The logarithm, in contrast, is concave.)
  4. Norms.
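
These claims are easy to spot-check numerically. The rough sketch below samples random pairs of positive points and checks the chord condition that defines convexity (the function choices and tolerance here are for illustration only):

import numpy as np

def chord_holds(f, x, y, n_theta=50):
    """Check f(θx + (1-θ)y) <= θf(x) + (1-θ)f(y) on a grid of θ."""
    for theta in np.linspace(0.0, 1.0, n_theta):
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + 1e-9 * (1.0 + abs(rhs)):  # small tolerance for round-off
            return False
    return True

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(0.1, 10.0, size=2)            # positive points, so x**alpha and log are defined
    assert chord_holds(np.exp, x, y)                 # exponential: convex
    assert chord_holds(lambda t: t**2, x, y)         # x^alpha with alpha = 2: convex
    assert chord_holds(lambda t: 1.0 / t, x, y)      # x^alpha with alpha = -1: convex
    assert chord_holds(lambda t: -np.log(t), x, y)   # log is concave, so -log is convex
    assert chord_holds(lambda t: -np.sqrt(t), x, y)  # alpha = 1/2 is concave, so -sqrt is convex
print("all chord checks passed")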
How can we systematically determine whether a function is convex?
  1. Jensen's inequality: a function f is convex if and only if
    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)  for all x, y and 0 ≤ θ ≤ 1
    
  2. First-order condition: if f is differentiable, then f is convex if and only if, for every pair of points x and y,
    D(x, y) = f(y) − (f(x) + ∇f(x)ᵀ(y − x)) ≥ 0
    (D(x, y) is the Bregman divergence: it measures how far f(y) sits above the first-order Taylor approximation of f taken at x and evaluated at y. For a convex function it is nonnegative for all x and y, not only at the optimal point.)
    
  3. Second-order condition: if f is twice differentiable, then f is convex if and only if its Hessian (the matrix of second derivatives) is positive semi-definite everywhere on its domain. (Tests 2 and 3 are spot-checked numerically in the sketch after this list.)
  4. Composition: the function is built from simple convex functions using operations that preserve convexity, such as nonnegative weighted sums, pointwise maximum, and composition with an affine function.
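Here is the promised numeric spot-check of tests 2 and 3. It uses an illustrative function f(x) = xᵀx + exp(x_0), chosen arbitrarily for this sketch, with its gradient and Hessian written out by hand:

import numpy as np

# Illustrative convex function: f(x) = x·x + exp(x_0).
def f(x):
    return x @ x + np.exp(x[0])

def grad_f(x):
    g = 2.0 * x
    g[0] += np.exp(x[0])
    return g

def hess_f(x):
    H = 2.0 * np.eye(len(x))
    H[0, 0] += np.exp(x[0])
    return H

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=(2, 3))
    # Test 2: first-order / Bregman condition, D(x, y) >= 0.
    D = f(y) - (f(x) + grad_f(x) @ (y - x))
    assert D >= -1e-9
    # Test 3: Hessian is positive semi-definite (all eigenvalues >= 0).
    assert np.all(np.linalg.eigvalsh(hess_f(x)) >= -1e-9)
print("first-order and second-order tests passed")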
The advantage of a convex formulation is that any local optimum is also a global optimum, so a solver that finds a local solution has found the global one. Let's talk about how to solve these problems. Let us first consider problems of the kind:
min f(x)
subject to g_i(x) ≤ b_i ∀ i = 1…n
           h_i(x) = 0 ∀ i = 1…m
The Lagrangian of a problem in this standard form (not necessarily convex) is defined by
L(x, λ, ν) = f(x) + ∑ λ_i (g_i(x) − b_i) + ∑ ν_i h_i(x)
where λ_i and ν_i are called the Lagrange multipliers. The dual function is defined as the infimum of the Lagrangian over x:
d(λ, ν) = inf_{x ∈ D} L(x, λ, ν)
The dual function is necessarily a concave function, even when the original problem is not convex. To see this, note that for each fixed x the Lagrangian is an affine function of λ and ν, so the dual is the pointwise infimum of a family of affine functions, and such an infimum is always concave. Next, we claim that for any λ with λ_i ≥ 0, the dual function is a lower bound on the optimal value p* of the original problem (this is weak duality). The proof is short: for any feasible x we have g_i(x) − b_i ≤ 0 and h_i(x) = 0, so every added term in the Lagrangian is nonpositive and L(x, λ, ν) ≤ f(x); taking the infimum over x gives d(λ, ν) ≤ f(x) for every feasible x, and hence d(λ, ν) ≤ p*.
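To see weak duality in action, take a tiny worked example (mine, not from the lectures): minimize x² subject to x ≥ 1, i.e. g(x) − b = 1 − x ≤ 0. The Lagrangian is L(x, λ) = x² + λ(1 − x); minimizing over x gives x = λ/2 and hence d(λ) = λ − λ²/4. The sketch below evaluates d on a grid of λ ≥ 0 and confirms it never exceeds the primal optimum p* = 1 (in fact the bound is tight at λ = 2, so strong duality holds here):

import numpy as np

# Primal: min x^2 subject to 1 - x <= 0.  Optimum: x* = 1, p* = 1.
p_star = 1.0

# Dual function, derived by hand above:
# L(x, lam) = x**2 + lam*(1 - x) is minimized at x = lam/2,
# giving d(lam) = lam - lam**2 / 4.
def d(lam):
    return lam - lam**2 / 4.0

lams = np.linspace(0.0, 10.0, 1001)        # grid of multipliers lam >= 0
assert np.all(d(lams) <= p_star + 1e-12)   # weak duality: d(lam) <= p*
print("best dual value:", d(lams).max())   # 1.0, attained at lam = 2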
Useful Links:
  1. Stephen Boyd's lectures on Convex Optimization are really awesome.
  2. His entire book is available online.
  3. The CVX MATLAB toolbox for convex programming. It makes writing convex optimization programs very easy.
