Interior point

Added 2021-05-30, Modified 2022-03-12

Solving linear programming problems


Main idea

Instead of minimizing cTxc^Tx while avoiding the constraints, we define a barrier function F(x)F(x) which blows up to infinity as we approach the barrier. then we minimize tcTx+F(x)t c^T x + F(x) for some scaler tt, which indicates how much we care about the objective vs the barrier.

Now we can use newton's method and other gradient based methods.

The Lagrangian duel

The KKT conditions state that minf(x)\min f(x) subject to g(x)0g(x) \le 0 is solved if and only if

f(x)+ug(x)=0u0u is nonnegativeg(x)0x is feasibleug(x)=0the penalty term is zero\begin{aligned} \nabla f(x) + u \nabla g(x) &= 0 \\ u &\ge 0 &&\text{u is nonnegative} \\ g(x) &\le 0 &&\text{x is feasible} \\ ug(x) &= 0 &&\text{the penalty term is zero} \end{aligned}

We can modify this to ug(x)=tug(x) = -t for t>0t > 0 then get u=tg(x)u = \frac{-t}{g(x)} Substituting into the gradient equation we get

f(x)+tg(x)g(x)=[f(x)tlog(g(x))]=0\nabla f(x) + \frac{-t}{g(x)} \nabla g(x) = \nabla\left[f(x) - t\log(-g(x))\right] = 0

The log term is typically called the the barrier function.

Newton's method

The taylor approximation for a multivariate function is

f(x+h)f(x)+hTf(x)+12hT2f(x)hf(x + h) \approx f(x) + h^T\nabla f(x) + \frac{1}{2} h^T \nabla^2 f(x)h

We want to pick hh such that the quadratic approximation is minimized1 so we take the gradient/differentiate2 w/r to h

hf(x+h)f+(2f)h\nabla_h f(x+h) \approx \nabla f + (\nabla^2 f)h

Equate to zero and solve for hh

f+(2f)h=0    h=(2f)1f\nabla f + (\nabla^2 f)h = 0 \implies h = -(\nabla^2f)^{-1} \nabla f

Therefor the update rule is

xn+1=xn+h=xn(2f)1fx_{n+1} = x_n + h = x_n - (\nabla^2 f)^{-1}\nabla f

A key fact about newton's method is if we're close enough to a local optimum we get quadratic convergence.

let y=Axy = Ax and ϕ(y)=f(A1y)=f(x)\phi(y) = f(A^{-1}y) = f(x) Applying newton's method to ϕ(y)\phi(y) is the same as applying it to f(x)f(x)

Preforming a change of basis3 we can find the new gradient and hessian

ϕ(y)=(A1)Tf(A1y)2ϕ(y)=(A1)T2f(A1y)A1\begin{aligned} \nabla \phi(y) &= (A^{-1})^T \nabla f(A^{-1}y) \\ \nabla^2 \phi(y) &= (A^{-1})^T \nabla^2 f(A^{-1}y) A^{-1} \end{aligned}

Now if we preform newton's method on ϕ(y)\phi(y) after some nice cancellation we get

[2ϕ(y)]1ϕ(y)=A([2f(A1y)]1f(A1y))[\nabla^2 \phi(y)]^{-1}\nabla \phi(y) = A\left([\nabla^2 f(A^{-1}y)]^{-1} \nabla f(A^{-1}y)\right)

Which is just preforming newton's method in the xx world, then transforming back to yy.

How much can we increase t

Recall our objective function tcTx+F(x)tc^Tx + F(x), we want to find how large we can set tt' such that we're still in the radius of convergence for newton's method.

We define the newton decrement4 for some function ff as

λf(x)=f(x)T[2f(x)]1f(x)\lambda_f(x) = \sqrt{\nabla f(x)^T [\nabla^2 f(x)]^{-1} \nabla f(x)}

Losely speaking, this measures the distance from a local optimum5. notice λ(x)=0\lambda(x^*) = 0

Quadratic convergence6 is written as

λf(x(2f(x))1f(x))(λf(x)1λf(x))2\lambda_f\left(x - (\nabla^2 f(x))^{-1} \nabla f(x)\right) \le \left(\frac{\lambda_f(x)}{1 - \lambda_f(x)}\right)^2

For our purposes ft(x)=tcTx+F(x)f_t(x) = t c^Tx + F(x)

(TODO: Finish analysis/simplify lecture. for now I'm just stealing the t update rule.)

ν\nu is defined here in the lecture

A result in the lecture is

tt(1+14ν)t' \le t\left(1 + \frac{1}{4\sqrt \nu}\right)

Which means we can increase tt by a factor of 41ν1/24^{-1} \nu^{-1/2}.

What is ν\nu for the log barrier? (shown below)?

F(x)=ilog(bi(Ax)i)F(x) = - \sum_i \log(b_i - (Ax)_i)

TODO: Find ν\nu

TODO

Footnotes

  1. Actually we're going to the closest extreme point, we assume we're close to a minimum though.

  2. See this for derivations of the gradients

  3. We transpose the inverse (A1)T(A^{-1})^T because (TODO)

  4. The reason we restate in terms of the newton decrement is because the standard newton's method analysis isn't invariant under linear transformations.

  5. We don't need to worry about local optimum though since we're optimizing a convex function

  6. This requires the function is self concordant