Interior point

Added 2021-05-30, Modified 2022-03-12

Solving linear programming problems

Main idea

Instead of minimizing $c^Tx$ while avoiding the constraints, we define a barrier function $F(x)$ which blows up to infinity as we approach the barrier. then we minimize $t c^T x + F(x)$ for some scaler $t$ , which indicates how much we care about the objective vs the barrier.

Now we can use newton's method and other gradient based methods.

The Lagrangian duel

The KKT conditions state that $\min f(x)$ subject to $g(x) \le 0$ is solved if and only if

\begin{aligned} \nabla f(x) + u \nabla g(x) &= 0 \\ u &\ge 0 &&\text{u is nonnegative} \\ g(x) &\le 0 &&\text{x is feasible} \\ ug(x) &= 0 &&\text{the penalty term is zero} \end{aligned}

We can modify this to $ug(x) = -t$ for $t > 0$ then get $u = \frac{-t}{g(x)}$ Substituting into the gradient equation we get

\nabla f(x) + \frac{-t}{g(x)} \nabla g(x) = \nabla\left[f(x) - t\log(-g(x))\right] = 0

The log term is typically called the the barrier function.

Newton's method

The taylor approximation for a multivariate function is

f(x + h) \approx f(x) + h^T\nabla f(x) + \frac{1}{2} h^T \nabla^2 f(x)h

We want to pick $h$ such that the quadratic approximation is minimized¹ so we take the gradient/differentiate² w/r to h

\nabla_h f(x+h) \approx \nabla f + (\nabla^2 f)h

Equate to zero and solve for $h$

\nabla f + (\nabla^2 f)h = 0 \implies h = -(\nabla^2f)^{-1} \nabla f

Therefor the update rule is

x_{n+1} = x_n + h = x_n - (\nabla^2 f)^{-1}\nabla f

A key fact about newton's method is if we're close enough to a local optimum we get quadratic convergence.

let $y = Ax$ and $\phi(y) = f(A^{-1}y) = f(x)$ Applying newton's method to $\phi(y)$ is the same as applying it to $f(x)$

Preforming a change of basis³ we can find the new gradient and hessian

\begin{aligned} \nabla \phi(y) &= (A^{-1})^T \nabla f(A^{-1}y) \\ \nabla^2 \phi(y) &= (A^{-1})^T \nabla^2 f(A^{-1}y) A^{-1} \end{aligned}

Now if we preform newton's method on $\phi(y)$ after some nice cancellation we get

[\nabla^2 \phi(y)]^{-1}\nabla \phi(y) = A\left([\nabla^2 f(A^{-1}y)]^{-1} \nabla f(A^{-1}y)\right)

Which is just preforming newton's method in the $x$ world, then transforming back to $y$ .

How much can we increase t

Recall our objective function $tc^Tx + F(x)$ , we want to find how large we can set $t'$ such that we're still in the radius of convergence for newton's method.

We define the newton decrement⁴ for some function $f$ as

\lambda_f(x) = \sqrt{\nabla f(x)^T [\nabla^2 f(x)]^{-1} \nabla f(x)}

Losely speaking, this measures the distance from a local optimum⁵. notice $\lambda(x^*) = 0$

Quadratic convergence⁶ is written as

\lambda_f\left(x - (\nabla^2 f(x))^{-1} \nabla f(x)\right) \le \left(\frac{\lambda_f(x)}{1 - \lambda_f(x)}\right)^2

For our purposes $f_t(x) = t c^Tx + F(x)$

(TODO: Finish analysis/simplify lecture. for now I'm just stealing the t update rule.)

$\nu$ is defined here in the lecture

A result in the lecture is

t' \le t\left(1 + \frac{1}{4\sqrt \nu}\right)

Which means we can increase $t$ by a factor of $4^{-1} \nu^{-1/2}$ .

What is $\nu$ for the log barrier? (shown below)?

F(x) = - \sum_i \log(b_i - (Ax)_i)

TODO: Find $\nu$

TODO

Prove the error on the taylor approximation is $O(||h||^2)$
Prove quadratic convergence of newton's method

Actually we're going to the closest extreme point, we assume we're close to a minimum though. ↩
See this for derivations of the gradients ↩
We transpose the inverse $(A^{-1})^T$ because (TODO) ↩
The reason we restate in terms of the newton decrement is because the standard newton's method analysis isn't invariant under linear transformations. ↩
We don't need to worry about local optimum though since we're optimizing a convex function ↩
This requires the function is self concordant ↩