Instead of minimizing $f(x)$ while avoiding the constraints, we define a barrier function $\phi(x)$ which blows up to infinity as we approach the constraint boundary. Then we minimize $t f(x) + \phi(x)$ for some scalar $t > 0$, which indicates how much we care about the objective vs. the barrier.
Now we can use Newton's method and other gradient-based methods.
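As a concrete sketch (the toy problem here is my own example, not from the lecture): minimize $f(x) = x$ subject to $x \ge 1$, using the barrier $\phi(x) = -\log(x - 1)$, which blows up as $x \to 1$ from above.

```python
import math

def barrier_objective(x: float, t: float) -> float:
    """The unconstrained surrogate t*f(x) + phi(x) for the toy problem
    minimize x subject to x >= 1, with phi(x) = -log(x - 1)."""
    return t * x - math.log(x - 1)

# Setting the derivative t - 1/(x - 1) to zero gives the surrogate's
# minimizer x*(t) = 1 + 1/t, which approaches the true constrained
# optimum x = 1 as t grows.
for t in [1.0, 10.0, 100.0, 1000.0]:
    x_star = 1 + 1 / t
    print(f"t = {t:7.1f}  ->  x*(t) = {x_star:.4f}")
```

So cranking up $t$ traces a path of surrogate minimizers converging to the constrained solution, which is exactly the knob the rest of these notes is about.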
The Lagrangian dual
The KKT conditions state that $\min_x f(x)$ subject to $h_i(x) \le 0$ is solved if and only if

$$\nabla f(x) + \sum_i \lambda_i \nabla h_i(x) = 0, \qquad \lambda_i \ge 0, \qquad \lambda_i h_i(x) = 0, \qquad h_i(x) \le 0.$$

We can modify this to $\lambda_i h_i(x) = -\frac{1}{t}$ for some $t > 0$; then we get $\lambda_i = -\frac{1}{t\, h_i(x)}$. Substituting into the gradient equation we get

$$\nabla f(x) - \frac{1}{t} \sum_i \frac{\nabla h_i(x)}{h_i(x)} = \nabla\!\left( f(x) - \frac{1}{t} \sum_i \log(-h_i(x)) \right) = 0.$$

The log term $\phi(x) = -\sum_i \log(-h_i(x))$ is typically called the barrier function.
The second-order Taylor approximation for a multivariate function is

$$f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x + \tfrac{1}{2}\, \Delta x^T\, \nabla^2 f(x)\, \Delta x.$$

Equate the gradient with respect to $\Delta x$ to zero and solve for $\Delta x$:

$$\nabla f(x) + \nabla^2 f(x)\, \Delta x = 0 \implies \Delta x = -\nabla^2 f(x)^{-1}\, \nabla f(x).$$

Therefore the update rule is

$$x_{k+1} = x_k - \nabla^2 f(x_k)^{-1}\, \nabla f(x_k).$$
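A minimal 1-D sketch of this update rule (the example function is my own, not from the lecture):

```python
def newton_minimize(grad, hess, x0, steps=50, tol=1e-12):
    """Newton's method in 1-D: x <- x - f''(x)^{-1} f'(x)."""
    x = x0
    for _ in range(steps):
        step = grad(x) / hess(x)
        x = x - step
        if abs(step) < tol:
            break
    return x

# Toy example: f(x) = x^4 - 8x, so f'(x) = 4x^3 - 8 and f''(x) = 12x^2.
# The minimum is at x = 2^(1/3) ~ 1.2599.
x_min = newton_minimize(lambda x: 4 * x**3 - 8,
                        lambda x: 12 * x**2,
                        x0=2.0)
print(x_min)
```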
A key fact about Newton's method is that if we start close enough to a local optimum, we get quadratic convergence.
Let $x = Ay$ for some invertible matrix $A$, and let $g(y) = f(Ay)$. Applying Newton's method to $g$ is the same as applying it to $f$.

Performing a change of basis³ we can find the new gradient and Hessian:

$$\nabla g(y) = A^T\, \nabla f(Ay), \qquad \nabla^2 g(y) = A^T\, \nabla^2 f(Ay)\, A.$$

Now if we perform Newton's method on $g$, after some nice cancellation we get

$$y_{k+1} = y_k - \nabla^2 g(y_k)^{-1}\, \nabla g(y_k) = y_k - A^{-1}\, \nabla^2 f(Ay_k)^{-1}\, \nabla f(Ay_k),$$

which is just performing Newton's method in the $x$ world, then transforming back to $y$.
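This invariance is easy to check numerically. Here is a 1-D sketch, where a nonzero scalar $a$ plays the role of the invertible matrix $A$ (the toy function is my own, not from the lecture):

```python
def newton_steps(grad, hess, x0, k):
    """Return the first k Newton iterates starting from x0 (1-D)."""
    xs = [x0]
    for _ in range(k):
        x = xs[-1]
        xs.append(x - grad(x) / hess(x))
    return xs

# Toy function f(x) = x^4 - 8x; change of variables x = a*y.
f_grad = lambda x: 4 * x**3 - 8
f_hess = lambda x: 12 * x**2

a = 3.0  # any invertible "matrix" (a nonzero scalar in 1-D)
g_grad = lambda y: a * f_grad(a * y)      # chain rule: g'(y) = a f'(ay)
g_hess = lambda y: a * a * f_hess(a * y)  # g''(y) = a^2 f''(ay)

xs = newton_steps(f_grad, f_hess, x0=2.0, k=6)
ys = newton_steps(g_grad, g_hess, x0=2.0 / a, k=6)

# The iterates match under the transformation: x_k = a * y_k.
for x, y in zip(xs, ys):
    assert abs(x - a * y) < 1e-9 * max(1.0, abs(x))
```

Gradient descent has no such property: rescaling coordinates changes its trajectory, which is one reason Newton's method is the workhorse here.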
How much can we increase $t$?
Recall our objective function $t f(x) + \phi(x)$; we want to find how large we can make $t$ such that we're still in the radius of convergence for Newton's method.
We define the Newton decrement⁴ for some function $f$ as

$$\lambda(x) = \left( \nabla f(x)^T\, \nabla^2 f(x)^{-1}\, \nabla f(x) \right)^{1/2}.$$

Loosely speaking, this measures the distance from a local optimum⁵. Notice that $\tfrac{1}{2}\lambda(x)^2$ is exactly the decrease in the quadratic model of $f$ after one Newton step.
Quadratic convergence⁶ is written as

$$\lambda(x_{k+1}) \le \lambda(x_k)^2.$$

For our purposes, this means that once $\lambda(x_k) < 1$, the number of correct digits roughly doubles with each Newton step.
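A quick empirical check of this digit-doubling behaviour, again on a toy function of my own choosing ($f(x) = x^4 - 8x$, minimizer $x^* = 2^{1/3}$), tracking the plain error $|x_k - x^*|$ rather than the decrement:

```python
# Newton's method on f(x) = x^4 - 8x: f'(x) = 4x^3 - 8, f''(x) = 12x^2.
x_star = 2 ** (1 / 3)
x = 1.5
errors = [abs(x - x_star)]
for _ in range(3):
    x = x - (4 * x**3 - 8) / (12 * x**2)  # Newton step
    errors.append(abs(x - x_star))

# Each error is roughly the square of the previous one.
for e_prev, e_next in zip(errors, errors[1:]):
    print(f"{e_prev:.2e} -> {e_next:.2e}")
    assert e_next < 2 * e_prev**2
```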
(TODO: Finish analysis/simplify lecture. For now I'm just stealing the $t$ update rule.)
is defined here in the lecture
A result in the lecture is
This means we can increase $t$ by a factor of .
What is it for the log barrier (shown below)?
- Prove the error of the second-order Taylor approximation is $O(\|\Delta x\|^3)$
- Prove quadratic convergence of Newton's method
Actually we converge to the closest extreme point; we assume we're close to a minimum, though. ↩
We transpose the inverse because $(A^T)^{-1} = (A^{-1})^T$, so $(A^T\, \nabla^2 f\, A)^{-1} = A^{-1}\, \nabla^2 f^{-1}\, (A^T)^{-1}$. ↩
The reason we restate convergence in terms of the Newton decrement is that the standard Newton's method analysis isn't invariant under linear transformations. ↩
We don't need to worry about local optima, though, since we're optimizing a convex function. ↩