Instead of minimizing the objective while explicitly avoiding the constraints, we define a barrier function that blows up to infinity as we approach the boundary of the feasible region. Then we minimize
$$f_t(x) = t\, f_0(x) + \phi(x)$$
for some scalar $t$, which indicates how much we care about the objective $f_0$ versus the barrier $\phi$.
Because the constraints are now folded into the objective, we can use Newton's method and other gradient-based methods.
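To make the barrier idea concrete, here is a minimal sketch on a toy problem that is assumed purely for illustration and does not come from the text above: minimize $x$ subject to $x \ge 1$, using the log barrier $-\log(x - 1)$. The function names are hypothetical. Setting the derivative $t - 1/(x - 1)$ to zero gives the minimizer $x = 1 + 1/t$, so a larger $t$ pushes the solution toward the constrained optimum $x = 1$.

```python
import math


def barrier_objective(x, t):
    """Toy barrier objective t * x - log(x - 1) for the constraint x >= 1."""
    if x <= 1.0:
        return math.inf  # outside the feasible region the barrier is infinite
    return t * x - math.log(x - 1.0)


def barrier_minimizer(t):
    """Closed-form minimizer of the toy objective: t - 1/(x - 1) = 0."""
    return 1.0 + 1.0 / t


# As t grows, the minimizer slides toward the boundary x = 1.
for t in (1.0, 10.0, 100.0):
    x_star = barrier_minimizer(t)
    print(f"t={t:6.1f}  x*={x_star:.4f}  f_t(x*)={barrier_objective(x_star, t):.4f}")
```

The barrier term is what keeps every iterate strictly feasible, and $t$ controls how close to the boundary the minimizer is allowed to sit.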
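The second-order Taylor approximation of a multivariate function $f$ around $x$ is
$$f(x + h) \approx f(x) + \nabla f(x)^\top h + \tfrac{1}{2}\, h^\top \nabla^2 f(x)\, h$$
We want to pick $h$ such that the quadratic approximation is minimized, so we take the gradient with respect to $h$
$$\nabla f(x) + \nabla^2 f(x)\, h$$
Equate to zero and solve for $h$
$$h = -\left[\nabla^2 f(x)\right]^{-1} \nabla f(x)$$
Therefore the update rule is
$$x_{k+1} = x_k - \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)$$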
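The update rule can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a production solver: the name `newton_minimize`, the stopping tolerance, and the test function $f(x) = \sum_i (e^{x_i} - x_i)$ (minimized at $x = 0$) are all assumptions made for the example.

```python
import numpy as np


def newton_minimize(grad_f, hess_f, x0, max_iters=50, tol=1e-10):
    """Repeat x <- x - H(x)^{-1} grad(x) until the gradient is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        # Solve H h = -g instead of forming the inverse Hessian explicitly.
        h = np.linalg.solve(hess_f(x), -g)
        x = x + h
    return x


# Assumed test function: f(x) = sum_i (exp(x_i) - x_i), minimized at x = 0.
def grad_f(x):
    return np.exp(x) - 1.0


def hess_f(x):
    return np.diag(np.exp(x))


x_star = newton_minimize(grad_f, hess_f, x0=[1.0, -0.5])
print(x_star)
```

Solving the linear system is usually preferred over computing the inverse Hessian, since it is cheaper and numerically better behaved.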
A key fact about Newton's method is that if we start close enough to a local optimum, we get quadratic convergence: the error is roughly squared at every iteration.
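Quadratic convergence is easy to observe numerically. The sketch below assumes the one-dimensional function $f(x) = e^x - x$, whose minimizer is $x^\star = 0$, so the error of an iterate is simply its absolute value. The function name `newton_1d` is hypothetical. If convergence is quadratic, the ratio of each error to the square of the previous one should stay bounded.

```python
import math


def newton_1d(x, steps):
    """Newton iterates for f(x) = exp(x) - x, whose minimizer is x* = 0."""
    iterates = [x]
    for _ in range(steps):
        grad = math.exp(x) - 1.0  # f'(x)
        hess = math.exp(x)        # f''(x)
        x = x - grad / hess
        iterates.append(x)
    return iterates


errors = [abs(x) for x in newton_1d(1.0, 5)]
# error_{k+1} / error_k^2 stays bounded when convergence is quadratic.
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(len(errors) - 1)]

for err, ratio in zip(errors[1:], ratios):
    print(f"error={err:.3e}  ratio={ratio:.3f}")
```

The number of correct digits roughly doubles per step, which is why only a handful of iterations are needed once we are inside the region of convergence.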
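Let $y = Ax$ for an invertible matrix $A$, and let $g(y) = f(A^{-1} y)$. Applying Newton's method to $f$ is the same as applying it to $g$.
Performing a change of basis, we can find the new gradient and Hessian
$$\nabla g(y) = A^{-\top} \nabla f(x), \qquad \nabla^2 g(y) = A^{-\top}\, \nabla^2 f(x)\, A^{-1}$$
Now if we perform Newton's method on $g$, after some nice cancellation we get
$$y_{k+1} = y_k - \left[\nabla^2 g(y_k)\right]^{-1} \nabla g(y_k) = A\left(x_k - \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)\right)$$
which is just performing Newton's method in the $x$ world, then transforming back to $y$.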
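This invariance can be checked numerically. The sketch below assumes the change of variables $y = Ax$ with an invertible matrix $A$ and $g(y) = f(A^{-1} y)$. The matrix `A`, the test function $f(x) = \sum_i (e^{x_i} - x_i)$, and the helper names are all chosen just for this example. It takes one Newton step on $f$ from $x_0$ and one on $g$ from $A x_0$, then compares the results.

```python
import numpy as np

# Assumed test function: f(x) = sum_i (exp(x_i) - x_i).
def grad_f(x):
    return np.exp(x) - 1.0


def hess_f(x):
    return np.diag(np.exp(x))


A = np.array([[2.0, 1.0], [0.5, 3.0]])  # any invertible matrix works here
A_inv = np.linalg.inv(A)


# g(y) = f(A^{-1} y); gradient and Hessian follow from the chain rule.
def grad_g(y):
    return A_inv.T @ grad_f(A_inv @ y)


def hess_g(y):
    return A_inv.T @ hess_f(A_inv @ y) @ A_inv


def newton_update(grad, hess, z):
    """One Newton step: z - H(z)^{-1} grad(z)."""
    return z - np.linalg.solve(hess(z), grad(z))


x0 = np.array([1.0, -0.5])
x1 = newton_update(grad_f, hess_f, x0)       # step in the x world
y1 = newton_update(grad_g, hess_g, A @ x0)   # step in the y world

print(np.allclose(y1, A @ x1))
```

Gradient descent does not have this property: rescaling the coordinates changes its iterates, which is one reason its convergence depends on conditioning.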
Recall our objective function $f_t(x) = t\, f_0(x) + \phi(x)$, with objective $f_0$ and barrier $\phi$. We want to find how large we can set $t$ such that we're still within the radius of convergence of Newton's method.
We define the Newton decrement for some function $f$ as
$$\lambda(f, x) = \sqrt{\nabla f(x)^\top \left[\nabla^2 f(x)\right]^{-1} \nabla f(x)}$$
Loosely speaking, this measures the distance from a local optimum. Notice
Quadratic convergence is written as
For our purposes
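To tie the Newton decrement back to the question of how fast $t$ can grow, here is a small sketch on an assumed toy problem that is not from the text: minimize $x$ subject to $x \ge 1$ with the log barrier, so $f_t(x) = t x - \log(x - 1)$. The names `follow_path` and `growth`, along with the default values, are hypothetical. In one dimension the decrement reduces to $|f_t'(x)| / \sqrt{f_t''(x)}$. For this particular problem, starting from the exact minimizer and multiplying $t$ by `growth` gives a decrement of exactly $|\text{growth} - 1|$, so a factor below 2 keeps the decrement under 1.

```python
import math


def newton_decrement(x, t):
    """Newton decrement of f_t(x) = t*x - log(x - 1) at a feasible x > 1."""
    grad = t - 1.0 / (x - 1.0)
    hess = 1.0 / (x - 1.0) ** 2
    return abs(grad) / math.sqrt(hess)


def newton_step(x, t):
    """One Newton update for f_t(x) = t*x - log(x - 1)."""
    grad = t - 1.0 / (x - 1.0)
    hess = 1.0 / (x - 1.0) ** 2
    return x - grad / hess


def follow_path(t=1.0, growth=1.5, t_max=1e3, tol=1e-8):
    """Increase t geometrically, re-centering with Newton steps after each increase."""
    x = 1.0 + 1.0 / t  # exact minimizer for the initial t
    while t < t_max:
        t *= growth
        while newton_decrement(x, t) > tol:
            x = newton_step(x, t)
            if x <= 1.0:
                raise ValueError("Newton step left the feasible region; growth is too large")
    return x, t


x_final, t_final = follow_path()
print(x_final, t_final)
```

With a larger factor such as `growth=2.5`, the first Newton step in this toy problem jumps outside the feasible region, which is the failure mode that bounding the increase in $t$ is meant to avoid.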
Actually, we're going to the closest stationary point; we assume we're close to a minimum, though. ↩︎
See this for derivations of the gradients ↩︎
We transpose the inverse because (TODO) ↩︎
The reason we restate this in terms of the Newton decrement is that the standard analysis of Newton's method isn't invariant under linear transformations. ↩︎
We don't need to worry about local optima, though, since we're optimizing a convex function. ↩︎
This requires that the function be self-concordant. ↩︎