Prerequisites
- Familiarity with derivatives
- Familiarity with vectors (adding them and multiplying them by numbers)
- Knowledge of matrix multiplication and matrix inverses (you can manage without, but you'll miss some stuff)
If you're ever confused by notation, see the Notation section at the end.
Root Finding
Newton's method is typically presented as a method for finding roots: values of $x$ where $f(x) = 0$. Many people don't realize how general this is:
- Solving $f(x) = g(x)$ is the same as solving $f(x) - g(x) = 0$. Root finding can be used to solve equations.
- Finding the minimum/maximum of some function $f$ requires solving $f'(x) = 0$. Root finding strikes again.
Newton's method also works in any number of dimensions. Have a system of equations to solve? A cost function you want to minimize with multiple parameters? Newton's method handles both.
To illustrate all the places Newton's method can be used, let's look at some examples.
Example: Square roots
The (obligatory) standard demo of Newton's method is this magical formula for approximating $\sqrt{a}$:
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right)$$
Plug in a guess for $\sqrt{a}$, and it gives you a better guess. Take $a = 2$: if you plug in $x_n = 1.4$ you get $x_{n+1} = 1.41428\ldots$, while the true value starts $1.41421\ldots$. 4 decimal places of accuracy!
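Here's that formula as a few lines of Python; the target $a = 2$ and the starting guess are just for illustration:

a = 2.0   # we want sqrt(2)
x = 1.4   # starting guess
for _ in range(4):
    # the magical formula: average x with a/x
    x = (x + a / x) / 2
    print(x)  # converges to 1.41421356... very quickly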
Example: Roots of Polynomials
It's a pop-math fact that there is no general formula for the roots of polynomials of degree 5 or higher. But Newton's method works just fine; in fact, finding roots of high degree polynomials is a solved problem, with efficient algorithms existing for polynomials of degree greater than a thousand!
Finding roots of polynomials comes up all the time in computer graphics, so this has a lot of applications. (Example: given a point, finding the closest point on a quadratic Bézier curve reduces to finding the roots of a third degree polynomial.)
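As a small sketch of the idea (not a production root finder!), here's plain Newton's method on a degree 5 polynomial; the polynomial $x^5 - 3x + 1$ and the starting guess are my own picks for illustration:

def p(x):
    # p(x) = x^5 - 3x + 1, chosen just for illustration
    return x**5 - 3*x + 1

def dp(x):
    # its derivative
    return 5*x**4 - 3

x = 1.5  # starting guess
for _ in range(10):
    x = x - p(x) / dp(x)
print(x, p(x))  # x settles near a root, so p(x) is ~0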
Example: Projectile Aiming
At the end of this post we'll use these techniques to shoot down an enemy drone in a toy physics model.
Example: Landing Rockets
Newton's Method is used to solve convex optimization problems, which in turn are used to land rockets with the G-FOLD algorithm. See NASA's G-FOLD Diversion Test or SpaceX literally cheating.
Example: Robot Backflips
You've seen Boston Dynamics's Atlas robots, right? Hm, I wonder what they use to control the robot. According to this paper: lots of convex optimization and inverse kinematics, both of which involve Newton's Method!
Single variable Newton's Method
Hyped up? Let's start the math with Newton's Method in one variable. We have a function $f$ and some guess $x_n$ for a root, which we want to improve. Newton's Method does this by finding a root of the linear approximation of $f$ around $x_n$.
Let's break that down: we want to find $x_{n+1}$ such that $f(x_{n+1}) \approx 0$. We can use the tangent line approximation from calculus:
$$f(x_n + \Delta x) \approx f(x_n) + f'(x_n)\Delta x$$
Solve for when the tangent/linear approximation is zero:
$$f(x_n) + f'(x_n)\Delta x = 0 \implies \Delta x = -\frac{f(x_n)}{f'(x_n)}$$
Giving our new guess
$$x_{n+1} = x_n + \Delta x = x_n - \frac{f(x_n)}{f'(x_n)}$$
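The update rule is all there is to it; here's a minimal Python sketch, where the function, its derivative, and the starting guess are placeholders:

import math

def newton(f, df, x, steps=10):
    # repeatedly jump to the root of the tangent line
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

# example: solve cos(x) = x, i.e. find a root of cos(x) - x
print(newton(lambda x: math.cos(x) - x,
             lambda x: -math.sin(x) - 1,
             1.0))  # roughly 0.7390851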
Exercise: Verify the formula for Square Roots (hint: consider $f(x) = x^2 - a$)
Solution
Let $f(x) = x^2 - a$. We have $f'(x) = 2x$, so assuming our previous guess was $x_n$, our new guess is
$$x_{n+1} = x_n - \frac{x_n^2 - a}{2x_n} = \frac{x_n}{2} + \frac{a}{2x_n} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right)$$
Here's a visualization of how this works
Exercise: Suppose we're trying to find a root of $f$; what's the geometric interpretation of $x_{n+1}$?
Solution
$x_{n+1}$ is the point where the tangent line at $(x_n, f(x_n))$ has a root; Newton's method works by iteratively finding the root of the tangent approximation.
Don't get too attached to the visuals! Because next up is...
Multivariable Newton's Method
Before we do Newton's method in multiple variables, we need to talk about differentiation in multiple variables. Consider a function $\mathbf{f}$ from $n$ dimensions to $m$ dimensions; fully written out in vector form it would be
$$\mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{bmatrix}$$
Where $f_1$ denotes the first output coordinate, $f_2$ the second, etc. We use a bold $\mathbf{f}$ to indicate that $\mathbf{f}(\mathbf{x})$ is a vector.
If you want to use fancy symbols, you could say $\mathbf{x} \in \mathbb{R}^n$ to mean $\mathbf{x}$ is in the set of length $n$ vectors with real components, and say $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ to mean $\mathbf{f}$ takes a vector in $\mathbb{R}^n$ to a vector in $\mathbb{R}^m$.
Generalizing the derivative
Skip to Finding our new guess if you already know Jacobians; also see Khan Academy if my explanation doesn't make sense to you.
Consider tweaking $\mathbf{x}$ by adding a vector $\Delta\mathbf{x}$, and consider the first coordinate of the difference:
$$f_1(\mathbf{x} + \Delta\mathbf{x}) - f_1(\mathbf{x}) \approx \frac{\partial f_1}{\partial x_1}\Delta x_1 + \frac{\partial f_1}{\partial x_2}\Delta x_2 + \dots + \frac{\partial f_1}{\partial x_n}\Delta x_n$$
We're just adding up all the changes in $f_1$, as you can see if you imagine "cancelling" each $\partial x_i$ against its $\Delta x_i$.
The same logic applies for the other coordinates. In fact, this is a matrix multiplication!
$$\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) - \mathbf{f}(\mathbf{x}) \approx \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \begin{bmatrix} \Delta x_1 \\ \vdots \\ \Delta x_n \end{bmatrix}$$
I'm going to denote the matrix on the left as $D\mathbf{f}(\mathbf{x})$ (this is the Jacobian of $\mathbf{f}$).
Exercise: If you know about basis vectors, think about the basis vectors of $\mathbb{R}^n$ and what $D\mathbf{f}(\mathbf{x})$ does to them. Watch this after.
Meaning our linear approximation for $\mathbf{f}$ is
$$\mathbf{f}(\mathbf{x} + \Delta\mathbf{x}) \approx \mathbf{f}(\mathbf{x}) + D\mathbf{f}(\mathbf{x})\Delta\mathbf{x}$$
Notice the similarity to the single variable case! The only difference is that $D\mathbf{f}(\mathbf{x})$ is a matrix instead of a number.
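If you want to convince yourself numerically, here's a sketch that compares both sides of the approximation using JAX's jacobian; the function $\mathbf{f}$ here is made up purely for the test:

import jax.numpy as jnp
from jax import jacobian

# a made-up function from R^2 to R^2, purely to test the approximation
def f(x):
    return jnp.array([x[0]**2 + x[1], jnp.sin(x[0]*x[1])])

x = jnp.array([1.0, 2.0])
dx = jnp.array([0.01, 0.02])

# these two should be nearly equal for small dx
print(f(x + dx) - f(x))
print(jacobian(f)(x) @ dx)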
Finding our new guess
Like before, we solve to make the linear approximation zero, in exactly the same way except we use the matrix inverse instead of division:
$$\mathbf{f}(\mathbf{x}_n) + D\mathbf{f}(\mathbf{x}_n)\Delta\mathbf{x} = \mathbf{0} \implies \Delta\mathbf{x} = -D\mathbf{f}(\mathbf{x}_n)^{-1}\mathbf{f}(\mathbf{x}_n)$$
$$\mathbf{x}_{n+1} = \mathbf{x}_n - D\mathbf{f}(\mathbf{x}_n)^{-1}\mathbf{f}(\mathbf{x}_n)$$
Which is our new guess!
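One Newton step translates directly to code. A minimal sketch (I use jnp.linalg.solve to solve $D\mathbf{f}(\mathbf{x}_n)\Delta\mathbf{x} = \mathbf{f}(\mathbf{x}_n)$ instead of explicitly inverting the matrix, which is generally cheaper and more stable):

import jax.numpy as jnp
from jax import jacobian

def newton_step(f, x):
    # solve Df(x) dx = f(x), then step to x - dx
    dx = jnp.linalg.solve(jacobian(f)(x), f(x))
    return x - dx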
Exercise: If we're trying to minimize a loss function $L$ by making $\nabla L(\mathbf{x}) = \mathbf{0}$, what's the formula for Newton's Method in this case? (hint: the Jacobian of $\nabla L$ has a name!)
Solution
The Jacobian of $\nabla L$ is the Hessian of $L$, meaning we get
$$\mathbf{x}_{n+1} = \mathbf{x}_n - H_L(\mathbf{x}_n)^{-1}\nabla L(\mathbf{x}_n)$$
Notice how this is analogous to the single variable case; the Hessian is the multivariable analogue of the second derivative.
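Here's that update as a short JAX sketch; the quadratic loss is made up for illustration, and jax.grad / jax.hessian compute the derivatives for us:

import jax.numpy as jnp
from jax import grad, hessian

# a made-up loss with its minimum at (3, -1)
def L(x):
    return (x[0] - 3)**2 + 2*(x[1] + 1)**2

x = jnp.array([0.0, 0.0])
for _ in range(5):
    # Newton's step for optimization: solve H dx = grad L, step to x - dx
    x = x - jnp.linalg.solve(hessian(L)(x), grad(L)(x))
print(x)  # lands on [3, -1]; for a quadratic loss one step suffices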
Projectile Aiming
Let's put everything together in one semi-real world example: shooting down drones. Suppose $\mathbf{p}(t)$ gives the position of the enemy drone at time $t$, and we're allowed to pick a direction $(\theta, \phi)$ (in spherical coordinates) to shoot. The velocity of the shot and the acceleration from gravity are given by
$$\mathbf{v}(\theta, \phi) = 5\begin{bmatrix} \sin\theta\cos\phi \\ \sin\theta\sin\phi \\ \cos\theta \end{bmatrix} \qquad \mathbf{g} = \begin{bmatrix} 0 \\ 0 \\ -9.8 \end{bmatrix}$$
The position of the shot at time $t$ is given by $\mathbf{s}(\theta, \phi, t) = t\,\mathbf{v}(\theta, \phi) + \frac{1}{2}t^2\mathbf{g}$.
We want to find $\theta$, $\phi$, and $t$ such that $\mathbf{p}(t) = \mathbf{s}(\theta, \phi, t)$, i.e. a root of $\mathbf{f}(\theta, \phi, t) = \mathbf{p}(t) - \mathbf{s}(\theta, \phi, t)$. We can use multivariable Newton's Method for this! We need to find the Jacobian of $\mathbf{f}$.
I'm lazy, so I'll make JAX do it for me
import jax.numpy as jnp
from jax import jacobian

# p(t) is made up for illustration
def p(t):
    return jnp.array([t, jnp.sin(t), 0.1*t**2])

def v(theta, phi):
    return 5*jnp.array([
        jnp.sin(theta)*jnp.cos(phi),
        jnp.sin(theta)*jnp.sin(phi),
        jnp.cos(theta)
    ])

g = jnp.array([0, 0, -9.8])

def s(theta, phi, t):
    return t*v(theta, phi) + 0.5*t**2*g

# x packs the unknowns (theta, phi, t) into one vector
def f(x):
    return p(x[2]) - s(x[0], x[1], x[2])

Df = jacobian(f)

def step(x):
    Dfx = Df(x)
    Dfx_inv = jnp.linalg.inv(Dfx)
    return x - Dfx_inv @ f(x)

# In a real application you would not hardcode this,
# setting it to hit the target ignoring movement and
# gravity would be a good initial guess.
guess = jnp.array([1.0, 2.0, 0.5])

for i in range(5):
    print(f'missed by {jnp.linalg.norm(f(guess)):.5f}')
    guess = step(guess)
Prints:
missed by 1.98916
missed by 1.51225
missed by 0.26145
missed by 0.00024
missed by 0.00000
Resources
Here are other good explanations of Newton's Method
- Visually Explained: Newton's Method in Optimization (watch all his videos)
- Newton's Fractal (which Newton knew nothing about) (likewise ^)
Here are some programming resources
- JAX Python autodiff
- Zygote.jl Julia autodiff
- How ∂P is enabling new science in Julia has a nice demo of applying these techniques to aim a trebuchet
Notation
$\mathbb{R}^n$ denotes the set of length $n$ vectors with real components, for example $(1, 2, 3) \in \mathbb{R}^3$.
Bold variables like $\mathbf{x}$ and $\mathbf{f}$ denote vectors; $x_1$ denotes the first component of $\mathbf{x}$.
I write vectors in two ways, but they represent the same object:
$$(x_1, x_2, \dots, x_n) = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
Sometimes I get lazy and write $\mathbf{f}(x, y)$ when $\mathbf{f}$ takes in a vector in $\mathbb{R}^2$; technically I should write $\mathbf{f}((x, y))$, since $\mathbf{f}$ doesn't take two numbers, it takes one vector $(x, y)$.