## Prerequisites

- Familiarity with derivatives
- Familiarity with vectors (adding them and multiplying them by numbers)
- Knowledge of matrix multiplication and matrix inverses (you can manage without, but you'll miss some stuff)

If you're ever confused by notation, see the Notation section at the end.

## Root Finding

Newton's method is typically presented as a method for finding *roots*, values of $x$ where $f(x) = 0$.
Many people don't realize how general this is:

- Solving $g(x) = h(x)$ is the same as solving $g(x) - h(x) = 0$. Root finding can be used to solve equations.
- Finding the minimum/maximum of some function $f$ requires solving $f'(x) = 0$. Root finding strikes again.
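To make the first point concrete, here's a minimal sketch (the equation $\cos x = x$ is my own example, not from the list above) of solving an equation by rewriting it as a root-finding problem:

```python
import math

# Solve cos(x) = x by finding a root of f(x) = cos(x) - x
def f(x):
    return math.cos(x) - x

def fprime(x):
    return -math.sin(x) - 1

x = 1.0  # initial guess
for _ in range(5):
    x = x - f(x) / fprime(x)  # Newton step

print(x)  # converges to the solution of cos(x) = x, roughly 0.73909
```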

Newton's method also works in any number of dimensions. Have a system of equations to solve? A cost function you want to minimize with multiple parameters? Newton's method applies.

To illustrate all the places Newton's method can be used, let's look at some examples.

### Example: Square roots

The (obligatory) standard demo of Newton's method is this magical formula:

$$x_{\text{new}} = \frac{1}{2}\left(x + \frac{s}{x}\right)$$

Plug in a guess $x$ for $\sqrt s$, and it gives you a better guess. If you plug in $s=2$ and $x=1.4$ you get $1.41428$, while the true value starts $1.41421$. Four decimal places of accuracy!
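As a sanity check, here's a quick sketch iterating that formula a few times:

```python
# One application of the magic formula: a better guess for sqrt(s)
def improve(x, s):
    return (x + s / x) / 2

x = 1.4  # starting guess for sqrt(2)
for _ in range(3):
    x = improve(x, 2.0)

print(x)  # agrees with sqrt(2) to machine precision after 3 steps
```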

### Example: Roots of Polynomials

It's a pop-math fact that there is no general formula for the roots of polynomials of degree five or higher. Newton's method works just fine regardless. In fact, finding roots of high degree polynomials is a solved problem, with efficient algorithms existing for polynomials of degree greater than a thousand!

Finding roots of polynomials comes up all the time in computer graphics, so this has a lot of applications. (Example: Given a point, finding the closest point on a quadratic Bézier curve reduces to finding the roots of a third degree polynomial.)
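For instance, here's a minimal sketch (the quintic $x^5 - x - 1$ is my own example) of Newton's method finding a root that no closed-form formula can give you:

```python
# x^5 - x - 1 has one real root, with no closed-form expression
def poly(x):
    return x**5 - x - 1

def dpoly(x):
    return 5 * x**4 - 1

x = 1.5  # initial guess
for _ in range(20):
    x = x - poly(x) / dpoly(x)  # Newton step

print(x)  # the real root, roughly 1.16730
```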

### Example: Projectile Aiming

At the end of this post we'll use these techniques to shoot down an enemy drone in a toy physics model.

### Example: Landing Rockets

Newton's Method is used to solve convex optimization problems, which in turn are used to land rockets with the G-FOLD algorithm. See NASA's G-FOLD Diversion Test or SpaceX literally cheating.

### Example: Robot Backflips

You've seen Boston Dynamics's Atlas robots, right? I wondered what they use to control the robot: according to this paper, lots of convex optimization and inverse kinematics. Both of which involve Newton's Method!

## Single variable Newton's Method

Hyped up? Let's start the math with Newton's Method in one variable.
We have a function $f$ and some guess $x$ for a root which we want to improve.
Newton's Method does this by finding a root of the *linear approximation* of $f$ around $x$.

Let's break that down: we want to find $y$ such that $f(y) = 0$. We can use the tangent line approximation from calculus:

$$f(y) \approx f(x) + f'(x)(y - x)$$

Solve for when the tangent/linear approximation is zero:

$$0 = f(x) + f'(x)(y - x) \implies y = x - \frac{f(x)}{f'(x)}$$
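Putting the update $x \leftarrow x - f(x)/f'(x)$ into code takes only a few lines (a minimal sketch; the cube-root example is my own):

```python
def newton(f, fprime, x, steps=10):
    """Repeatedly apply the Newton step x <- x - f(x)/f'(x)."""
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x

# Example: the cube root of 2 is the root of f(x) = x^3 - 2
root = newton(lambda x: x**3 - 2, lambda x: 3 * x**2, 1.0)
print(root)  # roughly 1.25992
```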

**Exercise:** Verify the formula for Square Roots (hint: consider $f(x) = x^2 - s$)

## Solution

Let $f(x) = x^2 - s$; we have $f'(x) = 2x$, so assuming our previous guess was $x$, our new guess is

$$x - \frac{f(x)}{f'(x)} = x - \frac{x^2 - s}{2x} = \frac{1}{2}\left(x + \frac{s}{x}\right)$$

Here's a visualization of how this works

**Exercise:** Suppose we're trying to solve $f'(x) = 0$. What's the geometric interpretation of $x - f'(x)/f''(x)$?

## Solution

It's the point where the tangent line of $f'$ has its root; Newton's method for optimization works by iteratively finding the root of the tangent approximation of $f'$.
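Here's a minimal sketch of that optimization update in action (the function $f(x) = x^4 - 3x$ is my own example):

```python
# Minimize f(x) = x^4 - 3x by applying Newton's method to f'(x) = 0
def fp(x):   # f'(x)
    return 4 * x**3 - 3

def fpp(x):  # f''(x)
    return 12 * x**2

x = 1.0  # initial guess
for _ in range(20):
    x = x - fp(x) / fpp(x)  # the update from the exercise

print(x)  # the minimizer (3/4)^(1/3), roughly 0.90856
```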

Don't get too attached to the visuals! Because next up is...

## Multivariable Newton's Method

Before we do Newton's method in multiple variables, we need to talk about differentiation in multiple variables. Consider a function $f$ from $n$ to $m$ dimensions; fully written out in vector form, $\mathbf f$ would be

$$\mathbf f(\mathbf x) = \begin{bmatrix} f_1(\mathbf x_1, \dots, \mathbf x_n) \\ \vdots \\ f_m(\mathbf x_1, \dots, \mathbf x_n) \end{bmatrix}$$

Where $f_1(\mathbf x_1,\dots,\mathbf x_n)$ denotes the first output coordinate, $f_2$ the second, etc. We use a bold $\mathbf x$ to indicate that $\mathbf x$ is a vector.

If you want to use fancy symbols, you could say $\mathbf x \in \mathbf R^n$ to mean $\mathbf x$ is in the set of length $n$ vectors with real components, and say $f:\mathbf{R}^n\to \mathbf R^m$ to mean $f$ takes a vector in $\mathbf R^n$ to a vector in $\mathbf R^m$.

### Generalizing the derivative

Skip to Finding our new guess if you already know Jacobians; also see Khan Academy if my explanation doesn't make sense to you.

Consider tweaking $\mathbf x$ by adding a vector $\mathrm d\mathbf x$, and consider the first coordinate of the difference:

$$f_1(\mathbf x + \mathrm d\mathbf x) - f_1(\mathbf x) \approx \frac{\partial f_1}{\partial \mathbf x_1}\mathrm d\mathbf x_1 + \dots + \frac{\partial f_1}{\partial \mathbf x_n}\mathrm d\mathbf x_n$$

We're just adding up all the changes in $f_1$, as you can see if you imagine "cancelling" each $\mathrm d\mathbf x_i$.

The same logic applies for the other coordinates. In fact, this is a matrix multiplication!

$$\begin{bmatrix} \frac{\partial f_1}{\partial \mathbf x_1} & \cdots & \frac{\partial f_1}{\partial \mathbf x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial \mathbf x_1} & \cdots & \frac{\partial f_m}{\partial \mathbf x_n} \end{bmatrix} \begin{bmatrix} \mathrm d\mathbf x_1 \\ \vdots \\ \mathrm d\mathbf x_n \end{bmatrix}$$
I'm going to denote the matrix on the left as $D_f(\mathbf x)$.
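To see this concretely, here's a small hand-computed sketch (the map $f(x_1, x_2) = (x_1 x_2, \sin x_1)$ is my own example) checking that $D_f(\mathbf x)\,\mathrm d\mathbf x$ approximates the change in $f$:

```python
import math

# f : R^2 -> R^2, f(x1, x2) = (x1*x2, sin(x1))
def f(x1, x2):
    return (x1 * x2, math.sin(x1))

# Its Jacobian: row i holds the partial derivatives of output i
def Df(x1, x2):
    return [[x2, x1],
            [math.cos(x1), 0.0]]

x1, x2 = 1.0, 2.0
dx = (1e-5, -2e-5)  # a small tweak

# Actual change: f(x + dx) - f(x), coordinate by coordinate
actual = [a - b for a, b in zip(f(x1 + dx[0], x2 + dx[1]), f(x1, x2))]

# Predicted change: the matrix-vector product Df(x) @ dx, written out
J = Df(x1, x2)
predicted = [J[0][0] * dx[0] + J[0][1] * dx[1],
             J[1][0] * dx[0] + J[1][1] * dx[1]]
# actual and predicted agree up to second-order error in dx
```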

**Exercise:** If you know about basis vectors, think about the basis vectors of $D_f(\mathbf x)$. Watch this after

Meaning our linear approximation for $f(\mathbf y)$ is

$$f(\mathbf y) \approx f(\mathbf x) + D_f(\mathbf x)(\mathbf y - \mathbf x)$$

Notice the similarity to the single variable case! The only difference is that $D_f(\mathbf x)$ is a matrix instead of a number.

### Finding our new guess

Like before, we solve to make the linear approximation zero, in exactly the same way except we use the matrix inverse $D_f(\mathbf x)^{-1}$ instead of division:

$$0 = f(\mathbf x) + D_f(\mathbf x)(\mathbf y - \mathbf x) \implies \mathbf y = \mathbf x - D_f(\mathbf x)^{-1} f(\mathbf x)$$

Which is our new guess!
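Here's a minimal sketch of this update on a two-equation system (the system $x^2 + y^2 = 4$, $xy = 1$ is my own example), with the $2 \times 2$ inverse written out by hand:

```python
# Solve the system x^2 + y^2 = 4, x*y = 1
def F(x, y):
    return (x * x + y * y - 4, x * y - 1)

def DF(x, y):  # the Jacobian matrix
    return ((2 * x, 2 * y),
            (y, x))

def newton_step(x, y):
    f1, f2 = F(x, y)
    (a, b), (c, d) = DF(x, y)
    det = a * d - b * c
    # new guess = old guess - DF^{-1} F, with the 2x2 inverse expanded
    return (x - (d * f1 - b * f2) / det,
            y - (-c * f1 + a * f2) / det)

x, y = 2.0, 0.5  # initial guess
for _ in range(10):
    x, y = newton_step(x, y)

print(x, y)  # both equations are now satisfied to high precision
```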

**Exercise:** Suppose we're trying to minimize a loss function $g : \mathbf R^n \to \mathbf R$ by solving $\nabla g(\mathbf x) = 0$. What's the formula for Newton's Method in this case? (hint: The Jacobian of $\nabla g$ has a name!)

## Solution

The Jacobian of $\nabla g$ is the Hessian $H_g$ of $g$, meaning we get

$$\mathbf x_{\text{new}} = \mathbf x - H_g(\mathbf x)^{-1} \nabla g(\mathbf x)$$

Notice how this is analogous to the single variable case $x - f'(x)/f''(x)$: the Hessian is the multivariable analogue of the second derivative.
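As a sketch of that formula (the quadratic $g(x, y) = x^2 + xy + y^2 - x$ is my own example, chosen so the Hessian is constant and one step suffices):

```python
# Minimize g(x, y) = x^2 + x*y + y^2 - x
# grad g = (2x + y - 1, x + 2y); Hessian H = [[2, 1], [1, 2]] (constant)
def grad(x, y):
    return (2 * x + y - 1, x + 2 * y)

def newton_step(x, y):
    g1, g2 = grad(x, y)
    # new guess = old guess - H^{-1} grad g,
    # using H^{-1} = (1/3) * [[2, -1], [-1, 2]]
    return (x - (2 * g1 - g2) / 3,
            y - (-g1 + 2 * g2) / 3)

x, y = 5.0, -7.0              # any starting point works here
x, y = newton_step(x, y)      # g is quadratic, so one step is exact
print(x, y)  # the minimizer (2/3, -1/3)
```

Because $g$ is quadratic, its linear approximation of the gradient is exact, so Newton's method lands on the minimum in a single step. That's also why Newton's method converges so fast near a minimum: up close, every smooth function looks quadratic.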

## Projectile Aiming

Let's put everything together in one semi-real-world example: shooting down drones. Suppose $\mathbf p(t)$ gives the position of the enemy drone at time $t$, and we're allowed to pick a direction $(\theta, \phi)$ (in spherical coordinates) to shoot. The velocity of the shot and the acceleration from gravity are given by

$$\mathbf v(\theta, \phi) = v_0 \begin{bmatrix} \sin\theta\cos\phi \\ \sin\theta\sin\phi \\ \cos\theta \end{bmatrix}, \qquad \mathbf g = \begin{bmatrix} 0 \\ 0 \\ -9.8 \end{bmatrix}$$

where $v_0$ is the shot's speed.

The position of the shot at time $t$ is given by $\mathbf s(\theta,\phi,t) = t\mathbf v(\theta,\phi) + \frac 12 t^2\mathbf g$.

We want to find $\theta$, $\phi$, and $t_\text{hit}$ such that $\mathbf p(t_\text{hit}) = \mathbf s(\theta, \phi, t_\text{hit})$. We can use multivariable Newton's Method for this! We need the Jacobian of

$$f(\theta, \phi, t) = \mathbf p(t) - \mathbf s(\theta, \phi, t)$$

I'm lazy, so I'll make JAX do it for me

```python
import jax.numpy as jnp
from jax import jacobian

# p(t) is made up for illustration
def p(t):
    return jnp.array([t, jnp.sin(t), 0.1*t**2])

def v(theta, phi):
    return 5*jnp.array([
        jnp.sin(theta)*jnp.cos(phi),
        jnp.sin(theta)*jnp.sin(phi),
        jnp.cos(theta)
    ])

g = jnp.array([0, 0, -9.8])

def s(theta, phi, t):
    return t*v(theta, phi) + 0.5*t**2*g

# x packs the unknowns (theta, phi, t); we want f(x) = 0
def f(x):
    return p(x[2]) - s(x[0], x[1], x[2])

Df = jacobian(f)

def step(x):
    Dfx = Df(x)
    Dfx_inv = jnp.linalg.inv(Dfx)
    return x - Dfx_inv @ f(x)

# In a real application you would not hardcode this;
# setting it to hit the target ignoring movement and
# gravity would be a good initial guess.
guess = jnp.array([1.0, 2.0, 0.5])
for i in range(5):
    print(f'missed by {jnp.linalg.norm(f(guess)):.5f}')
    guess = step(guess)
```

Prints:

```
missed by 1.98916
missed by 1.51225
missed by 0.26145
missed by 0.00024
missed by 0.00000
```

## Resources

Here are other good explanations of Newton's Method

- Visually Explained: Newton's Method in Optimization (watch all his videos)
- Newton's Fractal (which Newton knew nothing about) (likewise ^)

Here are some programming resources:

- JAX Python autodiff
- Zygote.jl Julia autodiff
- How ∂P is enabling new science in Julia has a nice demo of applying these techniques to aim a trebuchet

## Notation

$\mathbf R^n$ denotes the set of length $n$ vectors with real components; $(\sqrt 2,\pi) \in \mathbf R^2$, for example.

Bold variables $\mathbf x$ and $\mathbf y$ denote vectors, $\mathbf x_1$ denotes the first component of $\mathbf x$.

I write vectors in two ways, but they represent the same object:

$$(\mathbf x_1, \dots, \mathbf x_n) = \begin{bmatrix} \mathbf x_1 \\ \vdots \\ \mathbf x_n \end{bmatrix}$$

Sometimes I get lazy and write $g(\mathbf{x}) = g(\mathbf x_1, \mathbf x_2)$ when $g$ takes in a vector in $\mathbf{R}^2$. Technically I should write $g(\mathbf{x}) = g((\mathbf x_1, \mathbf x_2))$, since $g$ doesn't take two numbers, it takes one vector $\mathbf x=(\mathbf x_1,\mathbf x_2)$.