Newton’s method is one of the most fundamental methods for root-finding, and it also introduces us to another big idea in iterative methods – superlinear convergence!
We can see that there is a root near \(x = 1\). This will be our initial guess, \(x_1\).
Rather than finding the root of \(f\) itself, we settle for finding the root of the tangent line and let this be our next approximation \(x_2\).
x2 = x1 - y1 / m1 = 0.8678794411714423
The residual (i.e., value of \(f\) at \(x_2\)) is smaller than before but not zero. Thus, we repeat!
x3 = x2 - y2 / m2 = 0.8527833734164099
The residual is decreasing quickly, so we appear to be getting much closer to the root!
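The two steps above can be reproduced directly. The example function is not restated here, but \(f(x) = x e^x - 2\) produces exactly the iterate values shown, so we assume it in this sketch:

```julia
# Assumed example function: its Newton iterates match the values shown above.
f(x) = x * exp(x) - 2
dfdx(x) = exp(x) * (x + 1)

x1 = 1.0                    # initial guess near the root
y1, m1 = f(x1), dfdx(x1)    # value and slope of the tangent line at x1
x2 = x1 - y1 / m1           # root of the tangent line
@show x2                    # 0.8678794411714423

y2, m2 = f(x2), dfdx(x2)    # repeat from the new approximation
x3 = x2 - y2 / m2
@show x3                    # 0.8527833734164099
```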
Given a function \(f\), its derivative \(f'\), and an initial value \(x_1\), iteratively define \[x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \quad\quad k = 1, 2, \cdots.\]
First, note that this is a special case of fixed-point iteration! Define \(g(x) = x - \frac{f(x)}{f'(x)}\). At any root \(r\) where \(f(r) = 0\) (and \(f'(r) \neq 0\)), we have \(g(r) = r\), so the roots of \(f\) are fixed points of \(g\), the function defining the Newton update.
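We can check this fixed-point property numerically. A minimal sketch, assuming the example function \(f(x) = x e^x - 2\) and using the converged root value printed by the iterations below:

```julia
f(x) = x * exp(x) - 2          # assumed example function
dfdx(x) = exp(x) * (x + 1)
g(x) = x - f(x) / dfdx(x)      # Newton update as a fixed-point map

r = 0.8526055020137254         # converged root value from the iterations
@show g(r) - r                 # essentially zero: the root is a fixed point of g
```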
The previous example also suggests why Newton’s method might converge to a root – as we zoom in on the function \(f\), the tangent line and the graph of the differentiable function \(f\) must become more and more similar. However, we don’t yet know that it will converge or how quickly!
Assume that the sequence \(x_k\) converges to a limit \(r\) which is a root: \(f(r) = 0\). Define again the error \(\epsilon_k = x_k - r\) for \(k = 1, 2, \cdots\).
We can rewrite the update in terms of the \(\epsilon\) sequence as \[\epsilon_{k+1} + r = \epsilon_k + r - \frac{f(r+\epsilon_k)}{f'(r + \epsilon_k)}.\] Now, note that we know \(|\epsilon_k| \rightarrow 0\), so we can use a Taylor expansion of \(f\) about \(x = r\) to show \[\epsilon_{k+1} = \epsilon_k - \frac{f(r) +\epsilon_k f'(r) + \frac12 \epsilon_k^2f''(r) + O(\epsilon_k^3)}{f'(r) + \epsilon_k f''(r) + O(\epsilon_k^2)}.\]
Using that \(f(r) = 0\) and dividing through the numerator and denominator by \(f'(r)\), we have \[\epsilon_{k+1} = \epsilon_k - \epsilon_k \left[ 1 + \frac12 \frac{f''(r)}{f'(r)}\epsilon_k + O(\epsilon_k^2) \right] \left[ 1 + \frac{f''(r)}{f'(r)}\epsilon_k + O(\epsilon_k^2) \right]^{-1}.\]
The denominator term is of the form \(1/(1+z)\), and provided \(|z| < 1\), this is the limit of the geometric series \(1 - z + z^2 - z^3 + \cdots\). Truncating after the linear term, we have \[\epsilon_{k+1} = \epsilon_k - \epsilon_k \left[ 1 + \frac12 \frac{f''(r)}{f'(r)}\epsilon_k + O(\epsilon_k^2) \right] \left[ 1 - \frac{f''(r)}{f'(r)}\epsilon_k + O(\epsilon_k^2) \right] = \frac12 \frac{f''(r)}{f'(r)}\epsilon_k^2 + O(\epsilon_k^3).\]
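We can check this error relation against the iterates computed earlier. Assuming the example function \(f(x) = x e^x - 2\), the predicted constant is \(\frac12 f''(r)/f'(r) = (r+2)/(2(r+1)) \approx 0.77\), and the ratios \(\epsilon_{k+1}/\epsilon_k^2\) should approach it:

```julia
r = 0.8526055020137254                 # converged root (from the iterations)
x = [1.0, 0.8678794411714423, 0.8527833734164099]   # iterates computed above
errs = x .- r                          # errors ϵ_k
ratios = errs[2:end] ./ errs[1:end-1] .^ 2
@show ratios     # both near 0.77 = (r + 2) / (2 * (r + 1)), as predicted
```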
Fact:
Asymptotically, each iteration of Newton’s method roughly squares the error.
Definition: Quadratic convergence
Suppose a sequence \(x_k\) approaches a limit \(x^*\). If the error \(\epsilon_k = x_k - x^*\) satisfies \[\lim_{k\rightarrow \infty} \frac{|\epsilon_{k+1}|}{|\epsilon_k|^2} = L\] for a positive constant \(L\), then the sequence has quadratic convergence to the limit.
Quadratic convergence is an example of superlinear convergence.
Fact:
While linear convergence trends toward a straight line on a log-linear plot of the error, quadratic convergence produces no such straight line – the curve keeps getting steeper.
1-element Vector{Float64}:
0.852605502013726
5-element Vector{Float64}:
1.0
0.8678794411714423
0.8527833734164099
0.8526055263689221
0.852605502013726
The error reaches \(\epsilon_{\text{mach}}\) quickly, so we use extended precision (and software emulation arithmetic) to do the calculations for a few more iterations.
0.8526055020137254913464724146953174668984533001514035087721073946525150656742605
6-element Vector{Float64}:
2.184014482339964
2.0648638810676476
2.0302996897413403
2.014917265833641
2.007403413131773
2.003688054470438
The above convergence to 2 constitutes good evidence of quadratic convergence!
Fact:
In our derivations above, we have made some assumptions – these are requirements for \(f\):
- \(f\) is twice continuously differentiable near the root \(r\) (we used a Taylor expansion through the second derivative),
- \(f'(r) \neq 0\), i.e., \(r\) is a simple root (we divided by \(f'(r)\)),
- the initial value \(x_1\) is sufficiently close to \(r\).
"""
    newton(f,dfdx,x1[;maxiter,ftol,xtol])

Use Newton's method to find a root of `f` starting from `x1`, where `dfdx` is
the derivative of `f`. Returns a vector of root estimates.

The optional keyword parameters set the maximum number of iterations and the
stopping tolerances for values of `f` and changes in `x`.
"""
function newton(f,dfdx,x1;maxiter=40,ftol=100*eps(),xtol=100*eps())
    # maxiter, ftol, and xtol are keyword parameters;
    # the default values are given but others may be entered by a user
    x = [float(x1)]
    y = f(x1)
    delx = Inf
    k = 1
    while (abs(delx) > xtol) && (abs(y) > ftol)
        dydx = dfdx(x[k])
        delx = -y/dydx          # Newton step
        push!(x,x[k]+delx)      # append iterate
        k += 1
        y = f(x[k])
        if k==maxiter
            @warn "Maximum number of iterations reached."
            break               # exit loop
        end
    end
    return x
end
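As a quick check, we can run the function on the example from before. The function is restated compactly here (without the docstring and warning) so that the sketch is self-contained, and \(f(x) = x e^x - 2\) is assumed as the example function:

```julia
# Compact restatement of the `newton` function above, so this sketch runs standalone.
function newton(f, dfdx, x1; maxiter=40, ftol=100*eps(), xtol=100*eps())
    x = [float(x1)]
    y, delx, k = f(x1), Inf, 1
    while abs(delx) > xtol && abs(y) > ftol
        delx = -y / dfdx(x[k])      # Newton step
        push!(x, x[k] + delx)       # append iterate
        k += 1
        y = f(x[k])
        k == maxiter && break
    end
    return x
end

f(x) = x * exp(x) - 2       # assumed example function
dfdx(x) = exp(x) * (x + 1)

x = newton(f, dfdx, 1.0)
@show x[end]                # ≈ 0.8526055020137, matching the iterates shown earlier
```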
We use three stopping criteria here: the change in \(x\) falls below `xtol`, the residual \(|f(x)|\) falls below `ftol`, or the number of iterations reaches `maxiter`.
One of the biggest challenges in using Newton’s method is that you must provide (or approximate) \(f'\). However, the following observation yields a way to avoid this drawback!
Fact:
When a step produces an approximate result, you are free to carry it out approximately.
Instead of using the tangent line, we can use the secant line!
y3 = f(x3) = -0.17768144843679456
This is the secant method! Given function \(f\) and two initial values \(x_1\) and \(x_2\), define \[x_{k+1} = x_k - \frac{f(x_k)(x_k - x_{k-1})}{f(x_k) - f(x_{k-1})}, \quad\quad k = 2, 3, \cdots.\]
"""
    secantmethod(f,x1,x2[;maxiter,ftol,xtol])

Use the secant method to find a root of `f` starting from `x1` and `x2`.
Returns a vector of root estimates.
"""
function secantmethod(f,x1,x2;maxiter=40,ftol=100*eps(),xtol=100*eps())
    x = [float(x1),float(x2)]
    y1 = f(x1)
    delx, y2 = Inf, Inf
    k = 2
    while (abs(delx) > xtol) && (abs(y2) > ftol)
        y2 = f(x[k])
        delx = -y2 * (x[k]-x[k-1]) / (y2-y1)   # secant step
        push!(x,x[k]+delx)                     # append new estimate
        k += 1
        y1 = y2
        if k==maxiter
            @warn "Maximum number of iterations reached."
            break                              # exit loop
        end
    end
    return x
end
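Again as a quick check, we can run the method on the assumed example function \(f(x) = x e^x - 2\); note that no derivative is needed this time. The function is restated compactly (without the docstring and warning) so the sketch runs standalone, and the two hypothetical starting values 1.0 and 0.9 are chosen near the root:

```julia
# Compact restatement of `secantmethod` above, so this sketch runs standalone.
function secantmethod(f, x1, x2; maxiter=40, ftol=100*eps(), xtol=100*eps())
    x = [float(x1), float(x2)]
    y1, y2, delx, k = f(x1), Inf, Inf, 2
    while abs(delx) > xtol && abs(y2) > ftol
        y2 = f(x[k])
        delx = -y2 * (x[k] - x[k-1]) / (y2 - y1)   # secant step
        push!(x, x[k] + delx)                      # append new estimate
        k += 1
        y1 = y2
        k == maxiter && break
    end
    return x
end

f(x) = x * exp(x) - 2             # assumed example function; no derivative needed
x = secantmethod(f, 1.0, 0.9)     # hypothetical starting values near the root
@show x[end]                      # ≈ 0.8526055020137, the same root as before
```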
A tedious Taylor series expansion calculation, like the one we did for Newton’s method, shows that the errors for the secant method satisfy \[\epsilon_{k+1} \approx \frac12 \frac{f''(r)}{f'(r)} \epsilon_k \epsilon_{k-1}.\]
Guessing that \(\epsilon_{k+1} \approx c\,\epsilon_k^\alpha\) (and hence \(\epsilon_k \approx c\,\epsilon_{k-1}^\alpha\)), the above equation becomes, up to constants, \[\left[\epsilon_{k-1}^\alpha\right]^\alpha \approx C \epsilon_{k-1}^{\alpha + 1},\] and thus we must have \(\alpha^2 = \alpha + 1\), which has positive solution \[\alpha = \frac{1 + \sqrt{5}}{2} \approx 1.618.\]
Definition: Superlinear convergence
Suppose a sequence \(x_k\) approaches limit \(x^*\). If the error sequence \(\epsilon_k = x_k - x^*\) satisfies \[\lim_{k \rightarrow \infty} \frac{|\epsilon_{k+1}|}{|\epsilon_k|^\alpha} = L\] for constants \(\alpha > 1\) and \(L > 0\), then the sequence has superlinear convergence with rate \(\alpha\).
12-element Vector{Float64}:
-0.14739449798627452
0.3526055020137255
0.04223372706144885
-0.013026425327222755
0.00042747994131549927
4.269915586133851e-6
-1.4054770126368277e-9
4.620323656624992e-15
4.999480931132388e-24
-1.7783862252641536e-38
6.845099610444838e-62
0.0
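One way to estimate the rate \(\alpha\) from an error sequence like the one above is to compute \(\log|\epsilon_{k+1}| / \log|\epsilon_k|\), which tends to \(\alpha\) for superlinear convergence. A sketch using the nonzero errors listed above (the final exact zero is dropped, since its logarithm is infinite):

```julia
# Nonzero secant-method errors from the run above.
errs = [-0.14739449798627452, 0.3526055020137255, 0.04223372706144885,
        -0.013026425327222755, 0.00042747994131549927, 4.269915586133851e-6,
        -1.4054770126368277e-9, 4.620323656624992e-15, 4.999480931132388e-24,
        -1.7783862252641536e-38, 6.845099610444838e-62]

# Ratios of successive log-errors estimate the convergence rate α.
rates = [log(abs(errs[k+1])) / log(abs(errs[k])) for k in 2:length(errs)-1]
@show rates[end]    # ≈ 1.62, close to the golden ratio (1 + √5)/2 ≈ 1.618
```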
Fact:
If function evaluations are used to measure computational work, the secant iteration converges more rapidly than Newton’s method: each Newton step requires two evaluations (\(f\) and \(f'\)), while each secant step requires only one new evaluation of \(f\).
There are two additional ideas used to strengthen the type of method we’ve studied in this lecture.