Many problems in engineering, the sciences, data science, and machine learning can be rephrased as finding a root of a given function! It is such an important and archetypal problem that we will study a variety of methods for this generic formulation, and you will see that many of them are familiar from specific applications, where they may go under different names.
In calculus, you have likely already encountered such a problem. In optimization, we often seek a stationary point of an objective function \(L(\mathbf{x})\) as a candidate maximizer or minimizer. Mathematically, this means seeking \(\mathbf{x}\) so that \[\nabla L(\mathbf{x}) = \mathbf{0}.\]
In this class, we’ll focus on the case where the rootfinding problem is defined by a continuous scalar function \(f\) of a scalar variable; that is, we seek \(r\) so that \[f(r) = 0.\]
In the rootfinding problem, the data is the function \(f\), and the result is a root. How does the result change in response to perturbations in \(f\)?
The perturbation to the function could be due to rounding in the evaluation of values of \(f\), or evaluating \(f\) may require computation via an inexact algorithm (sometimes evaluating even relatively simple functions requires complicated algorithms).
Assume \(f\) has at least one continuous derivative near a root \(r\). Suppose \(f\) is perturbed to \(\tilde{f}(x) = f(x) + \epsilon\) (a constant perturbation). As a result, the root (if it still exists) will be perturbed to \(\tilde{r} = r + \delta\) such that \(\tilde{f}(\tilde{r}) = 0\). We compute here an absolute condition number \(\kappa_r\), which is the limit of the ratio \(|\delta/\epsilon|\) as \(\epsilon \rightarrow 0\).
Using Taylor’s theorem, we have \[0 = \tilde{f}(\tilde{r}) = f(r + \delta) + \epsilon \approx f(r) + f'(r)\delta + \epsilon = f'(r)\delta + \epsilon.\] Solving for \(\delta\) gives \(\delta \approx -\epsilon/f'(r)\), so the absolute condition number is \[\kappa_r = \lim_{\epsilon \to 0}\left|\frac{\delta}{\epsilon}\right| = \frac{1}{|f'(r)|}.\]
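As a quick numerical sanity check, here is a minimal sketch using the same example function that appears below; the derivative df and the quadratic-formula root are worked out by hand for this illustration. A constant perturbation of size ep should move the root by roughly ep/|f'(r)|.
f  = x -> (x-1)*(x-2)              # same example used below
df = x -> 2x - 3                   # f'(x), written out by hand
r, ep = 1.0, 1e-6
rt = (3 - sqrt(9 - 4*(2 + ep)))/2  # root of f(x) + ep nearest r, via the quadratic formula
abs(rt - r), ep/abs(df(r))         # these agree to leading order in ep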
Let’s see what this condition number looks like visually!
using Plots    # plotting package used throughout this section (assumed; add if not already loaded)
f = x -> (x-1)*(x-2);    # function of which we wish to find a root
interval = [0.8, 1.2]
plot(f, interval..., ribbon=0.03, aspect_ratio=1, xlabel="x", yaxis=("f(x)",[-0.2,0.2]))
scatter!([1],[0],title="Well-conditioned root")
The possible values for the perturbed root lie within the intersection of the ribbon with the \(x\)-axis. The width of this region is similar in length to the vertical thickness of the ribbon.
f = x -> (x-1)*(x-1.01);    # a new function for us to rootfind!
plot(f, interval...,ribbon=0.03,aspect_ratio=1, xlabel="x",yaxis=("f(x)",[-0.2,0.2]))
scatter!([1],[0],title="Poorly-conditioned root")
The bound on the constant perturbation to \(f\) (the vertical width of the band) is the same as before, but now the potential displacement of the root (the horizontal width of the intersection of the band with the \(x\)-axis) is much wider! In fact, the root could even cease to exist under possible perturbations.
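We can confirm that last claim with a small calculation by hand (using the ribbon half-width 0.03 from the plot above): adding the constant 0.03 to this quadratic makes its discriminant negative, so the perturbed function has no real roots at all.
# (x-1)*(x-1.01) + 0.03 = x^2 - 2.01x + 1.04; check the discriminant
a, b, c = 1, -2.01, 1.01 + 0.03
b^2 - 4a*c      # negative ⇒ no real roots: the root has ceased to exist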
If \(|f'|\) is small at the root, it may not be possible to get a small error in a computed estimate of the root. We can’t measure this error, but, as usual, we can measure the residual.
Define \(g(x) = f(x) - f(\tilde{r})\) and note that \(\tilde{r}\) is exactly a root of \(g\). Since \(f(x) - g(x) = f(\tilde{r})\) for every \(x\), the residual \(f(\tilde{r})\) is the distance from \(f\) to an exactly solved rootfinding problem. This is the backward error!
To summarize – we can’t expect a small error in a root approximation if the condition number is large, but we can gauge the backward error from the residual!
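Here is a minimal sketch of that idea in code; the root estimate rt below is an arbitrary value chosen only for illustration.
f   = x -> (x-1)*(x-2)
rt  = 1.001                 # an inexact estimate of the root at 1
res = f(rt)                 # the residual, i.e., the backward error
g   = x -> f(x) - res       # a perturbed problem that rt solves exactly
g(rt)                       # zero (up to rounding)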
We typically employ iterative methods to approximate a solution to the rootfinding problem. The first method relies on a reformulation of the rootfinding problem as a fixed-point problem.
We may pass back and forth between equivalent fixed-point and rootfinding problems (meaning they have the same set of solutions). For example, given \(f\), define \(g(x) = x - f(x)\); then \(f(r) = 0\) if and only if \(g(r) = r\).
The reason that we are interested in transforming our problem to a fixed-point problem is that there is a very simple method for approximating a solution. This method is known as the fixed-point iteration or fixed-point iterative method.
Given function \(g\) and initial value \(x_1\), define \[x_{k+1} = g(x_k), \quad\quad\quad k = 1, 2, \cdots.\]
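As a minimal sketch, the iteration can be packaged as a small helper; the name fixedpoint and its arguments are our own choices here, not a library routine.
# Apply g repeatedly, starting from x1, for a fixed number of steps.
function fixedpoint(g, x1; maxiter=20)
    x = float(x1)
    for k in 1:maxiter
        x = g(x)
    end
    return x
end
fixedpoint(cos, 1.0)    # ≈ 0.739, the fixed point of cos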
Let’s apply this technique to an equivalent fixed-point problem derived from a rootfinding problem for a quadratic polynomial \(f(x)\).
using Polynomials    # provides Polynomial, roots, and fit (assumed; add if not already loaded)
f = Polynomial([3.5,-4,1])
r = roots(f)
rmin, rmax = extrema(r)
@show rmin,rmax;
(rmin, rmax) = (1.2928932188134525, 2.7071067811865475)
g = x -> x - f(x)
#43 (generic function with 1 method)
plt = plot([g x->x], 2, 3, l=2, label=["y=g(x)" "y=x"], xlabel="x", ylabel="y", aspect_ratio=1, title="Finding a fixed point", legend=:bottomright)
x = 2.1;
y = g(x)
2.59
\(x = 2.1\) is not a fixed-point, but \(y = g(x)\) is much closer to the fixed-point near 2.7 than \(x\)! Let’s take this as our next \(x\) value.
plot!([x,y],[y,y],arrow=true,color=3,label="")
Now, we compute another new \(x\) value by calculating \(y = g(x)\) and taking this as our new \(x\) value.
x = y;  y = g(x)
plot!([x,x],[x,y], arrow=true, color=4, label="")
plot!([x,y],[y,y], arrow=true, color=3, label="")
We can repeat this a few times to see what happens!
for k = 1:5
    x = y
    y = g(x)
    plot!([x,x],[x,y], color=4, label="")
    plot!([x,y],[y,y], color=3, label="")
end
plt
Let’s measure the error in our current approximation of the fixed-point!
abs(y-rmax)/rmax
0.0001653094344995643
Now, let’s try to find the other fixed point near \(1.29\) using this method.
plt = plot([g x->x], 1, 2, l=2, label=["y=g(x)" "y=x"], aspect_ratio=1, xlabel="x", ylabel="y", title="Divergence", legend=:bottomright)
x = 1.3;  y = g(x);
arrow = false
for k = 1:5
    plot!([x,y],[y,y], arrow=arrow, color=3)
    x = y
    y = g(x)
    plot!([x,x],[x,y], arrow=arrow, color=4)
    if k > 2; arrow = true; end
end
plt
This time the fixed-point iteration produces worse and worse approximations, with the iterates moving farther and farther from the fixed point.
What do you notice that is different in this case?
Suppose fixed-point \(p\) is the desired limit of an iteration \(x_1, x_2, \cdots\). Consider the error sequence \(\epsilon_1, \epsilon_2, \cdots\) where \(\epsilon_k := x_k - p\).
By the definition of the iteration, we have \[\epsilon_{k+1} + p = g(\epsilon_k + p) = g(p) + g'(p)\epsilon_k + \frac12 g''(p) \epsilon_k^2 + \cdots,\] assuming that \(g\) has at least two continuous derivatives.
Since \(g(p) = p\), we have \[\epsilon_{k+1} = g'(p)\epsilon_k + O(\epsilon_k^2).\]
Now, if the iteration is to converge to \(p\), the errors must converge to \(0\). In that case we can neglect the quadratic (and higher-order) terms and conclude that \(\epsilon_{k+1} \approx g'(p) \epsilon_k\). This is consistent with convergence if \(|g'(p)| < 1\), but the errors must grow (not vanish) if \(|g'(p)| > 1\).
In our previous example, we have \(g(x) = x - (x^2 - 4x + 3.5) = -x^2 + 5x - 3.5\) and \(g'(x) = -2x + 5\). Near \(p \approx 2.71\) (the convergent fixed point), we have \(g'(p) \approx -0.41\), indicating convergence. Near \(p \approx 1.29\) (the divergent fixed point), we get \(g'(p) \approx 2.41\), which is consistent with the observed divergence.
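As a quick check, we can evaluate the derivative at the two computed roots (dg below is just \(g'(x) = -2x + 5\) written out by hand):
dg = x -> -2x + 5           # g'(x) for this example
dg(rmax), dg(rmin)          # ≈ (-0.414, 2.414)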
Our prediction for the convergence of the error, given our work above, is that the errors will approximately satisfy \[|\epsilon_{k+1}| \approx \sigma |\epsilon_k|\] for \(\sigma = |g'(p)| < 1\). This is known as linear convergence.
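To see how \(\sigma\) can be measured in practice, apply the relation repeatedly: \[|\epsilon_k| \approx \sigma^{k-1}|\epsilon_1| \quad\Longrightarrow\quad \log|\epsilon_k| \approx \log|\epsilon_1| + (k-1)\log\sigma.\] On a semi-log plot, then, the errors should fall on a straight line whose slope is \(\log\sigma\); fitting a line to the logged errors and exponentiating its slope recovers an estimate of \(\sigma\). That is exactly what we do below.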
g = x -> x - f(x)
x = [2.1]
for k = 1:12
    push!(x, g(x[k]))
end
x
13-element Vector{Float64}:
2.1
2.59
2.7419000000000002
2.69148439
2.713333728386328
2.7044887203327885
2.7081843632566587
2.7066592708954196
2.7072919457529734
2.7070300492259465
2.707138558717502
2.707093617492436
2.7071122335938966
err = @. abs(x - rmax)/rmax
plot(0:12, err, m=:o, xaxis="iteration number", yaxis=("error",:log10), title="Convergence of fixed-point iteration")
y = log.(err[5:12])
p = Polynomials.fit(5:12, y, 1)    # fit a linear function to the logarithm of the error sequence (i.e., fit to the data in the previous plot)
sig = exp(p.coeffs[2])
0.41448513854854707
[ err[i+1]/err[i] for i in 8:11 ]
4-element Vector{Float64}:
0.41376605208171086
0.4143987269383
0.4141368304124451
0.4142453399049934
These ratios agree well with the fitted value \(\sigma \approx 0.414\) and with \(|g'(p)| \approx 0.414\)!
We say that \(g\) satisfies a Lipschitz condition with constant \(L\) on a set \(S\) if \(|g(s) - g(t)| \le L\,|s - t|\) for all \(s, t \in S\). A function satisfying a Lipschitz condition (for any \(L\)) is continuous on \(S\). Do you see why?
If \(L < 1\), we call \(g\) a contraction mapping because applying \(g\) decreases distances between points.
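As a rough numerical illustration (the helper lipschitz_estimate is our own, and sampling difference quotients only approximates the true Lipschitz constant), we can check that our \(g\) from the example above is a contraction near the convergent fixed point but not near the divergent one.
# Crude estimate of a Lipschitz constant for g on [a,b] via difference quotients.
function lipschitz_estimate(g, a, b; n=400)
    s = range(a, b, length=n)
    maximum(abs(g(s[i+1]) - g(s[i])) / (s[i+1] - s[i]) for i in 1:n-1)
end
lipschitz_estimate(g, rmax-0.1, rmax+0.1)   # < 1: a contraction near the convergent fixed point
lipschitz_estimate(g, rmin-0.1, rmin+0.1)   # > 1: not a contraction near the divergent fixed point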