So far, we have considered only rootfinding problems defined for scalar functions of scalar variables.
Nonlinear systems
We will consider here the rootfinding problem defined by \(\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^n\).
Definition: Rootfinding problem
Given a continuous vector-valued function \(\mathbf{f}\) of a vector input \(\mathbf{x}\), the rootfinding problem is to find a real vector \(\mathbf{r}\), called a root, such that \[\mathbf{f}(\mathbf{r}) = \mathbf{0},\] or equivalently, \(f_i(\mathbf{r}) = 0\) for each \(i = 1, \ldots, n\).
The steady state of interactions between the population \(w(t)\) of a predator species and the population \(h(t)\) of a prey species might be modeled as \[0 = ah - bhw, \qquad 0 = -cw + dhw,\] for positive parameters \(a\), \(b\), \(c\), \(d\).
This fits into the form of the rootfinding problem by defining \(\mathbf{x} = [h, w]\), \[f_1(x_1,x_2) = ax_1 - bx_1 x_2,\] and \[f_2(x_1,x_2) = -cx_2 + dx_1 x_2.\]
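For concreteness, here is a minimal Julia sketch of this system as a residual function; the parameter values below are illustrative choices, not prescribed by the model.

a, b, c, d = 1.0, 0.2, 0.5, 0.1     # illustrative parameter values

f(x) = [ a*x[1] - b*x[1]*x[2],      # f₁(x₁,x₂)
        -c*x[2] + d*x[1]*x[2] ]     # f₂(x₁,x₂)

With these values the nonzero steady state is \(x_1 = c/d = 5\), \(x_2 = a/b = 5\), which the methods below should be able to recover.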
Linear model
We use the same principle as in the scalar case: construct a simpler model of the exact nonlinear function.
We begin with a linear model based on the multidimensional Taylor series, \[\mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \mathbf{J}(\mathbf{x})\mathbf{h} + O(\|\mathbf{h}\|^2),\] where \(\mathbf{J}\) is the Jacobian matrix of \(\mathbf{f}\) defined as \[\mathbf{J}(\mathbf{x}) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}.\]
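For the predator–prey system above, for example, this matrix works out to \[\mathbf{J}(\mathbf{x}) = \begin{bmatrix} a - b x_2 & -b x_1 \\ d x_2 & -c + d x_1 \end{bmatrix}.\]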
Multidimensional Newton method
To derive Newton’s method in this case, we assume \(\mathbf{x}_k\) is a reasonable approximation to a root \(\mathbf{x}\) and define \(\mathbf{h} = \mathbf{x} - \mathbf{x}_k\).
The linear model then becomes \[\mathbf{f}(\mathbf{x}) \approx \mathbf{q}(\mathbf{x}) = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k)(\mathbf{x} - \mathbf{x}_k).\]
Now, we note that \(\mathbf{f}(\mathbf{x}) = \mathbf{0}\) and require that \(\mathbf{q}(\mathbf{x}_{k+1}) = \mathbf{0}\), \[\mathbf{0} = \mathbf{f}(\mathbf{x}_k) + \mathbf{J}(\mathbf{x}_k)(\mathbf{x}_{k+1} - \mathbf{x}_k)\] and thus, \[\mathbf{x}_{k+1} = \mathbf{x}_k - \left[\mathbf{J}(\mathbf{x}_k)\right]^{-1} \mathbf{f}(\mathbf{x}_k).\]
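In the scalar case \(n = 1\), the Jacobian is simply \(f'(x_k)\), and this formula reduces to the familiar Newton iteration \[x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}.\]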
The multidimensional Newton’s method consists of three steps performed iteratively. Given \(\mathbf{f}\) and starting value \(\mathbf{x}_1\), for each \(k = 1, 2, 3, \cdots,\)
Compute \(\mathbf{y}_k = \mathbf{f}(\mathbf{x}_k)\) and \(\mathbf{A}_k = \mathbf{J}(\mathbf{x}_k)\).
Solve the linear system \(\mathbf{A}_k \mathbf{s}_k = -\mathbf{y}_k\) for the Newton step \(\mathbf{s}_k\).
Let \(\mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{s}_k\).
One can analyze the multidimensional Newton’s method by generalizing the analysis of the scalar case. The conclusion is that, under suitable assumptions and provided the iteration converges at all, the convergence is quadratic in any vector norm.
Code
""" newtonsys(f,jac,x1[;maxiter,ftol,xtol])Use Newton's method to find a root of a system of equations, starting from 'x1'.The functions 'f' and 'jac' should return the residual vector and the Jacobianmatrix, respectively. Returns the history of root estimates as a vector of vectors.The optional keyword parameters set the maximum number of iterations and the stopping tolerance for values of 'f' and changes in 'x'."""functionnewtonsys(f,jac,x1;maxiter=40,ftol=100*eps(),xtol=1000*eps()) x = [float(x1)] y, J =f(x1), jac(x1) delx =Inf# for initial pass below k =1while (norm(delx) > xtol) && (norm(y) > ftol) delx =-(J\y) # Newton steppush!(x, x[k] + delx) # append to history k +=1 y, J =f(x[k]), jac(x[k])if k == maxiter@warn"Maximum number of iterations reached."breakendendreturn xend
Newton’s method is a very important algorithmic approach, but it has some drawbacks: computing the Jacobian at every iteration can be very costly, and the method diverges from many starting points. Quasi-Newton methods modify the iteration to try to overcome these challenges.
Jacobian by finite differences
In the scalar case, we developed the secant method by replacing the derivative \(f'(x_k)\) by the difference quotient \[\frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}.\] If \(x_k\) converges to a root \(r\), then this quotient converges to \(f'(r)\).
In the system case, replacing the Jacobian is more complicated, due to the number of variables. The replacement will approximate the \(j\)th column, \[\mathbf{J}(\mathbf{x})\mathbf{e}_j = \begin{bmatrix} \frac{\partial f_1}{\partial x_j} \\ \frac{\partial f_2}{\partial x_j} \\ \vdots \\ \frac{\partial f_n}{\partial x_j} \end{bmatrix} \approx \frac{\mathbf{f}(\mathbf{x} + \delta \mathbf{e}_j) - \mathbf{f}(\mathbf{x})}{\delta}, \quad\quad j = 1, \cdots, n.\]
Code
""" fdjac(f,x0[,y0])Compute a finite-difference approximation of the Jacobian matrix for 'f' at 'x0',where 'y0 = f(x0)' may be given."""functionfdjac(f,x0,y0=f(x0)) δ =sqrt(eps())*max(norm(x0),1) # FD step size m,n =length(y0),length(x0)if n ==1 J = (f(x0+δ) - y0) / δelse J =zeros(m,n) x =copy(x0)for j in1:n x[j] += δ J[:,j] = (f(x) - y0) / δ x[j] -= δendendreturn Jend
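A quick sanity check (again assuming the illustrative \(\mathbf{f}\) and analytic `jac` from before): the finite-difference Jacobian should agree with the analytic one up to a small finite-difference error.

x0 = [4.0, 4.5]
norm(fdjac(f, x0) - jac(x0))   # expected to be small, roughly the size of the FD error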
Broyden’s update
Note that the finite-difference approximation to the Jacobian requires \(n\) additional function evaluations in each iteration, which may be too costly, especially since the function and Jacobian are supposed to change little as the iterates get close to the root.
Now, we’ll seek a cheaper approximation to the Jacobian. The Newton iteration is defined by \[\mathbf{A}_k\mathbf{s}_k = -\mathbf{y}_k\] where \(\mathbf{A}_k\) is the Jacobian.
We now assume the above linear system defines \(\mathbf{s}_k\), but with \(\mathbf{A}_k\) approximating the Jacobian. Once we use \(\mathbf{s}_k\) to obtain \(\mathbf{x}_{k+1}\), we need to update the approximate Jacobian to a new matrix \(\mathbf{A}_{k+1}\). We’ll assume \[\mathbf{y}_{k+1} - \mathbf{y}_k = \mathbf{A}_{k+1}(\mathbf{x}_{k+1} - \mathbf{x}_k) = \mathbf{A}_{k+1}\mathbf{s}_k.\]
Thus, we require \[\mathbf{A}_{k+1} \mathbf{s}_k = \mathbf{y}_{k+1} - \mathbf{y}_k.\] This is not enough to uniquely define \(\mathbf{A}_{k+1}\), so we’ll also require that \(\mathbf{A}_{k+1} - \mathbf{A}_k\) is a rank-one matrix. Together these conditions lead to Broyden’s update formula, \[\mathbf{A}_{k+1} = \mathbf{A}_k + \frac{(\mathbf{y}_{k+1} - \mathbf{y}_k - \mathbf{A}_k \mathbf{s}_k)\,\mathbf{s}_k^\top}{\mathbf{s}_k^\top \mathbf{s}_k}.\]
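As a minimal sketch, the update can be written as a standalone function (the helper name `broyden_update` is just for illustration; in the `levenberg` code below the same update is applied in place):

# Rank-one Broyden update, satisfying A⁺*s == Δy, where Δy = y_{k+1} - y_k.
broyden_update(A, s, Δy) = A + (Δy - A*s) * (s' / (s'*s))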
If the method using this approximation fails to make improvement, one can reinitialize the \(\mathbf{A}\) matrix by finite differences.
Levenberg’s method
An important principle is that we can always check whether a proposed step reduces the backward error \(\|\mathbf{f}\|\) before deciding to accept or reject it.
We will use here an alternative step defined by \[(\mathbf{A}_k^\top \mathbf{A}_k + \lambda \mathbf{I})\mathbf{s}_k = - \mathbf{A}_k^\top \mathbf{f}_k.\]
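One way to interpret this system: it consists of the normal equations for the regularized least-squares problem \[\min_{\mathbf{s}} \; \|\mathbf{A}_k \mathbf{s} + \mathbf{f}_k\|^2 + \lambda \|\mathbf{s}\|^2,\] so increasing \(\lambda\) penalizes long steps away from the current point.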
Given \(\mathbf{f}\), a starting value \(\mathbf{x}_1\), and a scalar \(\lambda\), for each \(k = 1, 2, 3, \cdots,\)
Compute \(\mathbf{y}_k = \mathbf{f}(\mathbf{x}_k)\), and let \(\mathbf{A}_k\) be an exact or approximate Jacobian matrix.
Solve the linear system above for \(\mathbf{s}_k\).
Let \(\hat{\mathbf{x}} = \mathbf{x}_k + \mathbf{s}_k\).
If the residual norm is reduced at \(\hat{\mathbf{x}}\), accept the step by setting \(\mathbf{x}_{k+1} = \hat{\mathbf{x}}\); otherwise keep \(\mathbf{x}_{k+1} = \mathbf{x}_k\).
Update \(\lambda\) (decreasing it after an accepted step, increasing it after a rejected one) and update \(\mathbf{A}_k\) to \(\mathbf{A}_{k+1}\).
When \(\lambda = 0\), the system defining \(\mathbf{s}_k\) is equivalent to the usual linear model used in Newton’s method: when \(\mathbf{A}_k\) is nonsingular, it yields the Newton step \(\mathbf{s}_k = -\mathbf{A}_k^{-1}\mathbf{f}_k\).
As \(\lambda \rightarrow \infty\), the system approaches \[\lambda \mathbf{s}_k = - \mathbf{A}_k^\top \mathbf{f}_k.\] In this case, if \(\mathbf{A}_k = \mathbf{J}(\mathbf{x}_k)\) then \(\mathbf{s}_k\) is a steepest descent step for the objective \(\|\mathbf{f}(\mathbf{x})\|^2\).
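This follows from a short calculation: for \(g(\mathbf{x}) = \tfrac{1}{2}\|\mathbf{f}(\mathbf{x})\|^2\), the chain rule gives \[\nabla g(\mathbf{x}) = \mathbf{J}(\mathbf{x})^\top \mathbf{f}(\mathbf{x}),\] so \(-\mathbf{A}_k^\top \mathbf{f}_k\) points in the direction of steepest descent for \(\|\mathbf{f}(\mathbf{x})\|^2\) whenever \(\mathbf{A}_k = \mathbf{J}(\mathbf{x}_k)\).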
The \(\lambda\) parameter defines a smooth transition between the Newton step (with very rapid convergence near a root but potential divergence) and a gradient descent step (with guaranteed progress when far from a root).
Code
""" levenberg(f,x1[;maxiter,ftol,xtol])Use Levenberg's quasi-Newton iteration to find a root of the system 'f'starting from 'x1'. Returns the history of root estimates as a vectorof vectors.The optional keyword parameters set the maximum number of iterations andthe stopping tolerance for values of 'f' and changes in 'x'."""functionlevenberg(f,x1;maxiter=40,ftol=1e-12,xtol=1e-12) x = [float(x1)] yk =f(x1) k =1; s =Inf; A =fdjac(f,x[k],yk) # start with FD Jacobian jac_is_new =true λ =10;while (norm(s) > xtol) && (norm(yk) > ftol)# Compute the proposed step. B = A'*A + λ*I z = A'*yk s =-(B\z) x̂ = x[k] + s ŷ =f(x̂)# Do we accept the result?ifnorm(ŷ) <norm(yk) # accept λ = λ/10# get closer to Newton# Broyden update of the Jacobian A += (ŷ - yk - A*s)*(s'/norm(s)^2) jac_is_new =falsepush!(x,x̂) yk = ŷ k +=1else# don't accept# Get closer to gradient descent. λ =4λ# Re-initialize the Jacobian if its out of date.if !jac_is_new A =fdjac(f,x[k],yk) jac_is_new =trueendendif k==maxiter@warn"Maximum number of iterations reached."breakendendreturn xend