3.872983346207417
Measuring magnitude and length is fundamental to vector and matrix analysis, and will be fundamental to our analysis of numerical methods. Recall that a norm \(\|\cdot\|\) is a real-valued function defined over a vector space with the following properties for all vectors and scalars \(\alpha\):
Perhaps the most commonly encountered vector norms on \(\mathbb{R}^n\) are these three:
We say that a sequence of vectors \(\mathbf{x}_1, \mathbf{x}_2, \cdots\) converges to \(\mathbf{x}\) if \[\lim_{k \rightarrow \infty} \|\mathbf{x}_k - \mathbf{x}\| = 0.\]
This will be very important as we begin to study iterative methods!
Theorem: Norm equivalence
In a finite-dimensional space, convergence in any norm implies convergence in all norms.
As you may recall from Linear Algebra, the space of real-valued matrices of a given size define a vector space. There are many norms for this vector space. One such norm is quite interesting since it has a nice interpretation.
This norm is the Frobenius norm and it is defined as \[\|\mathbf{A}\|_F = \sqrt{\sum_{i,j} A_{ij}^2}.\]
However, note that this norm does not inherently involve the action of the matrix as an operator. There are other matrix norms which do, and these are sometimes more useful.
Definition: Induced (natural) matrix norms
Given a vector norm \(\|\cdot\|\), the induced or natural matrix norm for any \(m \times n\) matrix \(\mathbf{A}\) is \[\|\mathbf{A}\| = \max_{\|\mathbf{x}\| = 1} \|\mathbf{A}\mathbf{x}\| = \max_{\mathbf{x} \not= 0} \frac{\|\mathbf{A}\mathbf{x}\|}{\|\mathbf{x}\|}.\]
The induced norm definition causes these norms to satisfy some useful inequalities:
Theorem: Norm inequalities
Let \(\|\cdot\|\) designate a matrix norm and the vector norm that induced it. Then for all matrices and vectors of compatible sizes, \[\|\mathbf{A}\mathbf{x}\| \le \|\mathbf{A}\| \|\mathbf{x}\|.\] For all matrices of compatible sizes, \[\|\mathbf{A}\mathbf{B}\| \le \|\mathbf{A}\| \|\mathbf{B}\|.\]
Exercise: Prove these inequalities!
The induced matrix \(\infty\)- and \(1\)-norms can be equivalently defined in terms of the entries of the matrix.
Theorem: Matrix \(\infty\)- and \(1\)-norms
\[\|\mathbf{A}\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |A_{ij}|\]
\[\|\mathbf{A}\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |A_{ij}|\]
We can also see that the entry-wise definitions of the \(\infty\)- and \(1\)-norms are equivalent to their induced norm definition.
Now, we’ll try to construct a geometric interpretation of the \(\ell_2\) norm.
Now, the function \(\mathbf{f}(\mathbf{x}) = \mathbf{A} \mathbf{x}\) defines a mapping from \(\mathbb{R}^2\) to \(\mathbb{R}^2\). Let’s see what this does to the vectors in the unit circle!
We’ve talked a bit last week about direct methods for solving linear systems of equations. There is another class of methods known as iterative methods which use an iterative sequence of steps to make incremental improvement of an approximate solution to the system.
These methods typically use information from some subset of the system to make iterative improvements to the approximate solution (iterate).
The first method hinges on the observation that if \(\mathbf{A}\mathbf{x} = \mathbf{b}\) then \[A_{i1}x_1 + A_{i2}x_2 + \cdots + A_{ii}x_i + \cdots + A_{in} x_n = b_i \text{ for } i = 1, \cdots, n.\]
This can be arranged as \[x_i = \frac{1}{A_{ii}} \left[ \sum_{j = 1\\j\not= i}^n (-A_{ij}x_j) + b_i \right] \text{ for } i=1, \cdots, n.\]
Jacobi’s method uses this observation iteratively, forcing the \(i\)th component of the current approximation to satisfy the \(i\)th equation, that is \(\mathbf{x}^{(k)}\) satisfies \[x_i^{(k)} = \frac{1}{A_{ii}} \left[ \sum_{j = 1\\j\not= i}^n (-A_{ij}x_j^{(k-1)}) + b_i \right] \text{ for } i=1, \cdots, n.\]
Exercise: Sketch Jacobi’s method
Sketch two iterations of Jacobi’s method applied to the system visualized below with the given initial iterate, \(\mathbf{x}^{(0)}\).
We can collect all of the updates to the coordinates of \(\mathbf{x}^{(k)}\) together into a vector update.
Note that we can rewrite \(\mathbf{A}\mathbf{x} = \mathbf{b}\) as \[\mathbf{D}(\mathbf{A})\mathbf{x} + \mathbf{L}(\mathbf{A})\mathbf{x} + \mathbf{U}(\mathbf{A})\mathbf{x} = \mathbf{b}\] where \(\mathbf{D}(\mathbf{A})\) is the diagonal matrix containing the diagonal elements of \(\mathbf{A}\), \(\mathbf{L}(\mathbf{A})\) is the strictly lower-triangular matrix containing the strictly lower-triangular elements of \(\mathbf{A}\), and \(\mathbf{U}(\mathbf{A})\) is the strictly upper-triangular matrix containing the strictly upper-triangular elements of \(\mathbf{A}\).
In this notation, the Jacobi iterates satisfy \[\mathbf{D}(\mathbf{A})\mathbf{x}^{(k)} + [\mathbf{L}(\mathbf{A}) + \mathbf{U}(\mathbf{A})] \mathbf{x}^{(k-1)} = \mathbf{b} \]
and so, \[\mathbf{x}^{(k)} = -\mathbf{D}(\mathbf{A})^{-1} [\mathbf{L}(\mathbf{A}) + \mathbf{U}(\mathbf{A})] \mathbf{x}^{(k-1)} + \mathbf{D}(\mathbf{A})^{-1} \mathbf{b}.\]
"""
jacobi(A,b,x_0,N)
Run N iterations of Jacobi's method on system Ax=b with initial iterate x_0.
"""
function jacobi(A,b,x_0,N,x_true)
D = Diagonal(A);
L = tril(A,-1);
U = triu(A,1);
x_old = x_0
errs = zeros(N+1)
errs[1] = norm(x_old - x_true)/norm(x_true)
for i in 1:N
x_new = D\(-(L+U)*x_old + b);
errs[i+1] = norm(x_new - x_true)/norm(x_true);
x_old = x_new;
end
return x, errs
endjacobi
Now, we’ll run a small experiment to check that our code works!
We can possibly improve Jacobi’s method by taking a second look at the equation \[x_i^{(k)} = \frac{1}{A_{ii}} \left[ \sum_{j = 1\\j\not= i}^n (-A_{ij}x_j^{(k-1)}) + b_i \right] \text{ for } i=1, \cdots, n.\]
If we think about doing the updates to the entries of \(\mathbf{x}^{(k)}\) sequentially, we see that we update \(x^{(k)}_1\) before updating \(x^{(k)}_2\), but in the update of \(x^{(k)}_2\), we use \(x^{(k-1)}_1\), despite having the (likely better) estimate for the first entry computed in \(x^{(k)}_1\).
This motivates using the most up-to-date estimates available in this update: \[x_i^{(k)} = \frac{1}{A_{ii}} \left[ \sum_{j = 1}^{i-1} (-A_{ij}x_j^{(k)}) \sum_{j = i+1}^{n} (-A_{ij}x_j^{(k-1)})+ b_i \right] \text{ for } i=1, \cdots, n.\]
Exercise: Sketch the Gauss-Seidel method
Sketch two iterations of the Gauss-Seidel method applied to the system visualized below with the given initial iterate, \(\mathbf{x}^{(0)}\).
In the same way we did for Jacobi, we can succinctly write the Gauss-Seidel update in matrix-vector form.
Noting, now, that the update \[x_i^{(k)} = \frac{1}{A_{ii}} \left[ \sum_{j = 1}^{i-1} (-A_{ij}x_j^{(k)}) \sum_{j = i+1}^{n} (-A_{ij}x_j^{(k-1)})+ b_i \right] \text{ for } i=1, \cdots, n\] involves the entire lower triangular part of \(\mathbf{A}\) interacting with entries of \(\mathbf{x}^{(k)}\),
we can rewrite this as \[[\mathbf{D}(\mathbf{A}) + \mathbf{L}(\mathbf{A})]\mathbf{x}^{(k)} + \mathbf{U}(\mathbf{A}) \mathbf{x}^{(k-1)} = \mathbf{b}.\]
Thus, the update is \[\mathbf{x}^{(k)} = -[\mathbf{D}(\mathbf{A}) + \mathbf{L}(\mathbf{A})]^{-1} \mathbf{U}(\mathbf{A}) \mathbf{x}^{(k-1)} + [\mathbf{D}(\mathbf{A}) + \mathbf{L}(\mathbf{A})]^{-1} \mathbf{b}.\]
Exercise: Implement the Gauss-Seidel method
Complete the Gauss-Seidel method code below and run this method on the same experimental system above.
"""
gaussseidel(A,b,x_0,N)
Run N iterations of the Gauss-Seidel method on system Ax=b with initial iterate x_0.
"""
function gaussseidel(A,b,x_0,N,x_true)
D = Diagonal(A);
L = tril(A,-1);
U = triu(A,1);
x_old = x_0
errs = zeros(N+1)
errs[1] = norm(x_old - x_true)/norm(x_true)
for i in 1:N
x_new = ############################################### complete this code!
errs[i+1] = norm(x_new - x_true)/norm(x_true);
x_old = x_new;
end
return x, errs
endgaussseidel
The Jacobi and Gauss-Seidel methods update using individual coordinate direction updates onto the hyperplanes defined by the individual equations. This is not the only way that the iterates can move towards these hyperplanes!
Perhaps the most natural updating strategy is to project each iterate onto a hyperplane to produce the subsequent iterate. This strategy is known as the Kaczmarz method.
The update for this type of method is given by \(\mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} + \frac{b_i - \mathbf{a}_i^\top \mathbf{x}^{(k-1)}}{\|\mathbf{a}_i\|^2} \mathbf{a}_i\) where \(\mathbf{a}_i\) is the \(i\)th row of \(\mathbf{A}\) and \(i\) is chosen from \(1, \cdots, n\).
Exercise: Decreasing error
If \(\mathbf{x}^*\) satisfies \(\mathbf{A}\mathbf{x}^* = \mathbf{b}\), then \[\|\mathbf{x}^{(k)} - \mathbf{x}^*\|_2 \le \|\mathbf{x}^{(k-1)} - \mathbf{x}^*\|_2.\] Show that this is true!