Measuring magnitude and length is fundamental to vector and matrix analysis, and will be fundamental to our analysis of numerical methods. Recall that a norm \(\|\cdot\|\) is a real-valued function defined on a vector space with the following properties for all vectors \(\mathbf{x}, \mathbf{y}\) and scalars \(\alpha\):

- \(\|\mathbf{x}\| \geq 0\), with \(\|\mathbf{x}\| = 0\) if and only if \(\mathbf{x} = \mathbf{0}\);
- \(\|\alpha \mathbf{x}\| = |\alpha| \, \|\mathbf{x}\|\);
- \(\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|\) (the triangle inequality).
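As a quick numerical illustration (my own addition, not part of the notes), we can check the homogeneity and triangle-inequality properties on a pair of random vectors:

using LinearAlgebra
u, v = randn(4), randn(4)
α = -2.5
norm(α*u) ≈ abs(α)*norm(u)       # absolute homogeneity
norm(u + v) ≤ norm(u) + norm(v)  # triangle inequality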
Perhaps the most commonly encountered vector norms on \(\mathbb{R}^n\) are these three: the \(2\)-norm \(\|\mathbf{x}\|_2 = \left(\sum_i x_i^2\right)^{1/2}\), the \(\infty\)-norm \(\|\mathbf{x}\|_\infty = \max_i |x_i|\), and the \(1\)-norm \(\|\mathbf{x}\|_1 = \sum_i |x_i|\).
using LinearAlgebra
x = [2, -3, 1, -1]
twonorm = norm(x)
3.872983346207417
infnorm = norm(x,Inf)
3.0
onenorm = norm(x,1)
7.0
We say that a sequence of vectors \(\mathbf{x}_1, \mathbf{x}_2, \cdots\) converges to \(\mathbf{x}\) if \[\lim_{k \rightarrow \infty} \|\mathbf{x}_k - \mathbf{x}\| = 0.\]
This will be very important as we begin to study iterative methods!
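For instance (an illustrative example of my own, not from the notes), the sequence \(\mathbf{x}_k = \mathbf{x} + 2^{-k}\mathbf{v}\) converges to \(\mathbf{x}\) in any norm, and we can watch \(\|\mathbf{x}_k - \mathbf{x}\|\) shrink:

y = [1.0, 2.0]; v = [1.0, -1.0]
[ norm((y + 2.0^-k * v) - y) for k in 1:5 ]   # the differences shrink geometrically toward 0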
As you may recall from Linear Algebra, the real-valued matrices of a given size form a vector space. There are many norms on this vector space; one of them is particularly interesting because it has a nice interpretation.
This norm is the Frobenius norm and it is defined as \[\|\mathbf{A}\|_F = \sqrt{\sum_{i,j} A_{ij}^2}.\]
This is the norm computed by default by the norm function in Julia.
One of the most interesting aspects of this norm is that we can view it as the \(\ell_2\) norm on a vectorization of the matrix. If you imagine stacking columns of \(\mathbf{A}\) to form a vector, then the \(\ell_2\) norm of this vector is equal to the Frobenius norm of the matrix.
Matrices are actually column-stacked when stored in memory in Julia – this is known as column-major order. MATLAB is also column-major, while C and Python are row-major.
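As a small check (my own addition), vec stacks the columns of a matrix into a vector, and the 2-norm of that vector matches the Frobenius norm:

B = [1 2; 3 4]
vec(B)                    # columns are stacked: [1, 3, 2, 4]
norm(B) ≈ norm(vec(B))    # Frobenius norm of B equals the 2-norm of its vectorization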
However, note that this norm does not inherently involve the action of the matrix as an operator. There are other matrix norms which do, and these are sometimes more useful.
Given a vector norm \(\|\cdot\|\), the induced (or operator) matrix norm is \[\|\mathbf{A}\| = \max_{\mathbf{x} \neq \mathbf{0}} \frac{\|\mathbf{A}\mathbf{x}\|}{\|\mathbf{x}\|}.\] The induced norm definition causes these norms to satisfy some useful inequalities: \[\|\mathbf{A}\mathbf{x}\| \leq \|\mathbf{A}\| \, \|\mathbf{x}\| \quad \text{and} \quad \|\mathbf{A}\mathbf{B}\| \leq \|\mathbf{A}\| \, \|\mathbf{B}\|.\]
The induced matrix \(\infty\)- and \(1\)-norms can be equivalently defined in terms of the entries of the matrix: \(\|\mathbf{A}\|_\infty = \max_i \sum_j |A_{ij}|\) (the maximum absolute row sum) and \(\|\mathbf{A}\|_1 = \max_j \sum_i |A_{ij}|\) (the maximum absolute column sum).
A = [2 0; 1 -1]
2×2 Matrix{Int64}:
 2   0
 1  -1
Fronorm = norm(A)
2.449489742783178
twonorm = opnorm(A)
2.2882456112707374
We can also see that the entry-wise definitions of the \(\infty\)- and \(1\)-norms are equivalent to their induced norm definition.
onenorm = opnorm(A,1)
3.0
maximum( sum(abs.(A),dims=1) )
3
infnorm = opnorm(A,Inf)
2.0
maximum( sum(abs.(A),dims=2) )
2
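As one more quick check (my own addition, using a hypothetical test vector z), the induced 2-norm bounds how much \(\mathbf{A}\) can stretch any vector:

z = randn(2)
norm(A*z) ≤ opnorm(A)*norm(z)   # ‖Az‖ ≤ ‖A‖₂ ‖z‖ always holds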
Now, we’ll try to construct a geometric interpretation of the induced matrix \(2\)-norm.
# sample a lot of vectors on the unit circle in R^2
theta = 2pi*(0:1/600:1)
x = [ fun(t) for fun in [cos,sin], t in theta ];  # what a cool comprehension!

using Plots
plot(aspect_ratio=1, layout=(1,2), xlabel="x_1", ylabel="x_2")  # creates a "layout" -- subsequent plot! calls add the individual subplots
plot!(x[1,:], x[2,:], subplot=1, title="Unit circle")
Now, the function \(\mathbf{f}(\mathbf{x}) = \mathbf{A} \mathbf{x}\) defines a mapping from \(\mathbb{R}^2\) to \(\mathbb{R}^2\). Let’s see what this does to the vectors in the unit circle!
Ax = A*x;
plot!(Ax[1,:], Ax[2,:], subplot=2, title="Image under x -> Ax")
plot!(twonorm*x[1,:], twonorm*x[2,:], subplot=2, l=:dash)

The unit circle is mapped to an ellipse, and the dashed circle has radius \(\|\mathbf{A}\|_2\) (computed above by opnorm): the induced \(2\)-norm is exactly the largest factor by which \(\mathbf{A}\) stretches any unit vector.
Last week, we talked a bit about direct methods for solving linear systems of equations. There is another class of methods, known as iterative methods, which use a sequence of steps to make incremental improvements to an approximate solution of the system.
These methods typically use information from some subset of the system to make iterative improvements to the approximate solution (the iterate).
The first method hinges on the observation that if \(\mathbf{A}\mathbf{x} = \mathbf{b}\) then \[A_{i1}x_1 + A_{i2}x_2 + \cdots + A_{ii}x_i + \cdots + A_{in} x_n = b_i \text{ for } i = 1, \cdots, n.\]
This can be rearranged as \[x_i = \frac{1}{A_{ii}} \left[ \sum_{\substack{j = 1 \\ j \neq i}}^n (-A_{ij}x_j) + b_i \right] \text{ for } i=1, \cdots, n.\]
Jacobi’s method uses this observation iteratively, forcing the \(i\)th component of the current approximation to satisfy the \(i\)th equation; that is, \(\mathbf{x}^{(k)}\) satisfies \[x_i^{(k)} = \frac{1}{A_{ii}} \left[ \sum_{\substack{j = 1 \\ j \neq i}}^n (-A_{ij}x_j^{(k-1)}) + b_i \right] \text{ for } i=1, \cdots, n.\]
You should also think about what happens here if the order of the equations is changed or if the initial iterate is somewhere else. Spoiler alert: Jacobi’s method doesn’t always converge!
We can collect all of the updates to the coordinates of \(\mathbf{x}^{(k)}\) together into a vector update.
Note that we can rewrite \(\mathbf{A}\mathbf{x} = \mathbf{b}\) as \[\mathbf{D}(\mathbf{A})\mathbf{x} + \mathbf{L}(\mathbf{A})\mathbf{x} + \mathbf{U}(\mathbf{A})\mathbf{x} = \mathbf{b}\] where \(\mathbf{D}(\mathbf{A})\) is the diagonal matrix containing the diagonal elements of \(\mathbf{A}\), \(\mathbf{L}(\mathbf{A})\) is the strictly lower-triangular matrix containing the strictly lower-triangular elements of \(\mathbf{A}\), and \(\mathbf{U}(\mathbf{A})\) is the strictly upper-triangular matrix containing the strictly upper-triangular elements of \(\mathbf{A}\).
In this notation, the Jacobi iterates satisfy \[\mathbf{D}(\mathbf{A})\mathbf{x}^{(k)} + [\mathbf{L}(\mathbf{A}) + \mathbf{U}(\mathbf{A})] \mathbf{x}^{(k-1)} = \mathbf{b} \]
and so, \[\mathbf{x}^{(k)} = -\mathbf{D}(\mathbf{A})^{-1} [\mathbf{L}(\mathbf{A}) + \mathbf{U}(\mathbf{A})] \mathbf{x}^{(k-1)} + \mathbf{D}(\mathbf{A})^{-1} \mathbf{b}.\]
"""
jacobi(A,b,x_0,N)
Run N iterations of Jacobi's method on system Ax=b with initial iterate x_0.
"""
function jacobi(A,b,x_0,N,x_true)
= Diagonal(A);
D = tril(A,-1);
L = triu(A,1);
U
= x_0
x_old = zeros(N+1)
errs 1] = norm(x_old - x_true)/norm(x_true)
errs[for i in 1:N
= D\(-(L+U)*x_old + b);
x_new +1] = norm(x_new - x_true)/norm(x_true);
errs[i= x_new;
x_old end
return x, errs
end
jacobi
Now, we’ll run a small experiment to check that our code works!
n = 100
#A = randn(n,n)
A = diagm(ones(n)) + 0.05*randn(n,n)
x_true = ones(n)
b = A*x_true;

N = 100
x, errs = jacobi(A,b,zeros(n),N,x_true);
plot(1:N+1,errs,yscale=:log10,label="Jacobi error",xaxis="Iteration (k)", yaxis="Relative error")
We can possibly improve Jacobi’s method by taking a second look at the equation \[x_i^{(k)} = \frac{1}{A_{ii}} \left[ \sum_{\substack{j = 1 \\ j \neq i}}^n (-A_{ij}x_j^{(k-1)}) + b_i \right] \text{ for } i=1, \cdots, n.\]
If we think about doing the updates to the entries of \(\mathbf{x}^{(k)}\) sequentially, we see that we update \(x^{(k)}_1\) before updating \(x^{(k)}_2\), but in the update of \(x^{(k)}_2\), we use \(x^{(k-1)}_1\), despite having the (likely better) estimate for the first entry computed in \(x^{(k)}_1\).
This motivates using the most up-to-date estimates available in this update: \[x_i^{(k)} = \frac{1}{A_{ii}} \left[ \sum_{j = 1}^{i-1} (-A_{ij}x_j^{(k)}) + \sum_{j = i+1}^{n} (-A_{ij}x_j^{(k-1)}) + b_i \right] \text{ for } i=1, \cdots, n.\]
In the same way we did for Jacobi, we can succinctly write the Gauss-Seidel update in matrix-vector form.
Noting, now, that the update \[x_i^{(k)} = \frac{1}{A_{ii}} \left[ \sum_{j = 1}^{i-1} (-A_{ij}x_j^{(k)}) + \sum_{j = i+1}^{n} (-A_{ij}x_j^{(k-1)}) + b_i \right] \text{ for } i=1, \cdots, n\] involves the entire lower-triangular part of \(\mathbf{A}\) interacting with entries of \(\mathbf{x}^{(k)}\),
we can rewrite this as \[[\mathbf{D}(\mathbf{A}) + \mathbf{L}(\mathbf{A})]\mathbf{x}^{(k)} + \mathbf{U}(\mathbf{A}) \mathbf{x}^{(k-1)} = \mathbf{b}.\]
Thus, the update is \[\mathbf{x}^{(k)} = -[\mathbf{D}(\mathbf{A}) + \mathbf{L}(\mathbf{A})]^{-1} \mathbf{U}(\mathbf{A}) \mathbf{x}^{(k-1)} + [\mathbf{D}(\mathbf{A}) + \mathbf{L}(\mathbf{A})]^{-1} \mathbf{b}.\]
"""
gaussseidel(A,b,x_0,N)
Run N iterations of the Gauss-Seidel method on system Ax=b with initial iterate x_0.
"""
function gaussseidel(A,b,x_0,N,x_true)
= Diagonal(A);
D = tril(A,-1);
L = triu(A,1);
U
= x_0
x_old = zeros(N+1)
errs 1] = norm(x_old - x_true)/norm(x_true)
errs[for i in 1:N
= ############################################### complete this code!
x_new +1] = norm(x_new - x_true)/norm(x_true);
errs[i= x_new;
x_old end
return x, errs
end
gaussseidel
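If you want something to check your answer against, one possible completion of the missing line, following the matrix-vector form of the Gauss-Seidel update derived above, is the sketch below (my own, not an official solution):

# x^(k) = (D + L) \ ( -U x^(k-1) + b ), the Gauss-Seidel update in matrix-vector form
x_new = (D + L)\(-U*x_old + b)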
The Jacobi and Gauss-Seidel methods move the iterate along individual coordinate directions onto the hyperplanes defined by the individual equations. This is not the only way that the iterates can move towards these hyperplanes!
Perhaps the most natural updating strategy is to project each iterate onto a hyperplane to produce the subsequent iterate. This strategy is known as the Kaczmarz method.
The update for this type of method is given by \(\mathbf{x}^{(k)} = \mathbf{x}^{(k-1)} + \frac{b_i - \mathbf{a}_i^\top \mathbf{x}^{(k-1)}}{\|\mathbf{a}_i\|^2} \mathbf{a}_i\) where \(\mathbf{a}_i\) is the \(i\)th row of \(\mathbf{A}\) and \(i\) is chosen from \(1, \cdots, n\).
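A minimal sketch of this update, written in the same style as the functions above; the cyclic choice of the row index i and the name kaczmarz are my own assumptions, since the notes leave the row-selection rule open:

"""
    kaczmarz(A,b,x_0,N)

Run N sweeps of a cyclic Kaczmarz method on the system Ax=b with initial iterate x_0.
"""
function kaczmarz(A,b,x_0,N)
    n = length(b)
    x = copy(x_0)
    for k in 1:N
        for i in 1:n                      # cycle through the rows; random selection is another option
            a_i = A[i,:]                  # i-th row of A, as a vector
            x = x + ((b[i] - a_i'*x)/(a_i'*a_i)) * a_i   # project x onto the i-th hyperplane
        end
    end
    return x
end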