5.109816297946132e7
We consider now the conditioning of solving the square linear system \(\mathbf{A}\mathbf{x} = \mathbf{b}\). Here, the data is \(\mathbf{A}\) and \(\mathbf{b}\), and the solution is \(\mathbf{x}\).
For simplicity, we’ll imagine that there are perturbations only to \(\mathbf{b}\), while \(\mathbf{A}\) is fixed. Suppose \(\mathbf{A}\mathbf{x} = \mathbf{b}\) is perturbed to \[\mathbf{A}(\mathbf{x} + \mathbf{h}) = \mathbf{b} + \mathbf{d}.\]
The condition number is the relative change in the solution divided by the relative change in the data, \[\frac{\frac{\|\mathbf{h}\|}{\|\mathbf{x}\|}}{\frac{\|\mathbf{d}\|}{\|\mathbf{b}\|}} = \frac{\|\mathbf{h}\| \|\mathbf{b}\|}{\|\mathbf{d}\|\|\mathbf{x}\|}.\]
Since \(\mathbf{h} = \mathbf{A}^{-1}\mathbf{d}\), we can bound \(\|\mathbf{h}\|\) as \[\|\mathbf{h}\| \le \|\mathbf{A}^{-1}\|\|\mathbf{d}\|.\]
Similarly, we have \(\|\mathbf{b}\| \le \|\mathbf{A}\| \|\mathbf{x}\) and so \[\frac{\|\mathbf{h}\| \|\mathbf{b}\|}{\|\mathbf{d}\|\|\mathbf{x}\|} \le \frac{\|\mathbf{A}^{-1}\|\|\mathbf{d}\|\|\mathbf{A}\|\|\mathbf{x}\|}{\|\mathbf{d}\|\|\mathbf{x}\|} = \|\mathbf{A}^{-1}\|\|\mathbf{A}\|.\]
This bound is tight – the inequalities are equations for some choices of \(\mathbf{b}\) and \(\mathbf{d}\).
Definition: Matrix condition number
The matrix condition number of an invertible square matrix \(\mathbf{A}\) is \[\kappa(\mathbf{A}) = \|\mathbf{A}^{-1}\|\|\mathbf{A}\|.\] This value depends on the choice of norm; a subscript on \(\kappa\) such as 1, 2, or \(\infty\) is used if clarification is needed. If \(\mathbf{A}\) is singular, we define \(\kappa(\mathbf{A}) = \infty\).
Theorem: Conditioning of linear systems
If \(\mathbf{A}(\mathbf{x} + \triangle \mathbf{x}) = \mathbf{b} + \triangle \mathbf{b}\), then \[\frac{\|\triangle \mathbf{x}\|}{\|\mathbf{x}\|} \le \kappa(\mathbf{A}) \frac{\|\triangle \mathbf{b}\|}{\|\mathbf{b}\|}.\]
If \((\mathbf{A} + \triangle \mathbf{A})(\mathbf{x} + \triangle \mathbf{x}) = \mathbf{b}\), then \[\frac{\|\triangle \mathbf{x}\|}{\|\mathbf{x}\|} \le \kappa(\mathbf{A}) \frac{\|\triangle \mathbf{A}\|}{\|\mathbf{A}\|},\] in the limit \(\|\triangle \mathbf{A}\| \rightarrow 0\).
Exercise: Lower bound on condition number
Show that \(\kappa(\mathbf{A}) \ge 1\).
We’ll begin with an example of a Hilbert matrix which is famously ill-conditioned.
When solving a linear system with this matrix, we will lose nearly 8 digits of accuracy due to the ill-conditioning of this problem!
We perturb the system randomly by \(10^{-10}\) in norm.
We solve the perturbed problem and see how the solution is changled.
6-element Vector{Float64}:
-7.449594121577974e-6
0.0001247466230993588
-0.0006403322152883639
0.0013944543468378257
-0.0013561726908424276
0.00048554115369814355
relative_error = norm(◬x) / norm(x) = 0.0002210141477023834
These errors are due to our manual perturbations we made to the data. Even just machine roundoff perturbs this data and affects the solution of this ill-conditioned problem. This error will scale with \(\epsilon_{\text{mach}}\).
Larger Hilbert matrices are even more ill-conditioned and their linear systems suffer from more error during solution.
relative_error = norm(◬x) / norm(x) = 4.469466154206132
There are zero accurate digits!
When we don’t know the solution of a linear system, we cannot compare our approximate computed solution to the true solution, so we use the residual error.
Definition: Residual of a linear system
For the problem \(\mathbf{A}\mathbf{x} = \mathbf{b}\), the residual at a solution estimate \(\hat{\mathbf{x}}\) is \[\mathbf{r} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}.\]
A zero residual means we have an exact solution, and if the matrix is rank \(n\), then we have \(\hat{\mathbf{x}} = \mathbf{x}\).
More generally, though, we have \[\mathbf{A}\hat{\mathbf{x}} = \mathbf{b} - \mathbf{r}.\] This means that \(\hat{\mathbf{x}}\) is an exact solution for a linear system with right hand error changed by \(-\mathbf{r}\).
This is what we search for when studying background error!
Hence, residual error of a linear system is the system’s backward error. We can connect this error to the forward error by making the definition \(\mathbf{h} = \hat{\mathbf{x}} - \mathbf{x}\) in the equation \(\mathbf{A}(\mathbf{x}+\mathbf{h}) = \mathbf{b}+\mathbf{d}\).
Then \[\mathbf{d} = \mathbf{A}(\mathbf{x}+\mathbf{h}) - \mathbf{b} = \mathbf{A}\mathbf{h} = -\mathbf{r}.\]
Thus, our previous theorem yields \[\frac{\|\mathbf{x} - \hat{\mathbf{x}}\|}{\|\mathbf{x}\|} \le \kappa(\mathbf{A}) \frac{\|\mathbf{r}\|}{\|\mathbf{b}\|}.\]
Fact:
When solving a linear system, we can only expect that the backward (residual) error is small, not the error, since this will suffer from scaling by the matrix condition number.
Many matrices typically encountered in scientific computing have special structure. It can be very helpful to understand and exploit these special structures!
An \(n \times n\) matrix \(\mathbf{A}\) is (row) diagonally dominant if \[|A_{ii}| > \sum_{j=1//j\not=i}^n |A_{ij}| \text{ for each } i=1, \cdots, n.\]
Definition: Bandwidth
A matrix \(\mathbf{A}\) has upper bandwidth \(b_u\) if \(j - i > b_u\) implies \(A_{ij} = 0\), and lower bandwidth \(b_l\) if \(i-j > b_l\) implies \(A_{ij} = 0\). We say the total bandwidth is \(b_u + b_l + 1\). When \(b_u = b_l = 1\), we have the important case of a tridiagonal matrix.
7×7 Matrix{Float64}:
1.0 -1.0 0.0 0.0 0.0 0.1 0.0
0.0 1.0 -2.0 0.0 0.0 0.0 0.1
0.0 0.0 1.0 -3.0 0.0 0.0 0.0
50.0 0.0 0.0 1.0 -4.0 0.0 0.0
0.0 50.0 0.0 0.0 1.0 -5.0 0.0
0.0 0.0 50.0 0.0 0.0 1.0 -6.0
0.0 0.0 0.0 50.0 0.0 0.0 1.0
The LU factors are also banded!
Note:
The number of flops needed by LU factorization without pivoting is \(\mathcal{O}(b_u b_t n)\) when the upper and lower bandwidths are \(b_u\) and \(b_l\).
In order for Julia to take advantage of banded matrix advantages if we use an ordinary (dense) matrix representation (since it doesn’t know in advance where the zeros are).
3.492128 seconds (7 allocations: 763.016 MiB, 0.19% gc time)
Extremely large matrices cannot be stored in primary memory of a computer unless they are sparse – that is, they have few nonzero entries. A sparse matrix has structural zeros, entries that are known to be zero and thus no value need be stored.
Sparse matrices are not (should not be) represented as a usual matrix array in memory. Instead, one can use one of a variety of sparse matrix representations.
For example, you can store triples \((i,j,A_{ij})\) for all locations of nonzeros \((i,j)\) in the matrix. This requires \(3\text{nnz}(A)\) storage, whereas usual storage requires \(\mathcal{O}(n^2)\) storage – this can be a significant advantage when \(\text{nnz}(A) \ll n^2\).
A common source of sparse matrices is graphs or networks – large graphs often have few edges and thus their adjacency matrices (and other matrix representations) are often large, very sparse matrices!
The computer memory consumed by any variable can be learned by using the summarysize command. We see the storage savings offered by sparse matrix representations is dramatic!
Matrix-vector products are also much more efficient when the matrix is given in sparse form, because the operations using structural zeros are completely skipped.
0.000903223
Computer arithmetic operations exploit sparsity whenever they can and calculations on sparse matrices can be much more efficient than calculations on their dense counterparts!
However, some operations are not guaranteed to preserve sparsity (mathematically) – this phenomenon is known as fill-in.
The LU decomposition for a symmetric matrix (if it exists), takes on a special form: \[\mathbf{A} = \mathbf{L}\mathbf{D}\mathbf{L}^\top.\]
Note:
\(\mathbf{L}\mathbf{D}\mathbf{L}^\top\) factorization on an \(n \times n\) symmetric matrix, when successful, takes \(\sim \frac13 n^3\) flops – half as many as is necessary for regular \(\mathbf{L}\mathbf{U}\) factorization.
Suppose \(\mathbf{A} \in \mathbb{R}^{n \times n}\) and \(\mathbf{x} \in \mathbb{R}^n\). Note that \(\mathbf{x}^\top \mathbf{A}\mathbf{x}\) is scalar valued – this is called a quadratic form.
Definition: Symmetric positive (semi-)definite matrix
A real \(n \times n\) matrix \(\mathbf{A}\) is called a symmetric positive definite matrix (SPD) if it is symmetric and, for all \(\mathbf{x} \not= 0\), \[\mathbf{x}^\top \mathbf{A}\mathbf{x} > 0.\] A matrix is a symmetric positive definite matrix if the inequality above holds but possibly with equality for some nonzero vectors \(\mathbf{x}\).
This definition, in combination with knowledge of either the spectral decomposition or \(\mathbf{L}\mathbf{D}\mathbf{L}^\top\) factorization, allows one to prove the following theorem.
Theorem: Cholesky factorization
Any SPD matrix \(\mathbf{A}\) may be factored as \[\mathbf{A} = \mathbf{R}^\top \mathbf{R},\] where \(\mathbf{R}\) is an upper triangular matrix with positive diagonal elements. This is called the Cholesky factorization.
Note:
Cholesky factorization of an \(n \times n\) SPD matrix takes \(\sim \frac13 n^3\) flops.
LoadError: PosDefException: matrix is not positive definite; Factorization failed.
PosDefException: matrix is not positive definite; Factorization failed.
Stacktrace:
[1] checkpositivedefinite
@ ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/factorization.jl:68 [inlined]
[2] #cholesky!#163
@ ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/cholesky.jl:269 [inlined]
[3] cholesky!
@ ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/cholesky.jl:267 [inlined]
[4] #cholesky!#164
@ ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/cholesky.jl:301 [inlined]
[5] cholesky! (repeats 2 times)
@ ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/cholesky.jl:295 [inlined]
[6] _cholesky
@ ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/cholesky.jl:411 [inlined]
[7] cholesky(A::Matrix{Float64}, ::NoPivot; check::Bool)
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/cholesky.jl:401
[8] cholesky
@ ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/cholesky.jl:401 [inlined]
[9] cholesky(A::Matrix{Float64})
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/cholesky.jl:401
[10] top-level scope
@ In[25]:3
Cholesky{Float64, Matrix{Float64}}
U factor:
4×4 UpperTriangular{Float64, Matrix{Float64}}:
9.94987 10.0504 10.5529 8.44232
⋅ 9.4863 7.68892 5.91922
⋅ ⋅ 3.93915 -1.42247
⋅ ⋅ ⋅ 5.16398