Other than solving linear systems (or linear least-squares), the most important problem in numerical linear algebra is the eigenvalue problem.
Exercise: Do you recall the definition of eigenvalue and eigenvector?
Suppose \(\mathbf{A}\) is a matrix, and \(\lambda\) and \(\mathbf{x}\) are an eigenvalue and eigenvector, respectively. What do you know about \(\mathbf{A}\), \(\lambda\), and \(\mathbf{x}\)? What is the relationship between them?
Given a square matrix \(\mathbf{A}\), if
\[\mathbf{A}\mathbf{x} = \lambda \mathbf{x}\]
for a scalar \(\lambda\) and a nonzero vector \(\mathbf{x}\), then \(\lambda\) is an eigenvalue and \(\mathbf{x}\) is an associated eigenvector. We can rewrite the eigenvalue definition as \((\mathbf{A} - \lambda\mathbf{I}) \mathbf{x} = \boldsymbol{0}\). Thus \(\mathbf{A} - \lambda\mathbf{I}\) is singular and must have a zero determinant.
The determinant \(\det(\mathbf{A} - \lambda \mathbf{I})\), regarded as a polynomial in \(\lambda\), is called the characteristic polynomial. Its roots are the eigenvalues, so an \(n\times n\) matrix has \(n\) eigenvalues, counting algebraic multiplicity.
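As a quick check of the definition, we can compute an eigenpair numerically and verify that \(\mathbf{A}\mathbf{x} = \lambda\mathbf{x}\). Here is a minimal sketch using Julia's eigen; the particular matrix is an arbitrary example.

```julia
using LinearAlgebra
A = [1.0 2.0; 3.0 4.0]           # arbitrary example matrix
λ, V = eigen(A)                  # eigenvalues and eigenvector matrix
# each column of V should satisfy A*v = λ*v
A * V[:, 1] ≈ λ[1] * V[:, 1]     # true up to roundoff
```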
Suppose that \(\mathbf{A}\mathbf{v}_k=\lambda_k\mathbf{v}_k\) for \(k=1,\ldots,n\). We can summarize these as
\[\begin{align*}\begin{bmatrix} \mathbf{A}\mathbf{v}_1 & \mathbf{A}\mathbf{v}_2 & \cdots & \mathbf{A}\mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \lambda_1 \mathbf{v}_1 & \lambda_2\mathbf{v}_2 & \cdots & \lambda_n \mathbf{v}_n \end{bmatrix}, \\[1mm] \mathbf{A} \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} &= \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{bmatrix}, \end{align*}\] which we write as
\[\mathbf{A} \mathbf{V} = \mathbf{V} \mathbf{D}.\]
If \(\mathbf{V}\) is a nonsingular matrix, then we arrive at the eigenvalue decomposition.
Definition: Eigenvalue decomposition (EVD)
An eigenvalue decomposition (EVD) of a square matrix \(\mathbf{A}\) is
\[\mathbf{A} = \mathbf{V} \mathbf{D} \mathbf{V}^{-1}.\]
If \(\mathbf{A}\) has an EVD, we say that \(\mathbf{A}\) is diagonalizable; otherwise \(\mathbf{A}\) is nondiagonalizable (or defective).
One simple example of a nondiagonalizable matrix is
\[\mathbf{B} = \begin{bmatrix} 1 & 1\\0 & 1 \end{bmatrix}.\]
Its only eigenvalue is \(1\), with algebraic multiplicity 2, but every eigenvector is a multiple of \(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\), so no invertible matrix of eigenvectors exists.
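Numerically, this defectiveness shows up as a computed eigenvector matrix that is nearly singular. A minimal sketch:

```julia
using LinearAlgebra
B = [1.0 1.0; 0.0 1.0]
λ, V = eigen(B)
cond(V)       # enormous (on the order of 1/eps): the computed eigenvectors are nearly parallel
```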
There are instances in which we can guarantee an EVD exists.
Theorem:
If the \(n\times n\) matrix \(\mathbf{A}\) has \(n\) distinct eigenvalues, then \(\mathbf{A}\) is diagonalizable.
Just as linear systems have condition numbers that quantify the effect of finite precision, eigenvalue problems may be poorly conditioned too. Many such results exist; we will use just one.
Theorem: Bauer–Fike
Let \(\mathbf{A}\in\mathbb{C}^{n\times n}\) be diagonalizable, \(\mathbf{A}=\mathbf{V}\mathbf{D}\mathbf{V}^{-1}\), with eigenvalues \(\lambda_1,\ldots,\lambda_n\). If \(\mu\) is an eigenvalue of \(\mathbf{A}+\mathbf{E}\) for a complex matrix \(\mathbf{E}\), then
\[\min_{j=1,\ldots,n} |\mu - \lambda_j| \le \kappa(\mathbf{V}) \, \| \mathbf{E} \|\,,\]
where \(\|\cdot\|\) and \(\kappa\) are in the 2-norm.
The Bauer–Fike theorem tells us that eigenvalues can be perturbed by an amount that is \(\kappa(\mathbf{V})\) times larger than perturbations to the matrix.
The limiting case of \(\kappa(\mathbf{V})=\infty\) might be interpreted as indicating a nondiagonalizable matrix \(\mathbf{A}\). The other extreme is also of interest: \(\kappa(\mathbf{V})=1\), which implies that \(\mathbf{V}\) is unitary.
Definition: Normal matrix
If \(\mathbf{A}\) has an EVD with a unitary eigenvector matrix \(\mathbf{V}\), then \(\mathbf{A}\) is a normal matrix.
In Math 73, we refer to unitary matrices as orthogonal and normal matrices as orthogonally diagonalizable matrices.
Hermitian and real symmetric matrices are normal. Since the condition number of a unitary matrix is 1, Bauer–Fike guarantees that a perturbation of a normal matrix changes its eigenvalues by no more than the norm of the perturbation to the matrix.
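Here is a sketch of the kind of experiment that produces output like the one below: build a real symmetric (hence normal) matrix, perturb it slightly, and measure how far each eigenvalue moves. The size (7×7) and the perturbation scale are assumptions chosen to match the output.

```julia
using LinearAlgebra
n = 7
A = Symmetric(randn(n, n))              # real symmetric, hence normal
λ, V = eigen(A)
@show cond(V)                           # equals 1 up to roundoff, since V is orthogonal
E = 1e-8 * randn(n, n)                  # small perturbation (assumed scale)
μ = eigvals(A + E)
# distance from each perturbed eigenvalue to the nearest original eigenvalue
[minimum(abs.(μ[k] .- λ)) for k in 1:n]
```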
cond(V) = 1.0000000000000007
7×1 Matrix{Float64}:
2.8621982126143585e-9
6.81385035582749e-10
9.046796029822398e-10
2.789496888225541e-10
6.166751082615822e-10
1.2216561199825988e-9
1.598605774006123e-9
As promised, the perturbation of the eigenvalues does not exceed the norm of the perturbation to the matrix.
Now, we try a triangular matrix.
5×5 Matrix{Float64}:
1.0 1.0 1.0 1.0 1.0
0.0 2.0 2.0 2.0 2.0
0.0 0.0 3.0 3.0 3.0
0.0 0.0 0.0 4.0 4.0
0.0 0.0 0.0 0.0 5.0
If we plot the eigenvalues of many perturbations, we get a cloud of points that roughly represents the possible eigenvalues of this matrix when it is stored with single-precision accuracy.
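A minimal sketch of such an experiment, assuming the triangular matrix above and random perturbations at the single-precision level (the number of trials and the plotting details are arbitrary):

```julia
using LinearAlgebra, Plots
# Triangular test matrix from above: row i holds the value i on and above the diagonal
A = triu([float(i) for i in 1:5, j in 1:5])
plt = plot(legend = false, xlabel = "Re(λ)", ylabel = "Im(λ)")
for _ in 1:200
    E = eps(Float32) * randn(5, 5)      # perturbation at single-precision level
    μ = eigvals(A + E)
    scatter!(plt, real.(μ), imag.(μ), markersize = 2, color = :blue)
end
display(plt)
```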
Some eigenvalues are far more affected by the perturbation than others!
In practice, numerical methods for eigenvalue computation do not apply root-finding to the characteristic polynomial: forming the polynomial's coefficients and then finding its roots is itself numerically unreliable, and no finite algebraic formula exists for the roots once \(n\ge 5\).
Many numerical methods use matrix powers, and build on the fact that the EVD simplifies matrix powers significantly:
\[\mathbf{A}^2 = \mathbf{V} \mathbf{D} \mathbf{V}^{-1} \mathbf{V} \mathbf{D} \mathbf{V}^{-1} = \mathbf{V} \mathbf{D}^2 \mathbf{V}^{-1}\]
and more generally,
\[\mathbf{A}^k = \mathbf{V} \mathbf{D}^k \mathbf{V}^{-1}.\]
If the eigenvalues have different magnitudes, then as \(k \rightarrow \infty\) the entries on the diagonal of \(\mathbf{D}^k\) will become increasingly well separated and easy to pick out.
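The output below appears to show a symmetric matrix with prescribed eigenvalues. Here is a minimal sketch of one way to construct such a matrix and to check the power identity above; the particular eigenvalues and the random orthogonal basis are assumptions.

```julia
using LinearAlgebra
λ = [-6.0, -1.0, 2.0, 4.0, 5.0]             # assumed target eigenvalues
Q = Matrix(qr(randn(5, 5)).Q)               # random orthogonal matrix
A = Q * Diagonal(λ) * Q'                    # symmetric matrix with eigenvalues λ
A = (A + A') / 2                            # symmetrize away roundoff
@show eigvals(A)
# check A^k = V * D^k * V⁻¹ (here V is orthogonal, so V⁻¹ = Vᵀ)
vals, V = eigen(A)
opnorm(A^3 - V * Diagonal(vals)^3 * V')     # tiny: the identity holds to roundoff
```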
5×5 Matrix{Float64}:
1.71884 0.442572 0.401375 0.50617 0.517765
0.442572 -0.0236562 1.19399 3.33321 -2.29173
0.401375 1.19399 1.41403 -3.65011 -1.17993
0.50617 3.33321 -3.65011 -1.50997 -1.05649
0.517765 -2.29173 -1.17993 -1.05649 2.40075
5-element Vector{Float64}:
-6.000000000000005
-1.000000000000001
1.9999999999999987
4.000000000000002
4.999999999999996
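One orthogonal similarity transformation with this convergence behavior (assumed here, and consistent with the Francis QR iteration named below) is the unshifted QR step: factor \(\mathbf{A} = \mathbf{Q}\mathbf{R}\) and form \(\mathbf{R}\mathbf{Q} = \mathbf{Q}^*\mathbf{A}\mathbf{Q}\), which has the same eigenvalues as \(\mathbf{A}\). A minimal sketch:

```julia
using LinearAlgebra
# One unshifted QR step: A ← R*Q = Qᵀ*A*Q, an orthogonal similarity transformation
function qr_steps(A, steps)
    Ak = copy(A)
    for _ in 1:steps
        Q, R = qr(Ak)
        Ak = R * Matrix(Q)
    end
    return Ak
end
qr_steps(A, 40)      # off-diagonal entries shrink toward zero
```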
Interestingly, if we repeat this transformation many times, the resulting matrix converges to \(\mathbf{D}\):
5×5 Matrix{Float64}:
-6.0 0.0049553 1.25503e-6 -1.26275e-15 2.57233e-16
0.0049553 5.0 0.000171875 -2.44292e-14 5.12839e-16
1.25503e-6 0.000171875 4.0 -3.63019e-10 4.63596e-16
-2.11382e-16 -2.40191e-14 -3.63021e-10 2.0 4.7017e-13
2.24715e-31 -1.35075e-27 8.57213e-23 4.69617e-13 -1.0
The process demonstrated here is known as the Francis QR iteration, and it can be formulated as an \(O(n^3)\) algorithm for finding the EVD. An algorithm of this kind underlies Julia's eigen function.
The SVD is as fundamental as the EVD and closely related to it!
Definition: Singular value decomposition (SVD)
The singular value decomposition of an \(m\times n\) matrix \(\mathbf{A}\) is
\[\mathbf{A} = \mathbf{U} \mathbf{S} \mathbf{V}^*,\]
where \(\mathbf{U}\in\mathbb{C}^{m\times m}\) and \(\mathbf{V}\in\mathbb{C}^{n\times n}\) are unitary and \(\mathbf{S}\in\mathbb{R}^{m\times n}\) is real and diagonal with nonnegative elements.
The columns of \(\mathbf{U}\) and \(\mathbf{V}\) are called left and right singular vectors, respectively. The diagonal elements of \(\mathbf{S}\), written \(\sigma_1,\ldots,\sigma_r\), for \(r=\min\{m,n\}\), are called the singular values of \(\mathbf{A}\) and are ordered so that
\[\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r\ge 0, \qquad r=\min\{m,n\}.\]
We call \(\sigma_1\) the principal singular value and \(\mathbf{u}_{1}\) and \(\mathbf{v}_{1}\) the principal singular vectors.
Theorem:
Every \(m\times n\) matrix has an SVD. The singular values of a matrix are unique, but the singular vectors are not. If the matrix is real, then \(\mathbf{U}\) and \(\mathbf{V}\) in the SVD can be chosen to be real, orthogonal matrices.
Theorem:
The nonzero eigenvalues of \(\mathbf{A}^*\mathbf{A}\) are the squares of the singular values of \(\mathbf{A}\).
Exercise: Prove this!
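This is not a proof, but a quick numerical check is easy; here is a sketch with an arbitrary random matrix.

```julia
using LinearAlgebra
A = randn(5, 3)
# nonzero eigenvalues of AᵀA should equal the squared singular values of A
sort(eigvals(Symmetric(A' * A))) ≈ sort(svdvals(A) .^ 2)   # true up to roundoff
```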
Another way to write \(\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*\) is
\[ \mathbf{A}\mathbf{V}=\mathbf{U}\mathbf{S}. \]
Taken columnwise, this equation means
\[ \mathbf{A} \mathbf{v}_{k} = \sigma_k \mathbf{u}_{k}, \qquad k=1,\ldots,r=\min\{m,n\}. \]
In words, each right singular vector is mapped by \(\mathbf{A}\) to a scaled version of its corresponding left singular vector; the magnitude of scaling is its singular value.
This is like the EVD! The SVD sacrifices having the same basis in both source and image spaces (they may not even have the same dimension).
We saw earlier in this course that a matrix has both full and thin forms of the QR factorization. A similar situation holds with the SVD.
Suppose \(\mathbf{A}\) is \(m\times n\) with \(m > n\) and let \(\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*\) be an SVD. The last \(m-n\) rows of \(\mathbf{S}\) are all zero due to the fact that \(\mathbf{S}\) is diagonal. Hence
\[\begin{align*} \mathbf{U} \mathbf{S} & = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n & \mathbf{u}_{n+1} & \cdots & \mathbf{u}_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & & \\ & \boldsymbol{0} & \\ & & \end{bmatrix} = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \end{bmatrix} \\&= \hat{\mathbf{U}} \hat{\mathbf{S}}, \end{align*}\]
in which \(\hat{\mathbf{U}}\) is \(m\times n\) and \(\hat{\mathbf{S}}\) is \(n\times n\).
This allows us to define the thin SVD
\[\mathbf{A}=\hat{\mathbf{U}}\hat{\mathbf{S}}\mathbf{V}^*,\]
in which \(\hat{\mathbf{S}}\) is square and diagonal and \(\hat{\mathbf{U}}\) has orthonormal columns (is ONC) but is not square.
The SVD is intimately connected to the 2-norm, as the following theorem describes.
Theorem:
Let \(\mathbf{A}\in\mathbb{C}^{m\times n}\) have an SVD \(\mathbf{A}=\mathbf{U}\mathbf{S}\mathbf{V}^*\). Then:
1. The 2-norm satisfies
\[\| \mathbf{A} \|_2 = \sigma_1.\]
2. The rank of \(\mathbf{A}\) is the number of nonzero singular values.
3. Let \(r=\min\{m,n\}\). Then
\[\kappa_2(\mathbf{A}) = \|\mathbf{A}\|_2\|\mathbf{A}^+\|_2 = \frac{\sigma_1}{\sigma_r},\]
where a division by zero implies that \(\mathbf{A}\) does not have full rank.
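Here is a sketch of the kind of computation behind the output below, assuming a 5×4 matrix. Julia's svd returns the thin form by default, so U comes back 5×4; the specific matrix is not shown, so the singular values will differ.

```julia
using LinearAlgebra
A = randn(5, 4)              # stand-in for the matrix used to produce the output
U, σ, V = svd(A)             # thin SVD: U is 5×4, σ is a 4-vector, V is 4×4
@show σ
@show size(U)
@show size(V)
```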
4-element Vector{Float64}:
146.69715365883005
5.738569780953702
0.9998486640841027
0.11928082685241923
size(U) = (5, 4)
size(V) = (4, 4)
While the EVD and SVD are fundamental matrix decompositions with clear linear-algebraic interpretations, they are not the only decompositions of interest. One popular example is the nonnegative matrix factorization (NMF); the nonnegativity of its factors makes it very interpretable.
Definition: Nonnegative matrix factorization (NMF)
Given \(\mathbf{A}\in\mathbb{R}^{m\times n}\), the nonnegative matrix factorization (NMF) of \(\mathbf{A}\) is a decomposition of the form \[\mathbf{A} \approx \mathbf{W} \mathbf{H},\]
where \(\mathbf{W}\in\mathbb{R}^{m\times r}\) and \(\mathbf{H}\in\mathbb{R}^{r\times n}\) are nonnegative matrices, and \(r\) is the user-selected rank of the factorization.
Note that this factorization is generally not exact: one typically finds \(\mathbf{W}\) and \(\mathbf{H}\) by minimizing some measure of the error between the data \(\mathbf{A}\) and the approximation \(\mathbf{W}\mathbf{H}\).
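One standard way to compute an NMF (an assumption here, not necessarily the method used in this course) is the Lee–Seung multiplicative update rule, which keeps the factors nonnegative while decreasing \(\|\mathbf{A}-\mathbf{W}\mathbf{H}\|_F\). A minimal sketch:

```julia
using LinearAlgebra
# NMF by multiplicative updates, minimizing the Frobenius error ‖A − WH‖.
function nmf(A::AbstractMatrix, r::Int; iters = 500, ϵ = 1e-9)
    m, n = size(A)
    W = rand(m, r)
    H = rand(r, n)
    for _ in 1:iters
        H .*= (W' * A) ./ (W' * W * H .+ ϵ)     # update H; entries stay nonnegative
        W .*= (A * H') ./ (W * H * H' .+ ϵ)     # update W; entries stay nonnegative
    end
    return W, H
end

A = rand(20, 30)                  # nonnegative test data
W, H = nmf(A, 4)
norm(A - W * H) / norm(A)         # relative approximation error
```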
# Visualize the first 10 SVD components (columns of U) as images.
# Assumes each column of U is a flattened 28×28 image (e.g., MNIST-style data).
using Plots

function show_components(U, title_prefix)
    n = 10
    plt = plot(layout = (1, n), size = (100 * n, 100))
    for i in 1:n
        img = reshape(U[:, i], 28, 28)
        # heatmap puts row 1 at the bottom; flip vertically so the image appears upright
        heatmap!(plt[i], img, yflip = true, ticks = false, showaxis = false,
                 colorbar = false, title = "$title_prefix $i")
    end
    display(plt)
end

show_components(U, "SVD")