01.01.12 · foundations / linear-algebra

Singular value decomposition (finite-dim)

Status: shipped · 3 tiers · Lean: partial

Anchor (Master): Horn-Johnson — Matrix Analysis Ch. 7; Golub-Van Loan — Matrix Computations Ch. 2 + Ch. 8; Trefethen-Bau — Numerical Linear Algebra Lectures 4–5; Kato — Perturbation Theory for Linear Operators Ch. I §5

Intuition [Beginner]

A linear map sends the unit disk in its input space to an ellipse — possibly squashed, possibly stretched, possibly rotated. The singular value decomposition is the recipe for that ellipse. It says every linear map, no matter how complicated its matrix looks, is built out of three simple ingredients in order: rotate the input, stretch along the perpendicular coordinate axes by independent factors, rotate the output.

The stretch factors are the singular values. They are non-negative numbers, ordered from largest to smallest. The largest singular value is the longest semi-axis of the output ellipse — the biggest stretch the map performs anywhere on the unit disk. The smallest non-zero singular value is the shortest semi-axis — the smallest stretch that does not collapse to a point. Zero singular values record directions the map collapses to a point.

Every matrix, square or rectangular, real or complex, admits this factorisation. The eigenvalue picture from 01.01.08 requires the matrix to be square, and even then the eigenvectors may fail to be orthogonal or fail to span the space. The singular value picture has none of those defects. It applies to every matrix, the singular vectors are always orthogonal, and they always span the input and output spaces.

Visual [Beginner]

The picture shows the unit disk in the plane, then four stages of the map $A = U\Sigma V^*$. Stage one is the original unit disk with its two coordinate axes drawn. Stage two is the rotation by $V^*$: the disk is unchanged but the axes have been turned. Stage three is the stretch by $\Sigma$: the disk has become an ellipse with semi-axes $\sigma_1$ and $\sigma_2$, the singular values. Stage four is the final rotation by $U$: the ellipse has been turned to its final orientation. The two singular values are the semi-axis lengths of the final ellipse.

On the left, a unit disk with two perpendicular coordinate axes; following three arrows to the right, three stages of the action of A equals U Sigma V-star are shown. First the rotation by V-star turns the input axes. Then the diagonal matrix Sigma stretches the disk into an ellipse with semi-axes of lengths sigma_1 and sigma_2. Finally the rotation by U turns the ellipse to its final orientation. The two singular values sigma_1 greater than or equal to sigma_2 are labelled as the semi-axes of the output ellipse, and the formula A equals U Sigma V-star appears at the top.

The geometry is the whole story. Any linear map factors as rotate, stretch along perpendicular axes, rotate. The number of non-zero stretch factors is the rank of the map. The largest stretch factor is the biggest number by which the map can magnify a vector — the operator norm — and the ratio of the largest to the smallest non-zero stretch is the condition number, the amplification of relative error in numerical computation.

Worked example [Beginner]

Take the two-by-two diagonal matrix

$$A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \qquad a \ge b > 0.$$

This matrix already stretches the horizontal axis by $a$ and the vertical axis by $b$, with no rotation. Its singular value decomposition is the simplest possible: $A = U\Sigma V^*$ with $\Sigma = A$ itself and $U = V = I$, the identity. The singular values are $\sigma_1 = a$ and $\sigma_2 = b$. The right singular vectors are the columns of the input identity, $e_1$ and $e_2$; the left singular vectors are the columns of the output identity, the same two vectors.

A less obvious example. Take

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

To find the singular values, compute $A^*A$:

$$A^*A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$

The eigenvalues of $A^*A$ are the roots of $\lambda^2 - 3\lambda + 1 = 0$, namely $\lambda = (3 \pm \sqrt{5})/2$. The singular values are the non-negative square roots: $\sigma_1 = (1 + \sqrt{5})/2 \approx 1.618$ and $\sigma_2 = (\sqrt{5} - 1)/2 \approx 0.618$. The product $\sigma_1 \sigma_2 = 1$ equals the absolute determinant of $A$, as it must.

What this tells us. The matrix has eigenvalue $1$ (with algebraic multiplicity $2$) but its action on the unit disk is highly non-uniform: it stretches one direction by a factor of $\approx 1.618$ and compresses another by a factor of $\approx 0.618$. The eigenvalues hide this asymmetry. The singular values expose it directly.
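A quick numerical check of this example (a sketch using NumPy's standard np.linalg routines):

```python
import numpy as np

# The shear matrix from the worked example.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# np.linalg.svd returns U, the singular values, and V* (named Vh here).
U, s, Vh = np.linalg.svd(A)

phi = (1 + np.sqrt(5)) / 2                              # golden ratio
print(np.isclose(s[0], phi), np.isclose(s[1], 1 / phi))  # True True
print(np.isclose(s[0] * s[1], abs(np.linalg.det(A))))    # product = |det A|

# Both eigenvalues are 1, yet the stretches differ by a factor of phi^2.
print(np.linalg.eigvals(A))                              # [1. 1.]
```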

Check your understanding [Beginner]

Formal definition [Intermediate+]

Let $\mathbb{F}$ denote $\mathbb{R}$ or $\mathbb{C}$, and write $A^*$ for the conjugate transpose of a matrix over $\mathbb{F}$ (so $A^* = A^\top$ over $\mathbb{R}$). A singular value decomposition of an $m \times n$ matrix $A$ over $\mathbb{F}$ is a factorisation

$$A = U\,\Sigma\,V^*$$

with:

  • $U \in \mathbb{F}^{m \times m}$ unitary, i.e., $U^*U = UU^* = I_m$;
  • $V \in \mathbb{F}^{n \times n}$ unitary, i.e., $V^*V = VV^* = I_n$;
  • $\Sigma \in \mathbb{R}^{m \times n}$ a rectangular "diagonal" matrix, meaning $\Sigma_{ij} = 0$ for $i \ne j$, with diagonal entries $\Sigma_{ii} = \sigma_i$ a decreasing sequence of non-negative real numbers $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$, where $p = \min(m, n)$.

The numbers $\sigma_1, \dots, \sigma_p$ are the singular values of $A$; the non-zero ones are $\sigma_1 \ge \cdots \ge \sigma_r > 0$, where $r = \operatorname{rank} A$. The columns $v_1, \dots, v_n$ of $V$ are the right singular vectors of $A$ — an orthonormal basis of the input space $\mathbb{F}^n$ — and the columns $u_1, \dots, u_m$ of $U$ are the left singular vectors — an orthonormal basis of the output space $\mathbb{F}^m$. The pairing $Av_i = \sigma_i u_i$ for $1 \le i \le p$ links the two bases [Horn, R. A. & Johnson, C. R. — Matrix Analysis (2nd ed.)].

Relation to eigenvalues of $A^*A$ and $AA^*$. The Gram matrices $A^*A$ ($n \times n$) and $AA^*$ ($m \times m$) are Hermitian and positive semidefinite, hence their eigenvalues are non-negative real numbers. The singular values of $A$ are the non-negative square roots of the eigenvalues of $A^*A$ (equivalently of $AA^*$; the two spectra agree up to $|m - n|$ extra zeros), counted with multiplicity. Concretely, if $v_1, \dots, v_n$ is an orthonormal eigenbasis for $A^*A$ with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$, then $\sigma_i = \sqrt{\lambda_i}$, the columns of $V$ are the right singular vectors, and the left singular vectors are obtained by setting $u_i = Av_i/\sigma_i$ when $\sigma_i > 0$.
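The recipe in the previous paragraph can be carried out numerically. The sketch below assumes a generic full-column-rank random matrix, so every $\sigma_i > 0$ and the division is safe:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # generic, hence full column rank

# Orthonormal eigenbasis of the Gram matrix A*A (Hermitian PSD).
lam, V = np.linalg.eigh(A.T @ A)         # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]           # reorder to decreasing
sigma = np.sqrt(np.clip(lam, 0, None))   # singular values

# Left singular vectors u_i = A v_i / sigma_i (broadcast over columns).
U = A @ V / sigma

# Check the reduced factorisation and the orthonormality of the columns.
print(np.allclose(A, U @ np.diag(sigma) @ V.T))   # True
print(np.allclose(U.T @ U, np.eye(3)))            # True
```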

Reduced versus full SVD. The factorisation written above is the full SVD, with $U$ of size $m \times m$ and $V$ of size $n \times n$. The reduced (or thin) SVD discards the columns of $U$ and $V$ that pair with zero singular values: writing $r = \operatorname{rank} A$ and letting $U_r \in \mathbb{F}^{m \times r}$ and $V_r \in \mathbb{F}^{n \times r}$ be the matrices whose columns are the first $r$ left and right singular vectors, and $\Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$ the $r \times r$ diagonal of non-zero singular values, the reduced SVD is

$$A = U_r\,\Sigma_r\,V_r^* = \sum_{i=1}^{r} \sigma_i\, u_i v_i^*.$$

The dyadic sum on the right exhibits $A$ as a sum of rank-$1$ matrices, ordered by decreasing singular value. This is the form used by Eckart-Young: truncating the sum at $k < r$ produces the best rank-$k$ approximation to $A$ (master tier).
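In code, the thin factorisation and the dyadic sum look like this (a sketch with NumPy; full_matrices=False is the library's thin-SVD switch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Thin SVD: U is 6x4 and Vh is 4x4; the null-pairing columns are dropped.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Dyadic expansion: A as a sum of rank-1 matrices sigma_i u_i v_i^*.
dyads = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
print(np.allclose(A, dyads))                      # True

# Truncating at k terms gives the rank-k matrix used by Eckart-Young.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
print(np.linalg.matrix_rank(A_k))                 # 2
```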

Uniqueness. The singular values are uniquely determined by $A$. The unitary matrices $U$ and $V$ are unique up to (i) replacing $(u_i, v_i)$ by $(\omega u_i, \omega v_i)$ for any unit scalar $\omega$ on each pair with a simple non-zero singular value (over $\mathbb{R}$ the phase is $\pm 1$), and (ii) replacing the columns corresponding to a repeated singular value $\sigma > 0$ by a unitary recombination within that block, and a corresponding unitary recombination on the kernel and cokernel (for $\sigma = 0$).

Operator norm and Frobenius norm. The operator norm (or spectral norm) of $A$ is $\|A\|_2 = \sigma_1$, the largest singular value. The Frobenius norm is $\|A\|_F = \big(\sum_{i,j} |a_{ij}|^2\big)^{1/2} = \big(\sum_i \sigma_i^2\big)^{1/2}$, the root-sum-square of the singular values. Both norms are unitarily invariant: $\|PAQ\| = \|A\|$ for any unitaries $P, Q$, which makes them functions of the singular values alone.
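Both identities are easy to confirm numerically (a sketch; np.linalg.norm with ord=2 and ord='fro' computes the two norms):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values only

# Operator norm = largest singular value; Frobenius = root-sum-square.
print(np.isclose(np.linalg.norm(A, 2), s[0]))                       # True
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))  # True

# Unitary invariance: an orthogonal factor does not change either norm.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
print(np.isclose(np.linalg.norm(Q @ A, 2), s[0]))                   # True
```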

Counterexamples to common slips

  • Singular values are not eigenvalues. For the Jordan block $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, the only eigenvalue is $1$ with algebraic multiplicity $2$, but the singular values are $(1 + \sqrt{5})/2 \approx 1.618$ and $(\sqrt{5} - 1)/2 \approx 0.618$. The two notions coincide for Hermitian positive semidefinite matrices and otherwise differ.

  • The convention $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p$ is universal in the modern literature, but the matrix $\Sigma$ in the factorisation is not uniquely determined as a matrix on the nose without it — only its diagonal entries are (up to the ordering convention). Permutations of singular values force corresponding column permutations on $U$ and $V$.

  • The full SVD has $U$ of size $m \times m$ and $V$ of size $n \times n$, both square unitary. For a tall matrix ($m > n$), the matrix $\Sigma$ is $m \times n$ with the diagonal entries in its top $n \times n$ block and $m - n$ zero rows below; for a wide matrix ($m < n$), the diagonal entries fill an $m \times m$ block on the left and $n - m$ zero columns sit to the right. The reduced SVD keeps only the non-zero structure.

  • The eigenvalue decomposition $A = S\Lambda S^{-1}$ of a square diagonalisable matrix is, in general, not an SVD. The matrix $S$ need not be unitary, and $\Lambda$ may contain negative or complex entries. The SVD coincides with the eigendecomposition iff $A$ is Hermitian positive semidefinite, in which case $S$ can be chosen unitary and $\Lambda$ has non-negative real diagonal.

Key theorem with proof [Intermediate+]

Theorem (existence of the singular value decomposition). Let $A$ be an $m \times n$ matrix over $\mathbb{R}$ or $\mathbb{C}$, and let $p = \min(m, n)$. There exist unitary matrices $U$ ($m \times m$) and $V$ ($n \times n$) and an $m \times n$ rectangular diagonal matrix $\Sigma$ with real diagonal entries $\sigma_1 \ge \cdots \ge \sigma_p \ge 0$ such that $A = U\Sigma V^*$. [Horn, R. A. & Johnson, C. R. — Matrix Analysis (2nd ed.)]

Proof. The argument has three steps: build the right singular vectors from the spectral theorem on $A^*A$, define the left singular vectors as the rescaled images $Av_i/\sigma_i$, and complete to a full unitary by orthogonal extension.

Step 1: Right singular vectors and singular values. The matrix $A^*A$ is Hermitian, since $(A^*A)^* = A^*A$, and positive semidefinite: for any $x \in \mathbb{F}^n$, $x^*(A^*A)x = \|Ax\|^2 \ge 0$, with equality iff $Ax = 0$, iff $x \in \ker A$. By the finite-dimensional spectral theorem for Hermitian operators [01.01.08, master tier], $A^*A$ has an orthonormal eigenbasis with real non-negative eigenvalues. Let $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$ be these eigenvalues in decreasing order, and let $v_1, \dots, v_n$ be a corresponding orthonormal eigenbasis: $A^*Av_i = \lambda_i v_i$ and $\langle v_i, v_j \rangle = \delta_{ij}$.

Define $\sigma_i = \sqrt{\lambda_i}$ for $1 \le i \le n$. The number $\sigma_i$ is the $i$-th singular value of $A$. Let $r$ be the largest index with $\sigma_r > 0$, so $\sigma_1 \ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_n$. The rank of $A$ equals $r$ because $\|Av_i\|^2 = \langle v_i, A^*Av_i \rangle = \lambda_i$ (so $Av_i = 0$ exactly when $\lambda_i = 0$) and the rank of $A$ equals the number of non-zero eigenvalues of $A^*A$.

Step 2: Left singular vectors via the image map. For each $i$ with $\sigma_i > 0$, define the $i$-th left singular vector by

$$u_i = \frac{1}{\sigma_i}\, A v_i.$$

The vectors $u_1, \dots, u_r$ are orthonormal. To verify: for any $i, j$ with $\sigma_i, \sigma_j > 0$,

$$\langle u_i, u_j \rangle = \frac{1}{\sigma_i \sigma_j}\,\langle Av_i, Av_j \rangle = \frac{1}{\sigma_i \sigma_j}\,\langle v_i, A^*Av_j \rangle = \frac{\lambda_j}{\sigma_i \sigma_j}\,\langle v_i, v_j \rangle = \delta_{ij},$$

using $A^*Av_j = \lambda_j v_j$ and the orthonormality of the $v_i$. So $\{u_1, \dots, u_r\}$ is an orthonormal set.

Step 3: Completion and the matrix factorisation. Extend $\{u_1, \dots, u_r\}$ to an orthonormal basis $\{u_1, \dots, u_m\}$ of $\mathbb{F}^m$ by Gram-Schmidt or by completing through the orthogonal complement of the span. Form the matrices $U = [u_1 \cdots u_m]$ and $V = [v_1 \cdots v_n]$ with the singular vectors as columns. Both are unitary because their columns are orthonormal bases. Let $\Sigma$ be the $m \times n$ rectangular diagonal matrix with $\Sigma_{ii} = \sigma_i$ for $1 \le i \le \min(m, n)$ and zero off the diagonal.

Verify $A = U\Sigma V^*$ by computing both sides on the orthonormal basis $v_1, \dots, v_n$. For $i \le r$, the right-hand side gives $U\Sigma V^* v_i = U\Sigma e_i = \sigma_i u_i$, and the left-hand side gives $Av_i = \sigma_i u_i$ by definition of $u_i$. For $i > r$, the right-hand side gives $0$ (the $i$-th diagonal entry of $\Sigma$ is zero), and the left-hand side gives $Av_i = 0$ because $\|Av_i\|^2 = \lambda_i = 0$. The two matrices agree on a basis of $\mathbb{F}^n$, so they are equal.

Corollary (uniqueness of singular values). The singular values of $A$ are uniquely determined by $A$: any two singular value decompositions of $A$ have the same diagonal entries of $\Sigma$, listed in the same decreasing order. The proof follows because the $\sigma_i^2$ are the eigenvalues of $A^*A$, which are uniquely determined by $A$, and the decreasing-order convention fixes the order.

Bridge. The singular value decomposition is the universal factorisation of a linear map between inner-product spaces. The existence proof unpacks into four interlocking syntheses, each connecting the SVD to a downstream structure that uses singular values as its building block.

First synthesis: SVD as the orbit decomposition of bi-unitary action. The group $U(m) \times U(n)$ acts on $m \times n$ complex matrices by $(P, Q) \cdot A = PAQ^*$. The orbits of this action are classified by the singular value tuple $(\sigma_1, \dots, \sigma_p)$: two matrices lie in the same orbit iff they have the same singular values. The SVD itself is the statement that the orbit through $A$ contains a unique representative of the form $\Sigma$, rectangular diagonal with decreasing non-negative entries. This packages the entire theorem as the orbit-classification result for the bi-unitary action, an instance of the more general invariant-theoretic perspective on linear-algebra canonical forms.

Second synthesis: SVD and the Moore-Penrose pseudoinverse. The least-squares problem $\min_x \|Ax - b\|_2$ has a unique minimum-norm solution given by $x^+ = A^+ b$ with $A^+ = V\Sigma^+ U^*$ the Moore-Penrose pseudoinverse of $A$, where $\Sigma^+$ is the $n \times m$ rectangular diagonal matrix with reciprocated non-zero singular values. The SVD makes the pseudoinverse explicit and reveals its geometric content: $A^+$ inverts the action of $A$ on its row space and annihilates the cokernel. The pseudoinverse underlies linear regression, the normal equations, and the Tikhonov-regularised least-squares used when $\sigma_{\min}$ is small.

Third synthesis: SVD as the building block of principal component analysis. For a centred data matrix $X \in \mathbb{R}^{n \times d}$ with $n$ samples in $\mathbb{R}^d$, the SVD identifies the right singular vectors (columns of $V$) as the principal directions of the data: the directions of maximum variance, ordered by decreasing variance $\sigma_i^2/(n-1)$. Truncating the SVD at $k$ singular values projects the data onto the $k$-dimensional subspace capturing the maximum total variance — the principal component projection. The Eckart-Young theorem (master tier) is the optimality statement: this truncation is the best rank-$k$ approximation in Frobenius norm.
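A minimal PCA-via-SVD sketch (assuming Gaussian toy data; np.cov uses the same $n - 1$ normalisation):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # correlated
Xc = X - X.mean(axis=0)                  # centre the columns

U, s, Vh = np.linalg.svd(Xc, full_matrices=False)

# Principal directions are the right singular vectors (rows of Vh);
# the variance along direction i is sigma_i^2 / (n - 1).
explained = s**2 / (len(X) - 1)
print(np.allclose(explained, np.linalg.eigvalsh(np.cov(Xc.T))[::-1]))  # True

# Project onto the top-k principal directions.
k = 2
scores = Xc @ Vh[:k].T                   # n x k matrix of component scores
```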

Fourth synthesis: SVD as the finite-dimensional Schmidt decomposition. For compact operators on a separable Hilbert space, the Schmidt decomposition writes $T = \sum_i s_i \langle\,\cdot\,, e_i \rangle f_i$ with $(s_i)$ decreasing to zero and $(e_i)$, $(f_i)$ orthonormal sequences. Finite-rank operators recover the finite-dimensional SVD; general compact operators add the limiting condition $s_i \to 0$. The singular values are the singular numbers of $T$ in the infinite-dimensional setting, foundational for the theory of trace-class and Hilbert-Schmidt operators and for the quantum-information notion of entanglement entropy (where the squared singular values of the coefficient matrix of a bipartite pure state are the Schmidt coefficients).

Exercises [Intermediate+]

Lean formalization [Intermediate+]

Mathlib packages the constituent ingredients of the SVD — the conjugate transpose Matrix.conjTranspose, the Hermitian and positive-semidefinite predicates Matrix.IsHermitian and Matrix.PosSemidef, the lemma isHermitian_transpose_mul_self showing $A^\top A$ is Hermitian for every rectangular matrix $A$, the spectral theorem on Hermitian matrices, and the polar decomposition for square matrices over $\mathbb{R}$ or $\mathbb{C}$. The companion file Codex.Foundations.LinearAlgebra.SVD records the named statements used above.

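In place of the companion snippet that failed to render here, the following is a minimal Lean sketch of what the module might contain. It is a sketch under assumptions: the lemma names isHermitian_conjTranspose_mul_self and posSemidef_conjTranspose_mul_self are believed to exist in recent Mathlib but should be checked against your revision, and svd_exists is exactly the sorry-gated packaging described below.

```lean
import Mathlib

open Matrix

variable {m n : ℕ}

-- Ingredients Mathlib is believed to supply: the Gram matrix Aᴴ * A is
-- Hermitian and positive semidefinite for every rectangular A.
example (A : Matrix (Fin m) (Fin n) ℂ) : (Aᴴ * A).IsHermitian :=
  isHermitian_conjTranspose_mul_self A

example (A : Matrix (Fin m) (Fin n) ℂ) : (Aᴴ * A).PosSemidef :=
  posSemidef_conjTranspose_mul_self A

/-- Sorry-gated packaging of the rectangular SVD: `U` and `V` are unitary and
    `S` is a real matrix, non-negative and zero off the diagonal (the
    decreasing-order convention is omitted from this sketch). -/
theorem svd_exists (A : Matrix (Fin m) (Fin n) ℂ) :
    ∃ (U : Matrix (Fin m) (Fin m) ℂ) (V : Matrix (Fin n) (Fin n) ℂ)
      (S : Matrix (Fin m) (Fin n) ℝ),
      Uᴴ * U = 1 ∧ Vᴴ * V = 1 ∧
      (∀ i j, (i : ℕ) ≠ (j : ℕ) → S i j = 0) ∧
      (∀ i j, 0 ≤ S i j) ∧
      A = U * S.map Complex.ofReal * Vᴴ := by
  sorry
```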

This unit is marked lean_status: partial because Mathlib supplies the spectral theorem on Hermitian matrices and the constituent positivity and Hermitian-symmetry lemmas, but does not package the rectangular SVD theorem under a single name with a uniqueness statement and the Eckart-Young companion. The corresponding statement in the companion module is left as a sorry-gated alias pending that packaging.

Advanced results [Master]

Eckart-Young-Mirsky theorem. Let $A$ be an $m \times n$ matrix with SVD $A = U\Sigma V^*$, and let $\|\cdot\|$ be any unitarily invariant norm on $m \times n$ matrices — a norm satisfying $\|PAQ\| = \|A\|$ for every pair of unitaries $P, Q$. The best rank-$k$ approximation to $A$ in this norm is the truncation

$$A_k = \sum_{i=1}^{k} \sigma_i\, u_i v_i^*.$$

For the operator norm (spectral norm), the minimum is $\|A - A_k\|_2 = \sigma_{k+1}$. For the Frobenius norm, the minimum is $\|A - A_k\|_F = \big(\sum_{i>k} \sigma_i^2\big)^{1/2}$. For the nuclear norm (trace norm), the minimum is $\|A - A_k\|_* = \sum_{i>k} \sigma_i$. The original Eckart-Young (1936) treated the Frobenius case; Mirsky (1960) extended the theorem to arbitrary unitarily invariant norms via von Neumann's theorem that every such norm is a symmetric gauge function applied to the singular values [Eckart, C. & Young, G. — The approximation of one matrix by another of lower rank].
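Both minima are visible numerically (a sketch; the truncation is built from the thin SVD):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]     # best rank-k approximation

# Spectral-norm error is sigma_{k+1}; Frobenius error is the tail RSS.
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))                          # True
print(np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:]**2))))  # True
```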

Proof sketch. Unitarily invariant norms depend only on singular values. The matrix $A - B$ has singular values bounded below by the Cauchy interlacing inequalities for compressions: writing $\sigma_i(M)$ for the $i$-th singular value of $M$, the Weyl inequality gives $\sigma_{i+j-1}(X + Y) \le \sigma_i(X) + \sigma_j(Y)$, and if $\operatorname{rank} B \le k$ then $\sigma_{k+1}(B) = 0$, forcing $\sigma_i(A - B) \ge \sigma_{i+k}(A)$ for all $i$ (take $X = A - B$, $Y = B$, $j = k + 1$). Hence the singular value sequence of $A - B$ majorises the tail sequence $(\sigma_{k+1}(A), \sigma_{k+2}(A), \dots)$, and any monotone unitarily invariant norm is minimised when this majorisation is tight, which is exactly the truncation $B = A_k$.

Polar decomposition and the matrix modulus. For a square matrix $A$ over $\mathbb{R}$ or $\mathbb{C}$, the polar decomposition $A = WP$ has $W$ unitary and $P$ Hermitian positive semidefinite. The factor $P$ is the modulus of $A$, denoted $|A|$ in operator theory, and equals the unique Hermitian positive semidefinite square root of $A^*A$. The polar decomposition is the matrix generalisation of the polar form $z = re^{i\theta}$ of a complex number, with $e^{i\theta}$ replaced by $W$ and $r$ replaced by $|A|$. The decomposition is unique when $A$ is invertible; when $A$ is singular the factor $|A|$ is still unique, but $W$ is only determined on the row space of $A$ and can be chosen arbitrarily (subject to unitarity) on the orthogonal complement. Polar decomposition exists in infinite dimensions for bounded operators on a Hilbert space (with $W$ a partial isometry in general) and is the starting point for the theory of polar coordinates in operator theory [Horn, R. A. & Johnson, C. R. — Matrix Analysis (2nd ed.)].
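The polar factors can be assembled directly from the SVD, via the factorisation $A = (UV^*)(V\Sigma V^*)$ used in the full proof set below (a sketch for a real square matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

U, s, Vh = np.linalg.svd(A)
W = U @ Vh                     # unitary (here: orthogonal) factor
P = Vh.T @ np.diag(s) @ Vh     # Hermitian PSD factor, the modulus |A|

print(np.allclose(A, W @ P))            # A = W P
print(np.allclose(W.T @ W, np.eye(4)))  # W unitary
print(np.allclose(P @ P, A.T @ A))      # P is the PSD square root of A*A
```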

Moore-Penrose pseudoinverse. For an $m \times n$ matrix $A$ with SVD $A = U\Sigma V^*$, the Moore-Penrose pseudoinverse is

$$A^+ = V\,\Sigma^+\,U^*,$$

where $\Sigma^+$ is the $n \times m$ rectangular diagonal matrix with $\Sigma^+_{ii} = 1/\sigma_i$ for $1 \le i \le r$ and zero otherwise. The pseudoinverse satisfies the four Moore-Penrose conditions: (i) $AA^+A = A$, (ii) $A^+AA^+ = A^+$, (iii) $(AA^+)^* = AA^+$, (iv) $(A^+A)^* = A^+A$. These four conditions characterise $A^+$ uniquely. Application to least-squares: the equation $Ax = b$ may have no exact solution; the vector $x^+ = A^+b$ is the unique vector minimising $\|x\|_2$ among all minimisers of $\|Ax - b\|_2$, namely the minimum-norm minimiser. The vector $AA^+b$ is the orthogonal projection of $b$ onto the column space of $A$, and $I - A^+A$ is the orthogonal projection onto the kernel of $A$.
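The formula and the least-squares interpretation line up with the standard library routines (a sketch; the random matrix has full column rank, so every $\sigma_i > 0$):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((7, 4))
b = rng.standard_normal(7)

# Pseudoinverse from the SVD: A+ = V Sigma+ U*.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_pinv = Vh.T @ np.diag(1 / s) @ U.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))        # True

# Minimum-norm least-squares solution, matching lstsq.
x = A_pinv @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))                       # True

# The four Moore-Penrose conditions.
print(np.allclose(A @ A_pinv @ A, A),
      np.allclose(A_pinv @ A @ A_pinv, A_pinv),
      np.allclose((A @ A_pinv).T, A @ A_pinv),
      np.allclose((A_pinv @ A).T, A_pinv @ A))       # all True
```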

Singular values and unitarily invariant norms. A symmetric gauge function on $\mathbb{R}^p$ is a norm that is invariant under permutations and sign changes of its arguments. By von Neumann's theorem (1937), unitarily invariant norms on $m \times n$ matrices are in bijection with symmetric gauge functions on $\mathbb{R}^p$, $p = \min(m, n)$: every unitarily invariant norm has the form $\|A\| = \Phi(\sigma_1(A), \dots, \sigma_p(A))$ for a unique symmetric gauge function $\Phi$. The three canonical examples — operator, Frobenius, nuclear — correspond to $\Phi = \ell^\infty, \ell^2, \ell^1$. This identifies the lattice of unitarily invariant matrix norms with the lattice of symmetric gauge functions, and underlies the Schatten $p$-norms $\|A\|_p = \big(\sum_i \sigma_i^p\big)^{1/p}$ for $1 \le p \le \infty$, interpolating between the nuclear ($p = 1$), Frobenius ($p = 2$), and operator ($p = \infty$) norms.

Schmidt decomposition for compact operators. A bounded linear operator $T$ on a separable Hilbert space $H$ is compact if it maps bounded sets to relatively compact sets, equivalently (in a Hilbert space) if it is the norm limit of a sequence of finite-rank operators. For compact $T$, the operator $T^*T$ is compact, self-adjoint, positive semidefinite, with a countable orthonormal eigenbasis $(e_i)$ and corresponding eigenvalues $\lambda_i \ge 0$ accumulating only at zero. Setting $s_i = \sqrt{\lambda_i}$ and $f_i = Te_i/s_i$ for $s_i > 0$, the Schmidt decomposition

$$T = \sum_i s_i\, \langle\,\cdot\,, e_i \rangle\, f_i$$

converges in the operator norm, with $s_i \to 0$. The numbers $s_i$ are the singular numbers of $T$. Trace-class operators are those with $\sum_i s_i < \infty$ (the trace norm is finite); Hilbert-Schmidt operators are those with $\sum_i s_i^2 < \infty$ (the Hilbert-Schmidt norm is finite). The Schatten $p$-class operators have $\sum_i s_i^p < \infty$; these are the operator-algebraic analogues of $\ell^p$ sequences and form the natural framework for non-commutative integration theory [Schmidt, E. — Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil].

Numerical computation. The Golub-Kahan SVD algorithm (1965) computes the SVD of an $m \times n$ matrix ($m \ge n$) in $O(mn^2)$ floating-point operations. Stage one reduces $A$ to bidiagonal form by a sequence of Householder reflections applied from the left and the right, costing $O(mn^2)$ operations. Stage two diagonalises the bidiagonal form by a variant of the QR algorithm specialised to bidiagonal matrices, costing $O(n)$ operations per sweep, with typically a handful of sweeps per singular value to reach accuracy on the order of the unit roundoff $\varepsilon$. The algorithm is numerically backward stable: the computed SVD is the exact SVD of a nearby matrix $A + \delta A$ with $\|\delta A\| = O(\varepsilon)\,\|A\|$. Modern implementations in LAPACK and the BLAS form the computational backbone of every linear-algebra package and underlie the matrix-factorisation methods in numerical optimisation, scientific computing, and machine learning [Golub, G. H. & Kahan, W. — Calculating the singular values and pseudo-inverse of a matrix].
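The backward-stability claim is observable from any library SVD (a sketch checking the reconstruction residual against the unit roundoff, not the Golub-Kahan internals):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((300, 200))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# The relative reconstruction error sits at a small multiple of machine
# epsilon, consistent with backward stability of bidiagonalisation + QR.
err = np.linalg.norm(A - (U * s) @ Vh, 2) / np.linalg.norm(A, 2)
print(err, np.finfo(A.dtype).eps)        # err ~ 1e-15, eps ~ 2.2e-16
```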

Conjugacy classes of the bi-unitary action. The group $U(m) \times U(n)$ acts on $\mathbb{C}^{m \times n}$ by $(P, Q) \cdot A = PAQ^*$. The orbits of this action are parametrised by the singular value tuple $(\sigma_1, \dots, \sigma_p)$: two matrices lie in the same orbit iff they have the same singular values. The orbit space is the Weyl chamber $\{\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0\} \subset \mathbb{R}^p$. The SVD is the orbit-decomposition: every orbit contains a unique representative of diagonal form with decreasing non-negative entries. This perspective places the SVD in parallel with the Jordan canonical form 01.01.11 (orbit decomposition of $\mathbb{C}^{n \times n}$ under conjugation by $GL_n(\mathbb{C})$) and the Cartan decomposition $G = KAK$ for a real reductive Lie group $G$ with maximal compact subgroup $K$ and maximal abelian subgroup $A$: in the case $G = GL_n(\mathbb{C})$, $K = U(n)$ (acting bi-unitarily), and $A$ the positive diagonal matrices, the Cartan decomposition is the SVD.

Synthesis. First synthesis: SVD as the universal generalisation of diagonalisation. Every matrix admits an SVD; only Hermitian positive semidefinite matrices admit a unitary eigendecomposition coinciding with the SVD. The eigenvalue picture from 01.01.08 and the Jordan canonical form from 01.01.11 capture intrinsic features of a square operator under conjugation; the SVD captures extrinsic features of a linear map between inner-product spaces under the bi-unitary action. The two perspectives are complementary: similarity vs unitary equivalence.

Second synthesis: SVD as the geometric content of $A = U\Sigma V^*$. The singular values are the semi-axes of the image ellipsoid of the unit ball, and the unitary factors rotate that ellipsoid into standard position. This packages the operator norm (largest singular value), the condition number ($\kappa = \sigma_1/\sigma_n$ for invertible $A$), the rank (number of non-zero singular values), the kernel (span of right singular vectors with zero singular value), the image (span of left singular vectors with non-zero singular value), and the low-rank structure (truncated SVD) into a single coordinate-free package.

Third synthesis: SVD as the foundation of least-squares and data analysis. The Moore-Penrose pseudoinverse delivers the minimum-norm least-squares solution to $Ax = b$ in one formula. Principal component analysis is the SVD of the centred data matrix. Latent semantic indexing is the truncated SVD of the term-document matrix. Image compression is the truncated SVD of a grayscale pixel matrix. Recommender systems use matrix completion algorithms that fit a low-rank SVD model to sparse observed entries. In each case, the SVD identifies the dominant structure and quantifies the discarded residual.

Fourth synthesis: SVD as the spectral theorem in disguise. The SVD of $A$ encodes two paired spectral theorems, one on $A^*A$ and one on $AA^*$, with the operator $A$ as the intertwiner. The right singular vectors diagonalise $A^*A$, the left singular vectors diagonalise $AA^*$, and the singular values appear as the square roots of the shared non-zero eigenvalues. In the infinite-dimensional Hilbert-space setting, this generalises to the Schmidt decomposition for compact operators and to the polar decomposition for arbitrary bounded operators, with the positive part $|T|$ playing the role of the singular value spectrum and the partial-isometry phase playing the role of the unitary factors. The SVD is therefore the finite-dimensional shadow of the spectral and polar decompositions of operator theory.

Full proof set [Master]

Existence of the SVD via the spectral theorem on $A^*A$. Let $A \in \mathbb{F}^{m \times n}$ with $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$. The Gram matrix $A^*A$ is Hermitian, since $(A^*A)^* = A^*A$, and positive semidefinite, since $x^*(A^*A)x = \|Ax\|^2 \ge 0$ for every $x$. By the finite-dimensional spectral theorem on Hermitian operators on an inner-product space, $A^*A$ admits an orthonormal eigenbasis $v_1, \dots, v_n$ with real eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$. Set $\sigma_i = \sqrt{\lambda_i}$ and let $r$ be the number of non-zero $\sigma_i$.

For $1 \le i \le r$, define $u_i = Av_i/\sigma_i$. The set $\{u_1, \dots, u_r\}$ is orthonormal: $\langle u_i, u_j \rangle = \frac{1}{\sigma_i \sigma_j}\langle Av_i, Av_j \rangle = \frac{\lambda_j}{\sigma_i \sigma_j}\langle v_i, v_j \rangle = \delta_{ij}$. Extend to an orthonormal basis $u_1, \dots, u_m$ of $\mathbb{F}^m$ via Gram-Schmidt on the orthogonal complement of the span of the first $r$. Let $U = [u_1 \cdots u_m]$, $V = [v_1 \cdots v_n]$, and $\Sigma$ the $m \times n$ rectangular diagonal matrix with $\Sigma_{ii} = \sigma_i$ for $1 \le i \le \min(m, n)$. Verify on the basis $v_1, \dots, v_n$: for $i \le r$, $U\Sigma V^* v_i = U\Sigma e_i = \sigma_i u_i = Av_i$ (here $V^* v_i = e_i$, a column of the identity), and for $i > r$, $U\Sigma V^* v_i = 0 = Av_i$ because $\|Av_i\|^2 = \lambda_i = 0$. Reassembling in matrix form, $AV = U\Sigma$, equivalently $A = U\Sigma V^*$.

Uniqueness of singular values. The squared singular values are the eigenvalues of the Hermitian matrix $A^*A$, which are uniquely determined by $A$ as the roots of the characteristic polynomial $\det(\lambda I - A^*A)$, counted with algebraic multiplicity, and the spectral theorem confirms that algebraic multiplicity equals geometric multiplicity for Hermitian matrices. The decreasing-order convention then fixes the singular value sequence. The matrices $U$ and $V$ are unique up to a unitary block on each constant-singular-value subspace: replacing $V$ by $VW$ with $W$ unitary and block-diagonal with respect to the equal-singular-value blocks, and correspondingly $U$ by $UW'$ with $W'$ agreeing with $W$ on the blocks of non-zero singular values (the blocks pairing with the kernel and cokernel are free). For simple non-zero singular values, the freedom reduces to a phase $e^{i\theta}$ (a sign $\pm 1$ over $\mathbb{R}$).

Existence of polar decomposition. Starting from the SVD of a square matrix, factor $A = U\Sigma V^* = (UV^*)(V\Sigma V^*)$. Set $W = UV^*$ (unitary) and $P = V\Sigma V^*$. Then $P$ is Hermitian ($P^* = V\Sigma^* V^* = P$, since $\Sigma$ is real diagonal) and positive semidefinite ($x^*Px = (V^*x)^*\Sigma(V^*x) \ge 0$). The identity $P^2 = V\Sigma^2 V^* = A^*A$ (after expanding $A^*A = V\Sigma U^* U \Sigma V^*$ using $U^*U = I$ and $V^*V = I$) confirms $P = \sqrt{A^*A} = |A|$, the unique Hermitian positive semidefinite square root.

Uniqueness of polar decomposition for invertible $A$. Suppose $A = WP$ with $W$ unitary and $P$ Hermitian positive semidefinite. Then $A^*A = PW^*WP = P^2$. The Hermitian positive semidefinite square root is unique, so $P = \sqrt{A^*A}$. When $A$ is invertible, $P$ is invertible and $W = AP^{-1}$, so the unitary factor is forced as well.

Moore-Penrose pseudoinverse characterisation. Let $A = U\Sigma V^*$ and set $A^+ = V\Sigma^+ U^*$ with $\Sigma^+$ the $n \times m$ rectangular diagonal matrix with $\Sigma^+_{ii} = 1/\sigma_i$ for $1 \le i \le r$ and zero otherwise. Verify the four Moore-Penrose conditions:

  • $AA^+A = U\Sigma V^*\, V\Sigma^+ U^*\, U\Sigma V^* = U(\Sigma\Sigma^+\Sigma)V^* = U\Sigma V^* = A$, using $V^*V = I_n$, $U^*U = I_m$, and the diagonal identity $\Sigma\Sigma^+\Sigma = \Sigma$ (the rectangular diagonal matrix satisfies $\sigma_i \cdot \sigma_i^+ \cdot \sigma_i = \sigma_i$ whether $\sigma_i > 0$ or not).
  • $A^+AA^+ = A^+$ by the analogous diagonal identity $\Sigma^+\Sigma\Sigma^+ = \Sigma^+$.
  • $(AA^+)^* = (U\,\Sigma\Sigma^+\,U^*)^* = U\,(\Sigma\Sigma^+)^*\,U^* = U\,\Sigma\Sigma^+\,U^* = AA^+$, since $\Sigma\Sigma^+$ is a real diagonal matrix with entries in $\{0, 1\}$, hence Hermitian.
  • $(A^+A)^* = A^+A$ by the same argument.

Uniqueness follows from the four conditions: any $B$ satisfying $ABA = A$, $BAB = B$, $(AB)^* = AB$, $(BA)^* = BA$ equals $A^+$. Indeed $AB = (AB)^* = B^*(AA^+A)^* = (AB)^*(AA^+) = (ABA)A^+ = AA^+$, and symmetrically $BA = A^+A$; hence $B = BAB = (BA)B = A^+(AB) = A^+AA^+ = A^+$ — a sequence of substitutions using the four conditions forces $B = A^+$.

Eckart-Young in operator norm. Let $A = U\Sigma V^*$ and $A_k = \sum_{i=1}^k \sigma_i u_i v_i^*$. Suppose $B$ has rank at most $k$; the goal is $\|A - B\|_2 \ge \sigma_{k+1} = \|A - A_k\|_2$. The kernel of $B$ has dimension at least $n - k$, so there exist orthonormal vectors in $\ker B$ — by dimension count, $\ker B$ intersects $\operatorname{span}(v_1, \dots, v_{k+1})$ in a subspace of dimension at least $1$. Pick a unit vector $w$ in that intersection, write $w = \sum_{i=1}^{k+1} c_i v_i$ with $\sum_i |c_i|^2 = 1$. Then

$$\|(A - B)w\|_2^2 = \|Aw\|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 |c_i|^2 \ge \sigma_{k+1}^2,$$

using $Bw = 0$ and $\sigma_i \ge \sigma_{k+1}$ for $i \le k+1$. So $\|A - B\|_2 \ge \sigma_{k+1}$, proving Eckart-Young in operator norm.

For the Frobenius norm, the same dimension-count argument generalises. Choose orthonormal vectors $w_1, \dots, w_{n-k}$ in $\ker B$ (dimension at least $n - k$ when $\operatorname{rank} B \le k$). Then $\|A - B\|_F^2 \ge \sum_{j=1}^{n-k} \|(A - B)w_j\|_2^2 = \sum_j \|Aw_j\|_2^2$, and the right-hand side is bounded below by $\sum_{i > k} \sigma_i^2$ via a Courant-Fischer-type minimax argument on the singular value sum. Hence $\|A - B\|_F^2 \ge \sum_{i > k} \sigma_i^2 = \|A - A_k\|_F^2$.

The Mirsky extension to arbitrary unitarily invariant norms uses the Ky Fan dominance theorem: a vector $x$ weakly majorises $y$ iff $\Phi(x) \ge \Phi(y)$ for every symmetric gauge function $\Phi$. Since the singular value sequence of $A - B$ weakly majorises that of $A - A_k$ for every $B$ of rank at most $k$, every unitarily invariant norm achieves its minimum on the Eckart-Young truncation $A_k$.

Connections [Master]

  • Eigenvalue, eigenvector, characteristic polynomial 01.01.08 — supplies the spectral theorem on Hermitian operators that drives the existence proof of the SVD. The right singular vectors of $A$ are the eigenvectors of $A^*A$; the singular values are the non-negative square roots of the eigenvalues of $A^*A$; the left singular vectors are the eigenvectors of $AA^*$. The SVD is therefore the linear-algebra construction that converts the spectral theorem on the Hermitian positive semidefinite Gram matrix into a factorisation of $A$ itself, applicable to arbitrary rectangular matrices. The Cayley-Hamilton identity from 01.01.08 specialises on $A^*A$ to give finite-dimensional polynomial identities for the singular values, including $\det(A^*A) = \prod_{i=1}^{n} \sigma_i^2$.

  • Jordan canonical form and minimal polynomial 01.01.11 — the SVD and the Jordan canonical form are the two universal classifications of matrices, parallel and complementary. The Jordan form is the complete invariant of an $n \times n$ matrix under conjugation by $GL_n(\mathbb{C})$; the SVD is the complete invariant of an $m \times n$ matrix under the bi-unitary action of $U(m) \times U(n)$. The Jordan form respects the algebraic structure of the operator (eigenvalues, generalised eigenspaces, nilpotent part); the SVD respects the geometric structure of the linear map between inner-product spaces (singular values as semi-axes of the image ellipsoid). For Hermitian positive semidefinite matrices, the two coincide: the Jordan form is diagonal, the SVD has $U = V$, and the singular values equal the eigenvalues. For everything else they diverge — and both diverge from the eigendecomposition picture.

  • Inner-product space: orthogonality, Gram-Schmidt, spectral theorem 01.01.09 pending — supplies the orthonormalisation machinery and the finite-dimensional spectral theorem on which the SVD existence proof depends. The Gram-Schmidt procedure of 01.01.09 pending is used in the existence proof to extend the partial orthonormal set of left singular vectors to a full orthonormal basis of $\mathbb{F}^m$, completing the unitary $U$. The spectral theorem on the Hermitian operator $A^*A$ is the load-bearing input. The SVD is the rectangular extension of the spectral theorem and reduces to the spectral theorem when $A$ is itself Hermitian positive semidefinite.

  • Bounded and unbounded operators on Hilbert space 02.11.03 — the infinite-dimensional generalisation in which the finite-dimensional SVD becomes the Schmidt decomposition for compact operators on a Hilbert space, the polar decomposition $T = W|T|$ for bounded operators (with $W$ a partial isometry and $|T| = \sqrt{T^*T}$), and the spectral measure for self-adjoint operators (Stone-von Neumann). The singular numbers of a compact operator $T$ — the non-zero eigenvalues of $|T|$ — generalise singular values and underlie the Schatten $p$-class operators, the trace-class operators ($p = 1$), the Hilbert-Schmidt operators ($p = 2$), and the operator algebras of non-commutative integration. The finite-dimensional SVD is the rank-finite, no-limiting-condition shadow of this hierarchy.

  • Least-squares regression and the normal equations [TODO future unit on regression] — the SVD-based pseudoinverse solves the general least-squares problem $\min_x \|Ax - b\|_2$ in one closed-form expression. The classical normal equations $A^*Ax = A^*b$ recover the same solution when $A$ has full column rank, but break down when $A$ is rank-deficient or ill-conditioned. The pseudoinverse handles both cases uniformly: it returns the minimum-norm least-squares solution, projecting $b$ onto the column space of $A$ and resolving the kernel degree of freedom by minimising $\|x\|_2$. Tikhonov regularisation is equivalent to filtering the singular values, replacing $1/\sigma_i$ by $\sigma_i/(\sigma_i^2 + \lambda^2)$ in the pseudoinverse expansion, and provides numerical stability when $\sigma_{\min}$ is small (see the sketch after this list).

  • Principal component analysis and data dimensionality reduction [TODO future unit] — for a centred data matrix $X \in \mathbb{R}^{n \times d}$ with $n$ samples in $\mathbb{R}^d$, the right singular vectors of $X$ (columns of $V$) are the principal directions of the data, ordered by decreasing variance $\sigma_i^2/(n-1)$. The $k$-truncated SVD is the best rank-$k$ approximation in Frobenius norm by Eckart-Young; projecting the data onto the span of $v_1, \dots, v_k$ retains the maximum total variance among all $k$-dimensional linear subspaces. PCA is therefore the SVD of the data matrix, with the singular values quantifying the variance explained. Variants — kernel PCA, sparse PCA, robust PCA — replace the Euclidean inner product, the rank-$k$ constraint, or the assumed Gaussian noise model, but all share the SVD-of-a-data-matrix scaffolding.
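The Tikhonov filter mentioned in the least-squares connection above is a one-line change to the pseudoinverse expansion (a sketch; tikhonov_solve is a hypothetical helper, and the regularised normal equations serve as the cross-check):

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Regularised least squares via singular-value filtering:
    replace 1/sigma_i by sigma_i / (sigma_i**2 + lam**2)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    filt = s / (s**2 + lam**2)          # damped reciprocals
    return Vh.T @ (filt * (U.T @ b))

rng = np.random.default_rng(8)
A = rng.standard_normal((20, 10))
A[:, -1] = A[:, 0] + 1e-6 * rng.standard_normal(20)   # nearly rank-deficient
b = rng.standard_normal(20)

x_reg = tikhonov_solve(A, b, lam=1e-2)
# Agrees with the normal-equations form (A^T A + lam^2 I) x = A^T b.
x_ne = np.linalg.solve(A.T @ A + 1e-4 * np.eye(10), A.T @ b)
print(np.allclose(x_reg, x_ne))          # True
```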

Historical & philosophical context [Master]

The singular value decomposition has independent origins in the work of Eugenio Beltrami and Camille Jordan in the early 1870s. Beltrami, in Sulle funzioni bilineari (Giornale di Matematiche ad Uso degli Studenti Delle Università 11, 1873, 98–106), proved the existence of a bilinear-form factorisation of a square real matrix into two orthogonal transformations and a diagonal matrix of non-negative entries [Beltrami, E. — Sulle funzioni bilineari]. One year later, Camille Jordan, in Mémoire sur les formes bilinéaires (Journal de Mathématiques Pures et Appliquées, 2e série, 19, 1874, 35–54), gave an independent treatment for square matrices using a variational approach, characterising the singular values as critical values of the bilinear form on the product of two unit spheres [Jordan, C. — Mémoire sur les formes bilinéaires]. James Joseph Sylvester, in Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique (Comptes Rendus de l'Académie des Sciences 108, 1889, 651–653), extended the decomposition to rectangular matrices and named the diagonal entries the "canonical multipliers" of the form [Sylvester, J. J. — Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique].

The infinite-dimensional generalisation came from Erhard Schmidt in Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil (Mathematische Annalen 63, 1907, 433–476), in the context of Hilbert's programme on integral equations. Schmidt proved that every compact (then called "completely continuous") integral operator on $L^2$ admits the decomposition $T = \sum_i s_i \langle\,\cdot\,, e_i \rangle f_i$ with singular numbers $s_i \to 0$ — the Schmidt decomposition of compact operators [Schmidt, E. — Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil]. Hermann Weyl, in Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (Mathematische Annalen 71, 1912, 441–479) and subsequent papers, unified the spectral perspectives on Hermitian and non-Hermitian operators and gave the Weyl inequalities relating singular values to eigenvalues of arbitrary square matrices, central to the modern theory of matrix inequalities and majorisation [Weyl, H. — Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen].

The low-rank approximation theorem was proved by Carl Eckart and Gale Young in The approximation of one matrix by another of lower rank (Psychometrika 1, 1936, 211–218), motivated by the factor analysis of psychological-test data; the theorem identifies the truncated SVD as the optimal rank- approximation in Frobenius norm [Eckart, C. & Young, G. — The approximation of one matrix by another of lower rank]. Leon Mirsky, in Symmetric gauge functions and unitarily invariant norms (Quarterly Journal of Mathematics, Oxford Series, 11, 1960, 50–59), extended Eckart-Young to arbitrary unitarily invariant norms using von Neumann's earlier symmetric-gauge-function theorem [Mirsky, L. — Symmetric gauge functions and unitarily invariant norms].

The modern numerical algorithm for computing the SVD was developed by Gene Golub and William Kahan in Calculating the singular values and pseudo-inverse of a matrix (Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2, 1965, 205–224) and refined in Golub-Reinsch 1970, with bidiagonalisation by Householder reflections followed by a specialised QR sweep [Golub, G. H. & Kahan, W. — Calculating the singular values and pseudo-inverse of a matrix]. The Golub-Kahan algorithm and its descendants are implemented in LAPACK and underlie every numerical linear-algebra package. The SVD entered data analysis through factor analysis in psychometrics (Eckart-Young 1936; Tucker 1966), principal component analysis (Hotelling 1933 for the covariance-matrix eigenvalue formulation, recast via SVD in the second half of the twentieth century), and latent semantic indexing (Deerwester et al. 1990 for information retrieval). Recommender systems, image compression (JPEG-2000 uses SVD-adjacent transforms), and large-scale machine learning all rely on truncated-SVD computations.

Bibliography [Master]

Nineteenth-century origins.

  • Beltrami, E., "Sulle funzioni bilineari", Giornale di Matematiche ad Uso degli Studenti Delle Università 11 (1873), 98–106.
  • Jordan, C., "Mémoire sur les formes bilinéaires", Journal de Mathématiques Pures et Appliquées, 2e série, 19 (1874), 35–54.
  • Sylvester, J. J., "Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique", Comptes Rendus de l'Académie des Sciences 108 (1889), 651–653.

Hilbert-space generalisation.

  • Schmidt, E., "Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil", Mathematische Annalen 63 (1907), 433–476.
  • Weyl, H., "Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen", Mathematische Annalen 71 (1912), 441–479.

Low-rank approximation.

  • Eckart, C. & Young, G., "The approximation of one matrix by another of lower rank", Psychometrika 1 (1936), 211–218.
  • Mirsky, L., "Symmetric gauge functions and unitarily invariant norms", Quarterly Journal of Mathematics, Oxford Series, 11 (1960), 50–59.
  • von Neumann, J., "Some matrix inequalities and metrization of matric-space", Tomsk University Review 1 (1937), 286–300.

Numerical computation.

  • Golub, G. H. & Kahan, W., "Calculating the singular values and pseudo-inverse of a matrix", Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2 (1965), 205–224.
  • Golub, G. H. & Reinsch, C., "Singular value decomposition and least squares solutions", Numerische Mathematik 14 (1970), 403–420.
  • Golub, G. H. & Van Loan, C. F., Matrix Computations, 4th ed., Johns Hopkins University Press, 2013.
  • Trefethen, L. N. & Bau, D., Numerical Linear Algebra, SIAM, 1997.

Modern textbook treatments.

  • Horn, R. A. & Johnson, C. R., Matrix Analysis, 2nd ed., Cambridge University Press, 2013, Ch. 7.
  • Strang, G., Introduction to Linear Algebra, 5th ed., Wellesley-Cambridge Press, 2016, Ch. 7.
  • Stewart, G. W., "On the early history of the singular value decomposition", SIAM Review 35 (1993), 551–566.
  • Bhatia, R., Matrix Analysis, Graduate Texts in Mathematics 169, Springer, 1997.

Applications.

  • Hotelling, H., "Analysis of a complex of statistical variables into principal components", Journal of Educational Psychology 24 (1933), 417–441, 498–520.
  • Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R., "Indexing by latent semantic analysis", Journal of the American Society for Information Science 41 (1990), 391–407.
  • Candès, E. J. & Recht, B., "Exact matrix completion via convex optimization", Foundations of Computational Mathematics 9 (2009), 717–772.