01.01.12 · foundations / linear-algebra

Singular value decomposition (finite-dim)

Status: shipped · 3 tiers · Lean: partial

Anchor (Master): Horn-Johnson — Matrix Analysis Ch. 7; Golub-Van Loan — Matrix Computations Ch. 2 + Ch. 8; Trefethen-Bau — Numerical Linear Algebra Lectures 4–5; Kato — Perturbation Theory for Linear Operators Ch. I §5

Intuition [Beginner]

A linear map sends the unit disk in its input space to an ellipse — possibly squashed, possibly stretched, possibly rotated. The singular value decomposition is the recipe for that ellipse. It says every linear map, no matter how complicated its matrix looks, is built out of three simple ingredients in order: rotate the input, stretch along the perpendicular coordinate axes by independent factors, rotate the output.

The stretch factors are the singular values. They are non-negative numbers, ordered from largest to smallest. The largest singular value is the longest semi-axis of the output ellipse — the biggest stretch the map performs anywhere on the unit disk. The smallest non-zero singular value is the shortest semi-axis — the smallest stretch that does not collapse to a point. Zero singular values record directions the map collapses to a point.

Every matrix, square or rectangular, real or complex, admits this factorisation. The eigenvalue picture from 01.01.08 requires the matrix to be square, and even then the eigenvectors may fail to be orthogonal or fail to span the space. The singular value picture has none of those defects. It applies to every matrix, the singular vectors are always orthogonal, and they always span the input and output spaces.

Visual [Beginner]

The picture shows the unit disk in the plane, then four stages of the map $A = U\Sigma V^*$. Stage one is the original unit disk with its two coordinate axes drawn. Stage two is the rotation by $V^*$: the disk is unchanged but the axes have been turned. Stage three is the stretch by $\Sigma$: the disk has become an ellipse with semi-axes $\sigma_1$ and $\sigma_2$, the singular values. Stage four is the final rotation by $U$: the ellipse has been turned to its final orientation. The two singular values are the semi-axis lengths of the final ellipse.

On the left, a unit disk with two perpendicular coordinate axes; following three arrows to the right, three stages of the action of A equals U Sigma V-star are shown. First the rotation by V-star turns the input axes. Then the diagonal matrix Sigma stretches the disk into an ellipse with semi-axes of lengths sigma_1 and sigma_2. Finally the rotation by U turns the ellipse to its final orientation. The two singular values sigma_1 greater than or equal to sigma_2 are labelled as the semi-axes of the output ellipse, and the formula A equals U Sigma V-star appears at the top.

The geometry is the whole story. Any linear map factors as rotate, stretch along perpendicular axes, rotate. The number of non-zero stretch factors is the rank of the map. The largest stretch factor is the biggest number by which the map can magnify a vector — the operator norm — and the ratio of the largest to the smallest non-zero stretch is the condition number, the amplification of relative error in numerical computation.

Worked example [Beginner]

Take the two-by-two diagonal matrix

$$A = \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}, \qquad a \ge b > 0.$$

This matrix already stretches the horizontal axis by $a$ and the vertical axis by $b$, with no rotation. Its singular value decomposition is the simplest possible: $A = U\Sigma V^*$ with $\Sigma = A$ itself and $U = V = I$, the identity. The singular values are $\sigma_1 = a$ and $\sigma_2 = b$. The right singular vectors are the columns of the input identity, $e_1$ and $e_2$; the left singular vectors are the columns of the output identity, the same two vectors.

A less obvious example. Take

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

To find the singular values, compute $A^*A$:

$$A^*A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$

The eigenvalues of $A^*A$ are the roots of $\lambda^2 - 3\lambda + 1 = 0$, namely $\lambda = (3 \pm \sqrt{5})/2$. The singular values are the non-negative square roots: $\sigma_1 = (1 + \sqrt{5})/2 \approx 1.618$ and $\sigma_2 = (\sqrt{5} - 1)/2 \approx 0.618$. The product $\sigma_1 \sigma_2 = 1$ equals the absolute determinant of $A$, as it must.

What this tells us. The matrix has eigenvalue $1$ (with algebraic multiplicity $2$) but its action on the unit disk is highly non-uniform: it stretches one direction by a factor of $\approx 1.618$ and compresses another by a factor of $\approx 0.618$. The eigenvalues hide this asymmetry. The singular values expose it directly.
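A quick numerical check of this example (a sketch using NumPy's standard np.linalg routines):

```python
import numpy as np

# The shear matrix from the worked example.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# np.linalg.svd returns U, the singular values, and V* (named Vh here).
U, s, Vh = np.linalg.svd(A)

phi = (1 + np.sqrt(5)) / 2                              # golden ratio
print(np.isclose(s[0], phi), np.isclose(s[1], 1 / phi))  # True True
print(np.isclose(s[0] * s[1], abs(np.linalg.det(A))))    # product = |det A|

# Both eigenvalues are 1, yet the stretches differ by a factor of phi^2.
print(np.linalg.eigvals(A))                              # [1. 1.]
```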

Check your understanding [Beginner]

Formal definition [Intermediate+]

Let $\mathbb{F}$ denote $\mathbb{R}$ or $\mathbb{C}$, and write $A^*$ for the conjugate transpose of a matrix over $\mathbb{F}$ (so $A^* = A^\top$ over $\mathbb{R}$). A singular value decomposition of an $m \times n$ matrix $A$ over $\mathbb{F}$ is a factorisation

$$A = U\,\Sigma\,V^*$$

with:

  • $U \in \mathbb{F}^{m \times m}$ unitary, i.e., $U^*U = UU^* = I_m$;
  • $V \in \mathbb{F}^{n \times n}$ unitary, i.e., $V^*V = VV^* = I_n$;
  • $\Sigma \in \mathbb{R}^{m \times n}$ a rectangular "diagonal" matrix, meaning $\Sigma_{ij} = 0$ for $i \ne j$, with diagonal entries $\Sigma_{ii} = \sigma_i$ a decreasing sequence of non-negative real numbers $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$, where $p = \min(m, n)$.

The numbers $\sigma_1, \dots, \sigma_p$ are the singular values of $A$; the non-zero ones are $\sigma_1 \ge \cdots \ge \sigma_r > 0$, where $r = \operatorname{rank} A$. The columns $v_1, \dots, v_n$ of $V$ are the right singular vectors of $A$ — an orthonormal basis of the input space $\mathbb{F}^n$ — and the columns $u_1, \dots, u_m$ of $U$ are the left singular vectors — an orthonormal basis of the output space $\mathbb{F}^m$. The pairing $Av_i = \sigma_i u_i$ for $1 \le i \le p$ links the two bases [Horn, R. A. & Johnson, C. R. — Matrix Analysis (2nd ed.)].

Relation to eigenvalues of $A^*A$ and $AA^*$. The Gram matrices $A^*A$ ($n \times n$) and $AA^*$ ($m \times m$) are Hermitian and positive semidefinite, hence their eigenvalues are non-negative real numbers. The singular values of $A$ are the non-negative square roots of the eigenvalues of $A^*A$ (equivalently of $AA^*$; the two spectra agree up to $|m - n|$ extra zeros), counted with multiplicity. Concretely, if $v_1, \dots, v_n$ is an orthonormal eigenbasis for $A^*A$ with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$, then $\sigma_i = \sqrt{\lambda_i}$, the columns of $V$ are the right singular vectors, and the left singular vectors are obtained by setting $u_i = Av_i/\sigma_i$ when $\sigma_i > 0$.
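The recipe in the previous paragraph can be carried out numerically. The sketch below assumes a generic full-column-rank random matrix, so every $\sigma_i > 0$ and the division is safe:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # generic, hence full column rank

# Orthonormal eigenbasis of the Gram matrix A*A (Hermitian PSD).
lam, V = np.linalg.eigh(A.T @ A)         # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]           # reorder to decreasing
sigma = np.sqrt(np.clip(lam, 0, None))   # singular values

# Left singular vectors u_i = A v_i / sigma_i (broadcast over columns).
U = A @ V / sigma

# Check the reduced factorisation and the orthonormality of the columns.
print(np.allclose(A, U @ np.diag(sigma) @ V.T))   # True
print(np.allclose(U.T @ U, np.eye(3)))            # True
```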

Reduced versus full SVD. The factorisation written above is the full SVD, with $U$ of size $m \times m$ and $V$ of size $n \times n$. The reduced (or thin) SVD discards the columns of $U$ and $V$ that pair with zero singular values: writing $r = \operatorname{rank} A$ and letting $U_r \in \mathbb{F}^{m \times r}$ and $V_r \in \mathbb{F}^{n \times r}$ be the matrices whose columns are the first $r$ left and right singular vectors, and $\Sigma_r = \operatorname{diag}(\sigma_1, \dots, \sigma_r)$ the $r \times r$ diagonal of non-zero singular values, the reduced SVD is

$$A = U_r\,\Sigma_r\,V_r^* = \sum_{i=1}^{r} \sigma_i\, u_i v_i^*.$$

The dyadic sum on the right exhibits $A$ as a sum of rank-$1$ matrices, ordered by decreasing singular value. This is the form used by Eckart-Young: truncating the sum at $k < r$ produces the best rank-$k$ approximation to $A$ (master tier).
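In code, the thin factorisation and the dyadic sum look like this (a sketch with NumPy; full_matrices=False is the library's thin-SVD switch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Thin SVD: U is 6x4 and Vh is 4x4; the null-pairing columns are dropped.
U, s, Vh = np.linalg.svd(A, full_matrices=False)

# Dyadic expansion: A as a sum of rank-1 matrices sigma_i u_i v_i^*.
dyads = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
print(np.allclose(A, dyads))                      # True

# Truncating at k terms gives the rank-k matrix used by Eckart-Young.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
print(np.linalg.matrix_rank(A_k))                 # 2
```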

Uniqueness. The singular values are uniquely determined by $A$. The unitary matrices $U$ and $V$ are unique up to (i) replacing $(u_i, v_i)$ by $(\omega u_i, \omega v_i)$ for any unit scalar $\omega$ on each pair with a simple non-zero singular value (over $\mathbb{R}$ the phase is $\pm 1$), and (ii) replacing the columns corresponding to a repeated singular value $\sigma > 0$ by a unitary recombination within that block, and a corresponding unitary recombination on the kernel and cokernel (for $\sigma = 0$).

Operator norm and Frobenius norm. The operator norm (or spectral norm) of $A$ is $\|A\|_2 = \sigma_1$, the largest singular value. The Frobenius norm is $\|A\|_F = \big(\sum_{i,j} |a_{ij}|^2\big)^{1/2} = \big(\sum_i \sigma_i^2\big)^{1/2}$, the root-sum-square of the singular values. Both norms are unitarily invariant: $\|PAQ\| = \|A\|$ for any unitaries $P, Q$, which makes them functions of the singular values alone.
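Both identities are easy to confirm numerically (a sketch; np.linalg.norm with ord=2 and ord='fro' computes the two norms):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values only

# Operator norm = largest singular value; Frobenius = root-sum-square.
print(np.isclose(np.linalg.norm(A, 2), s[0]))                       # True
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))  # True

# Unitary invariance: an orthogonal factor does not change either norm.
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
print(np.isclose(np.linalg.norm(Q @ A, 2), s[0]))                   # True
```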

Counterexamples to common slips

  • Singular values are not eigenvalues. For the Jordan block $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, the only eigenvalue is $1$ with algebraic multiplicity $2$, but the singular values are $(1 + \sqrt{5})/2 \approx 1.618$ and $(\sqrt{5} - 1)/2 \approx 0.618$. The two notions coincide for Hermitian positive semidefinite matrices and otherwise differ.

  • The convention $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p$ is universal in the modern literature, but the matrix $\Sigma$ in the factorisation is not uniquely determined as a matrix on the nose without it — only its diagonal entries are (up to the ordering convention). Permutations of singular values force corresponding column permutations on $U$ and $V$.

  • The full SVD has $U$ of size $m \times m$ and $V$ of size $n \times n$, both square unitary. For a tall matrix ($m > n$), the matrix $\Sigma$ is $m \times n$ with the diagonal entries in its top $n \times n$ block and $m - n$ zero rows below; for a wide matrix ($m < n$), the diagonal entries fill an $m \times m$ block on the left and $n - m$ zero columns sit to the right. The reduced SVD keeps only the non-zero structure.

  • The eigenvalue decomposition $A = S\Lambda S^{-1}$ of a square diagonalisable matrix is, in general, not an SVD. The matrix $S$ need not be unitary, and $\Lambda$ may contain negative or complex entries. The SVD coincides with the eigendecomposition iff $A$ is Hermitian positive semidefinite, in which case $S$ can be chosen unitary and $\Lambda$ has non-negative real diagonal.

Key theorem with proof [Intermediate+]

Theorem (existence of the singular value decomposition). Let $A$ be an $m \times n$ matrix over $\mathbb{R}$ or $\mathbb{C}$, and let $p = \min(m, n)$. There exist unitary matrices $U$ ($m \times m$) and $V$ ($n \times n$) and an $m \times n$ rectangular diagonal matrix $\Sigma$ with real diagonal entries $\sigma_1 \ge \cdots \ge \sigma_p \ge 0$ such that $A = U\Sigma V^*$. [Horn, R. A. & Johnson, C. R. — Matrix Analysis (2nd ed.)]

Proof. The argument has three steps: build the right singular vectors from the spectral theorem on $A^*A$, define the left singular vectors as the rescaled images $Av_i/\sigma_i$, and complete to a full unitary by orthogonal extension.

Step 1: Right singular vectors and singular values. The matrix $A^*A$ is Hermitian, since $(A^*A)^* = A^*A$, and positive semidefinite: for any $x \in \mathbb{F}^n$, $x^*(A^*A)x = \|Ax\|^2 \ge 0$, with equality iff $Ax = 0$, iff $x \in \ker A$. By the finite-dimensional spectral theorem for Hermitian operators [01.01.08, master tier], $A^*A$ has an orthonormal eigenbasis with real non-negative eigenvalues. Let $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$ be these eigenvalues in decreasing order, and let $v_1, \dots, v_n$ be a corresponding orthonormal eigenbasis: $A^*Av_i = \lambda_i v_i$ and $\langle v_i, v_j \rangle = \delta_{ij}$.

Define $\sigma_i = \sqrt{\lambda_i}$ for $1 \le i \le n$. The number $\sigma_i$ is the $i$-th singular value of $A$. Let $r$ be the largest index with $\sigma_r > 0$, so $\sigma_1 \ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_n$. The rank of $A$ equals $r$ because $\|Av_i\|^2 = \langle v_i, A^*Av_i \rangle = \lambda_i$ (so $Av_i = 0$ exactly when $\lambda_i = 0$) and the rank of $A$ equals the number of non-zero eigenvalues of $A^*A$.

Step 2: Left singular vectors via the image map. For each $i$ with $\sigma_i > 0$, define the $i$-th left singular vector by

$$u_i = \frac{1}{\sigma_i}\, A v_i.$$

The vectors $u_1, \dots, u_r$ are orthonormal. To verify: for any $i, j$ with $\sigma_i, \sigma_j > 0$,

$$\langle u_i, u_j \rangle = \frac{1}{\sigma_i \sigma_j}\,\langle Av_i, Av_j \rangle = \frac{1}{\sigma_i \sigma_j}\,\langle v_i, A^*Av_j \rangle = \frac{\lambda_j}{\sigma_i \sigma_j}\,\langle v_i, v_j \rangle = \delta_{ij},$$

using $A^*Av_j = \lambda_j v_j$ and the orthonormality of the $v_i$. So $\{u_1, \dots, u_r\}$ is an orthonormal set.

Step 3: Completion and the matrix factorisation. Extend $\{u_1, \dots, u_r\}$ to an orthonormal basis $\{u_1, \dots, u_m\}$ of $\mathbb{F}^m$ by Gram-Schmidt or by completing through the orthogonal complement of the span. Form the matrices $U = [u_1 \cdots u_m]$ and $V = [v_1 \cdots v_n]$ with the singular vectors as columns. Both are unitary because their columns are orthonormal bases. Let $\Sigma$ be the $m \times n$ rectangular diagonal matrix with $\Sigma_{ii} = \sigma_i$ for $1 \le i \le \min(m, n)$ and zero off the diagonal.

Verify $A = U\Sigma V^*$ by computing both sides on the orthonormal basis $v_1, \dots, v_n$. For $i \le r$, the right-hand side gives $U\Sigma V^* v_i = U\Sigma e_i = \sigma_i u_i$, and the left-hand side gives $Av_i = \sigma_i u_i$ by definition of $u_i$. For $i > r$, the right-hand side gives $0$ (the $i$-th diagonal entry of $\Sigma$ is zero), and the left-hand side gives $Av_i = 0$ because $\|Av_i\|^2 = \lambda_i = 0$. The two matrices agree on a basis of $\mathbb{F}^n$, so they are equal.

Corollary (uniqueness of singular values). The singular values of $A$ are uniquely determined by $A$: any two singular value decompositions of $A$ have the same diagonal entries of $\Sigma$, listed in the same decreasing order. The proof follows because the $\sigma_i^2$ are the eigenvalues of $A^*A$, which are uniquely determined by $A$, and the decreasing-order convention fixes the order.

Bridge. The singular value decomposition is the universal factorisation of a linear map between inner-product spaces. The existence proof unpacks into four interlocking syntheses, each connecting the SVD to a downstream structure that uses singular values as its building block.

First synthesis: SVD as the orbit decomposition of bi-unitary action. The group $U(m) \times U(n)$ acts on $m \times n$ complex matrices by $(P, Q) \cdot A = PAQ^*$. The orbits of this action are classified by the singular value tuple $(\sigma_1, \dots, \sigma_p)$: two matrices lie in the same orbit iff they have the same singular values. The SVD itself is the statement that the orbit through $A$ contains a unique representative of the form $\Sigma$, rectangular diagonal with decreasing non-negative entries. This packages the entire theorem as the orbit-classification result for the bi-unitary action, an instance of the more general invariant-theoretic perspective on linear-algebra canonical forms.

Second synthesis: SVD and the Moore-Penrose pseudoinverse. The least-squares problem $\min_x \|Ax - b\|_2$ has a unique minimum-norm solution given by $x^+ = A^+ b$ with $A^+ = V\Sigma^+ U^*$ the Moore-Penrose pseudoinverse of $A$, where $\Sigma^+$ is the $n \times m$ rectangular diagonal matrix with reciprocated non-zero singular values. The SVD makes the pseudoinverse explicit and reveals its geometric content: $A^+$ inverts the action of $A$ on its row space and annihilates the cokernel. The pseudoinverse underlies linear regression, the normal equations, and the Tikhonov-regularised least-squares used when $\sigma_{\min}$ is small.

Third synthesis: SVD as the building block of principal component analysis. For a centred data matrix $X \in \mathbb{R}^{n \times d}$ with $n$ samples in $\mathbb{R}^d$, the SVD identifies the right singular vectors (columns of $V$) as the principal directions of the data: the directions of maximum variance, ordered by decreasing variance $\sigma_i^2/(n-1)$. Truncating the SVD at $k$ singular values projects the data onto the $k$-dimensional subspace capturing the maximum total variance — the principal component projection. The Eckart-Young theorem (master tier) is the optimality statement: this truncation is the best rank-$k$ approximation in Frobenius norm.
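A minimal PCA-via-SVD sketch (assuming Gaussian toy data; np.cov uses the same $n - 1$ normalisation):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # correlated
Xc = X - X.mean(axis=0)                  # centre the columns

U, s, Vh = np.linalg.svd(Xc, full_matrices=False)

# Principal directions are the right singular vectors (rows of Vh);
# the variance along direction i is sigma_i^2 / (n - 1).
explained = s**2 / (len(X) - 1)
print(np.allclose(explained, np.linalg.eigvalsh(np.cov(Xc.T))[::-1]))  # True

# Project onto the top-k principal directions.
k = 2
scores = Xc @ Vh[:k].T                   # n x k matrix of component scores
```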

Fourth synthesis: SVD as the finite-dimensional Schmidt decomposition. For compact operators on a separable Hilbert space, the Schmidt decomposition writes $T = \sum_i s_i \langle\,\cdot\,, e_i \rangle f_i$ with $(s_i)$ decreasing to zero and $(e_i)$, $(f_i)$ orthonormal sequences. Finite-rank operators recover the finite-dimensional SVD; general compact operators add the limiting condition $s_i \to 0$. The singular values are the singular numbers of $T$ in the infinite-dimensional setting, foundational for the theory of trace-class and Hilbert-Schmidt operators and for the quantum-information notion of entanglement entropy (where the squared singular values of the coefficient matrix of a bipartite pure state are the Schmidt coefficients).

Exercises [Intermediate+]

Lean formalization [Intermediate+]

Mathlib packages the constituent ingredients of the SVD — the conjugate transpose Matrix.conjTranspose, the Hermitian and positive-semidefinite predicates Matrix.IsHermitian and Matrix.PosSemidef, the lemma isHermitian_transpose_mul_self showing $A^\top A$ is Hermitian for every rectangular matrix $A$, the spectral theorem on Hermitian matrices, and the polar decomposition for square matrices over $\mathbb{R}$ or $\mathbb{C}$. The companion file Codex.Foundations.LinearAlgebra.SVD records the named statements used above.

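In place of the companion snippet that failed to render here, the following is a minimal Lean sketch of what the module might contain. It is a sketch under assumptions: the lemma names isHermitian_conjTranspose_mul_self and posSemidef_conjTranspose_mul_self are believed to exist in recent Mathlib but should be checked against your revision, and svd_exists is exactly the sorry-gated packaging described below.

```lean
import Mathlib

open Matrix

variable {m n : ℕ}

-- Ingredients Mathlib is believed to supply: the Gram matrix Aᴴ * A is
-- Hermitian and positive semidefinite for every rectangular A.
example (A : Matrix (Fin m) (Fin n) ℂ) : (Aᴴ * A).IsHermitian :=
  isHermitian_conjTranspose_mul_self A

example (A : Matrix (Fin m) (Fin n) ℂ) : (Aᴴ * A).PosSemidef :=
  posSemidef_conjTranspose_mul_self A

/-- Sorry-gated packaging of the rectangular SVD: `U` and `V` are unitary and
    `S` is a real matrix, non-negative and zero off the diagonal (the
    decreasing-order convention is omitted from this sketch). -/
theorem svd_exists (A : Matrix (Fin m) (Fin n) ℂ) :
    ∃ (U : Matrix (Fin m) (Fin m) ℂ) (V : Matrix (Fin n) (Fin n) ℂ)
      (S : Matrix (Fin m) (Fin n) ℝ),
      Uᴴ * U = 1 ∧ Vᴴ * V = 1 ∧
      (∀ i j, (i : ℕ) ≠ (j : ℕ) → S i j = 0) ∧
      (∀ i j, 0 ≤ S i j) ∧
      A = U * S.map Complex.ofReal * Vᴴ := by
  sorry
```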

This unit is marked lean_status: partial because Mathlib supplies the spectral theorem on Hermitian matrices and the constituent positivity and Hermitian-symmetry lemmas, but does not package the rectangular SVD theorem under a single name with a uniqueness statement and the Eckart-Young companion. The corresponding statement in the companion module is left as a sorry-gated alias pending that packaging.

Advanced results [Master]

Eckart-Young-Mirsky theorem. Let $A$ be an $m \times n$ matrix with SVD $A = U\Sigma V^*$, and let $\|\cdot\|$ be any unitarily invariant norm on $m \times n$ matrices — a norm satisfying $\|PAQ\| = \|A\|$ for every pair of unitaries $P, Q$. The best rank-$k$ approximation to $A$ in this norm is the truncation

$$A_k = \sum_{i=1}^{k} \sigma_i\, u_i v_i^*.$$

For the operator norm (spectral norm), the minimum is $\|A - A_k\|_2 = \sigma_{k+1}$. For the Frobenius norm, the minimum is $\|A - A_k\|_F = \big(\sum_{i>k} \sigma_i^2\big)^{1/2}$. For the nuclear norm (trace norm), the minimum is $\|A - A_k\|_* = \sum_{i>k} \sigma_i$. The original Eckart-Young (1936) treated the Frobenius case; Mirsky (1960) extended the theorem to arbitrary unitarily invariant norms via von Neumann's theorem that every such norm is a symmetric gauge function applied to the singular values [Eckart, C. & Young, G. — The approximation of one matrix by another of lower rank].
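Both minima are visible numerically (a sketch; the truncation is built from the thin SVD):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))
U, s, Vh = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]     # best rank-k approximation

# Spectral-norm error is sigma_{k+1}; Frobenius error is the tail RSS.
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))                          # True
print(np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:]**2))))  # True
```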

Proof sketch. Unitarily invariant norms depend only on singular values. The matrix $A - B$ has singular values bounded below by the Cauchy interlacing inequalities for compressions: writing $\sigma_i(M)$ for the $i$-th singular value of $M$, the Weyl inequality gives $\sigma_{i+j-1}(X + Y) \le \sigma_i(X) + \sigma_j(Y)$, and if $\operatorname{rank} B \le k$ then $\sigma_{k+1}(B) = 0$, forcing $\sigma_i(A - B) \ge \sigma_{i+k}(A)$ for all $i$ (take $X = A - B$, $Y = B$, $j = k + 1$). Hence the singular value sequence of $A - B$ majorises the tail sequence $(\sigma_{k+1}(A), \sigma_{k+2}(A), \dots)$, and any monotone unitarily invariant norm is minimised when this majorisation is tight, which is exactly the truncation $B = A_k$.

Polar decomposition and the matrix modulus. For a square matrix $A$ over $\mathbb{R}$ or $\mathbb{C}$, the polar decomposition $A = WP$ has $W$ unitary and $P$ Hermitian positive semidefinite. The factor $P$ is the modulus of $A$, denoted $|A|$ in operator theory, and equals the unique Hermitian positive semidefinite square root of $A^*A$. The polar decomposition is the matrix generalisation of the polar form $z = re^{i\theta}$ of a complex number, with $e^{i\theta}$ replaced by $W$ and $r$ replaced by $|A|$. The decomposition is unique when $A$ is invertible; when $A$ is singular the factor $|A|$ is still unique, but $W$ is only determined on the row space of $A$ and can be chosen arbitrarily (subject to unitarity) on the orthogonal complement. Polar decomposition exists in infinite dimensions for bounded operators on a Hilbert space (with $W$ a partial isometry in general) and is the starting point for the theory of polar coordinates in operator theory [Horn, R. A. & Johnson, C. R. — Matrix Analysis (2nd ed.)].
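The polar factors can be assembled directly from the SVD, via the factorisation $A = (UV^*)(V\Sigma V^*)$ used in the full proof set below (a sketch for a real square matrix):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

U, s, Vh = np.linalg.svd(A)
W = U @ Vh                     # unitary (here: orthogonal) factor
P = Vh.T @ np.diag(s) @ Vh     # Hermitian PSD factor, the modulus |A|

print(np.allclose(A, W @ P))            # A = W P
print(np.allclose(W.T @ W, np.eye(4)))  # W unitary
print(np.allclose(P @ P, A.T @ A))      # P is the PSD square root of A*A
```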

Moore-Penrose pseudoinverse. For an $m \times n$ matrix $A$ with SVD $A = U\Sigma V^*$, the Moore-Penrose pseudoinverse is

$$A^+ = V\,\Sigma^+\,U^*,$$

where $\Sigma^+$ is the $n \times m$ rectangular diagonal matrix with $\Sigma^+_{ii} = 1/\sigma_i$ for $1 \le i \le r$ and zero otherwise. The pseudoinverse satisfies the four Moore-Penrose conditions: (i) $AA^+A = A$, (ii) $A^+AA^+ = A^+$, (iii) $(AA^+)^* = AA^+$, (iv) $(A^+A)^* = A^+A$. These four conditions characterise $A^+$ uniquely. Application to least-squares: the equation $Ax = b$ may have no exact solution; the vector $x^+ = A^+b$ is the unique vector minimising $\|x\|_2$ among all minimisers of $\|Ax - b\|_2$, namely the minimum-norm minimiser. The vector $AA^+b$ is the orthogonal projection of $b$ onto the column space of $A$, and $I - A^+A$ is the orthogonal projection onto the kernel of $A$.
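The formula and the least-squares interpretation line up with the standard library routines (a sketch; the random matrix has full column rank, so every $\sigma_i > 0$):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((7, 4))
b = rng.standard_normal(7)

# Pseudoinverse from the SVD: A+ = V Sigma+ U*.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_pinv = Vh.T @ np.diag(1 / s) @ U.T
print(np.allclose(A_pinv, np.linalg.pinv(A)))        # True

# Minimum-norm least-squares solution, matching lstsq.
x = A_pinv @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))                       # True

# The four Moore-Penrose conditions.
print(np.allclose(A @ A_pinv @ A, A),
      np.allclose(A_pinv @ A @ A_pinv, A_pinv),
      np.allclose((A @ A_pinv).T, A @ A_pinv),
      np.allclose((A_pinv @ A).T, A_pinv @ A))       # all True
```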

Singular values and unitarily invariant norms. A symmetric gauge function on $\mathbb{R}^p$ is a norm that is invariant under permutations and sign changes of its arguments. By von Neumann's theorem (1937), unitarily invariant norms on $m \times n$ matrices are in bijection with symmetric gauge functions on $\mathbb{R}^p$, $p = \min(m, n)$: every unitarily invariant norm has the form $\|A\| = \Phi(\sigma_1(A), \dots, \sigma_p(A))$ for a unique symmetric gauge function $\Phi$. The three canonical examples — operator, Frobenius, nuclear — correspond to $\Phi = \ell^\infty, \ell^2, \ell^1$. This identifies the lattice of unitarily invariant matrix norms with the lattice of symmetric gauge functions, and underlies the Schatten $p$-norms $\|A\|_p = \big(\sum_i \sigma_i^p\big)^{1/p}$ for $1 \le p \le \infty$, interpolating between the nuclear ($p = 1$), Frobenius ($p = 2$), and operator ($p = \infty$) norms.

Schmidt decomposition for compact operators. A bounded linear operator $T$ on a separable Hilbert space $H$ is compact if it maps bounded sets to relatively compact sets, equivalently (in a Hilbert space) if it is the norm limit of a sequence of finite-rank operators. For compact $T$, the operator $T^*T$ is compact, self-adjoint, positive semidefinite, with a countable orthonormal eigenbasis $(e_i)$ and corresponding eigenvalues $\lambda_i \ge 0$ accumulating only at zero. Setting $s_i = \sqrt{\lambda_i}$ and $f_i = Te_i/s_i$ for $s_i > 0$, the Schmidt decomposition

$$T = \sum_i s_i\, \langle\,\cdot\,, e_i \rangle\, f_i$$

converges in the operator norm, with $s_i \to 0$. The numbers $s_i$ are the singular numbers of $T$. Trace-class operators are those with $\sum_i s_i < \infty$ (the trace norm is finite); Hilbert-Schmidt operators are those with $\sum_i s_i^2 < \infty$ (the Hilbert-Schmidt norm is finite). The Schatten $p$-class operators have $\sum_i s_i^p < \infty$; these are the operator-algebraic analogues of $\ell^p$ sequences and form the natural framework for non-commutative integration theory [Schmidt, E. — Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil].

Numerical computation. The Golub-Kahan SVD algorithm (1965) computes the SVD of an $m \times n$ matrix ($m \ge n$) in $O(mn^2)$ floating-point operations. Stage one reduces $A$ to bidiagonal form by a sequence of Householder reflections applied from the left and the right, costing $O(mn^2)$ operations. Stage two diagonalises the bidiagonal form by a variant of the QR algorithm specialised to bidiagonal matrices, costing $O(n)$ operations per sweep, with typically a handful of sweeps per singular value to reach accuracy on the order of the unit roundoff $\varepsilon$. The algorithm is numerically backward stable: the computed SVD is the exact SVD of a nearby matrix $A + \delta A$ with $\|\delta A\| = O(\varepsilon)\,\|A\|$. Modern implementations in LAPACK and the BLAS form the computational backbone of every linear-algebra package and underlie the matrix-factorisation methods in numerical optimisation, scientific computing, and machine learning [Golub, G. H. & Kahan, W. — Calculating the singular values and pseudo-inverse of a matrix].
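The backward-stability claim is observable from any library SVD (a sketch checking the reconstruction residual against the unit roundoff, not the Golub-Kahan internals):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((300, 200))

U, s, Vh = np.linalg.svd(A, full_matrices=False)

# The relative reconstruction error sits at a small multiple of machine
# epsilon, consistent with backward stability of bidiagonalisation + QR.
err = np.linalg.norm(A - (U * s) @ Vh, 2) / np.linalg.norm(A, 2)
print(err, np.finfo(A.dtype).eps)        # err ~ 1e-15, eps ~ 2.2e-16
```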

Conjugacy classes of the bi-unitary action. The group $U(m) \times U(n)$ acts on $\mathbb{C}^{m \times n}$ by $(P, Q) \cdot A = PAQ^*$. The orbits of this action are parametrised by the singular value tuple $(\sigma_1, \dots, \sigma_p)$: two matrices lie in the same orbit iff they have the same singular values. The orbit space is the Weyl chamber $\{\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0\} \subset \mathbb{R}^p$. The SVD is the orbit-decomposition: every orbit contains a unique representative of diagonal form with decreasing non-negative entries. This perspective places the SVD in parallel with the Jordan canonical form 01.01.11 (orbit decomposition of $\mathbb{C}^{n \times n}$ under conjugation by $GL_n(\mathbb{C})$) and the Cartan decomposition $G = KAK$ for a real reductive Lie group $G$ with maximal compact subgroup $K$ and maximal abelian subgroup $A$: in the case $G = GL_n(\mathbb{C})$, $K = U(n)$ (acting bi-unitarily), and $A$ the positive diagonal matrices, the Cartan decomposition is the SVD.

Synthesis. First synthesis: SVD as the universal generalisation of diagonalisation. Every matrix admits an SVD; only Hermitian positive semidefinite matrices admit a unitary eigendecomposition coinciding with the SVD. The eigenvalue picture from 01.01.08 and the Jordan canonical form from 01.01.11 capture intrinsic features of a square operator under conjugation; the SVD captures extrinsic features of a linear map between inner-product spaces under the bi-unitary action. The two perspectives are complementary: similarity vs unitary equivalence.

Second synthesis: SVD as the geometric content of $A = U\Sigma V^*$. The singular values are the semi-axes of the image ellipsoid of the unit ball, and the unitary factors rotate that ellipsoid into standard position. This packages the operator norm (largest singular value), the condition number ($\kappa = \sigma_1/\sigma_n$ for invertible $A$), the rank (number of non-zero singular values), the kernel (span of right singular vectors with zero singular value), the image (span of left singular vectors with non-zero singular value), and the low-rank structure (truncated SVD) into a single coordinate-free package.

Third synthesis: SVD as the foundation of least-squares and data analysis. The Moore-Penrose pseudoinverse delivers the minimum-norm least-squares solution to $Ax = b$ in one formula. Principal component analysis is the SVD of the centred data matrix. Latent semantic indexing is the truncated SVD of the term-document matrix. Image compression is the truncated SVD of a grayscale pixel matrix. Recommender systems use matrix completion algorithms that fit a low-rank SVD model to sparse observed entries. In each case, the SVD identifies the dominant structure and quantifies the discarded residual.

Fourth synthesis: SVD as the spectral theorem in disguise. The SVD of $A$ encodes two paired spectral theorems, one on $A^*A$ and one on $AA^*$, with the operator $A$ as the intertwiner. The right singular vectors diagonalise $A^*A$, the left singular vectors diagonalise $AA^*$, and the singular values appear as the square roots of the shared non-zero eigenvalues. In the infinite-dimensional Hilbert-space setting, this generalises to the Schmidt decomposition for compact operators and to the polar decomposition for arbitrary bounded operators, with the positive part $|T|$ playing the role of the singular value spectrum and the partial-isometry phase playing the role of the unitary factors. The SVD is therefore the finite-dimensional shadow of the spectral and polar decompositions of operator theory.

Full proof set [Master]

Existence of the SVD via the spectral theorem on $A^*A$. Let $A \in \mathbb{F}^{m \times n}$ with $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$. The Gram matrix $A^*A$ is Hermitian, since $(A^*A)^* = A^*A$, and positive semidefinite, since $x^*(A^*A)x = \|Ax\|^2 \ge 0$ for every $x$. By the finite-dimensional spectral theorem on Hermitian operators on an inner-product space, $A^*A$ admits an orthonormal eigenbasis $v_1, \dots, v_n$ with real eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$. Set $\sigma_i = \sqrt{\lambda_i}$ and let $r$ be the number of non-zero $\sigma_i$.

For $1 \le i \le r$, define $u_i = Av_i/\sigma_i$. The set $\{u_1, \dots, u_r\}$ is orthonormal: $\langle u_i, u_j \rangle = \frac{1}{\sigma_i \sigma_j}\langle Av_i, Av_j \rangle = \frac{\lambda_j}{\sigma_i \sigma_j}\langle v_i, v_j \rangle = \delta_{ij}$. Extend to an orthonormal basis $u_1, \dots, u_m$ of $\mathbb{F}^m$ via Gram-Schmidt on the orthogonal complement of the span of the first $r$. Let $U = [u_1 \cdots u_m]$, $V = [v_1 \cdots v_n]$, and $\Sigma$ the $m \times n$ rectangular diagonal matrix with $\Sigma_{ii} = \sigma_i$ for $1 \le i \le \min(m, n)$. Verify on the basis $v_1, \dots, v_n$: for $i \le r$, $U\Sigma V^* v_i = U\Sigma e_i = \sigma_i u_i = Av_i$ (here $V^* v_i = e_i$, a column of the identity), and for $i > r$, $U\Sigma V^* v_i = 0 = Av_i$ because $\|Av_i\|^2 = \lambda_i = 0$. Reassembling in matrix form, $AV = U\Sigma$, equivalently $A = U\Sigma V^*$.

Uniqueness of singular values. The squared singular values are the eigenvalues of the Hermitian matrix $A^*A$, which are uniquely determined by $A$ as the roots of the characteristic polynomial $\det(\lambda I - A^*A)$, counted with algebraic multiplicity, and the spectral theorem confirms that algebraic multiplicity equals geometric multiplicity for Hermitian matrices. The decreasing-order convention then fixes the singular value sequence. The matrices $U$ and $V$ are unique up to a unitary block on each constant-singular-value subspace: replacing $V$ by $VW$ with $W$ unitary and block-diagonal with respect to the equal-singular-value blocks, and correspondingly $U$ by $UW'$ with $W'$ agreeing with $W$ on the blocks of non-zero singular values (the blocks pairing with the kernel and cokernel are free). For simple non-zero singular values, the freedom reduces to a phase $e^{i\theta}$ (a sign $\pm 1$ over $\mathbb{R}$).

Existence of polar decomposition. Starting from the SVD of a square matrix, factor $A = U\Sigma V^* = (UV^*)(V\Sigma V^*)$. Set $W = UV^*$ (unitary) and $P = V\Sigma V^*$. Then $P$ is Hermitian ($P^* = V\Sigma^* V^* = P$, since $\Sigma$ is real diagonal) and positive semidefinite ($x^*Px = (V^*x)^*\Sigma(V^*x) \ge 0$). The identity $P^2 = V\Sigma^2 V^* = A^*A$ (after expanding $A^*A = V\Sigma U^* U \Sigma V^*$ using $U^*U = I$ and $V^*V = I$) confirms $P = \sqrt{A^*A} = |A|$, the unique Hermitian positive semidefinite square root.

Uniqueness of polar decomposition for invertible $A$. Suppose $A = WP$ with $W$ unitary and $P$ Hermitian positive semidefinite. Then $A^*A = PW^*WP = P^2$. The Hermitian positive semidefinite square root is unique, so $P = \sqrt{A^*A}$. When $A$ is invertible, $P$ is invertible and $W = AP^{-1}$, so the unitary factor is forced as well.

Moore-Penrose pseudoinverse characterisation. Let $A = U\Sigma V^*$ and set $A^+ = V\Sigma^+ U^*$ with $\Sigma^+$ the $n \times m$ rectangular diagonal matrix with $\Sigma^+_{ii} = 1/\sigma_i$ for $1 \le i \le r$ and zero otherwise. Verify the four Moore-Penrose conditions:

  • $AA^+A = U\Sigma V^*\, V\Sigma^+ U^*\, U\Sigma V^* = U(\Sigma\Sigma^+\Sigma)V^* = U\Sigma V^* = A$, using $V^*V = I_n$, $U^*U = I_m$, and the diagonal identity $\Sigma\Sigma^+\Sigma = \Sigma$ (the rectangular diagonal matrix satisfies $\sigma_i \cdot \sigma_i^+ \cdot \sigma_i = \sigma_i$ whether $\sigma_i > 0$ or not).
  • $A^+AA^+ = A^+$ by the analogous diagonal identity $\Sigma^+\Sigma\Sigma^+ = \Sigma^+$.
  • $(AA^+)^* = (U\,\Sigma\Sigma^+\,U^*)^* = U\,(\Sigma\Sigma^+)^*\,U^* = U\,\Sigma\Sigma^+\,U^* = AA^+$, since $\Sigma\Sigma^+$ is a real diagonal matrix with entries in $\{0, 1\}$, hence Hermitian.
  • $(A^+A)^* = A^+A$ by the same argument.

Uniqueness follows from the four conditions: any $B$ satisfying $ABA = A$, $BAB = B$, $(AB)^* = AB$, $(BA)^* = BA$ equals $A^+$. Indeed $AB = (AB)^* = B^*(AA^+A)^* = (AB)^*(AA^+) = (ABA)A^+ = AA^+$, and symmetrically $BA = A^+A$; hence $B = BAB = (BA)B = A^+(AB) = A^+AA^+ = A^+$ — a sequence of substitutions using the four conditions forces $B = A^+$.

Eckart-Young in operator norm. Let $A = U\Sigma V^*$ and $A_k = \sum_{i=1}^k \sigma_i u_i v_i^*$. Suppose $B$ has rank at most $k$; the goal is $\|A - B\|_2 \ge \sigma_{k+1} = \|A - A_k\|_2$. The kernel of $B$ has dimension at least $n - k$, so there exist orthonormal vectors in $\ker B$ — by dimension count, $\ker B$ intersects $\operatorname{span}(v_1, \dots, v_{k+1})$ in a subspace of dimension at least $1$. Pick a unit vector $w$ in that intersection, write $w = \sum_{i=1}^{k+1} c_i v_i$ with $\sum_i |c_i|^2 = 1$. Then

$$\|(A - B)w\|_2^2 = \|Aw\|_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 |c_i|^2 \ge \sigma_{k+1}^2,$$

using $Bw = 0$ and $\sigma_i \ge \sigma_{k+1}$ for $i \le k+1$. So $\|A - B\|_2 \ge \sigma_{k+1}$, proving Eckart-Young in operator norm.

For the Frobenius norm, the same dimension-count argument generalises. Choose orthonormal vectors $w_1, \dots, w_{n-k}$ in $\ker B$ (dimension at least $n - k$ when $\operatorname{rank} B \le k$). Then $\|A - B\|_F^2 \ge \sum_{j=1}^{n-k} \|(A - B)w_j\|_2^2 = \sum_j \|Aw_j\|_2^2$, and the right-hand side is bounded below by $\sum_{i > k} \sigma_i^2$ via a Courant-Fischer-type minimax argument on the singular value sum. Hence $\|A - B\|_F^2 \ge \sum_{i > k} \sigma_i^2 = \|A - A_k\|_F^2$.

The Mirsky extension to arbitrary unitarily invariant norms uses the Ky Fan dominance theorem: a vector $x$ weakly majorises $y$ iff $\Phi(x) \ge \Phi(y)$ for every symmetric gauge function $\Phi$. Since the singular value sequence of $A - B$ weakly majorises that of $A - A_k$ for every $B$ of rank at most $k$, every unitarily invariant norm achieves its minimum on the Eckart-Young truncation $A_k$.

Connections [Master]

  • Eigenvalue, eigenvector, characteristic polynomial 01.01.08 — supplies the spectral theorem on Hermitian operators that drives the existence proof of the SVD. The right singular vectors of $A$ are the eigenvectors of $A^*A$; the singular values are the non-negative square roots of the eigenvalues of $A^*A$; the left singular vectors are the eigenvectors of $AA^*$. The SVD is therefore the linear-algebra construction that converts the spectral theorem on the Hermitian positive semidefinite Gram matrix into a factorisation of $A$ itself, applicable to arbitrary rectangular matrices. The Cayley-Hamilton identity from 01.01.08 specialises on $A^*A$ to give finite-dimensional polynomial identities for the singular values, including $\det(A^*A) = \prod_{i=1}^{n} \sigma_i^2$.

  • Jordan canonical form and minimal polynomial 01.01.11 — the SVD and the Jordan canonical form are the two universal classifications of matrices, parallel and complementary. The Jordan form is the complete invariant of an $n \times n$ matrix under conjugation by $GL_n(\mathbb{C})$; the SVD is the complete invariant of an $m \times n$ matrix under the bi-unitary action of $U(m) \times U(n)$. The Jordan form respects the algebraic structure of the operator (eigenvalues, generalised eigenspaces, nilpotent part); the SVD respects the geometric structure of the linear map between inner-product spaces (singular values as semi-axes of the image ellipsoid). For Hermitian positive semidefinite matrices, the two coincide: the Jordan form is diagonal, the SVD has $U = V$, and the singular values equal the eigenvalues. For everything else they diverge — and both diverge from the eigendecomposition picture.

  • Inner-product space: orthogonality, Gram-Schmidt, spectral theorem 01.01.09 pending — supplies the orthonormalisation machinery and the finite-dimensional spectral theorem on which the SVD existence proof depends. The Gram-Schmidt procedure of 01.01.09 pending is used in the existence proof to extend the partial orthonormal set of left singular vectors to a full orthonormal basis of $\mathbb{F}^m$, completing the unitary $U$. The spectral theorem on the Hermitian operator $A^*A$ is the load-bearing input. The SVD is the rectangular extension of the spectral theorem and reduces to the spectral theorem when $A$ is itself Hermitian positive semidefinite.

  • Bounded and unbounded operators on Hilbert space 02.11.03 — the infinite-dimensional generalisation in which the finite-dimensional SVD becomes the Schmidt decomposition for compact operators on a Hilbert space, the polar decomposition $T = W|T|$ for bounded operators (with $W$ a partial isometry and $|T| = \sqrt{T^*T}$), and the spectral measure for self-adjoint operators (Stone-von Neumann). The singular numbers of a compact operator $T$ — the non-zero eigenvalues of $|T|$ — generalise singular values and underlie the Schatten $p$-class operators, the trace-class operators ($p = 1$), the Hilbert-Schmidt operators ($p = 2$), and the operator algebras of non-commutative integration. The finite-dimensional SVD is the rank-finite, no-limiting-condition shadow of this hierarchy.

  • Least-squares regression and the normal equations [TODO future unit on regression] — the SVD-based pseudoinverse solves the general least-squares problem $\min_x \|Ax - b\|_2$ in one closed-form expression. The classical normal equations $A^*Ax = A^*b$ recover the same solution when $A$ has full column rank, but break down when $A$ is rank-deficient or ill-conditioned. The pseudoinverse handles both cases uniformly: it returns the minimum-norm least-squares solution, projecting $b$ onto the column space of $A$ and resolving the kernel degree of freedom by minimising $\|x\|_2$. Tikhonov regularisation is equivalent to filtering the singular values, replacing $1/\sigma_i$ by $\sigma_i/(\sigma_i^2 + \lambda^2)$ in the pseudoinverse expansion, and provides numerical stability when $\sigma_{\min}$ is small (see the sketch after this list).

  • Principal component analysis and data dimensionality reduction [TODO future unit] — for a centred data matrix $X \in \mathbb{R}^{n \times d}$ with $n$ samples in $\mathbb{R}^d$, the right singular vectors of $X$ (columns of $V$) are the principal directions of the data, ordered by decreasing variance $\sigma_i^2/(n-1)$. The $k$-truncated SVD is the best rank-$k$ approximation in Frobenius norm by Eckart-Young; projecting the data onto the span of $v_1, \dots, v_k$ retains the maximum total variance among all $k$-dimensional linear subspaces. PCA is therefore the SVD of the data matrix, with the singular values quantifying the variance explained. Variants — kernel PCA, sparse PCA, robust PCA — replace the Euclidean inner product, the rank-$k$ constraint, or the assumed Gaussian noise model, but all share the SVD-of-a-data-matrix scaffolding.
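The Tikhonov filter mentioned in the least-squares connection above is a one-line change to the pseudoinverse expansion (a sketch; tikhonov_solve is a hypothetical helper, and the regularised normal equations serve as the cross-check):

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Regularised least squares via singular-value filtering:
    replace 1/sigma_i by sigma_i / (sigma_i**2 + lam**2)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    filt = s / (s**2 + lam**2)          # damped reciprocals
    return Vh.T @ (filt * (U.T @ b))

rng = np.random.default_rng(8)
A = rng.standard_normal((20, 10))
A[:, -1] = A[:, 0] + 1e-6 * rng.standard_normal(20)   # nearly rank-deficient
b = rng.standard_normal(20)

x_reg = tikhonov_solve(A, b, lam=1e-2)
# Agrees with the normal-equations form (A^T A + lam^2 I) x = A^T b.
x_ne = np.linalg.solve(A.T @ A + 1e-4 * np.eye(10), A.T @ b)
print(np.allclose(x_reg, x_ne))          # True
```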

Historical & philosophical context [Master]

The singular value decomposition has independent origins in the work of Eugenio Beltrami and Camille Jordan in the early 1870s. Beltrami, in Sulle funzioni bilineari (Giornale di Matematiche ad Uso degli Studenti Delle Università 11, 1873, 98–106), proved the existence of a bilinear-form factorisation of a square real matrix into two orthogonal transformations and a diagonal matrix of non-negative entries [Beltrami, E. — Sulle funzioni bilineari]. One year later, Camille Jordan, in Mémoire sur les formes bilinéaires (Journal de Mathématiques Pures et Appliquées, 2e série, 19, 1874, 35–54), gave an independent treatment for square matrices using a variational approach, characterising the singular values as critical values of the bilinear form on the product of two unit spheres [Jordan, C. — Mémoire sur les formes bilinéaires]. James Joseph Sylvester, in Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique (Comptes Rendus de l'Académie des Sciences 108, 1889, 651–653), extended the decomposition to rectangular matrices and named the diagonal entries the "canonical multipliers" of the form [Sylvester, J. J. — Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique].

The infinite-dimensional generalisation came from Erhard Schmidt in Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil (Mathematische Annalen 63, 1907, 433–476), in the context of Hilbert's programme on integral equations. Schmidt proved that every compact (then called "completely continuous") integral operator on $L^2$ admits the decomposition $T = \sum_i s_i \langle\,\cdot\,, e_i \rangle f_i$ with singular numbers $s_i \to 0$ — the Schmidt decomposition of compact operators [Schmidt, E. — Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil]. Hermann Weyl, in Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen (Mathematische Annalen 71, 1912, 441–479) and subsequent papers, unified the spectral perspectives on Hermitian and non-Hermitian operators and gave the Weyl inequalities relating singular values to eigenvalues of arbitrary square matrices, central to the modern theory of matrix inequalities and majorisation [Weyl, H. — Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen].

The low-rank approximation theorem was proved by Carl Eckart and Gale Young in The approximation of one matrix by another of lower rank (Psychometrika 1, 1936, 211–218), motivated by the factor analysis of psychological-test data; the theorem identifies the truncated SVD as the optimal rank- approximation in Frobenius norm [Eckart, C. & Young, G. — The approximation of one matrix by another of lower rank]. Leon Mirsky, in Symmetric gauge functions and unitarily invariant norms (Quarterly Journal of Mathematics, Oxford Series, 11, 1960, 50–59), extended Eckart-Young to arbitrary unitarily invariant norms using von Neumann's earlier symmetric-gauge-function theorem [Mirsky, L. — Symmetric gauge functions and unitarily invariant norms].

The modern numerical algorithm for computing the SVD was developed by Gene Golub and William Kahan in Calculating the singular values and pseudo-inverse of a matrix (Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2, 1965, 205–224) and refined in Golub-Reinsch 1970, with bidiagonalisation by Householder reflections followed by a specialised QR sweep [Golub, G. H. & Kahan, W. — Calculating the singular values and pseudo-inverse of a matrix]. The Golub-Kahan algorithm and its descendants are implemented in LAPACK and underlie every numerical linear-algebra package. The SVD entered data analysis through factor analysis in psychometrics (Eckart-Young 1936; Tucker 1966), principal component analysis (Hotelling 1933 for the covariance-matrix eigenvalue formulation, recast via SVD in the second half of the twentieth century), and latent semantic indexing (Deerwester et al. 1990 for information retrieval). Recommender systems, image compression (JPEG-2000 uses SVD-adjacent transforms), and large-scale machine learning all rely on truncated-SVD computations.

Bibliography [Master]

Nineteenth-century origins.

  • Beltrami, E., "Sulle funzioni bilineari", Giornale di Matematiche ad Uso degli Studenti Delle Università 11 (1873), 98–106.
  • Jordan, C., "Mémoire sur les formes bilinéaires", Journal de Mathématiques Pures et Appliquées, 2e série, 19 (1874), 35–54.
  • Sylvester, J. J., "Sur la réduction biorthogonale d'une forme linéo-linéaire à sa forme canonique", Comptes Rendus de l'Académie des Sciences 108 (1889), 651–653.

Hilbert-space generalisation.

  • Schmidt, E., "Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil", Mathematische Annalen 63 (1907), 433–476.
  • Weyl, H., "Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen", Mathematische Annalen 71 (1912), 441–479.

Low-rank approximation.

  • Eckart, C. & Young, G., "The approximation of one matrix by another of lower rank", Psychometrika 1 (1936), 211–218.
  • Mirsky, L., "Symmetric gauge functions and unitarily invariant norms", Quarterly Journal of Mathematics, Oxford Series, 11 (1960), 50–59.
  • von Neumann, J., "Some matrix inequalities and metrization of matric-space", Tomsk University Review 1 (1937), 286–300.

Numerical computation.

  • Golub, G. H. & Kahan, W., "Calculating the singular values and pseudo-inverse of a matrix", Journal of the Society for Industrial and Applied Mathematics, Series B: Numerical Analysis 2 (1965), 205–224.
  • Golub, G. H. & Reinsch, C., "Singular value decomposition and least squares solutions", Numerische Mathematik 14 (1970), 403–420.
  • Golub, G. H. & Van Loan, C. F., Matrix Computations, 4th ed., Johns Hopkins University Press, 2013.
  • Trefethen, L. N. & Bau, D., Numerical Linear Algebra, SIAM, 1997.

Modern textbook treatments.

  • Horn, R. A. & Johnson, C. R., Matrix Analysis, 2nd ed., Cambridge University Press, 2013, Ch. 7.
  • Strang, G., Introduction to Linear Algebra, 5th ed., Wellesley-Cambridge Press, 2016, Ch. 7.
  • Stewart, G. W., "On the early history of the singular value decomposition", SIAM Review 35 (1993), 551–566.
  • Bhatia, R., Matrix Analysis, Graduate Texts in Mathematics 169, Springer, 1997.

Applications.

  • Hotelling, H., "Analysis of a complex of statistical variables into principal components", Journal of Educational Psychology 24 (1933), 417–441, 498–520.
  • Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R., "Indexing by latent semantic analysis", Journal of the American Society for Information Science 41 (1990), 391–407.
  • Candès, E. J. & Recht, B., "Exact matrix completion via convex optimization", Foundations of Computational Mathematics 9 (2009), 717–772.