The SVD as a Classification Theorem
Pre-requisites: Linear algebra.
Introduction
The Singular Value Decomposition (SVD) is an extremely important concept in linear algebra with practical applications ranging from image processing and matrix inversion to machine learning and statistics. However, despite its importance and (I would argue) how fundamental it is, the SVD is often presented in an exceptionally confusing manner. It took me a long time to appreciate the SVD for what it is because of this. As a prime example, Wikipedia defines the SVD as:
[T]he singular value decomposition of an $m \times n$ complex matrix $M$ is a factorization of the form $M = U\Sigma V^*$, where $U$ is an $m \times m$ complex unitary matrix, $\Sigma$ is an $m \times n$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $V$ is an $n \times n$ complex unitary matrix.
What a mouthful! Wikipedia defines the SVD in a computationally useful way, but I think that it misses the point entirely. In this post, I try to explain how I think of the SVD: as a classification of maps between Hilbert spaces.
Conventions
In this post, I will use "vector space" to mean "finite dimensional vector space over $\mathbb{R}$ or $\mathbb{C}$." $\mathbb{R}$ or $\mathbb{C}$ is called the base field of the vector space, and when the base field is not important, I will use $F$ to represent the base field.
Some terminology requires explanation. When I use map, I mean "linear function". An isomorphism between two vector spaces is an invertible map. An isomorphism from a vector space to itself is called an automorphism, and is equivalent to a "change of basis". In each section, I talk about classifying maps "up to isomorphism". What it means for two maps to be isomorphic differs from section to section, and will be explained in each section.
Classifying Maps Between Vector Spaces
Linear algebra is ostensibly the study of vector spaces, but really, it's all about maps between vector spaces. For one thing, vector spaces are easy to classify: every $n$-dimensional vector space is isomorphic to $F^n$. As such, you can use a single integer to completely classify a vector space. On the other hand, maps between vector spaces are spicy. You can add maps together and compose them to get new maps. And, unlike vector spaces, you can't use a single number to classify all vector space maps.
... Right?
Before we can answer that question, I need to discuss what I actually mean when I talk about "classifying" a map. Two maps may behave the same even if they are not equal. For example, consider the following two maps $T, S: F^2 \to F^2$:
These two maps are not equal, but they are pretty similar. In fact, if you swap the two basis vectors of $F^2$, then $T$ becomes $S$ and $S$ becomes $T$. That is, $T$ and $S$ are the same up to a change of basis of $F^2$. Informally, if you look at $T$ from a "different perspective", then $T$ "looks like" $S$. For the purposes of this section, that's good enough for us. Similarly, the following two maps are related by a change of basis of $F^2$ (exercise: what is the change of basis?), and thus behave "pretty much the same".
Definition: For this section only, two maps $T, S: V \to W$ are considered isomorphic if they are the same up to an automorphism (i.e. change of basis) of $V$ and of $W$.
In other words, $T$ and $S$ are isomorphic if there exist automorphisms $P: V \to V$ and $Q: W \to W$ such that $S = Q \circ T \circ P$.
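To make this concrete, here is a small numpy sketch; the two maps and the change of basis are hypothetical choices of mine (two rank-one maps on $\mathbb{R}^2$ related by swapping the two basis vectors):

```python
import numpy as np

# Two hypothetical maps on R^2 (chosen for illustration): T projects onto the
# first coordinate, S projects onto the second.
T = np.array([[1.0, 0.0],
              [0.0, 0.0]])
S = np.array([[0.0, 0.0],
              [0.0, 1.0]])

# The change of basis that swaps the two basis vectors (it is its own inverse).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# With P acting on the domain and P^{-1} on the codomain, T turns into S,
# so T and S are isomorphic in the sense of the definition above.
print(np.allclose(np.linalg.inv(P) @ T @ P, S))  # True
```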
In future sections, we will look at maps with more structure and adopt stricter definitions of "isomorphism".
So, what properties of $T$ are invariant under isomorphism? One important property of $T$ is its image, but its image is a subspace of $W$, and usually isn't invariant under an automorphism of $W$. However, the dimension of a subspace doesn't change under an automorphism (exercise: why not?). The dimension of the image is called the rank of $T$, and is an important invariant of $T$.
Another important subspace associated with $T$ is its nullspace in $V$. Once again, the nullspace of $T$ is not invariant under isomorphism, but its dimension is. This quantity is called the nullity of $T$. The well-known rank-nullity theorem states that the rank of $T$ plus its nullity is equal to $\dim V$, so either one can be computed from the other.
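As a quick sanity check of the rank-nullity relationship, here is a small numpy sketch (the matrix is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary map from R^5 to R^3, represented as a 3x5 matrix.
A = rng.standard_normal((3, 5))

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank  # rank-nullity: rank + nullity = dimension of the domain

print(rank, nullity)  # 3 2 for a generic 3x5 matrix
```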
Surprisingly, the rank (or nullity) completely determines a map up to isomorphism.
Theorem: A map $T: V \to W$ is completely determined up to isomorphism by its rank.
Proof:
Let $r$ denote the rank of $T$. Find a basis $v_1, \dots, v_n$ for $V$ where $v_{r+1}$ through $v_n$ span the nullspace of $T$. Let $w_i = Tv_i$ for $1 \leq i \leq r$. Since no nontrivial combination of the $v_i$'s for $i \leq r$ is in the nullspace of $T$, these $w_i$'s are linearly independent. Thus, they can be completed to a basis $w_1, \dots, w_m$ of $W$. $T$ sends the first $r$ basis vectors of $V$ to the first $r$ basis vectors of $W$, and nullifies the other basis vectors of $V$. In other words, $T$ has the following form with respect to our chosen bases:

$$\begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}$$
Any other map from $V$ to $W$ with the same rank as $T$ can be brought into the same form. Thus, $T$ is completely specified up to isomorphism by its rank.
In the above proof, we showed that $T$ is unique up to isomorphism by exhibiting a canonical form for $T$. That is, by writing $T$ in a simple form with respect to a particularly nice choice of bases. This strategy is very important, and will show up again when we come to the SVD.
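To see this canonical form concretely, here is a numpy sketch of my own; it uses the SVD (purely as a computational convenience, since numpy provides it) to produce invertible changes of basis that bring an arbitrary matrix into the block form above:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))          # an arbitrary map from R^5 to R^3
m, n = A.shape
r = np.linalg.matrix_rank(A)

# Build invertible changes of basis Q (codomain) and P (domain) from the SVD.
U, s, Vh = np.linalg.svd(A)
D = np.eye(m)
D[:r, :r] = np.diag(1.0 / s[:r])         # rescale the nonzero directions
Q = D @ U.T                              # invertible map on the codomain
P = Vh.T                                 # invertible map on the domain

canonical = Q @ A @ P                    # should be the block form [[I_r, 0], [0, 0]]
expected = np.zeros((m, n))
expected[:r, :r] = np.eye(r)
print(np.allclose(canonical, expected))  # True
```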
Extensions: This section holds true for any base field, and even for modules over division rings. Closely related ideas such as the structure theorem and Smith normal form extend these concepts to modules over PIDs as well.
Classifying Maps Within a Vector Space
In the last section, we talked about maps between two distinct vector spaces and when those maps should be considered isomorphic. But we can also talk about maps from a vector space to itself. While not directly related to our study of the SVD, I think it is illuminating (and interesting!) to consider this case, too. For example, consider two maps $T, S: V \to V$. In the previous section, we considered $T$ and $S$ to be equivalent if they were related by automorphisms of their domain and codomain. However, since $T$ and $S$ have the same domain and codomain, it often makes more sense to ask whether $T$ and $S$ are the same under a single fixed automorphism of $V$.
Definition: For this section only, two maps $T, S: V \to V$ are isomorphic if they are related by an automorphism of $V$.
In other words, $T$ and $S$ are isomorphic if there exists an automorphism $P: V \to V$ such that $S = P^{-1} \circ T \circ P$.
Maps from a vector space to itself have more structure than maps between unrelated vector spaces. This is because you can compare vectors in the domain (i.e. input vectors) with vectors in the codomain (i.e. output vectors). For example, consider the following two maps on $V$, where $V = F^2$.
$T$ and $S$ both have the same rank of 2, but $T$ scales every vector by 1, while $S$ scales every vector by 2. Thus, $T$ and $S$ are not isomorphic, even though they would have been considered isomorphic in the last section.
Astute readers will recognize that the preceding observation can actually be phrased in terms of eigenvalues. An eigenvalue of $T$ is a number $\lambda$ such that $Tv = \lambda v$ for some nonzero $v \in V$ called an eigenvector of $T$. The eigenvalues of $T$ are clearly invariant under isomorphism. In fact, if $T$ is diagonalizable, then $T$ is uniquely determined up to isomorphism by its eigenvalues, counted with multiplicity (exercise). However, not every map is diagonalizable, and in general, maps within a vector space are classified by their Jordan normal forms. What exactly the Jordan normal form entails is not particularly important to us, but interested readers should check the Wikipedia article.
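Here is a small numpy sketch (a symmetric matrix and a random change of basis, both arbitrary choices of mine) showing that eigenvalues survive a change of basis even though the matrix entries do not:

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.standard_normal((3, 3))
T = A + A.T                           # a symmetric map, so its eigenvalues are real
P = rng.standard_normal((3, 3))       # an arbitrary (almost surely invertible) automorphism

S = np.linalg.inv(P) @ T @ P          # the "same" map seen in a different basis

# The matrices differ entry by entry, but their eigenvalues agree.
eig_T = np.sort_complex(np.linalg.eigvals(T))
eig_S = np.sort_complex(np.linalg.eigvals(S))
print(np.allclose(eig_T, eig_S))      # True (up to floating point error)
```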
Extensions: the Jordan normal form exists for maps over any base field, not just the real or complex numbers. In general, the eigenvalues actually lie in an algebraic closure of the base field, which can complicate things. The rational canonical form is an alternative to the Jordan normal form that does not need an algebraic closure of the base field.
Classifying Maps Between Hilbert Spaces
A Hilbert space is, loosely speaking, a vector space with notions of length and angle. A good example of a Hilbert space is the standard Euclidean space $\mathbb{R}^n$. The length of a vector $v = (v_1, \dots, v_n)$ here is given by its Euclidean norm:

$$\|v\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$
Vector quantities from physics like velocity and displacement always lie in Hilbert spaces, and the lengths of these vectors (speed and distance, respectively) play an important role. Complex Hilbert spaces are a bit different and may seem odd at first glance, but they are a useful generalization of real Hilbert spaces and show up all the time in quantum mechanics.
Hilbert spaces are vector spaces with extra structure, and this extra structure leads to a different classification of maps between Hilbert spaces. This classification theorem is most commonly known as the Singular Value Decomposition, or SVD.
Rigorously, a (finite-dimensional) Hilbert space is a vector space $V$ along with an inner product. An inner product on $V$ is a map $\langle \cdot, \cdot \rangle : V \times V \to F$ that satisfies the following properties:
- Linearity/Antilinearity: $\langle ax + by, z \rangle = a\langle x, z \rangle + b\langle y, z \rangle$ and $\langle x, ay + bz \rangle = \bar{a}\langle x, y \rangle + \bar{b}\langle x, z \rangle$
- Conjugate symmetry: $\langle x, y \rangle = \overline{\langle y, x \rangle}$
- Positive-definiteness: $\langle x, x \rangle$ is real, $\langle x, x \rangle \geq 0$, and $\langle x, x \rangle > 0$ if $x \neq 0$
It can be easily checked that the dot product on $\mathbb{R}^n$ is an inner product. The length of a vector $v \in \mathbb{R}^n$ is $\sqrt{v \cdot v}$. Similarly, we define the length of a vector $v$ in a Hilbert space to be $\sqrt{\langle v, v \rangle}$, and we write it as $\|v\|$. In linear algebra, the length of a vector is often called its norm. Two vectors in $\mathbb{R}^n$ are orthogonal if their dot product is zero. Similarly, we call any two vectors in a Hilbert space orthogonal if their inner product is zero.
A Hilbert space basis $e_1, \dots, e_n$ is called orthonormal if $\|e_i\| = 1$ and $\langle e_i, e_j \rangle = 0$ when $i \neq j$. For example, the standard basis on $\mathbb{R}^n$ is an orthonormal basis. Orthonormal bases are the basis of choice when working in a Hilbert space.
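As a concrete illustration (the vectors and helper functions are my own, and the inner product is taken to be linear in its first argument to match the convention above), here is a small numpy sketch of these definitions:

```python
import numpy as np

def inner(x, y):
    """Complex inner product, linear in its first argument (the convention above):
    <x, y> = sum_i x_i * conj(y_i)."""
    return np.vdot(y, x)   # np.vdot conjugates its *first* argument

def norm(x):
    return np.sqrt(inner(x, x).real)

x = np.array([1.0 + 1.0j, 0.0])
y = np.array([0.0, 2.0j])

print(inner(x, y))                                     # 0: x and y are orthogonal
print(norm(x))                                         # sqrt(2)
print(np.isclose(inner(x, y), np.conj(inner(y, x))))   # conjugate symmetry holds
```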
So, how do our notions of equivalence change when we consider Hilbert spaces instead? To start with, a map between Hilbert spaces should only be considered an equivalence of Hilbert spaces if it preserves the inner product structure. Call a map $T: V \to W$ of Hilbert spaces a Hilbert space isomorphism if it is a vector space isomorphism and $\langle Tx, Ty \rangle = \langle x, y \rangle$ for all $x, y \in V$. Such maps are often called orthogonal (if the base field is real) or unitary (if the base field is complex), but for simplicity's sake, I'll just call all of them "Hilbert space isomorphisms", or if the domain and codomain are the same, "Hilbert space automorphisms". Up to Hilbert space isomorphism, there is only one Hilbert space of each dimension, namely $F^n$ with the standard inner product.
This more restrictive notion of an isomorphism of Hilbert spaces naturally leads to this section's definition of an isomorphism of maps between Hilbert spaces.
Definition: For this section only, two maps $T, S: V \to W$ are isomorphic if they are related by Hilbert space automorphisms of $V$ and $W$.
In other words, $T$ and $S$ are isomorphic if and only if there exist Hilbert space automorphisms $P: V \to V$ and $Q: W \to W$ such that $S = Q \circ T \circ P$.
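In matrix terms, the Hilbert space automorphisms of $\mathbb{C}^n$ are exactly the unitary matrices. The sketch below (which builds an arbitrary unitary via a QR factorization) checks both characterizations:

```python
import numpy as np

rng = np.random.default_rng(3)

# Build a unitary matrix by orthonormalizing a random complex matrix (QR).
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(A)

# Q is unitary: Q* Q = I.
print(np.allclose(Q.conj().T @ Q, np.eye(3)))            # True

# Equivalently, Q preserves inner products of arbitrary vectors.
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)
print(np.isclose(np.vdot(Q @ y, Q @ x), np.vdot(y, x)))  # True
```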
What new properties do Hilbert space maps have that maps between vanilla vector spaces don't? For one thing, it is possible to talk about how "large" a map is. Consider the two maps $A$ and $B$ below from $\mathbb{R}^2$ to $\mathbb{R}^2$.
$A$ never increases the length of a vector, so that $\|Av\| \leq \|v\|$ for every $v$. On the other hand, if $e_1$ is the first basis vector of $\mathbb{R}^2$, then $\|Be_1\| > \|e_1\|$, so $B$ can increase the length of a vector. In order to capture this property, we define the norm of a map $T: V \to W$ as follows:

$$\|T\| = \max_{\|v\| \leq 1} \|Tv\|$$

With the above examples, $\|A\| \leq 1$ and $\|B\| > 1$. The norm is an extremely important property of a map between Hilbert spaces (exercise: the norm of a map is always finite). Note that the norm doesn't make sense for maps between vanilla vector spaces because vanilla vector spaces don't carry a notion of "length".
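The norm of a map can be estimated numerically. The sketch below (an arbitrary matrix of my choosing) compares a brute-force search over random unit vectors with the exact value, which turns out to be the largest singular value introduced next:

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((3, 3))             # an arbitrary map

# Brute force: the norm of T is the largest stretch factor over unit vectors.
vs = rng.standard_normal((100_000, 3))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)
brute = np.linalg.norm(vs @ T.T, axis=1).max()

# Exact: the norm of T equals its largest singular value.
exact = np.linalg.svd(T, compute_uv=False)[0]

print(brute, exact)   # brute is a slight underestimate of exact
```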
There are other invariants too, but let's transition to the star players: the singular values of a map.
Theorem (The SVD): Let $T: V \to W$ where $V$ and $W$ are Hilbert spaces. Let $n = \dim V$. There exists an orthonormal basis $v_1, \dots, v_n$ for $V$ and nonnegative real numbers $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$ such that $\|Tv_i\| = \sigma_i$ and $\langle Tv_i, Tv_j \rangle = 0$ when $i \neq j$. The values $\sigma_i$ are called singular values, and the vectors $v_i$ are called right singular vectors. The singular values of $T$ are unique and determine $T$ up to isomorphism.
In other words, the SVD of $T$ gives a special basis for $V$ which plays nicely with the inner product structures on $V$ and $W$, and classifies $T$ by its action on this special basis.
The SVD Theorem can also be interpreted as finding a canonical form for $T$. Each pair $Tv_i, Tv_j$ (for $i \neq j$) is orthogonal, and hence (if they are non-zero) linearly independent. In fact, when $\sigma_1, \dots, \sigma_r$ are the non-zero singular values of $T$, the vectors $Tv_1/\sigma_1, \dots, Tv_r/\sigma_r$ are orthogonal and unit norm. Thus, they can be extended to an orthonormal basis of $W$. Under this basis, $T$ has the exact form:

$$\begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{pmatrix}$$

where the first $r$ diagonal entries are the first $r$ singular values of $T$, and everything else is zero. In a sense, all $T$ does is "scale" its right singular vectors by their singular values. It's actually kind of crazy that maps between Hilbert spaces, although a very rich subject, can all be brought into such a simple canonical form! And that, I think, is the beauty of the SVD.
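In numpy, `np.linalg.svd` produces exactly this data. The sketch below (for an arbitrary matrix) checks that the right singular vectors are orthonormal, that $T$ stretches them by the singular values, and that their images are pairwise orthogonal:

```python
import numpy as np

rng = np.random.default_rng(5)
T = rng.standard_normal((4, 3))                  # an arbitrary map from R^3 to R^4

U, sigma, Vh = np.linalg.svd(T)
V = Vh.T                                         # columns are the right singular vectors v_i

# The right singular vectors form an orthonormal basis of the domain.
print(np.allclose(V.T @ V, np.eye(3)))                     # True

# T stretches each v_i by exactly sigma_i ...
print(np.allclose(np.linalg.norm(T @ V, axis=0), sigma))   # True

# ... and the images T v_i are pairwise orthogonal.
gram = (T @ V).T @ (T @ V)
print(np.allclose(gram, np.diag(sigma**2)))                # True
```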
The proof of the SVD theorem is a bit technical, and is not necessary for understanding the SVD in practice. However, I am a big fan of backing up my claims with a proof, so here it is.
Proof:
Let $n = \dim V$. If $n = 0$, then the SVD exists trivially, so we assume that $n > 0$. The first step to finding the SVD is determining the first right singular vector, which is also the vector whose norm scales the most under $T$.
Let $B \subseteq V$ be the set of all vectors with norm less than or equal to 1. It turns out that there exists a vector $v \in B$ which maximizes the norm $\|Tv\|$. This is an intuitive fact, but not one with a simple proof. It is true by the multivariable extreme value theorem that any continuous function on $B$ is maximized by some element of $B$. However, the multivariable extreme value theorem can be tricky to prove, and involves far more analysis than I want to include in this blog post. I hope you can forgive me if I present this one fact on faith.
Let $v_1 \in B$ be such that $Tv_1$ has the maximum norm among all vectors in $T(B)$. In other words, $\|Tv_1\| = \max_{v \in B} \|Tv\|$ (we may take $\|v_1\| = 1$, since scaling a vector up to unit length can only increase the norm of its image). We will now prove that for any vector $w$ orthogonal to $v_1$, $Tw$ and $Tv_1$ are also orthogonal. Let $v' = v_1 + \epsilon w$ where $\epsilon$ is a yet to be determined small real number. $v'$ has the following norm.

$$\|v'\| = \sqrt{\|v_1\|^2 + 2\epsilon \operatorname{Re}\langle v_1, w \rangle + \epsilon^2 \|w\|^2}$$

Since $v_1$ and $w$ are orthogonal, $\|v'\| = \sqrt{1 + \epsilon^2\|w\|^2} = 1 + O(\epsilon^2)$. For those who are unfamiliar with big-O notation, this means that $\|v'\|$ and $1$ are very close, and their difference is at most a multiple of $\epsilon^2$.
Using big-O notation, we can also simplify our expression for $\|Tv'\|$.

$$\|Tv'\|^2 = \|Tv_1\|^2 + 2\epsilon \operatorname{Re}\langle Tv_1, Tw \rangle + \epsilon^2\|Tw\|^2 = \|Tv_1\|^2 + 2\epsilon \operatorname{Re}\langle Tv_1, Tw \rangle + O(\epsilon^2)$$

So, unless $\operatorname{Re}\langle Tv_1, Tw \rangle = 0$, $Tv'$ could have larger norm than $Tv_1$ for some (small) values of $\epsilon$. But, since $v'/\|v'\|$ has unit norm, $\|T(v'/\|v'\|)\| \leq \|Tv_1\|$, and so $\|Tv'\| \leq \|Tv_1\|\,\|v'\| = \|Tv_1\|(1 + O(\epsilon^2))$. This would be a contradiction. Thus,

$$\operatorname{Re}\langle Tv_1, Tw \rangle = 0.$$
Over the real numbers, this is enough to prove that $Tv_1$ and $Tw$ are orthogonal, since an inner product over the reals is symmetric and real-valued. Over the complex numbers, the inner product is conjugate-symmetric, so we find that:

$$\langle Tv_1, Tw \rangle + \overline{\langle Tv_1, Tw \rangle} = 2\operatorname{Re}\langle Tv_1, Tw \rangle = 0$$
In other words, $\langle Tv_1, Tw \rangle$ is purely imaginary. Replacing $w$ by $iw$ in this formula (which is fine because $iw$ is also orthogonal to $v_1$), and then expanding, we obtain the following.

$$\langle Tv_1, T(iw) \rangle = \bar{i}\,\langle Tv_1, Tw \rangle = -i\,\langle Tv_1, Tw \rangle \quad \text{is purely imaginary.}$$

This shows that $\langle Tv_1, Tw \rangle$ is purely real. The only number that is both purely real and purely imaginary is 0, so $\langle Tv_1, Tw \rangle = 0$.
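This orthogonality property is easy to check numerically. In the sketch below (an arbitrary real matrix, with the maximizer taken from numpy's SVD rather than found by the extreme value theorem), any vector orthogonal to the maximizer is mapped to a vector orthogonal to the maximizer's image:

```python
import numpy as np

rng = np.random.default_rng(7)
T = rng.standard_normal((4, 3))                  # an arbitrary map

# The maximizer of ||T v|| over the unit ball is the top right singular vector
# (taken from numpy's SVD here, rather than from the extreme value theorem).
v1 = np.linalg.svd(T)[2][0]

# Take any w orthogonal to v1 ...
w = rng.standard_normal(3)
w -= (w @ v1) * v1                               # project out the v1 component

# ... then T w is orthogonal to T v1, as the argument above shows.
print(np.isclose((T @ w) @ (T @ v1), 0.0))       # True
```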
Let $\sigma_1 = \|Tv_1\|$, where $v_1$ is the norm-maximizing vector found above. Also rename $V$ to $V_1$, and rename $T$ to $T_1$. Define $V_2$ to be the orthogonal complement to $v_1$ in $V_1$, that is, the subspace of all vectors which are orthogonal to $v_1$. Since $V_2$ is a subspace of a Hilbert space, $V_2$ is itself a Hilbert space. Let $T_2$ be the restriction of $T_1$ to $V_2$.
As long as $\dim V_2 > 0$, this process can be repeated with $T_2$ to obtain $v_2$ and $\sigma_2$, as well as $V_3$ and $T_3$. At each step, the dimension of $V_k$ decreases by 1 (exercise). Thus, if $n = \dim V$, this process can be iterated $n$ times. Doing so yields vectors $v_1, \dots, v_n$ and nonnegative real numbers $\sigma_1, \dots, \sigma_n$.
Each $v_k$ has unit norm. When $j > k$, $v_j$ is in the orthogonal complement of $v_k$, so $Tv_j$ and $Tv_k$ are orthogonal (by the orthogonality property proved above, applied to $T_k$). Since each $T_k$ is a restriction of $T$, $T_k v_k = T v_k$. As $\|T_k v_k\| = \sigma_k$, this shows that $\|T v_k\| = \sigma_k$. Thus, $\sigma_1, \dots, \sigma_n$ are singular values of $T$, and $v_1, \dots, v_n$ are right singular vectors of $T$.
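For the curious, here is a rough numerical rendition of this construction (entirely my own sketch: numpy's SVD of the restricted map stands in for the extreme-value-theorem maximization, and the orthogonal complement is tracked with an explicit orthonormal basis):

```python
import numpy as np

def right_singular_data(T):
    """Sketch of the iterative construction above: at each step, take the
    direction of greatest stretch of the restricted map, record the stretch
    factor, and pass to the orthogonal complement."""
    n = T.shape[1]
    Q = np.eye(n)                       # orthonormal basis of the current subspace V_k
    sigmas, vs = [], []
    while Q.shape[1] > 0:
        k = Q.shape[1]
        A = T @ Q                       # T_k, the restriction of T to V_k (in the basis Q)
        # Direction of greatest stretch of A (numpy's SVD stands in for the
        # extreme-value-theorem maximization used in the proof).
        x = np.linalg.svd(A)[2][0]
        v = Q @ x                       # the maximizer, expressed in the ambient space
        vs.append(v)
        sigmas.append(np.linalg.norm(T @ v))
        # V_{k+1}: an orthonormal basis of the complement of x inside R^k,
        # pushed forward through Q.
        basis, _ = np.linalg.qr(np.column_stack([x, np.eye(k)]))
        Q = Q @ basis[:, 1:k]
    return np.array(sigmas), np.column_stack(vs)

rng = np.random.default_rng(8)
T = rng.standard_normal((4, 3))
sigmas, V = right_singular_data(T)
print(np.allclose(sigmas, np.linalg.svd(T, compute_uv=False)))   # same singular values
print(np.allclose(V.T @ V, np.eye(3)))                           # orthonormal right singular vectors
```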
Any two maps with the same singular values can be brought to this common form using their right singular vectors as a basis for $V$. Thus, any two maps with the same singular values are isomorphic. It remains only to prove that the singular values of a map are unique. To do this, we need the following technical lemma.
Lemma: Let $v_1, \dots, v_n$ be one SVD basis for $T$ with singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$. Let $u_1, \dots, u_n$ be another SVD basis for $T$ with singular values $\tau_1 \geq \tau_2 \geq \cdots \geq \tau_n$. If both lists are sorted in decreasing order as above, then $\sigma_k = \tau_k$ for every $k$.
You can expand $u_k$ in the $v_i$ basis as:

$$u_k = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$$

Thus,

$$\|Tu_k\|^2 = |c_1|^2\sigma_1^2 + |c_2|^2\sigma_2^2 + \cdots + |c_n|^2\sigma_n^2$$

and

$$|c_1|^2 + |c_2|^2 + \cdots + |c_n|^2 = 1.$$

Similarly, by swapping the roles of the $v_i$ basis and the $u_i$ basis,