LIAM AXON

The SVD as a Classification Theorem

Nov 12, 2021
tags: math, analysis

Pre-requisites: Linear algebra.

Introduction

The Singular Value Decomposition (SVD) is an extremely important concept in linear algebra, with practical applications ranging from image processing and matrix inversion to machine learning and statistics. However, despite its importance and (I would argue) how fundamental it is, the SVD is often presented in an exceptionally confusing manner. Because of this, it took me a long time to appreciate the SVD for what it is. As a prime example, Wikipedia defines the SVD as:

[T]he singular value decomposition of an $m \times n$ complex matrix $\mathbf{M}$ is a factorization of the form $\mathbf{U \Sigma V^{*}}$, where $\mathbf{U}$ is an $m \times m$ complex unitary matrix, $\mathbf{\Sigma}$ is an $m \times n$ rectangular diagonal matrix with non-negative real numbers on the diagonal, and $\mathbf{V}$ is an $n \times n$ complex unitary matrix.

What a mouthful! Wikipedia defines the SVD in a computationally useful way, but I think that it misses the point entirely. In this post, I try to explain how I think of the SVD: as a classification of maps between Hilbert spaces.

Conventions

In this post, I will use "vector space" to mean "finite-dimensional vector space over $\mathbb{R}$ or $\mathbb{C}$." $\mathbb{R}$ or $\mathbb{C}$ is called the base field of the vector space, and when the base field is not important, I will use $\mathbb{K}$ to represent the base field.

Some terminology requires explanation. When I use map, I mean "linear function". An isomorphism between two vector spaces is an invertible map. An isomorphism from a vector space to itself is called an automorphism, and is equivalent to a "change of basis". In each section, I talk about classifying maps "up to isomorphism". What it means for two maps to be isomorphic differs from section to section, and will be explained in each section.

Classifying Maps Between Vector Spaces

Linear algebra is ostensibly the study of vector spaces, but really, it's all about maps between vector spaces. For one thing, vector spaces are easy to classify: every $n$-dimensional vector space is isomorphic to $\mathbb{K}^n$. As such, you can use a single integer to completely classify a vector space. On the other hand, maps between vector spaces are spicy. You can add maps together and compose them to get new maps. And, unlike vector spaces, you can't use a single number to classify all vector space maps.

... Right?

Before we can answer that question, I need to discuss what I actually mean when I talk about "classifying" a map. Two maps $T, Q: V \rightarrow W$ may behave the same even if they are not equal. For example, consider the following two maps from $V = \mathbb{R}^3$ to $W = \mathbb{R}^2$:

$$P = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}$$
$$Q = \begin{pmatrix} 0 & 0 & 0\\ 1 & 0 & 0 \end{pmatrix}$$

These two maps are not equal, but they are pretty similar. In fact, if you swap the two basis vectors of $W$, then $P$ becomes $Q$ and $Q$ becomes $P$. That is, $P$ and $Q$ are the same up to a change of basis of $W$. Informally, if you look at $W$ from a "different perspective", then $P$ "looks like" $Q$. For the purposes of this section, that's good enough for us. Similarly, the following two maps are related by a change of basis of $V$ (exercise: what is the change of basis?), and thus behave "pretty much the same".

$$P = \begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}$$
$$R = \begin{pmatrix} 0 & 0 & 1\\ 0 & 0 & 0 \end{pmatrix}$$

Definition: For this section only, two maps $P, Q: V \rightarrow W$ are considered isomorphic if they are the same up to an automorphism (i.e. change of basis) of $V$ and $W$.

In other words, $P$ and $Q$ are isomorphic if there exist automorphisms $A: V \rightarrow V$ and $B: W \rightarrow W$ such that $Q = B P A^{-1}$.
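To make this concrete, here is a quick numerical check (a minimal NumPy sketch of my own, not from the original post) that swapping the two basis vectors of $W$ turns the $P$ from above into $Q$:

```python
import numpy as np

# P and Q from the example above: maps from R^3 to R^2
P = np.array([[1, 0, 0],
              [0, 0, 0]])
Q = np.array([[0, 0, 0],
              [1, 0, 0]])

# B swaps the two basis vectors of W = R^2 (an automorphism of W)
B = np.array([[0, 1],
              [1, 0]])
# A is the identity on V = R^3 (no change of basis needed in V)
A = np.eye(3)

# Q = B P A^{-1}, so P and Q are isomorphic in this section's sense
assert np.allclose(Q, B @ P @ np.linalg.inv(A))
```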

In future sections, we will look at maps with more structure and adopt stricter definitions of "isomorphism".

So, what properties of $T$ are invariant under isomorphism? One important property of $T$ is its image, but its image is a subspace of $W$, and usually isn't invariant under an automorphism of $W$. However, the dimension of a subspace doesn't change under automorphism (exercise: why not?). The dimension of the image is called the rank of $T$, and is an important invariant of $T$.

Another important subspace associated with $T$ is its nullspace in $V$. Once again, the nullspace of $T$ is not invariant under isomorphism, but its dimension is. This quantity is called the nullity of $T$. The well-known rank-nullity theorem states that the rank of $T$ plus its nullity is equal to $\dim V$, so either one can be computed from the other.

Surprisingly, the rank (or nullity) completely determines a map up to isomorphism.

Theorem: A map $T: V \rightarrow W$ is completely determined up to isomorphism by its rank.

Proof:

Let $r$ denote the rank of $T$. Find a basis $v_1, v_2, \dots v_n$ for $V$ where $v_{r+1}$ through $v_n$ span the nullspace of $T$. Let $w_i = T v_i$ for $i \leq r$. Since no nontrivial combination of the $v_i$'s for $i \leq r$ is in the nullspace of $T$, these $w_i$'s are linearly independent. Thus, they can be completed to a basis $w_1, w_2, \dots w_m$ of $W$. $T$ sends the first $r$ basis vectors of $V$ to the first $r$ basis vectors of $W$, and nullifies the other basis vectors of $V$. In other words, $T$ has the following form with respect to our chosen basis.

$$T = \begin{pmatrix} 1 & 0 & \dots & 0 & \dots\\ 0 & 1 & \dots & 0 & \dots\\ \vdots & \vdots & \ddots & \vdots & \ddots\\ 0 & 0 & \dots & 0 & \dots\\ \vdots & \vdots & \ddots & \vdots & \ddots \end{pmatrix}$$

Any other map from $V$ to $W$ with the same rank as $T$ can be brought into the same form. Thus, $T$ is completely specified up to isomorphism by its rank.

$\square$

In the above proof, we showed that $T$ is unique up to isomorphism by exhibiting a canonical form for $T$: that is, by writing $T$ in a simple form with respect to a particularly nice basis. This strategy is very important, and will show up again when we come to the SVD.
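As a sanity check, here is a small sketch (my own illustration, using NumPy and SciPy) showing that the rank is easy to compute and that rank plus nullity equals $\dim V$:

```python
import numpy as np
from scipy.linalg import null_space

# A random map from R^5 to R^3; its rank is at most 3
T = np.random.randn(3, 5)

rank = np.linalg.matrix_rank(T)
nullity = null_space(T).shape[1]  # dimension of the nullspace

# Rank-nullity: rank + nullity = dim V = 5
assert rank + nullity == T.shape[1]
```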

Extensions: This section holds true for any base field, and even for modules over division rings. Closely related ideas such as the structure theorem and Smith normal form extend these concepts to modules over PIDs as well.

Classifying Maps Within a Vector Space

In the last section, we talked about maps between two distinct vector spaces and when those maps should be considered isomorphic. But we can also talk about maps from a vector space to itself. While not directly related to our study of the SVD, I think it is illuminating (and interesting!) to consider this case, too. For example, consider two maps $P, Q: V \rightarrow V$. In the previous section, we considered $P$ and $Q$ to be equivalent if they were related by automorphisms of their domain and codomain. However, since $P$ and $Q$ have the same domain and codomain, it often makes more sense to ask whether $P$ and $Q$ are the same under a single automorphism of $V$.

Definition: For this section only, two maps $P, Q: V \rightarrow V$ are isomorphic if they are related by an automorphism of $V$.

In other words, $P$ and $Q$ are isomorphic if there exists an automorphism $A: V \rightarrow V$ such that $Q = A P A^{-1}$.

Maps from a vector space to itself have more structure than maps between unrelated vector spaces. This is because you can compare vectors in the domain (i.e. input vectors) with vectors in the codomain (i.e. output vectors). For example, consider the following two maps $V \rightarrow V$ where $V = \mathbb{R}^2$.

$$P = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}$$
$$Q = \begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix}$$

$P$ and $Q$ both have the same rank of 2, but $P$ scales every vector by 1, while $Q$ scales every vector by 2. Thus, $P$ and $Q$ are not isomorphic, even though they would have been considered isomorphic in the last section.

Astute readers will recognize that the preceding observation can actually be phrased in terms of eigenvalues. An eigenvalue of $T: V \rightarrow V$ is a number $\lambda \in \mathbb{K}$ such that $T v = \lambda v$ for some nonzero $v \in V$ called an eigenvector of $T$. The eigenvalues of $T$ are clearly invariant under isomorphism. In fact, if $T$ is diagonalizable, then $T$ is uniquely determined up to isomorphism by its eigenvalues (exercise). However, not every map is diagonalizable, and in general, maps within a vector space are classified by their Jordan normal forms. What exactly the Jordan normal form entails is not particularly important to us, but interested readers should check the Wikipedia article.
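For instance, here is a quick NumPy check (an illustrative sketch of my own, with an arbitrary example matrix) that conjugating a map by an automorphism leaves its eigenvalues unchanged:

```python
import numpy as np

# A map P: R^3 -> R^3 with distinct eigenvalues 2, 3, and 5
P = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 0.0, 5.0]])
A = np.random.randn(3, 3)  # a random automorphism of R^3 (invertible with probability 1)

# Q = A P A^{-1} is isomorphic to P in this section's sense
Q = A @ P @ np.linalg.inv(A)

# The two maps have the same eigenvalues (up to ordering and round-off)
eig_P = np.sort_complex(np.linalg.eigvals(P))
eig_Q = np.sort_complex(np.linalg.eigvals(Q))
assert np.allclose(eig_P, eig_Q)
```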

Extensions: the Jordan normal form exists for maps over any base field, not just the real or complex numbers. In general, the eigenvalues actually lie in an algebraic closure of the base field, which can complicate things. The rational canonical form is an alternative to the Jordan normal form that does not need an algebraic closure of the base field.

Classifying Maps Between Hilbert Spaces

A Hilbert space is, loosely speaking, a vector space with notions of length and angle. A good example of a Hilbert space is the standard Euclidean space $\mathbb{R}^n$. The length of a vector here is given by its Euclidean norm:

$$\|(a_1, a_2, \dots, a_n)\| = \sqrt{a_1^2 + a_2^2 + \dots + a_n^2}$$

Vector quantities from physics like velocity and displacement always lie in Hilbert spaces, and the lengths of these vectors (speed and distance, respectively) play an important role. Complex Hilbert spaces are a bit different and may seem odd at first glance, but they are a useful generalization of real Hilbert spaces and show up all the time in quantum mechanics.

Hilbert spaces are vector spaces with extra structure, and this extra structure leads to a different classification of maps between Hilbert spaces. This classification theorem is most commonly known as the Singular Value Decomposition, or SVD.

Rigorously, a (finite-dimensional) Hilbert space is a vector space along with an inner product. An inner product on $V$ is a map $\langle\cdot, \cdot\rangle: V \times V \rightarrow \mathbb{K}$ that satisfies the following properties (a quick numerical check of these axioms appears after the list):

  1. Linearity/Antilinearity:
  • $\langle a, c + d\rangle = \langle a, c \rangle + \langle a, d\rangle$
  • $\langle a + b, c\rangle = \langle a, c \rangle + \langle b, c\rangle$
  • $\langle a, \lambda c\rangle = \lambda \langle a, c \rangle$
  • $\langle \lambda a, c\rangle = \overline{\lambda} \langle a, c \rangle$
  2. Conjugate symmetry: $\langle a, b \rangle = \overline{\langle b, a\rangle}$
  3. Positive-definiteness: $\langle a, a\rangle$ is real, $\langle a, a\rangle \geq 0$, and $\langle a, a\rangle > 0$ if $a \neq 0$
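As promised, here is a small NumPy check of these axioms (my own illustration; `np.vdot` conjugates its first argument, matching the convention above) on the standard complex inner product:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = (rng.standard_normal(4) + 1j * rng.standard_normal(4) for _ in range(3))
lam = 2.0 - 1.5j

# np.vdot(x, y) = sum(conj(x) * y): antilinear in the first slot, linear in the second
assert np.isclose(np.vdot(a, lam * c), lam * np.vdot(a, c))            # linearity
assert np.isclose(np.vdot(lam * a, c), np.conj(lam) * np.vdot(a, c))   # antilinearity
assert np.isclose(np.vdot(a, b), np.conj(np.vdot(b, a)))               # conjugate symmetry
assert np.vdot(a, a).real > 0 and np.isclose(np.vdot(a, a).imag, 0)    # positive-definiteness
```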

It can be easily checked that the dot product on $\mathbb{R}^n$ is an inner product. The length of a vector $v \in \mathbb{R}^n$ is $\sqrt{v \cdot v}$. Similarly, we define the length of a vector $v$ in a Hilbert space to be $\sqrt{\langle v, v\rangle}$, and we write it as $\|v\|$. In linear algebra, the length of a vector is often called its norm. Two vectors in $\mathbb{R}^n$ are orthogonal if their dot product is zero. Similarly, we call any two vectors in a Hilbert space orthogonal if their inner product is zero.

A Hilbert space basis $e_1 \dots e_n$ is called orthonormal if $\|e_i\| = 1$ and $\langle e_i, e_j \rangle = 0$ when $i \neq j$. For example, the standard basis on $\mathbb{R}^n$ is an orthonormal basis. Orthonormal bases are the bases of choice when working in a Hilbert space.

So, how do our notions of equivalence change when we consider Hilbert spaces instead? To start with, a map between Hilbert spaces should only be considered an equivalence of Hilbert spaces if it preserves the inner product structure. Call a map of Hilbert spaces $T: V \rightarrow W$ a Hilbert space isomorphism if it is a vector space isomorphism and $\langle T v_1, T v_2 \rangle = \langle v_1, v_2 \rangle$ for all $v_1, v_2 \in V$. Such maps are often called orthogonal (if the base field is real) or unitary (if the base field is complex), but for simplicity's sake, I'll just call all of them "Hilbert space isomorphisms", or if the domain and codomain are the same, "Hilbert space automorphisms". Up to Hilbert space isomorphism, there is only one Hilbert space of each dimension, namely $\mathbb{K}^n$ with the standard inner product.
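As a concrete example (my own sketch, not from the post), a rotation of $\mathbb{R}^2$ is a Hilbert space automorphism: it is invertible and preserves the dot product.

```python
import numpy as np

theta = 0.7
# A rotation of R^2: an invertible map that preserves the dot product
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v1 = np.array([1.0, 2.0])
v2 = np.array([-3.0, 0.5])

# <R v1, R v2> = <v1, v2>, so R is a Hilbert space automorphism of R^2
assert np.isclose((R @ v1) @ (R @ v2), v1 @ v2)
# Equivalently, R^T R = I (R is orthogonal)
assert np.allclose(R.T @ R, np.eye(2))
```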

This more restrictive notion of an isomorphism of Hilbert spaces naturally leads to this section's definition of an isomorphism of maps between Hilbert spaces.

Definition: For this section only, two maps $P, Q: V \rightarrow W$ are isomorphic if they are related by Hilbert space automorphisms of $V$ and $W$.

In other words, $P$ and $Q$ are isomorphic if and only if there exist Hilbert space automorphisms $A: V \rightarrow V$ and $B: W \rightarrow W$ such that $P = B Q A^{-1}$.

What new properties do Hilbert space maps have that maps between vanilla vector spaces don't? For one thing, it is possible to talk about how "large" a map is. Consider the two maps below from $V = \mathbb{R}^2$ to $W = \mathbb{R}^3$.

$$P = \begin{pmatrix} 1 & 0\\ 0 & 1\\ 0 & 0 \end{pmatrix}$$
$$Q = \begin{pmatrix} 5 & 0\\ 0 & 0\\ 0 & 0 \end{pmatrix}$$

$P$ never increases the length of a vector, so that $\|Pv\| \leq \|v\|$. On the other hand, if $e_1$ is the first basis vector of $V$, then $\|Q e_1\| = 5$, so $Q$ can increase the length of a vector. In order to capture this property, we define the norm of a map as follows:

$$\|T\| = \sup_{\|v\| \leq 1} \|Tv\|$$

With the above examples, $\|P\| = 1$ and $\|Q\| = 5$. The norm is an extremely important property of a map between Hilbert spaces (exercise: the norm of a map is always finite). Note that the norm doesn't make sense for maps between vanilla vector spaces because vanilla vector spaces don't carry a notion of "length".
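Here is a rough numerical illustration (my own sketch): approximating the supremum by sampling random unit vectors, and comparing against NumPy's spectral norm, which computes $\|T\|$ as the largest singular value:

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.standard_normal((3, 2))  # a map from R^2 to R^3

# Crude approximation of sup_{||v|| <= 1} ||T v|| by random sampling
vs = rng.standard_normal((10000, 2))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)  # project samples onto the unit sphere
approx_norm = np.max(np.linalg.norm(vs @ T.T, axis=1))

# The exact operator norm (equals the largest singular value of T)
exact_norm = np.linalg.norm(T, 2)

assert approx_norm <= exact_norm + 1e-12
print(approx_norm, exact_norm)  # the approximation approaches the true norm from below
```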

There are other invariants too, but let's transition to the star players: the singular values of a map.

Theorem (The SVD): Let $T: V \rightarrow W$, where $V$ and $W$ are Hilbert spaces. Let $n = \dim V$. There exists an orthonormal basis $e_1, e_2, \dots e_n$ for $V$ and nonnegative real numbers $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n$ such that $\|T e_i\| = \sigma_i$ and $\langle T e_i, T e_j \rangle = 0$ when $i \neq j$. The values $\sigma_i$ are called singular values, and the vectors $e_i$ are called right singular vectors. The singular values of $T$ are unique and determine $T$ up to isomorphism.

In other words, the SVD of $T$ gives a special basis for $V$ which plays nicely with the inner product structures on $V$ and $W$, and classifies $T$ by its action on this special basis.

The SVD Theorem can also be interpreted as finding a canonical form for $T$. The vectors $T e_i$ are pairwise orthogonal, and hence (if they are non-zero) linearly independent. In fact, when $\sigma_1 \dots \sigma_k$ are the non-zero singular values of $T$, the vectors $\frac{T e_1}{\|T e_1\|} \dots \frac{T e_k}{\|T e_k\|}$ are orthogonal and have unit norm. Thus, they can be extended to an orthonormal basis of $W$. Under this basis, $T$ has the exact form:

$$T = \begin{pmatrix} \sigma_1 & 0 & 0 & \dots & 0 & \dots \\ 0 & \sigma_2 & 0 & \dots & 0 & \dots \\ 0 & 0 & \sigma_3 & \dots & 0 & \dots \\ \vdots & \vdots & \vdots & \ddots & \vdots & \ddots \\ 0 & 0 & 0 & \dots & 0 & \dots \\ \vdots & \vdots & \vdots & \ddots & \vdots & \ddots \end{pmatrix}$$

where the first $k$ diagonal entries are the first $k$ singular values of $T$, and everything else is zero. In a sense, all $T$ does is "scale" its right singular vectors by their singular values. It's actually kind of crazy that maps between Hilbert spaces, although a very rich subject, can all be brought into such a simple canonical form! And that, I think, is the beauty of the SVD.
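In computational terms, this is what `np.linalg.svd` returns. The sketch below (my own, with an arbitrary example matrix) checks that the rows of `Vh` are right singular vectors in the sense of the theorem, and that they recover the Wikipedia-style factorization from the introduction:

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 3))  # a map from R^3 to R^4

# U: orthonormal basis of W, s: singular values, Vh: rows are right singular vectors
U, s, Vh = np.linalg.svd(T)

for i in range(3):
    e_i = Vh[i]                                         # the i-th right singular vector
    assert np.isclose(np.linalg.norm(T @ e_i), s[i])    # ||T e_i|| = sigma_i
    for j in range(i + 1, 3):
        e_j = Vh[j]
        assert np.isclose((T @ e_i) @ (T @ e_j), 0.0)   # T e_i and T e_j are orthogonal

# The Wikipedia-style factorization T = U Sigma V^*
Sigma = np.zeros((4, 3))
np.fill_diagonal(Sigma, s)
assert np.allclose(T, U @ Sigma @ Vh)
```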

The proof of the SVD theorem is a bit technical, and is not necessary for understanding the SVD in practice. However, I am a big fan of backing up my claims with a proof, so here it is.

Proof:

Let $T: V \rightarrow W$. If $\dim V = 0$, then the SVD exists trivially, so we assume that $\dim V > 0$. The first step to finding the SVD is determining the first right singular vector, which is also the vector whose norm scales the most under $T$.

Let $X \subset V$ be the set of all vectors with norm less than or equal to 1. It turns out that there exists a vector $v \in X$ which maximizes $\|Tv\|$. This is an intuitive fact, but not one with a simple proof. It follows from the multivariable extreme value theorem: any continuous function $f: X \rightarrow \mathbb{R}$ attains a maximum at some $v \in X$, since $X$ is closed and bounded. However, the multivariable extreme value theorem can be tricky to prove, and involves far more analysis than I want to include in this blog post. I hope you can forgive me if I present this one fact on faith.

Let $v \in X$ be a unit vector that maximizes $\|Tv\|$. In other words, $\|Tv\| = \|T\|$. We will now prove that for any vector $v' \in V$ orthogonal to $v$, $T v'$ and $T v$ are also orthogonal. Let $u = \frac{v + t v'}{\|v + tv'\|}$, where $t$ is a yet-to-be-determined small real number. $T u$ has the following norm.

$$\|Tu\|^2 = \frac{\|T v\|^2 + \langle T v, T v' \rangle t + \langle T v', T v \rangle t + \|T v'\|^2 t^2}{\|v + tv'\|^2}$$

Since $v$ and $v'$ are orthogonal, $\|v + t v'\|^2 = \|v\|^2 + \mathcal{O}(t^2)$. For those who are unfamiliar with big-O notation, this means that $\|v + tv'\|$ and $\|v\|$ are very close, and their difference is at most a multiple of $t^2$.

Using big-O notation, we can also simplify our expression for $\|T u\|^2$.

$$\|Tu\|^2 = \frac{\|Tv\|^2}{\|v\|^2} + \frac{\langle T v, T v'\rangle + \langle T v', T v \rangle}{\|v\|^2} t + \mathcal{O}(t^2)$$

So, unless $\langle T v, T v'\rangle + \langle T v', T v \rangle = 0$, $Tu$ could have larger norm than $T v$ for some (small) values of $t$. But, since $u$ has unit norm, $u \in X$, so $\|T u\| \leq \|T v\|$. This would be a contradiction. Thus,

$$\langle T v, T v' \rangle = -\langle T v', T v\rangle$$

Over the real numbers, this is enough to prove that $Tv$ and $Tv'$ are orthogonal, since an inner product over the reals is symmetric. Over the complex numbers, the inner product is conjugate-symmetric, so we find that:

$$\langle T v, T v' \rangle = - \overline{\langle Tv, Tv' \rangle}$$

In other words, $\langle T v, T v' \rangle$ is purely imaginary. Replacing $v'$ by $i v'$ in this formula (which is fine because $v$ is also orthogonal to $i v'$), and then expanding, we obtain the following.

$$i \langle Tv, Tv' \rangle = i \overline{\langle T v, Tv' \rangle}$$

This shows that $\langle T v, T v' \rangle$ is purely real. The only number that is both purely real and purely imaginary is 0, so $\langle T v, T v' \rangle = 0$.

Let $e_1 = v$, and let $\sigma_1 = \|T\|$. Also rename $V$ to $V_1$, and rename $T$ to $T_1$. Define $V_2$ to be the orthogonal complement of $e_1$ in $V_1$, that is, the subspace of all vectors which are orthogonal to $e_1$. Since $V_2$ is a subspace of a Hilbert space, $V_2$ is itself a Hilbert space. Let $T_2$ be the restriction of $T_1$ to $V_2$.

As long as $\dim V_2 \neq 0$, this process can be repeated with $T_2: V_2 \rightarrow W$ to obtain $e_2$ and $\sigma_2$, as well as $V_3$ and $T_3$. At each step, the dimension of $V_i$ decreases by 1 (exercise). Thus, if $n = \dim V$, this process can be iterated $n$ times. Doing so yields $n$ vectors $e_1 \dots e_n$ and $n$ nonnegative real numbers $\sigma_1 \dots \sigma_n$.

Each $e_i$ has unit norm. When $i < j$, $e_j$ is in the orthogonal complement of $e_i$ and $\langle Te_i, Te_j \rangle = 0$. Since $T_{i+1}$ is a restriction of $T_i$, $\|T_{i+1}\| \leq \|T_i\|$. As $\sigma_i = \|T_i\|$, this shows that $\sigma_{i+1} \leq \sigma_i$. Thus, $\sigma_1 \dots \sigma_n$ are singular values of $T$, and $e_1 \dots e_n$ are right singular vectors of $T$.
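The construction in this proof can be mirrored numerically. The sketch below (my own illustration; it uses an eigendecomposition of the restricted map as a computational stand-in for the extreme-value-theorem argument) finds the best unit vector, deflates to the orthogonal complement, and repeats, recovering the singular values of an example matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((4, 3))  # T: V -> W with dim V = 3

basis = np.eye(3)  # orthonormal basis of the current subspace V_i (as columns)
sigmas, right_vectors = [], []

while basis.shape[1] > 0:
    # Restrict T to V_i and find the unit vector whose image has maximal norm.
    # Computationally, that vector is the top eigenvector of (T B)^T (T B).
    TB = T @ basis
    eigvals, eigvecs = np.linalg.eigh(TB.T @ TB)
    c = eigvecs[:, -1]            # coefficients of the maximizer in the current basis
    e = basis @ c                 # the next right singular vector e_i
    sigmas.append(np.linalg.norm(T @ e))
    right_vectors.append(e)
    # Deflate: keep only the directions of V_i orthogonal to e
    basis = basis @ eigvecs[:, :-1]

print(np.array(sigmas))                    # decreasing
print(np.linalg.svd(T, compute_uv=False))  # matches, up to round-off
```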

Any two maps with the same singular values can be brought to this common form using their right singular vectors as a basis for $V$. Thus, any two maps with the same singular values are isomorphic. It remains only to prove that the singular values of a map are unique. To do this, we need the following technical lemma.

Lemma: Let $e_1 \dots e_n$ be one SVD basis for $T$ with singular values $\sigma_1 \dots \sigma_n$. Let $e_1' \dots e_n'$ be another SVD basis for $T$ with singular values $\sigma_1' \dots \sigma_n'$. If $\sigma_i \neq \sigma_j'$, then $\langle e_i, e_j' \rangle = 0$.

You can expand $e_j'$ in the $e_1 \dots e_n$ basis as:

$$e_j' = \sum_{i=1}^n \langle e_i, e_j' \rangle e_i$$

Thus,

$$T e_j' = \sum_{i=1}^n \langle e_i, e_j' \rangle T e_i$$

and

$$\langle T e_i, T e_j' \rangle = \langle e_i, e_j' \rangle \langle T e_i, T e_i \rangle = \langle e_i, e_j' \rangle \sigma_i^2$$

Similarly, by expanding $e_i$ in the $e_1' \dots e_n'$ basis and swapping the roles of the two bases,

$$\langle T e_j', T e_i \rangle = \langle e_j', e_i \rangle \langle T e_j', T e_j' \rangle = \langle e_j', e_i \rangle (\sigma_j')^2$$

The two left-hand sides are complex conjugates of each other, as are $\langle e_i, e_j' \rangle$ and $\langle e_j', e_i \rangle$, so $\langle e_i, e_j' \rangle \sigma_i^2 = \langle e_i, e_j' \rangle (\sigma_j')^2$. If $\sigma_i^2 \neq (\sigma_j')^2$, then $\langle e_i, e_j' \rangle = 0$. This proves the lemma.

Now assume for contradiction that $T$ had two singular value decompositions with different singular values. In one SVD, $T$ has right singular vectors $e_1 \dots e_n$ with singular values $\sigma_1 \dots \sigma_n$. In another, $T$ has right singular vectors $e_1' \dots e_n'$ with singular values $\sigma_1' \dots \sigma_n'$. Suppose that $\sigma_i = \sigma'_i$ for $i = 1 \dots k-1$, but $\sigma_k > \sigma'_k$. Let $D$ be the set of right singular vectors of $T$ in the first SVD whose singular values equal $\sigma_k$, and let $D'$ be the set of right singular vectors of $T$ in the second SVD whose singular values equal $\sigma_k$. We must have $|D| > |D'|$, so there must be some $e_j \in D$ not contained in the span of $D'$. As such, there must be some $e_i'$ with $\sigma_i' \neq \sigma_j$ whose inner product with $e_j$ is nonzero. This contradicts the above lemma.

Thus, the singular values of $T$ are unique.

$\square$

Extensions: a lot of this section also holds for Hilbert spaces over the quaternions, although no one appears to use them. There is also a variant of the SVD for compact maps between infinite-dimensional Hilbert spaces. It seems that general maps between infinite-dimensional Hilbert spaces are much harder to classify.

Classifying Maps Within a Hilbert Space

Just as with vanilla vector spaces, maps within a Hilbert space have more structure than maps between Hilbert spaces. You know the drill by now: new structure means a new notion of "isomorphism".

Definition: For this section only, two maps $P, Q: V \rightarrow V$ are isomorphic if they are related by a Hilbert space automorphism of $V$.

In other words, $P$ and $Q$ are isomorphic if there is some Hilbert space automorphism $A: V \rightarrow V$ such that $Q = A P A^{-1}$.

Unfortunately, I don't know a good classification theorem for this category of maps. In any case, such a classification would be at least as complicated as the Jordan normal form, and the Jordan normal form would already take us on too long a diversion. However, there are specific cases where a map within a Hilbert space can be put into a canonical form, and these canonical forms can be used to classify the map. The most important case is that of self-adjoint maps.

The adjoint of a map $T: V \rightarrow W$ is the unique map $T^{\dagger}: W \rightarrow V$ satisfying the following condition for all $v \in V$ and $w \in W$.

$$\langle w, Tv\rangle = \langle T^{\dagger} w, v\rangle$$

A map $T: V \rightarrow V$ is called self-adjoint if $T = T^{\dagger}$. Alternatively, $\langle v_1, Tv_2\rangle = \langle Tv_1, v_2\rangle$ for all $v_1, v_2 \in V$. In the real case, these maps are often called symmetric, because any matrix representation of a symmetric map with respect to an orthonormal basis will be symmetric along the main diagonal. In the complex case, they are sometimes called Hermitian. Self-adjoint maps are very common, and fortunately, they have a nice classification theorem based on eigenvalues.
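Concretely, with respect to an orthonormal basis the adjoint is just the conjugate transpose. Here is a quick NumPy check of the defining identity (my own sketch, with an arbitrary complex example):

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))  # T: V -> W
T_dag = T.conj().T   # the adjoint, as the conjugate transpose

v = rng.standard_normal(2) + 1j * rng.standard_normal(2)  # v in V
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)  # w in W

# <w, T v> = <T^dagger w, v>, using the convention <a, b> = sum(conj(a) * b)
assert np.isclose(np.vdot(w, T @ v), np.vdot(T_dag @ w, v))
```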

Theorem (Spectral Theorem): Suppose that $T: V \rightarrow V$ is self-adjoint. There exists an orthonormal basis $e_1, e_2, \dots e_n$ for $V$ and decreasing real numbers $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ such that $T e_i = \lambda_i e_i$. These eigenvalues are unique, and they determine $T$ up to isomorphism.

There are a lot of really cool connections between the spectral theorem and the SVD. For example, the eigenvectors of a self-adjoint matrix are all singular vectors, and its singular values are the absolute values of its eigenvalues. It's common to prove the singular value decomposition using the spectral theorem, but the spectral theorem can also be proved directly from the SVD (harder exercise). I avoided using eigenvalues in my discussion of the SVD because I wanted to emphasize that the SVD is about maps between different vector spaces, while eigenvalues only apply to maps within a vector space. Besides, I see the SVD as more "fundamental" than the spectral theorem, and I think it deserves its own proof.
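That connection is easy to verify numerically (my own sketch): for a symmetric matrix, the singular values are the sorted absolute values of the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
S = A + A.T  # a self-adjoint (symmetric) map on R^4

eigenvalues = np.linalg.eigvalsh(S)               # real, since S is self-adjoint
singular_values = np.linalg.svd(S, compute_uv=False)

# Singular values are the absolute values of the eigenvalues, in decreasing order
assert np.allclose(np.sort(np.abs(eigenvalues))[::-1], singular_values)
```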

Extensions: Much of this section applies to maps within a quaternion Hilbert space, too. The spectral theorem has a simple extension to normal maps within a Hilbert space. The spectral theorem becomes incredibly important in the theory of infinite-dimensional Hilbert spaces, and can even be extended to some unbounded maps on Hilbert spaces.