LIAM AXON

A Neat Bound on the Matrix Exponential

Jan 5, 2022
tags: math, analysis

Pre-requisites: Linear algebra, ODEs

Background

The exponential is one of the most fundamental functions in mathematics, and especially in differential equations. Using Taylor series, it is possible to prove that

$$e^x = \sum_{k \geq 0} \frac{x^k}{k!} \tag{1}$$

for any $x \in \mathbb{R}$. This sum is absolutely convergent for every $x$, so it also makes sense even in situations where $x$ is not a real number. This is, for instance, how $e^z$ is defined when $z$ is a complex number, giving rise to the meme-worthy formula

$$e^{\pi i} = -1.$$

This sum is also how you define the exponential of a (square) matrix.
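Definition (1) translates directly into code. Below is a minimal sketch of the series definition in Python (the function name `expm_taylor` is my own; real libraries compute the matrix exponential with more robust methods such as scaling-and-squaring, not a raw Taylor sum):

```python
# A minimal sketch: the matrix exponential via the Taylor series (1).
# Illustrative only -- production code uses scaling-and-squaring instead.

def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, terms=30):
    """Approximate e^A by summing the first `terms` terms of the series."""
    n = len(A)
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]  # the k = 0 term
    power = [row[:] for row in identity]   # holds A^k as k grows
    factorial = 1.0
    for k in range(1, terms):
        power = mat_mul(power, A)
        factorial *= k
        result = [[result[i][j] + power[i][j] / factorial for j in range(n)]
                  for i in range(n)]
    return result

# For the nilpotent matrix N = [[0, 1], [0, 0]], the series terminates:
# e^N = I + N exactly.
print(expm_taylor([[0.0, 1.0], [0.0, 0.0]]))
```

For matrices with large entries the raw series suffers badly from cancellation, which is one reason libraries avoid it.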

The Matrix Exponential

The exponential of a real number shows up literally everywhere when studying differential equations with a single variable, such as

$$\frac{\mathrm{d}}{\mathrm{d}t}f(t) = f(t).$$

In the same way, the matrix exponential shows up all over the place when studying systems of differential equations, such as

$$\frac{\mathrm{d}}{\mathrm{d}t}f_0(t) = f_1(t), \qquad \frac{\mathrm{d}}{\mathrm{d}t}f_1(t) = f_0(t).$$

In particular, the matrix exponential shows up in linear differential equations. Take, for instance, the following initial value problem, where $w(t) \in \mathbb{R}^n$ and $A \in M_{n \times n}(\mathbb{R})$:

$$\frac{\mathrm{d}}{\mathrm{d}t} w(t) = A w(t)$$
$$w(0) = w_0$$

This has the (unique) solution

$$w(t) = e^{A t} w_0.$$

This solution can be verified by replacing $e^{A t}$ with its defining Taylor series and differentiating term by term. Estimates of the matrix exponential help mathematicians to understand the behavior of vector-valued differential equations.
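As a concrete check, the two-variable system above corresponds to $A = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$, whose exponential has the closed form $e^{At} = \begin{pmatrix}\cosh t & \sinh t\\ \sinh t & \cosh t\end{pmatrix}$. The sketch below (the test point $t = 0.7$ and the step size are arbitrary choices of mine) compares a finite-difference derivative of $w(t) = e^{At}w_0$ against $A w(t)$:

```python
# Sketch: numerically verifying that w(t) = e^{At} w_0 solves w' = A w
# for the system f0' = f1, f1' = f0, i.e. A = [[0, 1], [1, 0]].
# This A has the closed-form exponential
#   e^{At} = [[cosh t, sinh t], [sinh t, cosh t]].
import math

def w(t, w0):
    """w(t) = e^{At} w0 for A = [[0, 1], [1, 0]]."""
    c, s = math.cosh(t), math.sinh(t)
    return (c * w0[0] + s * w0[1], s * w0[0] + c * w0[1])

w0 = (1.0, 0.0)
t, h = 0.7, 1e-6

# Central finite-difference approximation of dw/dt at t.
wp, wm = w(t + h, w0), w(t - h, w0)
dw = ((wp[0] - wm[0]) / (2 * h), (wp[1] - wm[1]) / (2 * h))

# Right-hand side A w(t): multiplying by A swaps the coordinates.
wt = w(t, w0)
rhs = (wt[1], wt[0])

err = max(abs(dw[0] - rhs[0]), abs(dw[1] - rhs[1]))
print(err)  # tiny: the ODE is satisfied
```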

Preliminary Bounds

My goal in this post is to present a neat bound on the matrix exponential, but first I think it is prudent to go through some alternative bounds on the exponential. Because the following discussion requires eigenvalues and eigenvectors, which may be complex, we will naturally work with matrices in $M_{n \times n}(\mathbb{C})$. For $A \in M_{n \times n}(\mathbb{C})$, let $\|A\|$ denote its spectral norm.

Within the complex numbers, the exponential is very well understood. Euler's famous formula gives that:

$$e^z = e^{\Re z}\left(\cos(\Im z) + i\sin(\Im z)\right)$$

In particular,

$$|e^z| = e^{\Re z} \leq e^{|z|} \tag{2}$$

Now, consider exponentials in $M_{n \times n}(\mathbb{C})$. We get the basic bound

$$\|e^A\| = \left\|1 + \frac{A}{1!} + \frac{A^2}{2!} + \dots \right\| \leq 1 + \frac{\|A\|}{1!} + \frac{\|A\|^2}{2!} + \dots \leq e^{\|A\|}$$

Thus,

$$\|e^A\| \leq e^{\|A\|} \tag{3}$$

This bound can be pretty lousy at times, but it is a start. For instance, if $A = -t 1$ for some $t > 0$, where $1$ is the identity matrix, then $\|e^A\| = e^{-t}$. In contrast, the above bound only guarantees that $\|e^A\| \leq e^t$.
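To make the looseness concrete, here is a quick numeric comparison for $A = -t 1$ (a sketch; the value $t = 5$ is an arbitrary choice of mine):

```python
# Sketch: bound (3) versus the true norm for A = -t*1. Since e^{-t*1}
# is just e^{-t} times the identity, its spectral norm is e^{-t},
# while (3) only guarantees e^{||A||} = e^{t}.
import math

t = 5.0
true_norm = math.exp(-t)  # ||e^A|| = e^{-t}, about 0.0067
bound = math.exp(t)       # e^{||A||} = e^{t}, about 148.4
print(true_norm, bound)   # the bound is off by a factor of e^{2t}
```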

For certain matrices, we can compute $e^A$ exactly and use this to get a much tighter bound. Let $A$ be a diagonal matrix:

$$A = \begin{pmatrix} a_1 & 0 & \dots\\ 0 & a_2 & \dots\\ \vdots & \vdots & a_n \end{pmatrix}$$

Then, the matrix exponential is exactly

$$e^A = \begin{pmatrix} e^{a_1} & 0 & \dots\\ 0 & e^{a_2} & \dots\\ \vdots & \vdots & e^{a_n} \end{pmatrix}$$

and we have the following bound

$$\|e^A\| = \max_{i=1 \dots n} e^{\Re a_i} \tag{4}$$

If $A$ is diagonalizable, then we can get a bound similar to (4). Suppose that $A = T D T^{-1}$, where $D$ is a diagonal matrix. Then,

$$e^A = e^{T D T^{-1}} = T e^D T^{-1}$$
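For example (a hand-worked sketch of mine), $A = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$ has eigenvalues $\pm 1$ with eigenvectors $(1, 1)$ and $(1, -1)$, so $e^A$ can be assembled from $T$, $e^D$, and $T^{-1}$ directly:

```python
# Sketch: computing e^A as T e^D T^{-1} for A = [[0, 1], [1, 0]], which
# has eigenvalues +1 and -1 with eigenvectors (1, 1) and (1, -1).
import math

T = [[1.0, 1.0], [1.0, -1.0]]              # eigenvectors as columns
T_inv = [[0.5, 0.5], [0.5, -0.5]]          # inverse of T, computed by hand
eD = [[math.e, 0.0], [0.0, 1.0 / math.e]]  # exponential of D = diag(1, -1)

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

eA = mat_mul(mat_mul(T, eD), T_inv)
# Closed form for comparison: e^A = [[cosh 1, sinh 1], [sinh 1, cosh 1]].
print(eA)
```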

The diagonal entries of $D$ are exactly the eigenvalues of $A$. Let $\lambda_1, \dots, \lambda_n$ be the eigenvalues of $A$, and define

$$\Lambda(A) = \max_{i=1 \dots n} \Re \lambda_i.$$

It can be shown that $\Lambda$ is a continuous function $M_{n \times n}(\mathbb{C}) \rightarrow \mathbb{R}$. With this notation, we can write the bound:

$$\| e^A \| \leq \|T\|\|T^{-1}\| e^{\Lambda(A)} \tag{5}$$

More generally, any square matrix $A$ is similar to a matrix $J$ in Jordan canonical form, $A = T J T^{-1}$. The exponential $e^J$ can be computed exactly, yielding the bound:

$$\|e^A\| \leq C_n \|T\|\|T^{-1}\|(1+\|A\|^{n-1}) e^{\Lambda(A)} \tag{6}$$

for some constant $C_n > 0$ that depends only on the dimension $n$. This bound, although very useful, is not entirely satisfying because it depends on $T$, which may be very hard to compute and whose condition number $\|T\|\|T^{-1}\|$ may be arbitrarily large.
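To see how bad the $\|T\|\|T^{-1}\|$ factor can get, consider the family $A_\varepsilon = \begin{pmatrix}0 & 1\\ 0 & \varepsilon\end{pmatrix}$, which is diagonalizable for $\varepsilon > 0$ but degenerates to a Jordan block as $\varepsilon \to 0$. A sketch (my own example, using the Frobenius norm as a stand-in for the spectral norm, which it matches up to a factor of $\sqrt{2}$ in two dimensions):

```python
# Sketch: the eigenvector matrix of A_eps = [[0, 1], [0, eps]] has a
# condition number that blows up like 1/eps, even though e^{A_eps}
# stays tame. The eigenvalues are 0 and eps, with eigenvectors (1, 0)
# and (1, eps), so T = [[1, 1], [0, eps]].
import math

def frob(M):
    """Frobenius norm: within sqrt(2) of the spectral norm for 2x2."""
    return math.sqrt(sum(x * x for row in M for x in row))

conds = []
for eps in (1e-1, 1e-3, 1e-6):
    T = [[1.0, 1.0], [0.0, eps]]
    T_inv = [[1.0, -1.0 / eps], [0.0, 1.0 / eps]]  # computed by hand
    conds.append(frob(T) * frob(T_inv))

print(conds)  # grows roughly like 2/eps
```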

A Pretty Good Bound

It turns out that:

$$\|e^A\| \leq C_n (1+\|A\|^{n-1}) e^{\Lambda(A)} \tag{7}$$

The constant $C_n$ may be quite large, but it is at most exponential in $n$. Since $\Lambda$ is a continuous function, this bound is actually continuous in $A$, which is why I like it so much.
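As a quick sanity check (a worked example of my own), take the nilpotent block $A = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}$: here $\Lambda(A) = 0$, $\|A\| = 1$, and $e^A = 1 + A$, whose spectral norm can be computed by hand.

```python
# Sketch: checking (7) on A = [[0, 1], [0, 0]]. Here e^A = I + A =
# [[1, 1], [0, 1]]. Its spectral norm is the square root of the largest
# eigenvalue of (e^A)^T e^A = [[1, 1], [1, 2]], which works out to the
# golden ratio (1 + sqrt 5)/2, about 1.618.
import math

lhs = (1 + math.sqrt(5)) / 2  # ||e^A||, computed by hand
rhs_without_C = (1 + 1 ** (2 - 1)) * math.exp(0)  # (1 + ||A||^{n-1}) e^{L(A)} = 2
print(lhs, rhs_without_C)  # so any C_2 >= lhs/2 (about 0.81) covers this matrix
```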

For a period of time, I was very interested in understanding the matrix exponential, but I had trouble finding any bounds similar to (7). This is one of the reasons why I'm making this blog post. Eventually, with the help of Math Stack Exchange, I received an answer from user Anton Malyshev. The proof that I present here is a paraphrase of his.

Proof: If $n = 0$, then this bound is clearly true. Proceed by induction on $n$.

Let $A \in M_{n \times n}(\mathbb{C})$. Let $v$ be a unit eigenvector of $A$ with eigenvalue $\lambda$. Without loss of generality, suppose that $\Lambda(A) = \Re \lambda$. Let $V$ be the 1-dimensional subspace of $\mathbb{C}^n$ spanned by $v$, and let $W$ be the orthogonal complement of $V$. Note that $\dim W = n-1$. Find $b \in W$ and $D: W \rightarrow W$ such that, for $w \in W$,

$$A w = (b \cdot w) v + D w$$

Informally, we could write $A$ in block matrix form as:

$$A = \begin{pmatrix} \lambda & b \\ 0 & D \end{pmatrix}$$

To compute $e^A$, consider the initial value problem

$$u(0) = u_0$$
$$\frac{\mathrm{d}}{\mathrm{d}t}u(t) = A u(t) \tag{8}$$

The (unique) solution to this differential equation is:

$$u(t) = e^{A t} u_0$$

Thus, to bound $\|e^A\|$, it suffices to bound $\|u(1)\|$ over all unit vectors $u_0$.

We next project equation (8) onto $V$ and $W$. Let $u(t) = u_V(t) + u_W(t)$, where $u_V(t) \in V$ and $u_W(t) \in W$. Then, (8) becomes the following:

$$\begin{cases} \frac{\mathrm{d}}{\mathrm{d}t} u_V(t) = \lambda u_V(t) + (b \cdot u_W(t)) v\\ \frac{\mathrm{d}}{\mathrm{d}t} u_W(t) = D u_W(t) \end{cases} \tag{9}$$

Now, $u_W(t)$ satisfies a simple linear equation, so we can solve for $u_W(t)$ exactly:

$$u_W(t) = e^{D t} u_W(0)$$

By induction, applying (7) to the $(n-1)$-dimensional matrix $Dt$, we have the bound:

$$\|u_W(t)\| \leq C_{n-1} (1 + \|D\|^{n-2} t^{n-2}) e^{\Lambda(D)t} \|u_W(0)\|$$

Note that $\|D\| \leq \|A\|$, $\Lambda(D) \leq \Lambda(A)$, and $\|u_W(0)\| \leq \|u_0\|$. This yields the following, somewhat simpler bound:

$$\|u_W(t)\| \leq C_{n-1} (1 + \|A\|^{n-2} t^{n-2}) e^{\Lambda(A) t} \|u_0\| \tag{10}$$

Bounding $u_V(t)$ is trickier, but possible. Since $u_V(t)$ is a multiple of $v$, we can define $\alpha: \mathbb{R} \rightarrow \mathbb{C}$ by $u_V(t) = \alpha(t) v$. Plugging this into (9) gives us the differential equation:

$$\frac{\mathrm{d}}{\mathrm{d}t} \alpha(t) = \lambda \alpha(t) + b \cdot u_W(t) \tag{11}$$

The solution to this differential equation can be bounded using integrating factors. Define $\beta(t) = e^{-\lambda t} \alpha(t)$. Some routine calculations follow:

$$\frac{\mathrm{d}}{\mathrm{d}t} \beta(t) = e^{- \lambda t} b \cdot u_W(t)$$
$$\beta(t) = \int_0^t e^{-\lambda s} b \cdot u_W(s) \,\mathrm{d} s + \beta(0)$$
$$\alpha(t) = e^{\lambda t} \int_0^t e^{-\lambda s} b \cdot u_{W}(s) \,\mathrm{d} s + e^{\lambda t}\alpha(0)$$
$$\|u_V(t)\| \leq e^{\Lambda(A) t} \int_0^t e^{-\Lambda(A) s} \|A\| \|u_W(s)\| \,\mathrm{d} s + e^{\Lambda(A) t} \|u_0\|$$

where the last step uses $\|b\| \leq \|A\|$, $|e^{\lambda t}| = e^{\Lambda(A) t}$, and $|\alpha(0)| \leq \|u_0\|$.

Excellent! We can now use the bound on $\|u_W(t)\|$ from equation (10):

$$\|u_V(t)\| \leq C_{n-1} \|u_0\| \|A\| e^{\Lambda(A) t} \int_0^t (1 + \|A\|^{n-2} s^{n-2}) \,\mathrm{d}s + e^{\Lambda(A) t} \|u_0\|$$

This integral can be evaluated by hand and bounded by:

$$\|u_V(t)\| \leq C (1 + \|A\|^{n-1} t^{n-1}) e^{\Lambda(A) t} \|u_0\|$$

Combining this with (10) and setting $t = 1$ bounds $\|u(1)\|$, which completes the proof. $\square$

Final Comments

The bound (7) doesn't always give great results, especially if $A$ is close to a diagonal or Hermitian matrix. Bound (7) can be improved to:

$$\|e^A\| \leq C_n(1 + \|A - A^*\|^{n-1}) e^{\Lambda(A)}$$

or

$$\|e^A\| \leq C_n(1 + \|A + A^*\|^{n-1}) e^{\Lambda(A)}$$

Here, $A^*$ denotes the adjoint (conjugate transpose) of $A$. These bounds can be refined further, but they become messier.