Change of Basis

Mar 27 2016

This is a series of notes taken during my review of linear algebra, using Axler's excellent textbook Linear Algebra Done Right, which will be heavily referenced. This topic is actually banished to the end of Axler's book, along with the discussions of trace and determinants. But I like to introduce it early, as it's beneficial to view some of the operator decompositions (esp. the eigendecomposition) as simply a change of basis.

$$ \newcommand{\X}{X} \newcommand{\Y}{Y} \newcommand{\la}{\langle} \newcommand{\ra}{\rangle} \newcommand{\bv}{\mathbf{v}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bw}{\mathbf{w}} \newcommand{\be}{\mathbf{e}} \newcommand{\bs}{\mathbf{s}} \newcommand{\bff}{\mathbf{f}} $$

A bit of (sloppy) notation and conventions first: in general we work with a linear map \(T \in \mathcal{L}(V, W)\) between finite-dimensional vector spaces \(V\) and \(W\). Assume the vector spaces have bases denoted by the corresponding lowercase bold letters; for example, \(V\) has the basis \(\bv = v_1,...,v_n\) (assuming \(\operatorname{dim}(V)=n\)). The book uses the notation \(\mathcal{M}(T, (v_1,...,v_n), (w_1,...,w_m) )\) to denote the matrix associated with \(T\) with respect to the input basis \(\bv\) and the output basis \(\bw\), emphasizing the nature of \(\mathcal{M}\) as a linear map (an isomorphism, in fact) and its dependence on the bases; we'll use the shorthand \([T]_\bv^\bw\) to save space.

In the case of an operator \(T \in \mathcal{L}(V,V)\), since we almost always use the same basis for the input and output spaces (which are both \(V\)), we will use \([T]_\bv\) as a shorthand for \([T]_\bv^\bv\). Additionally, we use

$$[x]_\bv = \begin{bmatrix} c_1\\ \vdots \\ c_n \end{bmatrix} $$

to denote the n-by-1 (coordinate) matrix of vector \(x\) with respect to basis \(\bv\), so that \(x =c_1 v_1 + ... + c_n v_n\) (see definition 3.62). With this notation we can write (3.65) for linear operators as

$$\forall x \in V, [Tx]_\bv = [T]_\bv [x]_\bv $$
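
As a sanity check (not in the book), here's a tiny numpy sketch on \(\mathbb{R}^2\) with a made-up operator and basis: the basis vectors are stored as the columns of a matrix, coordinate vectors are recovered by solving a linear system, and the helper name `coords` is ad hoc.

```python
# A minimal numerical sketch (example data, ad-hoc names): check that
# [Tx]_v = [T]_v [x]_v on R^2 with a non-standard basis v.
import numpy as np

T_std = np.array([[2.0, 1.0],      # T written in the standard basis
                  [0.0, 3.0]])
B_v = np.array([[1.0, 1.0],        # basis v = v_1, v_2 stored as columns
                [0.0, 1.0]])

def coords(x, B):
    """Coordinate matrix [x]_basis: solve B c = x for c."""
    return np.linalg.solve(B, x)

T_v = np.linalg.solve(B_v, T_std @ B_v)   # [T]_v: kth column is [T v_k]_v
x = np.array([2.0, 5.0])
assert np.allclose(coords(T_std @ x, B_v), T_v @ coords(x, B_v))
```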

10.4 The matrix of the product of linear maps

This is a convenient restatement of (3.43), applied to (possibly different) bases of the same vector space (with \(U\) and \(W\) both equal to \(V\)), and it holds because of how matrix multiplication is defined.

Theorem: suppose \(\bv = v_1,...,v_n\), \(\bu=u_1,...,u_n\) and \(\bw=w_1,...,w_n\) are all bases of \(V\); suppose \(S,T \in \mathcal{L}(V)\). Then

$$[ST]_\bu^\bw = [S]_\bv^\bw [T]_\bu^\bv $$
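
A quick numerical check of the theorem, again with made-up operators and bases on \(\mathbb{R}^2\) (the helper `matrix_of` is an ad-hoc name): compute \([ST]_\bu^\bw\) directly and as the product of the two factors.

```python
# Check [ST]_u^w = [S]_v^w [T]_u^v for example operators and bases on R^2.
import numpy as np

S_std = np.array([[0.0, 1.0], [1.0, 1.0]])   # S in the standard basis
T_std = np.array([[2.0, 1.0], [0.0, 3.0]])   # T in the standard basis
B_u = np.array([[1.0, 0.0], [1.0, 1.0]])     # basis u as columns
B_v = np.array([[1.0, 1.0], [0.0, 1.0]])     # basis v as columns
B_w = np.array([[2.0, 0.0], [0.0, 1.0]])     # basis w as columns

def matrix_of(A_std, B_in, B_out):
    """[A]_{in}^{out}: kth column is the out-coordinates of A applied to in_k."""
    return np.linalg.solve(B_out, A_std @ B_in)

lhs = matrix_of(S_std @ T_std, B_u, B_w)                        # [ST]_u^w
rhs = matrix_of(S_std, B_v, B_w) @ matrix_of(T_std, B_u, B_v)   # [S]_v^w [T]_u^v
assert np.allclose(lhs, rhs)
```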

10.5 Matrix of the identity with respect to two bases

The identity operator \(I\) takes any vector in a vector space \(V\) back to itself, that is \(\forall x \in V, Ix=x\). With respect to any basis of \(V\), \(I\)'s matrix always has ones on the diagonal and zeros elsewhere--it's the identity matrix. Things get more interesting when we consider \(I\) to be a "conventional" linear map, one defined over potentially different input space and output space, each with its own basis.

Let \(\bv = v_1,...,v_n\), \(\bu=u_1,...,u_n\) be two bases of \(V\). Then the matrix \([I]_\bu^\bv\) effectively translates the coordinates of a vector \(x\) in \(\bu\) to those in \(\bv\) (but \(x\) stays the same), i.e.

$$ [x]_\bv = [I]_\bu^\bv [x]_\bu $$

To see this, start with the definition of the identity:

$$Ix=x$$

To bring bases and matrices into the picture, consider the matrix representation of the above w.r.t \(\bv\):

$$[Ix]_\bv = [x]_\bv$$

Suppose \(x=a_1 u_1 + ... + a_n u_n\); then the LHS can be rewritten as

$$[a_1 I u_1 + ... + a_n I u_n]_\bv = a_1[I u_1]_\bv + ... + a_n[I u_n]_\bv$$

where we used the linearity of the isomorphism \(\mathcal{M}\) (recall our notation \([\cdot]_\bv\) is equivalent to \(\mathcal{M}(\cdot,\bv)\)). Remember that by definition the kth column of the matrix \([I]_\bu^\bv\) consists of the scalars needed to write \(I u_k = u_k\) as a linear combination of \(v_1,...,v_n\)--but this is precisely \([I u_k]_\bv\)! So the above is a linear combination of columns of \([I]_\bu^\bv\), with the coefficients \(a_1,...,a_n\), which is equal to the matrix multiplication \([I]_\bu^\bv [x]_\bu\) (see (3.52)).

Theorem 10.5 then says that \([I]_\bu^\bv\) and \([I]_\bv^\bu\) are matrix inverses of each other; the two associated "translations" cancel out. But keep in mind the identity operator \(I\) remains the same. See example 10.6.
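
To make 10.5 concrete, here's a small numerical sketch with made-up bases of \(\mathbb{R}^2\): build \([I]_\bu^\bv\) (its kth column is \([u_k]_\bv\)), then check both the coordinate translation and the inverse relationship.

```python
# Build the change-of-coordinates matrix [I]_u^v for two example bases of R^2,
# then verify [x]_v = [I]_u^v [x]_u and that [I]_u^v, [I]_v^u are inverses.
import numpy as np

B_u = np.array([[1.0, 0.0], [1.0, 1.0]])   # basis u as columns
B_v = np.array([[1.0, 1.0], [0.0, 1.0]])   # basis v as columns

I_uv = np.linalg.solve(B_v, B_u)           # [I]_u^v: kth column is [u_k]_v
I_vu = np.linalg.solve(B_u, B_v)           # [I]_v^u: kth column is [v_k]_u

x = np.array([3.0, -1.0])
x_u = np.linalg.solve(B_u, x)              # [x]_u
x_v = np.linalg.solve(B_v, x)              # [x]_v
assert np.allclose(x_v, I_uv @ x_u)        # coordinate translation
assert np.allclose(I_uv @ I_vu, np.eye(2)) # matrix inverses of each other
```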

10.7 Change of basis formula

Suppose \(\bv = v_1,...,v_n\), \(\bu=u_1,...,u_n\) are bases of \(V\). Let \(A=[I]_\bu^\bv\). Then

$$[T]_\bu = [I]_\bv^\bu [T]_\bv^\bv [I]_\bu^\bv = A^{-1} [T]_\bv A $$

The matrix \([I]_\bu^\bv\) is often called a "change of basis transformation" from \(\bu\) to \(\bv\).
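
A short numerical check of 10.7 with the same kind of made-up data: compute \([T]_\bu\) directly and via the similarity transform \(A^{-1} [T]_\bv A\).

```python
# Verify the change of basis formula [T]_u = A^{-1} [T]_v A, where A = [I]_u^v,
# for an example operator and two example bases of R^2.
import numpy as np

T_std = np.array([[2.0, 1.0], [0.0, 3.0]])            # T in the standard basis
B_u = np.array([[1.0, 0.0], [1.0, 1.0]])              # basis u as columns
B_v = np.array([[1.0, 1.0], [0.0, 1.0]])              # basis v as columns

A   = np.linalg.solve(B_v, B_u)                       # A = [I]_u^v
T_v = np.linalg.solve(B_v, T_std @ B_v)               # [T]_v
T_u = np.linalg.solve(B_u, T_std @ B_u)               # [T]_u, computed directly
assert np.allclose(T_u, np.linalg.solve(A, T_v @ A))  # A^{-1} [T]_v A
```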

We also call \([T]_\bu\) and \([T]_\bv\) \(\textit{similar matrices}\), and say they're related by a similarity transform. Since they describe the same operator \(T\) (in two different bases), \([T]_\bu\) and \([T]_\bv\) share the same characteristics, including rank (thus invertibility) and characteristic polynomial (thus eigenvalues, trace, and determinant; but generally not the same eigenvectors). Basically, all the important \(\textit{numbers}\) go with the operator; the exact \(\textit{vectors}\) depend on the particular basis and matrix (indeed, the book first defines the trace and determinant of an \(\textit{operator}\), then defines them for its matrix with respect to an arbitrary basis, and shows that they actually agree).
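
And a quick check that the basis-independent "numbers" really do agree (made-up data again): the entries of \([T]_\bu\) and \([T]_\bv\) differ, but their eigenvalues, trace, determinant, and rank match.

```python
# Similar matrices share the operator's invariants: compare [T]_u and [T]_v
# for an example operator T and two example bases of R^2.
import numpy as np

T_std = np.array([[2.0, 1.0], [0.0, 3.0]])
B_u = np.array([[1.0, 0.0], [1.0, 1.0]])
B_v = np.array([[1.0, 1.0], [0.0, 1.0]])

T_u = np.linalg.solve(B_u, T_std @ B_u)   # [T]_u
T_v = np.linalg.solve(B_v, T_std @ B_v)   # [T]_v

assert not np.allclose(T_u, T_v)          # different entries...
assert np.allclose(np.sort(np.linalg.eigvals(T_u)),
                   np.sort(np.linalg.eigvals(T_v)))          # ...same eigenvalues,
assert np.isclose(np.trace(T_u), np.trace(T_v))              # same trace,
assert np.isclose(np.linalg.det(T_u), np.linalg.det(T_v))    # same determinant,
assert np.linalg.matrix_rank(T_u) == np.linalg.matrix_rank(T_v)  # same rank.
```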