Matrix Decompositions

Trace of a matrix

It’s the sum of the diagonal elements of a matrix, and it is the only function satisfying these 4 properties :

Using the last property, namely $\text{tr}(\bold{AB}) = \text{tr}(\bold{BA})$, we can get $\text{tr}(\bold{R^{-1}AR}) = \text{tr}(\bold{A})$, which means that similar matrices have the same trace. Thus for any linear map $f:V\to V$, no matter which basis it’s expressed in, as long as the same basis is used on both sides (before the map and after the map), we get the same trace for the matrix representing the map.
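
As a quick numerical sanity check (a minimal sketch, assuming NumPy, with a random $\bold{A}$ and a random invertible $\bold{R}$), the trace is unchanged under a change of basis:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))        # any square matrix
R = rng.standard_normal((4, 4))        # almost surely invertible

similar = np.linalg.inv(R) @ A @ R     # the same linear map in another basis
print(np.trace(A), np.trace(similar))  # the two traces agree up to floating-point error
```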

Eigenspace and geometric multiplicity of eigenvalues

For a matrix $\bold{A}$, the eigenspace of the eigenvalue $\lambda$, denoted by $E_{\lambda}$, is the null space of $\bold{A-\lambda I}$.

The geometric multiplicity is the dimension of this null space, and the algebraic multiplicity is the number of times this eigenvalue repeats as a root of the characteristic polynomial.
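
A minimal sketch (assuming NumPy) of reading off the geometric multiplicity numerically: the dimension of the null space of $\bold{A-\lambda I}$ is the number of (near-)zero singular values of that matrix.

```python
import numpy as np

def geometric_multiplicity(A, lam, tol=1e-10):
    """Dimension of the null space of (A - lam*I), i.e. the dimension of E_lam."""
    n = A.shape[0]
    s = np.linalg.svd(A - lam * np.eye(n), compute_uv=False)  # singular values
    return int(np.sum(s < tol))

# A 3x3 example where the eigenvalue 2 has algebraic multiplicity 2 (a double root
# of the characteristic polynomial) but geometric multiplicity only 1.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
print(geometric_multiplicity(A, 2.0))  # -> 1
```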

Eigenvalue properties

Positive definite matrices

Any matrix $\bold{A}$ such that $\bold{x^TAx}>0\;\; \forall \;\bold{x}\in V\setminus\{0\}$, where $V$ is the vector space $\bold{A}$ acts on, is positive definite.

The real eigenvalues of a positive definite matrix are always positive, since for any such matrix $\bold{A}$ with a (real) eigenvector $\bold{v}$ and eigenvalue $\lambda$, we have $\bold{v^TAv} >0 \implies \bold{v^T}\lambda\bold{v} = \lambda ||\bold{v}||^2 >0 \implies \lambda > 0$.

Also, by plugging the standard basis vectors in place of $\bold{v}$, we get that the diagonal entries must also be positive: $\bold{e_i^TAe_i} = a_{ii} > 0$.
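
A minimal numerical sketch (assuming NumPy) of both facts for a standard symmetric positive definite matrix (the 1-D discrete Laplacian):

```python
import numpy as np

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])  # symmetric positive definite

print(np.linalg.eigvalsh(A))        # all eigenvalues are strictly positive
print(np.diag(A))                   # the diagonal entries are positive too

# x^T A x > 0 for a few random non-zero x
rng = np.random.default_rng(1)
for _ in range(3):
    x = rng.standard_normal(3)
    print(x @ A @ x > 0)            # True
```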

Although positive definite matrices don’t have to be symmetric, a lot of texts put the restriction in the definition itself. And I’m going to do the same. So when I say “positive definite”, I actually mean “symmetric positive definite”. Am I being a little bitch here ? Absolutely ! But you’ll just have to deal with it :)

Symmetric matrices

Any matrix which is equal to its transpose is symmetric.

Spectral theorem : if $\bold{A}$ is an $n\times n$ symmetric matrix, then its eigenvectors can be chosen to be orthonormal, and they span the whole space $\mathbb{R}^n$.

Proof of orthogonality : Suppose $\bold{v_1,v_2}$ are eigenvectors with unequal eigenvalues $\lambda_1,\lambda_2$. Then $\bold{v_1^TA^T}=(\bold{Av_1})^T = \lambda_1\bold{v_1}^T \implies \bold{v_1^TA^Tv_2} = \lambda_1\bold{v_1^Tv_2}$. But $\bold{A^T=A}$. Thus $\lambda_1 \bold{v_1^Tv_2} = \bold{v_1^TAv_2} =\bold{v_1^T}(\lambda_2\bold{v_2}) = \lambda_2\bold{v_1^Tv_2}$. Since $\lambda_1 \neq \lambda_2$, we must have $\bold{v_1^Tv_2}=0$, and thus $\bold{v_1,v_2}$ are perpendicular if they have unequal eigenvalues. In other words, any two eigenvectors from eigenspaces of different eigenvalues are orthogonal. And we already know that the eigenvectors within the same eigenspace can be chosen to be orthogonal. Thus we can create an orthonormal basis using all the eigenvectors.

The other fact, that these eigenvectors span the whole space, is harder to prove. It allows us to write $\bold{A = RDR^T}=\sum_i \lambda_i \bold{v_iv_i^T}$ for any symmetric matrix $\bold{A}$, where $\bold{D}$ is a diagonal matrix, $\bold{R}$ is a square orthogonal matrix, and $\bold{v_i},\lambda_i$ are the orthonormal eigenvectors and corresponding eigenvalues. If $\bold{A}$ is not symmetric but still has $n$ linearly independent eigenvectors, then $\bold{R}$ is no longer orthogonal; it has those eigenvectors as its column vectors, and the decomposition becomes $\bold{A = RDR^{-1}}$. This is called eigenvalue decomposition.
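
A minimal sketch (assuming NumPy) of the eigenvalue decomposition of a symmetric matrix: `np.linalg.eigh` returns orthonormal eigenvectors, and summing $\lambda_i \bold{v_iv_i^T}$ rebuilds $\bold{A}$.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # symmetric

lams, R = np.linalg.eigh(A)              # eigenvalues and orthonormal eigenvectors (as columns)
D = np.diag(lams)

print(np.allclose(R @ D @ R.T, A))       # A = R D R^T      -> True
print(np.allclose(R.T @ R, np.eye(3)))   # R is orthogonal  -> True

# Rank-one sum form: A = sum_i lambda_i v_i v_i^T
A_rebuilt = sum(lams[i] * np.outer(R[:, i], R[:, i]) for i in range(3))
print(np.allclose(A_rebuilt, A))         # -> True
```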

Producing symmetric positive semidefinite matrices

For any matrix $\bold{M}$ (rectangular, singular, anything..), the matrix $\bold{M^TM}$ is symmetric positive semi-definite because 1) $\bold{(M^TM)^T = M^TM}$ and 2) $\bold{x^TM^TMx=(Mx)^T(Mx)=||Mx||^2}\geq 0$. If $\bold{M}$ also has full column rank (in particular, if it is square and non-singular), then $\bold{Mx}\neq 0$ for $\bold{x}\neq 0$, the last inequality becomes strict, and our composed matrix becomes symmetric positive definite.
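
A quick numerical sketch (assuming NumPy): build $\bold{M^TM}$ from a random rectangular $\bold{M}$ and check that it is symmetric with non-negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 3))                  # rectangular, nothing special about it
G = M.T @ M                                      # 3x3 Gram matrix

print(np.allclose(G, G.T))                       # symmetric -> True
print(np.all(np.linalg.eigvalsh(G) >= -1e-12))   # eigenvalues >= 0 (up to round-off) -> True
```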

Now you might wonder if the reverse can also be done ? That is, get a matrix $\bold{M}$, given the matrix $\bold{M^TM}$. It is certainly possible and can be done by the Cholesky Factorization.

Cholesky Factorization

Given an $n\times n$ positive definite matrix $\bold{A}$ (so, symmetric positive definite, per the convention above), we can write $\bold{A=LL^T}$ where $\bold{L}$ is a lower triangular matrix with $n$ rows and positive diagonal entries. On paper, this would look like this :

The first column of $\bold{LL^T}$ has the entries $l_{11}^2,\, l_{21}l_{11},\,\dots,\, l_{n1}l_{11}$. Matching it against the first column of $\bold{A}$ first gives the value of $l_{11}$ and then all of the $l_{i1}$. Now that you know the first column, you can easily form the equations for $l_{22}$ and the $l_{i2}$. Again, get $l_{22}$ first and then all the other values. Keep repeating this procedure and you’ll end up knowing all the columns of $\bold{L}$.
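
A minimal sketch (assuming NumPy) of this column-by-column procedure, checked against `np.linalg.cholesky`:

```python
import numpy as np

def cholesky(A):
    """Column-by-column Cholesky: returns lower triangular L with A = L @ L.T.
    Assumes A is symmetric positive definite."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: a_jj = sum_k L_jk^2, so solve for L_jj first.
        L[j, j] = np.sqrt(A[j, j] - L[j, :j] @ L[j, :j])
        # Entries below the diagonal: a_ij = sum_k L_ik L_jk, solve for L_ij.
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])                   # symmetric positive definite

L = cholesky(A)
print(np.allclose(L @ L.T, A))                    # True
print(np.allclose(L, np.linalg.cholesky(A)))      # matches NumPy's result
```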

Singular Value decomposition

For any matrix $\bold{A}$, we can write $\bold{A=U\Sigma V^T} = \sum_i\sigma_i\bold{u_iv_i^T}$, where $\bold{U,V}$ are orthogonal square (rotation) matrices with unit column vectors $\bold{u_i},\bold{v_i}$, and $\bold{\Sigma}$ is a diagonal matrix (with 0 padding where necessary) with entries $\sigma_i$. On paper it looks like this :

We can get $\bold{U,V}$ by eliminating the other matrix like this :

$$\bold{A^TA=V\Sigma^TU^TU\Sigma V^T=V(\Sigma^T\Sigma)V^T} \\ \bold{AA^T = U\Sigma V^TV\Sigma^TU^T=U(\Sigma\Sigma^T)U^T}$$

and then performing an eigenvalue decomposition of the matrices on the LHS and assigning values to the matrices accordingly. Note that $\bold{\Sigma^T\Sigma}$ (which is $n\times n$) and $\bold{\Sigma\Sigma^T}$ (which is $m\times m$) possibly have different shapes; this is accounted for by some entries of the bigger one being 0. Both are diagonal with the $\sigma_i^2$ as their non-zero entries.
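
A minimal sketch (assuming NumPy) of this connection: the eigenvalues of $\bold{A^TA}$ are exactly the squared singular values of $\bold{A}$.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))                  # rectangular

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigen-decomposing A^T A recovers the squared singular values.
evals = np.linalg.eigvalsh(A.T @ A)              # ascending order
print(np.allclose(np.sort(s**2), evals))         # sigma_i^2 are the eigenvalues -> True

# And the SVD really reconstructs A.
print(np.allclose(U @ np.diag(s) @ Vt, A))       # A = U Sigma V^T -> True
```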

Notice that although it looks like it, $\bold{\Sigma}$ is not the diagonal matrix obtained from the eigenvalue decomposition of $\bold{A}$. But it can be if $\bold{U=V}$, since then the equation at the top is basically the eigenvalue decomposition (of a symmetric positive semi-definite $\bold{A}$). In that case, we have $\bold{AA^T=A^TA}$. Matrices satisfying $\bold{AA^T=A^TA}$ are called "normal", and this is a necessary and sufficient condition for $\bold{A}$ to be diagonalisable by a unitary matrix. (Careful: a non-normal matrix can still be diagonalisable; the matrices without a full set of linearly independent eigenvectors are the ones called "deficient", or "defective".)

The summation form in the topmost equation can be used to approximate the matrix $\bold{A}$ as a sum of lower-rank matrices. This is used in image compression. The largest singular value is also the value of the spectral norm of the matrix, which is the maximum scaling a vector receives when passed through the matrix. Basically, the spectral norm is :

$$||\bold{A}||_2 = \max_\bold{x}\frac{||\bold{Ax}||_2}{||\bold{x}||_2} = \max_i \sigma_i$$

The last equality can be proved like this : $\bold{||Ax||^2=x^TA^TAx}=\sum_i\bold{x^T}(\sigma_i^2\bold{v_iv_i^T})\bold{x}=\sum_i\sigma_i^2(\bold{v_i^Tx})^2 \leq \sigma_\text{max}^2\sum_i(\bold{v_i^Tx})^2=\sigma_\text{max}^2||\bold{x}||^2$. The last step holds because $\bold{V}$ is orthogonal, so $\sum_i(\bold{v_i^Tx})^2 = ||\bold{V^Tx}||^2=||\bold{x}||^2$. The bound is attained by taking $\bold{x}$ to be the $\bold{v_i}$ with the largest $\sigma_i$, so the maximum really is $\sigma_\text{max}$.
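
A quick check (assuming NumPy) that the spectral norm equals the largest singular value and really is the maximum stretch:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 6))

sigma_max = np.linalg.svd(A, compute_uv=False)[0]   # singular values come sorted, largest first
print(np.isclose(np.linalg.norm(A, 2), sigma_max))  # spectral norm == sigma_max -> True

# No direction gets stretched by more than sigma_max.
stretches = [np.linalg.norm(A @ x) / np.linalg.norm(x)
             for x in rng.standard_normal((1000, 6))]
print(max(stretches) <= sigma_max + 1e-12)          # -> True
```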

Eckart-Young theorem

Consider a rank-$r$ matrix $\bold{A}\in \mathbb{R}^{m\times n}$ with the singular value decomposition $\bold{A=U\Sigma V^T}=\sum_{i=1}^r\sigma_i\bold{u_iv_i^T}$ and its rank-$k$ approximation $\bold{A_k}=\sum_{i=1}^k\sigma_i\bold{u_iv_i^T}$. For any rank-$k$ matrix $\bold{B}\in\mathbb{R}^{m\times n}$ with $k\leq r$, we have $||\bold{A-B}||_2\geq\sigma_{k+1}$, with equality when $\bold{B=A_k}$.

Proof : Since $\bold{B}$ has rank $k<r$, we can construct a non-zero vector $\bold{w}$ that lives in the null space of $\bold{B}$ using the first $(k+1)$ columns of $\bold{V}$. Suppose $\bold{w}=\alpha_1\bold{v_1}+\alpha_2\bold{v_2} + \dots+\alpha_{k+1}\bold{v_{k+1}} = \bold{V_{k+1}a}$, where $\bold{V_{k+1}}$ is the matrix with the $\bold{v_i}$ as column vectors and $\bold{a}$ is the column vector with entries $\alpha_i$. Also, suppose the $k$ linearly independent rows of $\bold{B}$ are stacked up to form $\bold{Y}^T$. Then the matrix $\bold{Y^TV_{k+1}}$ has shape $k\times (k+1)$, so the homogeneous system $\bold{Y^TV_{k+1}a=0}$ has more unknowns than equations and therefore has a non-zero solution (do Gaussian elimination; at least one column has no pivot, and the corresponding free variable can be set freely). Thus there is a vector $\bold{w =V_{k+1}a}$, non-zero because the columns of $\bold{V_{k+1}}$ are linearly independent, that is orthogonal to the $k$ linearly independent rows of $\bold{B}$, hence to every row of $\bold{B}$, and is thus in the null space of $\bold{B}$.

$$||\bold{A-B}||^2\,||\bold{w}||^2\geq||(\bold{A-B})\bold{w}||^2=||\bold{Aw}||^2=\alpha_1^2\sigma_1^2+\alpha_2^2\sigma_2^2+\dots+\alpha_{k+1}^2\sigma_{k+1}^2\geq\sigma_{k+1}^2||\bold{w}||^2$$

The first inequality is due to the definition of the spectral norm; the middle equality uses $\bold{Bw=0}$; and the last inequality holds because $\sigma_i\geq\sigma_{k+1}$ for $i\leq k+1$ while $||\bold{w}||^2=\sum_i\alpha_i^2$. Dividing by $||\bold{w}||^2$ gives $||\bold{A-B}||_2\geq\sigma_{k+1}$.
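
A minimal numerical sketch (assuming NumPy) of the theorem: truncating the SVD at rank $k$ gives an error whose spectral norm is exactly $\sigma_{k+1}$, and a random rank-$k$ matrix does no better.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]           # best rank-k approximation
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))   # error norm == sigma_{k+1} -> True

# Any other rank-k matrix does at least as badly.
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 6))  # a random rank-k matrix
print(np.linalg.norm(A - B, 2) >= s[k])               # -> True
```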

Mind Map

Yes, yes.. this isn’t my style, and neither is it yours, but since it’s given to us just like that, we might as well ..