Linear algebra

$\mathbb{R}^n=\mathbb{R}\times \mathbb{R}\times \dots \times \mathbb{R}$ is the Cartesian product of the group (from group theory) of real numbers, $n$ times.

Vector Space

A vector space is a set $V$ with binary operations $+, \cdot$ where $+$ is between elements (vectors) of $V$ and $\cdot$ is between elements of $V$ and elements (scalars) of $\mathbb{R}$, such that $(V,+)$ is an abelian group and, for all $\lambda, \mu \in \mathbb{R}$ and $\bold{x}, \bold{y} \in V$: $\lambda(\bold{x}+\bold{y}) = \lambda\bold{x}+\lambda\bold{y}$, $(\lambda+\mu)\bold{x} = \lambda\bold{x}+\mu\bold{x}$, $(\lambda\mu)\bold{x} = \lambda(\mu\bold{x})$, and $1\cdot\bold{x} = \bold{x}$.

Dimension

If for a vector space $V$, the group $(V,+) \cong \mathbb{R}^n$, then $\text{dim}(V) = n$ is the dimension of the vector space.

Thus the set of all matrices in $\mathbb{R}^{m\times n}$ under addition and scalar multiplication is also a vector space, of dimension $mn$. (In case you still haven't got it, we are talking about image data.)
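To make the image remark concrete, here's a tiny numpy sketch (the array shapes and values are made up for illustration), treating a grayscale image as a matrix in $\mathbb{R}^{m\times n}$ and flattening it into a vector in $\mathbb{R}^{mn}$:

```python
import numpy as np

# A (hypothetical) 4x3 grayscale "image" is just a matrix in R^{4x3}.
img_a = np.arange(12, dtype=float).reshape(4, 3)
img_b = np.ones((4, 3))

# Vector space operations: addition and scalar multiplication stay in R^{4x3}.
blended = 0.5 * img_a + 0.5 * img_b

# Flattening gives the coordinates of the same object in R^{mn} = R^{12}.
v = blended.reshape(-1)
print(v.shape)  # (12,) -> dimension m*n = 12
```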

Subspaces

A vector subspace is just a subgroup of a vector space that is also closed under scalar multiplication.

Affine spaces

For any subspace $U \subseteq V$, any coset $v_i + U$ of the subspace is called an affine space.

Affine space containing affine space

For an affine space $v_1 + U_1$ to be a subset of another affine space $v_2 + U_2$, we require

$$\forall u_1\in U_1,\ \exists u_2 \in U_2 \ \text{ s.t. }\ v_1 + u_1 = v_2 + u_2 \implies u_1 = (v_2-v_1)+u_2.$$

In particular, taking $u_1 = 0$ (which is in $U_1$),

$$\exists u_2 \in U_2 \ \text{ s.t. }\ 0 = (v_2-v_1)+u_2 \iff u_2 = v_1-v_2 \implies (v_1-v_2) \in U_2,$$

and therefore

$$\forall u_1 \in U_1,\quad u_1 = -(v_1-v_2)+u_2 \in U_2 \ \text{ for some } u_2\in U_2.$$

Thus we arrive at $U_1 \subseteq U_2$.

This condition is equivalent to the basis vectors of $U_1$ lying in $U_2$.
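As a rough numerical sketch of this check (the subspaces and vectors below are made-up examples), we can test whether $v_1 - v_2$ and each basis vector of $U_1$ lie in $U_2$ by comparing matrix ranks:

```python
import numpy as np

def in_subspace(U, w, tol=1e-10):
    """True if vector w lies in the column space of U (i.e. w is in span(U))."""
    return np.linalg.matrix_rank(np.column_stack([U, w]), tol=tol) == np.linalg.matrix_rank(U, tol=tol)

# Hypothetical example in R^3: U1 = span{e1}, U2 = span{e1, e2}.
U1 = np.array([[1.0], [0.0], [0.0]])
U2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
v1 = np.array([1.0, 2.0, 0.0])
v2 = np.array([0.0, 0.0, 0.0])

# v1 + U1 is contained in v2 + U2 iff (v1 - v2) is in U2 and every basis vector of U1 is in U2.
contained = in_subspace(U2, v1 - v2) and all(in_subspace(U2, u) for u in U1.T)
print(contained)  # True
```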

Solving a system of equations with rectangular matrices

Consider the equation $\bold{Ax} = \bold{c}$. To solve this, find a specific solution $\bold{x_0}$ and as many linearly independent solutions $\bold{x_1, x_2, \dots}$ to $\bold{Ax} = \bold{0}$ as possible. Then the general solution is $\bold{x} = \bold{x_0} + \lambda_1\bold{x_1} + \lambda_2\bold{x_2} + \dots$

To find $\bold{x_1, x_2}, \dots$ quickly, first do Gaussian elimination to get the reduced row echelon form, and then express each non-pivot column, say the $j^{\text{th}}$ column, as a linear combination of the pivot columns (those with only a single 1 and 0s as entries). Then just put the coefficients as the corresponding coordinates of $\bold{x_i}$ and put $-1$ as the $j^{\text{th}}$ coordinate.

Example: suppose after row operations we have arrived at a matrix like the one below.
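For concreteness, a reduced row echelon form consistent with the two solution vectors below would be, say,

$$\begin{bmatrix} 1 & 3 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \end{bmatrix}$$

with pivot columns 1, 3 and 4, and non-pivot columns 2 and 5.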

Then using our method for the 2nd column, we get this vector: $\bold{x_1} = \begin{bmatrix}3\\-1\\0\\0\\0\end{bmatrix}$

And doing it for the 5th column vector, we get: $\bold{x_2} = \begin{bmatrix}3\\0\\9\\-4\\-1\end{bmatrix}$
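A quick numpy check, using the example matrix sketched above (an assumption, not the original), that both vectors indeed solve $\bold{Ax} = \bold{0}$:

```python
import numpy as np

# The example RREF matrix sketched above (assumed for illustration).
A = np.array([[1., 3., 0., 0., 3.],
              [0., 0., 1., 0., 9.],
              [0., 0., 0., 1., -4.]])

x1 = np.array([3., -1., 0., 0., 0.])
x2 = np.array([3., 0., 9., -4., -1.])

print(A @ x1)  # [0. 0. 0.]
print(A @ x2)  # [0. 0. 0.]
```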

This way you can solve the equation when $\bold{A} \in \mathbb{R}^{m\times n}$ is a horizontal matrix ($m < n$). When it's a vertical matrix, we simply chop off the lower (all-zero) rows after Gaussian elimination. Another way to solve a system like this is to apply the Moore-Penrose pseudo-inverse on both sides, given by $(\bold{A}^\text{T}\bold{A})^{-1}\bold{A}^\text{T}$. Be warned though: computing $(\bold{A}^\text{T}\bold{A})^{-1}$ is not cheap.
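Here's a small sketch of the pseudo-inverse route on a made-up tall matrix; in practice one would lean on np.linalg.pinv or np.linalg.lstsq rather than forming $(\bold{A}^\text{T}\bold{A})^{-1}$ by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))   # vertical (tall) matrix, m > n
c = rng.normal(size=6)

# Explicit Moore-Penrose formula (fine for a tiny, well-conditioned A).
x_explicit = np.linalg.inv(A.T @ A) @ A.T @ c

# Library routines that compute the same thing more stably.
x_pinv = np.linalg.pinv(A) @ c
x_lstsq, *_ = np.linalg.lstsq(A, c, rcond=None)

print(np.allclose(x_explicit, x_pinv), np.allclose(x_pinv, x_lstsq))  # True True
```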

This solution is also the one we get when we do the least squares approximation.

Least squares

Consider $\bold{A}$ to be a list of row vectors $\bold{r_i}$, which are really points in some $n$ dimensional space, and $\bold{c}$ to be a list of values close to the ones assigned to these points by some linear function, say $c_i \approx f(\bold{r}) = x_1r_1 + x_2r_2 + \dots = \bold{x\cdot r}$. We want to find the $\bold{x}$ that gives the best approximation and minimises the quantity $E(\bold{x}) = \sum_i (c_i - f(\bold{r_i}))^2$.

We consider $(\bold{Ax-c})^2$ as a function of the coordinates of $\bold{x}$ and try to minimise it. It is easy to see that at the minimum, the derivative with respect to each coordinate vanishes, i.e. $(\bold{Ax-c})^T(\bold{A}\hat{\bold{x}}_i) = 0$, where $\hat{\bold{x}}_i$ is the $i^{\text{th}}$ unit coordinate vector. These equations, written one below the other, give rise to $(\bold{Ax-c})^T\bold{A} = \bold{0}$. Taking the transpose of both sides, we get $\bold{A}^T(\bold{Ax-c}) = \bold{0} \implies \bold{A^TAx = A^Tc} \implies \bold{x = (A^TA)^{-1}A^Tc}$.
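A quick numerical sanity check of this condition on a made-up system: at the least-squares solution, the residual $\bold{Ax-c}$ is orthogonal to every column of $\bold{A}$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))
c = rng.normal(size=8)

x, *_ = np.linalg.lstsq(A, c, rcond=None)

# Normal equations: A^T (A x - c) = 0 at the minimiser.
print(A.T @ (A @ x - c))  # ~ [0. 0. 0.] up to floating point error
```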

This method is rather computationally heavy. In practice, we set up an iteration of the form $\bold{x}_{k+1} = \bold{C\,x}_k + \bold{d}$ that decreases $E(\bold{x}_k)$ in every iteration.
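One concrete instance of such an iteration (just a sketch, not the only choice) is plain gradient descent on $E$, which has exactly this affine form with $\bold{C} = \bold{I} - \gamma\bold{A^TA}$ and $\bold{d} = \gamma\bold{A^Tc}$ for some small step size $\gamma$ (chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 4))
c = rng.normal(size=20)

gamma = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size small enough to converge
C = np.eye(4) - gamma * (A.T @ A)
d = gamma * (A.T @ c)

x = np.zeros(4)
for _ in range(5000):
    x = C @ x + d                           # x_{k+1} = C x_k + d

print(np.allclose(x, np.linalg.pinv(A) @ c, atol=1e-6))  # True
```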

Linear mapping

A homomorphism (a function that distributes over the group operation, addition here), say $f: V \to W$, between vector spaces is called a linear mapping if it meets this extra criterion: $\forall\, \lambda \in \mathbb{R}, \bold{x} \in V,\;\; f(\lambda\bold{x}) = \lambda f(\bold{x})$.

Because of the linearity, we can represent such a function as a matrix.

Suppose $f: V \to W$ is a linear map with $V, W$ expressed using the bases $B = \{\bold{b_1, b_2}, \dots\}$ and $C = \{\bold{c_1, c_2}, \dots\}$. Then $f$ can be expressed by a matrix whose $i^{\text{th}}$ column vector is given by the coordinates of $f(\bold{b_i})$ expressed using $C$.

Note that an element of $V$ or $W$ isn't simply a list of coordinates but a full physical vector, which requires a basis to be expressed as coordinates.
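A small numpy sketch of this construction, with a made-up map and bases for $\mathbb{R}^2$: each column of the matrix is obtained by solving for the $C$-coordinates of $f(\bold{b_i})$.

```python
import numpy as np

# A made-up linear map on R^2: rotation by 90 degrees.
def f(v):
    return np.array([-v[1], v[0]])

# Bases B (domain) and C (codomain), stored as columns of matrices.
B = np.array([[1., 1.],
              [0., 1.]])        # b1 = (1,0), b2 = (1,1)
C = np.array([[2., 0.],
              [0., 1.]])        # c1 = (2,0), c2 = (0,1)

# i-th column = coordinates of f(b_i) in the basis C, i.e. solve C @ coords = f(b_i).
A_f = np.column_stack([np.linalg.solve(C, f(b)) for b in B.T])
print(A_f)

# Check: for any v given in B-coordinates, C @ (A_f @ v_B) equals f applied to the physical vector.
v_B = np.array([3., -2.])
print(np.allclose(C @ (A_f @ v_B), f(B @ v_B)))  # True
```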

Rank and nullity

Since a linear map $f: V \to W$ is a homomorphism, according to the first isomorphism theorem we have $V/_{\text{ker }f} \cong \text{im}(f)$ and thus $V \cong \text{im}(f) \times \text{ker }f$, which tells us $\text{dim}(V) = \text{dim}(\text{im}(f)) + \text{dim}(\text{ker }f)$.

Define $\text{dim}(\text{im}(f)) = \text{rk}(A_f)$ as the rank of the matrix that represents $f$ in some basis, and define $\text{dim}(\text{ker }f) = \text{nul}(A_f)$ as the nullity (the kernel is also called the null space, since it gets mapped to $0$). It's easy to see that the rank is the number of linearly independent column vectors of $A_f$, and thus $\text{im}(f)$ is also called the column space of $A_f$, which is generated by these linearly independent columns. And since $\text{dim}(V)$ is basically the number of columns in $A_f$, the nullity is the number of remaining columns.
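A quick numpy illustration on the example matrix from before (again an assumption): rank plus nullity equals the number of columns, and a null-space basis can be read off the SVD.

```python
import numpy as np

A = np.array([[1., 3., 0., 0., 3.],
              [0., 0., 1., 0., 9.],
              [0., 0., 0., 1., -4.]])   # the example matrix sketched earlier

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank             # rank-nullity: dim(V) = rk + nul

# Null-space basis from the SVD: right singular vectors beyond the rank.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:]                  # each row spans part of ker(f)

print(rank, nullity)                    # 3 2
print(np.allclose(A @ null_basis.T, 0)) # True
```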

Column rank and row rank are the same.

Define the column rank to be the number of independent columns and the row rank to be the number of independent rows, for any matrix.

Elementary row operations do not change the column rank or the row rank (proof left as an exercise), so after Gaussian elimination these quantities are still the same as those of the original matrix. And for a reduced row echelon form, both the column rank and the row rank equal the number of nonzero rows (i.e. the number of pivots), and thus they are the same.
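A one-line numpy sanity check of this claim, using the same example matrix as above:

```python
import numpy as np

A = np.array([[1., 3., 0., 0., 3.],
              [0., 0., 1., 0., 9.],
              [0., 0., 0., 1., -4.]])

# Column rank equals row rank: rank of A equals rank of its transpose.
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T))  # True
```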

Change of basis

Suppose a vector space $V$ is expressed using two different bases $B = \{\bold{b_1, b_2}, \dots\}$ and $B' = \{\bold{b'_1, b'_2}, \dots\}$.

Then to go from the representation of a vector using $B'$ to the one using $B$, we simply multiply by the matrix whose column vectors are the $\bold{b'_i}$ expressed using $B$.

Suppose the matrices $\bold{B}, \bold{B'}$ have column vectors $\bold{b_i}, \bold{b'_i}$ expressed using some common basis $G$. Then the matrix for changing basis from $B'$ to $B$ is given by $\bold{B^{-1}B'}$.
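A numpy sketch of the $\bold{B^{-1}B'}$ rule with made-up bases: take a vector's coordinates in $B'$ and recover its coordinates in $B$.

```python
import numpy as np

# Columns of B and B' are the basis vectors expressed in a common basis G (here, the standard one).
B  = np.array([[1., 1.],
               [0., 2.]])
Bp = np.array([[2., 0.],
               [1., 1.]])

S = np.linalg.inv(B) @ Bp      # change of basis matrix from B' to B

v_Bp = np.array([1., -1.])     # coordinates of some vector in the basis B'
v_B  = S @ v_Bp                # the same vector's coordinates in the basis B

# Both coordinate lists describe the same physical vector in the frame G.
print(np.allclose(B @ v_B, Bp @ v_Bp))  # True
```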

Now consider a linear map $f: V \to W$ given by the matrix $\bold{A}$ when expressed using bases $B, C$ for $V, W$; the same map can also be given by the matrix $\bold{A'}$ when expressed using bases $B', C'$ for $V, W$ respectively. Suppose $\bold{S}$ is the matrix that changes the basis from $B'$ to $B$ and $\bold{T}$ changes from $C'$ to $C$. Then,

$$\bold{A'} = \bold{T^{-1}AS}$$

Expressing everything in some universal frame, we can write :

$$\bold{A'} = \bold{((C')^{-1}C)\;A\;(B^{-1}B')}$$

(Note : Read right to left please)
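Putting the whole thing together in numpy with made-up (random, invertible) basis matrices: with $\bold{S = B^{-1}B'}$ and $\bold{T = C^{-1}C'}$, the two representations act identically on any vector.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_basis():
    # A made-up invertible matrix; its columns are basis vectors in a common frame G.
    return rng.normal(size=(3, 3)) + 3 * np.eye(3)

B, Bp, C, Cp = random_basis(), random_basis(), random_basis(), random_basis()
A = rng.normal(size=(3, 3))            # the map f in bases B (domain) and C (codomain)

S = np.linalg.inv(B) @ Bp              # changes B'-coordinates into B-coordinates
T = np.linalg.inv(C) @ Cp              # changes C'-coordinates into C-coordinates
Ap = np.linalg.inv(T) @ A @ S          # the same map expressed in bases B', C'

# Check on a random vector given in B'-coordinates:
v_Bp = rng.normal(size=3)
lhs = Cp @ (Ap @ v_Bp)                 # apply f via A', express the result in frame G
rhs = C @ (A @ (S @ v_Bp))             # convert to B, apply f via A, express in frame G
print(np.allclose(lhs, rhs))           # True
```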