Linear algebra

$\mathbb{R}^n=\mathbb{R}\times \mathbb{R}\times \dots \times \mathbb{R}$ is the Cartesian product of the group (from group theory) of real numbers, $n$ times.

Vector Space

A vector space is a set $V$ with binary operations $+, \cdot$ where $+$ is between elements (vectors) of $V$ and $\cdot$ is between elements of $V$ and elements (scalars) of $\mathbb{R}$, such that $(V,+)$ is an abelian group and, for all $\lambda, \mu \in \mathbb{R}$ and $\bold{x}, \bold{y} \in V$: $\lambda(\bold{x}+\bold{y}) = \lambda\bold{x}+\lambda\bold{y}$, $(\lambda+\mu)\bold{x} = \lambda\bold{x}+\mu\bold{x}$, $(\lambda\mu)\bold{x} = \lambda(\mu\bold{x})$, and $1\cdot\bold{x} = \bold{x}$.

Dimension

If for a vector space $V$, the group $(V,+) \cong \mathbb{R}^n$, then $\text{dim}(V) = n$ is the dimension of the vector space.

Thus the set of all matrices in $\mathbb{R}^{m\times n}$ under addition and scalar multiplication is also a vector space, of dimension $mn$. (In case you still haven't got it, we are talking about image data.)
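To make the image remark concrete, here's a tiny numpy sketch (the array shapes and values are made up for illustration), treating a grayscale image as a matrix in $\mathbb{R}^{m\times n}$ and flattening it into a vector in $\mathbb{R}^{mn}$:

```python
import numpy as np

# A (hypothetical) 4x3 grayscale "image" is just a matrix in R^{4x3}.
img_a = np.arange(12, dtype=float).reshape(4, 3)
img_b = np.ones((4, 3))

# Vector space operations: addition and scalar multiplication stay in R^{4x3}.
blended = 0.5 * img_a + 0.5 * img_b

# Flattening gives the coordinates of the same object in R^{mn} = R^{12}.
v = blended.reshape(-1)
print(v.shape)  # (12,) -> dimension m*n = 12
```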

Subspaces

A vector subspace is just a subgroup of a vector space that is also closed under scalar multiplication.

Affine spaces

For any subspace $U \subseteq V$, any coset $v_i + U$ of the subspace is called an affine space.

Affine space containing affine space

For an affine space $v_1 + U_1$ to be a subset of another affine space $v_2 + U_2$, we require

$$\forall u_1\in U_1,\ \exists u_2 \in U_2 \ \text{ s.t. }\ v_1 + u_1 = v_2 + u_2 \implies u_1 = (v_2-v_1)+u_2.$$

In particular, taking $u_1 = 0$ (which is in $U_1$),

$$\exists u_2 \in U_2 \ \text{ s.t. }\ 0 = (v_2-v_1)+u_2 \iff u_2 = v_1-v_2 \implies (v_1-v_2) \in U_2,$$

and therefore

$$\forall u_1 \in U_1,\quad u_1 = -(v_1-v_2)+u_2 \in U_2 \ \text{ for some } u_2\in U_2.$$

Thus we arrive at $U_1 \subseteq U_2$.

This condition is equivalent to the basis vectors of $U_1$ lying in $U_2$.
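As a rough numerical sketch of this check (the subspaces and vectors below are made-up examples), we can test whether $v_1 - v_2$ and each basis vector of $U_1$ lie in $U_2$ by comparing matrix ranks:

```python
import numpy as np

def in_subspace(U, w, tol=1e-10):
    """True if vector w lies in the column space of U (i.e. w is in span(U))."""
    return np.linalg.matrix_rank(np.column_stack([U, w]), tol=tol) == np.linalg.matrix_rank(U, tol=tol)

# Hypothetical example in R^3: U1 = span{e1}, U2 = span{e1, e2}.
U1 = np.array([[1.0], [0.0], [0.0]])
U2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
v1 = np.array([1.0, 2.0, 0.0])
v2 = np.array([0.0, 0.0, 0.0])

# v1 + U1 is contained in v2 + U2 iff (v1 - v2) is in U2 and every basis vector of U1 is in U2.
contained = in_subspace(U2, v1 - v2) and all(in_subspace(U2, u) for u in U1.T)
print(contained)  # True
```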

Solving a system of equations with rectangular matrices

Consider the equation $\bold{Ax} = \bold{c}$. To solve this, find a specific solution $\bold{x_0}$ and as many linearly independent solutions $\bold{x_1, x_2, \dots}$ to $\bold{Ax} = \bold{0}$ as possible. Then the general solution is $\bold{x} = \bold{x_0} + \lambda_1\bold{x_1} + \lambda_2\bold{x_2} + \dots$

To find $\bold{x_1, x_2}, \dots$ quickly, first do Gaussian elimination to get the reduced row echelon form, and then express each non-pivot column, say the $j^{\text{th}}$ column, as a linear combination of the pivot columns (those with only a single 1 and 0s as entries). Then just put the coefficients as the corresponding coordinates of $\bold{x_i}$ and put $-1$ as the $j^{\text{th}}$ coordinate.

Example: suppose after row operations we have arrived at a matrix like the one below.
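For concreteness, a reduced row echelon form consistent with the two solution vectors below would be, say,

$$\begin{bmatrix} 1 & 3 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \end{bmatrix}$$

with pivot columns 1, 3 and 4, and non-pivot columns 2 and 5.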

Then using our method for the 2nd column, we get this vector: $\bold{x_1} = \begin{bmatrix}3\\-1\\0\\0\\0\end{bmatrix}$

And doing it for the 5th column vector, we get: $\bold{x_2} = \begin{bmatrix}3\\0\\9\\-4\\-1\end{bmatrix}$
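A quick numpy check, using the example matrix sketched above (an assumption, not the original), that both vectors indeed solve $\bold{Ax} = \bold{0}$:

```python
import numpy as np

# The example RREF matrix sketched above (assumed for illustration).
A = np.array([[1., 3., 0., 0., 3.],
              [0., 0., 1., 0., 9.],
              [0., 0., 0., 1., -4.]])

x1 = np.array([3., -1., 0., 0., 0.])
x2 = np.array([3., 0., 9., -4., -1.])

print(A @ x1)  # [0. 0. 0.]
print(A @ x2)  # [0. 0. 0.]
```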

This way you can solve the equation when $\bold{A} \in \mathbb{R}^{m\times n}$ is a horizontal matrix ($m < n$). When it's a vertical matrix, we simply chop off the lower (all-zero) rows after Gaussian elimination. Another way to solve a system like this is to apply the Moore-Penrose pseudo-inverse on both sides, given by $(\bold{A}^\text{T}\bold{A})^{-1}\bold{A}^\text{T}$. Be warned though: computing $(\bold{A}^\text{T}\bold{A})^{-1}$ is not cheap.
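Here's a small sketch of the pseudo-inverse route on a made-up tall matrix; in practice one would lean on np.linalg.pinv or np.linalg.lstsq rather than forming $(\bold{A}^\text{T}\bold{A})^{-1}$ by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))   # vertical (tall) matrix, m > n
c = rng.normal(size=6)

# Explicit Moore-Penrose formula (fine for a tiny, well-conditioned A).
x_explicit = np.linalg.inv(A.T @ A) @ A.T @ c

# Library routines that compute the same thing more stably.
x_pinv = np.linalg.pinv(A) @ c
x_lstsq, *_ = np.linalg.lstsq(A, c, rcond=None)

print(np.allclose(x_explicit, x_pinv), np.allclose(x_pinv, x_lstsq))  # True True
```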

This solution is also the one we get when we do the least squares approximation.

Least squares

Consider $\bold{A}$ to be a list of row vectors $\bold{r_i}$, which are really points in some $n$ dimensional space, and $\bold{c}$ to be a list of values close to the ones assigned to these points by some linear function, say $c_i \approx f(\bold{r}) = x_1r_1 + x_2r_2 + \dots = \bold{x\cdot r}$. We want to find the $\bold{x}$ that gives the best approximation and minimises the quantity $E(\bold{x}) = \sum_i (c_i - f(\bold{r_i}))^2$.

We consider $(\bold{Ax-c})^2$ as a function of the coordinates of $\bold{x}$ and try to minimise it. It is easy to see that at the minimum, the derivative with respect to each coordinate vanishes, i.e. $(\bold{Ax-c})^T(\bold{A}\hat{\bold{x}}_i) = 0$, where $\hat{\bold{x}}_i$ is the $i^{\text{th}}$ unit coordinate vector. These equations, written one below the other, give rise to $(\bold{Ax-c})^T\bold{A} = \bold{0}$. Taking the transpose of both sides, we get $\bold{A}^T(\bold{Ax-c}) = \bold{0} \implies \bold{A^TAx = A^Tc} \implies \bold{x = (A^TA)^{-1}A^Tc}$.
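A quick numerical sanity check of this condition on a made-up system: at the least-squares solution, the residual $\bold{Ax-c}$ is orthogonal to every column of $\bold{A}$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))
c = rng.normal(size=8)

x, *_ = np.linalg.lstsq(A, c, rcond=None)

# Normal equations: A^T (A x - c) = 0 at the minimiser.
print(A.T @ (A @ x - c))  # ~ [0. 0. 0.] up to floating point error
```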

This method is rather computationally heavy. In practice, we set up an iteration of the form $\bold{x}_{k+1} = \bold{C\,x}_k + \bold{d}$ that decreases $E(\bold{x}_k)$ in every iteration.
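One concrete instance of such an iteration (just a sketch, not the only choice) is plain gradient descent on $E$, which has exactly this affine form with $\bold{C} = \bold{I} - \gamma\bold{A^TA}$ and $\bold{d} = \gamma\bold{A^Tc}$ for some small step size $\gamma$ (chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 4))
c = rng.normal(size=20)

gamma = 1.0 / np.linalg.norm(A.T @ A, 2)   # step size small enough to converge
C = np.eye(4) - gamma * (A.T @ A)
d = gamma * (A.T @ c)

x = np.zeros(4)
for _ in range(5000):
    x = C @ x + d                           # x_{k+1} = C x_k + d

print(np.allclose(x, np.linalg.pinv(A) @ c, atol=1e-6))  # True
```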

Linear mapping

A homomorphism (a function that distributes over the group operation, addition here), say $f: V \to W$, between vector spaces is called a linear mapping if it meets this extra criterion: $\forall\, \lambda \in \mathbb{R}, \bold{x} \in V,\;\; f(\lambda\bold{x}) = \lambda f(\bold{x})$.

Because of the linearity, we can represent such a function as a matrix.

Suppose $f: V \to W$ is a linear map with $V, W$ expressed using the bases $B = \{\bold{b_1, b_2}, \dots\}$ and $C = \{\bold{c_1, c_2}, \dots\}$. Then $f$ can be expressed by a matrix whose $i^{\text{th}}$ column vector is given by the coordinates of $f(\bold{b_i})$ expressed using $C$.

Note that an element of $V$ or $W$ isn't simply a list of coordinates but a full physical vector, which requires a basis to be expressed as coordinates.
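A small numpy sketch of this construction, with a made-up map and bases for $\mathbb{R}^2$: each column of the matrix is obtained by solving for the $C$-coordinates of $f(\bold{b_i})$.

```python
import numpy as np

# A made-up linear map on R^2: rotation by 90 degrees.
def f(v):
    return np.array([-v[1], v[0]])

# Bases B (domain) and C (codomain), stored as columns of matrices.
B = np.array([[1., 1.],
              [0., 1.]])        # b1 = (1,0), b2 = (1,1)
C = np.array([[2., 0.],
              [0., 1.]])        # c1 = (2,0), c2 = (0,1)

# i-th column = coordinates of f(b_i) in the basis C, i.e. solve C @ coords = f(b_i).
A_f = np.column_stack([np.linalg.solve(C, f(b)) for b in B.T])
print(A_f)

# Check: for any v given in B-coordinates, C @ (A_f @ v_B) equals f applied to the physical vector.
v_B = np.array([3., -2.])
print(np.allclose(C @ (A_f @ v_B), f(B @ v_B)))  # True
```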

Rank and nullity

Since a linear map $f: V \to W$ is a homomorphism, according to the first isomorphism theorem we have $V/_{\text{ker }f} \cong \text{im}(f)$ and thus $V \cong \text{im}(f) \times \text{ker }f$, which tells us $\text{dim}(V) = \text{dim}(\text{im}(f)) + \text{dim}(\text{ker }f)$.

Define $\text{dim}(\text{im}(f)) = \text{rk}(A_f)$ as the rank of the matrix that represents $f$ in some basis, and define $\text{dim}(\text{ker }f) = \text{nul}(A_f)$ as the nullity (the kernel is also called the null space, since it gets mapped to $0$). It's easy to see that the rank is the number of linearly independent column vectors of $A_f$, and thus $\text{im}(f)$ is also called the column space of $A_f$, which is generated by these linearly independent columns. And since $\text{dim}(V)$ is basically the number of columns in $A_f$, the nullity is the number of remaining columns.
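A quick numpy illustration on the example matrix from before (again an assumption): rank plus nullity equals the number of columns, and a null-space basis can be read off the SVD.

```python
import numpy as np

A = np.array([[1., 3., 0., 0., 3.],
              [0., 0., 1., 0., 9.],
              [0., 0., 0., 1., -4.]])   # the example matrix sketched earlier

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank             # rank-nullity: dim(V) = rk + nul

# Null-space basis from the SVD: right singular vectors beyond the rank.
_, _, Vt = np.linalg.svd(A)
null_basis = Vt[rank:]                  # each row spans part of ker(f)

print(rank, nullity)                    # 3 2
print(np.allclose(A @ null_basis.T, 0)) # True
```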

Column rank and row rank are the same.

Define the column rank to be the number of independent columns and the row rank to be the number of independent rows, for any matrix.

Elementary row operations do not change the column rank or the row rank (proof left as an exercise), so after Gaussian elimination these quantities are still the same as those of the original matrix. And for a reduced row echelon form, both the column rank and the row rank equal the number of nonzero rows (i.e. the number of pivots), and thus they are the same.
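A one-line numpy sanity check of this claim, using the same example matrix as above:

```python
import numpy as np

A = np.array([[1., 3., 0., 0., 3.],
              [0., 0., 1., 0., 9.],
              [0., 0., 0., 1., -4.]])

# Column rank equals row rank: rank of A equals rank of its transpose.
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T))  # True
```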

Change of basis

Suppose a vector space $V$ is expressed using two different bases $B = \{\bold{b_1, b_2}, \dots\}$ and $B' = \{\bold{b'_1, b'_2}, \dots\}$.

Then to go from the representation of a vector using $B'$ to the one using $B$, we simply multiply by the matrix whose column vectors are the $\bold{b'_i}$ expressed using $B$.

Suppose the matrices $\bold{B}, \bold{B'}$ have column vectors $\bold{b_i}, \bold{b'_i}$ expressed using some common basis $G$. Then the matrix for changing basis from $B'$ to $B$ is given by $\bold{B^{-1}B'}$.
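A numpy sketch of the $\bold{B^{-1}B'}$ rule with made-up bases: take a vector's coordinates in $B'$ and recover its coordinates in $B$.

```python
import numpy as np

# Columns of B and B' are the basis vectors expressed in a common basis G (here, the standard one).
B  = np.array([[1., 1.],
               [0., 2.]])
Bp = np.array([[2., 0.],
               [1., 1.]])

S = np.linalg.inv(B) @ Bp      # change of basis matrix from B' to B

v_Bp = np.array([1., -1.])     # coordinates of some vector in the basis B'
v_B  = S @ v_Bp                # the same vector's coordinates in the basis B

# Both coordinate lists describe the same physical vector in the frame G.
print(np.allclose(B @ v_B, Bp @ v_Bp))  # True
```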

Now consider a linear map $f: V \to W$ given by the matrix $\bold{A}$ when expressed using bases $B, C$ for $V, W$; the same map can also be given by the matrix $\bold{A'}$ when expressed using bases $B', C'$ for $V, W$ respectively. Suppose $\bold{S}$ is the matrix that changes the basis from $B'$ to $B$ and $\bold{T}$ changes from $C'$ to $C$. Then,

$$\bold{A'} = \bold{T^{-1}AS}$$

Expressing everything in some universal frame, we can write :

$$\bold{A'} = \bold{((C')^{-1}C)\;A\;(B^{-1}B')}$$

(Note : Read right to left please)
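Putting the whole thing together in numpy with made-up (random, invertible) basis matrices: with $\bold{S = B^{-1}B'}$ and $\bold{T = C^{-1}C'}$, the two representations act identically on any vector.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_basis():
    # A made-up invertible matrix; its columns are basis vectors in a common frame G.
    return rng.normal(size=(3, 3)) + 3 * np.eye(3)

B, Bp, C, Cp = random_basis(), random_basis(), random_basis(), random_basis()
A = rng.normal(size=(3, 3))            # the map f in bases B (domain) and C (codomain)

S = np.linalg.inv(B) @ Bp              # changes B'-coordinates into B-coordinates
T = np.linalg.inv(C) @ Cp              # changes C'-coordinates into C-coordinates
Ap = np.linalg.inv(T) @ A @ S          # the same map expressed in bases B', C'

# Check on a random vector given in B'-coordinates:
v_Bp = rng.normal(size=3)
lhs = Cp @ (Ap @ v_Bp)                 # apply f via A', express the result in frame G
rhs = C @ (A @ (S @ v_Bp))             # convert to B, apply f via A, express in frame G
print(np.allclose(lhs, rhs))           # True
```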