A norm on a vector space $V$ is a function $\|\cdot\| : V \to \mathbb{R}$ such that $\forall \lambda \in \mathbb{R},\ x, y \in V$, we have:
$\|\lambda x\| = |\lambda|\,\|x\|$
$\|x + y\| \le \|x\| + \|y\|$
$\|x\| \ge 0$
$\|x\| = 0 \iff x = 0$
Manhattan Norm (L1 norm)
$\|x\|_1 = \sum_i |x_i|$
Euclidean Norm
$\|x\|_2 = \sqrt{\sum_i x_i^2} = \sqrt{x^T x}$
When we say $\|x\|$ without any subscript, we usually refer to the Euclidean norm.
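As a quick sanity check, here is a minimal NumPy sketch of both norms; the vector `x` is an arbitrary illustrative choice:

```python
import numpy as np

x = np.array([3.0, -4.0])

# Manhattan (L1) norm: sum of absolute values
l1 = np.sum(np.abs(x))      # 7.0, same as np.linalg.norm(x, 1)

# Euclidean (L2) norm: square root of the sum of squares
l2 = np.sqrt(np.sum(x**2))  # 5.0, same as np.linalg.norm(x)

print(l1, l2)
```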
General inner product
An inner product is any mapping $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ such that $\forall x, y, z \in V,\ p, q \in \mathbb{R}$, we have
$\langle px + qy, z\rangle = p\langle x, z\rangle + q\langle y, z\rangle$
$\langle z, px + qy\rangle = p\langle z, x\rangle + q\langle z, y\rangle$
$\langle x, y\rangle = \langle y, x\rangle$
$\langle x, x\rangle > 0$ for all $x \neq 0$ (positive definiteness).
Suppose $x = \sum_i x_i b_i$ and $y = \sum_i y_i b_i$ for some basis $\{b_1, \ldots, b_n\}$. Then $\langle x, y\rangle = \sum_i x_i \langle b_i, y\rangle = x^T z$, where $z_i = \langle b_i, y\rangle = \langle b_i, \sum_j y_j b_j\rangle = \sum_j \langle b_i, b_j\rangle y_j = r_i^T y$, with $r_i$ the column vector whose $j$th entry is $\langle b_i, b_j\rangle$ (so $r_i^T$ is the corresponding row vector).
From here, it's easy to see that $\langle x, y\rangle = x^T A y$, where $A_{ij} = \langle b_i, b_j\rangle$ is the matrix whose $i$th row is $r_i^T$.
Since $A_{ij} = \langle b_i, b_j\rangle = \langle b_j, b_i\rangle = A_{ji}$, $A$ is a symmetric matrix.
Moreover, because $\langle x, x\rangle > 0$ for every $x \in V$ except $x = 0$ (where it is $0$), we have $x^T A x > 0$ for all $x \in V \setminus \{0\}$, which is exactly what we call a positive definite matrix.
So finally, an inner product on $V \subseteq \mathbb{R}^n$ is an operation given by $\langle x, y\rangle = x^T A y$, where $A \in \mathbb{R}^{n \times n}$ is a symmetric, positive definite matrix.
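A minimal sketch of such an inner product; the particular matrix `M` below is an arbitrary choice, used only because $A = M^T M$ with $M$ invertible is guaranteed to be symmetric positive definite:

```python
import numpy as np

# A = M^T M with M invertible is symmetric positive definite
M = np.array([[2.0, 1.0],
              [0.0, 1.0]])
A = M.T @ M

def inner(x, y):
    """Inner product <x, y> = x^T A y."""
    return x @ A @ y

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(np.isclose(inner(x, y), inner(y, x)))  # True: symmetry
print(inner(x, x) > 0)                       # True: positive definiteness
```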
Induced norms
Any norm that can be expressed using an inner product as $\|x\| = \sqrt{\langle x, x\rangle}$ is called an induced norm.
For any general inner product, the Cauchy-Schwarz inequality guarantees that
$|\langle x, y\rangle| \le \|x\|\,\|y\|$, and thus there is always a way of defining an angle between two vectors.
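A quick numerical spot-check of the inequality under the induced norm; the randomly generated (shifted) SPD matrix is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
A = M.T @ M + 3 * np.eye(3)  # shifted so A is safely positive definite

def inner(x, y):
    return x @ A @ y

def norm(x):
    return np.sqrt(inner(x, x))  # the induced norm

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
print("Cauchy-Schwarz held on all samples")
```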
Angle between vectors
For any 2 vectors $x, y \in V$, if the angle between them is $\theta$, then:
$\cos\theta = \dfrac{\langle x, y\rangle}{\|x\|\,\|y\|}$
(By Cauchy-Schwarz this ratio lies in $[-1, 1]$, so $\theta$ is well defined.) And thus, we also have a notion of orthogonality, which is when $\cos\theta = 0$ and thus $\langle x, y\rangle = 0$.
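As a sketch, here is the angle computation in the simplest setting, assuming the standard dot product (i.e. $A = I$):

```python
import numpy as np

def angle(x, y):
    """Angle between x and y under the standard dot product."""
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))  # clip guards against rounding

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(np.degrees(angle(x, y)))                                # 45.0
print(np.isclose(angle(x, np.array([0.0, 1.0])), np.pi / 2))  # orthogonal
```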
Orthonormal bases
A basis $B = \{b_1, b_2, \ldots, b_n\}$ is called orthonormal iff $\langle b_i, b_i\rangle = 1$ for all $i \le n$ and $\langle b_i, b_j\rangle = 0$ for all $i \neq j$. In coordinates with respect to such a basis, the inner product on the vector space it generates is just the dot product.
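Under the dot product, checking orthonormality amounts to checking that the Gram matrix $B^T B$ (with the $b_i$ as columns) is the identity; a minimal sketch, using an arbitrary rotation of the standard basis:

```python
import numpy as np

# Columns form an orthonormal basis of R^2 (a rotation of the standard basis)
B = np.array([[np.cos(0.3), -np.sin(0.3)],
              [np.sin(0.3),  np.cos(0.3)]])

print(np.allclose(B.T @ B, np.eye(2)))  # True: <b_i, b_j> = delta_ij
```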
Orthogonal Projections
Suppose $U$ is a vector subspace of $V \subseteq \mathbb{R}^n$ with basis $B = \{b_1, b_2, \ldots, b_m\}$. Then for any $x \in V$, the projection $x_U \in U$ is the vector in $U$ with the least distance $\|x - x_U\|$, as given by a norm induced from an inner product $\langle\cdot,\cdot\rangle$.
It can be shown that this holds exactly when $x - x_U$ is orthogonal to every vector in $B$: for any other $u \in U$, Pythagoras then gives $\|x - u\|^2 = \|x - x_U\|^2 + \|x_U - u\|^2 \ge \|x - x_U\|^2$.
To find the coordinates $\lambda$ of $x_U$ as expressed in $B$ (writing $B$ also for the $n \times m$ matrix whose columns are the $b_i$), we write the equations
$0 = \langle b_i, x - x_U\rangle = \langle b_i, x - B\lambda\rangle = b_i^T A (x - B\lambda)$. Stacking up these equations for $i = 1, \ldots, m$, we get: $0 = B^T A (x - B\lambda) \implies B^T A B \lambda = B^T A x \implies \lambda = (B^T A B)^{-1} B^T A x \implies x_U = B (B^T A B)^{-1} B^T A x$
Special case: if the default basis that everything is expressed in is orthonormal, then $A = I$ and we simply have $\lambda = (B^T B)^{-1} B^T x$ and $x_U = B (B^T B)^{-1} B^T x$. This is what we'll use usually; it is exactly the least-squares solution to the problem $B\lambda \approx x$.
Special special case: if the default basis as well as the basis $B$ is orthonormal, then $B^T B = I_{m \times m}$, and thus $\lambda = B^T x$ and $x_U = B B^T x$.
Note that here $B B^T$ is not (in general) equal to $I_{n \times n}$.
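A minimal sketch of the $A = I$ case; the subspace basis `B` (stored as columns) and the point `x` are arbitrary illustrative choices:

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])        # basis of a 2D subspace U, as columns
x = np.array([1.0, 2.0, 3.0])

# lambda = (B^T B)^{-1} B^T x, computed via a linear solve
lam = np.linalg.solve(B.T @ B, B.T @ x)
x_U = B @ lam

# Same coordinates via least squares on B lambda ~ x
lam_ls, *_ = np.linalg.lstsq(B, x, rcond=None)
print(np.allclose(lam, lam_ls))        # True

# The residual x - x_U is orthogonal to every column of B
print(np.allclose(B.T @ (x - x_U), 0)) # True
```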
Projection of affine spaces
To project a vector is to find the point in a subspace closest to that vector, viewed as a point. Thus, being able to project a point onto any general affine space (hyperplane) enables us to do things like SVC and SVM.
The projection of a point $x$ onto a hyperplane $x_0 + U$ is given by $x_0 + (x - x_0)_U$: translate by $-x_0$, project onto $U$, and translate back.
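A minimal sketch of this recipe; the plane and the point are arbitrary illustrative choices:

```python
import numpy as np

def project_subspace(B, v):
    """Orthogonal projection of v onto the column space of B (A = I case)."""
    return B @ np.linalg.solve(B.T @ B, B.T @ v)

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])      # U = the xy-plane through the origin
x0 = np.array([0.0, 0.0, 5.0])  # affine plane x0 + U: the plane z = 5
x = np.array([2.0, 3.0, 9.0])

proj = x0 + project_subspace(B, x - x0)
print(proj)                     # [2. 3. 5.]
```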
Gram-Schmidt Orthogonalization
For a basis $B = \{b_1, b_2, \ldots, b_n\}$, define
$u_1 = b_1$ and $u_k = b_k - (b_k)_{\operatorname{span}(u_1, u_2, \ldots, u_{k-1})}$ for $1 < k \le n$.
The basis $U = \{u_1, u_2, \ldots, u_n\}$ is an orthogonal basis; normalizing each element to $u_k / \|u_k\|$ makes it orthonormal.
Here $(b_k)_{\operatorname{span}(u_1, u_2, \ldots, u_{k-1})}$ is the projection of $b_k$ onto the span of the $k - 1$ elements of $U$ that we have already calculated.
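A minimal sketch of the procedure under the standard dot product, with the normalization step included so the result is orthonormal; the input basis is an arbitrary example:

```python
import numpy as np

def gram_schmidt(B):
    """Columns of B are the basis b_1..b_n; returns orthonormal columns."""
    U = []
    for b in B.T:
        u = b.astype(float)
        for q in U:              # subtract projection onto span(u_1..u_{k-1})
            u = u - (q @ u) * q  # q is already unit length
        U.append(u / np.linalg.norm(u))
    return np.column_stack(U)

B = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(B)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: columns are orthonormal
```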