A norm on a vector space $V$ is a function $\|\cdot\| : V \to \mathbb{R}$ such that $\forall \lambda \in \mathbb{R},\ x, y \in V$, we have:
$\|\lambda x\| = |\lambda|\,\|x\|$
$\|x + y\| \le \|x\| + \|y\|$
$\|x\| \ge 0$
$\|x\| = 0 \iff x = 0$
Manhattan Norm (L1 norm)
$\|x\|_1 = \sum_i |x_i|$
Euclidean Norm
$\|x\|_2 = \sqrt{\sum_i x_i^2} = \sqrt{x^T x}$
When we say $\|x\|$ without any subscript, we usually refer to the Euclidean norm.
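As a quick sanity check, here is a minimal NumPy sketch of both norms; the vector `x` is an arbitrary illustrative choice:

```python
import numpy as np

x = np.array([3.0, -4.0])

# Manhattan (L1) norm: sum of absolute values
l1 = np.sum(np.abs(x))      # 7.0, same as np.linalg.norm(x, 1)

# Euclidean (L2) norm: square root of the sum of squares
l2 = np.sqrt(np.sum(x**2))  # 5.0, same as np.linalg.norm(x)

print(l1, l2)
```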
General inner product
An inner product is any mapping $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ such that $\forall x, y, z \in V,\ p, q \in \mathbb{R}$, we have
$\langle px + qy, z\rangle = p\langle x, z\rangle + q\langle y, z\rangle$
$\langle z, px + qy\rangle = p\langle z, x\rangle + q\langle z, y\rangle$
$\langle x, y\rangle = \langle y, x\rangle$
$\langle x, x\rangle > 0$ for all $x \neq 0$ (positive definiteness).
Suppose $x = \sum_i x_i b_i$ and $y = \sum_i y_i b_i$ for some basis $\{b_1, \ldots, b_n\}$. Then $\langle x, y\rangle = \sum_i x_i \langle b_i, y\rangle = x^T z$, where $z_i = \langle b_i, y\rangle = \langle b_i, \sum_j y_j b_j\rangle = \sum_j \langle b_i, b_j\rangle y_j = r_i^T y$, with $r_i$ the column vector whose $j$th entry is $\langle b_i, b_j\rangle$ (so $r_i^T$ is the corresponding row vector).
From here, it's easy to see that $\langle x, y\rangle = x^T A y$, where $A_{ij} = \langle b_i, b_j\rangle$ is the matrix whose $i$th row is $r_i^T$.
Since $A_{ij} = \langle b_i, b_j\rangle = \langle b_j, b_i\rangle = A_{ji}$, $A$ is a symmetric matrix.
Moreover, because $\langle x, x\rangle > 0$ for every $x \in V$ except $x = 0$ (where it is $0$), we have $x^T A x > 0$ for all $x \in V \setminus \{0\}$, which is exactly what we call a positive definite matrix.
So finally, an inner product on $V \subseteq \mathbb{R}^n$ is an operation given by $\langle x, y\rangle = x^T A y$, where $A \in \mathbb{R}^{n \times n}$ is a symmetric, positive definite matrix.
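A minimal sketch of such an inner product; the particular matrix `M` below is an arbitrary choice, used only because $A = M^T M$ with $M$ invertible is guaranteed to be symmetric positive definite:

```python
import numpy as np

# A = M^T M with M invertible is symmetric positive definite
M = np.array([[2.0, 1.0],
              [0.0, 1.0]])
A = M.T @ M

def inner(x, y):
    """Inner product <x, y> = x^T A y."""
    return x @ A @ y

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(np.isclose(inner(x, y), inner(y, x)))  # True: symmetry
print(inner(x, x) > 0)                       # True: positive definiteness
```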
Induced norms
Any norm that can be expressed using an inner product as $\|x\| = \sqrt{\langle x, x\rangle}$ is called an induced norm.
For any general inner product, the Cauchy-Schwarz inequality guarantees that
$|\langle x, y\rangle| \le \|x\|\,\|y\|$, and thus there is always a way of defining an angle between two vectors.
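A quick numerical spot-check of the inequality under the induced norm; the randomly generated (shifted) SPD matrix is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
A = M.T @ M + 3 * np.eye(3)  # shifted so A is safely positive definite

def inner(x, y):
    return x @ A @ y

def norm(x):
    return np.sqrt(inner(x, x))  # the induced norm

for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12
print("Cauchy-Schwarz held on all samples")
```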
Angle between vectors
For any 2 vectors $x, y \in V$, if the angle between them is $\theta$, then:
$\cos\theta = \dfrac{\langle x, y\rangle}{\|x\|\,\|y\|}$
(By Cauchy-Schwarz this ratio lies in $[-1, 1]$, so $\theta$ is well defined.) And thus, we also have a notion of orthogonality, which is when $\cos\theta = 0$ and thus $\langle x, y\rangle = 0$.
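As a sketch, here is the angle computation in the simplest setting, assuming the standard dot product (i.e. $A = I$):

```python
import numpy as np

def angle(x, y):
    """Angle between x and y under the standard dot product."""
    c = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(c, -1.0, 1.0))  # clip guards against rounding

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(np.degrees(angle(x, y)))                                # 45.0
print(np.isclose(angle(x, np.array([0.0, 1.0])), np.pi / 2))  # orthogonal
```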
Orthonormal bases
A basis $B = \{b_1, b_2, \ldots, b_n\}$ is called orthonormal iff $\langle b_i, b_i\rangle = 1$ for all $i \le n$ and $\langle b_i, b_j\rangle = 0$ for all $i \neq j$. In coordinates with respect to such a basis, the inner product on the vector space it generates is just the dot product.
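Under the dot product, checking orthonormality amounts to checking that the Gram matrix $B^T B$ (with the $b_i$ as columns) is the identity; a minimal sketch, using an arbitrary rotation of the standard basis:

```python
import numpy as np

# Columns form an orthonormal basis of R^2 (a rotation of the standard basis)
B = np.array([[np.cos(0.3), -np.sin(0.3)],
              [np.sin(0.3),  np.cos(0.3)]])

print(np.allclose(B.T @ B, np.eye(2)))  # True: <b_i, b_j> = delta_ij
```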
Orthogonal Projections
Suppose $U$ is a vector subspace of $V \subseteq \mathbb{R}^n$ with basis $B = \{b_1, b_2, \ldots, b_m\}$. Then for any $x \in V$, the projection $x_U \in U$ is the vector in $U$ with the least distance $\|x - x_U\|$, as given by a norm induced from an inner product $\langle\cdot,\cdot\rangle$.
It can be shown that this holds exactly when $x - x_U$ is orthogonal to every vector in $B$: for any other $u \in U$, Pythagoras then gives $\|x - u\|^2 = \|x - x_U\|^2 + \|x_U - u\|^2 \ge \|x - x_U\|^2$.
To find the coordinates $\lambda$ of $x_U$ as expressed in $B$ (writing $B$ also for the $n \times m$ matrix whose columns are the $b_i$), we write the equations
$0 = \langle b_i, x - x_U\rangle = \langle b_i, x - B\lambda\rangle = b_i^T A (x - B\lambda)$. Stacking up these equations for $i = 1, \ldots, m$, we get: $0 = B^T A (x - B\lambda) \implies B^T A B \lambda = B^T A x \implies \lambda = (B^T A B)^{-1} B^T A x \implies x_U = B (B^T A B)^{-1} B^T A x$
Special case: if the default basis that everything is expressed in is orthonormal, then $A = I$ and we simply have $\lambda = (B^T B)^{-1} B^T x$ and $x_U = B (B^T B)^{-1} B^T x$. This is what we'll use usually; it is exactly the least-squares solution to the problem $B\lambda \approx x$.
Special special case: if the default basis as well as the basis $B$ is orthonormal, then $B^T B = I_{m \times m}$, and thus $\lambda = B^T x$ and $x_U = B B^T x$.
Note that here $B B^T$ is not (in general) equal to $I_{n \times n}$.
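A minimal sketch of the $A = I$ case; the subspace basis `B` (stored as columns) and the point `x` are arbitrary illustrative choices:

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])        # basis of a 2D subspace U, as columns
x = np.array([1.0, 2.0, 3.0])

# lambda = (B^T B)^{-1} B^T x, computed via a linear solve
lam = np.linalg.solve(B.T @ B, B.T @ x)
x_U = B @ lam

# Same coordinates via least squares on B lambda ~ x
lam_ls, *_ = np.linalg.lstsq(B, x, rcond=None)
print(np.allclose(lam, lam_ls))        # True

# The residual x - x_U is orthogonal to every column of B
print(np.allclose(B.T @ (x - x_U), 0)) # True
```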
Projection of affine spaces
To project a vector is to find the point in a subspace closest to that vector, viewed as a point. Thus, being able to project a point onto any general affine space (hyperplane) enables us to do things like SVC and SVM.
The projection of a point $x$ onto a hyperplane $x_0 + U$ is given by $x_0 + (x - x_0)_U$: translate by $-x_0$, project onto $U$, and translate back.
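A minimal sketch of this recipe; the plane and the point are arbitrary illustrative choices:

```python
import numpy as np

def project_subspace(B, v):
    """Orthogonal projection of v onto the column space of B (A = I case)."""
    return B @ np.linalg.solve(B.T @ B, B.T @ v)

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])      # U = the xy-plane through the origin
x0 = np.array([0.0, 0.0, 5.0])  # affine plane x0 + U: the plane z = 5
x = np.array([2.0, 3.0, 9.0])

proj = x0 + project_subspace(B, x - x0)
print(proj)                     # [2. 3. 5.]
```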
Gram-Schmidt Orthogonalization
For a basis $B = \{b_1, b_2, \ldots, b_n\}$, define
$u_1 = b_1$ and $u_k = b_k - (b_k)_{\operatorname{span}(u_1, u_2, \ldots, u_{k-1})}$ for $1 < k \le n$.
The basis $U = \{u_1, u_2, \ldots, u_n\}$ is an orthogonal basis; normalizing each element to $u_k / \|u_k\|$ makes it orthonormal.
Here $(b_k)_{\operatorname{span}(u_1, u_2, \ldots, u_{k-1})}$ is the projection of $b_k$ onto the span of the $k - 1$ elements of $U$ that we have already calculated.
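A minimal sketch of the procedure under the standard dot product, with the normalization step included so the result is orthonormal; the input basis is an arbitrary example:

```python
import numpy as np

def gram_schmidt(B):
    """Columns of B are the basis b_1..b_n; returns orthonormal columns."""
    U = []
    for b in B.T:
        u = b.astype(float)
        for q in U:              # subtract projection onto span(u_1..u_{k-1})
            u = u - (q @ u) * q  # q is already unit length
        U.append(u / np.linalg.norm(u))
    return np.column_stack(U)

B = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(B)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: columns are orthonormal
```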