
Matrix

Table of contents
  1. Introduction
    1. Diagonals
    2. Traces
    3. Norms
    4. Orthogonal
    5. Ranks
      1. Column and Row Ranks
      2. Matrix Ranks
    6. Tensors
    7. Matrix intuition
  2. Special matrices
  3. Matrix operations
    1. Transpose
    2. Matrix-Tensor addition
      1. Matrix-Scalar addition
      2. Matrix-Matrix addition
    3. Matrix-Tensor product
      1. Matrix-Scalar product
      2. Matrix-Vector product
      3. Matrix-Matrix product
  4. References

Objectives

1. Concepts

  • Matrix definition: Matrices, Tensors, and their applications
  • Zero Matrix
  • Identity Matrix

2. Operations

  • Transpose
  • Inverse
  • Matrix arithmetic
  • Matrix-Tensor products

Introduction

A matrix, $\boldsymbol{A} \isin \mathbb{R}^{n\times m}$, is a two-dimensional array of scalars:

$$ \begin{bmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,m} \\ a_{2,1} & a_{2,2} & \dots & a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \dots & a_{n,m} \end{bmatrix} $$
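As a minimal sketch of this definition, a matrix maps naturally onto a 2-D NumPy array (the example values are arbitrary):

```python
import numpy as np

# A 2x3 matrix: n = 2 rows, m = 3 columns
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)  # (2, 3), i.e. A lives in R^(2x3)
print(A[0, 1])  # 2: the entry a_{1,2} (NumPy indices start at 0)
```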

Diagonals

DONE

Every matrix has two notable diagonals:

  • Main diagonal (or principal diagonal, primary diagonal, leading diagonal, major diagonal): the entries $\boldsymbol{A}_{i,j}$ where $i = j$.

$$ \begin{bmatrix} \textcolor{red}{1} & 0 & \dots & 0 \\ 0 & \textcolor{red}{1} & \dots & 0 \\ \vdots & \ddots & \textcolor{red}{1} & \vdots \\ 0 & \dots & \dots & \textcolor{red}{1} \end{bmatrix} $$

  • Antidiagonal (or Harrison diagonal, secondary diagonal, trailing diagonal) of a square matrix $\boldsymbol{B} \isin \mathbb{R}^{N\times N}$: the entries $B_{i,j}$ with $i+j=N+1$, for $1\leq i,j\leq N$.

$$ \begin{bmatrix} 0 & 0 & \dots & \textcolor{red}{1} \\ 0 & \dots & \textcolor{red}{1} & 0 \\ \vdots & \textcolor{red}{1} & \ddots & \vdots \\ \textcolor{red}{1} & \dots & \dots & 0 \end{bmatrix} $$
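Both diagonals can be extracted in NumPy; a small sketch (np.fliplr reverses the column order, turning the antidiagonal into the main diagonal):

```python
import numpy as np

B = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(np.diag(B))             # main diagonal: [1 5 9]  (entries with i = j)
print(np.diag(np.fliplr(B)))  # antidiagonal:  [3 5 7]  (entries with i + j = N + 1)
```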

Traces

WIP

The trace of a square matrix $\boldsymbol{A}\isin\mathbb{R}^{n\times n}$, denoted $tr(\boldsymbol{A})$ (or $tr\boldsymbol{A}$), is the sum of the diagonal elements:

$$ tr(\boldsymbol{A}) = \sum_{i=1}^{n}{a_{i,i}} $$
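A quick numerical check of this definition (arbitrary example values):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

print(np.trace(A))                     # 5
print(sum(A[i, i] for i in range(2)))  # 5: the sum of diagonal entries
```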

Norms

WIP

Similar to vectors, matrices also have norms, such as the Frobenius norm:

$$ \lVert A\rVert_F = \sqrt{\sum_{i=1}^n{\sum_{j=1}^m{A_{i,j}^2}}} = \sqrt{tr(\boldsymbol{A}^T\boldsymbol{A})} $$
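Both sides of this identity can be verified numerically; np.linalg.norm with 'fro' computes the Frobenius norm directly:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])

print(np.linalg.norm(A, 'fro'))    # sqrt(1 + 4 + 9 + 16) ~ 5.4772
print(np.sqrt(np.trace(A.T @ A)))  # same value via tr(A^T A)
```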

Orthogonal

WIP

Just like vectors, a square matrix $\boldsymbol{A}\isin\mathbb{R}^{n\times n}$ is orthogonal if all of its columns are orthogonal to each other and are normalized.

As a result,

$$ \boldsymbol{A}^T\boldsymbol{A} = \boldsymbol{I} = \boldsymbol{A}\boldsymbol{A}^T $$

Note: the columns of $\boldsymbol{A}$ are referred to as orthonormal (a stronger condition than merely orthogonal).

In other words, the inverse of an orthogonal matrix is its transpose.


Question: Why must $\boldsymbol{A}$ be a square matrix?
Answer:
Even if the columns of $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$, $n\neq m$, are orthonormal,

$$ \boldsymbol{I} = \boldsymbol{A}^T\boldsymbol{A} \neq \boldsymbol{A}\boldsymbol{A}^T $$

Hence the term orthogonal is only used when $\boldsymbol{A}$ is square.

  • Operating on a vector with an orthogonal matrix does not change its Euclidean norm: $\lVert \boldsymbol{A}\boldsymbol{x}\rVert_2 = \lVert \boldsymbol{x}\rVert_2$
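A sketch that verifies these properties for a 2-D rotation matrix, a standard example of an orthogonal matrix:

```python
import numpy as np

theta = np.pi / 4  # rotate by 45 degrees
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q^T Q = I
print(np.allclose(Q @ Q.T, np.eye(2)))           # True: Q Q^T = I
x = np.array([3., 4.])
print(np.linalg.norm(Q @ x), np.linalg.norm(x))  # 5.0 5.0: norm preserved
```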

Ranks

WIP

Column and Row Ranks

Suppose there is a matrix $\boldsymbol{A} \isin \mathbb{R}^{n\times m}$:

Column rank
Let $C$ be the largest set of linearly independent column vectors of $\boldsymbol{A}$; then $\text{column rank} = \lvert C\rvert$
Row rank
Let $R$ be the largest set of linearly independent row vectors of $\boldsymbol{A}$; then $\text{row rank} = \lvert R\rvert$

(Kolter & Do, 2015)

Matrix Ranks

For any matrix $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$: $$ \text{column rank} = \text{row rank} $$

Hence, both ranks are collectively referred to as the rank of $\boldsymbol{A}$, denoted $rank(\boldsymbol{A})$.

Properties of $\boldsymbol{A}$'s rank (see the sketch after this list) include:

  1. $rank(\boldsymbol{A}) \leq \min{(m,n)}$
  2. $\boldsymbol{A}$ is full rank $\iff rank(\boldsymbol{A}) = \min{(m, n)}$
  3. $rank(\boldsymbol{A}) = rank(\boldsymbol{A}^T)$
  4. $rank(\boldsymbol{A}\boldsymbol{B})\leq\min{(rank(\boldsymbol{A}), rank(\boldsymbol{B}))}\quad\forall \boldsymbol{B}\isin\mathbb{R}^{m\times p}$
  5. $rank(\boldsymbol{A} + \boldsymbol{B})\leq rank(\boldsymbol{A}) + rank(\boldsymbol{B})\quad \forall \boldsymbol{B}\isin\mathbb{R}^{n\times m}$
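These properties can be spot-checked with np.linalg.matrix_rank, which estimates the rank numerically:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # row 2 = 2 * row 1, so the rows are dependent
              [0., 1., 1.]])

print(np.linalg.matrix_rank(A))    # 2 < min(3, 3): A is not full rank
print(np.linalg.matrix_rank(A.T))  # 2: rank(A) = rank(A^T), property 3
```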

Tensors

Both vectors and matrices can be generalized as tensors: algebraic objects possessing an order (or rank), the dimension of the array (Linear Algebra for Deep Learning, 2021):

| Tensor object | Order (Rank) | Sets | Represents |
|---------------|--------------|------|------------|
| Scalars | 0th order | $\mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R}, \text{etc.}$ | Magnitude |
| Vectors | 1st order | $\mathbb{R}^n, \quad n\isin\mathbb{N}$ | Direction and magnitude |
| Matrices | 2nd order | $\mathbb{R}^{n\times m}, \quad n,m\isin\mathbb{N}$ | Linear map |
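In NumPy terms, the order of a tensor corresponds to the ndim attribute of an array:

```python
import numpy as np

scalar = np.array(3.0)           # 0th-order tensor
vector = np.array([1., 2., 3.])  # 1st-order tensor
matrix = np.eye(2)               # 2nd-order tensor

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
```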

Matrix intuition

WIP

Matrices act as encoders of geometric operations (e.g. rotations, reflections, and scalings).
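For instance, reflection across the x-axis is encoded by a single fixed matrix (a hypothetical minimal example):

```python
import numpy as np

# Hypothetical example: the matrix encoding reflection across the x-axis
reflect_x = np.array([[1.,  0.],
                      [0., -1.]])

p = np.array([2., 3.])
print(reflect_x @ p)  # [ 2. -3.]: the point is mirrored across the x-axis
```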


Special matrices

WIP

Identity Matrix: denoted $\boldsymbol{I}$, of size $n\times n$, with ones on the main diagonal and zeros everywhere else: $$ \boldsymbol{I}_{i,j}=\begin{cases} 1 \quad i=j \\ 0 \quad i\neq j \end{cases} $$

Zero Matrix: denoted $\boldsymbol{O}$, has all entries equal to 0.

Symmetric Matrix: a square matrix satisfying $\boldsymbol{A} = \boldsymbol{A}^T$.
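All three special matrices are easy to construct and test in NumPy:

```python
import numpy as np

I = np.eye(3)         # 3x3 identity matrix
O = np.zeros((2, 3))  # 2x3 zero matrix

S = np.array([[1, 7],
              [7, 2]])
print(np.array_equal(S, S.T))  # True: S is symmetric
```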


Matrix operations

Transpose

Todo

The transpose of a matrix $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$, written $\boldsymbol{A}^T\isin\mathbb{R}^{m\times n}$, has entries given by:

$$ \boldsymbol{A}^T_{i,j} = \boldsymbol{A}_{j,i} $$

Properties of the transpose (verified in the sketch after this list) are:

  • $\left(\boldsymbol{A}^T\right)^T = \boldsymbol{A}$
  • $\left(\boldsymbol{A}\boldsymbol{B}\right)^T = \boldsymbol{B}^T\boldsymbol{A}^T$
  • $\left(\boldsymbol{A} + \boldsymbol{B}\right)^T = \boldsymbol{A}^T + \boldsymbol{B}^T$
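A quick numerical verification of the three properties (random example matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 3))
B = rng.random((3, 4))
C = rng.random((2, 3))

print(np.allclose(A.T.T, A))              # (A^T)^T = A
print(np.allclose((A @ B).T, B.T @ A.T))  # (AB)^T = B^T A^T
print(np.allclose((A + C).T, A.T + C.T))  # (A + B)^T = A^T + B^T
```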

Matrix-Tensor addition

Matrix-Scalar addition

WIP

Suppose $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$ and $x\isin\mathbb{R}$; the addition between $\boldsymbol{A}$ and $x$ is another matrix $\boldsymbol{B}\isin\mathbb{R}^{n\times m}$:

$$ \boldsymbol{B} = \boldsymbol{A} + x \quad \text{with}\quad b_{i,j} = a_{i,j} + x $$

Although $\boldsymbol{A}$ and $x$ do not share the same size, the addition is possible through the broadcasting mechanism.

Matrix-Matrix addition

WIP

Suppose $\boldsymbol{A}, \boldsymbol{B}\isin\mathbb{R}^{n\times m}$; their sum is the matrix $\boldsymbol{C} = \boldsymbol{A} + \boldsymbol{B}$ with $c_{i,j} = a_{i,j} + b_{i,j}$.

Properties of matrix addition (demonstrated in the sketch below) include:

  • Commutative
  • Associative
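NumPy's broadcasting implements matrix-scalar addition exactly as defined above, while matrix-matrix addition is elementwise:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.ones((2, 2), dtype=int)

print(A + 10)                        # scalar broadcast: [[11 12], [13 14]]
print(A + B)                         # elementwise sum:  [[2 3], [4 5]]
print(np.array_equal(A + B, B + A))  # True: addition is commutative
```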

Matrix-Tensor product

Matrix-Scalar product

Todo

Matrix-Vector product

WIP

Suppose $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$ and $\boldsymbol{x}\isin\mathbb{R}^m$; the product $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}$ is a vector in $\mathbb{R}^n$, and the same result can be described from two different views:

$$ \begin{aligned} \text{Row viewpoint:} \quad \boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} &= \begin{bmatrix} \text{---} & a_{1}^T & \text{---} \\ \text{---} & a_{2}^T & \text{---} \\ & \vdots & \\ \text{---} & a_{n}^T & \text{---} \end{bmatrix} \boldsymbol{x} = \begin{bmatrix} a_{1}^T\boldsymbol{x} \\ a_{2}^T\boldsymbol{x} \\ \vdots \\ a_{n}^T\boldsymbol{x} \end{bmatrix} \\ \text{Column viewpoint:} \quad \boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} &= \begin{bmatrix} \mid & & \mid \\ a_{1} & \dots & a_{m} \\ \mid & & \mid \end{bmatrix} \begin{bmatrix} x_{1} \\ \vdots \\ x_{m} \end{bmatrix} = \begin{bmatrix} \mid \\ a_{1} \\ \mid \end{bmatrix} x_{1} + \dots + \begin{bmatrix} \mid \\ a_{m} \\ \mid \end{bmatrix} x_{m} \end{aligned} $$

In the column viewpoint mentioned above, the vector $\boldsymbol{y}$ is a linear combination of the columns of $\boldsymbol{A}$.
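The two viewpoints compute the same vector; a sketch comparing them against NumPy's built-in product:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
x = np.array([10., 100.])

row_view = np.array([A[i] @ x for i in range(A.shape[0])])  # y_i = a_i^T x
col_view = sum(A[:, j] * x[j] for j in range(A.shape[1]))   # combination of columns
print(np.allclose(A @ x, row_view))  # True
print(np.allclose(A @ x, col_view))  # True
```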

Matrix-Matrix product

WIP

Suppose $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$ and $\boldsymbol{B}\isin\mathbb{R}^{m\times p}$:

$$ \boldsymbol{C} = \boldsymbol{A}\boldsymbol{B}, \quad\text{with}\quad c_{i,j} = \sum_{k=1}^{m}{a_{i,k}b_{k,j}} $$

Properties of Matrix-Matrix product:

  • Non-commutative: $\boldsymbol{A}\boldsymbol{B} \neq \boldsymbol{B}\boldsymbol{A}$ in general
  • Associative: $(\boldsymbol{A}\boldsymbol{B})\boldsymbol{C} = \boldsymbol{A}(\boldsymbol{B}\boldsymbol{C})$
  • Distributive: $(\boldsymbol{A}+\boldsymbol{B})\boldsymbol{C} = \boldsymbol{A}\boldsymbol{C} + \boldsymbol{B}\boldsymbol{C}$
Question: Why is Matrix-Matrix multiplication not commutative?
Answer:

As mentioned above, matrices represent linear maps between vector spaces, and matrix multiplication corresponds to the composition of those maps. Function composition is also not commutative in general:

$f\circ g \neq g \circ f$ (Linear Algebra for Deep Learning, 2021)

(Computational Linear Algebra for Coders, 2021)
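A minimal check of the definition and of the non-commutativity claim above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)                         # [[2 1], [4 3]]: swaps the columns of A
print(B @ A)                         # [[3 4], [1 2]]: swaps the rows of A
print(np.array_equal(A @ B, B @ A))  # False: AB != BA in general
```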

References

  1. Kolter, Z., & Do, C. (2015). Linear Algebra Review and Reference. http://cs229.stanford.edu/section/cs229-linalg.pdf
    [Online; accessed 2021-08-13]
  2. Linear Algebra for Deep Learning. (2021). https://www.quantstart.com/articles/scalars-vectors-matrices-and-tensors-linear-algebra-for-deep-learning-part-1/
    [Online; accessed 2021-08-13]
  3. Computational Linear Algebra for Coders. (2021). https://github.com/fastai/numerical-linear-algebra
    [Online; accessed 2021-08-13]