
Matrix

Table of contents
  1. Introduction
    1. Diagonals
    2. Traces
    3. Norms
    4. Orthogonal
    5. Ranks
      1. Column and Row Ranks
      2. Matrix Ranks
    6. Tensors
    7. Matrix intuition
  2. Special matrices
  3. Matrix operations
    1. Transpose
    2. Matrix-Tensor addition
      1. Matrix-Scalar addition
      2. Matrix-Matrix addition
    3. Matrix-Tensor product
      1. Matrix-Scalar product
      2. Matrix-Vector product
      3. Matrix-Matrix product
  4. References

Objectives

1. Concepts

  • Matrix definition: Matrices, Tensors, and their applications
  • Zero Matrix
  • Identity Matrix

2. Operations

  • Transpose
  • Inverse
  • Matrix arithmetic
  • Matrix-Tensor products

Introduction

A matrix, $\boldsymbol{A} \isin \mathbb{R}^{n\times m}$, is a two-dimensional array of scalars:

$$ \begin{bmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,m} \\ a_{2,1} & a_{2,2} & \dots & a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \dots & a_{n,m} \end{bmatrix} $$
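As a minimal sketch of this definition, a matrix maps naturally onto a 2-D NumPy array (the example values are arbitrary):

```python
import numpy as np

# A 2x3 matrix: n = 2 rows, m = 3 columns
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.shape)  # (2, 3), i.e. A lives in R^(2x3)
print(A[0, 1])  # 2: the entry a_{1,2} (NumPy indices start at 0)
```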

Diagonals

DONE

Every matrix has two notable diagonals:

  • Main diagonal (or principal diagonal, primary diagonal, leading diagonal, major diagonal): the entries $\boldsymbol{A}_{i,j}$ where $i = j$.

$$ \begin{bmatrix} \textcolor{red}{1} & 0 & \dots & 0 \\ 0 & \textcolor{red}{1} & \dots & 0 \\ \vdots & \ddots & \textcolor{red}{1} & \vdots \\ 0 & \dots & \dots & \textcolor{red}{1} \end{bmatrix} $$

  • Antidiagonal (or Harrison diagonal, secondary diagonal, trailing diagonal) of a square matrix $\boldsymbol{B} \isin \mathbb{R}^{N\times N}$: the entries $B_{i,j}$ with $i+j=N+1$, for $1\leq i,j\leq N$.

$$ \begin{bmatrix} 0 & 0 & \dots & \textcolor{red}{1} \\ 0 & \dots & \textcolor{red}{1} & 0 \\ \vdots & \textcolor{red}{1} & \ddots & \vdots \\ \textcolor{red}{1} & \dots & \dots & 0 \end{bmatrix} $$
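Both diagonals can be extracted in NumPy; a small sketch (np.fliplr reverses the column order, turning the antidiagonal into the main diagonal):

```python
import numpy as np

B = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(np.diag(B))             # main diagonal: [1 5 9]  (entries with i = j)
print(np.diag(np.fliplr(B)))  # antidiagonal:  [3 5 7]  (entries with i + j = N + 1)
```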

Traces

WIP

The trace of a square matrix $\boldsymbol{A}\isin\mathbb{R}^{n\times n}$, denoted $tr(\boldsymbol{A})$ (or $tr\boldsymbol{A}$), is the sum of the diagonal elements:

$$ tr(\boldsymbol{A}) = \sum_{i=1}^{n}{a_{i,i}} $$
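A quick numerical check of this definition (arbitrary example values):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

print(np.trace(A))                     # 5
print(sum(A[i, i] for i in range(2)))  # 5: the sum of diagonal entries
```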

Norms

WIP

Similar to vectors, matrices also have norms, such as the Frobenius norm:

$$ \lVert A\rVert_F = \sqrt{\sum_{i=1}^n{\sum_{j=1}^m{A_{i,j}^2}}} = \sqrt{tr(\boldsymbol{A}^T\boldsymbol{A})} $$
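Both sides of this identity can be verified numerically; np.linalg.norm with 'fro' computes the Frobenius norm directly:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])

print(np.linalg.norm(A, 'fro'))    # sqrt(1 + 4 + 9 + 16) ~ 5.4772
print(np.sqrt(np.trace(A.T @ A)))  # same value via tr(A^T A)
```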

Orthogonal

WIP

Just like vectors, a square matrix $\boldsymbol{A}\isin\mathbb{R}^{n\times n}$ is orthogonal if all of its columns are orthogonal to each other and are normalized.

As a result,

$$ \boldsymbol{A}^T\boldsymbol{A} = \boldsymbol{I} = \boldsymbol{A}\boldsymbol{A}^T $$

Note: the columns of $\boldsymbol{A}$ are referred to as orthonormal (a stronger condition than merely orthogonal).

In other words, the inverse of an orthogonal matrix is its transpose.


Question: Why must $\boldsymbol{A}$ be a square matrix?
Answer:
Even if the columns of $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$, $n\neq m$, are orthonormal,

$$ \boldsymbol{I} = \boldsymbol{A}^T\boldsymbol{A} \neq \boldsymbol{A}\boldsymbol{A}^T $$

Hence the term orthogonal is only used when $\boldsymbol{A}$ is square.

  • Operating on a vector with an orthogonal matrix does not change its Euclidean norm: $\lVert \boldsymbol{A}\boldsymbol{x}\rVert_2 = \lVert \boldsymbol{x}\rVert_2$
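A sketch that verifies these properties for a 2-D rotation matrix, a standard example of an orthogonal matrix:

```python
import numpy as np

theta = np.pi / 4  # rotate by 45 degrees
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))           # True: Q^T Q = I
print(np.allclose(Q @ Q.T, np.eye(2)))           # True: Q Q^T = I
x = np.array([3., 4.])
print(np.linalg.norm(Q @ x), np.linalg.norm(x))  # 5.0 5.0: norm preserved
```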

Ranks

WIP

Column and Row Ranks

Suppose there is a matrix $\boldsymbol{A} \isin \mathbb{R}^{n\times m}$:

Column rank
Let $C$ be the largest set of linearly independent column vectors of $\boldsymbol{A}$; then $\text{column rank} = \lvert C\rvert$
Row rank
Let $R$ be the largest set of linearly independent row vectors of $\boldsymbol{A}$; then $\text{row rank} = \lvert R\rvert$

(Kolter & Do, 2015)

Matrix Ranks

For any matrix $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$: $$ \text{column rank} = \text{row rank} $$

Hence, both ranks are collectively referred to as the rank of $\boldsymbol{A}$, denoted $rank(\boldsymbol{A})$.

Properties of $\boldsymbol{A}$'s rank (see the sketch after this list) include:

  1. $rank(\boldsymbol{A}) \leq \min{(m,n)}$
  2. $\boldsymbol{A}$ is full rank $\iff rank(\boldsymbol{A}) = \min{(m, n)}$
  3. $rank(\boldsymbol{A}) = rank(\boldsymbol{A}^T)$
  4. $rank(\boldsymbol{A}\boldsymbol{B})\leq\min{(rank(\boldsymbol{A}), rank(\boldsymbol{B}))}\quad\forall \boldsymbol{B}\isin\mathbb{R}^{m\times p}$
  5. $rank(\boldsymbol{A} + \boldsymbol{B})\leq rank(\boldsymbol{A}) + rank(\boldsymbol{B})\quad \forall \boldsymbol{B}\isin\mathbb{R}^{n\times m}$
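These properties can be spot-checked with np.linalg.matrix_rank, which estimates the rank numerically:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],   # row 2 = 2 * row 1, so the rows are dependent
              [0., 1., 1.]])

print(np.linalg.matrix_rank(A))    # 2 < min(3, 3): A is not full rank
print(np.linalg.matrix_rank(A.T))  # 2: rank(A) = rank(A^T), property 3
```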

Tensors

Both vectors and matrices can be generalized as tensors: algebraic objects possessing an order (or rank), the dimension of the array (Linear Algebra for Deep Learning, 2021):

| Tensor object | Order (Rank) | Sets | Represents |
|---------------|--------------|------|------------|
| Scalars | 0th order | $\mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R}, \text{etc.}$ | Magnitude |
| Vectors | 1st order | $\mathbb{R}^n, \quad n\isin\mathbb{N}$ | Direction and magnitude |
| Matrices | 2nd order | $\mathbb{R}^{n\times m}, \quad n,m\isin\mathbb{N}$ | Linear map |
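In NumPy terms, the order of a tensor corresponds to the ndim attribute of an array:

```python
import numpy as np

scalar = np.array(3.0)           # 0th-order tensor
vector = np.array([1., 2., 3.])  # 1st-order tensor
matrix = np.eye(2)               # 2nd-order tensor

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
```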

Matrix intuition

WIP

Matrices act as encoders of geometric operations (e.g. rotations, reflections, and scalings).
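For instance, reflection across the x-axis is encoded by a single fixed matrix (a hypothetical minimal example):

```python
import numpy as np

# Hypothetical example: the matrix encoding reflection across the x-axis
reflect_x = np.array([[1.,  0.],
                      [0., -1.]])

p = np.array([2., 3.])
print(reflect_x @ p)  # [ 2. -3.]: the point is mirrored across the x-axis
```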


Special matrices

WIP

Identity Matrix: denoted $\boldsymbol{I}$, of size $n\times n$, with ones on the main diagonal and zeros everywhere else: $$ \boldsymbol{I}_{i,j}=\begin{cases} 1 \quad i=j \\ 0 \quad i\neq j \end{cases} $$

Zero Matrix: denoted $\boldsymbol{O}$, has all entries equal to 0.

Symmetric Matrix: a square matrix satisfying $\boldsymbol{A} = \boldsymbol{A}^T$.
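All three special matrices are easy to construct and test in NumPy:

```python
import numpy as np

I = np.eye(3)         # 3x3 identity matrix
O = np.zeros((2, 3))  # 2x3 zero matrix

S = np.array([[1, 7],
              [7, 2]])
print(np.array_equal(S, S.T))  # True: S is symmetric
```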


Matrix operations

Transpose

Todo

The transpose of a matrix $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$, written $\boldsymbol{A}^T\isin\mathbb{R}^{m\times n}$, has entries given by:

$$ \boldsymbol{A}^T_{i,j} = \boldsymbol{A}_{j,i} $$

Properties of the transpose (verified in the sketch after this list) are:

  • $\left(\boldsymbol{A}^T\right)^T = \boldsymbol{A}$
  • $\left(\boldsymbol{A}\boldsymbol{B}\right)^T = \boldsymbol{B}^T\boldsymbol{A}^T$
  • $\left(\boldsymbol{A} + \boldsymbol{B}\right)^T = \boldsymbol{A}^T + \boldsymbol{B}^T$
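A quick numerical verification of the three properties (random example matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 3))
B = rng.random((3, 4))
C = rng.random((2, 3))

print(np.allclose(A.T.T, A))              # (A^T)^T = A
print(np.allclose((A @ B).T, B.T @ A.T))  # (AB)^T = B^T A^T
print(np.allclose((A + C).T, A.T + C.T))  # (A + B)^T = A^T + B^T
```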

Matrix-Tensor addition

Matrix-Scalar addition

WIP

Suppose $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$ and $x\isin\mathbb{R}$; the addition between $\boldsymbol{A}$ and $x$ is another matrix $\boldsymbol{B}\isin\mathbb{R}^{n\times m}$:

$$ \boldsymbol{B} = \boldsymbol{A} + x \quad \text{with}\quad b_{i,j} = a_{i,j} + x $$

Although $\boldsymbol{A}$ and $x$ do not share the same size, the addition is possible through the broadcasting mechanism.

Matrix-Matrix addition

WIP

Suppose $\boldsymbol{A}, \boldsymbol{B}\isin\mathbb{R}^{n\times m}$; their sum is the matrix $\boldsymbol{C} = \boldsymbol{A} + \boldsymbol{B}$ with $c_{i,j} = a_{i,j} + b_{i,j}$.

Properties of matrix addition (demonstrated in the sketch below) include:

  • Commutative
  • Associative
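NumPy's broadcasting implements matrix-scalar addition exactly as defined above, while matrix-matrix addition is elementwise:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.ones((2, 2), dtype=int)

print(A + 10)                        # scalar broadcast: [[11 12], [13 14]]
print(A + B)                         # elementwise sum:  [[2 3], [4 5]]
print(np.array_equal(A + B, B + A))  # True: addition is commutative
```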

Matrix-Tensor product

Matrix-Scalar product

Todo

Matrix-Vector product

WIP

Suppose $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$ and $\boldsymbol{x}\isin\mathbb{R}^m$; the product $\boldsymbol{y} = \boldsymbol{A}\boldsymbol{x}$ is a vector in $\mathbb{R}^n$, and the same result can be described from two different views:

$$ \begin{aligned} \text{Row viewpoint:} \quad \boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} &= \begin{bmatrix} \text{---} & a_{1}^T & \text{---} \\ \text{---} & a_{2}^T & \text{---} \\ & \vdots & \\ \text{---} & a_{n}^T & \text{---} \end{bmatrix} \boldsymbol{x} = \begin{bmatrix} a_{1}^T\boldsymbol{x} \\ a_{2}^T\boldsymbol{x} \\ \vdots \\ a_{n}^T\boldsymbol{x} \end{bmatrix} \\ \text{Column viewpoint:} \quad \boldsymbol{y} = \boldsymbol{A}\boldsymbol{x} &= \begin{bmatrix} \mid & & \mid \\ a_{1} & \dots & a_{m} \\ \mid & & \mid \end{bmatrix} \begin{bmatrix} x_{1} \\ \vdots \\ x_{m} \end{bmatrix} = \begin{bmatrix} \mid \\ a_{1} \\ \mid \end{bmatrix} x_{1} + \dots + \begin{bmatrix} \mid \\ a_{m} \\ \mid \end{bmatrix} x_{m} \end{aligned} $$

In the column viewpoint mentioned above, the vector $\boldsymbol{y}$ is a linear combination of the columns of $\boldsymbol{A}$.
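The two viewpoints compute the same vector; a sketch comparing them against NumPy's built-in product:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
x = np.array([10., 100.])

row_view = np.array([A[i] @ x for i in range(A.shape[0])])  # y_i = a_i^T x
col_view = sum(A[:, j] * x[j] for j in range(A.shape[1]))   # combination of columns
print(np.allclose(A @ x, row_view))  # True
print(np.allclose(A @ x, col_view))  # True
```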

Matrix-Matrix product

WIP

Suppose $\boldsymbol{A}\isin\mathbb{R}^{n\times m}$ and $\boldsymbol{B}\isin\mathbb{R}^{m\times p}$:

$$ \boldsymbol{C} = \boldsymbol{A}\boldsymbol{B}, \quad\text{with}\quad c_{i,j} = \sum_{k=1}^{m}{a_{i,k}b_{k,j}} $$

Properties of Matrix-Matrix product:

  • Non-commutative: $\boldsymbol{A}\boldsymbol{B} \neq \boldsymbol{B}\boldsymbol{A}$ in general
  • Associative: $(\boldsymbol{A}\boldsymbol{B})\boldsymbol{C} = \boldsymbol{A}(\boldsymbol{B}\boldsymbol{C})$
  • Distributive: $(\boldsymbol{A}+\boldsymbol{B})\boldsymbol{C} = \boldsymbol{A}\boldsymbol{C} + \boldsymbol{B}\boldsymbol{C}$
Question: Why is Matrix-Matrix multiplication not commutative?
Answer:

As mentioned above, matrices represent linear maps between vector spaces, and matrix multiplication corresponds to the composition of those maps. Function composition is also not commutative in general:

$f\circ g \neq g \circ f$ (Linear Algebra for Deep Learning, 2021)

(Computational Linear Algebra for Coders, 2021)
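A minimal check of the definition and of the non-commutativity claim above:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)                         # [[2 1], [4 3]]: swaps the columns of A
print(B @ A)                         # [[3 4], [1 2]]: swaps the rows of A
print(np.array_equal(A @ B, B @ A))  # False: AB != BA in general
```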

References

  1. Kolter, Z., & Do, C. (2015). Linear Algebra Review and Reference. http://cs229.stanford.edu/section/cs229-linalg.pdf
    [Online; accessed 2021-08-13]
  2. Linear Algebra for Deep Learning. (2021). https://www.quantstart.com/articles/scalars-vectors-matrices-and-tensors-linear-algebra-for-deep-learning-part-1/
    [Online; accessed 2021-08-13]
  3. Computational Linear Algebra for Coders. (2021). https://github.com/fastai/numerical-linear-algebra
    [Online; accessed 2021-08-13]