Matrices are 2D arrays of numbers, grouped together to enable higher-level operations, just like vectors. In fact, the rows and columns of a matrix are frequently thought of as vectors, and matrices are often constructed from them. Matrices have strong roots in linear algebra, where they represent linear transformations and are used to solve systems of linear equations.
In computer graphics, matrices (in particular 4×4 homogeneous matrices) are frequently used to represent transformations between different vector spaces. An important one is the projection matrix, used to define virtual cameras. A transformation maps one coordinate system into another. Equivalently, it finds the coordinates of a point in space from another perspective. For example, a transformation can find a vertex's position in pixels within a rendered image from its object-space position in a triangle mesh.
This page introduces matrices used for linear transformations, beginning with rotations and relying heavily on knowledge of vectors, vector spaces, basis vectors and the scalar/dot product.
Operations
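Matrix multiplication is the most used operation and is summarized here. Each element of the result is the dot product of the corresponding row of $A$ with the corresponding column of $B$, which requires $A$ to have as many columns as $B$ has rows:

$$(AB)_{i,j} = \sum_k A_{i,k} B_{k,j}$$

The operation is not commutative: in general, $AB \not= BA$.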
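The following is a minimal sketch of the multiply in plain Python. It assumes matrices are stored as lists of rows, a storage choice made here for illustration and not something the text prescribes. The two prints use the same pair of matrices in opposite orders to show that the results differ.

```python
def matmul(A, B):
    """Multiply A (n x m) by B (m x p), both stored as lists of rows.

    Element (i, j) of the result is the dot product of row i of A
    with column j of B.
    """
    if len(A[0]) != len(B):
        raise ValueError("A must have as many columns as B has rows")
    columns_of_B = list(zip(*B))  # transpose B so its columns are easy to iterate
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
            for row in A]


A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]  # swaps two components

print(matmul(A, B))  # [[2, 1], [4, 3]]  (columns of A swapped)
print(matmul(B, A))  # [[3, 4], [1, 2]]  (rows of A swapped)
```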
The transpose of a matrix flips it across its main diagonal, so $A_{x,y}$ becomes $A_{y,x}$:

$$(A^\top)_{x,y} = A_{y,x}$$
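When a vector is multiplied by a matrix, it is implicitly treated as a single-column matrix (for $A\mathbf{v}$) or a single-row matrix (for $\mathbf{v}A$) so that its dimensions match the matrix multiply. The two orders are related by the transpose: $\mathbf{v} \times A \equiv A^\top \mathbf{v}$, in the sense that both produce the same components.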
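The sketch below checks the transpose and the row/column relation numerically. It again assumes list-of-rows storage. The helper names `transpose`, `mat_vec` and `vec_mat` are hypothetical and were chosen for this example.

```python
def transpose(A):
    """Flip A across its main diagonal: element (x, y) moves to (y, x)."""
    return [list(col) for col in zip(*A)]


def mat_vec(A, v):
    """A v, treating v as a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def vec_mat(v, A):
    """v A, treating v as a row vector: component j is v dotted with column j of A."""
    return [sum(x * A[k][j] for k, x in enumerate(v)) for j in range(len(A[0]))]


A = [[1, 2, 3],
     [4, 5, 6]]  # 2 x 3
v = [2, 1]       # 2 components, matching A's row count

print(transpose(A))              # [[1, 4], [2, 5], [3, 6]]
print(vec_mat(v, A))             # [6, 9, 12]
print(mat_vec(transpose(A), v))  # [6, 9, 12]
```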
Other operations, particularly the matrix inverse $A^{-1}$, are important but beyond the scope of this page. The inverse of an orthonormal matrix (discussed later) is simply its transpose. This is a particularly helpful shortcut for avoiding expensive computation.
Rotation Matrices
Multiplying a point by a rotation matrix computes its rotated coordinates. The original coordinates are in the point's local space. Viewed from the perspective of the new coordinates, it is the original space that now appears rotated. Again, do not imagine a sweeping animated rotation. Think of this purely as computing a result: finding the coordinates of points in a new space.
A simple example is a 2D 180-degree rotation, as shown below. The different spaces are visualized by drawing their axes, that is, their basis vectors.
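The new coordinates $b$ for a point $a$ are simply $b=(-a_x, -a_y)$. This is easy to see if you ignore the path the point follows during the rotation. Each component of $b$ is a combination of the components of $a$: $b_x = -1 \cdot a_x + 0 \cdot a_y$ and $b_y = 0 \cdot a_x - 1 \cdot a_y$. This transformation can be written in matrix form:

$$\begin{bmatrix} b_x \\ b_y \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} a_x \\ a_y \end{bmatrix}$$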
This matrix can be seen as a 180-degree rotation matrix. Equally, it reflects in both $x$ and $y$, or applies a uniform scale of $-1$.
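The following is a small sketch of the 180-degree example, assuming a point is stored as a plain list `[a_x, a_y]`. It applies the matrix to one point and checks that the result matches negating both components.

```python
def mat_vec(M, v):
    """M v with v as a column vector: each output component is a row of M dotted with v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


ROTATE_180 = [[-1,  0],
              [ 0, -1]]

a = [3, 2]
b = mat_vec(ROTATE_180, a)
print(b)  # [-3, -2]

# The same result as reflecting in both axes, or uniformly scaling by -1.
print([-component for component in a])  # [-3, -2]
```

Applying the matrix a second time returns the original point, as expected for two half turns.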
Now for a more complex example with an arbitrary rotation. Below, a transform is applied to rotate a vector space $O$ to give $W$. Initially, $O$ is viewed as the frame of reference. Then the basis vectors of $W$ are added, still relative to $O$, to provide the relation between the two spaces. Finally, $W$ becomes the frame of reference, showing the now-rotated space $O$.
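Assume the basis vectors of $W$ in the space of $O$, $W_{x_O}$ and $W_{y_O}$, are known (how to find them is discussed shortly). Scalar projection can then be used to find $p$ in $W$. Because these basis vectors are unit length, the portion of $p$ along $W_{x_O}$ and $W_{y_O}$ is simply a dot product. This gives $p_{W_x}$ (shown) and $p_{W_y}$ respectively:

$$p_{W_x} = p \cdot W_{x_O}, \qquad p_{W_y} = p \cdot W_{y_O}$$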
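These dot products can be written as a single matrix multiply, with $X=W_{x_O}$ and $Y=W_{y_O}$ forming the rows of the matrix:

$$\begin{bmatrix} p_{W_x} \\ p_{W_y} \end{bmatrix} = \overrightarrow{OW} \, p = \begin{bmatrix} X_x & X_y \\ Y_x & Y_y \end{bmatrix} \begin{bmatrix} p_x \\ p_y \end{bmatrix}$$

Here $\overrightarrow{OW}$ denotes a matrix that transforms a point in $O$ to a point in $W$. Its inverse does the reverse: $\overrightarrow{OW}^{-1} = \overrightarrow{WO}$.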
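The sketch below shows that the two scalar projections and the single matrix multiply compute the same thing. The basis vectors `X` and `Y` are hypothetical values picked only because they are unit length and perpendicular. They are not taken from the figure.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    """M v: each output component is a row of M dotted with v."""
    return [dot(row, v) for row in M]


# Hypothetical basis vectors of W expressed in O (unit length, perpendicular).
X = (0.6, 0.8)    # W_x in O
Y = (-0.8, 0.6)   # W_y in O

p = (2.0, 1.0)    # a point with coordinates in O

# Scalar projection onto each of W's basis vectors gives p's coordinates in W.
p_W_from_dots = [dot(p, X), dot(p, Y)]

# The same dot products as one matrix multiply, with X and Y as the rows.
O_to_W = [list(X), list(Y)]
p_W_from_matrix = mat_vec(O_to_W, p)

print(p_W_from_dots)    # approximately [2.0, -1.0]
print(p_W_from_matrix)  # the same values
```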
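This hinges on knowing $W$'s basis vectors in $O$. These vectors provide the relationship between the spaces. In effect they are the transformation, and they become the rows of the transformation matrix. To construct a matrix that rotates by $\theta$ radians, the basis vectors are generated by converting polar coordinates to Cartesian coordinates. However, this matrix needs basis vectors expressed in $O$, that is, ones that have been transformed by $\overrightarrow{WO}$. For this reason $-\theta$ is used rather than $\theta$:

$$X = \begin{bmatrix} \cos(-\theta) \\ \sin(-\theta) \end{bmatrix} = \begin{bmatrix} \cos\theta \\ -\sin\theta \end{bmatrix}, \qquad Y = \begin{bmatrix} \cos(-\theta + \frac{\pi}{2}) \\ \sin(-\theta + \frac{\pi}{2}) \end{bmatrix} = \begin{bmatrix} \sin\theta \\ \cos\theta \end{bmatrix}$$

$$\overrightarrow{OW} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$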
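The following is a sketch of building the 2D rotation matrix directly from the polar-coordinate basis vectors at angles $-\theta$ and $-\theta + \pi/2$. The function name `rotation_2d` is hypothetical. The comment about counterclockwise rotation assumes the usual convention of a y axis pointing up.

```python
import math


def rotation_2d(theta):
    """2x2 matrix that rotates coordinates by theta radians.

    Its rows are W's basis vectors expressed in O, generated from polar
    coordinates at angles -theta and -theta + pi/2.
    """
    X = (math.cos(-theta), math.sin(-theta))
    Y = (math.cos(-theta + math.pi / 2), math.sin(-theta + math.pi / 2))
    return [list(X), list(Y)]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# A quarter turn maps the x axis onto the y axis (counterclockwise, y up).
R = rotation_2d(math.pi / 2)
print(mat_vec(R, (1.0, 0.0)))  # approximately [0.0, 1.0]
```

Up to floating-point rounding, the result equals the familiar closed form `[[cos t, -sin t], [sin t, cos t]]`.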
A pure rotation matrix is orthonormal: its basis vectors are all perpendicular to one another (orthogonal) and of unit length. It can therefore be inverted by taking its transpose.
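This sketch checks the transpose-as-inverse shortcut numerically. It multiplies a rotation by its transpose and compares the result against the identity within a small tolerance. The angle 0.7 is arbitrary.

```python
import math


def matmul(A, B):
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def rotation_2d(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]


R = rotation_2d(0.7)
R_inverse = transpose(R)  # cheap inverse, valid because R is orthonormal

# R^T R should be the 2x2 identity, up to floating-point rounding.
print(matmul(R_inverse, R))
```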
The same treatment of 2D transformations, with 2×2 matrix multiplies viewed as dot products, extends directly to 3D: 3×3 matrices and 3D vector dot products. Beyond rotation, matrices used in this way can also represent scaling and shearing (shearing is less common in computer graphics). Translation, however, is not a linear transformation because it moves the origin. It therefore needs a new construct: homogeneous matrices.
Homogeneous Matrices
Homogeneous (ho-mo-jee-nee-us) matrices are just regular matrices, but with values such as translation placed in specific locations so that the matrix multiply computes a certain result. To hold this extra information, an additional row and column are used, making them 4×4 matrices for 3D transformations. Grouping lots of information into a single matrix avoids having to keep track of many different kinds of transforms, such as keeping the translation as a separate vector. Another important property is that matrix multiplication is associative: $A \times (B \times \mathbf{v}) \equiv (A \times B) \times \mathbf{v}$. This means many transformations can be consolidated into a single matrix and then applied to all the vertices of a model. The alternative is performing a matrix–vector multiply for every transform on each vertex.
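The following is a sketch of the consolidation idea. It applies two 4×4 transforms to each vertex one at a time, then combines the two matrices once and applies the product. The matrices and vertices are made-up examples with integer entries so the results compare exactly. The vertices are positions with a fourth component of 1.

```python
def matmul(A, B):
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B] for row in A]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


A = [[2, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 1]]   # uniform scale by 2
B = [[1, 0, 0, 1],
     [0, 1, 0, 2],
     [0, 0, 1, 3],
     [0, 0, 0, 1]]   # translate by (1, 2, 3)

vertices = [[0, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0, 1]]

# Transform by transform: two matrix-vector multiplies for every vertex.
one_by_one = [mat_vec(A, mat_vec(B, v)) for v in vertices]

# Consolidated: one matrix-matrix multiply, then one matrix-vector multiply per vertex.
AB = matmul(A, B)
combined = [mat_vec(AB, v) for v in vertices]

print(one_by_one)  # [[2, 4, 6, 1], [4, 4, 6, 1], [2, 6, 6, 1]]
print(combined)    # [[2, 4, 6, 1], [4, 4, 6, 1], [2, 6, 6, 1]]
```

With many vertices, the one-off cost of `matmul(A, B)` is quickly repaid because each vertex then needs only a single matrix–vector multiply.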
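Below are the three most common base transformation matrices in computer graphics, excluding projection matrices. The rotation matrix $R$ is as described above, with its 3×3 rotation (entries $r_{i,j}$ here) placed into the top left of an otherwise 4×4 identity matrix. In the translation matrix $T$, the right-hand column holds the axis offsets. A scale matrix $S$ is created by setting the first three elements along the diagonal. To combine any transformations, simply multiply them, keeping in mind that the order matters.

$$R = \begin{bmatrix} r_{0,0} & r_{0,1} & r_{0,2} & 0 \\ r_{1,0} & r_{1,1} & r_{1,2} & 0 \\ r_{2,0} & r_{2,1} & r_{2,2} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad T = \begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad S = \begin{bmatrix} S_x & 0 & 0 & 0 \\ 0 & S_y & 0 & 0 \\ 0 & 0 & S_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$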
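The sketch below builds $R$, $T$ and $S$ as described and combines them by multiplication. The helper names are hypothetical. A rotation about the z axis is used only as one example of a 3×3 rotation block. The example also assumes column vectors, so the rightmost matrix in a product acts first.

```python
import math


def identity4():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def rotation(r3):
    """R: a 3x3 rotation placed in the top left of a 4x4 identity."""
    M = identity4()
    for i in range(3):
        for j in range(3):
            M[i][j] = r3[i][j]
    return M


def translation(tx, ty, tz):
    """T: axis offsets in the right-hand column of a 4x4 identity."""
    M = identity4()
    M[0][3], M[1][3], M[2][3] = tx, ty, tz
    return M


def scale(sx, sy, sz):
    """S: the first three diagonal elements of a 4x4 identity set to the scale factors."""
    M = identity4()
    M[0][0], M[1][1], M[2][2] = sx, sy, sz
    return M


def rotation_z_3x3(theta):
    """One example 3x3 rotation: theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]


def matmul(A, B):
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B] for row in A]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


# Combine by multiplying. With column vectors the rightmost matrix acts first:
# scale by 2, then rotate 90 degrees about z, then translate by (5, 0, 0).
M = matmul(translation(5, 0, 0),
           matmul(rotation(rotation_z_3x3(math.pi / 2)), scale(2, 2, 2)))

print(mat_vec(M, [1.0, 0.0, 0.0, 1.0]))  # approximately [5.0, 2.0, 0.0, 1.0]
```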
A 3D vector cannot be transformed by a 4×4 matrix directly. As discussed in vectors, an additional component (a 1 or a 0) is added to make it 4D. Often this is done implicitly to avoid having to store the value. A vector $(x, y, z, 1)$ is a position vector and has the translational component of the matrix applied. A vector $(x, y, z, 0)$ is a direction vector and is only rotated and scaled. Multiplying a translation matrix by a position vector gives:

$$\begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x + T_x \\ y + T_y \\ z + T_z \\ 1 \end{bmatrix}$$
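The following is a sketch contrasting a position ($w = 1$) and a direction ($w = 0$) under the same translation matrix. The helper `to_homogeneous` is hypothetical. It stands in for the implicit extra component mentioned above.

```python
def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def to_homogeneous(v3, w):
    """Append w: 1 for a position, 0 for a direction."""
    return list(v3) + [w]


T = [[1, 0, 0, 10],
     [0, 1, 0, 20],
     [0, 0, 1, 30],
     [0, 0, 0, 1]]   # translate by (10, 20, 30)

position = to_homogeneous((1, 2, 3), 1)
direction = to_homogeneous((1, 2, 3), 0)

print(mat_vec(T, position))   # [11, 22, 33, 1]  translation applied
print(mat_vec(T, direction))  # [1, 2, 3, 0]     translation ignored
```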
This shows how the translation component actually translates a position vector. When transformations are combined, for example when a rotation or scale matrix multiplies a translation matrix (as in $RT$), the column $(T_x, T_y, T_z, 1)$ is itself transformed by the 3×3 component. In this way, all transformations accumulate, and rotating a translated position becomes possible. No separate translation vector needs to be kept and updated manually whenever transformations are combined.
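The sketch below shows the translation column being carried through the 3×3 block when a rotation multiplies a translation. It also shows that reversing the order leaves that column untouched. The specific 90-degree rotation about z and the offset (1, 2, 3) are arbitrary integer examples.

```python
def matmul(A, B):
    columns_of_B = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B] for row in A]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


R = [[0, -1, 0, 0],
     [1,  0, 0, 0],
     [0,  0, 1, 0],
     [0,  0, 0, 1]]   # 90-degree rotation about z (cos = 0, sin = 1)
T = [[1, 0, 0, 1],
     [0, 1, 0, 2],
     [0, 0, 1, 3],
     [0, 0, 0, 1]]    # translate by (1, 2, 3)

RT = matmul(R, T)
TR = matmul(T, R)

print([row[3] for row in RT])  # [-2, 1, 3, 1]  (1, 2, 3) rotated by R's 3x3 block
print([row[3] for row in TR])  # [1, 2, 3, 1]   translation column unchanged

origin = [0, 0, 0, 1]
print(mat_vec(RT, origin))     # [-2, 1, 3, 1]  translate the origin, then rotate it
```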
Homogeneous matrices are also used to apply a perspective projection and provide non-linear depth values. See projection matrix.