The math behind transforms
In my previous post I talked about the transforms involved in rotating a view around an external point but I also said that you don’t need to understand matrices to work with transforms. If you didn’t accept that and still wanted to know how matrices make transforms work then this post is for you.
This post does not aim to cover all of linear algebra, just enough to understand transforms.
We have objects on our screen that are visually defined by the four corners of the rectangle they lie within. Even objects that are not rectangles themselves can be contained by a rectangle. We know that applying a transform to this object on screen causes it to change its position, size or rotation on screen. The transform is used to calculate the new positions of the four corners and everything inside the rectangle stretches to fill the rectangle just as before the transform. Our goal is to understand how the new positions are being calculated for different kinds of transforms.
Math, a lot of math
Don’t let that section title scare you. You came here to learn, remember?
Math in more than one dimension
In “normal” math, that everyone should be familiar with, we have numbers. Sometimes when we want to make calculations but don’t know all the numbers beforehand we substitute them with letters and are able to make calculations with those letters instead. If we drew a line we could represent these numbers on that line as the distance to the right of the origin, the point representing zero. The “to the right” part enables us to represent negative numbers by placing them to the left of the origin.
Addition of two numbers can be illustrated as drawing an arrow with the length of the first number from the origin and another arrow with the length of the second number beginning at the end of the first arrow. The sum of the two numbers is the arrow that goes from the origin to the end of the second arrow. This can be seen in the image below (with a = 2 and b = 3).
The same works for negative numbers (for example a = 4 and b = –6) as can be seen below.
In a similar fashion we can represent multiplication as taking either of the arrows and adding it to the end of itself the same number of times as the length of the other arrow.
It gets a bit silly drawing numbers like this when we all know how to add simple numbers so let’s take things one step further. If we call the line that we drew arrows along an axis and we add another axis perpendicular to the one we already had then we get a plane. Any point on this plane can now be defined by two coordinates, one for each axis. We call the horizontal axis the x-axis and the vertical axis the y-axis. Now we can describe any point on this plane by its x and y value. This plane is just like the screen of our devices.
The arrows in this plane are unlike the arrows along the line above since they require two numbers to describe instead of one. We call these new arrows vectors and the old arrows scalars. If we want to, we can draw the same scalars as we did before along the x-axis, but when such an arrow is represented in this plane it is no longer a scalar. Instead it is a vector with a y-coordinate of 0.
Just as we did with the scalars we can represent the addition of two vectors as placing either of them at the point of the other. The result of the addition is the vector that points to the same point as the second vector now does. You may notice that vector addition works by adding the two x components together into a new x component and the y components together into a new y component. It works in both directions, adding either vector to the other. You can try it on a piece of paper if you want to.
Adding one vector to itself over and over is just like the multiplication we did with scalars. In a plane like this we call it scalar multiplication since we are multiplying our vector with a scalar.
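If you would rather check this with code than on a piece of paper, here is a minimal Python sketch of vector addition and scalar multiplication. The function names `add` and `scale` and the example vectors are my own picks for illustration and don't come from any framework.

```python
def add(v, w):
    """Add two 2D vectors component by component."""
    return (v[0] + w[0], v[1] + w[1])


def scale(k, v):
    """Multiply the 2D vector v with the scalar k."""
    return (k * v[0], k * v[1])


a = (2, 1)
b = (1, 3)

print(add(a, b))          # (3, 4)
print(add(b, a))          # (3, 4), the order does not matter
print(add(add(a, a), a))  # (6, 3), a added to itself three times
print(scale(3, a))        # (6, 3), the same thing as a scalar multiplication
```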
Multiplying a vector with another vector gets a little bit trickier to wrap our heads around. What does it really mean to multiply a vector with another vector? Considering that vector addition added the two x and y components it shouldn’t surprise us that the two x and y components are multiplied. What likely is a surprise, however, is that the results of these multiplications are then added into a scalar. That is right, the multiplication of two vectors is a scalar. This kind of multiplication is most often called the “dot product” since a dot is used for the multiplication sign.
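As a small sketch of the dot product, assuming vectors are stored as plain Python tuples (the name `dot` is only an illustrative choice):

```python
def dot(v, w):
    """Multiply the matching components and add the products into one scalar."""
    if len(v) != len(w):
        raise ValueError("both vectors need the same number of components")
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


print(dot((2, 1), (1, 3)))  # 2*1 + 1*3 = 5
```

Note that the result is a single number and not a new vector.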
There are other very powerful operations that we can do with vectors like calculating the cross product. While this is a very important part of linear algebra and 3D computer graphics, it is not necessary for understanding transforms so we will skip it for this article.
We build upon our 2D plane by adding another axis that is perpendicular to both axes and call it the z-axis. We now have a full Cartesian coordinate system that can be used to represent any point in 3D space using its x, y and z coordinates.
Unlike last time we added a dimension to our space we still call the arrow that points to a coordinate in space a vector. Sometimes for differentiation we call these vectors 3D vectors, but a vector can really have any number of values as long as they are in a single row or a single column but never both.
If a vector were to have both rows and columns it wouldn’t be a vector any more. It would be a matrix. Matrices are as far as we will go here. In other scientific areas there are things with one more dimension than a matrix, called tensors.
There is no good way of visually representing a matrix in a plane or space like we did for a vector so we will skip that for now. Instead let us focus on how they add and multiply.
Addition of two matrices is simply a new matrix with every value being the sum of the corresponding values for that row and column in the original two matrices. The only interesting thing to note is that two matrices can only be added together if they have the same size, i.e. the same number of rows and columns.
Matrix multiplication on the other hand is where things get interesting. In matrix multiplication the values of a row in the first matrix (hereafter called MA) are multiplied with the values of a column in the other matrix (hereafter called MB). Just like with the dot product (vector multiplication) this produces a scalar value which is the value for that specific row and column in the resulting matrix. Since the multiplication is done for every row-column pair of the two matrices, the resulting matrix will have the same number of rows as MA and the same number of columns as MB. Important to note is that order matters: in general, MA × MB ≠ MB × MA.
The best way to understand matrix multiplication is to start with the empty result and fill in the value for each row and column one at a time. The resulting value for the first row and first column is the same as the dot product of the first row of MA and the first column of MB. Since we are calculating the dot product of MA’s rows and MB’s columns they need to have the same number of elements, i.e. MA needs to have the same number of columns as MB has rows. (No, that was not a typo. The number of elements in each row is the same as the number of columns and vice versa.)
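The fill-in-one-value-at-a-time description can be sketched in Python like this. It assumes a matrix is stored as a list of rows. The name `mat_mul` and the two small example matrices are made up for illustration.

```python
def mat_mul(ma, mb):
    """Multiply MA with MB by filling in the result one row, column pair at a time."""
    rows_a, cols_a = len(ma), len(ma[0])
    rows_b, cols_b = len(mb), len(mb[0])
    if cols_a != rows_b:
        raise ValueError("MA must have as many columns as MB has rows")
    # Start with the empty result: as many rows as MA, as many columns as MB.
    result = [[0] * cols_b for _ in range(rows_a)]
    for i in range(rows_a):
        for j in range(cols_b):
            # Dot product of row i of MA and column j of MB.
            result[i][j] = sum(ma[i][k] * mb[k][j] for k in range(cols_a))
    return result


MA = [[1, 2],
      [3, 4]]
MB = [[0, 1],
      [1, 0]]

print(mat_mul(MA, MB))  # [[2, 1], [4, 3]]
print(mat_mul(MB, MA))  # [[3, 4], [1, 2]], so the order matters
```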
If we think of a vector as a matrix with only one column then we can multiply a matrix with that vector and transform it into a new vector. This means that we have a mathematical way of transforming the points on our screen into other points.
The identity matrix
Before actually changing the points of our view, let us figure out how we can multiply a matrix with a vector and have it be transformed into the exact same vector. We could just multiply it with the scalar 1 but we really want to use a matrix for this. For each row-column combination in the matrix we put a value at i, j where i is the row and j is the column. Since our vector has three rows we need to have three columns and since we want the result to have three rows we need to have three rows as well leaving us with a matrix with three rows and three columns, often referred to as a 3×3 matrix.
This matrix multiplication can also be described by these three equations.
$$
\begin{aligned}
x_{new} &= M_{1,1} \cdot x + M_{1,2} \cdot y + M_{1,3} \cdot z \\
y_{new} &= M_{2,1} \cdot x + M_{2,2} \cdot y + M_{2,3} \cdot z \\
z_{new} &= M_{3,1} \cdot x + M_{3,2} \cdot y + M_{3,3} \cdot z
\end{aligned}
$$
For the new vector to be the same as the old one, we quickly see that the values on the diagonal, where i is equal to j, are ones and that all other values are zeroes. This special matrix is called the identity matrix and is often represented as the letter “I”. Now that we know how to construct a matrix that doesn’t modify the multiplied vector we can start looking at matrices that do.
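A quick way to convince yourself is to run the three equations in code. This is only a sketch: `mat_vec` is a helper name I made up and the point is arbitrary.

```python
def mat_vec(m, v):
    """Multiply the matrix m with the column vector v, one equation per row."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


IDENTITY = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]

point = [3, -2, 5]
print(mat_vec(IDENTITY, point))  # [3, -2, 5], the exact same vector
```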
There are three basic kinds of transforms: scaling, translating and rotating. No matter which one we are talking about we can look at the transformed x, y, and z values separately by looking at each row of the matrix.
Scaling a rectangle is very easy to describe: the rectangle changes its size by the scale factor without moving or losing its aspect ratio. This means that every corner increases its distance to the center without changing the angle. For this to happen, all components of the vector must increase by the same factor, the scale factor. If we were to make the rectangle twice as big we would want the resulting vector to have all its x, y and z values twice that of the original vector.
Going back to our three equations from above we see that all the values along the diagonal are now twos instead of ones (the zeroes are still zeroes). If we put different values along the diagonal (for example 2 for the first row and 1 for the others) we will see that the rectangle stretches and loses its aspect ratio, a fun scaling effect and a perfectly valid scaling transform along that axis.
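Here is a small sketch of both cases in Python, with one corner of a rectangle as the vector. The helper names `mat_vec` and `scaling` and the corner coordinates are assumptions for the sake of the example.

```python
def mat_vec(m, v):
    """Multiply the matrix m with the column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def scaling(sx, sy, sz):
    """3x3 scaling matrix with the scale factors along the diagonal."""
    return [[sx, 0, 0],
            [0, sy, 0],
            [0, 0, sz]]


corner = [10, 5, 0]

print(mat_vec(scaling(2, 2, 2), corner))  # [20, 10, 0], twice as far from the center
print(mat_vec(scaling(2, 1, 1), corner))  # [20, 5, 0], stretched along the x-axis only
```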
No matter how we change the values along the diagonal the rectangle never moves so let’s look at how to move the rectangle.
One way of moving the rectangle would be to add a vector to all four points defining the corners which would create new vectors pointing to the new points. This works just fine but we really want to make it work with matrices because matrices can be multiplied with each other which will later prove to be very powerful. Looking at the equations for our 3×3 matrix we eventually realize that something is missing. We only have a means of specifying x, y and z multiplicands but we have no way to use a constant.
So we add a constant for each of the three equations and see how that affects the matrix. Since we now have four elements being added in the equation we must have four columns in the matrix and thus also four rows in our vector.
Don’t run off being scared that we introduced a fourth dimension or something. The fourth value of our vector is quite harmless, it’s 1. Our vector is now (x, y, z, 1). The fourth value isn’t used to represent our point in 3D space, it’s only used for matrix multiplication. We are not quite finished yet. Our resulting vector would lose its fourth row unless we made sure that the matrix also had a fourth row.
Now we have a matrix of ones along the diagonal (we still want the fourth value of the vector to keep its value after the multiplication) and three constants along the very right edge of the matrix. These three constants reflect the translation along each of the three axes.
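$$
\begin{pmatrix}
1 & 0 & 0 & C_x \\
0 & 1 & 0 & C_y \\
0 & 0 & 1 & C_z \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix} x + C_x \\ y + C_y \\ z + C_z \\ 1 \end{pmatrix}
$$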
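The same translation can be sketched in Python. The helper names `mat_vec` and `translation` are made up for this example, and the point and the constants are arbitrary.

```python
def mat_vec(m, v):
    """Multiply the matrix m with the column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def translation(cx, cy, cz):
    """4x4 matrix with ones on the diagonal and the constants in the rightmost column."""
    return [[1, 0, 0, cx],
            [0, 1, 0, cy],
            [0, 0, 1, cz],
            [0, 0, 0, 1]]


point = [10, 5, 0, 1]  # (x, y, z, 1)

print(mat_vec(translation(3, -4, 0), point))  # [13, 1, 0, 1]
```

The fourth value comes out as 1 again, which is exactly what the fourth row of the matrix is there for.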
Let’s revisit scaling to ensure that it still works for us with our new 4×4 matrix.
We don’t want to move when we are scaling so the rightmost column is set to zeroes (except for the diagonal). We quickly realize that scaling still works with our new 4×4 matrix. This means that it’s time to go on to rotation.
Just like we could scale along only one axis we can also rotate around only one axis. In fact, we most often do. If you ask someone to describe a rotation on a screen without telling them which axis, they will probably describe a flat rotation like the one the hands of a clock make. What axis would that be?
A neat trick to figure out rotation around an axis is to take your right hand and curl your fingers without closing your hand and then point the thumb straight up. If you now point your thumb along the axis you are rotating around and turn your wrist, your fingers will curve in the direction of the rotation. Alternatively you could place your hand so that your fingers curve like the rotation you had in mind to figure out the axis. You may notice that clockwise and counter-clockwise rotations are done in the negative or positive direction around that axis.
A 2D rotation
The common flat rotation is done around the z-axis so z-values for the rotated points will remain unchanged but x- and y-values may change. I said “may change” since a 360° rotation around any axis takes us to the same point as before.
To figure out how the x and y values change in a rotation around the z-axis we look at the two vectors (1,0,0) and (0,1,0). If we draw a circle in the center of our x,y-plane with the same radius as the distance to our points, we expect the points to move along the edge of this circle. We can easily imagine a counter-clockwise rotation of θ for both of these vectors and draw two new vectors that point to our expected end result. Basic trigonometry (sine and cosine) helps us express how the new points relate to the old points.
The transformed x-only-vector gives the values for the first column of our rotation matrix and the transformed y-only-vector gives us the values for the second column of our rotation matrix. The third and fourth columns don’t alter the transformed vector so these are the same as for an identity matrix. The resulting rotation matrix is thus:
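$$
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 & 0 \\
\sin\theta & \cos\theta & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
=
\begin{pmatrix}
\cos\theta \cdot x - \sin\theta \cdot y \\
\sin\theta \cdot x + \cos\theta \cdot y \\
z \\
1
\end{pmatrix}
$$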
Verifying that this matrix works for a vector with both x and y components is left as an exercise for the reader. Pick a new vector with both x and y components and use the above matrix to calculate the rotated vector. Finally draw the rotated vector in a 2D plane to verify that the end result meets our expectations.
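If you want to check your answer to the exercise, here is a hedged Python sketch of a rotation around the z-axis. The helper names `mat_vec` and `rotation_z` are my own, the angle is in radians, and I am assuming that the third and fourth rows of the matrix are the same as for the identity matrix.

```python
import math


def mat_vec(m, v):
    """Multiply the matrix m with the column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def rotation_z(theta):
    """4x4 counter-clockwise rotation of theta radians around the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


quarter_turn = rotation_z(math.pi / 2)

# The x-only vector should end up pointing along the y-axis: roughly (0, 1, 0, 1).
print(mat_vec(quarter_turn, [1.0, 0.0, 0.0, 1.0]))

# A vector with both x and y components: (1, 1) should end up at roughly (-1, 1).
print(mat_vec(quarter_turn, [1.0, 1.0, 0.0, 1.0]))
```

The printed values are only approximately whole numbers since sine and cosine are calculated with floating point numbers.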
3D rotations and perspective
By applying the same techniques to rotations around the x-axis and y-axis we can figure out their rotation transforms (seen below).
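$$
R_x =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\theta & -\sin\theta & 0 \\
0 & \sin\theta & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
R_y =
\begin{pmatrix}
\cos\theta & 0 & \sin\theta & 0 \\
0 & 1 & 0 & 0 \\
-\sin\theta & 0 & \cos\theta & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\qquad
R_z =
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 & 0 \\
\sin\theta & \cos\theta & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$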
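As a rough check of the rotations around the x-axis and y-axis, here is a Python sketch. The helper names `mat_vec`, `rotation_x` and `rotation_y` are assumptions of mine, and I am assuming that the rows not involved in each rotation are identity rows. A quarter turn around the x-axis should take the y-only vector to the z-axis, and a quarter turn around the y-axis should take the z-only vector to the x-axis.

```python
import math


def mat_vec(m, v):
    """Multiply the matrix m with the column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def rotation_x(theta):
    """4x4 rotation of theta radians around the x-axis (x-values stay unchanged)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_y(theta):
    """4x4 rotation of theta radians around the y-axis (y-values stay unchanged)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


quarter = math.pi / 2

# Roughly (0, 0, 1, 1): the y-only vector now points along the z-axis.
print(mat_vec(rotation_x(quarter), [0.0, 1.0, 0.0, 1.0]))

# Roughly (1, 0, 0, 1): the z-only vector now points along the x-axis.
print(mat_vec(rotation_y(quarter), [0.0, 0.0, 1.0, 1.0]))
```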
While the point is correctly transformed in 3D space it doesn’t look like a 3D rotation at all. This is because the 3D point is projected to the 2D screen without perspective. If you went back and scaled or translated the z-value you would experience the same problem (though there you would see no difference at all). This is not how we expect 3D objects to look. We expect objects far away to appear smaller and objects up close to appear bigger.
It turns out that computer graphics has one more trick up its sleeve. We always make sure that the fourth value of our vector is 1. If it isn’t then we divide the whole vector by that value so that it becomes 1 (we use scalar division so every value is divided individually). That means that the transformation matrix with 1, 1, 1, 2 on the diagonal will scale x, y and z by 0.5 (since it’s 1/2). To create the illusion of perspective we want to get a larger-than-one fourth value of our vector for distant points and a smaller-than-one value for nearby points. To achieve that we want some constant value in the fourth row and third column of our transformation matrix, since that is the value that gets multiplied with our z-value when the fourth value of the new vector is calculated. What constant value? The short answer is: a value that makes the perspective look good.
One way of thinking about it is that the views we are transforming are a few hundred points wide or high, so a rotation is going to cause the far-off points to be a few hundred points away from us. Since a change from 1 to 2 halved the size of the view on screen and we are talking about c⋅z where z is a few hundred, we probably want c to be one over a few hundred. To no surprise, a typical value for this constant could be –1/500. The minus sign comes from the fact that the z-axis points out of the screen toward us, so distant points have negative z-values and need a negative constant to get a fourth value larger than one.
Another way of thinking about this is to imagine that the screen is a certain distance, d, from where we are looking (abstractly speaking). The distance is in some made-up unit which is not at all related to how far our eyes are from the screen in real life. Since the screen is two-dimensional everything needs to be projected onto that screen. The point we are looking from and the point we are looking at stay fixed in 3D space but we can decide how far off the screen is. If we move the screen closer to us then we get more perspective and if we move it away from us then we get less perspective.
The constant for our perspective depends on the distance to our screen as –1/d. This is just the same as we had before. It is only another explanation of what that value means. A greater denominator means a greater distance to the screen which means less perspective. As mentioned above, it turns out that 500 units is a suitable distance to the “screen”.
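To see the divide-by-the-fourth-value trick in action, here is a minimal Python sketch. It rests on two assumptions that I want to state loudly: the constant –1/d sits in the bottom row of the matrix, where it is multiplied with z and added to the fourth value, and positive z-values are closer to us than negative ones. The helper names `mat_vec`, `perspective` and `project` are made up for this example.

```python
def mat_vec(m, v):
    """Multiply the matrix m with the column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def perspective(d):
    """Identity matrix with -1/d in the bottom row, where it is multiplied with z."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, -1.0 / d, 1.0]]


def project(m, point):
    """Transform the point and divide by the fourth value so that it becomes 1 again."""
    x, y, z, w = mat_vec(m, point)
    return [x / w, y / w, z / w, 1.0]


# The 1, 1, 1, 2 diagonal from the text: x, y and z are scaled by 0.5.
halving = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 2.0]]
print(project(halving, [100.0, 100.0, 0.0, 1.0]))

# Same x and y, different z: the nearby point (z = 250) gets a fourth value of
# about 0.5 and ends up further out, the distant point (z = -500) gets a fourth
# value of about 2 and ends up closer to the center.
print(project(perspective(500), [100.0, 100.0, 250.0, 1.0]))
print(project(perspective(500), [100.0, 100.0, -500.0, 1.0]))
```

Try a smaller d, like 100, to see how moving the “screen” closer exaggerates the effect.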
Combining multiple transforms
One of the truly powerful things about transformation matrices is how they can be applied one after another and how they can be combined (also known as concatenated). You may remember from the previous post that the order of the transformations matters. Rotating and then translating is not the same as translating and then rotating. By now this should sound very familiar to you. Matrix multiplication works the exact same way and a transform is just a matrix, remember. It turns out that you can take two transformation matrices and multiply them and you will get a new transformation matrix that describes the total transformation in a single multiplication.
Let’s take an example. We want to translate, then rotate and then translate the four corners of a small view (the same transform we did in the previous post). The resulting matrix of the multiplication is
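$$
\begin{pmatrix}
\cos\theta & -\sin\theta & 0 & c \cdot \cos\theta - c \\
\sin\theta & \cos\theta & 0 & c \cdot \sin\theta \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$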
Given some arbitrary angle, translation distance and corners we can calculate the new points for our corners after the transformation. By drawing the new corners in our coordinate system we can see that the rectangle ends up where we expect it to. Inputting values into the matrix and drawing the transformed corners are left as an exercise for the reader.
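If you want to check your exercise, the following Python sketch multiplies the three matrices into one and compares it with applying them one at a time. I am assuming that the transform is a translation of c along the x-axis, then a rotation around the z-axis and then a translation of –c along the x-axis, and that vectors are columns so that the first transform to be applied is the rightmost one in the multiplication. All helper names and the values of c, θ and the corner are made up for illustration.

```python
import math


def mat_mul(ma, mb):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(ma[i][k] * mb[k][j] for k in range(len(mb)))
             for j in range(len(mb[0]))]
            for i in range(len(ma))]


def mat_vec(m, v):
    """Multiply the matrix m with the column vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def translation(cx, cy, cz):
    return [[1.0, 0.0, 0.0, cx],
            [0.0, 1.0, 0.0, cy],
            [0.0, 0.0, 1.0, cz],
            [0.0, 0.0, 0.0, 1.0]]


def rotation_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


c = 50.0
theta = math.pi / 2
corner = [10.0, 0.0, 0.0, 1.0]

# One matrix for the whole thing: translate by c, rotate, translate by -c.
combined = mat_mul(translation(-c, 0.0, 0.0),
                   mat_mul(rotation_z(theta), translation(c, 0.0, 0.0)))

# The same three transforms applied one after another.
step_by_step = mat_vec(translation(-c, 0.0, 0.0),
                       mat_vec(rotation_z(theta),
                               mat_vec(translation(c, 0.0, 0.0), corner)))

print(mat_vec(combined, corner))  # roughly (-50, 60, 0, 1)
print(step_by_step)               # the same point
```

The rightmost column of `combined` should match c⋅cos θ – c and c⋅sin θ, which for these values is roughly –50 and 50.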
In just the same way we can take any number of transforms and multiply them in the order they should be applied to pre-calculate the one transformation matrix that encapsulates the total transformation of all the others.
Thanks Richard Turton for correcting my English.
After reading all of that I hope that you no longer feel that transforms are little pieces of black magic being applied to your views. If you have any feedback, comments or corrections I would love to hear them. I’m @davidronnqvist on Twitter and @ronnqvist on ADN.