Now that we have the n x n blocks for each one of the planes, the Y, Cb, and Cr, we are ready to go into the forward transform. The first thing we need to understand is why we are going to do a forward transform at all. Let's explain first how we measure the error that we create in images when we're doing lossy compression. The basic idea is that the error is measured with what's called the mean squared error, or MSE, and it's computed as follows: MSE = (1/N) * sum over all pixels of [f_reconstructed(x, y) - f_original(x, y)]^2, where N is the number of pixels; the 1/N is just a normalization. So we go over every single pixel, take the difference between the reconstructed image and the original image, and square it. Say the original value was 100 and we reconstructed 98: the difference is 2, we square it, that's 4. We do that for every single pixel, sum, and normalize by the number of pixels. Most of the time, because we have a square here, we also take a square root, and then it's called the root mean squared error. That's how we measure the error. Now, why are we going to do a transform? Remember, we are talking about n x n blocks; in general 8 x 8, but let's keep it as n x n. So we have n x n pixels, 64 of them for 8 x 8, and we work on each block independently. Imagine that although there are 64 pixels, I give you the option to transmit only one of them. If we take just a regular image block and transmit only one pixel, we are going to get a tremendous error. But we might be able to do a transform into a different 8 x 8 block, a transform that is reversible, so I can go back, and maybe in that transform domain we can keep only the first value and still do a reasonable job. It turns out that there is a transform that allows us to do that, and it's called the Karhunen-Loeve transform. That transform has something very particular. I'm going to write down in a second what I mean by a transform, but think of it as nothing more than a multiplication by a matrix. I take my 8 x 8 block, I do something, and I get a new 8 x 8 block, or 8 x 8 image. If what I did is a Karhunen-Loeve transform, also written KLT, and I keep only the first of those 64 coefficients, throw away the other 63, come back, and compute the mean squared error, I get the smallest possible mean squared error I could have gotten transmitting only one coefficient. If I want to keep three, I can do exactly the same. And the interesting part of the Karhunen-Loeve transform is not only that with one coefficient I get the best error, but that that coefficient is the first one. You have 64 values, you go to a new 64; if you're only allowed one coefficient, you pick the first, and that's the best you can do. If you're allowed three, you pick the first three. So it's a great transform, but it has one major problem: it is image dependent.
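As a small aside, here is a minimal sketch of this error measure in Python with NumPy; the function names mse and rmse are just for illustration and are not part of the lecture material.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean squared error: average of the squared per-pixel differences."""
    diff = reconstructed.astype(np.float64) - original.astype(np.float64)
    return np.mean(diff ** 2)

def rmse(original, reconstructed):
    """Root mean squared error: the square root of the MSE."""
    return np.sqrt(mse(original, reconstructed))

# The example from above: original value 100, reconstructed 98.
print(mse(np.array([100]), np.array([98])))   # (98 - 100)^2 = 4.0
```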
I don't know how to do it in advance. I told you it's going to be a matrix multiplication, but I don't know the coefficients of that matrix until I see the image. You give me the image, or at least some statistics of the image, and then I can compute the coefficients of that matrix. That's not very practical: it's slow, and I cannot do things on the fly as the data comes in. So to replace it we use what's called the discrete cosine transform, which is what is actually used in JPEG. Let's write down the equations for the discrete cosine transform, and along the way we'll understand a bit more about what a transform is. But let's keep in mind that what we actually want is the Karhunen-Loeve transform. What the Karhunen-Loeve transform does is decorrelate the image: it puts a lot of the information in the first coefficient, a bit more in the second, which is independent of the first, and so on, in such a way that when we compute the mean squared error we are optimal. We're going to be suboptimal with the discrete cosine transform, but it has a fixed matrix, fixed coefficients, so it's universal; we can use it for every image without having to recompute it. So what is a transform? The idea is very simple. We have an image, call it f(x, y), and we go into a new domain T(u, v). Those of you who are familiar with Fourier: the Fourier transform is one example. We sum over all the pixels in our block, x from 0 to n - 1 and y from 0 to n - 1, and we take our image and multiply it by a transform kernel: T(u, v) = sum over x and y of f(x, y) r(x, y, u, v). I still have to give you the formula for this kernel for the DCT, the discrete cosine transform; for the Karhunen-Loeve it would be completely image dependent, it would depend not only on the positions but on the actual gray values, on the actual statistics of your image. So this is the forward transform: I sum over x and y, which means x and y disappear after the summation, and I get a new image, again 8 x 8 or n x n, just in a transform domain; you can also write this in matrix notation. Now, of course, a transform is only useful if you can actually invert it. It wouldn't be very nice if you give me an image, I go into a transform domain, and then I cannot come back to something I understand visually. So there is also an inverse formula, so I can get back f(x, y): f(x, y) = sum over u from 0 to n - 1 and v from 0 to n - 1 of T(u, v) s(x, y, u, v), where s is the inverse kernel, which sometimes is equal to r and sometimes is not. So I take an image, I do this operation, I get a new image in a transform domain, and then I can invert it. That's a transform. To specify the transform, I basically have to tell you what these two functions, r and s, are. And once again, it's not hard, you can do it as an optional homework, to write these in matrix notation. So I need to specify r and s, which, as I said, ideally, to minimize the mean squared error, we would like to be the Karhunen-Loeve transform, but we can't, so we are going to use the DCT, which I'm going to write next. So what's the DCT?
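For those who want to see the matrix notation mentioned above, here is a minimal sketch, assuming the kernel is separable (as it is for the DCT) and the basis matrix A is orthonormal, so that the inverse kernel is just the transpose:

```python
import numpy as np

def forward_transform(block, A):
    """T = A @ f @ A.T: the double sum over x and y written as matrix products.
    A holds one basis vector per row."""
    return A @ block @ A.T

def inverse_transform(coeffs, A):
    """f = A.T @ T @ A, valid when A is orthonormal (A.T is A's inverse)."""
    return A.T @ coeffs @ A
```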
In the DCT, r(x, y, u, v) is actually equal to s(x, y, u, v), and it is equal to a normalization coefficient alpha(u) alpha(v) times cos[(2x + 1) u pi / (2n)] times cos[(2y + 1) v pi / (2n)], where n is the block size; the second cosine is exactly the same as the first, just for the other coordinate. That's the general formulation, and I still have to tell you what alpha is: it's just a normalization, very common in this type of transform, alpha(u) = sqrt(1/n) for u = 0 and alpha(u) = sqrt(2/n) for u different from 0, and the same for alpha(v). So there it is: I just gave you the formulas for the DCT, the discrete cosine transform. For those who are familiar with Fourier, this is basically the real part of a Fourier transform, or almost that. Once again, the forward transform and the backward transform use exactly the same kernel, which is very nice, because you need to build only one piece of software or hardware to do both the forward and the backward, and it's just a cosine transform: a cosine in x, a cosine in y, and normalization coefficients. This is the discrete cosine transform. Let's just see it. I want to show you these functions; these are the basis images that multiply your image, as we saw in the previous slide, in order to get the transform. Here, just to make the illustration easy, we use n = 4. Each one of the bases comes from the formula: if I fix u = 0 and v = 0, I get a cosine which is a function of only x and y. The formula was a function of u, v, x, and y, but once I fix u and v, I get a function of x and y, a 4 x 4 image, which would be this one. If I fix u = 0 and v = 1, I get this one, and so on. So r and s are nothing but what are called basis functions, and what we are trying to do is decompose our small n x n image into a linear combination of these basis functions. If your image is flat, your T(u, v) is going to be nonzero only for this first basis. If your image has a lot of variation, you're going to have coefficients also for T(3, 3). Each T(u, v) tells you how much of that type of component you have. So you have represented your n x n image as a linear combination of these basis images. The important part is that these basis images are constant; again, in the Karhunen-Loeve they are not, and that makes it very difficult to have an efficient hardware implementation that runs in real time. Here the basis is fixed: you give me an image, I represent it as a linear combination of this basis, and T(u, v) tells me how much of each one of these I have in my tiny n x n image. So that's a transform: nothing but going into coefficients that tell us how much of each of these components we have. I decompose my image into these components, which are kind of frequency components; you see that there is more variation as I increase v or as I increase u, in the horizontal or vertical direction. There's one thing I haven't told you. I said, okay, I can't do the Karhunen-Loeve transform, I cannot compute its bases because it's too expensive, so I go and do a discrete cosine transform. But why a discrete cosine transform?
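Here is a minimal NumPy sketch of these formulas: it builds the DCT basis matrix from the expression above and applies the forward and inverse transform to a random 8 x 8 block (the random block is just a stand-in for real image data).

```python
import numpy as np

def dct_matrix(n=8):
    """DCT basis: A[u, x] = alpha(u) * cos((2x + 1) * u * pi / (2n))."""
    A = np.zeros((n, n))
    for u in range(n):
        alpha = np.sqrt(1.0 / n) if u == 0 else np.sqrt(2.0 / n)
        for x in range(n):
            A[u, x] = alpha * np.cos((2 * x + 1) * u * np.pi / (2 * n))
    return A

A = dct_matrix(8)
block = np.random.rand(8, 8)       # stand-in for an 8 x 8 image block
coeffs = A @ block @ A.T           # forward 2D DCT: T(u, v)
restored = A.T @ coeffs @ A        # inverse: same kernel, transposed matrices
assert np.allclose(block, restored)
```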
Why not a Fourier transform, or the Hadamard transform, or any other transform? There are many transforms out there. There are a couple of reasons why we use the DCT. One of them is that, although I want to do a Karhunen-Loeve transform and end up doing a DCT, it turns out that for particular cases the DCT is exactly equal to the Karhunen-Loeve transform. Those particular cases are when the image is what's called Markovian: a pixel depends on the pixel next to it in a very particular form, and that one in turn depends on the pixel next to it, and so on. So if you assume something like that about your image, you can show that the Karhunen-Loeve transform ends up being the DCT, and then it has a constant basis like the one we see here. That's one reason to use the DCT. The other reason is the following. Remember that I take the whole image and work on n x n blocks. Someone may wonder, why not use, for example, the Fourier transform? If we go back to the theory of the Fourier transform, when you do a discrete Fourier transform you are making an underlying assumption of periodicity. You are assuming that your block repeats itself, as we can see here: this comes back here, and this comes back here. That's the underlying mathematical assumption needed to formally do a Fourier transform of this 8 x 8 block. Here I'm drawing just one line of the 8 x 8 block, but the same assumption goes in the other direction and in two dimensions. Now, this means that when your block ends, you're expecting the first pixel of the next period to be identical to the pixel that was at the start of this one: you expect the block to repeat here and here. That's the underlying assumption, and it's not very likely that the pixel at one end has the same value as the pixel at the other end, not even close, in an 8 x 8 block. On the other hand, the DCT assumes a different type of periodicity. The DCT assumes that at the boundary you have mirror symmetry: your image repeats, but folded over. So the assumption says, I'm assuming that the pixel here is similar to the pixel right next to it, its mirror image, not that it is identical to the one that comes 8 pixels later, and that's a much more reasonable assumption. So those are two of the main reasons why we use the DCT: one, because for certain types of images the DCT is the Karhunen-Loeve transform; the other, because its periodicity assumption is much more applicable to images once we have split them into n x n or 8 x 8 blocks. So now we have the DCT. We started with an image, we divided it into n x n blocks, and we transform each block with a discrete cosine transform, where every coefficient tells us how much of each basis we are using. I keep talking about 8 x 8; how did we arrive at 8 x 8? There were actually a lot of studies before JPEG was defined to arrive at 8 x 8, but let me just illustrate what happens. In this first case we don't divide into blocks: this is already a block of that image, Lena, that we talked about before, and we did a discrete cosine transform of this entire block and kept 25% of the coefficients.
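A tiny numeric illustration of the two boundary assumptions, with made-up pixel values for one row of a block:

```python
import numpy as np

row = np.array([10, 12, 15, 40, 42, 45, 90, 95])  # one made-up row of an 8 x 8 block

periodic_ext = np.concatenate([row, row])        # what the DFT implicitly assumes
mirrored_ext = np.concatenate([row, row[::-1]])  # what the DCT implicitly assumes

print(periodic_ext)  # ... 90 95 | 10 12 ...  -> large artificial jump at the boundary
print(mirrored_ext)  # ... 90 95 | 95 90 ...  -> boundary pixel meets its mirror image
```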
So we kept only 1/4 of the coefficients, made all the others 0, did the inverse, and that's the result we got. We have introduced error, and remember, what the DCT is going to try to do is bring us into a domain where introducing error, killing some of the coefficients by making them 0 or making errors on them, won't be as noticeable. In this case we did it very simply: we just kept 25% of the DCT coefficients. Which ones will become clear very soon, but imagine for now that we kept the 1/4 largest coefficients. So first we did it for the whole block. Here, we instead took 2 x 2 blocks: we divided the image into 2 x 2 blocks, did a DCT of every 2 x 2 block, and kept 25%, so we basically ended up with only one coefficient per block, and we came back. You see this very annoying blocking artifact. Then we did the same for 4 x 4, and the same for 8 x 8, and we see that it gets better and better. So you might wonder, okay, 2 x 2 is too little, but why stop at 8 x 8? Why not go to 16 x 16, or 32 x 32, or why not just do the DCT of the entire image? There are a couple of reasons for that. One is computational: it's cheaper to do multiple small DCTs than one large DCT. If you have, let's say, a 16 x 16 image, I'd rather do four 8 x 8 DCTs than one 16 x 16 DCT. That's one of the reasons. Another reason is this magic approximation of the DCT to the Karhunen-Loeve transform. I said that happens under a certain condition, a Markovian condition. If you're in a small region, an 8 x 8 block, that Markovian condition will probably apply; but the whole of a very large image won't satisfy it. Basically, in small regions you expect that assumption to hold, but it doesn't hold in large regions, and it's not hard to see why. If you're looking at me right now, you see that there's a lot of correlation here, more or less the same gray values. But if you start moving away and ask, is there a correlation between the pixel value here and the pixel value there, very far away? There's actually no correlation: I could change my shirt, and without prior information you couldn't predict one from the other. So there is no correlation when you get far away in your image; pixels that are far apart start being decorrelated. So you need to find a compromise. On one side, you want small blocks because you want to be computationally efficient; on the other hand, you don't want blocks that are too small, because then you don't have enough information, enough correlation, inside the block for the transform to decorrelate. And if you go to very large blocks, you start paying in computation, and you start violating the assumptions for which the DCT is good. So 8 x 8 is a compromise that, as I said, was significantly studied, it turned out to be a very good one, and that's why it is used today in JPEG. To conclude this part on the DCT, I just want to show you once again this image, Lena. These are two different cases, but don't worry about that for the moment. Here we took 8 x 8 blocks, did a DCT, kept 12.5% of the coefficients, and came back. There is no quantization yet, which is going to be our next step; just a DCT and the inverse DCT.
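Here is a rough sketch of the experiment just described, assuming the dct_matrix function from the earlier sketch and an image whose sides are multiples of the block size n; keeping the largest-magnitude coefficients is one reading of "the 1/4 largest coefficients" above.

```python
import numpy as np

def compress_block(block, A, keep_fraction):
    """DCT the block, zero all but the largest-magnitude coefficients, invert."""
    coeffs = A @ block @ A.T
    k = max(1, int(keep_fraction * coeffs.size))
    threshold = np.sort(np.abs(coeffs), axis=None)[-k]   # k-th largest magnitude
    coeffs[np.abs(coeffs) < threshold] = 0.0
    return A.T @ coeffs @ A

def blockwise_dct_compress(image, n, keep_fraction):
    """Split the image into n x n blocks and compress each one independently."""
    A = dct_matrix(n)                 # from the earlier sketch
    out = np.empty(image.shape, dtype=np.float64)
    for i in range(0, image.shape[0], n):
        for j in range(0, image.shape[1], n):
            block = image[i:i + n, j:j + n].astype(np.float64)
            out[i:i + n, j:j + n] = compress_block(block, A, keep_fraction)
    return out

# e.g. blockwise_dct_compress(img, n=8, keep_fraction=0.25), then rmse(img, result)
```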
And it looks pretty good; at first glance at the image, we have actually achieved compression, because we only kept 12.5% of the coefficients, and we got an image which looks almost identical to the original one, and that's because we did it in the DCT domain. If we were instead to throw away a lot of pixels directly and keep only 12.5% of them, we would not be able to get this image quality. This is just the error between the original and the reconstructed, and as you see, it's a relatively small error. So this is pretty good compression from nothing more than doing a DCT, removing a lot of coefficients, and coming back. So now we have the front end and we have the back end. What's missing is the middle: once I'm in the DCT domain, can I achieve even more compression by doing quantization in a smart fashion? That's going to be the subject of our next video clip.