That was a fairly long lesson. I hope you were able to stay with us. Maybe I will break longer ones into shorter lessons in the future. So in this lesson, we look at audio processing and how we can leverage some of the concepts we talked about as a use case. This figure shows the hearing thresholds, with frequency on the x-axis and sound pressure level on the y-axis. Conversational speech covers about a 50 dB dynamic range, say from about 30 to 80 dB SPL, depending on how loud you talk. Music usually covers about a 100 dB dynamic range, going from 10 to 20 dB SPL almost up to 110 or 120 dB. So we need the noise to be about 100,000 times lower than the signal in amplitude if we want a very clean representation, because even a very light amount of noise you're able to hear in a quiet room.

The "13 dB miracle" demo, based on psychoacoustics, shows that you can actually have a much worse signal-to-noise ratio than 100 dB if you play the game right with psychoacoustics; you won't even hear the noise. 13 dB means that between the signal and the noise you have only about a factor of 20 in the power domain, or about a factor of 4.5 in the voltage domain. [COUGH] Actually, I saw this demo about 25 years ago and wanted to recreate it here for this course. So first you will hear the original stereo content, Jimmy. [MUSIC] And now you will hear 13 dB of flat noise, that is, white noise added to each window of the signal, and you're supposed to hear lots of noise. Take a listen. [MUSIC] Did you hear the noise? Especially towards the end, in the quiet regions. So this time, we take exactly the same amount of white noise in each window, but we shape it in such a way that it sits right below the signal peaks, 13 dB below where the signal is. So the overall signal-to-noise ratio is exactly the same in what you just heard and what you're going to hear now. [MUSIC] Even if you put on headphones and listen to this very carefully, you should hear much less noise, or no noise at all. Now, here's the interesting thing.
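The flat-noise version of the demo can be sketched in a few lines. This is only an illustration of the idea, assuming a window size of 512 samples and per-window scaling (the original demo's exact parameters are not given): white noise is added to each window at a fixed 13 dB SNR, so the overall SNR matches the demo, but the noise is spectrally flat and clearly audible in quiet passages.

```python
import numpy as np

def add_white_noise_per_window(signal, snr_db=13.0, win=512, rng=None):
    """Add white noise to each window so every window has the given SNR.

    13 dB is only about a factor of 20 in power (4.5x in amplitude),
    so the resulting noise floor is high enough to hear.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    out = np.asarray(signal, dtype=float).copy()
    for start in range(0, len(out), win):
        seg = out[start:start + win]          # view: edits apply in place
        p_sig = np.mean(seg ** 2)             # local signal power
        if p_sig == 0:
            continue                          # silent window: leave it alone
        p_noise = p_sig / 10 ** (snr_db / 10)  # noise power snr_db below signal
        seg += rng.normal(0.0, np.sqrt(p_noise), size=len(seg))
    return out
```

The shaped version of the demo uses the same total noise energy per window but distributes it in frequency so it stays just under the masking threshold, which is why it becomes inaudible even though the SNR is identical.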
If we take these two signals, A and C, where we did not hear the difference, and look at the difference, you will see all the crud that we introduced into the signal at 13 dB SNR. You didn't hear it in the other case, but when you listen to it on its own, you can actually hear all the mess that we made with the signal. Let's listen to the difference, D, for Jimmy. [MUSIC]

So there are two main ideas in taking advantage of this psychoacoustics. The figure on the left shows what we call simultaneous masking. We talked about critical bands and the work at AT&T in the 1930s, that is, Fletcher's work. Within a critical band, if you have a very strong component, that is called the masker. It will mask all the neighboring components in that critical band, and these are called the maskees. Also, if there is a loud sound, like an explosion or a cymbal crash, it's as if the ear goes deaf for a few milliseconds. This is shown in the right-hand picture and is called post-masking. When you have a loud sound, you don't hear anything for about 30 to 80 milliseconds, and then you start hearing the other sounds. So you can put any garbage in that region, or not code those signals and use very few bits, and still have very faithful quality. There is also the concept of pre-masking that has been found in psychoacoustics, but from a signal processing perspective in audio coding, I haven't seen it used much for data rate reduction, as far as I know.

So for audio coders, this is how we take advantage of the psychoacoustic model. This is not an exact block diagram; you will find many different block diagrams in the literature. Actually, I have an article by Painter and Spanias in the further reading section, and you should take a look at that. But essentially, in our own diagram, in the yellow block on top we do a parallel, very high resolution time-frequency decomposition, and we use that to implement a psychoacoustic model. The output of the psychoacoustic model is the masking curves.
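The simultaneous-masking idea can be sketched as a toy rule. This is illustrative only, not a real psychoacoustic model: the 13 dB margin and the flat in-band threshold are assumptions for the sketch. Within one critical band, components that fall more than the margin below the strongest component (the masker) are treated as inaudible and need few or no bits.

```python
def masked_components(band_levels_db, margin_db=13.0):
    """Toy simultaneous-masking check for one critical band.

    Returns (level, is_masked) for each component: anything more than
    margin_db below the strongest component (the masker) is flagged as
    masked, i.e. inaudible and safe to code coarsely or drop.
    """
    masker = max(band_levels_db)
    return [(lvl, lvl < masker - margin_db) for lvl in band_levels_db]

# One critical band: an 80 dB masker plus three neighbors.
print(masked_components([80.0, 72.0, 60.0, 50.0]))
# The 60 and 50 dB components fall more than 13 dB below the masker.
```

A real model also spreads the masker's influence into neighboring bands and combines it with the absolute hearing threshold, but the bit-allocation logic follows this same pattern.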
That's what really drives the quantization process, and that's really where we get most of the data rate reduction in MP3 and AAC. And then we use Huffman coding after that to reduce the data rate further.
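The two-stage saving can be made concrete with a small sketch. This is not the actual MP3/AAC quantizer: the Gaussian stand-in coefficients and the two step sizes are assumptions for illustration. A coarser quantizer step, sized so the quantization noise stays under the masking curve, produces far fewer distinct symbols, and the entropy of those symbols is a lower bound on the average Huffman code length per coefficient.

```python
from collections import Counter
from math import log2

import numpy as np

def entropy_bits_per_symbol(symbols):
    """Shannon entropy of a symbol stream: a lower bound on the average
    Huffman code length per quantized coefficient."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in counts.values())

rng = np.random.default_rng(1)
coeffs = rng.normal(0.0, 1.0, 4096)   # stand-in for one frame of MDCT coefficients

fine = np.round(coeffs / 0.01).astype(int)    # tiny step: transparent but costly
coarse = np.round(coeffs / 0.5).astype(int)   # step sized to keep noise under the mask

print(entropy_bits_per_symbol(fine.tolist()))
print(entropy_bits_per_symbol(coarse.tolist()))
```

The coarse quantizer needs only a few bits per coefficient versus eight or more for the fine one, which is the "most of the data rate reduction" part; Huffman coding then gets the actual bit count close to these entropy figures.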