Welcome again to the course on audio signal processing for music applications. This week we're talking about applications: how to use the models we have been studying throughout the course for the application of sound transformation. So we aim at manipulating sounds and changing different aspects of them. In the first demonstration class we exemplified the idea of morphing using the short-time Fourier transform. In the last class, we talked about time scaling, how to change the duration of a sound using the sinusoidal model. And in this class, I want to talk about pitch changes, how to change the frequencies of a sound. We will use the harmonic plus stochastic model, so we'll basically be changing pitch-related information of harmonic sounds. In order to use the harmonic model we need to understand the sound a little bit. So for example, we will start with this saxophone sound. Let's listen to it [MUSIC]. Okay, in order to define, especially, the window size, we need to know the range of fundamental frequencies that is present here. A good way to do that is to look at the spectrogram of the sound, and basically zoom in to the first harmonic so that we see the fundamental frequency, which is the first line of this harmonic series, and see which are the highest and lowest values in here. It's better to use a bigger window size so that we see a more refined line. Okay, with this we have a pretty good view of this line. Clearly the lowest sound would be this note here, more or less; that's around, let's say, 450 hertz. And the highest is this note here, which is around 600 and something hertz. Okay, so this is good information for defining the parameters of the harmonic plus stochastic model. So let's go to the SMS tools models GUI and go directly to the harmonic plus stochastic model. And we will open this same file, which is this sax phrase, the short version of it.
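The inspection described above, zooming in on the first harmonic of the spectrogram to read off the fundamental, can be sketched in a few lines of NumPy. This is just an illustration on a synthetic stand-in tone (a made-up 450 Hz harmonic signal, not the actual saxophone file), but the idea is the same: take a long Blackman-windowed frame, look only at the low part of the spectrum, and find the peak.

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
# hypothetical stand-in for the saxophone note: a 450 Hz harmonic tone
x = sum(0.5 / k * np.sin(2 * np.pi * 450 * k * t) for k in range(1, 5))

M = 4096                       # long window -> a more refined spectral line
w = np.blackman(M)
X = np.abs(np.fft.rfft(x[:M] * w))

# look only below 1 kHz, so the peak we find is the fundamental,
# i.e. the first line of the harmonic series
cutoff = int(1000 * M / fs)
f0 = np.argmax(X[:cutoff]) * fs / M
print(f0)                      # close to 450, within one bin (fs/M Hz)
```

In practice a tool like Sonic Visualiser does this visually over the whole sound, which is what the class uses; the snippet only shows why a longer window gives a sharper, easier-to-read fundamental line.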
Okay, now we have to choose the parameters. For the window, Blackman is a good choice because it has good, low side lobes. Now, in order to decide the window size, it's good to go to the terminal; from Python we can quickly do the calculation. The Blackman window has a main lobe that is six bins wide, so we multiply six by 44,100 and divide by the lowest fundamental frequency. We said that was around 400 and something hertz, so to be safe, let's say 400 hertz. This gives the window size appropriate for a frequency of 400, the lowest, which is the meaningful one because it's the longest window we will need. So we will put as window size 661 samples. For the FFT size, let's make it a big one so we have zero padding, say 2048. The threshold doesn't really need to be that low, but let's leave it, so we get a lot of harmonics. The minimum duration of sinusoidal tracks, the default is fine. Then the maximum number of harmonics: harmonics can only exist up to the Nyquist frequency, so at most 22,050 divided by 400, which is around 55, and that is only if it had harmonics all the way up, so 100 is more than enough. Then we need to define the range of the fundamental frequency, so we can put the values we saw: one was around 400 and the other around 600 and something, so to be safe, let's say 650. This next one is the error threshold used to identify the fundamental frequency; maybe let's be a bit more flexible and put seven. The deviation is fine like this. And for the stochastic approximation of the residual, we don't need it to be too smooth, we're not doing data compression, so 0.4 should be okay. Okay, let's compute it now. So this is the result: we have the original signal, then the analysis, the harmonics plus the stochastic component, and the synthesized sound.
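The quick terminal calculation mentioned above is just this: six main-lobe bins for the Blackman window times the sampling rate, divided by the lowest fundamental. A minimal sketch:

```python
# Blackman main lobe spans 6 bins, so to resolve neighboring harmonics of
# the lowest fundamental (~400 Hz) the window must cover 6 * fs / f0 samples
fs = 44100
f0_min = 400
M = int(6 * fs / f0_min)             # 661 samples, the window size used in the GUI

# also handy: an upper bound on how many harmonics can exist below Nyquist
max_harmonics = (fs / 2) // f0_min   # 55, so setting 100 in the GUI is more than enough
print(M, max_harmonics)
```

The lowest expected fundamental drives the choice because it needs the longest window; any higher note is resolved by that same window automatically.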
Let's listen to the different components of it. The sinusoidal component: [MUSIC] It clearly captures most of the sound. Then let's listen to the stochastic: [SOUND] Well, it's very soft, but it's there, so it's a relevant component. And of course, the sum of the two: [MUSIC] Okay, so this is a good starting point for running the transformation. Let's quit this, trying to remember these parameters, and go to the directory where we have the transformations interface, and type python and the transformations GUI. Okay, so this is the GUI for the transformations. Let's go directly to the HPS model with transformations. It already has the sax phrase here by default, so let's use the parameters that we used: if I remember, it was 661 for the window, an FFT of 2048, the threshold was minus 100, the minimum sine duration was the same, the number of harmonics 100, the minimum frequency was 400 and the maximum was 650, the f0 error threshold was seven, and the stochastic factor we put at 0.4. Okay, now we can analyze, and this will definitely do the same thing that we did before, so we can check that the analysis is correct. [MUSIC] And that's exactly the same sound that we heard before. So now we can start playing around with the transformations. We have two possibilities for changing the frequencies and one for changing the time. We're not interested in changing the time, so let's set the time envelope as 0, 0, 1, 1; that means it's not changing anything. Okay, now in frequency scaling: we have two frequency transformations, given that we are dealing with a harmonic sound. We know where the harmonics are, and that's a great advantage compared with the sinusoidal model. In fact, these types of changes could be done with the sinusoidal model, but of course then we are restricted to some transformations.
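To keep track of the settings being retyped into the transformations interface, here is the recap as a plain dictionary. The key names are just illustrative labels, loosely following sms-tools naming conventions; they are not a guaranteed API, and the minimum track duration is assumed to be the GUI default of 0.1 seconds.

```python
# recap of the HPS analysis settings used in the lecture; key names are
# illustrative labels, roughly echoing sms-tools parameter names
hps_params = {
    "window": "blackman",
    "M": 661,           # window size in samples (6 * 44100 / 400)
    "N": 2048,          # FFT size, giving zero padding
    "t": -100,          # magnitude threshold in dB
    "minSineDur": 0.1,  # minimum track duration; assumed GUI default
    "nH": 100,          # maximum number of harmonics
    "minf0": 400,       # lowest expected fundamental, in Hz
    "maxf0": 650,       # highest expected fundamental, in Hz
    "f0et": 7,          # f0 error threshold
    "stocf": 0.4,       # stochastic approximation factor for the residual
}
```

Using the same values in both GUIs is what guarantees that the re-analysis in the transformations interface reproduces the earlier result exactly.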
For example, frequency stretching is not possible with the sinusoidal model because we don't know which sinusoid corresponds to which harmonic. Okay, let's first just use the scaling. So here, again without any transformation: if we put 0, 1, 1, 1, that means a frequency stretching of one, so no stretching at the beginning and at the end. Then in the frequency scaling, let's start by transposing down, sort of decreasing the pitch of this sound, for example by 0.8: at time zero we will have 0.8 and at time one we will also have 0.8, okay? And a very important parameter is this timbre preservation. What timbre preservation does is try to preserve the shape of the spectrum across the harmonics. If we put one, it preserves the spectral shape, so it should sound more natural than if we put zero, in which case it would just transpose everything and the magnitudes would be affected too. So let's apply it like this. Okay, so we have clearly transposed the harmonics down; now they're closer together. So let's listen to the result. [MUSIC] It sounds quite natural even though we have transposed it, mainly because of this timbre preservation: we have maintained quite a bit of the quality of the saxophone. And then, just to finish, let's make some frequency stretching. Frequency stretching kind of converts a sound into an inharmonic type of spectrum, in which we are applying an exponential factor to each harmonic value, let's say. So at time zero let's start with one, and then at the end let's stretch everything to, let's say, 1.1. So we will have a stretching factor at the end and not at the beginning, so the stretching will increase progressively. Let's see what that does. Okay, so we see clearly here that the fundamental has remained the same, and for the harmonics, we have still maintained the scaling factor.
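The two operations just described can be sketched on a single frame of harmonic frequencies. This is only an illustration of the idea, an exponential factor per harmonic number for stretching versus a uniform factor for scaling; the exact formulas inside sms-tools may differ in detail.

```python
import numpy as np

f0 = 440.0
k = np.arange(1, 11)            # harmonic numbers 1..10
hfreq = f0 * k                  # one perfectly harmonic frame

# frequency scaling: every harmonic is multiplied by the same factor,
# so the spectrum is transposed but stays harmonic
scaled = 0.8 * hfreq

# frequency stretching: an exponential factor per harmonic number,
# so the spacing between harmonics grows and the spectrum turns inharmonic
stretch = 1.1
stretched = hfreq * stretch ** (k - 1)   # the fundamental stays unchanged

spacing = np.diff(stretched)
print(spacing)                  # strictly increasing gaps -> inharmonic
```

This matches what the spectrogram shows in the demo: with stretching the fundamental line stays put while the upper harmonics drift progressively further apart.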
So we are scaling everything down to 0.8, but the harmonics keep getting further and further apart from each other as time goes on, and clearly at the end they are not equally spaced, so that's an inharmonic spectrum. Let's listen to that. [MUSIC] Okay, so clearly the low frequency is the same, but as time progresses the sound becomes more inharmonic, kind of more metallic, because the harmonics have been stretched. Of course we can do a lot of things, so feel free to play around with these parameters and, of course, with time scaling. Time scaling is also very powerful once we have been able to analyze the sound with the harmonic plus stochastic model. That's all I wanted to say. So basically, we have talked about the idea of changing the pitch or the frequencies of a sound. First, we used Sonic Visualiser to understand the sound, and then we used the SMS tools GUI with the harmonic plus stochastic model to change the pitch or the frequencies of the sound. So we have been talking about pitch change. Of course, pitch change can be done with the sinusoidal model, with the harmonic plus stochastic, or the sinusoidal plus stochastic, or with quite a few of the models we have been talking about. And Audacity also has some implementations of that. So anyway, we just presented a little bit of that, an example using the harmonic plus stochastic model and the potential of this type of transformation. I hope you got an idea of it. We still have another demonstration class, and we'll again be talking about the harmonic plus stochastic model, but with another possibility for transforming sounds, which will be morphing two sounds, interpolating the representations of two sounds. So I hope to see you all in the next class. Bye-bye.