0:00

Welcome back to the course on Audio Signal Processing for Music Applications.

In the previous demonstration class,

we talked about the harmonic plus residual model.

We analyzed a sound with the model, using the SMS tools, and

we were able to identify the harmonics, subtract them from the original sound, and

obtain a residual, which we then

combined with the harmonics to recover the original signal.

0:29

In this demonstration class we want to go a little bit beyond that by

approximating the residual with the stochastic model.

And in the case that the residual is close to a stochastic signal,

then this will be a good representation: the harmonic component plus

the stochastic component to represent a sound.

So let's start, and let's open the GUI of SMS Tools.

And we will start with a flute sound, okay?

So, let's listen to this flute sound.

1:10

[SOUND] Okay, so, it's a very stable note,

with a quite clearly defined pitch; in fact, it's an A4.

And it clearly has some breathing quality that will be relevant for

this idea of the stochastic component.

Before deciding the window size, well, again, let's use the Blackman window.

Since it's a stable sound, Blackman will be a good choice for

getting low side lobes and

getting a good signal-to-noise ratio in terms of the window.

1:52

So in order to compute a good window size, we just have to take 6,

which is the main-lobe width in bins of the Blackman window, times the sampling rate,

44,100, divided by the fundamental frequency of this sound;

this is an A4, so it's 440 Hertz, okay.

So 601 would be a good window size to use.
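The arithmetic just described can be sketched in a few lines. This is only an illustration: the function name is made up here, and rounding up to an odd length is an assumption (odd lengths keep the window symmetric around its center sample), not something stated in the lecture.

```python
def min_window_size(main_lobe_bins, fs, f0):
    """Smallest window length (in samples) whose main lobe fits between
    harmonics spaced f0 Hz apart, forced to an odd number of samples."""
    size = int(main_lobe_bins * fs / f0)
    # Assumption: prefer odd lengths so the window has a center sample.
    return size if size % 2 == 1 else size + 1

# Blackman main lobe (6 bins), 44,100 Hz sampling rate, A4 at 440 Hz.
print(min_window_size(6, 44100, 440.0))
```

For the values used here, 6 x 44,100 / 440 is about 601.4, which gives the 601 samples mentioned above.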

Given that we really want to get a good resolution of the harmonics and

try to minimize the rest of the components, and the sound is quite stable,

we can afford to take a longer window, so let's take 801 samples.

The FFT size, again, we can take a bigger FFT so

that we get a smoother spectrum, so 2048, okay.
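To see why the longer window and the bigger FFT help, here is a rough check of the numbers. The helper names are hypothetical, and the main-lobe width is the usual approximation of 6 bins for a Blackman window, expressed in Hertz.

```python
def bin_spacing_hz(fs, fft_size):
    """Distance in Hz between neighboring FFT bins."""
    return fs / fft_size

def main_lobe_width_hz(main_lobe_bins, fs, window_size):
    """Approximate main-lobe width in Hz for a window of the given length."""
    return main_lobe_bins * fs / window_size

fs, f0 = 44100, 440.0
for window_size in (601, 801):
    width = main_lobe_width_hz(6, fs, window_size)
    # True when the main lobe is narrower than the spacing between harmonics.
    print(window_size, round(width, 1), width < f0)

print(round(bin_spacing_hz(fs, 2048), 2))
```

With 601 samples the main lobe is about as wide as the 440 Hertz spacing between harmonics; with 801 samples it is roughly 330 Hertz, so neighboring harmonics are better separated. The 2048-point FFT does not add resolution, it only samples the spectrum more finely, about every 21.5 Hertz.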

And well, let's analyze just in the middle of the sound.

[SOUND] This is quite a long sound, so let's do it like that.

Okay, so these are the samples of the input signal.

In the spectrum, we see quite well the low frequency harmonics

3:11

and similar to the organ sound that we analyzed before.

Yeah, in the high frequency areas,

there are quite a few unstable,

very stochastic components that are not so

clearly defined as partials.

So there might be some harmonics or partials there, but

they're masked and difficult to identify here.

Anyways, it's okay, so now let's just go directly to the harmonic model and

let's apply the same parameters.

So 801, 2048, now we have to choose the parameters for

the pitch detection and the harmonic detection.

I don't think we need to go very far down in the spectrum for

the harmonics because, as we saw, there are not that many.

In terms of the duration of the harmonics,

I think it's good to make sure that they're long, since this is a stable note.

So we might want to put 0.2 in terms of the minimum duration.

In terms of number of harmonics, there are clearly not that many,

so I'm sure 40 should be plenty.

And we have to set the minimum and maximum fundamental so that 440 fits here.

So I'm sure if we put 300 and 500, that should be plenty.

And the error for the pitch detection, well, let's take the default as 7.

And the deviation, let's start with these values, see what happens.
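As a quick sanity check of these choices, the sketch below lists where the harmonics of a 440 Hertz fundamental would fall and confirms that the fundamental sits inside the search range. The function names are made up for this illustration and are not part of the SMS tools.

```python
def harmonic_frequencies(f0, n_harmonics, fs):
    """Ideal harmonic frequencies k * f0 that stay below the Nyquist frequency."""
    nyquist = fs / 2
    return [k * f0 for k in range(1, n_harmonics + 1) if k * f0 < nyquist]

def f0_in_range(f0, min_f0, max_f0):
    """True when the expected fundamental lies inside the search range."""
    return min_f0 <= f0 <= max_f0

freqs = harmonic_frequencies(440.0, 40, 44100)
print(len(freqs), freqs[-1])
print(f0_in_range(440.0, 300, 500))
```

Forty harmonics of an A4 reach up to 17,600 Hertz, still below the 22,050 Hertz Nyquist limit, so the limit of 40 is not what restricts the analysis of this sound.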

4:54

And of course, we need to get the flute sound.

Okay, now we can compute it.

Okay, so clearly we got quite a few harmonics and

some here that appear and disappear.

Of course, we have to realize that here we're just plotting the first 5,000 Hertz.

If we want to see more and what's going on in the higher frequencies,

we should display it a little bit differently.

Okay, so let's just listen to the result.

5:27

[SOUND] Okay, that's pretty good.

That's quite an accurate rendition.

Of course, it's different from the original one.

In fact, let's listen to the original again.

[SOUND] The original has a brighter quality, and

this is because it has all these other components.

Of course, we can play around with these parameters for example,

to allow for more harmonics to appear.

So for example, let's say we put 0.1 here and compute it again.

Yeah, now we are allowing these high harmonics,

which are very unstable, to appear more, but

of course, all these jumps are not really that good.

So that means that these harmonics are not really stable,

so they are really buried in

the noise, or this breath that we also hear.

Okay, now we can go ahead and use the harmonic plus residual model so

we can listen to the residual.

And let's again use the same parameters.

The flute sound.

Let's put 801, let's put 2048,

and the threshold, -80.

The minimum duration, let's just put 0.2.

Number of harmonics, yeah, we don't need that many.

And well this will be okay, 350, 700.

And this I think 7 was all right, and

yeah let's make it not as open in terms of the deviation as before.

Let's just put maybe 0.02 and see what happens.

7:25

Okay, so here we see the original and the harmonics.

And maybe we see a few more than before and synthesize.

So, let's listen to the residual.

[SOUND] Okay, that's a pretty nice residual.

[SOUND] We hear the attack, and the attack, in fact,

is this red area here: during the attack, the breath is clearly

louder, and then it sort of gets attenuated, and

we hear a very clear breath noise throughout.

Okay, so now, this is very much a stochastic signal, it's very noisy,

so that means that we can apply the stochastic analysis to that.
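The idea of the residual can be illustrated with a toy example. This is a simplified time-domain sketch with made-up values (three harmonics plus Gaussian noise standing in for the breath); the actual analysis works frame by frame on the spectrum, which is not reproduced here.

```python
import math
import random

def harmonic_signal(f0, amplitudes, fs, n_samples):
    """Sum of sinusoids at f0, 2*f0, ... with the given amplitudes."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * f0 * n / fs)
            for k, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

fs, f0, n_samples = 44100, 440.0, 2048
rng = random.Random(0)  # fixed seed so the example is repeatable

harmonics = harmonic_signal(f0, [1.0, 0.5, 0.25], fs, n_samples)
breath = [0.05 * rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # toy "breath"
original = [h + b for h, b in zip(harmonics, breath)]

# Residual: what is left after removing the harmonic part.
residual = [x - h for x, h in zip(original, harmonics)]
# Adding the residual back to the harmonics recovers the original.
rebuilt = [h + r for h, r in zip(harmonics, residual)]
max_error = max(abs(x - y) for x, y in zip(original, rebuilt))

print(rms(harmonics), rms(residual), max_error)
```

In this toy case the residual is pure noise and much quieter than the harmonics, which is the situation where replacing it with a stochastic approximation makes sense.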

So now let's use the harmonic plus stochastic model and

let's get the same type of parameters.

Of course, we can play around with these parameters to get better values.

But the ones we chose looked okay.

So again, let's put 0.2 here.

Let's get the number of harmonics, yeah, I think 40 harmonics.

And this was all right, this was 7, and this was 0.02.

Okay, now the parameter that is specific for

the stochastic analysis is the stochastic approximation factor.

And here the default is 0.1.

So it means that it reduces the whole spectrum

9:25

Okay, so clearly now, well, it looks different because in this one

we are showing the range from 0 to 14,000 Hertz.

So in fact, we're seeing quite a bit more, so

we see more of the stochastic component.

But as we see, the harmonics are very much

on the lower side; above 5,000 Hertz, there is not that much.

There are some, but maybe even these lines should not be

considered harmonics and maybe they should be discarded.

But let's listen to the stochastic component.

10:09

Okay, of course, we have lost like a little bit of

the details that the residual had but

it definitely keeps this breathy noise.

Of course, it's not that loud, so

let's hear it together with the harmonics.

[SOUND] Okay, so it sounds good, but clearly the harmonics are taking over and

they are masking this stochastic component quite a bit.

We have to listen quite carefully if we want to be able to

distinguish the stochastic component in this type of sound.
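The loss of detail in the stochastic component can be pictured with a small sketch. It assumes that a stochastic approximation factor of 0.1 means keeping about 10% of the points of the residual's magnitude spectrum; that reading is an assumption made here. The block averaging and linear interpolation below are a simple stand-in chosen for this illustration, not the method used by the SMS tools, and the spectrum values are synthetic.

```python
import random

def approximate_envelope(mag_db, factor):
    """Reduce a magnitude spectrum (in dB) to about factor * len(mag_db)
    points by averaging contiguous blocks of bins."""
    n = len(mag_db)
    n_coarse = min(n, max(2, int(n * factor)))
    coarse = []
    for i in range(n_coarse):
        block = mag_db[i * n // n_coarse:(i + 1) * n // n_coarse]
        coarse.append(sum(block) / len(block))
    return coarse

def expand_envelope(coarse, n):
    """Linearly interpolate a coarse envelope back to n points (n >= 2)."""
    out = []
    for j in range(n):
        pos = j * (len(coarse) - 1) / (n - 1)
        i = int(pos)
        frac = pos - i
        nxt = coarse[min(i + 1, len(coarse) - 1)]
        out.append((1 - frac) * coarse[i] + frac * nxt)
    return out

def roughness(values):
    """Sum of absolute differences between neighboring values."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

rng = random.Random(1)
n_bins = 1024
# Synthetic noisy residual spectrum: a downward tilt plus random detail.
mag_db = [-40 - 30 * k / n_bins + rng.uniform(-6, 6) for k in range(n_bins)]

coarse = approximate_envelope(mag_db, 0.1)
smooth = expand_envelope(coarse, n_bins)
print(len(coarse), len(smooth), roughness(smooth) < roughness(mag_db))
```

The coarse envelope keeps the overall shape of the spectrum but drops the fine detail, which matches what we hear: the breathy character stays, while some of the detail of the residual is lost.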

And that's basically all I wanted to say.

10:58

Let's go back. So we have talked about

the harmonic plus stochastic model, and we have used the SMS tools,

the interface that has allowed us to play around with this model.

And of course, the sound, this flute sound, is from Freesound.

So hopefully that has given you a view of the potential of the harmonic plus

stochastic model. It's a little bit different from the harmonic plus residual model,

and the main difference is that now, with the stochastic representation

of the residual, we will be able to do quite a few things.

Next week, in fact, we're going to be doing transformations to these sounds and

the stochastic representation will allow us to do that whether in