This is the Short-Time Fourier Transform equation,

basically a modified version of the DFT,

with a few but important differences.

For example, the input to the equation,

the input signal, is not just x[n];

it is the product of w, our analysis window,

and a fragment of x[n].

Here the argument of x contains n, our time index,

but also a frame number and a hop size.

So, l is the frame number, and we will be iterating over l,

stepping through time this way, and capital H is our hop size:

how much we hop from one time instant to the next.

So, basically, x changes in time according to l and

H, and at every time instant, at every frame,

it is multiplied by the analysis window, w[n].

The rest is the DFT.

So the only thing that changes is the input signal,

and therefore the output is not a single spectrum but a sequence of spectra.

Here we have X sub l, where the variable l is the frame number.

That means the output of the Short-Time Fourier

Transform is a sequence of spectra,

each of the same size and each with magnitudes and phases, but

each one different, because the input is a different fragment of the sound,

stepping through the sound progressively.

To emphasize the idea of zero-phase windowing

that we have already talked about,

from now on we generally specify the time index n to go

from minus N over 2 to N over 2 minus 1.

So the window is always centered around zero, and we do not introduce any phase changes:

there is no shift in time,

and therefore none in the spectrum.
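
As a rough illustration, the equation X_l[k] = sum over n from -N/2 to N/2-1 of w[n] x[n + lH] e^(-j2πkn/N) can be sketched in Python with NumPy. This is a minimal sketch, not a reference implementation: the function name `stft`, taking frame l from x[lH : lH + M] (so the window center sits a fixed offset after lH), and the buffer layout used to get zero-phase windowing are all assumptions made for illustration.

```python
import numpy as np

def stft(x, w, N, H):
    """Minimal STFT sketch: one zero-phase spectrum (row) per frame l.

    x: input signal, w: analysis window of size M <= N,
    N: DFT size, H: hop size. Frame l uses x[l*H : l*H + M].
    """
    M = w.size
    hM1 = (M + 1) // 2          # samples from the window center to its end
    hM2 = M // 2                # samples before the window center
    spectra = []
    for start in range(0, x.size - M + 1, H):   # start = l * H
        xw = x[start:start + M] * w             # window the fragment
        buf = np.zeros(N)                       # zero padding up to N
        buf[:hM1] = xw[hM2:]                    # center and second half first...
        buf[N - hM2:] = xw[:hM2]                # ...first half wrapped to the end
        spectra.append(np.fft.fft(buf))         # DFT of the zero-phase buffer
    return np.array(spectra)
```

Each row of the result is one spectrum X_l, all of the same size N, so the output is a sequence of spectra rather than a single one. Placing the window center at index 0 of the buffer is what keeps a symmetric window from adding any phase.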

The windowing is a way to step through the sound, as I mentioned.

Here we can see a depiction of that, and if we use the analogy of image and

video, we could relate a spectrum to a photograph, a static image, and

the Short-Time Fourier Transform to video, a time-varying image.

In this picture we see the whole sound at the bottom

and how we step through it

by windowing it with this analysis window,

so that we can treat the whole sound

as a sum of sound fragments.
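
One way to see the whole sound as a sum of windowed fragments is to cut it into frames, window each one, and add the fragments back at their original positions. The sketch below assumes a periodic Hann window with a hop of half the window length, a combination chosen here because the overlapping windows add up to one, so the interior of the signal comes back unchanged; the helper `overlap_add_frames` is hypothetical.

```python
import numpy as np

def overlap_add_frames(x, w, H):
    """Window x into frames of len(w) hopped by H and add them back in place."""
    M = w.size
    y = np.zeros(x.size)
    for start in range(0, x.size - M + 1, H):
        y[start:start + M] += x[start:start + M] * w   # add one windowed fragment
    return y

M, H = 64, 32                                   # hop = half the window size
n = np.arange(M)
w = 0.5 - 0.5 * np.cos(2 * np.pi * n / M)       # periodic Hann window (assumed)
x = np.random.default_rng(0).standard_normal(1024)
y = overlap_add_frames(x, w, H)
# Away from the first and last half-window, the fragments add back to x
print(np.allclose(y[H:-H], x[H:-H]))
```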

To better understand the effect of windowing a sound, let's take the example of

what happens when we window a real sinusoid and then compute its spectrum.

We start from a real sinusoid, which we have already seen:

a cosine with a frequency index,

k sub zero, and an amplitude, A sub zero,

which can be expressed as the sum of two complex sinusoids,

one with a positive frequency and the other with a negative frequency.

Then we substitute this signal into the Short-Time Fourier

Transform equation and window it.

We can go through the derivation step by step.

First we put x[n] in the equation.

Then we substitute it by the sum of these two complex exponentials.

Because of the linearity of the DFT,

we can split this into two equations,

each with a complex exponential as the input signal,

and the amplitudes can be moved outside.

What we get back is the sum of two DFTs of the window,

each with a frequency-shifting operation.

Here we see that

the result is the spectrum of the window,

frequency-shifted by the frequency of the input signal

and multiplied by half the amplitude of the input signal,

plus the same window spectrum

at the frequency of the other complex exponential:

one at the negative frequency and the other at the positive frequency.

So the spectrum of this windowed cosine

is basically the transform of the window,

shifted to the frequencies of the input signal

and scaled by half its amplitude.
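
Written out for a single frame (dropping l, so x[n] is already the fragment being analyzed) and with the zero-centered index used above, the steps go as follows, where W[k] is the DFT of the window:

```latex
x[n] = A_0\cos\!\left(\frac{2\pi k_0 n}{N}\right)
     = \frac{A_0}{2}e^{j2\pi k_0 n/N} + \frac{A_0}{2}e^{-j2\pi k_0 n/N}

X[k] = \sum_{n=-N/2}^{N/2-1} w[n]\,x[n]\,e^{-j2\pi kn/N}
     = \frac{A_0}{2}\sum_{n=-N/2}^{N/2-1} w[n]\,e^{-j2\pi (k-k_0)n/N}
     + \frac{A_0}{2}\sum_{n=-N/2}^{N/2-1} w[n]\,e^{-j2\pi (k+k_0)n/N}
     = \frac{A_0}{2}W[k-k_0] + \frac{A_0}{2}W[k+k_0]
```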

With this plot, we can understand the windowing process a little better.

On top we have the window,

and underneath it the windowed sinusoid

that we use as our input signal.

The transform of the window is also shown at the top;

in this case it is a Hanning window,

whose magnitude spectrum is centered around zero,

symmetric, and with a given phase.

And if we take the DFT of the windowed sinusoid,

what we see is basically the same shape

as the window's, but at the frequency of the sinusoid:

at its two frequencies, the positive and

the negative one, and with the phase of the sinusoid, too.

So we see the two phase values, with the antisymmetry

that results from analyzing a real signal.
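
The same result can be checked numerically. This sketch assumes a zero-centered Hanning window whose length equals the DFT size N, and a cosine whose frequency falls exactly on bin k0, so the frequency shift is a whole number of bins; N, A0, and k0 are arbitrary illustrative values, and `dft_centered` is a hypothetical helper that evaluates the DFT with n running from -N/2 to N/2-1.

```python
import numpy as np

N = 64                                         # DFT size = window size (assumed)
n = np.arange(-N // 2, N // 2)                 # zero-centered time index
w = 0.5 + 0.5 * np.cos(2 * np.pi * n / N)      # Hanning window centered at n = 0
A0, k0 = 0.8, 10                               # illustrative amplitude and bin
x = A0 * np.cos(2 * np.pi * k0 * n / N)

def dft_centered(s):
    """DFT of s with n from -N/2 to N/2-1 (zero-phase convention)."""
    k = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(k, n) / N) @ s

X = dft_centered(w * x)                        # spectrum of the windowed cosine
W = dft_centered(w)                            # spectrum of the window alone
# (A0/2) W[k - k0] + (A0/2) W[k + k0]; np.roll(W, k0)[k] == W[k - k0]
predicted = A0 / 2 * (np.roll(W, k0) + np.roll(W, -k0))
print(np.allclose(X, predicted))
```

With this symmetric, zero-centered window the spectrum is real, and only the bins around +k0 and -k0 are nonzero: the window's main lobe copied to the two frequencies of the cosine.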

From this discussion, we can see the importance of the analysis

window in the spectrum of a sinusoid, and thus of any sound.

Clearly, we have to spend some time explaining windows.

An analysis window is generally a real function

that is symmetric around the origin.

The simplest window is the rectangular window.

In the time domain it is nothing special, but

its magnitude spectrum is much more interesting.

In the time domain it just has a value of one for the duration of the window,

in this case 64 samples, and its magnitude spectrum

has the shape of a sinc, because its transform is a sinc function.

It could be described in many different ways,

but we focus on two main aspects.

First, the main lobe, the peak at the center;

we will mainly be talking about the width of the main lobe.

Second, the side lobes, the smaller lobes next to it;

here we focus on the level of the highest side lobe,

so we will be talking about the highest side-lobe level.

There are many windows used in audio signal processing,

and this is the list of windows available in the scipy module of Python.

Going through it, we can see quite a variety of windows.

We will not pay much attention to some of them, but

we will be talking about, for example, the Blackman window,

the Hanning window, the Hamming window,

and the triangular window, among others.

Some others are not used much in audio.

Each window can be distinguished from the others by measuring its main-lobe

width and its side-lobe level,

and each window offers a different compromise between these two

values.
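
The two measurements can be estimated numerically from a heavily zero-padded magnitude spectrum. In this sketch the helper `lobe_measures` is hypothetical: it walks down the main lobe to the first null to get the width (in bins of a DFT the same size as the window) and takes the largest value beyond it as the highest side-lobe level. The windows come from `scipy.signal.get_window`, which calls the rectangular window `'boxcar'` and the Hanning window `'hann'` and by default returns the periodic (DFT-even) form; the window size of 128 is an arbitrary choice.

```python
import numpy as np
from scipy.signal import get_window

def lobe_measures(w, zp=64):
    """Estimate (main-lobe width in bins, highest side-lobe level in dB).

    Bins are those of a DFT the same size as the window; zero padding by
    a factor zp only samples the same spectrum more densely.
    """
    M = w.size
    mag = np.abs(np.fft.fft(w, M * zp))
    db = 20 * np.log10(np.maximum(mag / mag.max(), 1e-15))  # floor avoids log(0)
    half = db[: M * zp // 2]                 # from 0 up to Nyquist
    i = 1
    while i < half.size - 2 and half[i + 1] < half[i]:
        i += 1                               # walk down to the first null
    return 2 * i / zp, half[i:].max()        # lobe is symmetric around 0

M = 128
for name in ('boxcar', 'hann', 'hamming', 'blackman', 'blackmanharris'):
    width, sidelobe = lobe_measures(get_window(name, M))
    print(f'{name:15s} main lobe {width:.1f} bins, side lobe {sidelobe:.1f} dB')
```

For these five windows the estimates should land close to the values discussed in this lecture: about 2, 4, 4, 6, and 8 bins, and roughly -13, -31, -43, -58, and -92 dB.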

Let's look at some of them.

The first one is the rectangular window, and

the equation shows how it is computed.

Its spectrum is what we call a sinc function:

sin(πk), where k is the frequency index,

divided by another sine function, sin(πk/N).

In the plots we show the magnitude of the spectrum,

the absolute value of W[k]:

basically a sine function

divided by another sine function that acts as a kind of weighting.

It results in this shape here,

the characteristic shape that we call the sinc function.

To describe it, we mentioned the width of the main lobe, which here is two bins.

Two bins means two samples, and we have to be careful, because

this is measured when the DFT has the same size as the window.

So if we take a DFT of the same size as the window,

let's say ten samples, then the main lobe is two bins wide.

But since we generally do zero padding, the number of bins is higher.

This is only because of the zero padding,

which we normally do in order to better visualize the shapes.

In fact, this shape was generated with a lot of zero padding

to get a smooth visualization, but strictly speaking,

the number of bins that we refer to is two.

And the highest side-lobe level is -13.3 decibels,

that is, the difference in level between the center peak and the first side lobe.
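
A quick numerical check of the formula and of the zero-padding point, assuming a rectangular window of N = 64 samples and a zero-padding factor of 16 (both arbitrary): the zero-padded FFT only samples |sin(πk)/sin(πk/N)| more densely, and its first null still falls at k = 1, so the main lobe still spans two bins of an N-point DFT.

```python
import numpy as np

N, zp = 64, 16                          # window size and zero-padding factor
w = np.ones(N)                          # rectangular window
Nfft = N * zp
mag = np.abs(np.fft.fft(w, Nfft))       # zero-padded magnitude spectrum
k = np.arange(1, Nfft // 2) / zp        # frequencies in bins of an N-point DFT
formula = np.abs(np.sin(np.pi * k) / np.sin(np.pi * k / N))
print(np.allclose(mag[1:Nfft // 2], formula, atol=1e-9))   # same curve
print(mag[0], mag[zp])                  # peak N at k = 0, (near) zero at k = 1
```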

Maybe the most popular window is the Hanning window, which is a raised cosine.

In the equation we take 0.5 + 0.5 times the cosine, and this raises the cosine.

It is just one cycle of that cosine.

If we compute its spectrum, it can also be expressed as a sum

of sinc functions.

In fact, all the windows we look at here can be expressed in the time domain as sums of cosines and

in the frequency domain as sums of sinc functions.

In this case it is the sum,

in the frequency domain, of three sinc functions.

Again, the two values that characterize this shape

are the width of the main lobe, which is four bins,

twice that of the rectangular window,

and the side-lobe level, which is -31.5 decibels, so lower.

So now the main lobe is wider and the side-lobe level is lower.
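
The three-sinc decomposition can be verified directly. Writing the Hanning window as 0.5 + 0.25 e^(j2πn/N) + 0.25 e^(-j2πn/N), its spectrum should equal 0.5 W_R[k] + 0.25 W_R[k-1] + 0.25 W_R[k+1], where W_R is the spectrum of the rectangular window. The sketch evaluates both sides at fractional bins; N, the oversampling factor, and the helper `dtft` are illustrative assumptions.

```python
import numpy as np

N, zp = 32, 8                                  # window size, oversampling factor
n = np.arange(-N // 2, N // 2)                 # zero-centered time index
k = np.arange(N * zp) / zp                     # fractional bins of an N-point DFT

def dtft(s, freqs):
    """Evaluate sum_n s[n] exp(-j 2 pi f n / N) at each f in freqs."""
    return np.exp(-2j * np.pi * np.outer(freqs, n) / N) @ s

w_rect = np.ones(N)
w_hann = 0.5 + 0.5 * np.cos(2 * np.pi * n / N)     # raised cosine, one cycle

W_hann = dtft(w_hann, k)
three_sincs = (0.5 * dtft(w_rect, k)                # sinc centered at 0
               + 0.25 * dtft(w_rect, k - 1)         # sinc centered at +1 bin
               + 0.25 * dtft(w_rect, k + 1))        # sinc centered at -1 bin
print(np.allclose(W_hann, three_sincs))
```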

The Hamming window is very similar to the Hanning, but with a small,

seemingly insignificant difference:

it is a raised cosine with a small step at the sides.

By having these small steps at the sides,

we get a spectrum that keeps the same main-lobe width,

which is good, since it does not get wider,

and yet we get a much lower side-lobe level, -42.7 decibels.

As we are going to see, this is important:

ideally we want the lowest side-lobe level and

the narrowest possible main lobe.

So this is a good window.

Of course, nothing comes for free: the side lobes

do not decrease as quickly as they move away from the main lobe.
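
The trade-off can be seen numerically. This sketch assumes the periodic windows returned by `scipy.signal.get_window` with an arbitrary size of 64; it shows the Hamming window's small step at the edges and compares the highest side lobes far from the main lobe, between bins 20 and 30 (an arbitrary range). The helper `far_sidelobe_db` is hypothetical.

```python
import numpy as np
from scipy.signal import get_window

M, zp = 64, 16

def far_sidelobe_db(name, lo=20, hi=30):
    """Highest level (dB re: peak) between bins lo and hi of an M-point DFT."""
    w = get_window(name, M)
    mag = np.abs(np.fft.fft(w, M * zp))
    db = 20 * np.log10(np.maximum(mag / mag.max(), 1e-15))
    return db[lo * zp:hi * zp].max()

print(get_window('hann', M)[0], get_window('hamming', M)[0])  # edges: 0 and about 0.08
print(far_sidelobe_db('hann'), far_sidelobe_db('hamming'))
```

The Hamming window's far side lobes should come out well above the Hanning window's, even though its highest side lobe is lower.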

The Blackman window is a constant plus a sum of two cosines, and with it we achieve

a significant improvement in the side-lobe level.

In its magnitude spectrum the main lobe is wider,

6 bins, but the side-lobe level is lower, -58 decibels.

That is good, because this starts to be a quite useful

side-lobe level for many audio applications.

We'll come back to that.
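
To make "a constant plus two cosines" concrete, this sketch builds a Blackman window from the commonly used coefficients 0.42, 0.5, and 0.08 (an assumption, since the lecture does not list them) and compares it with the periodic window from `scipy.signal.get_window`. Its M-point DFT has only five nonzero bins (0, ±1, ±2), which is another way of saying that its spectrum is a sum of five shifted sinc functions.

```python
import numpy as np
from scipy.signal import get_window

M = 64
theta = 2 * np.pi * np.arange(M) / M          # one cosine cycle across the window
# Blackman: a constant plus two cosines (common coefficients, assumed here)
blackman = 0.42 - 0.5 * np.cos(theta) + 0.08 * np.cos(2 * theta)
print(np.allclose(blackman, get_window('blackman', M)))

B = np.fft.fft(blackman)                      # M-point DFT, no zero padding
nonzero = np.flatnonzero(np.abs(B) > 1e-9 * np.abs(B).max())
print(nonzero)                                # bins 0, 1, 2 and their negatives
```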

Finally, the last window I want to talk about

is the Blackman-Harris window, which is a very special one,

because you can basically say it has no side lobes.

It is a sum of several cosines, in this case four cosine terms,

with different coefficients in the sum.

In the frequency domain,

in the magnitude spectrum, the main lobe again gets wider:

in this case, it is 8 bins.

But the side-lobe level is -92 dB, and

if we think about it in terms of signal-to-noise ratio,

which is a very important factor in digital signals,

92 decibels is basically at the noise floor of the 16-bit

signals that we deal with.

So, if we consider these side lobes

as artifacts or noise, they are not heard.

With other windows, we could say that the side lobes

are artifacts that can be heard.
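
The 16-bit figure can be checked with a line of arithmetic, assuming the usual rule of thumb that the dynamic range of an n-bit signal is 20·log10(2^n), about 6 dB per bit: for 16 bits that is about 96 dB, so a side lobe at -92 dB sits within a few decibels of that floor.

```python
import math

bits = 16
dynamic_range_db = 20 * math.log10(2 ** bits)   # about 6.02 dB per bit
sidelobe_db = -92.0                             # Blackman-Harris value from the lecture
gap_db = dynamic_range_db + sidelobe_db         # distance above the -96 dB floor
print(round(dynamic_range_db, 1), round(gap_db, 1))
```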

Anyway, again, we will come back to that.

To finish, let me compare some of

these windows applied to the same sound.

We start with a fragment of a sound of a certain length and

apply three different windows:

first the rectangular, then the Hamming, and finally the Blackman.

The spectra are clearly very distinct.

Looking at them, we can see that probably the best for

this particular analysis is the Blackman.

We see a smoother spectrum,

and the peaks are much more clearly distinct; in fact,

these peaks correspond to the harmonics of the sound.
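
A similar comparison can be sketched on a synthetic sound. Everything here is an assumption for illustration: a 440 Hz tone with five harmonics of amplitude 1/h, a 2048-sample window, a 4096-point FFT, and the hypothetical helper `leakage_db`, which returns the highest spectrum level in the gap between the first two harmonics relative to the highest peak. A lower value means less leakage between the harmonic peaks.

```python
import numpy as np
from scipy.signal import get_window

fs, f0 = 44100, 440.0                 # hypothetical sample rate and pitch
M, N = 2048, 4096                     # window size and zero-padded FFT size
t = np.arange(M) / fs
# hypothetical harmonic sound: 5 harmonics with amplitudes 1/h
x = sum(np.cos(2 * np.pi * h * f0 * t) / h for h in range(1, 6))

def leakage_db(window_name):
    """Highest level between 1.3*f0 and 1.7*f0, in dB relative to the peak."""
    w = get_window(window_name, M)
    mag = np.abs(np.fft.rfft(x * w, N))
    lo = int(np.ceil(1.3 * f0 * N / fs))     # first bin above 1.3 * f0
    hi = int(np.floor(1.7 * f0 * N / fs))    # last bin below 1.7 * f0
    return 20 * np.log10(mag[lo:hi + 1].max() / mag.max())

for name in ('boxcar', 'hamming', 'blackman'):
    print(name, round(leakage_db(name), 1))
```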

That is all, and there are a lot of references for

the topics I covered, especially about windows.

On Wikipedia you can find quite a bit of information about

the Short-Time Fourier Transform and about windows.

Julius Smith discusses this quite a bit on his website and in his online books,

so that is a very good reference.

And those are the resources, credits, and references.

So, this is all for

the first part of the lecture on the Short-Time Fourier Transform.

We have explained the basic equation of the Short-Time Fourier transform,

and we have focused on the analysis window.

In the second part, we will continue with this topic.

So, I will see you in the next class.