Suppose you're taking measurements of a physical quantity that you know

to be smooth and slowly varying.

For instance, the tide level at the sea or the blood pressure of a patient.

Because of the measurement process, the values that you actually measure are corrupted by some form of additive noise,

which at this point we can simply describe as some extraneous quantity that

perturbs the measured value.

So what you get in the end, instead of the smooth curve of the first panel,

is a curve where you see really fast wiggles that you interpret

as a noise corruption.

And the question is: can we process this data as a discrete-time signal,

so that most of the noise is removed and

we get back a measurement that is in line with our expectations of smoothness?

The first idea is to replace each sample of the input sequence

with a local average.

Averages are usually good when you want to

eliminate random variations that you don't know much about.

So for instance, assume a situation where we process the signal and

the processed signal is, at each sample,

the simple two-point average of the current sample and the previous sample.
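
As a quick reference, the two-point average described here can be written (using x[n] for the input and y[n] for the output, which is the notation implied by the rest of the discussion) as

$$y[n] = \tfrac{1}{2}\big(x[n] + x[n-1]\big).$$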

Well, it could very well be that the two-point average is not enough to remove

most of the noise.

And so, in general, we could take longer averages by considering the output

samples to be the average of capital M points from the input.
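
Written out, under the same assumed notation, the M-point moving average is

$$y[n] = \frac{1}{M}\sum_{k=0}^{M-1} x[n-k].$$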

If we try to do that experimentally,

we can see immediately that the strategy works for increasing values of M.

So if we take just the two-point average,

we see that we don't really get a lot of leverage.

You can see in blue the original signal and in red

the filtered output; with a two-point average, we still have a lot of wiggles.

But as we increase the number of points that we use to compute the average,

we can see that we start to remove a little bit more of the noise.

And by a 12-point average we can already see that the strategy pays off.

We have already smoothed out the signal quite significantly, and

if we're willing to increase the average up to, say,

100 points, then we can see that the signal we get at the output is actually very smooth.
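
A minimal sketch of this kind of experiment (assuming NumPy; the test signal, the noise level, and the printed "wiggliness" measure are illustrative choices, not the exact ones behind the plots):

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth, slowly varying "measurement" corrupted by additive noise.
n = np.arange(500)
clean = np.sin(2 * np.pi * n / 250)               # smooth underlying quantity
noisy = clean + 0.2 * rng.standard_normal(n.size)

def moving_average(x, M):
    """Causal M-point moving average: y[n] is the mean of x[n], ..., x[n-M+1]."""
    h = np.ones(M) / M                            # impulse response of the averager
    return np.convolve(x, h)[: x.size]            # keep the first len(x) output samples

for M in (2, 12, 100):
    y = moving_average(noisy, M)
    # Longer averages leave fewer fast wiggles (smaller sample-to-sample variation),
    # at the price of a longer delay with respect to the input.
    print(M, np.std(np.diff(y)))
```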

You will also see that with respect to the original signal,

the output has been delayed by a certain amount.

This is the price we pay because we are actually

using up 100 samples to compute the output value.

So in a sense we are accumulating a certain delay with respect to the input

because we're averaging from the past.

We will see this in more detail later on.
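
To put a number on that delay (a standard property of this kind of filter, added here only for reference): a causal M-point average delays the underlying smooth trend by roughly

$$\frac{M-1}{2}\ \text{samples},$$

so about 50 samples for the 100-point average.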

What we have done is really a filtering operation, and

if we want to compute the impulse response, we need to do nothing

more than just put the delta function inside our averaging machine.

It's very easy to see what happens if we put the delta function in here (this,

remember, is the averaging relationship that we saw before).

We have that the impulse response is equal to 1 over M for

n that goes from 0 to M minus 1, and 0 otherwise.
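
Spelled out, feeding the delta into the averaging relationship gives

$$h[n] = \frac{1}{M}\sum_{k=0}^{M-1}\delta[n-k] =
\begin{cases}\dfrac{1}{M} & 0 \le n \le M-1,\\[4pt] 0 & \text{otherwise.}\end{cases}$$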

It's easy to convince yourself that in this summation there

is only one non-zero term, which is the one where n is equal to k,

since the index k goes from 0 to M minus 1.

If your value of n is outside of this range, none of these terms will be non-zero,

and you will get 0.
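
A quick numerical check of this (again just a sketch assuming NumPy):

```python
import numpy as np

M = 12
delta = np.zeros(32)
delta[0] = 1.0                             # discrete-time delta function

h = np.ones(M) / M
y = np.convolve(delta, h)[: delta.size]    # push the delta through the M-point averager

# The response is 1/M for n = 0, ..., M-1 and 0 everywhere else.
expected = np.where(np.arange(delta.size) < M, 1.0 / M, 0.0)
assert np.allclose(y, expected)
```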

If we plot the impulse response, it looks like this.

So you will have a string of samples that are equal to 1 over M, from 0 to M minus 1.

So what we have learned experimentally is that the smoothing effect of the filter

is proportional to the length of the impulse response.

As a consequence, both the number of operations that we have to perform to

compute each output sample and the storage,

because we have to keep in the memory of our processing unit the past input values

in order to be able to compute the average,

are proportional to the length of the filter.
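
To make that cost concrete, here is a minimal per-sample (streaming) sketch of the averager in plain Python (the class name and structure are purely illustrative): the length-M buffer is the memory cost, and summing it for every output sample is the computational cost.

```python
from collections import deque

class MovingAverager:
    """Naive causal M-point moving average, one output sample per input sample."""

    def __init__(self, M):
        self.M = M
        # We must remember the last M input samples: storage grows with M.
        self.buffer = deque([0.0] * M, maxlen=M)

    def push(self, x):
        self.buffer.append(x)
        # Summing M stored values: the per-sample work also grows with M.
        return sum(self.buffer) / self.M

avg = MovingAverager(4)
print([avg.push(x) for x in [1.0, 1.0, 1.0, 1.0, 1.0]])
# -> [0.25, 0.5, 0.75, 1.0, 1.0]
```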

So as the filter becomes longer and as its smoothing effect increases,

the price we pay in terms of computational resources and memory is higher.

So we might not like that and we might want to see whether there's a smarter way

to achieve the same effect without using up so much in terms of computer power.

Let's look at the equation for the moving average:

it's just 1 over capital M, times the sum of input samples from

n down to n minus capital M plus 1. Now we can split this into two parts.

We have the first term, which is 1 over capital M times the current input,

plus 1 over capital M, times the sum of past input samples

from n minus 1 down to n minus capital M plus 1.

And this second sum has capital M minus 1 terms.

So the second term here looks suspiciously close to the moving average computed this

time over capital M minus 1 points, so 1 point less than before, and delayed by 1.
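
In formulas, the split described above (same notation as before) is

$$y[n] = \frac{1}{M}\sum_{k=0}^{M-1} x[n-k]
       = \frac{1}{M}\,x[n] + \frac{1}{M}\sum_{k=1}^{M-1} x[n-k],$$

where the second term is, up to the factor 1/M instead of 1/(M-1), the (M-1)-point moving average evaluated at n-1.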