Strided convolutions are another piece of

the basic building blocks of convolutions as used in convolutional neural networks.

Let me show you an example.

Let's say you want to convolve this seven by seven image with this three by three filter,

except that instead of doing the usual way,

we are going to do it with a stride of two.

What that means is you take the element-wise product as usual in this upper

left three by three region, then multiply and add, and that gives you 91.

But then instead of stepping the blue box over by one step,

we are going to step over by two steps.

So, we are going to make it hop over two steps like so.

Notice how the upper left hand corner has gone from this start to this start,

jumping over one position.

And then you do the usual element-wise product and summing, and it turns out to be 100.

And now we are going to do that again,

and make the blue box jump over by two steps.

You end up there, and that gives you 83.

Now, when you go to the next row,

you again actually take two steps instead of

one step, so we're going to move the blue box over there.

Notice how we are stepping over one of the positions and then this gives you 69,

and now you again step over two steps,

this gives you 91, and so on, so 127.

And then for the final row 44, 72, and 74.

In this example, we convolve a seven by seven matrix

with this three by three matrix and we get a three by three output.
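The hop-by-two procedure just described can be sketched in a few lines of NumPy. This is only an illustration: the function name `strided_conv2d` is made up here, and the image and filter values are placeholders, since the numbers on the slide (91, 100, 83 and so on) cannot be recovered from the transcript.

```python
import numpy as np

def strided_conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, hopping `stride` positions at a time.

    No padding and no filter flipping; a window is used only if it
    fits entirely inside the image.
    """
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    # Top-left corners of every window that fits inside the image.
    tops = range(0, n_h - f_h + 1, stride)
    lefts = range(0, n_w - f_w + 1, stride)
    return np.array([
        [np.sum(image[t:t + f_h, l:l + f_w] * kernel) for l in lefts]
        for t in tops
    ])

# Placeholder data (hypothetical, not the values from the slide).
image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3))

result = strided_conv2d(image, kernel, stride=2)
print(result.shape)  # (3, 3)
```

With a stride of two, the top-left corner of the window lands on rows and columns 0, 2 and 4, which is why a seven by seven input gives a three by three output.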

The input and output dimensions turn out to be governed by the following formula:

if you have an N by N image

that you convolve with an F by F filter,

and if you use padding P and stride S (in this example,

S is equal to two), then you end up with an output that is N plus two P minus F,

and now, because you're stepping S steps at a time

instead of just one step at a time,

you divide by S and add one, and then the other dimension is the same thing.

In our example, we have seven plus zero, minus three,

divided by two, the stride, plus one, which equals, let's see,

that's four over two plus one equals three,

which is why we wound up with this three by three output.

Now, just one last detail, which is: what if this fraction is not an integer?

In that case, we're going to round this

down, so this notation denotes the floor of something.

This is also called the floor of Z.

It means taking Z and rounding down to the nearest integer.

The way this is implemented is that you take

this type of blue box multiplication only if the blue box is fully contained

within the image or the image plus the padding, and if

any part of this blue box hangs

outside, then you just do not do that computation.

Then it turns out that if that's the convention, that your three by three filter

must lie entirely within your image or the image

plus the padding region before there's

a corresponding output generated,

then the right thing to do to compute the output dimension is

to round down in case this N plus two P minus F over S is not an integer.

Just to summarize the dimensions,

if you have an N by N matrix or N by N image that you convolve

with an F by F matrix or F by F filter with padding P and stride S,

then the output size will have this dimension.

It is nice when you can choose all of these numbers so that the result is an integer,

although you don't have to do that, and rounding down is just fine as well.

But please feel free to work through a few examples of values of N, F,

P and S for yourself to convince yourself, if you want,

that this formula is correct for the output size.
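One way to work through such examples is a small Python check. This is a sketch with made-up helper names (`conv_output_size` and `count_positions` are not from the lecture): the first applies the formula, floor of N plus two P minus F over S, plus one, and the second simply counts the positions where the filter fits entirely inside the padded image.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length from the formula: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

def count_positions(n, f, p=0, s=1):
    """Count top-left positions where the f-wide filter fits in the padded image."""
    return len(range(0, n + 2 * p - f + 1, s))

# A few (n, f, p, s) combinations; the first is the example from this video.
for n, f, p, s in [(7, 3, 0, 2), (6, 3, 0, 1), (8, 3, 0, 2), (7, 3, 1, 2)]:
    size = conv_output_size(n, f, p, s)
    assert size == count_positions(n, f, p, s)
    print(f"n={n}, f={f}, p={p}, s={s} -> {size} by {size}")
```

The third case, N equals eight, is one where the fraction is five over two, so the rounding down actually matters.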

Now, before moving on, there is a technical comment I want to make about

cross-correlation versus convolution, and this won't affect

what you have to do to implement convolutional neural networks.

If you read a different math textbook or signal processing textbook,

there is one other possible inconsistency in the notation, which is that,

if you look at the typical math textbook,

the way that the convolution is defined, before doing the element-wise product and summing,

there's actually one other step that you'll first take, which

is that, to convolve this six by six matrix with this three by three filter,

you first take the three by three filter and flip it on

the horizontal as well as the vertical axis, so this 3, 4, 5, 1, 0, 2, minus 1, 9, 7

will become, three goes here, four goes there,

five goes there, and then the second row

becomes this, 1, 0, 2, minus 1, 9, 7.

Well, this is really taking the three by three filter and mirroring

it both on the vertical and horizontal axes.

And then it is this flipped matrix that you would then copy over here.

To compute the output,

you will take two times seven,

plus three times two,

plus seven times five and so on.

You multiply out the elements of this flipped matrix in order to

compute the upper left-hand element of the four by four output as follows.

Then you take those nine numbers

and shift them over by one, shift them over by one again, and so on.

The way we've defined the convolution operation in

this video is that we've skipped this mirroring operation.

Technically, what we're actually doing,

the operation we've been using for the last few videos,

is sometimes called cross-correlation instead of convolution.

But in the deep learning literature by convention,

we just call this a convolutional operation.
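To make the distinction concrete, here is a minimal NumPy sketch. The function names and the tiny image and filter are made up for illustration and are not the values from the slide. The only difference between the two functions is the flip on both axes before sliding.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """What these videos call convolution: slide the filter as-is."""
    f_h, f_w = kernel.shape
    out_h = image.shape[0] - f_h + 1
    out_w = image.shape[1] - f_w + 1
    return np.array([
        [np.sum(image[i:i + f_h, j:j + f_w] * kernel) for j in range(out_w)]
        for i in range(out_h)
    ])

def convolve2d_textbook(image, kernel):
    """Math-textbook convolution: flip the filter on both axes first."""
    return cross_correlate2d(image, np.flip(kernel))

# Hypothetical example: a filter that picks out one corner of each window.
image = np.arange(16).reshape(4, 4)
kernel = np.array([[1, 0],
                   [0, 0]])

print(cross_correlate2d(image, kernel)[0, 0])    # 0, top-left of the first window
print(convolve2d_textbook(image, kernel)[0, 0])  # 5, bottom-right of the first window
```

For a filter that is symmetric under this flip, the two operations give the same output, which is one reason the distinction rarely matters in practice.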

Just to summarize, by convention in machine learning,

we usually do not bother with this flipping operation, and technically,

this operation is maybe better called cross-correlation, but most of

the deep learning literature just calls it the convolution operator.

And so I'm going to use that convention in these videos as well,

and if you read a lot of the machine learning literature,

you'll find most people just call this

the convolution operator without bothering to use these flips.

It turns out that in signal processing or in certain branches of mathematics,

doing the flipping in the definition of

convolution causes the convolution operator to enjoy this property that (A convolved with B)

convolved with C is equal to A convolved with

(B convolved with C), and this is called associativity in mathematics.

This is nice for some signal processing applications but

for deep neural networks it really doesn't matter and so omitting

this double mirroring operation just simplifies

the code and makes the neural networks work just as well.
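If you want to see the associativity property for yourself, here is a small check. It is a sketch under some assumptions: it uses one-dimensional signals rather than images, "full" mode so that no outputs are dropped at the edges, and arbitrary example values. `np.convolve` does the flip, and `np.correlate` does not.

```python
import numpy as np

# Arbitrary example signals; c is deliberately not symmetric.
a = np.array([1, 2, 3])
b = np.array([4, 5])
c = np.array([6, 7, 8])

# With the flip (true convolution), the grouping does not matter.
conv_left = np.convolve(np.convolve(a, b), c)
conv_right = np.convolve(a, np.convolve(b, c))
print(np.array_equal(conv_left, conv_right))  # True

# Without the flip (cross-correlation), the grouping changes the result.
corr_left = np.correlate(np.correlate(a, b, mode="full"), c, mode="full")
corr_right = np.correlate(a, np.correlate(b, c, mode="full"), mode="full")
print(np.array_equal(corr_left, corr_right))  # False
```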

And by convention, most of us just call this convolution,

even though the mathematicians prefer to call this cross-correlation sometimes.

But this should not affect anything you have to implement in

the problem exercises and should not

affect your ability to read and understand the deep learning literature.

You've now seen how to carry out convolutions and you've

seen how to use padding as well as strides for convolutions.

But so far, all we've been using is convolutions over matrices,

like over a six by six matrix.

In the next video, you'll see how to carry out convolutions over volumes,

and this will make what you can do with convolutions suddenly much more powerful.

Let's go on to the next video.