0:00

You've already seen most of the components of object detection. In this video, let's put all the components together to form the YOLO object detection algorithm.

First, let's see how you construct your training set. Suppose you're trying to train an algorithm to detect three objects: pedestrians, cars, and motorcycles. You will also need to explicitly have the full background class, so those are your class labels. If you're using two anchor boxes, then the output y will be 3 by 3, because you're using a 3 by 3 grid, by 2, the number of anchors, by 8, because that's the dimension of each anchor's part of the output. That 8 is actually 5 plus the number of classes: 5 because you have pc plus the four bounding-box values, and then c1, c2, c3, whose dimension is equal to the number of classes. You can view this either as 3 by 3 by 2 by 8, or as 3 by 3 by 16.
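As a small sketch of that dimension arithmetic (the variable names are my own, not from the lecture), the two views of the target volume are just a reshape of one another:

```python
import numpy as np

# Sizes matching the lecture's example: a 3x3 grid,
# 2 anchor boxes, and 3 classes (pedestrian, car, motorcycle).
GRID = 3
ANCHORS = 2
CLASSES = 3
# Per anchor: [pc, bx, by, bh, bw, c1, c2, c3] -> 5 + CLASSES = 8 values.
DEPTH = 5 + CLASSES

# The target volume can be viewed either way:
y_4d = np.zeros((GRID, GRID, ANCHORS, DEPTH))     # 3 x 3 x 2 x 8
y_3d = y_4d.reshape(GRID, GRID, ANCHORS * DEPTH)  # 3 x 3 x 16
```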

To construct the training set, you go through each of the nine grid cells and form the appropriate target vector y. Take the first grid cell: there's nothing worth detecting there; none of the three classes, pedestrian, car, or motorcycle, appears in the upper-left grid cell. So the target y corresponding to that grid cell has pc equal to 0 for the first anchor box, because there's nothing associated with the first anchor box, and pc is also 0 for the second anchor box, and all of the remaining values are "don't cares". Now, most of the grid cells have nothing in them, but for that box over there you would have this target vector y.

Suppose your training set has a bounding box like this for the car; it's just a little bit wider than it is tall. If your anchor boxes are such that this is anchor box 1 and this is anchor box 2, then the red box has slightly higher IoU with anchor box 2, so the car gets associated with the lower portion of the vector. Notice then that the pc associated with anchor box 1 is zero, so you have don't-cares for all of those components; then this pc is equal to 1, you use these values to specify the position of the red bounding box, and you specify that the correct object is class 2, a car.
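This anchor assignment can be sketched as follows; the helper names and the (x1, y1, x2, y2) box format are my own assumptions, not from the lecture:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def best_anchor(gt_box, anchor_boxes):
    """Index of the anchor with the highest IoU against the ground-truth box."""
    return max(range(len(anchor_boxes)), key=lambda i: iou(gt_box, anchor_boxes[i]))

# A wide car box matches the wide anchor (index 1) rather than the tall one,
# so the car's labels go in that anchor's slice of the target vector.
anchors = [(0, 0, 1, 2), (0, 0, 2, 1)]  # tall anchor, wide anchor
car = (0, 0, 1.8, 1.0)
# best_anchor(car, anchors) -> 1
```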

You then go through this process for each of your nine grid positions, each of your 3 by 3 grid positions, and come up with a 16-dimensional vector like this for each one. That's why the final output volume is going to be 3 by 3 by 16. Oh, and as usual, for simplicity on this slide I've used a 3 by 3 grid; in practice it might be more like 19 by 19 by 16, or in fact, if you use more anchor boxes, maybe 19 by 19 by 5 times 8. 5 times 8 is 40, so that would be 19 by 19 by 40 if you use five anchor boxes. So that's training, and you train a ConvNet that inputs an image, maybe 100 by 100 by 3, and your ConvNet then finally outputs this output volume, in our example 3 by 3 by 16, or 3 by 3 by 2 by 8.
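The shapes quoted above all come from the same formula; a one-line sketch (the function name is my own):

```python
def yolo_output_shape(grid, anchors, classes):
    """Shape of the YOLO output volume: grid x grid x anchors * (5 + classes)."""
    return (grid, grid, anchors * (5 + classes))

# The lecture's two examples, both with 3 classes:
#   3x3 grid with 2 anchors  -> (3, 3, 16)
#   19x19 grid with 5 anchors -> (19, 19, 40)
```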

Next, let's look at how your algorithm can make predictions. Given an image, your neural network will output this 3 by 3 by 2 by 8 volume, where for each of the nine grid cells you get a vector like that. For the grid cell here on the upper left, if there's no object there, hopefully your neural network will output 0 here and 0 here, and it will output some values for the rest; your neural network can't output a question mark or a "don't care", so it will put out some numbers for the rest. But you'll basically ignore those numbers, because the network is telling you that there's no object there, so it doesn't really matter what it outputs for the bounding boxes in that cell; they'll just be some set of numbers, more or less noise. In contrast, for this box over here, hopefully the value of y, the output for that box at the bottom left, would be something like 0 for bounding box 1, followed by a bunch of numbers that are just noise, and then for bounding box 2 hopefully you'd output a set of numbers that corresponds to specifying a pretty accurate bounding box for the car. That's how the neural network makes predictions.
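A rough sketch of reading one grid cell's output this way, keeping only anchors whose pc is high enough (the function name, layout constants, and threshold are my own assumptions):

```python
import numpy as np

def decode_cell(cell_vec, num_anchors=2, num_classes=3, pc_threshold=0.5):
    """Split one grid cell's 16-dim output into per-anchor predictions and
    keep only the anchors whose object probability pc clears the threshold."""
    depth = 5 + num_classes
    preds = []
    for a in range(num_anchors):
        v = cell_vec[a * depth:(a + 1) * depth]
        pc, bx, by, bh, bw = v[:5]
        if pc >= pc_threshold:  # otherwise the rest is noise / don't-care
            class_id = int(np.argmax(v[5:]))
            preds.append((pc, (bx, by, bh, bw), class_id))
    return preds

# Anchor 1 says "no object" (pc = 0); anchor 2 detects a car (class index 1),
# so only the second anchor's prediction is kept.
cell = np.array([0.0, 0, 0, 0, 0, 0, 0, 0,
                 0.9, 0.5, 0.7, 0.3, 0.6, 0.1, 0.8, 0.1])
```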

Finally, you run the predictions through non-max suppression. Just to make it interesting, let's look at a new test-set image. Here's how you run non-max suppression. If you're using two anchor boxes, then for each of the nine grid cells you get two predicted bounding boxes. Some of them will have very low probability, very low pc, but you still get two predicted bounding boxes for each of the nine grid cells. Let's say those are the bounding boxes you get; notice that some of the bounding boxes can go outside the height and width of the grid cell they came from. Next, you get rid of the low-probability predictions, the ones where even the neural network says, gee, this object probably isn't there. Then, finally, if you have three classes you're trying to detect, pedestrians, cars, and motorcycles, what you do is, for each of the three classes independently, run non-max suppression on the objects that were predicted to come from that class: run non-max suppression for predictions of the pedestrian class, run it for the car class, and run it for the motorcycle class, so you run it a total of three times to generate the final predictions. The output of this is that hopefully you will have detected all the cars and all the pedestrians in this image.
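The per-class procedure above can be sketched as follows; the helper names and the (score, box, class_id) detection format are my own, not from the lecture:

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def non_max_suppression(dets, iou_threshold=0.5):
    """Greedy NMS over (score, box, class_id) triples of a single class:
    keep the highest-scoring box, discard boxes overlapping it too much."""
    dets = sorted(dets, key=lambda d: d[0], reverse=True)
    kept = []
    for cand in dets:
        if all(iou(cand[1], k[1]) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept

def per_class_nms(detections, num_classes, iou_threshold=0.5):
    """Run NMS independently once per class, as the lecture describes."""
    final = []
    for c in range(num_classes):
        final.extend(non_max_suppression(
            [d for d in detections if d[2] == c], iou_threshold))
    return final
```

Running NMS separately per class matters: a car box and a pedestrian box that happen to overlap should never suppress one another.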

So that's it for the YOLO object detection algorithm, which really is one of the most effective object detection algorithms, one that encompasses many of the best ideas across the entire computer vision literature that relate to object detection. You get a chance to practice implementing many components of this yourself in this week's programming exercise, so I hope you enjoy this week's programming exercise. There's also an optional video that follows this one, which you can either watch or not watch as you please; but either way, I also look forward to seeing you next week.
