Let's talk about neural networks,
also called neural nets;
deep learning is essentially a synonym, the way the term is used nowadays.
I'm not going to say anything about the biological inspiration,
synapses, brains, and all that.
I'm also not going to talk much about the maths or any of the deeper theory.
Just after this video,
you'll find a text file with some suggested reading links,
if you want to know more. Yes, I've got a diagram.
Lots of circles. The circles on the left are the input neurons.
You input your problem.
And remember, everything is a number in a neural net.
So, each of those neurons contains a floating-point number.
Now, by default, H2O will normalize and standardize your inputs,
which means each input will be scaled to between minus one and plus one,
and it will also do one-hot encoding for you,
which means categorical data will just work and you don't have to worry about it.
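To make that concrete, here's a minimal sketch using H2O's Python API; the file name and column names are made up purely for illustration:

    import h2o
    from h2o.estimators.deeplearning import H2ODeepLearningEstimator

    h2o.init()

    # Hypothetical CSV with a mix of numeric and categorical columns.
    data = h2o.import_file("my_data.csv")

    # standardize=True is the default, so numeric inputs are rescaled,
    # and enum (categorical) columns are one-hot encoded automatically.
    model = H2ODeepLearningEstimator(hidden=[50, 50])
    model.train(x=["age", "color", "weight"], y="label", training_frame=data)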
The circles on the right are the outputs. We'll come back to them in the middle.
The circles in the middle are the hidden neurons, the hidden layer.
Let's zoom in on one of them.
You can see we've got lots of arrows coming in;
those are the weighted inputs from every neuron in the previous layer.
If it's the first hidden layer,
that's your input data coming in.
If it's the second hidden layer,
that is the output of the first hidden layer and so on.
On the right side, we have arrows going out,
and those are connections to every neuron in the next layer,
or if it's the final hidden layer,
connections to the output layer.
And there's an arrow at the bottom which is called the bias.
I'm not going to go into it much,
just be aware that it's there.
And again, look on Wikipedia or
any of the further reading links to learn more about what it is for.
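If you want to see the arithmetic one of those circles is doing, here's a plain-Python sketch; the inputs, weights, and bias here are just invented numbers:

    import math

    def neuron_output(inputs, weights, bias):
        # Weighted sum of everything arriving from the previous layer,
        # plus the bias term, then passed through an activation function.
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        return math.tanh(total)  # using tanh as the activation, for example

    print(neuron_output([0.5, -0.2, 0.1], [0.4, 0.3, -0.9], bias=0.1))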
So, let's go back to the output layer first.
We had those circles.
If you're doing a regression,
you'll have a single output neuron,
the number you're trying to predict,
nice, easy, and obvious.
If you're doing a classification,
you'll have one output neuron for each of the categories you're trying to predict,
and the number in each output neuron is its confidence,
and those numbers are scaled so they all sum to one.
So, if you're trying to predict the color, red, green,
or blue, and the net is fairly sure it's red,
you might get 0.9 in the red neuron,
0.05 in the green neuron,
and 0.05 in the blue neuron.
If it really has no idea what's going on,
you might get 0.4 in the red,
0.3 in the green, and 0.3 in the blue.
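That scaling is typically done with a softmax; here's a quick sketch of the idea, with invented raw scores:

    import math

    def softmax(scores):
        # Exponentiate each raw output, then divide by the total,
        # so everything is positive and sums to one.
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    print(softmax([2.0, -1.0, -1.0]))  # roughly [0.91, 0.045, 0.045]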
If you're doing a binomial classification,
you'll have either one or two neurons.
Sometimes, the network is set up so that you
have one neuron for "Yes" and one neuron for "No".
Even though one implies the other,
sometimes it gives a bit more accuracy.
That is really an internal implementation detail of H2O,
so don't worry about it.
Let's go back to the hidden neurons, that circle in the middle.
So far, all I've said is that the circle does something to
its inputs so it can send something on to the next layer.
What it does inside is what's called an activation function.
The first one I'm going to look at is tanh,
which some people call "tang".
In this graph, the x-axis represents the sum of the weighted inputs,
and you can see the output is always between minus one and plus one.
The next one I want to look at, and the one that's most popular nowadays, is rectifier.
This simply outputs the sum of all the weighted inputs, but clipped at zero.
So, if the inputs sum to zero or a negative number, it will output zero;
otherwise, it passes the sum straight through.
And the final one that H2O offers is called maxout,
and this really is the simplest of all.
It takes the maximum of any of its inputs and uses that as the output,
ignoring all the others.
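In plain Python, the three look roughly like this; note that maxout here follows the simple "max of the inputs" description above:

    import math

    def tanh_activation(weighted_inputs):
        return math.tanh(sum(weighted_inputs))  # always between -1 and +1

    def rectifier(weighted_inputs):
        return max(0.0, sum(weighted_inputs))   # negative sums become zero

    def maxout(weighted_inputs):
        return max(weighted_inputs)             # biggest single input wins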
Rectifier is the default,
and generally it's the one you want to go for, because it's a bit quicker than tanh.
But your choice of activation function isn't going to
be the most critical thing when you're tuning your network.
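In H2O, the choice is just a parameter; this continues the earlier sketch, reusing the made-up frame and columns from before:

    from h2o.estimators.deeplearning import H2ODeepLearningEstimator

    model = H2ODeepLearningEstimator(
        activation="Rectifier",  # the default; "Tanh" and "Maxout" also accepted
        hidden=[50, 50],
    )
    model.train(x=["age", "color", "weight"], y="label", training_frame=data)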
That's just about all the theory that I'm going to go into.
If you want to know more details,
go through some of the suggested reading.
There's a lot out there going into the
mathematical details of implementing backpropagation and all these activation functions.
H2O takes care of it.
So, for this course, it doesn't matter.
You don't actually need to know how deep learning works to use it effectively.
After that, the remaining videos this week are going to look at how
to use deep learning on a variety of different problems.