0:08

From your previous exposure to calculus,

you probably learned how to compute derivatives quickly

without the definition by following a certain set of rules or laws.

We're going to cover those rules in this lesson.

Such a topic has the potential to be boring, but

pay attention, because we're going to cover why the rules hold, and

in doing so, we'll get a better grasp of the notion of a derivative.

0:39

Recall from our previous lesson that we had two

different ways of defining the derivative.

One is in terms of a rate of change: it's the limit, as the input x approaches a,

of the change in the output over the change in the input.
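To see the first definition in action, here is a minimal Python sketch, not part of the lecture, that computes the change in the output over the change in the input for shrinking steps h. The function name `difference_quotient` and the choice f(x) = x^2 at a = 3 are illustrative assumptions. For this f the quotient is exactly 6 + h, so the values should approach 6.

```python
def difference_quotient(f, a, h):
    """Change in the output divided by the change h in the input, at a."""
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


# For f(x) = x^2 at a = 3 the quotient equals 6 + h, so it should approach 6.
steps = [0.1, 0.01, 0.001, 0.0001]
quotients = [difference_quotient(square, 3.0, h) for h in steps]
for h, q in zip(steps, quotients):
    print(f"h = {h:<8} quotient = {q:.6f}")
```

Pushing h much smaller eventually lets floating-point rounding dominate, so a numerical quotient only approximates the limit.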

1:01

Our second definition, a slightly stronger one,

is in terms of first-order variation.

One changes the input to f by a small amount, h, and

looks at how the change in the output depends upon h.

If you like, the derivative is the coefficient of the first-order term

in the Taylor series of f at a.
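One way to check the first-order-variation definition numerically is to strip off the zeroth- and first-order terms and confirm that what remains shrinks like h squared. This sketch assumes f = exp at a = 0, so that f(a) = f'(a) = 1; the helper name `remainder` is hypothetical. Dividing the remainder by h^2 should give values that stay bounded, here approaching 1/2, the next Taylor coefficient.

```python
import math


def remainder(f, df, a, h):
    """f(a + h) minus its zeroth- and first-order terms f(a) + f'(a) h."""
    return f(a + h) - (f(a) + df(a) * h)


# For f = exp at a = 0 (exp is its own derivative), remainder / h^2 stays bounded.
steps = [0.1, 0.01, 0.001]
ratios = [remainder(math.exp, math.exp, 0.0, h) / h**2 for h in steps]
for h, r in zip(steps, ratios):
    print(f"h = {h:<6} remainder / h^2 = {r:.5f}")
```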

1:30

From these definitions flows the interpretation of the derivative

as the rate of change of the output with respect to a change in the input.

And this is the interpretation that you will want to have memorized.

A few remarks are in order.

First of all, the derivative, as you know, most certainly depends

upon the input value a that you are examining.

2:02

Secondly, a derivative concerns rates of change.

We're not actually measuring how much the output changes.

We are looking at the rate of change as the change in

the input gets closer and closer to zero.

Lastly, because the derivative at a is a rate of change,

it tells you how fluctuations, or changes h,

in the input are amplified,

if the derivative is larger than 1;

or damped, if the derivative is between 0 and 1;

or reversed, in the case where the derivative is negative.
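A rough numerical illustration of these three cases: the sketch below, with example functions and points chosen purely for illustration, measures how much a tiny input change h is scaled in the output. The helper `response` is hypothetical. The expected ratios are f'(2) = 4 for x^2 (amplified), f'(4) = 1/4 for the square root (damped), and f'(pi/2) = -1 for cosine (reversed).

```python
import math


def response(f, a, h=1e-6):
    """Ratio of the output change to a small input change h at the point a."""
    return (f(a + h) - f(a)) / h


cases = [
    ("amplified", lambda x: x**2, 2.0),     # f'(2) = 4
    ("damped", math.sqrt, 4.0),             # f'(4) = 1/4
    ("reversed", math.cos, math.pi / 2),    # f'(pi/2) = -1
]
for label, f, a in cases:
    print(f"{label:<9} output change / input change = {response(f, a):.4f}")
```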

3:27

The first rule is that of linearity.

This consists of two parts.

First, what you might call a summation rule: the derivative of

the function u plus v is the derivative of u plus the derivative of v.

The second part of linearity says that if we multiply

u by a constant c and take its derivative,

what we get is that constant c times the derivative of u.

4:04

The second rule, the product rule,

states that the derivative of the product of two functions, u and v, is

u times the derivative of v plus v times the derivative of u.
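As a quick numerical sanity check of the product rule (not a proof), the sketch below compares a difference-quotient estimate of the derivative of u times v with u times v' plus v times u'. It assumes the example choices u = sin and v = exp at x = 0.7, and it uses a symmetric (central) difference quotient, a standard numerical approximation rather than the lecture's definition, because its error shrinks like h^2.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x); its error is O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 0.7
u, du = math.sin, math.cos   # u = sin, u' = cos
v, dv = math.exp, math.exp   # v = exp, v' = exp

numeric = derivative(lambda t: u(t) * v(t), x)
product_rule = u(x) * dv(x) + v(x) * du(x)   # u v' + v u'
print(numeric, product_rule)
```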

4:20

Lastly, and perhaps most importantly, the chain rule states that the derivative

of a composition of two functions is the product of the individual derivatives.

We'll have some more to say about that in a moment.
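For reference, here is one way to write the three rules in symbols. The prime notation u', v' is our own choice; the last line spells out the inputs at which the chain-rule factors are evaluated, a point the lesson returns to below.

```latex
\begin{aligned}
\frac{d}{dx}(u + v) &= \frac{du}{dx} + \frac{dv}{dx},
  \qquad \frac{d}{dx}(c\,u) = c\,\frac{du}{dx} && \text{(linearity)} \\
\frac{d}{dx}(u\,v) &= u\,\frac{dv}{dx} + v\,\frac{du}{dx} && \text{(product rule)} \\
\frac{d}{dx}\,u\big(v(x)\big) &= u'\big(v(x)\big)\,v'(x) = \frac{du}{dv}\,\frac{dv}{dx} && \text{(chain rule)}
\end{aligned}
```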

4:38

For now, let's focus on the rules of linearity.

The summation rule can be visualized rather simply:

if you keep track of how u changes and how v changes,

then keeping track of how u plus v changes is really not that difficult.

Likewise, when you multiply u by a constant,

what happens to its rate of change?

It is scaled by that same constant.

That picture is at least reasonable.

There's another way to think about these rules, as well.

This is how I often think about them.

5:21

If you take the sum of the two functions and then differentiate,

it's the same as differentiating the pieces and then adding them together.

Likewise, you can multiply by a constant and then differentiate,

or you can differentiate and then multiply by a constant.

Whichever path you travel, you will get to the same place.
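The "whichever path you travel" picture can be checked numerically. This sketch assumes the example functions u = sin and v = exp, the constant c = 3, and the point x = 0.7, and it uses a central difference quotient as a numerical stand-in for the exact derivative. Both paths should agree up to a tiny rounding error, and both should match cos(x) + e^x for the sum.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


u, v, c, x = math.sin, math.exp, 3.0, 0.7

# Path 1: combine first (add, or scale by c), then differentiate.
sum_then_diff = derivative(lambda t: u(t) + v(t), x)
scale_then_diff = derivative(lambda t: c * u(t), x)

# Path 2: differentiate first, then combine.
diff_then_sum = derivative(u, x) + derivative(v, x)
diff_then_scale = c * derivative(u, x)

print(sum_then_diff, diff_then_sum)
print(scale_then_diff, diff_then_scale)
```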

5:48

These pictures illuminate but do not really justify the differentiation rules.

To give a better justification, let's use our definition of the derivative

as a first-order variation and employ the language of big-O.

First of all, what does u + v evaluated at x + h mean?

It means we take u, evaluate it at x + h,

and add to it v evaluated at x + h.

Now, by our definition of the derivative of u,

we know that the term on the left is u plus the derivative

of u times h plus something in big-O of h squared.

Likewise with v.

6:50

What are the first order terms, those that have an h?

Well, on the left, we have du/dx times h, and

we add to it the term from the right, dv/dx times h.

Everything else is something in big-O of h squared

plus something in big-O of h squared,

which is, naturally, in big-O of h squared.

So we see that the first order coefficient

is the sum of the derivatives, as expected.

Likewise, when you multiply u by a constant c

and evaluate that at x + h, you get simply c times

the quantity u + du/dx times h + something in big-O of h squared.

The zeroth-order term is c times u.

The first-order term is c times du/dx times h.

Everything else is a constant times something in big-O of h squared.

But remember, in big-O constants don't count,

and so we see that the first-order coefficient, the derivative, is c times du/dx.
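Written out as equations, the two linearity arguments look like this. This is one possible typesetting; on the right-hand side, u, v and their derivatives are all evaluated at x.

```latex
\begin{aligned}
(u + v)(x + h) &= u(x + h) + v(x + h) \\
  &= \Big(u + \frac{du}{dx}\,h + O(h^2)\Big) + \Big(v + \frac{dv}{dx}\,h + O(h^2)\Big) \\
  &= (u + v) + \Big(\frac{du}{dx} + \frac{dv}{dx}\Big)h + O(h^2), \\[6pt]
(c\,u)(x + h) &= c\Big(u + \frac{du}{dx}\,h + O(h^2)\Big)
  = c\,u + c\,\frac{du}{dx}\,h + O(h^2).
\end{aligned}
```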

8:07

Likewise for the product rule, it's not hard to visualize

the first-order variation of u times v in terms of du and dv.

And writing this out in our language of big-O is very simple.

If we take u times v and evaluate it at x + h,

that is u at x + h times v at x + h.

We can substitute in the expansions for u and

v and perform this multiplication.

If we multiply these terms as if they were polynomials, what do we get?

The zeroth-order term is simply u times v.

8:55

The terms of first order

in h have coefficient u times dv/dx + v times du/dx.

Everything else in that multiplication, as you can check,

is going to be in big-O of h squared.

So the derivative can be read off from the first-order term, and

we get the product rule.
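The same multiplication, written out term by term (u, v and their derivatives evaluated at x). In the third line, the O(h^2) collects every cross term that involves one of the O(h^2) factors.

```latex
\begin{aligned}
(u\,v)(x + h) &= u(x + h)\,v(x + h) \\
  &= \Big(u + \frac{du}{dx}\,h + O(h^2)\Big)\Big(v + \frac{dv}{dx}\,h + O(h^2)\Big) \\
  &= u\,v + \Big(u\,\frac{dv}{dx} + v\,\frac{du}{dx}\Big)h
     + \frac{du}{dx}\,\frac{dv}{dx}\,h^2 + O(h^2) \\
  &= u\,v + \Big(u\,\frac{dv}{dx} + v\,\frac{du}{dx}\Big)h + O(h^2).
\end{aligned}
```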

9:52

Now, what do the rates of change, what does the derivative,

look like in the case of a composition, u composed with v?

Well, dv/dx tells you at what rate the output of v

changes with respect to a change in its input, and du/dv tells you

at what rate the output of u changes with respect to a change in its input.

What is the derivative of the composition?

Well, it tells you this:

when you change the input to v,

at what rate does the final output of u composed with v change?

And the answer, as one can intuit, is that these rates of change multiply.
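A numerical check that the rates multiply, assuming the example choices u = sin, v(x) = x^3 and x = 1.2, so that the exact derivative is cos(x^3) times 3x^2. The outer rate is estimated at v(x), the input that u actually receives. The central difference quotient and the helper names `compose` and `cube` are our own numerical stand-ins, not notation from the lecture.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def compose(u, v):
    """Return the function x -> u(v(x))."""
    return lambda x: u(v(x))


def cube(t):
    return t**3


u, v, x = math.sin, cube, 1.2

rate_of_composition = derivative(compose(u, v), x)
# Outer rate, taken at the inner output v(x), times the inner rate at x.
product_of_rates = derivative(u, v(x)) * derivative(v, x)
exact = math.cos(x**3) * 3 * x**2
print(rate_of_composition, product_of_rates, exact)
```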

10:44

The justification of the chain rule is going to follow the same ideas but

with a bit more manipulation involved.

If we consider u composed with v and

evaluate it at x + h, then,

clearly, our first step should be to expand out v using

what we know about the derivative of v with respect to x.

Everything else in the variation is in big-O of h squared.

11:32

Now we're evaluating u, not at v, but at v plus some perturbation,

some variation term which we know has the structure

of dv/dx times h plus something in big-O of h squared.

Forget about that structure for the moment.

We're going to expand this out.

The zeroth order term is u(v).

Next comes the derivative of u with respect to v times this perturbation term

12:11

plus something in big-O of that perturbation term squared.

And now here come the final steps.

If we look at that big-O on the right and

view it as a function of h, then, squaring everything in sight,

we see that every term has a power of h that is at least 2.

And so we can replace that by a big-O of h squared.

Now for the terms in the middle, when we distribute the multiplication

by du/dv over all of the terms in the parentheses,

the latter term, du/dv times something

in big-O of h squared, yields again something in big-O of h squared.

And so, combining those big-O's of h squared,

we are left with a first-order term in h that

has coefficient du/dv times dv/dx.

And that is the proper form of the chain rule.
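The chain-rule computation can be summarized in one display. Here the symbol epsilon is a name we introduce for the perturbation of v; it is not notation from the lecture. The factor du/dv is evaluated at v(x) and dv/dx at x. Since epsilon is in big-O of h, anything in big-O of epsilon squared is in big-O of h squared.

```latex
\begin{aligned}
(u \circ v)(x + h) &= u\big(v(x + h)\big) = u(v + \varepsilon),
  \qquad \varepsilon = \frac{dv}{dx}\,h + O(h^2) \\
  &= u(v) + \frac{du}{dv}\,\varepsilon + O(\varepsilon^2) \\
  &= u(v) + \frac{du}{dv}\,\frac{dv}{dx}\,h + \frac{du}{dv}\,O(h^2) + O(h^2) \\
  &= u(v) + \frac{du}{dv}\,\frac{dv}{dx}\,h + O(h^2).
\end{aligned}
```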

13:24

Now one has to be careful.

The differential notation is especially misleading at this point.

You must evaluate your derivatives at the correct inputs.

Let's look at a simple example.

Let's say that f(x) is the exponential function e to the x.

What is the derivative of f composed with itself?

Well, according to the chain rule,

it should be df/dx times df/dx.

Well, that's not exactly what it is,

because we have to be careful about the inputs.

If we were to evaluate this derivative at, say, x = 0,

then the second factor must be evaluated at x = 0.

The first factor in this product is evaluated

not at x = 0, but at f(0) = e to the 0, which is 1.

That gives us a value of e times 1, which is, of course, e.

Now this looks awfully confusing.

What's really going on?

Well, you know how to differentiate functions, and

you know how to use the chain rule.

Let's think about what f composed with f is.

It's really e to the e to the x.

And if I were to ask you how to differentiate that, you would say, well,

first, I differentiate the outer exponential and evaluate that at e to the x.

That gives me e to the e to the x, but by the chain rule,

I have to multiply by the derivative of the exponent, in this case, e to the x.

Now if you evaluate both of these factors at x = 0,

you get e to the 1 times e to the 0: the same computation as above, with the final value of e.

Be careful about where you evaluate your derivatives.
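To see the evaluation-point issue concretely, this sketch compares the chain rule with the outer factor evaluated at f(0) = 1 against the mistaken version that evaluates both factors at 0, and against a central-difference estimate of the derivative of e to the e to the x at 0. The names are our own. Only the correctly evaluated product should agree with the numerical estimate, e (about 2.71828).

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


f = math.exp
df = math.exp  # the exponential is its own derivative


def f_of_f(x):
    """f composed with itself: e to the e to the x."""
    return f(f(x))


x = 0.0
correct = df(f(x)) * df(x)  # outer factor at f(0) = 1, inner at 0: e * 1
naive = df(x) * df(x)       # both factors at 0: 1 * 1, which is wrong
numeric = derivative(f_of_f, x)
print(correct, naive, numeric)
```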

15:36

There are a number of other differentiation rules that you may have

seen in your previous exposure to calculus: the reciprocal rule,

the quotient rule, the inverse rule.

If you remember seeing these, then take a brief look.

If you've not seen them before,

then you might want to work through the following examples,

such as showing that the derivative of secant is secant times tangent,

or that the derivative of tangent is really secant squared,

or that the derivative of log of x, the natural log, that is, the inverse

of the exponential function, is, in fact, 1/x.
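These exercises call for derivations, but a quick numerical check can catch a sign error or a missing factor. This sketch compares central-difference estimates with the three claimed formulas at the arbitrary point x = 0.4. The function `sec` is defined by hand because Python's `math` module has no secant function, and `math.log` with one argument is the natural logarithm.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def sec(x):
    """Secant, defined as 1 / cos(x)."""
    return 1.0 / math.cos(x)


x = 0.4
checks = {
    "sec' = sec * tan": (derivative(sec, x), sec(x) * math.tan(x)),
    "tan' = sec^2": (derivative(math.tan, x), sec(x) ** 2),
    "log' = 1/x": (derivative(math.log, x), 1.0 / x),
}
for name, (numeric, formula) in checks.items():
    print(f"{name:<17} numeric = {numeric:.8f}  formula = {formula:.8f}")
```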

I'll leave it to you to take a more careful look at some of these rules and

to practice in the homework sets.

16:27

And so we see the basic rules for differentiation.

But wait, there's more.

If you look at the bonus material, you'll see how some of the ghosts

of these rules haunt other realms of mathematics.

In our next lesson, we'll cover one of the canonical applications of

derivatives: linearization.