>> In this module, we are going to talk about nonlinear optimization. We will review unconstrained optimization, then briefly discuss constrained optimization, in particular Lagrangian relaxation and two of its applications: one to a utility maximization problem and another to a mean-variance portfolio selection problem. Let's start with unconstrained nonlinear optimization. What does unconstrained mean?
It means that I'm allowed to choose any vector I like to try to minimize a function. To keep everything general, we will assume that I've got a function f of x. And what is x? x is a vector in Rn, a vector with n different components. It's an unconstrained problem, so I can minimize the function f of x over the entire space. It turns out that for nonlinear optimization, we have to distinguish between two kinds of minima. Ordinarily, we just want to find a vector x at which f takes the minimum possible value.
So, here's a picture. I've got a one-dimensional function: here's my x, here's my function f of x. When I'm trying to minimize this function, all I need is this one point. This is my point x star, the minimum. It's the global minimum, the best point on the entire real line. But as we go through this lecture, we will see that in nonlinear programming we try to find these points by looking at derivatives: the first derivative, which gives me the gradient, or the second derivative, which gives me the Hessian matrix. Derivatives, unfortunately, can only look in the neighborhood of a point. They can't look far away, because derivatives are defined by taking limits of points coming arbitrarily close.
So, in order to work with derivatives, we need another notion: that of a local minimum. This point here is a local minimum. Why? Because if I just look in a neighborhood of this point, at points that live in this little interval around it, then within this local neighborhood the point is actually optimal. It is true that if I leave the neighborhood I can find better points; x star over here is the global optimum, and therefore it is better than, or at least as good as, this local optimum. But within this interval, I can't find a better point.
So, for nonlinear optimization, we have two different notions. The global optimum point is where we want to get to. Local optimum points, which are only locally optimal, are what we can actually find, because we use derivatives. Now we are going to describe the conditions we need for a local minimum. Remember, local means only in a neighborhood, so these criteria are given by taking derivatives. A point can be a local minimum only if the gradient at that point is 0. What's a gradient? It's a vector. Remember, this function f maps Rn to R, meaning it takes a vector x with n components. The gradient is simply the vector of partial derivatives: I take the partial derivative of f with respect to the first component, the partial derivative with respect to the second component, and so on up to the partial derivative with respect to the nth component, and stack them up as a vector. When I say the gradient is equal to 0, I mean that every component is equal to 0.
Here's a simple example. Let's take f(x) = 2x1^2 + 3x2. The partial of f with respect to the first component is 4x1. The partial of f with respect to the second component is equal to 3, because the x2 goes away. So, the gradient is (4x1, 3). When is it equal to 0? The first component is 0 exactly when x1 = 0, so at any local minimum we would need x1 = 0. But notice that the second component is always 3, so for this particular function the gradient never vanishes and there is no local minimum at all. And even at a point where the gradient does vanish, that by itself is not sufficient.
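If you want to check a computation like this yourself, here is a minimal sketch using SymPy; this is my own illustration, not something from the lecture:

```python
# Minimal SymPy sketch (my own check) of the gradient computation above.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 2*x1**2 + 3*x2

grad = [sp.diff(f, var) for var in (x1, x2)]
print(grad)                      # [4*x1, 3]

# The second component is the constant 3, so the gradient can never be
# the zero vector: this particular f has no stationary point.
print(sp.solve(grad, [x1, x2]))  # [] -- no solution
```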
We have to take the matrix of second derivatives, the Hessian. That matrix must be positive semi-definite at any local minimum, and if it is positive definite at a point where the gradient vanishes, you can be sure that point is a local minimum. How do I construct this matrix? I take the second partial derivatives and stack them up as a matrix. For the (1,1) entry, I take the partial derivative with respect to x1 and then another partial derivative with respect to x1. For the (1,2) entry, I take the partial derivative with respect to x1 and then a partial derivative with respect to x2. Just to recall: if a function is twice continuously differentiable, meaning you can take two derivatives, then it doesn't matter in which order you take them, whether first with respect to x1 and then with respect to x2, or vice versa. We'll see an example in a moment.
You stack them up in a matrix, and positive semi-definite means that matrix has all non-negative eigenvalues. We won't get into the details of eigenvalues because we don't explicitly need them, and they can be easily computed in MATLAB: just give the command eig and it will tell you the eigenvalues of the matrix. If any eigenvalue is negative, the point is not a local minimum; if all the eigenvalues are strictly positive, then it's definitely a local minimum.
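The same check is just as easy in Python; here is a small sketch of the idea using NumPy's eigvalsh, which plays the role that the lecture assigns to MATLAB's eig (the code itself is my own illustration):

```python
# Sketch of the eigenvalue test for a symmetric Hessian (my illustration).
import numpy as np

def classify(H, tol=1e-12):
    eigenvalues = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix
    if np.all(eigenvalues > tol):
        return "positive definite: definitely a local minimum"
    if np.all(eigenvalues >= -tol):
        return "positive semi-definite: could be a local minimum"
    return "has a negative eigenvalue: not a local minimum"

# The Hessian we will meet in the example below, evaluated at (0, 0):
print(classify(np.array([[2.0, 3.0], [3.0, 0.0]])))
```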
Now, a class of functions that is going to be useful is the class of convex functions, and whenever we use them in the course, we will remind you. The picture I want you to keep in mind is this: a function is convex if, whenever I take any two points on its graph and draw the straight line joining them, the straight line lies above the function. So, this is a convex function.
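In symbols, the picture says the following: f is convex if, for any two points x and y and any weight a between 0 and 1,

f(a x + (1 - a) y) <= a f(x) + (1 - a) f(y),

where the right-hand side is the chord between the two points and the left-hand side is the function beneath it.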
Here's a function that is not convex, because if I take these two points and join them, the line does not lie above the function. Convex functions are particularly nice: if I'm trying to minimize a convex function, I don't even have to check the Hessian. I just take the gradient, set it equal to 0, and I'm done. We'll walk through two examples of what is going on here. The first example is unconstrained nonlinear optimization.
Here's the problem I want to solve: minimize, over x in R2 (two components), the function f(x) = x1^2 + 3x1x2 + x2^3. I first take the gradient and set it equal to 0. The partial derivative with respect to x1 is 2x1 + 3x2; the partial derivative with respect to x2 is 3x1 + 3x2^2. I set these equal to zero and solve. The first equation tells me how x1 and x2 are related; I plug that into the second equation, which gives me a quadratic with two roots. It turns out the two candidate points are x = (0, 0) and x = (-9/4, 3/2), that is, first component -9/4 and second component 3/2. Now for the second derivatives. I take 2x1 + 3x2 and differentiate with respect to x1: I get 2. I take the same expression and differentiate with respect to x2: I get 3. For 3x1 + 3x2^2, differentiating with respect to x1 gives 3, but I should have known that already, because the order in which I take derivatives doesn't matter, so the off-diagonal terms are always the same. Differentiating 3x1 + 3x2^2 with respect to x2 gives 3 times 2x2, that is, 6x2. So the Hessian has rows (2, 3) and (3, 6x2). If I plug in x = (0, 0), which means x2 = 0, I get the matrix with rows (2, 3) and (3, 0). That's not positive semi-definite, so it's not a local minimum; it actually has one positive eigenvalue and one negative eigenvalue. If I plug in x = (-9/4, 3/2), I get the matrix with rows (2, 3) and (3, 9). It turns out this matrix is positive semi-definite, meaning it has non-negative eigenvalues, and in fact it's positive definite, meaning all its eigenvalues are positive. Therefore, I know this is a local minimum. That's all we need to know about unconstrained nonlinear optimization, as far as this course goes: take the gradient, set it equal to 0, and then figure out what happens with the Hessian, the second-order matrix.
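Here is a short SymPy sketch, my own check rather than part of the lecture, that runs this whole recipe on the example we just did:

```python
# Sketch (my own) of the recipe: gradient = 0, then classify via the Hessian.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + 3*x1*x2 + x2**3

grad = [sp.diff(f, var) for var in (x1, x2)]
points = sp.solve(grad, [x1, x2], dict=True)
print(points)            # [{x1: 0, x2: 0}, {x1: -9/4, x2: 3/2}]

H = sp.hessian(f, (x1, x2))
for pt in points:
    print(pt, H.subs(pt).eigenvals())
# At (0, 0): one positive and one negative eigenvalue -> not a local minimum.
# At (-9/4, 3/2): both eigenvalues positive -> a local minimum.
```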
Next, we want to take this idea and apply it to constrained problems. So, here's a constrained problem, the typical kind that shows up in utility maximization.
I have two different things I could do: I could invest in one particular business or in another, and I have a total wealth of 12, so x1 + x2 = 12. If I put money into the first business, I get 2 log(1 + x1) as my return. If I put it into the second business, I get 4 log(1 + x2). Now, logs have diminishing returns, so even though the second business gives me the incrementally better return, eventually I get less and less from it, and at some point the first project becomes competitive. The constraint says the total amount of money I have is just 12. If there were no constraint, I would solve this problem easily, but the constraint makes it harder. Still, it's a convex problem, because log is concave and I'm maximizing a concave function subject to a linear constraint, so in theory this is easy.
So, in order to get rid of the constraint, what we do is multiply it by a variable and fold it into the objective. I've got 2 log(1 + x1) + 4 log(1 + x2); this is just the objective. And here is my multiplier, v. It is called the Lagrange multiplier, because Lagrange was the first mathematician to come up with this idea. I take the Lagrange multiplier v, multiply it by the constraint x1 + x2 - 12 (what happened to the 12? I moved it to the other side, so the constraint reads x1 + x2 - 12 = 0), and subtract it from the objective.
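Written out in full, the Lagrangian we are about to work with is

L(x1, x2, v) = 2 log(1 + x1) + 4 log(1 + x2) - v(x1 + x2 - 12).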
Now, I throw away the constraint. Remember, we did something very similar in the linear optimization module: we dualized the constraints, multiplying them by multipliers that had to have a particular sign, then threw the constraints away, relaxed the problem, and just optimized the objective in order to get the dual linear program. In this particular case there is no sign restriction, because the constraint is an equality: v can be anything. I'm doing the same thing here. I've got a nonlinear objective, I subtract a multiplier times the constraint, and then I throw away the constraint and pretend this is an unconstrained problem. I know this problem is convex, so to get the optimum point, all I have to do is set the gradient to 0 and solve.
So, I take the gradient of this Lagrangian. The partial derivative with respect to x1 is 2/(1 + x1) - v; the partial derivative with respect to x2 is 4/(1 + x2) - v. I set both equal to 0. Solving, I get x1 = 2/v - 1 and x2 = 4/v - 1. How do I get v? I know that at the optimal solution x1 + x2 must equal 12, so I plug these expressions into the constraint and end up with an equation that says 6/v = 14, or v = 3/7.
Once I know the value of v, I plug it back into the expressions for x and get x = (11/3, 25/3). Intuitively, this makes sense: you invest more in the second opportunity because it has the higher return, but as you scale up your investment you have diminishing returns, so at some point it becomes profitable to go to the first one as well, and the correct balance is in the ratio 11 to 25. Look how easily we were able to solve this problem: introduce a Lagrange multiplier, take the gradient, and solve for x1 and x2.
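As a sanity check, here is a minimal SymPy sketch of my own that solves the first-order conditions together with the constraint:

```python
# Sketch (my own) of the Lagrange-multiplier solution for the utility problem.
import sympy as sp

x1, x2, v = sp.symbols('x1 x2 v', positive=True)
L = 2*sp.log(1 + x1) + 4*sp.log(1 + x2) - v*(x1 + x2 - 12)

# Stationarity in x1 and x2, plus the original budget constraint:
equations = [sp.diff(L, x1), sp.diff(L, x2), x1 + x2 - 12]
print(sp.solve(equations, [x1, x2, v], dict=True))
# [{x1: 11/3, x2: 25/3, v: 3/7}]
```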
The last piece of this module shows how to apply this technique to a problem that comes up over and over again in financial engineering: portfolio selection. What's going on here? I've got a bunch of assets, so my x belongs to Rn: there are n different assets, and x is a vector with n components. What does the constraint say? If I take the vector of all ones and multiply it by x, I get 1. This product is just the sum of xj for j from 1 through n, so the constraint says that the components of my portfolio add up to the 1 dollar I have to invest.
And what do I want to do? I want to choose a portfolio that maximizes my return minus a risk-aversion parameter lambda times the variance. Here, mu transpose x is the mean return on my portfolio, and x transpose V x gives me the variance of my portfolio. So I want the portfolio x which sums to 1 and which maximizes the risk-adjusted return mu'x - lambda x'Vx (writing ' for transpose). Again, the constraint makes this problem harder, so I take the constraint, multiply it by a Lagrange multiplier v, and put it into the objective. Here's my Lagrangian: mu'x - lambda x'Vx - v(1'x - 1). Again, this is a convex problem with no constraints, so I take the gradient and set it equal to 0.
If I take the gradient with respect to x: the mu comes from the gradient of the first term, the -2 lambda V x comes from the gradient of the second term, and the -v times 1 (the vector of all ones) comes from the gradient of the constraint term. I set mu - 2 lambda V x - v 1 equal to 0 and solve for x. I move the -2 lambda V x term to the other side, so the minus becomes a plus, then multiply by V^{-1} and divide by 2 lambda, which gives x = (1/(2 lambda)) V^{-1}(mu - v 1). So now, as in the simple example we saw before, I have x in terms of v.
How do I get v? I plug x into the constraint: 1'x must equal 1. That gives 1'V^{-1}(mu - v 1) = 2 lambda. Since v is just a scalar, I rearrange terms and get v = (1'V^{-1}mu - 2 lambda) / (1'V^{-1}1). Now that I know the value of v, I plug it back into the expression for x.
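To make this concrete, here is a small NumPy sketch of the closed form we just derived; the inputs mu, V, and lambda below are made-up numbers of my own, not data from the lecture:

```python
# Sketch (my own) of the closed-form mean-variance portfolio.
import numpy as np

def mean_variance_portfolio(mu, V, lam):
    """Maximize mu'x - lam * x'Vx subject to 1'x = 1."""
    ones = np.ones(len(mu))
    V_inv = np.linalg.inv(V)
    # Lagrange multiplier: v = (1'V^{-1}mu - 2*lam) / (1'V^{-1}1)
    v = (ones @ V_inv @ mu - 2 * lam) / (ones @ V_inv @ ones)
    # Optimal portfolio: x = (1/(2*lam)) * V^{-1}(mu - v*1)
    return V_inv @ (mu - v * ones) / (2 * lam)

mu = np.array([0.08, 0.12])                 # hypothetical mean returns
V = np.array([[0.04, 0.01], [0.01, 0.09]])  # hypothetical covariance matrix
x = mean_variance_portfolio(mu, V, lam=2.0)
print(x, x.sum())                           # the weights sum to 1
```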
And again, this complicated portfolio selection problem has been solved just by introducing a Lagrange multiplier. We will use this technique in the course to generate efficient frontiers, to characterize the shape of those frontiers, and to show that any point on the efficient frontier can be generated by a small set of mutual funds; ultimately, that will lead to the capital asset pricing model, a very important methodology for pricing assets in the market.