Let us now speak about Backward Induction. Backward Induction can be viewed as a way of computing the subgame perfect equilibrium of a game. It's a procedure that is used widely, or variants of it are, in game-playing programs, be it chess or other ones. And how does this work? This is a busy slide, but don't be daunted; we'll explain it leisurely. Also, some of you may not have seen algorithms before, but we'll explain the algorithm in very plain terms.
Before we do, let's first give the intuition, which is very straightforward. What we are trying to do is associate a value of the game not only with the leaf nodes. With leaf nodes we know what the value is: it is simply the payoff vector associated with the leaf node, which is part of the game definition. But suppose we wanted to go to the root of the tree, or any other internal node, and ask: what really is the payoff associated with it, assuming that agents will play a subgame perfect equilibrium? That is the goal of this procedure called Backward Induction. The idea is very straightforward: we'll go from the leaves and back up slowly, at every step of the way assuming that the agent moving at a node will take the action that maximizes their payoff at that node.
And so that's the intuition. Now let's see how it's done formally. The procedure is called Backward Induction, and it takes one argument: a node h in the tree, any node. It could be the root, it could be a leaf, or anything in between. And of course, every node has some player associated with it; to anticipate what we'll encounter shortly, ρ(h) will be the player associated with that node h. And when the procedure returns, it will give us the payoffs to all of the agents associated with that node. So how does it work? Remembering our intuition, we say the following.
If h is a leaf node (Z is the set of leaf nodes), then we simply return the payoff vector as defined by the game. That's where the recursion bottoms out. Most of the work, of course, happens in the recursive step, when we're not at a leaf node.
So for that we do the following. We will keep a variable called best_util, and best_util will be a vector of payoffs, one associated with each agent. We will be updating that vector as we go along. To start out with, we'll assume the payoffs are all terrible; call it minus infinity, a payoff smaller than all possible payoffs in the game. And then we do the following.
We will look at all the actions available at that node. So χ(h) is the set of all actions available at that node.
a would be an example of such an action. So we'll take each action in turn, one at a time, and do the following. We will look at the child you reach by taking action a at that node h; that's called σ(h, a). So σ(h, a) is simply another node, one of the children of node h, the one you arrive at by taking that particular action a. And we will recursively compute the payoff vector associated with that child and keep it in a variable called util_at_child. So notice that we have two vectors: best_util and util_at_child. best_util is the best we have found so far, best for a particular agent, and util_at_child is what we found at a particular child. We'll be going over all the children one at a time, and if ever util_at_child is better for the agent than the best_util so far, we'll update best_util. That's what's going on here.
So this is what this says. util_at_child is a vector, so look at the element of that vector corresponding to the agent we care about, the agent moving at node h. If util_at_child is better for that agent than the best_util found so far, update best_util to be util_at_child; otherwise leave it unchanged. In this way we're cycling through all the children and finding, from the point of view of the agent to whom the node belongs, which of all the vectors is best. And again, the intuition is that the agent will take the action leading to that child, and we update the vector accordingly. That's why, when we're done, we return the best we found so far, namely best_util.
That is the Backward Induction procedure. Notice that we don't have to return a strategy; we just return the list of payoffs. In some sense, you can think of it as simply extending the payoffs from the leaf nodes to all the internal nodes. But even though we don't explicitly return the equilibrium strategy, the one that is subgame perfect, it's easy to read it off those numbers, because at every node the agent will take the action that leads to the child node with the best utility from his point of view. So that is Backward Induction, a procedure for computing subgame perfect equilibria.
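To make this concrete, here is a minimal Python sketch of the procedure just described. The Node class and its field names (player for ρ(h), children for the map from actions in χ(h) to successors σ(h, a), payoffs for leaf payoff vectors) are illustrative assumptions, not notation from the slide.

```python
class Node:
    """A node in a perfect-information game tree (illustrative representation)."""
    def __init__(self, player=None, payoffs=None, children=None):
        self.player = player            # rho(h): index of the agent who moves at h
        self.payoffs = payoffs          # payoff vector, for leaf nodes only
        self.children = children or {}  # maps each action a in chi(h) to sigma(h, a)

def backward_induction(h):
    """Return the payoff vector of the subgame rooted at h, assuming
    every agent plays a subgame perfect equilibrium."""
    if not h.children:                  # h is a leaf: return its payoff vector
        return h.payoffs
    best_util = None                    # stands in for the (-inf, ..., -inf) vector
    for a, child in h.children.items():
        util_at_child = backward_induction(child)   # recurse on sigma(h, a)
        # keep the vector that is best for the agent moving at h
        if best_util is None or util_at_child[h.player] > best_util[h.player]:
            best_util = util_at_child
    return best_util
```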
And if we look at the special case of zero-sum games, it's simplified a little bit, because then there are only two players, and the payoff to one is minus the payoff to the other. So really, we only need to keep track of one number associated with each node, and there's less bookkeeping to be done. Furthermore, in such zero-sum games (and all win-lose games, for example chess, are zero-sum), there is a way to speed up the Backward Induction procedure. By the way, in zero-sum games we simply call it the minimax procedure, because we alternate between minimizing and maximizing the value.
One player wants to minimize the value, the other to maximize it. And in fact there's a way to speed up the procedure. We won't go into it in detail here, but the intuition is that as you're visiting the children of a given node, you may find out that at that point there's no need to explore the remaining children of that node, as we did in the Backward Induction procedure, because intuitively it won't matter: you've already found a value that means the current node you're examining will never be visited. This is called the alpha-beta pruning procedure, an optimization of the Backward Induction, or minimax, procedure for zero-sum games, and you are invited to explore it elsewhere.
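As a hedged sketch of what that looks like, here is minimax with alpha-beta pruning in Python, reusing the illustrative Node class from the earlier sketch; leaf payoffs are reduced to the single number payoffs[0], the value to the maximizing player.

```python
import math

def minimax(h, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Value of node h to the maximizing player in a two-player zero-sum game.
    alpha and beta bound the values each player can already guarantee elsewhere;
    once they cross, the remaining children of h cannot affect the result."""
    if not h.children:
        return h.payoffs[0]             # zero-sum: one number suffices
    if maximizing:
        value = -math.inf
        for child in h.children.values():
            value = max(value, minimax(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:           # the minimizer would never allow this node
                break                   # prune the remaining children
        return value
    else:
        value = math.inf
        for child in h.children.values():
            value = min(value, minimax(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:           # the maximizer would never allow this node
                break
        return value
```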
There's one more thing I want to say in connection with Backward Induction, and in fact with subgame perfection. There are really two different points here, and they're both keyed off the same example, the famous centipede game. In this well-known example you have two players who alternate turns: player one moves, then player two, then player one again, then player two, and so on and so forth. But the payoffs are constructed in a contrived way, so that they're gradually increasing. It's called the centipede game because you can imagine that rather than having only five legs, as here, you have a hundred. The payoffs rise slowly, so that the payoffs early in the game are much smaller than the payoffs later on, and indeed very much so if you keep going. But even though they rise, they are contrived in a way that leads to only one subgame perfect equilibrium: players will defect, that is, go down, at every node. So the only subgame perfect outcome is the one where the first player defects, going down, immediately.
This is of course reminiscent of the prisoner's dilemma, and it is a little counterintuitive, because had they only had the good sense to keep going, they would have got to payoffs in the ballpark of the later ones, much better for both players than this early outcome. But nonetheless, when you examine it, you see that there's only one subgame perfect equilibrium here, and in fact only one equilibrium outcome, namely this one. And you can see it by again doing the Backward Induction procedure. If the game progressed and in fact reached this last node, what would player 1 do? Well, they would go down, getting a 4 rather than a 3. Player 2 knows this. So knowing that player 1 would go down, player 2 would rather go down too, because he would get a 4 rather than a 3. Similarly here, player 1, knowing that player 2 would go down, elects to go down, because they will get a 3 rather than a 2, and so on. And this is really the Backward Induction argument.
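As a usage example, we can encode this five-legged centipede with the illustrative Node class and run backward_induction on it. Only the 4-versus-3 and 3-versus-2 comparisons are quoted in the lecture; the remaining payoff numbers below are an assumption based on the standard version of the example.

```python
# Payoff vectors are (player 1, player 2); player indices are 0 and 1.
n5 = Node(player=0, children={"down": Node(payoffs=(4, 3)),   # 4 rather than 3
                              "across": Node(payoffs=(3, 5))})
n4 = Node(player=1, children={"down": Node(payoffs=(2, 4)),   # 4 rather than 3
                              "across": n5})
n3 = Node(player=0, children={"down": Node(payoffs=(3, 1)),   # 3 rather than 2
                              "across": n4})
n2 = Node(player=1, children={"down": Node(payoffs=(0, 2)), "across": n3})
n1 = Node(player=0, children={"down": Node(payoffs=(1, 0)), "across": n2})

print(backward_induction(n1))   # (1, 0): player 1 defects immediately
```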
So on the one hand, this is a clear account of what will happen in this game, but there are two problems. One of them is simply experimental and common-sense; the other is more theoretical. On the pragmatic level, common sense simply tells you that this is not what's going to happen. The players will cooperate for a while, until at some point somebody goes down in the game. They know there's so much to gain by going forward; they would, if you wish, take the chance. And this intuition is borne out repeatedly in experiments: people do cooperate for a while, until eventually they defect. So that's a problem for the theory. But there's also another problem, one that's theoretical in nature.
So we know that the only subgame perfect equilibrium is one where agents defect, that is, go down, at every point in time. Now, imagine that the game starts and player one goes across, does not go down. What does player two do? Well, on the one hand, you could say: the only subgame perfect equilibrium is to go down, so they should go down, because the Backward Induction argument tells them that the best thing for them is to go down. But that same argument told player two that player one would have gone down right off the bat, and they didn't. So maybe they won't go down later either. But what will they do? Fundamentally, what happens here is that you have an event, going across, that the standard theory tells you will happen with zero probability. How do you condition on an event that had a prior probability of zero? There's a big literature on this; it's a very interesting, deep issue in game theory. We will not delve into it any deeper, but it's interesting to know about.