0:00

Folks, welcome back. This is Matt again, and I'm going to talk now about repeated games when players discount future payoffs, and about what that means. So when we're looking at discounted repeated games, the idea is that players are playing the same game over and over and over again. But instead of looking at the limit of the means, some limit of the average of what the payoffs are going to be in the distant future, people value today versus tomorrow differently. The idea behind discounted repeated games is that the future is uncertain: you're often motivated largely by what happens today, and you trade off today against the future. So it's not the infinite future that you care about. You say, I really care about today; I care about it a little bit more than tomorrow. So maybe tomorrow's value is say 80 or 90% of what today's value is. That means that if today is worth 1, tomorrow is worth 0.9, the next day 0.81, then 0.729, etc. So things are decaying exponentially

in terms of discounting. So the idea here is: if I misbehave today, I have to think about how people are going to react to that. If we're trying to support cooperative behavior in a prisoner's dilemma, I can behave today, or I can cheat and deviate, defect. If I do that, I'm going to get a temporary gain, and then I'm possibly going to be punished in the future. So the kinds of questions that are going to be important here are: will people want to punish me in the future? Is it going to be in their interest? And how much do I care about it? What's my discount? Do I care a lot about the future, or just a little bit?

So we're looking at a stage game. Again, a stage game is just a normal form game, and we're going to play it repeatedly over time. Now each player has a discount factor, so Player 1 has a discount factor, and so forth. The discount factor beta i is taken to be in [0, 1]. Generally we'll take beta i to be strictly less than one, so that it's of more interest. If it's equal to zero, it means that you don't care about the future at all; it's basically just a one-stage game. So generally, the interesting case is going to be when players care somewhat about the future, but care more about today than tomorrow and so forth. Often in these games, people look at situations with a common discount factor, so everybody has the same discount factor, which will make things easier in some cases.

The idea of discounting, then, is that you take the path of payoffs from a whole sequence of action profiles, a1 played in the first period, at in the t-th period, and so forth, and you just sum up these payoffs, but weight them by an exponentially decreasing function: the discount factor raised to the power of t. So if this payoff were 1 every day, I'd be getting 1 today, plus 0.9, plus 0.81, plus 0.729, etc. So that's the idea. Okay.
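The weighted sum just described can be sketched in a few lines of code. This is a small illustration (the function name is my own, not from the lecture): it sums a stream of payoffs weighted by beta to the power t, and compares a long finite truncation of a constant stream of 1's against the geometric-series value 1 / (1 - beta).

```python
# Sketch: discounted value of a payoff stream u_0, u_1, u_2, ...
# weighted by beta**t. With constant payoff 1 and beta = 0.9, the stream
# 1 + 0.9 + 0.81 + 0.729 + ... converges to 1 / (1 - 0.9) = 10.

def discounted_value(payoffs, beta):
    """Sum payoffs u_0, u_1, ... weighted by beta raised to the period t."""
    return sum(u * beta**t for t, u in enumerate(payoffs))

beta = 0.9
horizon = 500  # a long finite truncation approximates the infinite sum
approx = discounted_value([1.0] * horizon, beta)
exact = 1 / (1 - beta)
print(approx, exact)  # both are essentially 10
```

With beta close to 1 the truncation needs more terms to converge, which mirrors the economics: the more you care about the future, the more the distant tail of the stream matters.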

So when we look at these games, players can condition their play on past history. A finite history of some length t is just a list of everything that's happened at every date. Here, a1 is a profile of what every player did in period 1: the first time we played this game, what did everyone do? And generally, at is going to be what everybody did at time t, so we've got at1 through atn. These things are vectors, and they tell us what everybody did in the first period, what everybody did in the second period, and so forth. Then we can talk about all finite histories: all possible histories that I could be faced with when I'm playing this game, all the things I'm going to have to think about. What am I going to do if this happens? What am I going to do if that happens? So in an infinitely repeated game, I've got all these histories and what I'm going to do in each circumstance. A strategy is a map from every possible history into a possibly mixed strategy over what I can do in the given period, facing the given history.

If we're looking at a prisoner's dilemma, people can either cooperate or defect in a given period. So for a history of length 3, one possibility would be the following: we both cooperated in the first period, maybe Player 2 defected in the second period, and then both of them defected in the third period. That would be a possible history. And then they could say, okay, what are we going to do in the fourth period? Maybe we'll let bygones be bygones and try to get back to cooperation. Maybe we'll just defect because we're angry at each other. Who knows. Okay.
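A strategy as a map from histories to actions can be sketched concretely. This is an illustration with encodings of my own choosing (histories as lists of action-profile tuples, actions labeled "C" and "D"); the particular rule shown, cooperate until anyone has ever defected, is one simple example of conditioning play on past history.

```python
# Sketch: a history is a list of per-period action profiles, e.g.
# [("C", "C"), ("C", "D"), ("D", "D")] encodes the three periods described
# above. A strategy maps any history to this period's action. This rule
# cooperates if every past profile was (C, C), and defects otherwise.

def grim_strategy(history):
    if all(profile == ("C", "C") for profile in history):
        return "C"
    return "D"

print(grim_strategy([]))                                   # empty history: "C"
print(grim_strategy([("C", "C"), ("C", "D"), ("D", "D")])) # after a defection: "D"
```

Note that the map is defined on every finite history, including ones that the strategy itself would never generate; that completeness is exactly what subgame perfection will require us to check.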

So a strategy for the fourth period specifies what you do after each possible history of the first 3 periods. Subgame perfection is the same as usual: a profile of strategies that is Nash in every subgame. What's a subgame here?

Â 5:28

A subgame just starts at some period and consists of what remains. So the strategies have to form a Nash equilibrium following every possible history: take any history, start at that point, and play has to be Nash from there on forever. Strategies now are specifications of what we would do in every situation, and we need Nash play after every history. One thing to check, and it's important here, is repeatedly playing a Nash equilibrium of the stage game. Just find a static Nash equilibrium of whatever game it is, for instance defect, defect in the prisoner's dilemma, and play that forever, no matter what's happened in the past. That's always going to be subgame perfect: for every possible history, everybody plays the Nash equilibrium forever going forward. You can check that that's a subgame perfect equilibrium, that it's going to be Nash in every possible subgame. Check that if everyone else is doing that, I wouldn't want to deviate. Think a little bit about the logic of that; there are a lot of possible subgames to consider, but you can convince yourself that it's true. Okay.
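That check can be sketched numerically for the prisoner's dilemma. This is a hedged illustration with assumed stage payoffs (u(D,D) = 1 and u(C,D) = 0, matching the numbers used later in the lecture): against an opponent who defects after every history, a one-period deviation to cooperate only lowers your discounted payoff, so defecting forever is a best reply in every subgame.

```python
# Sketch of the one-shot deviation check against "always defect"
# (stage payoffs assumed: defecting vs. D yields 1, cooperating vs. D
# yields 0). Deviating to C for one period changes nothing afterward,
# since the opponent defects regardless of history.

def payoff_vs_always_defect(my_action_today, beta):
    stage = {"D": 1.0, "C": 0.0}[my_action_today]  # today's payoff vs. D
    future = beta * 1.0 / (1 - beta)               # discounted 1's thereafter
    return stage + future

beta = 0.9
print(payoff_vs_always_defect("D", beta))  # 10.0
print(payoff_vs_always_defect("C", beta))  # 9.0, strictly worse
```

The same comparison holds for any beta in [0, 1), which is why always-defect survives as subgame perfect no matter how patient the players are.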

So let's solve the repeated prisoner's dilemma, thinking about it now in the context of discounting. Suppose that what we want is to sustain cooperation. We've got our standard prisoner's dilemma: I've put in payoffs of 3, 3 for both cooperating, 5, 0 for you defecting while the other person cooperates, and 1, 1 if you both defect. So the only Nash equilibrium of the static game is defect, defect, with payoff 1. We want to support 3, 3 if we can. The strategy: cooperate as long as everyone has in the past, and defect forever in the future if anyone deviates. So, when is this an equilibrium? Clearly it isn't always. If we set beta i equal to 0 for both players, we can't make this work, because nobody cares about the future; then we'd end up with defect, defect in every period as the only subgame perfect equilibrium. Players who only care about the present are always just going to myopically defect; they don't care about the future, so nothing's going to work. So the question is: for which betas can we sustain this kind of strategy, cooperate as long as everyone has, and if cooperation ever breaks down, say forget it, we're going to defect forever after? Okay.

Let's have a peek. If you cooperate, the other player is cooperating, and no one has failed to cooperate in the past, what do we get? We get 3 in perpetuity: 3, plus beta times 3 (take a common discount factor for now), plus beta squared times 3, plus beta cubed times 3 in the fourth period, and so forth. So in perpetuity, if you remember your sums of series, the value of that is just 3 / (1 - beta). Okay.

What happens if I defect while people are playing this grim trigger strategy? The other person is cooperating in the first period, so if I change from cooperate to defect, I'm going to get a 5 in the first period. But then they're going to see that, and in the next period they react to it: they defect, and everybody defects forever after. So in perpetuity we get a bunch of 1's. What do we get? We get 5, then beta times 1, beta squared times 1, and so forth. If you remember your sums of series, that tail is beta times (1 + beta + beta squared + ...), which is beta times 1 / (1 - beta). So if I deviate, I get a gain in the first period, but then I lose in the subsequent periods. There's a trade-off, and how big that trade-off is depends on the size of the discount factor.

So we've got these two different payoffs, and we can look at the difference between them. If I stay cooperating instead of defecting, I'm giving up 2 today, which I could gain by defecting. But then I keep the benefits of cooperation in the future; I don't ruin things, and that means I'm getting a bunch of extra 2's in the future. So when you look at this, the value is beta times 2 / (1 - beta), minus the 2 I'm forgoing today. And when do I want to keep cooperating? As long as this difference is non-negative. If it becomes negative, I'm worse off by cooperating, and I might as well just defect. The difference is non-negative when beta is greater than or equal to 1 minus beta, or basically beta greater than or equal to one half; if you just go through the algebra of solving the inequality, you'll get beta >= 1/2. So as long as people care about tomorrow at least half as much as about today, they're going to be willing to cooperate in the repeated prisoner's dilemma with these particular payoffs. With this payoff structure, if each beta i is at least one half, the players can sustain cooperation in this infinitely repeated prisoner's dilemma. Okay.

So, let's change the numbers a little bit

and see what happens. Now let's make defection a little more attractive: instead of 5, we'll make it worth 10 to defect, so defection looks really attractive. What has to happen? Well, we can go through exactly the same calculations we just did, but with the new numbers. Cooperating in perpetuity is still worth 3 / (1 - beta). The only difference is that we're getting a higher number from deviating, and we're still going back to defect afterwards, so there's a bit more temptation today. When you take the differences, you get the same kind of thing, except that instead of a minus 2 we've got a minus 7: you're forgoing 7 units by not defecting today. So when you go through and solve for that, beta now has to be at least 7/9 before players are going to be willing to cooperate. You have to care about tomorrow at least 7/9 as much as about today, okay? And so you can see the basic logic here: there's a trade-off between punishments tomorrow and a good payoff today. Whether or not something can hold together as an equilibrium is determined by: how big is the future versus the present? How tempting is defection versus what we're doing in the current period? How big is the threat, that is, how bad is whatever we're resorting to in the future? All these things are going to matter in terms of holding together

cooperation in these kinds of settings. And that gets back to the discussion we had a little earlier about, say, OPEC. There's a temptation to pump more oil today. How much do you care about the future? What's your beta? What's the reaction going to be: if I start pumping more oil, how are the others going to react? Are they going to start pumping more oil and drive the price down? How much is that going to hurt me? All of those things matter, and they determine whether an equilibrium can hang together or not. Okay.
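The two thresholds we computed can be checked with one general calculation. As a sketch (the names R, T, and P are my own: R is the reward for mutual cooperation, T the temptation payoff from defecting, P the punishment payoff), grim trigger sustains cooperation when R / (1 - beta) >= T + beta * P / (1 - beta), which rearranges to beta >= (T - R) / (T - P).

```python
from fractions import Fraction

# Sketch of the grim-trigger threshold. Cooperating forever is worth
# R/(1-beta); deviating yields T today plus beta*P/(1-beta) from the
# punishment phase. Cooperation holds when beta >= (T - R)/(T - P).

def min_discount_factor(R, T, P):
    """Smallest beta sustaining cooperation under grim trigger."""
    return Fraction(T - R, T - P)

print(min_discount_factor(R=3, T=5, P=1))   # 1/2, the first example
print(min_discount_factor(R=3, T=10, P=1))  # 7/9, the more tempting payoffs
```

The formula makes the earlier verbal logic explicit: raising the temptation T or softening the punishment P pushes the required beta toward 1, so only very patient players can hold cooperation together.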

So, the basic logic: play something with relatively high payoffs. Even if it's not an equilibrium of the static game, you can sustain it, and you sustain it by having punishments. If anyone deviates, you resort to something that has lower payoffs, at least for that player. The important thing is that it all has to be credible: it has to be an equilibrium in the subgame going forward in order to make that work. And the lower payoffs in the future have to be enough to deter people from deviating in the present.