0:00

And welcome back, folks. So now we're going to talk a little bit about learning. In terms of our course, we've done a bunch on background and fundamentals, and we looked at different models of network formation. Now we've moved towards trying to understand how network structure impacts different kinds of behaviors. We talked a bit about diffusion, and now I want to focus in a little bit on learning.

In terms of learning, we're going to look at two different models: we'll look briefly at Bayesian learning, and then we'll look at what's known as the DeGroot model. There's a whole variety of models out there these days and different ways of modeling learning, depending on who observes whom and what the network structure looks like, and there are hybrid models out there. What we're going to do is just look at these two to get some flavor of these things. The DeGroot model turns out to be a very useful one, and the Bayesian one has interesting insights and interesting questions associated with it.

So we're going to start with Bayesian learning, and we'll talk about repeated actions where people get to observe what each other are doing. I'm deciding over time what I'm doing, and I can see what my neighbors are doing, and there's going to be interaction between us in terms of what I learn from what my neighbors experience, and so forth. We'll look at that a bit, and then we'll move into the DeGroot model. The DeGroot model is going to be one with repeated communication, where people can keep talking to each other, but it's going to be a very naive way of updating. What I'm going to do is essentially just keep taking weighted averages of the information that I get from my friends, and I'll form opinions by continuing to average things, even though I might end up hearing the same things repeatedly, so I might end up over-weighting or under-weighting some information. So it'll be a more naive model, where I'm not fully rational in a Bayesian sense.
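To preview that naive averaging, here's a tiny sketch in Python; the three-person network and the particular weights are made up for illustration. Each agent repeatedly replaces their opinion with a weighted average of their neighbors' opinions, and the opinions settle down to a consensus.

```python
import numpy as np

# Hypothetical trust matrix T: row i holds the weights agent i places on
# each agent's opinion (rows sum to one).
T = np.array([
    [0.5, 0.5, 0.0],
    [0.3, 0.4, 0.3],
    [0.0, 0.5, 0.5],
])

beliefs = np.array([1.0, 0.0, 0.0])  # initial opinions

# Naive updating: each period, beliefs(t) = T @ beliefs(t-1),
# i.e. everyone takes a weighted average of what they hear.
for _ in range(100):
    beliefs = T @ beliefs

print(beliefs)  # all three entries agree: a consensus
```

Notice nobody here is doing Bayesian inference; the same weights get applied every round, which is exactly the over- or under-weighting issue mentioned above.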

1:56

So in Bayesian learning, these people are probabilistically sophisticated: you take into account information, you update a posterior using Bayes' rule, and then you maximize some payoff based on that. The DeGroot model is going to be much more naive, and actually easier to work with in many ways. And there's some experimental work these days, which you can find in some of the references, comparing Bayesian models, DeGroot models, and other models. What it finds is that humans are somewhat rational in what they're doing, but they have limits to their rationality, and they don't look like they're necessarily full Bayesians. Some of these naive, alternative models can be better at actually capturing human behavior.

Okay, so let's start with the Bayesian model as a very useful benchmark and an important point to consider. The idea here is, first of all, we can ask: will society converge? Will it be that eventually everybody converges to doing the same thing, or to having the same beliefs? Will people learn and aggregate information properly? So imagine that a new technology comes out, and we're not sure whether this is a good technology or a bad technology. Some people start playing with it and using it, and other people can see whether others are enjoying it. If it's a good technology, and a better one than the old one, will it eventually take over, or will it not? Might people fail to learn, and under what conditions might that happen? So we can ask questions about whether or not people are going to accurately aggregate information.

I'll start with a model by Bala and Goyal from 1998. It's a very simple setting, and a very natural one to analyze. There's a number of people in some network, and we'll take this network to be a single component, so people are all going to be path-connected to each other.

4:57

Whereas B is uncertain: it pays two with probability p and zero with probability one minus p. Okay? And let's suppose, to make things simple, that people don't mind risk, so basically they just care about the expected value. They know they can get one for sure from choosing action A, and action B is either going to pay off two, with probability p, or zero, with probability one minus p. So B's expected payoff is 2p, and if that's bigger than one, that is, if p is bigger than a half, they should choose B. But if p is less than a half, then they should choose A. Alright, so a very simple setting: B's better if p's bigger than a half, A's better if p's less than a half.
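As a quick sanity check, that risk-neutral comparison can be written out directly; the cutoff at one half just comes from comparing B's expected payoff, 2p, against A's sure payoff of one:

```python
# A pays 1 for sure; B pays 2 with probability p and 0 otherwise.
def best_action(p):
    expected_b = 2 * p + 0 * (1 - p)  # expected payoff of B
    # B is strictly better exactly when 2p > 1, i.e. p > 1/2;
    # at exact indifference we fall back to the safe action here.
    return "B" if expected_b > 1 else "A"

print(best_action(0.6))  # B
print(best_action(0.4))  # A
```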

But we don't know what p is. It's a new technology, so we're uncertain. Maybe we have some prior information, maybe we have some guess at what p is, but we don't know for sure. So these individuals have to experiment a little bit with B to find out whether p is good or bad. Okay? So they're going to be choosing actions over time, and the learning model is going to be as follows.

Each period, a person makes a choice between A or B, and each period you get a payoff. So if I choose A, I get a payoff of one, for sure. If I choose B, I'm going to get a payoff of two with probability p, and zero with probability one minus p. So I try this new technology. Maybe I'm a farmer: I try a new thing, and I either get the higher payoff or the lower payoff, with probability p or one minus p. Each period I'm going to do that, and what does the network do? The network is such that I also get to see my neighbors' choices and their outcomes. So say I'm a given individual with some friends: I chose A and got a payoff of one; this individual chose B and got a payoff of zero; this person chose A and got a payoff of one; and this person over here chose B and got a payoff of two. So what I learn is: I chose A, they chose A, I get to see that this person chose B and got a payoff of zero, and that this person chose B and got a payoff of two. Every day I'm going to get all this information, and I'm going to store that information over time.

And over time, I'll begin to learn. If I begin to see lots of people choosing B, and lots of people getting twos, I'm going to think, well, it's probably a good thing; p's probably pretty high. If I see lots of people choosing B, and lots of people getting zeros, then I'll downgrade my belief on p, and I'd be more likely to pick A. And what people are going to do in this setting is maximize their overall stream of expected payoffs, right? So I get a dollar today if I chose A, and some random amount if I choose B. Every day I'm making this choice, and I have an expectation of what it looks like, conditional on what I've seen up to a point in time. So pi sub i at time t will be the payoff I get at time t from following a certain strategy of choosing As and Bs, there'll be some delta less than one and greater than zero, and I'm going to maximize that sum of discounted expected payoffs. And let's suppose that p is unknown, so I don't know it initially, and that it takes on some finite set of values. So maybe it could be 0.1, 0.2, 0.3, etc. It has some finite set of possible values, and I'm trying to figure out whether B is a good thing.

Okay, that's the structure. So now let's talk about some of the difficulties with this. What are the real challenges in Bayesian learning? First of all, let's think of the following. Suppose that I know what the network looks like. I'm person one, and here's person two; person two is connected to some other individuals, say three and four.

9:35

But it also tells me things about what they might be seeing from other individuals, right? Over time they've been seeing what three and four are doing. I don't get to see what three and four did, but I know that two saw three and four. And I know that three is influenced by five, and four is influenced by six, and so forth. So there's some network out there of a bunch of individuals. And let's suppose, just as an example, that I've been choosing A for a while, and I see that person two has been choosing B for a while. I see person two choose B, they get a payoff of two. I see them choose B, they get a payoff of two. B again, a payoff of two. So I'm thinking, wow, this is really great, and so then I switch to B, and I keep seeing them get twos. And then suddenly, I see them switch to A. What would that tell me? Well, now I have to think: why would they have switched to A? It can't be their own experience, because that's been pretty good. It must be that they saw some bad experiences somewhere else, right? So now I have to think: what are all the possible experiences they could have had? Well, it could be that they saw three getting bad payoffs, or four getting bad payoffs. Or maybe they saw three switch from B to A, or they saw both three and four switch. So in order to think about this problem, which is a very complicated problem, I have to think about all the scenarios that could be considered in terms of all the histories of As and Bs, what everybody is seeing, how that impacts each person's decision, and what they should do in response to that. So the updating question here is actually fairly complicated: I can make all kinds of indirect inferences just based on what somebody's strategy is. Okay, that's one challenge. What's a second challenge here?

11:38

A second challenge is that there could also be some strategic interaction. Let's suppose I start with a belief that p is less than a half. Even if I were alone in the world, and even if I believed that p was less than a half, I know that I'm going to be around for a long time. It's still worthwhile for me to try B a few times, just to experiment and see whether, in fact, maybe I'm wrong and p is higher than a half. So even if I start out with a pessimistic prior, it could be that I want to experiment, to try B a few times just to see what happens. And once I've learned, that could be very valuable information: if it pays off two a bunch of times in a row, I'm going to want to take B, and that's going to give me payoffs for the rest of my life. So trying something out can be very worthwhile. Being fully rational, even if I start believing p is less than a half, as long as I'm making choices over time, there's an option value for trying this thing, and that's going to be positive. So I might want to experiment for a while and see what happens. Okay, so now there's experimentation that comes into play as well.

12:54

Okay, now let's suppose there are two of us, person one and person two. I'd like the other person to experiment. If I think p's less than a half, why don't I let them try it, while I sit by and just choose A? If they experiment and play with B for a while and it pays off well, then I can switch to B, but I don't have to pay the cost of the experimentation. I want a free ride. Now that becomes a game, which is actually going to have a fairly complicated equilibrium, especially when you start putting that game in a network with all kinds of players. We begin to have the players connected to other players, and now we look at this simultaneous decision: who's going to choose B in this period? Who chose it in the last period? What are our beliefs? What do I think everybody else's belief is, and so forth. So when you get to looking at this game overall, it becomes very complicated, both because of the strategic aspects and because of the Bayesian inference. And now you can begin to see why it might be that, when we put humans in the laboratory and ask them to play games or make these kinds of choices, they might not behave in a fully Bayesian manner. It's just complicated to do. It's hard to even write the model down and solve it.

Okay, so let me just say a little bit about how this is solved in the Bala and Goyal approach. What they did is assume, first, that players are not going to be strategic about this: each person just chooses things which maximize their own payoff, without worrying about the gaming aspect of it. And secondly, I'm not going to infer things from the fact that other people are making different kinds of choices. I'm just going to keep track of what I've seen in terms of my histories of As and Bs, whatever I've seen through myself and my neighbors. I'll just keep track of the relevant payoffs, and most importantly: how many times have I seen B pay off two, and how many times have I seen it pay off zero? Then I can update what I think p is, just based on those observations. I'll ignore everything else; I won't do the complicated updating, and I won't game things. I'm just going to look at the twos and zeros, and decide whether or not I want to switch from A to B or B to A. Okay, so let's look at that.
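Here's a small sketch of what that simplified updating could look like. The finite grid of candidate p values, the uniform prior, and the particular counts are all illustrative assumptions; the mechanics are just Bayes' rule on the observed twos and zeros, followed by the myopic payoff comparison.

```python
import numpy as np

# p is known to lie in a finite set; start from a uniform prior over it
# (an assumption for illustration -- any prior over the grid works alike).
grid = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
prior = np.ones_like(grid) / len(grid)

def posterior(successes, failures):
    # Likelihood of seeing B pay off `successes` times and fail `failures`
    # times, for each candidate value of p; then normalize (Bayes' rule).
    like = grid**successes * (1 - grid)**failures
    post = prior * like
    return post / post.sum()

def choose(successes, failures):
    # Myopic rule in the spirit of the Bala-Goyal simplification: pick B
    # iff its expected payoff 2*E[p | data] exceeds A's sure payoff of 1.
    p_hat = posterior(successes, failures) @ grid
    return "B" if 2 * p_hat > 1 else "A"

print(choose(8, 2))  # many twos observed: B
print(choose(2, 8))  # many zeros observed: A
```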

Okay, so what's a proposition you can prove fairly directly? The first thing you can show is this: suppose p is not exactly a half, where I'd be indifferent between the actions. Then with probability one, there's a time, a random time, such that all agents in a given component play just one action, and play that same action from that time onward. So basically, as long as p is not exactly a half, so that we wouldn't be exactly indifferent, we're eventually all going to end up choosing the same action. We'll just lock in on some action at some time and play that forever after.

So at some time, we'll all eventually converge and play the same action forever. Okay, that's the nature of the proposition. So let's talk through the intuition and the basic proof behind this: why is this true? I'm just going to sketch out the proof; it's fairly easy to fill in the details. Suppose that this weren't true. If it wasn't true, then somebody's going to have to switch back and forth infinitely many times, because otherwise we eventually converge. So somebody's got to be going back and forth between A and B infinitely often. In particular, let's suppose we just have one component; the argument works regardless of which component you're looking at. So suppose somebody is playing B infinitely often. If we don't converge, somebody's got to be playing B infinitely often.

Now we can use the law of large numbers. The law of large numbers tells us that if somebody plays B infinitely many times, then they're going to get an arbitrarily accurate estimate of what p is. So with probability going to one over time, their belief will converge to p. And what does that mean? Well, in order for them to keep playing B while their belief about p is becoming arbitrarily accurate, their belief must be converging to something bigger than a half; otherwise they would stop, right? They're good Bayesians, so they know how accurate their belief is. Their belief would converge either to above a half or to below a half, because p is not allowed to be exactly a half under our assumption. If it's above a half, then they'll keep playing B. If it's below a half, then eventually they would stop playing it, because now they'd be arbitrarily accurately convinced that it's not good. So if it's not good, they should learn that and stop playing it; if it is good, they'll keep playing it. So if they do play it infinitely often, it's got to be the case that their belief is converging to the good one; otherwise they would've stopped. And this means that, with probability one, the true p has to be bigger than a half. Then everybody who sees this person is actually going to see this sequence played. They're also going to see B played infinitely often, so they're also going to converge to the belief that p is bigger than a half, and they should all start playing B, right? So if this person is learning that B is good, then their neighbors are all going to have to converge too. Their neighbors will see B infinitely often and converge to the truth, their neighbors' neighbors will have to converge, and so forth. So the neighbors of that agent must play B, and then by induction all agents must end up playing B.

19:13

So it just has to spread out, okay? If anybody's going to play B infinitely often, then it's got to be that it's a good thing, and everyone learns. If not, then everyone has stopped, and everybody ends up playing A. So either somebody plays B infinitely often, in which case we converge to B, or they don't, in which case everybody's converging to A. That gives us a proof that we're going to get convergence, and we're going to converge to either all playing B or all playing A.
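To see this dynamic in action, here's a rough simulation in the spirit of the Bala-Goyal simplification. The line network, the grid of candidate p values, the uniform prior, and the tie-breaking rule are all illustrative assumptions, not part of the original model.

```python
import random

random.seed(1)
p_true = 0.7                       # B really is the better action here
n = 10                             # agents on a line: i observes i-1, i, i+1
grid = [0.1, 0.3, 0.5, 0.7, 0.9]  # candidate values of p, uniform prior

# counts[i] = [times i has seen B pay two, times i has seen B pay zero]
counts = [[0, 0] for _ in range(n)]

def choice(i):
    # Posterior mean of p given i's counts (uniform prior over the grid),
    # then the myopic rule: play B iff 2 * E[p | data] >= 1.  Breaking the
    # tie toward B is an assumption, just to seed initial experimentation.
    s, f = counts[i]
    w = [q**s * (1 - q)**f for q in grid]
    p_hat = sum(q * wi for q, wi in zip(grid, w)) / sum(w)
    return "B" if 2 * p_hat >= 1 else "A"

for t in range(50):
    acts = [choice(i) for i in range(n)]
    pays = [random.random() < p_true if a == "B" else None for a in acts]
    for i in range(n):
        for j in (i - 1, i, i + 1):        # observe self and neighbors
            if 0 <= j < n and acts[j] == "B":
                counts[i][0 if pays[j] else 1] += 1

# With p_true > 1/2, society typically locks in on B; with enough bad early
# draws it can lock in on A instead, exactly as the proposition allows.
print([choice(i) for i in range(n)])
```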

Well, does that mean that we always converge to the right action? That if p's really bigger than a half we converge to B, and if p is really smaller than a half we converge to A?

Well, suppose that p is really bigger than a half. If p is bigger than a half, then B is the right thing to do; we should really be playing B. Then we will play the right thing, if we actually converge to it. But it's possible that we might not converge to it. How could that happen? That could happen if we all start pessimistically enough, and we just happen to get some bad draws on B initially. So it's possible that everybody gets some bad draws on B, stops playing B, and then we never learn after that, right? So even when p is bigger than a half, it's possible for us to converge to A. On the other hand, if A is the right action, then we've got to converge to the right action, because that means that p is less than a half: nobody can end up playing B forever, since they would then learn that p is below a half and stop.

21:05

So if B is the right thing to do, we might eventually converge to it, but we could all stop playing B too soon, and we might end up converging to A instead. So we will all converge to doing the same thing in this model, but whether it's the right thing or not depends on whether B is the right thing or A is the right thing. If A is the right thing, we'll definitely converge to the right answer. If B is the right thing, we might or might not, depending on what our prior distribution is and whether we get good luck in the initial draws.

Okay, you could enrich this model so that you have different priors for different individuals. You can actually start specifying what different people's prior beliefs are. And then the probability of converging to the correct action, of converging to B, for instance, when it's the right thing to do, can be made arbitrarily high. Basically, if we add many actions, as long as for each action there's somebody who initially has a very high prior that that action is the best one, then we'll get enough experimentation in these different actions, so that we'll learn about them. Society can learn arbitrarily accurately, as long as there's somebody who's really willing to try out every technology. The case where we might fail to learn is going to be a situation where nobody's convinced enough to begin with to give it a long enough try; then we might end up not learning about it.

Okay, conclusions.

Where did we end up in this model? We all end up choosing the same actions, so we reach a consensus. That doesn't necessarily mean that we all end up with the same belief, because we're going to have different observations; our probabilities might differ on whether B was good or not. We might end up with different beliefs, but we're all pessimistic enough to stop. You can do speed-of-convergence kinds of results here. You could go through and, depending on whether B is good or bad, do computations. There you have to actually explicitly solve for the optimal actions as a function of what the histories look like. And there are a number of theorems in studies of two-armed bandits and other things in probability theory where you can get rates of convergence on these things. The law of large numbers, especially in this kind of Bernoulli world, has well-established speeds of convergence, so you can calculate those kinds of things. Relatively speaking, these things will happen quickly in terms of the number of observations giving good information here.
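For instance, a standard concentration bound like Hoeffding's inequality gives an explicit speed for this kind of Bernoulli estimation; the particular tolerance and confidence level below are just illustrative choices.

```python
import math

# Hoeffding's inequality for Bernoulli draws: after n observations of B,
# the empirical frequency p_hat satisfies
#   P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps**2),
# so beliefs sharpen exponentially fast in the number of observations.
def error_bound(n, eps):
    return 2 * math.exp(-2 * n * eps**2)

# How many observations until the chance of being off by 0.1 is below 5%?
n = 1
while error_bound(n, 0.1) > 0.05:
    n += 1
print(n)  # 185
```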

Limitations: okay, so there are a number of limitations in this kind of model. One is that everybody was getting the same payoffs from A or B. When you think about new technologies in the real world, new technologies might be right for some people and not right for other people. When you start putting in that heterogeneity, it's much harder for me to infer things: maybe my neighbor is getting a good payoff from this, but will that be the same payoff I get? That complicates things, and that heterogeneity means that the learning might take a very different form than it did in this model. Here we've got repeated actions over time, so everybody keeps taking all these actions and trying all these different things. There are a lot of settings in which we're not trying things repeatedly over time; we're just learning about them slowly over time. With things like global warming, we're going to get one go at it. It's not as if we can experiment infinitely often with different things, and if we get it wrong, we've got it wrong. So repeated actions over time, with feedback, is a situation where we get lots of incoming information; it might instead be that we get information in slower clumps or different bits. Also, this is a very stationary environment. It could be that the environment changes, which makes things even more complicated.

And finally, and probably most importantly, in this model we were not really able to take the network into account. The network really didn't play any role. All the arguments were just that somebody would eventually learn, and then the neighbors have to learn, and the neighbors of the neighbors have to learn, and so forth; it's a simple induction argument. We weren't able to say anything about what was going on in one network versus another. Now, you could do simulations and see whether speeds are faster in one network versus another, but we can go to other models to try and get a better feeling for exactly how the network matters. And that's what we'll do next when we move to the DeGroot model. That'll bring in the network very explicitly and allow us to do a lot of calculations quite easily. So that'll be our next subject: we'll start looking at the DeGroot model, where network structure's going to play a much more prominent role in the learning process.