0:19

But how can we measure these paths, and how short are they really?

So in the 1960s, a researcher named Stanley Milgram was wondering about this very question. And he set out to create an experiment that would allow him to measure how long these paths are, and whether or not they really exist.

And so the setup of the experiment was the following. He chose 296 people at random as starters, all over the US, and asked them to forward a letter to a target person. This target person was a broker in Boston. And the instructions for these starters were that they were to send the letter to the target if they knew him on a first-name basis. Of course, since these were randomly selected people, most of them didn't know this person on a first-name basis. So if they didn't know him, their job was to send the letter, along with the instructions, to someone they knew on a first-name basis who was more likely to know the target on a first-name basis. And the instructions contained some information about the target, like the city where he lived and his occupation.

So then, what they did is, they checked how many letters actually arrived at the destination and how many hops they took to arrive there.

1:30

And so, these were the results. Out of all 296 starters, 64 of them reached the target. And the median chain length, the number of hops that it took for these letters to get there, was 6, which was consistent with the phrase "six degrees of separation." Here in this plot, we can see a histogram of the chain length for the letters that actually arrived, and we can see that they took anywhere from 1 to 10 hops to get there, but the median was 6.

And so the key points here are the following. First of all, a relatively large percentage of these letters actually arrived. If you think about the fact that these starters were randomly selected, and that people could drop out and say, "I'm not even going to try to forward this letter to anyone," then despite all of these things, more than 20% of the letters actually reached the target. And the second is that, for the ones that reached, these paths were relatively short. A median of 6, a single digit, in a network of millions and millions of people, seems pretty small. The other thing that's interesting, although we're not going to focus much on this part of it, is that people are actually able to find these short paths. So not only do these paths exist, but somehow people are able to find them, which is also quite interesting.

2:49

Now more recently, people have tried to answer this question without actually running an experiment, instead looking at actual data and measuring how long these paths are. Of course, in the 1960s, researchers did not have access to very large network data sets, but nowadays, we do.

So one example of this is looking at instant message communication among people. Researchers took a network of 240 million active users on Microsoft Instant Messenger, and they connected two users if they were engaged in a two-way communication over a period of a month. And so this defined a very large network. And the estimated path length in this network — so if you take any two people at random and check the distance between them in the network, that is, the length of the shortest path — the median is estimated to be 7. Which is very close to what Milgram had found in the 1960s, which was 6, and it's also very small. Here is the histogram of the distances in this particular network, and we again see that these tend to be small.

People have also tried to do this using Facebook data. And so what they did is, they looked at how the distances change over time, and how they vary if you try to measure them on the full Facebook network versus just a subset of it in a particular region, like the United States. And what they found is that, if you take the global network, the average path length in 2008 was about 5.28, and three years later, in 2011, it was 4.74. So it seems like these path lengths are short, and they seem to be getting smaller over time. And as you may expect, if you take a smaller region like the United States, then these tend to be even smaller.

5:12

And when we look at real networks, like for example Facebook in 2011, we find that the average clustering coefficient tends to be pretty high. So here is a plot that has on the x axis the degree of a node, and on the y axis the average clustering coefficient. And we find that it decreases as degree increases. So for people that have lots and lots of friends, their clustering coefficient tends to be smaller. But on average, it's still pretty large. If you look at nodes that have, say, 20 to 50 friends, their clustering coefficient is somewhere around 0.3 or so.

5:46

In the Microsoft Instant Messenger network, the average clustering coefficient was 0.13. And in the actor network that we talked about before, the average clustering coefficient is even higher, at 0.78.

And the thing to note here is why I say that these clustering coefficients are high. If you imagine that these graphs were completely random — meaning you take all of these nodes, and however many edges the graph has, you just assign them at random — then you would find that the average clustering coefficient would be much, much smaller, because there is nothing that's making these triangles appear, right? And so, since these clustering coefficients tend to be high, we think that there is something happening in the formation of the network that is favoring triangle formation.
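As a quick sanity check of this intuition, here is a small sketch (not the lecture's code) using NetworkX. The sizes — 1,000 nodes and 5,000 edges — are hypothetical, chosen just for illustration: in a uniformly random graph, the average clustering comes out near the edge density, far below the 0.13 and 0.78 we just saw.

```python
import networkx as nx

# Illustrative sizes only: 1,000 nodes with 5,000 edges placed
# uniformly at random (no mechanism favoring triangles).
G = nx.gnm_random_graph(1000, 5000, seed=1)

# For a random graph, the expected clustering is roughly the edge
# density 2m / (n(n-1)) ≈ 0.01 — much smaller than in real networks.
print(nx.average_clustering(G))
```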

So we've noticed two things. One, social networks tend to have a high clustering coefficient, and two, they tend to have a small average path length. And so the next question is, can we think of a model that generates networks that have these two properties? Just like we did for the power law distribution: we observed that real networks tended to have power law degree distributions, and then we covered a model that gives rise to that property, which allowed us to think that maybe this model explains how the property comes about in the real data. Can we do that same kind of thing here?

Well, the first natural thing to do would be to check whether a model of network formation we already have, the preferential attachment model, produces networks with a high clustering coefficient and a small average path length.

7:22

So let's create one with 1,000 nodes and parameter m of 4, which means that each new node is going to attach to 4 existing nodes. And let's see what its average clustering is. Well, in this case, it's 0.02, which is pretty small compared to the networks that we had seen before. Now for the average shortest path length, it is 4.16, which is pretty small, pretty decent. So it seems that it gets one of the properties but not the other.
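The lecture doesn't show the code for this check, but it can be reproduced with NetworkX along these lines (the exact numbers vary with the random seed):

```python
import networkx as nx

# Preferential attachment with n = 1,000 nodes and m = 4 edges per new node.
G = nx.barabasi_albert_graph(1000, 4, seed=42)

# Clustering is small (around 0.02 in the lecture's run), while the
# average shortest path is also small (around 4).
print(nx.average_clustering(G))
print(nx.average_shortest_path_length(G))
```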

Let's see what happens if we vary the number of nodes and the number of edges per new node, the parameter m, and see what happens to the path length and the clustering.

7:59

On the x axis, I have the number of nodes, and on the y axis, I have the average clustering coefficient, with different curves that represent different values of the parameter m. And then I have the same thing for the average shortest path. Well, we observe that these networks have a small average shortest path, and the reason why is that, if you remember, these power law degree distributions have the property that some nodes have a very, very high degree. And these high degree nodes act as hubs that connect many pairs of nodes and form bridges between them. So this is why we see that these average shortest paths tend to be small.

However, on the clustering side, we see that as the number of nodes increases, the average clustering coefficient decreases and becomes very small. Even at 2,000 nodes, the average clustering coefficient is very small, at around 0.05 or so. And we have seen networks of millions and millions of nodes that had a much larger clustering coefficient. So it seems like the preferential attachment model, while it has the small average shortest path property, fails to have the high clustering coefficient property. And the reason is that there is no mechanism in the preferential attachment model that would favor triangle formation, so it's natural that it just doesn't have that property.

So now let's talk about a new model, called the small world model, that actually gives networks that have these two properties. First, let's discuss how the model actually works. You start with a ring of n nodes — basically you put n nodes in a circle — and then you connect each node to its k nearest neighbors, where k is a parameter you fix. Then you fix another parameter p between 0 and 1, and what you're going to do is take each one of the edges created in the first step and, with probability p, rewire it. So say we're taking the edge (u, v): with probability p, we're going to pick another node w completely at random and rewire the edge so that (u, v) becomes (u, w). You do this for each one of the edges, and you're done.
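As a sketch, the two steps just described can be written out by hand. This is my own minimal implementation, not NetworkX's; it assumes k is even, so each node takes k/2 neighbors on each side of the ring, and it skips some edge cases a production generator would handle.

```python
import random

def small_world(n, k, p, seed=None):
    """Ring lattice of n nodes, k nearest neighbors each, rewired with prob p."""
    rng = random.Random(seed)
    # Step 1: ring lattice — connect each node to k/2 neighbors on each side.
    edges = set()
    for u in range(n):
        for offset in range(1, k // 2 + 1):
            edges.add((u, (u + offset) % n))
    # Step 2: with probability p, rewire each edge (u, v) into (u, w)
    # for a uniformly random node w.
    result = set()
    for (u, v) in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            # avoid self-loops and duplicating an edge we already kept
            while w == u or (u, w) in result or (w, u) in result:
                w = rng.randrange(n)
            result.add((u, w))
        else:
            result.add((u, v))
    return result
```

With p = 0 this returns the untouched ring lattice; with p = 1 every edge has been rewired.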

So let's walk through an example here. We have 12 nodes, and in the example, k will be 2, so each node is connected to its 2 nearest neighbors, and p will be 0.4. I will say that these parameters, k = 2 and p = 0.4, are not the typical parameters you would use for this particular model. Typically, you have a k that's much larger and a p that's much smaller, but just for the purposes of the illustration I'm showing you here, I'm going to choose these parameters.

10:41

So what we have to do is go through every edge and decide whether we're going to rewire it or not, and we're going to rewire it with probability 0.4. So let's start with the edge LA, and we have to decide whether to rewire it or not. Let's say we use some type of random number generator, and we decide that we don't rewire it, okay. And then we go to the next one, and this one we won't rewire either. We go to the next one, and this one we will rewire. So now we have to pick another node at random, and in this case, we pick G. And so this edge, instead of connecting K and J, is going to connect K and G. Okay, we go to the next edge, GI, and this one we're not going to rewire. The next edge we will rewire, so we pick a node at random and rewire it. Go to the next edge, we don't rewire it. Go to the next edge, we don't rewire it. Go to the next one, and we rewire it; it becomes FJ. Go to the next one, no rewire. Go to the next one, rewire, so it becomes DH. Go to the next one, we rewire it. And go to the next one, we don't rewire it, and we're done. So this is the network that this model produces, or at least one of the possible instances of the networks that can be produced with these parameters.
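The walkthrough above corresponds to a single draw from NetworkX's built-in generator; a different seed gives a different instance, so the rewired edges won't match the ones in the example.

```python
import networkx as nx

# n = 12 nodes, each connected to its k = 2 nearest neighbors,
# each edge rewired with probability p = 0.4.
G = nx.watts_strogatz_graph(12, 2, 0.4, seed=1)

# Rewiring moves edges around but never changes how many there are.
print(G.number_of_edges())  # 12
```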

So now let's think about what happens to this network as we change this parameter p, which ends up being the most interesting parameter of the model.

12:12

Well, here's an illustration from the original paper that came up with this model. Let's first think about what happens in the extreme cases. When p is 0, we're looking at this type of network: a completely regular lattice. There is no rewiring, and because every node is connected to k of its neighbors, lots of triangles get formed locally, right? Well, it depends on the value of k, but if k is large enough, then you start to form many triangles. And so this network will have a pretty high clustering coefficient, because it purposely forms triangles in the beginning, and then nothing gets rewired, nothing gets changed, so it keeps that high clustering. However, if you imagine this being a very large network, you can imagine that the distances between nodes can get very large. If you're on one side of the ring, to get to the other side you can hop a little bit at a time, but there is no long bridge that can take you there. And so we expect that the distances would be long.

13:15

Now let's think about the other extreme, where we rewire every single one of the edges, and that would be this network. What's happening here is that, because we rewire every single edge, this network is now completely random. We've created a bunch of long bridges, and so presumably the distances between different nodes are pretty small. But now we've destroyed the local structure that we had before, so we probably don't have many triangles anymore. And so while the distances are small, the clustering also got very small.

In between, if p is between 0 and 1, then some edges get rewired, so you create some long bridges, and the average distance between nodes gets reduced. But the local structure, depending on p, can be maintained, so you can keep the high clustering as well. So there's a sweet spot for the parameter p where you can maintain both the short distances and the high clustering, and let's see that next.

14:26

So here I'm showing two plots where on the x axis we have the value of p, and on the y axis we have the average shortest path and the average clustering, with curves for networks that have different values of the parameter k. And what we see is that, as p increases from 0 to 0.1 — notice that we don't get anywhere close to p = 1, so this is staying within very small values of p — the average shortest path decreases rapidly. Right after p moves away from 0, it just drops down. Whereas the average clustering coefficient, while it also decreases as p increases, decreases much more slowly.

So for example, an instance of a network with 1,000 nodes, k = 6, and p = 0.04 has the following values: an average shortest path of 8.99 and an average clustering coefficient of 0.53. So for these kinds of values of p, we can achieve both of the properties that we wanted: the average shortest path being small, a single digit, and the average clustering coefficient being pretty large.
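An instance like that can be generated as follows; the exact numbers depend on the random seed, so treat the values in the comments as approximate.

```python
import networkx as nx

G = nx.watts_strogatz_graph(1000, 6, 0.04, seed=10)

# Clustering stays high for these parameters, roughly 0.5.
print(nx.average_clustering(G))

# Paths are short, but a small world graph can in principle be
# disconnected, so guard before computing shortest path lengths.
if nx.is_connected(G):
    print(nx.average_shortest_path_length(G))
```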

15:54

Now let's wonder about the degree distribution of the small world network. What does it look like? Well, let's use the same code that we used before to visualize the degree distribution of a network, this time using a small world network with 1,000 nodes, k = 6, and p = 0.04. What we see is that it looks like this: most nodes have degree 6, a few of them have degree 5 or 7, and a very small number of them have degree 4 or 8, and that's about it. And this makes sense, because the rewiring probability is very small, so most of the edges aren't going to be rewired, and most of the nodes are going to keep the degree of 6 that they had in the beginning, when the ring was first formed. And because there's no mechanism that makes some nodes accumulate a very large degree, none of them do.
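The plotting code referred to above isn't reproduced here, but a minimal text version of that degree histogram might look like this (the seed is an illustrative assumption):

```python
import networkx as nx
from collections import Counter

G = nx.watts_strogatz_graph(1000, 6, 0.04, seed=10)

# Count how many nodes have each degree; the mass concentrates at k = 6,
# with a few nodes at 5 and 7 where edges were rewired.
degree_counts = Counter(d for _, d in G.degree())
for degree in sorted(degree_counts):
    print(degree, degree_counts[degree])
```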

16:51

And so, while this model gets the two properties that we discussed — the high clustering coefficient and the short paths — it doesn't get the power law degree distribution that we also observe in real networks, which the preferential attachment model gets correctly.

There are certain variants of the small world network in NetworkX that you can use. The first one addresses an issue that small world networks, in general, can become disconnected. There's nothing that forces them into a single component: if you rewire too many of the edges, you could end up with more than one connected component. And this is sometimes something we don't want. Sometimes we want to create a network that is connected, where every node can reach every other node. And if that's what we need, then we can use the function connected_watts_strogatz_graph, which has the same parameters except for an additional parameter t. What it does is run the standard watts_strogatz_graph model up to t times until it returns a connected small world network. So it keeps trying different iterations of the model until it gets one that's connected — unless it runs out of tries, that is, if it tries more than t times, it raises an error.
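In NetworkX the retry parameter described above is called tries, so the call looks like this:

```python
import networkx as nx

# Same parameters as watts_strogatz_graph, plus `tries` (the t above):
# the model is re-run up to 100 times until a connected instance appears,
# raising an exception if none is found.
G = nx.connected_watts_strogatz_graph(1000, 6, 0.04, tries=100, seed=10)
print(nx.is_connected(G))  # True
print(nx.average_shortest_path_length(G))
```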

There's also the newman_watts_strogatz_graph, which is very similar to the small world model, but instead of rewiring an edge with probability p, it adds a new edge with probability p and leaves the original edge in the network. So instead of rewiring, it adds additional edges.
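One way to see the difference: rewiring keeps the edge count fixed, while the Newman variant only ever adds edges. A quick comparison, with sizes chosen just for illustration:

```python
import networkx as nx

G_ws = nx.watts_strogatz_graph(1000, 6, 0.04, seed=1)
G_nws = nx.newman_watts_strogatz_graph(1000, 6, 0.04, seed=1)

# Rewiring preserves the n*k/2 = 3000 ring edges; the Newman variant
# keeps all ring edges and adds shortcuts on top with probability p.
print(G_ws.number_of_edges())   # 3000
print(G_nws.number_of_edges())  # more than 3000
```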

In summary, today we looked at real networks, and we found that they appear to have small shortest paths and a high clustering coefficient. And we found that the preferential attachment model that we talked about last time produces networks that have small shortest paths, but also a very small clustering coefficient. So then we talked about a new model, the small world model, that starts with a ring lattice with nodes connected to their k nearest neighbors, so it starts out with very high clustering, and then rewires some of the edges with probability p. And we found that for some values of p, namely very small values of p, these small world networks have a small average shortest path, because the rewired long edges create bridges for nodes to reach each other, and they still maintain a high clustering coefficient, which matches what we observe in the real networks that we looked at. However, the degree distribution of a small world network is not a power law, unlike what we had seen in real data. And in NetworkX, you can use the function watts_strogatz_graph, or some of the other variants that we covered, to produce small world networks.

And that's all for today. I hope to see you next time.