0:07

Hello. As we have seen before,

the degree of a node in an undirected graph is the number of neighbors that it has,

but sometimes we're not interested in that particular degree of a specific node.

We're interested in seeing how the degrees of

all the nodes are distributed across the whole network.

For example, here in this network,

we see that many of the nodes have degree two,

some have degree three,

and only one of them has a degree four,

and one has degree one.

We like to see how these are distributed over the network, and when we do,

we look at the network's degree distribution which is

the probability distribution of the degrees over the entire network.

If we let P(k),

where k is degree,

be the degree distribution of this network,

we'll find that P(k) has the following values: P(1) would be 1/9 because only one node,

node E, has degree one out of all nine nodes;

P(2) will be 4/9 because there are four nodes that have degree two over the nine nodes;

and then P(3) is 3/9 or one-third because three nodes have degree three.

We can plot the degree distribution of a network

using Network X in the following way: first,

we use the function degrees which returns a dictionary where

the keys are nodes and the values are the degree of the nodes;

and then we construct a list of sorted degrees (so this would

just have the degrees of all the nodes in the network in a sorted list);

then we construct a histogram that tells us

how many nodes of a particular degree we have;

and then we can just plot a bar plot of these histogram.

If we do that for this network,

this is what that histogram looks like.

We see that as we had seen before,

most of the nodes have degree two,

then some of them have a degree three,

and very few of them have degrees one and four.

This allows us to see how these degrees are distributed over the network.

Now, sometimes we're working with directed graphs instead of undirected graphs,

and then in that case,

we would be interested in the in-degree or the out-degree of the nodes.

Usually, we look at the in-degree, and in this case,

if we look at the in-degrees,

this is how they're distributed.

If we look at the in-degree distribution, say P_in(k),

that degree distribution would have these values: we have

P_in(0) is 2/9 because two nodes have degree zero,

P_in(1) is 4/9 because four of the nodes have in-degree one and so on.

We can also visualize the in-degree distribution in

the same way that we did for the degree distribution in the undirected graph,

but now we use the in-degree function rather than the degree function,

and everything else is the same,

and we can visualize the in-degree distribution of this network.

So one interesting question to ask is,

what does the degree distribution look like for real networks?

Let me show you three examples.

Here are the degree distributions of three different networks,

and let me tell you what the networks are: Network A is a network of

225,000 actors connected when they appear in a movie together,

B is a network of the World Wide Web so it

has about 325,000 documents that are connected by URLs,

and then C is a network of

the US Power Grid so it's a network of

about 5,000 generators connected by transmission lines.

There are two things to notice about these degree distributions.

The first thing is that they're all in log-log scale,

meaning that the x-axis and the y-axis are both on log scale rather than linear scale.

The second thing to notice is that,

for at least part of these distributions,

they tend to look like straight lines for all three cases.

When you put those two things together when you have

a degree distribution on a log-log scale and it looks kind of like a straight line,

then we say that this degree distribution looks kind of like a power law.

A power law degree distribution would have the form

P(k) equals C times k to the negative alpha,

where C and alpha are constant,

and the alpha values for these three distributions that we have here are 2.3 for A,

2.1 for B, and four for C. Okay,

these details are not so important,

at least for this course,

but the thing to know about power law degree distributions is

that they tend to have most of the nodes with very,

very small degree, but you have a few nodes that accumulate a very, very large degree.

For example, if we look at the degree distribution for Network A,

we have that most actors have a very small degree

so they have appeared in movies with a very small number of other actors,

but few actors actually appear in a movie with many, many actors.

Some appear in at least one movie with almost 1,000 actors.

You can notice that when you look at this degree distribution,

these actors have accumulated a degree that's very large, almost 1,000.

Whereas, most actors actually have a very small degree.

And that's typical of power law degree distributions.

One of the things we try to ask when we see something like this is,

what could explain this property that we observe happening in many different networks?

The way we try to answer this question is by coming up with models that

generate networks that make a few assumptions about how these networks get formed,

and then they give rise to whatever properties we observe.

So in this case, the question would be,

can we come up with a model that generates

a network that has a power law-like degree distribution?

One of the models that achieves this property is called a Preferential Attachment Model.

Let me tell you how the preferential attachment model works.

First, we start with just two nodes that are connected by an edge,

and then at each time step,

we're going to add a new node,

and the new node is going to connect to a single existing node.

The sort of special sauce in

the model comes when you decide which node to pick out of the existing nodes.

The way that these new node is going to pick an existing node to attach

to is going to be that it's going to choose it at random,

but it's going to choose it with probability proportional to the node's current degree.

So the probability of connecting to a node u that has a degree, k_u,

is going to be k_u divided by the sum of all the degrees of all the other nodes.

Let's run an example of this model.

Again, we start with these two nodes connected by an edge,

and then at each time step we're going to add a new node.

Node three is going to come in and is going to attach to a single node,

either node one and node two,

but it's going to do so with probability proportional to their degree.

But right now, node one and two both have degree of one,

so the probability of choosing one and two is 50/50 for node three.

Let's say, it chooses node two.

Now, node four comes in, but now,

node two has degree of two,

and nodes one and three have degree one,

and so the probability of attaching to node two is 0.5,

and the probability of attaching to nodes one or three is 0.25.

Node four attaches to node three,

which had a lower probability.

That could happen. So it attaches to node three.

Now, node five comes in,

and we have to recompute all the probabilities.

Well, now, nodes two and three both have degree two,

and nodes one and four have degree one.

So nodes two and three are going to have a higher probability of attaching,

and so that's 0.33 for nodes two and three,

and 0.17 for nodes one and four.

Let's say, five attaches to node two,

and we continue this node six,

again, we recompute the probabilities.

Now, node two has the highest degree,

so we will have the highest probability of getting that new attachment.

So six, let's say, attaches to two.

Now, seven comes in,

we recompute the probabilities.

Again, now, node two has an even higher degree so it has

an ever higher probability of getting that new node.

Second is node three that has a degree or two

with probability of getting that new node edge at 0.2,

and seven attaches to two.

Node eight comes in, we recompute the probabilities.

Node two probability becomes even larger,

but this node eight attaches to node three, and so on.

You can see how this continues.

The thing to notice here is that as node two started to get larger and larger degree,

its probability of getting a new edge became larger and larger as well.

There is this sort of rich get richer phenomenon,

where as the nodes get larger and larger degree,

they also start to become more and more likely to increase their degree.

What we can prove about this particular mechanism is that it gives rise to a power law.

It approaches a power law as the number of nodes gets larger, and larger, and larger.

So it matches the kind of degree distribution that we see in these real networks.

This type of modeling technique allows us to explain or at least have

some hypothesis for what kind of mechanism could give and

rise to this shape of the degree distribution that we observe.

For example, if we believe that

a very popular actor that has

appeared with many other actors in movies has a higher likelihood

of getting an additional actor to co-appear in

a movie than a maybe less popular actor

that has not appeared with many other actors in movies,

then this is the kind of mechanism that could be explaining

the sort of very skewed power law distribution that we observed.

In Network X, you can use the function,

barabasi_albert_graph, which is named after the researchers that came up with this model,

with input powers n and m,

where n is the number of nodes and m is

the number of new nodes that an arriving node would attach to.

In our example, the way we define it,

this m parameter would be one because we said that

every new node would attach to only a single existing node,

but you can generalize this and have it so that every node attaches to m existing nodes,

and that m will not change the fact that you still get a power law degree distribution.

You can use this function in Network X to create

networks that follow these Preferential Attachment Model.

Let's create one here with a million nodes and an m(1),

so every node attaches to a single existing node.

Now, let's plot the degree distribution in the same way that we

did before except that now I'm not using a bar plot,

I'm using a scatter plot,

and now I'm setting the scales,

the y-scale and x-scales to be logged so we can see

that straight line that gets formed on the log-log scale for the degree distribution.

Here it is. We see that it forms

the straight line that characterizes the power law degree distribution.

So in summary, we have the degree distribution of a graph is

the probability distribution of the degrees over the entire network,

and in many real networks that we see,

this degree distribution tends to look sort of like a power law

because it looks sort of like a straight line on a log-log scale.

We also discussed that models of networks generation allows to identify

mechanisms that give rise to patterns that we observe in real networks.

In this case, we observed that

many real networks tend to have these power law degree distribution,

and then we covered a model that gives

rise to this particular property which allows us to have

some insight about the kinds of things that could be

given rise to these properties in real networks.

In this case, this was the Preferential Attachment Model that produces

these networks with power law degree distribution.

In Network X, you can use this function,

barabasi_albert_graph to construct networks that have n nodes and where

every single node attaches to m

existing nodes following the Preferential Attachment Model.

That's all for today. I hope to see you next time.