Now, the way that we quantify homophily, triad closures, and triangles in social relationships is with something called the clustering coefficient, which should not be confused with the cluster density that we discussed in the context of influence models for Facebook and Twitter. The clustering coefficient is a measure of the number of triad closures that we have in a graph relative to the number of connected triples that we have in the graph. Triad closures are when we have a full triangle, whereas connected triples only require that the three nodes are connected; they don't need the full triangle of the social relationship.
It's obvious how we define a triangle: we just see a triangle in the graph. So the question is, how do three nodes form a connected triple, and how do we find connected triples and triad closures in our graph so that we can evaluate the metric? Three nodes form a connected triple if we can get from any one of them to any of the others using only the links among those three nodes. So in this graph right here, for instance, we can get from B to D, from C to B, from C to D, and so forth, so we'd say that BCD forms a connected triple. Similarly, ABC forms a connected triple, because we can get from A to B, from A to C, and from B to C.
Now, three nodes form a triad closure if each pair of nodes has a direct link. For BCD, there's no direct link between B and D; it takes more than one hop for B to get to D. So that does not form a triad closure. What does form a triad closure, though, is ABC, right?
So we would say that ABC here is a triad closure. And for connected triples, we would say ABC is a connected triple, but BCD is another connected triple. Now, the clustering coefficient is going to be the ratio of triad closures to connected triples. The exact formula for the clustering coefficient, or CC we'll say, is the number of triad closures we have, divided by the number of connected triples we have over three. The reason we divide by three is that in any given triad closure, any given triangle, there are going to be three connected triples. We'll explain the reasoning for that in the next slide. For now, suffice it to say that it's the number of triad closures over the number of connected triples divided by three. The higher the clustering coefficient is, the more triad closures we have per connected triple. And the lower it is, the fewer triangles we have going on in the graph; at zero there are no triangles at all, and so no evidence of homophily.
So this is going to be a value between zero and one. Why? Suppose we just had a triangle. A triangle is exactly one triad closure, and it represents exactly what we mean by the concept of homophily and the concept of transitivity in the graph: having friends of your friends be friends with each other. So we want the triangle, the canonical example of what that means, to have a clustering coefficient of one. The triangle has one triad closure, and as we've said, it has three connected triples. We divide those three by three to normalize, which gives us one over one, or one. That's how we come up with the division by three here. So, again, the formula for the clustering coefficient is the number of triad closures over the number of connected triples divided by three. Now let's do that for the graph in this example, okay?
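Written out as an equation, the formula just stated looks roughly like this. CC is the shorthand used above; the second form simply moves the three into the numerator, and the last line is the triangle check, with one triad closure and three connected triples.

```latex
\[
\mathrm{CC}
  = \frac{\text{number of triad closures}}{\text{number of connected triples} / 3}
  = \frac{3 \times \text{number of triad closures}}{\text{number of connected triples}}
\]
\[
\text{Single triangle:}\quad \mathrm{CC} = \frac{1}{3/3} = 1
\]
```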
So the first thing we have to do is figure out the number of triad closures that we have. As we said, each triangle is one triad closure, and ABC here is the one triangle. So that's exactly one triad closure. In terms of connected triples, we said each triangle contributes three unique connected triples. Additionally, we've got BCD as a connected triple, which makes four, and ACD, which is another one, which makes five. So here the clustering coefficient is the number of triad closures, which is one, divided by the number of connected triples over three, which is five thirds. One over five thirds is three over five, so the clustering coefficient is 0.6 for this graph.
So, as we said, there are three connected triples in a triangle. The question is, why is that exactly? The reason is that we only consider the connected triples in the triangle that are unique. Say we have a set of three nodes. There are many different ways that we could connect the three nodes, but suppose that they're connected in a triangle. A connected triple, as we said, only requires that we can get from any one node to any other node in the triple. So the question is, what are the unique ways that we can connect these three nodes to make that happen?
Well, one way is by doing this. Now we can get from one node to any of the other nodes. Say this is A, this is B, and this is C. This is only one connected triple, because ABC is the exact same thing as CBA: they're the same nodes and the same links. We have a link from A to B and a link from B to C in either case. Another connected triple would be if we instead took this link here and this link here. So instead of B to C, we'd have A to C. It's still A, B, and C: the same set of nodes, but different links.
And we can still get from any node to any of the others, right? We can go B, A, C, but again, CAB is no different from BAC. So this is just one connected triple. This is one, this is two. And then we can do that again in a third way, by saying that we won't have this link here; instead, we'll just have these links. So there's no direct link from A to B, but BCA is another connected triple, because all of the nodes are still connected. That makes three in total if we have a triangle.
We can also look at it this way, which is exactly what we just illustrated. The naming of the nodes is a little different, but basically we can look at the center point of the path that we travel in going through the connected triple, say from A to B to C. Each of the center points has to be different, right? Here the center point is B. Here the center point is A, going B-A-C. And here, going A-C-B, the center point is C. That defines three different connected triples, because each one has a different center node that is connected to both of the other nodes, and the path doesn't use the direct link between those two other nodes. So really, all we're doing is removing the links one at a time in a triangle. And since there are three different links, removing them one at a time defines three unique connected triples. That's the explanation of why, when we see a triangle, we immediately say, okay, there are three connected triples in this triangle.
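The center-point view of counting can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function names are made up, and the edge list for the example graph (A-B, A-C, B-C, C-D) is an assumption inferred from the triples named in the discussion. Each pair of neighbors around a center node counts as one connected triple, and a triangle is found three times, once per center.

```python
from itertools import combinations


def count_triples_and_closures(edges):
    """Return (connected_triples, triad_closures) for an undirected graph."""
    # Build an adjacency map: node -> set of neighbors.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    connected_triples = 0
    closed_triples = 0
    for center, nbrs in neighbors.items():
        # Every pair of neighbors of `center` is one connected triple
        # with `center` as its center point.
        for a, b in combinations(sorted(nbrs), 2):
            connected_triples += 1
            # The triple is closed if the two end nodes are also linked.
            if b in neighbors[a]:
                closed_triples += 1

    # Each triangle is seen once per center node, so three times in total.
    triad_closures = closed_triples // 3
    return connected_triples, triad_closures


def clustering_coefficient(edges):
    """Triad closures over (connected triples / 3)."""
    triples, closures = count_triples_and_closures(edges)
    if triples == 0:
        return 0.0  # assumption: treat a graph with no connected triples as 0
    # Algebraically the same as closures / (triples / 3).
    return 3 * closures / triples


# A lone triangle, and the assumed edge list for the example graph.
triangle = [("A", "B"), ("B", "C"), ("A", "C")]
example = triangle + [("C", "D")]

print(count_triples_and_closures(triangle))  # (3, 1)
print(clustering_coefficient(triangle))      # 1.0
print(count_triples_and_closures(example))   # (5, 1)
print(clustering_coefficient(example))       # 0.6
```

Counting by center node avoids double counting: ABC and CBA share the center B, so they are only ever counted once.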