[MUSIC] So for handling non-coordinate data, non-numerical data, we're often more interested in the relationship between data items than in the data items themselves. And so we'll use a graph or a network to represent the relationships between data items. And so we'll talk here about different ways of specifying a graph, and different attributes of graphs, and how graphs are represented computationally. So a graph just consists of nodes and edges. In this case, we have four nodes and we have five edges connecting these nodes. And so the nodes will represent a data item and the edge will represent a relationship between two data items. If we add this particular edge, then every data item is going to be connected to every other data item. We would call that a complete graph, or a clique, of these four nodes. Here on the left, I've got I've got a graph of four nodes connected by five edges, and on the right I've got four nodes connected by five edges. In fact, these two graphs are isomorphic. They're the same graph, they're just laid out differently. And on the right, I've got a planar graph, which means to nodes and edges are laid out so that none of the edges cross each other. And then I can speak of a face as being the region bounded by a cycle of edges starting and ending at the same node. And also, we would say that these are two different embeddings of the same graph or at least these graphs are isomorphic. If we think of this whole arrangement of eight nodes and ten edges as a single graph, then that would be one graph consisting of two connected components, but the entire graph containing both of these sub graphs would be a disconnected graph. We can also have directed graphs. So in an ordinary graph, or an undirected graph, just has edges, but does not really suggest the direction of these edges. We can have a directed graph by using arrows to indicate a direction, so that we would have a connection from this node to this node, but not necessarily a connection from this node back to this node. And you can think of a graph as having a cycle, a directed graph having a cycle, so this undirected graph, this ordinary graph has a cycle because these three nodes. I can start at this node and I can follow the cycle to get back. You can have a cyclic directed graph, because I can follow a cycle here. This graph is acyclic, because there is no cycle. Once I get to this node I can go to this node. Once I get to this node, I can't go any place further. So there's no way to get back to any of these nodes by following these edges in the directions indicated. You can also have trees. Any graph that's connected that has one less edge than the number of nodes forms a tree. It's minimally connected. And you can think of having a parent node, but for an undirected graph, any of these nodes could be the parent and you can think of multiple siblings. Here's some graphs with many more edges and you can see kind of the way they're laid out would imply a parent. But I could lay these out in isomorphic methods, equivalent methods, and another potent node might appear to be the parent. When you have directed graphs, the tree forms a hierarchy and then you can have a parent. So in this case, the child nodes, these child nodes point to their parents. And so I've plotted their parents higher than the child nodes. And then there is a clear root node. This one parent that's the parent of all of these nodes. You can also have a hierarchy that's not a tree. In this case we have a parent relationship, but this child node has two parents. And so you've got a definite hierarchy here, but it's not a tree because a tree would have each node would have a single parent node. There's also a relationship between the number of edges each node has and the kind of graph you're looking at. We talk about the degree of the node as being the number of edges extending from a node. Directed graphs would have a different in degree than an out degree. The in degree would be the number of edges going into the node and the out degree would be the number of edges leaving the node. And then these, the number of nodes you have of each degree, if it follows this kind of fall off, it's called a social network. A social network, you can think of this as a friend's network or many natural data relationships follow this power law, this social network power law. In this case, this is the number of interactions of yeast proteins. Each one of these nodes is a yeast protein, and the edges represent interaction between these proteins and, in this case, you have many nodes that have a few interactions that are connected to, you know, one or two, you have one or two edges. And then you have many fewer nodes that have high degree, that have a lot of edges that are connected to a lot of other nodes by a single edge. And so in this case, the number of nodes that have a certain degree follows a power-law. If the y is the number of nodes with that degree, is equal to the degree to some power with some constant multiplied to it. And you see this fall off happen quite often. These graphs tend not to be plainer, they tend to be difficult to embed. They also tend to be the most popular graphs that we encounter in real life. Finally, we're going to use an adjacency matrix to represent graphs. And so in this case we have a graph that has four nodes, four data items. One, two, three and four, and then we're going to represent this graph using this matrix. In this case, the Adjacency Matrix will have a one in row one, column two, if there's an edge connecting node one to node two. Likewise, it'll have a one in row two, column one, because it connects node two to node one and so it will be symmetric. Because this is a non-directed graph. If it was a directed graph, then you would connect, if it was directed from node one to node two by an edge leaving node one going to node two. Then you would have a one in row one column two but you would have a zero in row two column one. So a directed graph would have asymmetric or could have an asymmetric adjacency matrix but an ordinary graph, a non-directed graph, would just have a symmetric adjacency matrix. And these edges could also have weights, in which case you might not just have zero or one, you might have a value here to indicate how strong the connection is between nodes. And the diagonal is usually zero, unless there's some relationship between a node and itself, and you can represent that along the diagonal. So it might be good to review graphs and the different kinds of graphs, the different representations of graphs, and the different attributes of graphs to familiarize yourself with them, because we'll be using those for the rest of this module. [MUSIC]