0:03

As our final digraph processing algorithm,

we'll take a look at computing strong components.

So, the definition, two vertices v and w in a digraph,

are strongly connected if there's a directed path from v to w and

another directed path from w to v. And the thing about strongly connected is,

that it's an equivalence relation.

That is, each vertex is strongly connected to itself.

If v is connected to w strongly,

then w is strongly connected to v.

That just means there's directed paths connecting them.

And it's also transitive.

If v is connected to w and w to x,

then v is strongly connected to x.

How to get from v to x you go from v to w and w to x.

And get from x to v, you go from x to w and w to v. So that means,

that since it's an equivalence relation,

it divides the digraph up into sets,

called strongly connected components,

that have the property that there's directed

paths connecting each pair of vertices in the set.

So for example 9 in 12 are strongly connected.

There's a path from 12 to 9,

another one from from 9 to 12 and so forth.

So, our challenge is to compute the strong components in digraphs.

And it's worth comparing this to what we

did for connected components in undirected graphs.

So this is just a quick review,

in an undirected graph,

if there's a path between two vertices, they're connected.

That's an equivalence relation,

divides them up in a connected components.

And what we did was,

our design pattern is to build our graph processing algorithm as a constructor,

that takes a graph and does the pre-processing to create a table like this one,

which assigns a unique ID to all the vertices in each given component,

so that we can have a constant time client query,

to check whether two given vertices are in the same connected component or not.

So, linear time pre-processing in the constructor to build the table,

constant time client queries.

And that's as good performance as we could

expect to have for a graph-processing algorithm.

Remarkably, we're able to do the same thing for strong connectivity.

It's a much more sophisticated algorithm,

but the design pattern and the bottom line for the client is the same.

There's a constructor that processes the graph in linear time

and assigns a unique ID to each one of the strongly connected components in the graph.

So, one's in a component by itself, 0, 2,

3, 4 and 5, 6,

and 8, 7, and 9, 10, and 12.

There's five different strongly connected components in this graph.

The constructor builds that array and the client gets to,

in constant time, know whether two vertices are strongly connected or not.

It's quite amazing that we can solve this problem in this way.

And as I mentioned, it's got an interesting history that I'll talk about in a second.

So, here's an example of strong component application.

So, the food web graph,

this is a very small subset of that graph.

The vertices are all the species in edge goes from producer to consumer.

So if animal a eats animal b,

there's an edge from a to b.

And what a strong component corresponds to in this graph is,

kind of a subset of species that have a common energy flow.

They mutually eat each other.

And that's extremely important in ecological studies.

Another example is again in processing software, big software,

is made up of modules in the vertices,

or other modules, and edges if one depends on another.

And so a strong component in this graph is,

a subset of mutually interacting modules.

In a huge program like Internet Explorer,

you want to know what the strong components are so that you can

package them together and maybe improve the design.

So, again software, these graphs can be

huge and this kind of graph processing can be extremely important,

improving the design of software.

Now again, this algorithm has

an interesting history along with scheduling and other things,

it's a core problem in operations research that was widely studied,

but really not understood how difficult it was.

In Tarjan's paper in 1972,

was a big surprise that this could be solved in linear time with depth first search.

Now, this algorithm had a couple of other data structures,

and I guess a diligent student

in this class could understand it with quite a bit of work.

But it really demonstrated the importance

of depth first search in great graph processing.

Now, the algorithm that we're going to talk about today,

actually was invented in the 80s by Kosaraju independently by Sharir.

And the story goes that Kosaraju had

to go lecture about Tarjan's algorithm and he forgot his notes.

And he had taught it a number of times.

He was trying to figure out what Tarjan's algorithm did,

and he developed this other algorithm that's extremely simple to implement.

And that's what we're going to look at today.

And actually in the 1990s,

Gabow and Mehlhorn, particularly Mehlhorn,

had to implement this algorithm for a big software package.

And I found another simple linear-time algorithm.

So this story indicates,

even from fundamental problems in graph processing,

there's algorithms out there still waiting to be discovered.

And this algorithm is a good example of that.

So, you get the intuition and the code,

and you see how the algorithm works fully convincing yourself or proving why it works,

is a bit more of a challenge.

We'll leave that mostly for the book,

but let's describe what it is.

So, the first idea is to think about the reverse graph.

So, if we take a graph and we reverse the sense of all the edges,

we're going to get the same strong components.

We need edges in both directions.

So, if we switch the directions,

we're still going to have edges in both directions.

The second concept is called the kernel DAG.

And what that does, is it kind of,

you think about contracting each strong component into a single vertex,

and then just worry about the edges that go from one strong component to another.

So, the digraph that you get that way turns out to be sink-like.

If it was sink-like it would create a bigger,

a couple of strong components into one.

So, if we think about that processing that kernel DAG,

that's the edges that go between strong components,

and we get a DAG,

and we know how to deal with the DAG.

So, the idea of the algorithm is to go ahead and compute the topological order,

or the reverse post order in the kernel DAG,

at least put out the edges of

the original digraph in order

so that all the edges in the kernel DAG point from right to left.

That's like a topological sort.

And then we run DFS.

But instead of considering the vertices in numerical order for the DFS,

we consider them in that reverse topological order.

So, it's extremely easy to implement this algorithm.

Of course we're going to use DFS.

So, let's look at a demo at how the algorithm works.

Okay, so this is a digraph,

and our goal is to compute the strong components.

And we're going to do it in two phases, two DFS's.

One is to compute the reverse postorder in the reverse of the graph,

and the other is to run the DFS,

but for the order in which we visit the vertices,

we use that reverse postorder that we computed.

So, here it goes.

So, first we'll do the DFS in the reverse graph.

So, that's the graph.

That's the reverse graph.

Remember these two have the same strong components.

So, there's our marked array.

We do the DFS and reverse postorder means that when we're done with the vertex,

we put it out.

So, check six unmarked,

check eight unmarked, check six it's marked.

So we're done with eight and so that's the reverse postorder.

And again as before,

I put it on the stack,

but we'll just list them in reverse order in this demo.

So, eight's done, so six has lots of places to go from six.

So, let's check seven.

That's unmarked, so we go to seven,

no place to go from seven,

so we're done with seven.

We put it on the reverse postorder list.

We're also done with six,

because four and nine are coming in in this example,

so put it on the list.

Next place to go from zero is two,

so we check two.

That's unmarked, so we mark it and recurse and we check the vertices adjacent to two.

First four, that's unmarked,

so then we recurse and go to four.

We've got to go to 11 first and recurse.

So now we're at 11, and from 11 we go to nine which is unmarked.

So we have a pretty long recursive stack right here.

So from nine we have to check a bunch of things,

we'll check 12 first,

and then there's a 12, it's unmarked.

From 12 we check 11 which is marked, nowhere to go.

Then we check 10, that's unmarked, so we go to 10.

Visit 10, it's unmarked so we recurse,

and then go to nine, and that's

marked and that's everywhere we get from 10 so we're done with 10.

So we put it on the list in return.

And now we're done with 12,

so we put it on the list and return.

And now at nine,

we have to check seven which is marked,

and six which is marked,

and then we're done, so we put on the list.

Then we're done with 11, we put it on the list.

Then we're finished with four,

so we check six which is marked,

and we checked five which is unmarked,

so we recurse to five.

From five we checked three which is unmarked, so we recurse.

Then we check four which is marked,

we check two which is marked,

and then we're done with three,

so we put it on the list.

And then from five we check zero, it's marked.

So we're down with five,

we put it on the list.

And that means that we're now done with four,

and then we're also done with two,

now after checking three,

and then we go to zero and put it on the list.

So that's all the vertices that you get two from zero.

So we look for more vertices,

and it's one and it's marked.

So that's the last one in the reverse post-order.

So that's a reverse postorder of the reverse graph.

And all we're going to do is take that order,

and use that order to check vertices

at the top level in depth first search of our regular graph.

We have to check all the other vertices to make sure we're done.

So that's phase one.

So now we just do a DFS in the original graph using that order that we just computed.

So we don't start with zero the way we always have,

now we start with one.

We visit one, it's unmarked and now

all the vertices that we visit during

that DFS are going to be in the same strong component.

That's the theorem that makes this algorithm work.

In this case, there's only the ones,

so vertex one is in it's own strong compound with label zero.

So now we get a start.

One's all done, so now we have to start with looking for another vertex to search from,

in this case it's zero,

it's second on the list,

so that's where we start, with zero.

And now all the vertices that we can reach from zero in this graph,

are going to have strong component label one.

So we do the DFS.

So first we get to five,

that's in the same strong component,

we check four, and it's unmarked so we label that,

we check three it's unmarked,

we check five which is marked,

we checked two which is unmarked,

so now we have shown that zero,

two, three, four and five are all in the same strong component.

And now we're going to find both vertices and get two from two are marked,

so we're done with two,

then we're done with three,

four we have to check two which is marked,

and then we're done with four and five.

From zero we can check one but that's already marked.

So that's not relevant for this search.

And then we're done with zero,

and we've established that zero, two, three, four,

and five are all in the same strong component.

So that's the second one.

So now we continue and we check all of those,

and they're all marked.

So the next vertex in the reverse post-order or reverse graph is 11.

So we visit 11,

check four it's already marked,

check 12, it's not so, we mark it,

and these are the third strong component they get labeled with two.

Then we get to nine,

from nine we check 11,

and nowhere to go, then we check 10,

and that's in the strong component.

From 10 we check 12 which is marked,

we're done with 10.

We're done with nine,

we're done with 12, and we're done with 11.

So that's our third strong component 9,

10, 11 and 12.

So we're done with those.

And now we go through the list to find another unmarked vertex.

Nine's marked, 12's marked, 10 is marked,

when we get the six, from six, nine's already marked.

Four is already marked,

eight's not marked so we go there.

From eight we can get the six and that's it.

Zero's already marked, so we can only get to eight from six at this point.

And so that's a strong component.

And then finally we finish up by doing seven.

So the end of the computation gives us

the strong component array which is a unique ID for each of the vertices,

so that two vertices in the same strong component have the same ID,

and that's a sports constant time strong kind of activity checks for clients.

So the bottom line is that this algorithm is very simple,

even though it might be a mysterious algorithm for computing the strong component.

First, run DFS on the reverse graph to compute the reverse post-order,

then run DFS on the original graph considering

the vertices in the order given by the first DFS.

And so these diagrams give a summary of the computation.

I'm not going to spend too much time explaining why and how it works in this lecture,

but these kinds of diagrams give the detail that can

give some intuition about how and why the algorithm works.

For the proof requires some mathematical sophistication,

and you'll find it in the book.

But the programming of it is quite simple,

and prove that it's efficient is all it does is run DFS twice,

but to really see the correctness proof,

you want to look at the second printing of the textbook.

We got it wrong in the first printing.

But look at the implementation.

So this is connected components in an undirected graph with DFS that we did last time,

and I'm sure many of you thought it was one of the simplest algorithms that we looked at.

We just marked the vertices and do recursive DFS,

and that's the end of it.

If you take that code and just had to change the names,

and then just instead of going through the vertices from zero through V minus one,

if you first compute that first order that you get by

doing a depth first search of the reverse graph,

and then you iterate through the vertices in that order,

you get an algorithm for strong connectivity.

It's really remarkable that just changing that one line will

perform this computation that was thought to be difficult for many years,

and many people were learning quite difficult codes for many, many years.

So that's a quick summary of digraph processing.

We talked about single source reachability,

getting the paths from a vertex to

any vertex that can be reached from that vertex with a directed path,

we talked about top logical sort in graphs that have no cycles,

and that uses DFS,

and we talked about computing strong components in a digraph with two DFS's.

Digraph processing is really a testimony to

the ingenuity that's possible in algorithmic graph processing.