I just briefly wanted to sum up here that deadlock is actually not an enemy.
You can try to use deadlock to your benefit, or just live with
deadlock. Just because your protocol can deadlock
does not mean you can't try to recover from it.
So what I mean by this is deadlock avoidance can be very expensive.
So when you try to draw all the possible graphs here, you should think about
where deadlocks happen in your systems, plan them out, and make sure
they don't happen often. But some of the things you need to do to
try and cut edges here might be too restrictive.
So for instance, dimension-ordered routing, where you only route in the X
dimension and then in the Y dimension, could be very expensive relative to an
adaptive routing scheme, where you try to route around congestion in your network.
So with deadlock avoidance, where you design the protocol to
never deadlock, this could be quite expensive.
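
To make the restriction concrete, here is a minimal sketch of dimension-ordered (X then Y) routing on a 2D mesh. The function name `xy_route` and the coordinate convention are assumptions made for illustration, not something from the lecture. The point is that the path is fully determined by source and destination, so the router has no freedom to steer around a congested link.

```python
def xy_route(src, dst):
    """Return the hops from src to dst, moving fully in X before moving in Y.

    src and dst are (x, y) mesh coordinates (an assumed convention).
    """
    x, y = src
    dst_x, dst_y = dst
    path = []
    # Phase 1: correct the X coordinate only.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: only once X matches, correct the Y coordinate.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))  # [(1, 0), (2, 0), (2, 1)]
```

Because a packet never turns from Y back into X, certain channel dependencies can never form, which is what removes the cycles. The cost is that every packet between the same two nodes takes the same path, even when that path is congested.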
So an alternative, and this is something you should not necessarily be afraid of,
but you have to be careful with, is that you find the deadlock and then you
try to recover from it. So what I mean by this is, in an on-chip network
or a multi-chip network, you can actually have a deadlock occur and
notice that a deadlock occurred. Then you can try to roll back the state and
somehow jitter the state so the deadlock won't happen a second time.
Or, many times these deadlocks happen because you're dependent on one particular
buffer in the system and two parties are trying to acquire that buffer.
So you can try to virtualize that buffer, effectively saving the state of that
buffer in memory and adding more state.
So with a lot of these protocols, if you were to add more FIFO entries into the
system, it would actually break the deadlock.
So, going back to this picture here,
if I added another, let's say, FIFO here that the inbound
traffic went into, and this other traffic could then bypass it, it would
actually cut one of these edges, and all of a sudden you would not have a
deadlock. So you can think about trying to have a
deadlock recovery mechanism; that's an example of it.
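
The idea of cutting an edge can be sketched as a cycle check on a wait-for graph. This is only an illustration: the lecture's picture is not reproduced here, so the two resource names below (`inbound`, `outbound`) and the helper `has_cycle` are hypothetical. A cycle in the graph means a possible deadlock, and an extra FIFO that absorbs the inbound traffic is modeled by removing one dependency edge.

```python
def has_cycle(wait_for):
    """Return True if the directed wait-for graph contains a cycle.

    wait_for maps each resource to the resources it is waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in wait_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: we returned to the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(wait_for))


# Hypothetical case: each side waits on the buffer held by the other.
deadlocked = {"inbound": ["outbound"], "outbound": ["inbound"]}

# With an extra FIFO, inbound traffic no longer waits on outbound.
with_extra_fifo = {"inbound": [], "outbound": ["inbound"]}
```

Here `has_cycle(deadlocked)` is true and `has_cycle(with_extra_fifo)` is false, which is the graph-level meaning of "adding a FIFO cuts one of these edges."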
So a good example of this is actually in the Raw microprocessor from MIT, where
you can actually have that happen on the on-chip network.
It's dimension-ordered routed, but you can still have message-dependent deadlocks.
So what I mean by that is, while the network itself is not going to deadlock,
the traffic flows you have on top of it, for instance your memory coherence
protocol on top of that, could potentially deadlock.
So how do you recover from that? Well, you can actually have a counter,
a timer which goes off when you determine that the network has not moved for 1,000
cycles. So all of a sudden you're running along
and no traffic moves on your network. Well, that's a pretty good indicator that
you have a deadlock condition, because something should be flowing.
There should be some forward progress guarantees.
And if you notice that, an interrupt will go off.
So this timer will go off saying nothing has moved on my network for 1,000 cycles,
this is probably a deadlock. So then you could take an interrupt, and
software can go look at all the data in the network and introduce more
buffering into the network. So, effectively, software tries to
virtualize a particular FIFO entry, and that can break the deadlock.
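
The timer-plus-interrupt scheme just described can be sketched roughly as follows. This is not the Raw implementation; the class `DeadlockWatchdog`, the `pending` flag, and the handler `spill_fifo_to_memory` are all names invented for this sketch. It also assumes the counter only runs while traffic is actually pending, so that an idle network is not mistaken for a deadlocked one.

```python
from collections import deque


class DeadlockWatchdog:
    """Fires a handler after `threshold` consecutive cycles with no movement."""

    def __init__(self, on_timeout, threshold=1000):
        self.on_timeout = on_timeout  # stands in for the interrupt handler
        self.threshold = threshold
        self.idle_cycles = 0

    def tick(self, flits_moved, pending=True):
        """Call once per cycle; returns True if the handler fired this cycle."""
        if flits_moved > 0 or not pending:
            self.idle_cycles = 0  # forward progress, or nothing to move
            return False
        self.idle_cycles += 1
        if self.idle_cycles < self.threshold:
            return False
        self.idle_cycles = 0
        self.on_timeout()
        return True


def spill_fifo_to_memory(hw_fifo, memory_buffer):
    """Software 'virtualization': drain a blocked hardware FIFO into memory."""
    while hw_fifo:
        memory_buffer.append(hw_fifo.popleft())


hw_fifo = deque(["msg0", "msg1"])  # hypothetical contents of the blocked FIFO
memory_buffer = []
watchdog = DeadlockWatchdog(
    on_timeout=lambda: spill_fifo_to_memory(hw_fifo, memory_buffer),
    threshold=1000,
)
# Simulate 1,000 cycles in which traffic is pending but nothing moves.
fired = [watchdog.tick(flits_moved=0) for _ in range(1000)]
```

In this run the handler fires on the 1,000th stalled cycle and moves both messages into `memory_buffer`, freeing the FIFO. The sketch assumes memory always has room for the spilled entries, which is exactly the kind of resource guarantee a real design would have to prove.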
So there are ways to recover from deadlocks.
If your network is, for instance, used as a message passing network, you can just use
memory to go do that. So actually on the Raw processor, for our
memory coherence protocol we used deadlock avoidance, and our message
passing networks used deadlock recovery.
Later on, in Tilera, we actually have mixtures of these two, depending on which
memory network it is. And you might say that sounds scary,
but if you walk through the proofs and are very careful about it, and you are
guaranteed that you don't need any more resources to go break the deadlock, you
may be okay. But yeah, some say it's
playing with fire. You've got to be a little careful with
these sorts of solutions when trying to play with deadlock recovery.
But if you're guaranteed that you're not going to need any more resources, you can
just resolve it right then. And
so the reason you would want to use deadlock recovery is that it does not restrict
your protocol. And by doing that, you can have the
common case of your protocol be very, very fast.
And you could make it so that the deadlock almost never happens,
or for practical purposes never happens.
Then, when the deadlock does happen, you take a little bit of a
performance hit there, because you have to virtualize it in
software. But the probability of it happening is so
low.