So here in Lecture 9.5, we're going to talk about a very famous kind of ASIC placement technology, something called Simulated Annealing. Simulated annealing was originally invented in the early 1980s. It was a tremendously famous technical innovation, and one of the first applications of this technology was actually to integrated circuit placement. Simulated annealing is a kind of hill climbing, a particular kind of controlled, random hill climbing that actually takes its inspiration from, of all places, statistical physics and statistical thermodynamics. I recognize that sounds scary, but the underlying idea and the underlying technology have a very nice and surprisingly simple explanation, and a surprisingly simple pseudocode. Even more important, it gives spectacularly better answers than the very simple random iterative improvement techniques that we showed. So, let's go look at how a simulated annealing placer actually does its job.

I'm starting with another copy of the last slide from the end of the previous lecture, the hill climbing lecture. What we said in that slide was that the two big ideas of this new method for placement with random hill climbing were these: there was a temperature control parameter that we started at a very hot value, and we cooled it very slowly over the course of many, many random swaps that drive the placement process forward. This temperature parameter was a hill climbing control parameter: when the temperature was hot, we were more tolerant of changes in the gate locations that made the placement wirelength worse, and when the temperature was cold, we were not so tolerant. And what tolerant meant was that we actually used some random numbers in some interesting ways to calculate some probabilities, to decide whether we should or should not keep a gate swap that made things worse. This is an instance of a very famous general optimization method with a very famous name: Simulated Annealing. So in this lecture we are going to talk about simulated annealing in the context of placement, but we're also going to give you a little background about why this thing has this name.

So let's just talk about a little context here. Here's a physical analogy for you. Suppose you want to make a perfect crystal. What does perfect mean for a crystal? It means that all the atoms are lined up on their crystal lattice sites. And that's actually the lowest energy state; that's what the atoms really want to do. I've got a little example over here of a crystal that has lousy order. It is a disordered crystal. I've got basically a little 8 by 8 grid of pink circles that are trying to look like atoms, and then I've got four little red atoms that are in strange locations. They're clearly not where they want to be. This disordered state has a higher energy. How do you take this physical system and actually make it a really good, high quality crystal, a low energy crystal, a crystal where all the atoms are on their lattice sites? The answer is: you get the material really, really hot, so the atoms have the energy to move around, even to bad places, which is to say, places not on their crystal lattice sites. And then you cool it very, very slowly, so the atoms settle into good locations.
The idea is that when there's a lot of energy, the atoms can move around even to bad places, and when it's not hot, the atoms have to stay roughly where they are. By getting it really hot and cooling it really slowly, you actually encourage it to go from this highly disordered state to an ordered state. This process has a name: it's called annealing. So, you anneal it: you get it really hot and you cool it really slowly, and hopefully you get a crystal that, well, is probably not going to be perfect and probably not going to be minimum energy, but it's going to be really good. It's going to have very few dislocations in the crystal lattice. So that's physically how you do it.

What if you actually wanted to write a computer program to simulate that? What kind of physics do you need to know to make that actually work? Well, you would have to have some kind of a data structure to represent the crystal lattice, and that could just be a bunch of atoms with x and y (or x, y, and z) coordinates; I'm just going to draw it in two dimensions here. And you would have, just like our placer had, an inner loop where you randomly perturb things and evaluate the change in a number that matters to you. For the placer, we randomly grabbed two gates, swapped their locations, and the number that mattered was the wirelength. For the physical system, you grab an atom and you move it someplace, that's what I'm showing here: you grab the blue atom and you move it someplace, and the thing you calculate is the energy, in particular the change in energy, because that's what the crystal cares about. And you also need to know how hot it is, because whether or not this move is a likely thing to happen depends on how hot it is.

So let's ask a question. To model a real physical crystal, where we grab an atom, move it to a new location, and calculate the change in energy, delta E: what is the probability that that move actually should happen in my computer program? The answer is 1 if delta E goes down. If this change in the atomic location makes the energy better, you should keep it. That's a good thing to do; that's a physically reasonable thing to do. What if the energy goes up? What if you grab the atom and move it to a worse place in terms of the overall crystal lattice organization, so delta E is bigger than zero? Now, in your computer program, what's the probability that you ought to make that happen? Interestingly, the answer is e to the minus delta E over kT, that is, P = e^(-ΔE/kT). That's the physics of the real universe for taking that atom and putting it someplace where the energy got locally, microscopically, at the atomic level, a little worse. Delta E is the change in the energy, T is the temperature, and k, amazingly enough, is the Boltzmann constant.

In real physics the units all have to line up. If you took a physics class in your life, you probably got yelled at by some professor, lecturer, instructor, or teaching assistant because your units didn't add up. You can't take newtons and add meters; it's just not okay. Similarly here, the thing in the exponent has to be without units, so the units of the thing on the top and the thing on the bottom all have to cancel. And if the thing on top is an energy, and the thing on the bottom is a temperature, there has to be something that links the way we measure temperature to the way we measure energy.
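To make that concrete, here's a minimal sketch of that acceptance rule in Python. The function name and arguments are mine, just for illustration, but the constant and the formula are the real physics:

```python
import math
import random

# Boltzmann's constant links the units of temperature to the units of energy.
K_BOLTZMANN = 1.380649e-23  # joules per kelvin

def accept_move(delta_e, temperature):
    """Should a proposed atom move happen? delta_e in joules, temperature in kelvin."""
    if delta_e <= 0:
        return True                                   # energy got better: always keep it
    p = math.exp(-delta_e / (K_BOLTZMANN * temperature))
    return random.random() < p                        # uphill: keep with probability e^(-dE/kT)
```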
For this to give the real probability that the real atom makes this real physical move, you need that linking constant, and amazingly enough, it's a famous number, a famous physical constant: the Boltzmann constant. That's Ludwig Boltzmann over there on the right, in a picture from Wikipedia. The Boltzmann constant, in real physics, converts the units of temperature to the units of energy. What Ludwig was doing back there in the 19th century was pioneering some of the initial mathematics of statistical thermodynamics, to be able to write these equations about what happens, with what probabilities, at what energies. This is a long time ago. What's amazing is that this e-to-the-minus-energy-over-temperature thing is sort of the foundational model of how the probabilities work.

Now, I caution you that you don't need a k when you're doing a placement, because you don't have a physical placement. Nobody cares what the units of wirelength and temperature are. As far as you're concerned, pretend the temperature is measured in meters, right? It doesn't matter. It's all artificial. It just has to be a probability, a number between 0 and 1, and we tune the parameters so that everything works. But if you're doing real physics, with real atoms and real crystals, and you want real probabilities with real energies, the units all have to work, and that's where Ludwig comes in.

This new method is called simulated annealing. It's a very famous general optimization method: you can optimize lots of different things with it, and it is widely used in VLSI CAD in particular. It was invented at IBM in the early 1980s by Kirkpatrick, Gelatt, and Vecchi, in a very famous paper from Science magazine, which is a very interesting paper to read; I recommend it if you can get yourself a copy. In fact, some of the earliest examples that our friends at IBM used (I know Scott pretty well, in fact) were actually electronic design examples, chip design examples. They were actually designing parts of IBM computers back there in the early 1980s, and getting fabulously good results. And it's this idea of annealing something, getting it really hot, allowing both downhill changes that make things better and uphill changes that make things worse, with a probability controlled by the temperature, but applying it in the form of software, of a computer program, to things that are not physical systems. Things like gates moving around on the surface of chips.

So, here's some pseudocode for a simulated annealing placer, and it looks kind of complicated. What I want to do is read it from the inside out, to convince you that it's actually something very close to what you already know. So, ignore the stuff with color behind it: ignore the yellow stuff and ignore the kind of pink-brown stuff, and let's just start with the simple white parts here. Up at the top it says: start with a random initial placement. Okay, that's just like the greedy random iterative improvement stuff that you know. Then it's got this part that says: for s equals 1 to m times the number of gates. Here m is just how many swaps we're going to do per gate, so pick a number, say 1,000: I'm going to do 1,000 times the number of gates swaps, and that's going to be my placement. Then look at the code inside the for loop: swap two random gates gi and gj; calculate delta L, the change in the wirelength; if delta L is less than 0, accept the swap; else, undo it (that's the part with the brown background we're ignoring for the moment). Well, great: that's just the greedy placer.
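In other words, the white core by itself is just the greedy placer from the previous lecture. Here's a minimal runnable sketch of it, assuming a `placement` dict from gates to grid slots and a `wirelength` function you supply; those names are mine, not from the slide:

```python
import random

def greedy_place(placement, gates, m, wirelength):
    """Greedy random iterative improvement: keep only the downhill swaps.

    placement: mutable dict of gate -> (x, y) grid slot.
    gates: list of gate names.
    m: number of swap attempts per gate.
    wirelength: function(placement) -> total half-perimeter wirelength.
    """
    for _ in range(m * len(gates)):
        gi, gj = random.sample(gates, 2)                              # two random gates
        before = wirelength(placement)
        placement[gi], placement[gj] = placement[gj], placement[gi]   # try the swap
        delta_l = wirelength(placement) - before                      # change in wirelength
        # (a real placer computes delta_l incrementally instead of re-summing)
        if delta_l >= 0:                                              # no better: undo it
            placement[gi], placement[gj] = placement[gj], placement[gi]
    return placement
```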
What happens now when we add the simulated annealing stuff? The first thing we do is add this inner-loop stuff that says: if the wirelength went up, if delta L is greater than zero, then we generate a uniform random number r (that's this thing here), we calculate the probability p, which is e to the minus delta L over T, and we ask if r is less than p. If that's true, then we accept this uphill swap. It's a new, worse placement, but our hill climbing control parameter T says it's okay to do. All right, so that's the probabilistic swap acceptance.

Then the yellow stuff is the new outer loop, the annealing temperature control loop. That says you start with the temperature being hot, and then you have basically a Boolean, just a flag, that says frozen is false, and while not frozen, you do a whole bunch of gate swaps at this temperature. At the end it says: if the overall wirelength is still decreasing over the last few temperatures, then okay, we're still annealing; what do you do? You make the temperature a little cooler, because as the annealing runs you have to bring the temperature down so you don't keep taking all the big uphill moves. So this is an arbitrary number: T becomes 0.9T. We cool it. And if instead the total wirelength has stopped changing, say it hasn't changed by more than one hundredth of a percent over the last five temperatures, then okay, we're done; we're just not going to find any more improvement in the wirelength. We set frozen to true, we drop out of the loop, and we return the final placement as the best solution. So it's not that much different: it's just the temperature outer loop, the yellow stuff in this diagram, and the brown-pink stuff, the e-to-the-minus-delta-L-over-T, go-uphill-with-a-certain-probability stuff, that's really new here.
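Putting the white, pink, and yellow pieces together, here's a minimal runnable sketch of the whole annealer in Python, under the same assumptions as the greedy sketch above. The hot starting temperature, the 0.9 cooling factor, and the one-hundredth-of-a-percent-over-five-temperatures stopping test are exactly the arbitrary tuning knobs just described, not magic values:

```python
import math
import random

def anneal_place(placement, gates, m, wirelength, t_hot=800.0):
    """Simulated annealing placer: the greedy loop plus the two colored additions."""
    t = t_hot                            # outer loop: start hot
    history = []                         # wirelength at the end of each temperature
    frozen = False
    while not frozen:
        for _ in range(m * len(gates)):                                   # many swaps per temperature
            gi, gj = random.sample(gates, 2)
            before = wirelength(placement)
            placement[gi], placement[gj] = placement[gj], placement[gi]   # try the swap
            delta_l = wirelength(placement) - before
            if delta_l > 0 and random.random() >= math.exp(-delta_l / t):
                # uphill swap that failed the Metropolis test: undo it
                placement[gi], placement[gj] = placement[gj], placement[gi]
            # downhill swaps, and the lucky uphill ones, are simply kept
        history.append(wirelength(placement))
        if len(history) >= 5 and history[-5] - history[-1] <= 1e-4 * history[-5]:
            frozen = True                # wirelength stopped improving: we're done
        else:
            t *= 0.9                     # still improving: cool a little and keep going
    return placement
```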
This probabilistic acceptance-of-swaps stuff is really famous. This little snippet of code, "if a uniform random number r is less than e to the minus delta L over T, then accept the swap, else undo the swap," is a version of a very famous idea: the idea that randomly accepting a perturbation of a system with a specific probability will correctly simulate the physics of that real system. This has a name. It's called the Metropolis criterion, and it's named after this guy, Nicholas Metropolis, Nick Metropolis, a very famous guy. What was Nick doing when he came up with this? Nick was working with some very famous people, like John von Neumann and Stanislaw Ulam, a very famous mathematical physicist, way back at the dawn of atomic physics in the 1950s, when people had computers the size of football fields built out of thousands of vacuum tubes, and they were trying to write some of the first computer programs to simulate the physics of atoms doing interesting things. These are the guys who invented the first Monte Carlo simulations, if you've ever heard that term. This sort of inner-loop mechanism, generating particular probability distributions and checking whether the perturbation you make does or doesn't fit within some probabilistic acceptance criterion, is really famous. So when you see computer algorithms that are based on random acceptance methods for how things make progress, you'll often see the name Metropolis show up: the Metropolis criterion, Metropolis algorithms. There are lots of interesting places we use this idea.

How does this Metropolis criterion make our overall algorithm behave? The answer is: in some very unusual ways, ways you probably haven't experienced before. It really is random. You might accept the swap; you might not accept the swap. It depends on the random number you generate. The way to explain that is maybe this: suppose the temperature is 10 and delta L is plus 20. Then e to the minus delta L over T is e to the minus 2, which is about 0.14, so there's about a 14% probability you accept the swap. What does that mean? Let me give you a concrete example. Suppose at this temperature you're trying a really large number of swaps in the placement, say a million swaps, and suppose it turns out that 5,000 of those different swaps all happen to evaluate such that they change the wirelength by delta L equals plus 20. Then roughly 14% of those swaps that all have a wirelength degradation of 20, about 0.14 × 5,000, or roughly 700 of them, will be accepted. A random 700 of those 5,000, which all change the placement in exactly the same way, they all make the placement 20 units of wirelength longer: about 700 of them will get accepted, and about 4,300 of them won't. And if you change the random number seed so that you get a different stream of random numbers, and you run your placer again, you will get a different set of placement swaps proposed. Let's assume you again try about 5,000 of them that have delta L of 20: you will still expect about 700 of them to be accepted, but it'll be a completely different set of swaps, with a completely different set of results.
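You can check that arithmetic, and watch the seed-dependence, with a few lines of Python; the temperature and delta values are the ones from the example, and the seed is arbitrary:

```python
import math
import random

T, delta_l = 10.0, 20.0
p = math.exp(-delta_l / T)       # e^-2 ~ 0.135, the "about 14%" in the example
print(round(p, 3))               # 0.135

# Of ~5,000 proposed swaps that each worsen wirelength by 20, we expect about
# p * 5000 ~ 677 ("about 700") to be accepted. WHICH ones get accepted depends
# entirely on the random stream, so a new seed gives a different ~700 swaps
# and a genuinely different placement trajectory.
random.seed(42)
accepted = sum(random.random() < p for _ in range(5000))
print(accepted)                  # some count near 677; changes with the seed
```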
It's a very different universe when you have probabilistic algorithms like this, and the thing that's amazing is how well this stuff works. So, how well does it work? The answer is: it works great. There is nothing else to say other than it works great. This is the same 2,500 gate benchmark, and this is an annealer that looks very much like the annealer pseudocode I showed you a couple of slides ago. The hot temperature is 800; the m number, how many swaps per gate, is 100, so we try 2,500 × 100 = 250,000 swaps per temperature, and then we lower the temperature. This is the curve of the progress. The first thing I'll say is that we got about 30% less total half-perimeter wirelength than the random iterative improvement algorithm, and that's a gigantic number. That's a very big deal.

Let me tell you how you read an annealing curve, because this is a pretty standard looking annealing curve. The x axis is temperature, and the thing that's interesting is that it's a log scale: the temperature goes 0.1, 1, 10, 100, 1,000 from left to right. The other thing to remember is that you read the curve from right to left, because the temperature starts hot and ends cold. The vertical axis is the wirelength, the thing we're optimizing. And what you always see is this amazing, beautiful S-shaped curve. It's always got these different sorts of regions on the curve, which are often called regimes. There's this regime up here where things are basically hot, and what's really going on is that we are randomizing things: when it's really, really hot, you're basically just randomly exploring the solution space. Then you've got this part where you're actually making real, serious progress on the quality of the placement, and what you see is that it's just dropping like a rock. The placement is going from being very, very bad, actually much worse than the one random placement we started with in the last example I showed you, to being very, very good. And then down at the end here, there is this regime where things are basically frozen. Things are so cold that it really is random iterative improvement: there's not a lot of progress happening; you're just taking iterative gate swaps that improve things, with almost no hill climbing. And the thing that's amazing is that, as soon as you get a placement problem big enough, they all look like this. The numbers are different: the temperatures on the horizontal axis and the wirelengths on the vertical axis will differ. But I could show you this 2,500 gate benchmark, a 10,000 gate benchmark, a 100,000 gate benchmark, a quarter-of-a-million gate benchmark, and if I erased the numbers on the axes, they would all look like this. It's a really amazing kind of behavior.

So here's another interesting view of the dynamics of how a simulated annealing placer actually does its job. This is a placement of 6,500 gates running, and what I'm showing you at the right is a map of the congestion. The way to understand what that is, is to remember that the placement happens on a grid (I'm just drawing a very simple-minded grid here, one gate in every square), and that we evaluate a net by its half-perimeter wirelength, whose bounding box is really just a box on that grid. So I'm just coloring some squares in: okay, that's a box in this grid; that's a bounding box for a net. What we do is apply a very simple mathematical function: for every bounding box, for every net whose length we're evaluating, we calculate a number that models the likelihood that the physical wire we route for that net wants to use each square in the grid underneath its box. It's related to how big the bounding box is, and to how many gates there are on the net that defines that bounding box. We calculate this simple function for every bounding box, and we add that number to every cell of the grid the box covers. So if there are a lot of bounding boxes, representing a lot of nets, covering a particular square of the grid, there's a lot of demand for that region of space. There's potentially a lot of congestion, and congestion is bad: there are only so many wires you can put in each square of the grid. What we're plotting is this function evaluated at the end of every temperature of the annealer. So what you're seeing at the right is the first frame of a movie of this heat map of net congestion, and what you see is a whole bunch of red, some orange, and a lot of yellow. That's bad.
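To make that model concrete, here's a minimal sketch of a congestion map like the one just described. The lecture doesn't give the exact per-cell demand function, so the pins-divided-by-area choice below is just one simple function consistent with the description (bigger boxes spread their demand thinner):

```python
def congestion_map(width, height, nets, placement):
    """Heat map of routing demand built from net bounding boxes.

    nets: list of nets, each net a list of gate names.
    placement: dict of gate -> (x, y) grid cell.
    """
    grid = [[0.0] * width for _ in range(height)]
    for net in nets:
        xs = [placement[g][0] for g in net]
        ys = [placement[g][1] for g in net]
        x0, x1 = min(xs), max(xs)
        y0, y1 = min(ys), max(ys)
        cells = (x1 - x0 + 1) * (y1 - y0 + 1)   # grid squares under the bounding box
        demand = len(net) / cells               # demand per square: one simple choice
        for y in range(y0, y1 + 1):             # smear the demand over every covered square
            for x in range(x0, x1 + 1):
                grid[y][x] += demand
    return grid   # plot this at the end of each temperature and you get the movie
```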
All that red and orange means a lot of bounding boxes, representing a lot of nets, all on top of each other, all wanting to use the same space. What we're going to show in this movie is what happens as the annealer actually runs. So here's the movie of the congestion as the annealer reduces the wirelength. Remember, the annealer doesn't really understand congestion. You can put these kinds of things into the formulation of the annealer (you don't have to anneal just the wirelength), but this particular version is annealing only the wirelength. And one of the things you're seeing is that just by making the wirelength better, by making the gates closer together and the wires and their estimated lengths shorter, we're getting almost all the red out. There's a little bit of orange you can see way up on the top on the left; we're getting most of the red out and almost all the orange out, with a few little yellowy hot spots left. This is dramatically better. What you're seeing is that most of the wires are now short, most of the bounding boxes are now small, and there aren't many places where lots and lots of bounding boxes sit on top of each other. In particular, there's not that giant hot spot at the top of the chip. This is a really cool result showing how simulated annealing can do a really good job of improving the quality of something like a placement.

I also want to address some frequently asked questions, because this just happens all the time: people see simulated annealing, they often get very excited because it's a very cool algorithm, and there's a bunch of questions they ask, and it's often the case that people have misconceptions. So let me just go through these.

First, does simulated annealing always find the perfect, best global optimum? No. Gosh, we would all be out of jobs if that were true. If you could take any combinatorial optimization problem, apply a simulated annealing algorithm, and get a perfect answer in, you know, an hour or overnight, none of us in computer science would have jobs. No, it doesn't find the best global answer; it's just very good at avoiding a lot of bad answers, really very good at avoiding a lot of bad local minima.

Second question: does annealing work on every type of optimization problem? No. It does work on very many, but it's not always the most efficient option. It is often the one people try first, because it's often not very hard to build a crude annealer to just go see if you can make some progress on your problem, but it's not always the right, most efficient answer.

Third one: is annealing always slow, doing all those many swaps over many temperatures? If you remember the slide from a couple of slides ago with the actual annealing curve, that annealer was doing a quarter of a million swaps at every temperature, and it was doing a lot of temperatures. No, annealing is not always slow. There are a lot of engineering tricks to make it go fast, but yes, for some big problems, it can be slow.

Fourth: do I have to set all those parameters, how hot the initial temperature is, and m, how many swaps per gate at every temperature? And there was a line in that pseudocode that says the new temperature is 90% of the old temperature. Do I have to set all those things by trial and error? No.
There are fancy adaptive techniques to determine these things automatically. There are actually lots of interesting papers on how you automatically control a simulated annealer based on the dynamics of the problem, measured as the annealer itself runs.

And finally: what happens if I run my annealer several different times? What if I change the random seed that starts the random number generator every time? The answer is, you get a different random answer every time. Probably those answers cluster in terms of the quality of their solutions, but if you run it enough times, you might find a really bad answer; it just happens. And if you run it enough times, you might find a really good answer, one a little better than the others. So I'm going to let you in on a dirty little secret of all real simulated annealing algorithms in commercial use: you never, ever run it once. You always, always, always run it at least twice, and you take the best answer.

So here's my high-level summary for simulated annealing for placement. Simulated annealing is a very famous and successful method. It is much better than dumb random improvement, and really, it was the dominant method for placers through most of the late 1980s and some of the 1990s. Still today there are lots of other CAD tasks where it's very important, and it's just important to understand what a simulated annealing placer does. But I hate to tell you this: it's not really the way we do placers today. It's a very important technique to know for a lot of reasons. Annealing works pretty well up to, oh, 100,000 gates; at a quarter of a million gates it's working okay. At 400,000 or 500,000 gates? Really, it's just too slow, not very efficient, and honestly, there are some other techniques that just give better answers, especially if you want to do a million gates, 2 million, 5 million, 8 million, 10 million gates. There are just other ways of doing this, and that's what we're going to talk about next. So: annealing is too inefficient for things with a million or more gates. We need another set of interesting ideas, and that's what we're going to do next.