0:00

Previous videos covered an outstanding algorithm for the selection problem, the problem of computing the i-th order statistic of a given array. That algorithm, which we called the RSelect algorithm, was excellent in two senses. First, it's super practical: it runs blazingly fast in practice. But it also enjoys a satisfying theoretical guarantee: for every input array of length n, the expected running time of RSelect is O(n), where the expectation is over the random choices of pivots that RSelect makes during execution. Now, in this optional video, I'm going to teach you yet another algorithm for the selection problem. Why bother, given that RSelect is so good? Well, frankly, I just can't help myself. The ideas of this algorithm are just too cool not to tell you about, at least in an optional video like this one. The selection algorithm we cover here is deterministic; that is, it uses no randomization whatsoever. And it's still going to run in linear time, O(n) time, but now in the worst case, for every single input array. Thus, in the same way MergeSort gets the same asymptotic running time, O(n log n), that QuickSort gets on average, this deterministic algorithm will get the same running time, O(n), that the RSelect algorithm gets on average. That said, the algorithm we're going to cover here, well, while it's not slow, it's not as fast as RSelect in practice, both because the hidden constants in it are larger and because it doesn't operate in place.

For those of you who are feeling keen, you might want to try coding up both the randomized and the deterministic selection algorithms and make your own measurements of how much better the randomized one seems to be. But if you have an appreciation for beautiful algorithms, I think you'll enjoy these lectures nonetheless. So let me remind you of the problem. This is the i-th order statistic problem. We're given an array with n distinct entries; again, the distinctness is for simplicity. And we're given a number i between 1 and n.

We're responsible for finding the i-th smallest number, which we call the i-th order statistic. For example, if i is something like n/2, then we're looking for the median. So let's briefly review the randomized selection algorithm; we can think of the deterministic algorithm covered here as a modification of that randomized algorithm, RSelect. When that algorithm is passed an array of length n, and you're looking for the i-th order statistic, as usual there's a trivial base case. But when you're not in the base case, just like in QuickSort, what you do is partition the array around a pivot element p. Now, how do you choose p? Just like in QuickSort, in the randomized algorithm you choose it uniformly at random, so each of the n elements of the input array is equally likely to be chosen as the pivot. Call that pivot p. Now, do the partitioning. Remember, partitioning puts all of the elements less than the pivot to the left of the pivot; we call that the first part of the partitioned array. Anything bigger than the pivot gets moved to the right of the pivot; we call that the second part of the array. And let j denote the position of the pivot in this partitioned array.
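Before continuing, here's how that partitioning step might look in code, with j as the pivot's final position. This is my own illustrative sketch (a Lomuto-style partition pivoting on the first element), not the lecture's exact pseudocode:

```python
def partition(a, left, right):
    """Partition a[left..right] in place around the pivot a[left].
    Returns j, the pivot's final index: everything to its left is
    smaller, everything to its right is bigger."""
    pivot = a[left]
    i = left + 1                       # boundary: a[left+1 .. i-1] < pivot
    for k in range(left + 1, right + 1):
        if a[k] < pivot:
            a[i], a[k] = a[k], a[i]    # grow the "less than pivot" region
            i += 1
    a[left], a[i - 1] = a[i - 1], a[left]   # place pivot between the parts
    return i - 1                       # the pivot's position j
```

Note that this works in place with only constant extra memory, which is part of why the randomized algorithm is so fast in practice.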

Equivalently, let j be whatever order statistic the pivot happens to be. So if we happen to choose the minimum element, j is going to be equal to 1; if we happen to choose the maximum element, j is going to be equal to n; and so on. Now, there's always the lucky case, with probability 1/n, that we happen to choose the i-th order statistic as our pivot. We find that out when we notice that j equals i. In that super-lucky case, we just return the pivot and we're done; that's what we were looking for in the first place. Of course, that's so rare it's not worth worrying about, so really the two main cases depend on whether the pivot we randomly chose is bigger or smaller than what we're looking for. If it's bigger, meaning j is bigger than i (we're looking for the i-th smallest, and we randomly chose the j-th smallest), then remember we know the i-th smallest element has to lie to the left of the pivot, in that first part of the partitioned array. So we recurse there: an array with j - 1 elements in it, everything less than the pivot, and we're still looking for the i-th smallest among them. In the other case, which was covered in a quiz a couple of videos back, if we chose a pivot element that's less than what we're looking for, then we should discard everything less than the pivot, and the pivot itself. So we should recurse on the second part of A, the stuff bigger than the pivot; we know that's where what we're looking for lies. And having thrown away j elements, the smallest ones at that, we're recursing on an array of length n - j, looking for the (i - j)-th smallest element in that second part. So that was the randomized selection algorithm, and you'll recall the intuition for why it works: random pivots should usually give pretty good splits. The way the analysis went is that in each recursive call, with 50 percent probability, we get a 25/75 split or better. Therefore, on average, every two recursive calls we pretty aggressively shrink the size of the recursive call, and for that reason we should get something like a linear-time bound. We do almost as well as if we picked the median in every single call, just because random pivots are a good enough proxy for the best-case pivot, the median. So now the big question is: suppose we weren't permitted to make use

of randomization. Suppose this choose-a-random-pivot trick were not in our toolbox. What could we do? How are we going to deterministically choose a good pivot? Let's quickly remember what it means to be a good pivot: a good pivot is one that gives us a balanced split after we partition the array. That is, we want as close to a 50/50 split between the first and second parts of the partitioned array as possible. Now, what pivot would give us a perfect 50/50 split? Well, that would be the median. That seems like a totally circular observation, because the median may be exactly what we're trying to find in the first place. Previously we were able to be lazy: we just picked a random pivot and used it as a pretty good proxy for the best-case pivot. But now we have to have some subroutine that deterministically finds us a pretty good approximation of the median. And the big idea in this linear-time selection algorithm is to use what's called the "median of medians" as a proxy for the true median of the input array. When I say median of medians, you're not supposed to know what I'm talking about yet; you're just supposed to be intrigued. Let me explain a little further. Here's the plan: we're going to have a new implementation of ChoosePivot, and it's going to be deterministic. You will see no randomization on this slide, I promise. The high-level strategy is to think about the elements of this array like sports teams and run a two-round knockout tournament, and the winner of this tournament is who we return as the proposed pivot element. (Then we'll have to prove that this is a pretty good pivot element.) So there are two rounds in this tournament. Each element, each team, first participates in a group stage, if you will: small groups of five teams each, five elements each. And to win your first round, you have to be the middle element out of those five. That gives us n/5 first-round winners. And then the winner of the second round is going to be the median of those n/5 winners from the first round. Here are the details. The first step isn't really something you actually do in the program; it's just conceptual. Logically, we take this array capital A, which has n elements, and think of it as comprising n/5 groups of five elements each. (If n is not a multiple of five, obviously, there will be one extra group of size between one and four.) Now, for each of these groups of five, we compute the median, the middle element of those five. For five elements, we may as well just invoke a reduction to sorting: we sort each group separately and then use the middle element, which is the median. It doesn't really matter how you do the sorting, because, after all, there are only five elements; let's use merge sort, what the heck. Now what we're going to do is take our first-round winners and copy them over into their own private array. This next step is the one that's going to seem dangerously like cheating, dangerously like I'm doing something circular and not actually defining a proper algorithm. The array C, you'll notice, has length n/5; we started with an array of length n, so this is a smaller input. So let's recursively compute the median of this array capital C. That is the second round of our tournament, amongst the n/5 first-round winners, the n/5 middle elements of the sorted groups. We recursively compute their median, that's our final winner, and that's what we return as the pivot element from this subroutine.
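The two-round tournament just described might be sketched as follows. This is a hypothetical rendering with my own names; the parameter `select(lst, i)` stands in for any subroutine returning the i-th smallest element of `lst` (in the real algorithm, that's the recursive call to the selection procedure itself):

```python
def choose_pivot(a, select):
    """Median-of-medians pivot sketch. `select(lst, i)` returns the
    i-th smallest element of lst, 1-indexed."""
    # Step 1: conceptually break a into groups of 5 (the last may be smaller).
    groups = [a[k:k + 5] for k in range(0, len(a), 5)]
    # Step 2: sort each tiny group and grab its middle element.
    c = [sorted(g)[(len(g) - 1) // 2] for g in groups]   # first-round winners
    # Step 3: the pivot is the median of the ~n/5 first-round winners.
    return select(c, (len(c) + 1) // 2)
```

For example, passing a naive sorting-based selector for `select` makes this runnable on its own, while in the deterministic algorithm the median of C is computed by recursing.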

Now, I realize it's very hard to keep track of both what's happening internal to this ChoosePivot subroutine and what's happening in the calling function, our deterministic selection algorithm. So let me put them both together and show them to you, cleaned up, on a single slide. Here is the proposed deterministic selection algorithm. This algorithm uses no randomization: previously, the only randomization was in choosing the pivot, and now that we have a deterministic subroutine for choosing the pivot, there's no randomization at all. I've taken the liberty of inlining the ChoosePivot subroutine; that is exactly what lines one, two, and three are. I haven't written down the base case, just to save space; I'm sure you can remember it. So, if you're not in the base case: what did we do before? The first thing we did was choose a random pivot. What do we do now? Well, we have steps one through three; we do something much more clever to choose a pivot. And this is exactly what we said on the last slide. We break the array into groups of five. We sort each group, for example using merge sort. We copy the middle element of each of the n/5 groups into their own array capital C. And then we recursively compute the median of C. So when we make that recursive call to the selection procedure, we pass in the input C. C has n/5 elements, so that's the new length; that's smaller than what we started with, so it's a legitimate recursive call. Finding the median of n/5 elements means finding the (n/10)-th order statistic. As usual, to keep things clear, I'm ignoring fractions; in a real implementation you'd just round up or down, as appropriate. So steps one through three are the new ChoosePivot subroutine that replaces the random pivot selection we had before. Steps four through seven are exactly the same as before; we've changed nothing. All we have done is ripped out the one line where we chose the pivot randomly and pasted in these lines one through three. That is the only change to the randomized selection algorithm. The next quiz is a standard check that you understand this algorithm: not necessarily why it's fast, but at least how it actually works. I only ask you to identify how many recursive calls there are each time. So, for example, in merge sort there are two recursive calls, in QuickSort there are two recursive calls, and in RSelect there's one recursive call. How many recursive calls do you make each time, outside of the base case, in the DSelect algorithm? All right, so the

correct answer is two. There are two recursive calls in DSelect, and maybe the easiest way to answer this question is not to think too hard about it and literally just inspect the code and count: there's one recursive call in line three, and there's one recursive call in either line six or line seven. So, quite literally, there are seven lines of code, and two of the ones that get executed have a recursive call, so the answer is two. Now, a couple of things are confusing here. First, in the randomized selection algorithm we only had one recursive call, the one in line six or seven; we didn't have the one in line three. That one in line three is new compared to the randomized procedure. So we're kind of used to thinking of one recursive call in the divide-and-conquer approach to selection; here we have two. Moreover, conceptually, the roles of these two recursive calls are different. The one in line six or seven is the one we're used to: after you've done the partitioning, you have a smaller subproblem, and you just recursively find the residual order statistic in the residual array. That's the standard divide-and-conquer approach.
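To make the two recursive calls concrete, here's a sketch of the whole algorithm with ChoosePivot inlined as steps 1 through 3, in the spirit of the seven-line slide. The names and details are my own, and the partitioning here is done out of place, for brevity, rather than with the in-place partition subroutine:

```python
def dselect(a, i):
    """Deterministic selection sketch: returns the i-th smallest
    element of a (1-indexed), assuming distinct elements."""
    if len(a) <= 5:                       # base case: brute-force tiny arrays
        return sorted(a)[i - 1]
    # Lines 1-3: choose the median of medians as the pivot.
    groups = [sorted(a[k:k + 5]) for k in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    pivot = dselect(medians, (len(medians) + 1) // 2)   # 1st recursive call
    # Lines 4-5: partition around the pivot (out of place, for brevity).
    first = [x for x in a if x < pivot]
    second = [x for x in a if x > pivot]
    j = len(first) + 1                    # the pivot's order statistic
    # Lines 6-7: recurse on whichever part must contain the answer.
    if j == i:
        return pivot                      # lucky case: pivot is the answer
    elif j > i:
        return dselect(first, i)          # 2nd recursive call, first part
    else:
        return dselect(second, i - j)     # 2nd recursive call, second part
```

Notice that both recursive calls are on strictly smaller arrays: the medians array has about n/5 elements, and the partitioned parts each exclude at least the pivot.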

What's sort of crazy is the second use of a recursive call, which is part of identifying a good pivot element for the outer recursive call. This is so counterintuitive that, in my experience, many students don't even think this algorithm will halt; they sort of expect it to go into an infinite loop. But again, that's overthinking it. Let's just compare this to an algorithm like MergeSort. What does MergeSort do? Well, it does two recursive calls and it does some other stuff, namely the linear work it does to merge. So there are two recursive calls on smaller subproblems, right? No issue: we definitely feel confident that MergeSort is going to terminate, because the subproblems keep getting smaller. What does DSelect do, if you squint? Don't think about the details; just look at it at a high level. What is the work done in DSelect? Well, there are two recursive calls (one in line three, one in line six or seven), both on smaller subproblem sizes, and there's some other stuff, some work in steps one, two, and four. But whatever: two recursive calls on smaller subproblems, plus some extra work, so it's got to terminate. We don't know what the running time is, but it's got to terminate, okay? So if you're worried about this terminating, forget about the fact that the two recursive calls have different semantics and just remember: if you only ever recurse on smaller subproblems, you're definitely going to terminate. Now, of course, who knows what the running time is? I owe you an argument for why it would be anything reasonable; that's going to come later. In fact, what I'm going to prove to you is that not only does this selection algorithm terminate, that is, run in finite time, it actually runs in linear time, no matter what the input array is. Whereas with RSelect we could only discuss the expected running time being linear, and we showed that with disastrously bad choices of pivots RSelect can actually take quadratic time, under no circumstances will DSelect ever take quadratic time. For every input array, it's O(n) time. There's no randomization, because we don't do anything randomly in ChoosePivot, so there's no need to talk about average running time; the worst-case running time over all inputs is O(n). That said,

I want to reiterate the warning I gave you at the very beginning of this video, which is: if you actually need to implement a selection algorithm, this one wouldn't be a disaster, but it is not the method of choice, and I don't want you to be misled. As I said, there are two reasons for this. The first is that the constants hidden in the big-O notation are larger for DSelect than for RSelect; that will be somewhat evident from the analyses we give for the two algorithms. The second reason is, recall we made a big deal about how partitioning works in place, and therefore QuickSort and RSelect also work in place, that is, with no real additional memory storage. But in this DSelect algorithm, we do need the extra array C to copy over the middle elements, the first-round winners. And the extra memory, as usual, slows down the practical performance. One final comment. For many of the algorithms that we cover, I hope I explain them clearly enough that their elegance shines through, and that for many of them you feel like you could have come up with them yourself, if only you'd been in the right place at the right time. I think that's a great way to feel, and a great way to appreciate some of these very cool algorithms. That said, for linear-time selection, I don't blame you if you feel like you never would have come up with this algorithm. I think that's a totally reasonable way to feel after you see this code. If it makes you feel better, let me tell you about who came up with this algorithm. It's quite old at this point, about 40 years, from 1973. And the authors: there are five of them, and at the time this was very unusual. So, Manuel Blum. Bob Floyd. Vaughan Pratt. Ron Rivest.
