0:05

So, this is the same setup and

the question is how many people, this is in a much bigger group.

How many people would you have to ask before you found every day of the year?

Well, you have to ask at least 365 and

if you are lucky and everybody had a different birthday you'd be fine.

But generally you'd have to go much more.

So that again corresponds to throwing balls into urns.

And again randomly.

And then the question is how many do you have to throw in before every urn has at

least one ball in it.

So this one here.

Nine finally finds it's way in.

0:53

So generally, it'd be much more than the number of urns.

The question is, how long til each urn has at least one ball in it.

This comes from, I don't know what the rage is lately, say baseball cards or

magic cards or Charlie and the Chocolate Factory in the cards.

How many of the different ones you have to collect before you get

every possible coupon.

That's the same problem, in different types and

when you get one it's random how many you have to get before you get them all.

1:29

And there's actually plenty of practical applications where we want to know

the answer to this problem as well.

I have a 20-sided die, this is, again, just another illustration.

You keep rolling the die, and every time you get a new value,

you check off the value.

So how many times, you're going to have to roll the die,

so because of the birthday problem, after not too many rolls, you get a repeat.

So not new, and then after a while you start to get into plenty of repeats.

But the question is how many times do you have to roll it untill you get all

possible values?

So in this case now we're still looking for just two more possibilities.

Finally we get the 4, and

then after 37 rolls we have all possibilities For a 20 sided die.

So in general, if given M,

how many times you have to throw to get all different M values.

That's the coupon collector problem.

Now, with classical probability, this problem is actually not too difficult.

And so I'll give a classical analysis.

Actually, the analytic combinatorics is more complicated than this.

But it's still worth doing because the classical analysis just gives the average.

And if you want to study variants of a problem, or

other properties of the distribution,

you're going to need the structure that we get from analytic combinatorics.

But anyway, here's the classical analysis.

So, first thing is the probability that you need more than j

rolls to get the k+1st coupon is k/M to the j.

That's the probability that the first j rolls are all from the same k values.

3:21

Again, adding those up in the same way,

the expected number of rolls to get the (k+1)st coupon is just solving that,

which is 1/(1-k/M) which is the same as M/(M-k).

So that's for the (k + 1)st coupon,

that's the expected number of rolls to get the (k + 1)st coupon, and so

the expected number of rolls to get all the coupons is just summing that

because that's an expectation, and we can sum expectations.

So sum of that, of M/M- K, K from 0 to M, that's just M times the harmonic numbers.

4:41

Although, really, the motivation for

this is much more interesting when we look at variations later on.

So start out the same way as we did for the birthday problem.

Coupon collector sequence is an N word that no empty set.

Or it's a string that uses all the letters in the alphabet.

So how many coupon collector sequences are there?

5:05

So, for example, for m equals 26, that's one.

Uses all the letters of the alphabet.

Well known one.

Then, after m equals five.

There's an example.

So, no empty sets So, again, we set it up the same way, as we did before.

5:26

With using exponential generating functions treated as a label set.

So what is a coupon collector sequence?

It's a, we have M different letters, so

it's a sequence of M different letters And now our sets have to be non-empty,

so there just has to be a set size greater than zero.

And that immediately gives us a EGF equation that,

either it's the C-1 to the m.

5:59

So, extracting coefficients, getting the coefficient of

n factorial comes the coefficient of z to the n and we get this complicated

equation, for the counting sequence, for the number of coupon collector sequences.

Out this is an so helpful but

6:21

if you go ahead and extract the coefficient to Z to the end getting

alternating some and so not necessarily so helpful.

I mean it’s asymptotic to be It's like m to the n, and

then a smaller number to the n so subatomically it's m to the n.

But all that means is if you have a long string almost all,

if you have a random string of n objects where n's bigger than m.

Almost all of them are going to use all the letters if N is

significantly bigger than M.

So that's not helpful.

What we need to do is exactly evaluate this.

7:08

So the probability, the random

M-word length N is a coupon collector sequence is that, what I just showed.

And this is a little bit complicated to

evaluate because of the alternating sum feature of it.

7:50

And while if you have studied Kanuth eventually you can get the solution MHM,

but it really is still kind of a magic solution.

So this is possible to get to the end, but

this is not a recommended way to get to the end.

For this problem.

8:09

When we look later on at generalizations and asymptotic

methods in the second part of the course, we'll kind of see why this happens.

But for the elementary derivation this one doesn't work.

Now there is a way to do it with OGFs

that pretty much mirrors the classical derivation.

So if we define WMk, it'd be the class of M words that have k different letters.

With the last letter appearing only once.

8:50

So then you have an OGF for those, and

then we can actually make a probability

generating function just by evaluating that at Z over N.

And so then if you differentiate that

probability generated function evaluated C equals 1 you get the mean weight time.

For k coupons.

So that's pretty simple generating function on manipulations.

And so, what we'll do on the next slide is

use a combinatorial construction to get an equation for the generating function and

then perform this operation to give us a solution.

So here's the construction.

So if we have m words with k different letters.

The last letter appearing only once.

9:57

And then that construction immediately translates

to this generating function equation, which I won't read out.

And then evaluate at z over M to get the probability generating function.

Differentiate and evaluate at 1, then we get a recurrence relation for

the thing that we're looking at, the wait time for k coupons, and

that leads us to the same sum that we got in the elementary derivation,

which is going to lead to, again, the end result.

10:42

And our answer in practice is, if you

know how many different baseball cards there are, multiplied by the log of that.

Now that's how many you're going to have to look for to get every possible coupon.

So for 20 sided die it's about 60 or

on application Is maybe testing pages in memory,

you do it by having a program randomly access the pages,

11:12

maybe that you want to be sure that the test hits every page.

Then you can figure out, you can take, if M is a million,

then it's going to take you about a million times natural logarithm,

which is about 14.5 million Steps to test every page.

That's an example of an application in the real world.

It's a well known phenomenon that has lots of applications.

And that's the combinatorics of the coupon collector problem.

There is a combinatorial concept called a surjection that does

really need analytic combinatorics to study.

11:50

So what we call a coupon collector sequence is,

it's an M-word with no empty set.

So that's called an M-surjection.

But there's a more general thing which is just a word that is there is

an M-surjection for some M and that appears in lots of applications.

So that is either it uses just one letter or

uses two letters, but without leaving any out.

Where it uses three letters without leaving any out.

And this is a combinatory logic that's a well defined and

easily handled with the symbolic method.

So this is what we did for M-surjections.

It's a sequence of non-empty sets, e to the Z minus 1 to the M.

And we hit for that.

For surjections, it's not a sequence of size M, it's a sequence of any length.

12:56

set of, our empty set is this z-1.

Sequence is 1 over -1 so it's 1 over 2 e to the z.

And so how many subjection are there linked in.

We'll see that this is best handled with complex asymptotics but it's N.

Over 2(log 2) to the N+1.

That's a classical example in combinatorics.

Related to the coupon collector problem.

And we'll see this coming up in more detailed studies in part two.

So that's the coupon collector problem.

And next we'll go onto applications in computer science.