Now the question is: how do you choose the size of your hash table?

Of course, you control the amount of memory used with m, which is the cardinality of the hash function and is equal to the size of the hash table.

But you also control the speed of the operations.

So ideally, in practice, you want your load factor alpha to be between 0.5 and 1.

You want it to be below 1, because otherwise you store too many keys in

the same hash table and then everything becomes slow.

But also you don't want alpha to be too small because that way you will

waste a lot of memory.

If alpha is at least one-half, then you basically use linear memory

to store your n keys and your memory overhead is small.

And operations still run in time O(1 + alpha), which is constant time

on average if alpha is between 0.5 and 1.
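To make the trade-off concrete, here is a tiny sketch with my own illustrative numbers and a hypothetical helper name (`stats`), not anything from the lecture, showing how alpha controls both memory and the expected cost of an operation:

```python
# alpha = n / m drives both memory overhead and expected operation cost.
# stats() is a made-up helper, just for illustration.

def stats(n, m):
    alpha = n / m
    return {
        "load_factor": alpha,        # alpha = n / m
        "memory_slots": m,           # smaller alpha means more unused slots
        "expected_cost": 1 + alpha,  # operations take O(1 + alpha) on average
    }

print(stats(1000, 2000))  # alpha = 0.5: linear memory, short chains
print(stats(1000, 1000))  # alpha = 1.0: still linear memory, chains still O(1)
```

Both settings keep the memory linear in n; halving alpha buys shorter chains at the price of twice as many slots.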

The question is what to do if you don't know in advance how

many keys you want to store in your hash table.

Of course, one solution is to start with a very big hash table, so

that all the keys will definitely fit.

But this way you will waste a lot of memory.

So, what we can do is copy the idea you learned in the lesson about

dynamic arrays.

You start with a small hash table and

then you grow it gradually as you put in more and more keys.

Basically, you resize the hash table and

make it twice as big as soon as alpha becomes too large.

And then, you need to do what is called a rehash.

You need to copy all the keys from the current hash table to the new bigger hash

table.

And of course,

you will need a new hash function with twice the cardinality to do that.

So here is the code which tries to keep the loadFactor below 0.9.

And 0.9 is just a number I selected; you could put 1 or

0.8 here, that doesn't really matter.

So first we compute the current loadFactor, which is the ratio

of the number of keys stored in the table to the size of the hash table.

And if that loadFactor just became bigger than 0.9,

we create a new hash table of twice the size of our current hash table.

We also choose a new random hash function from the universal family

with twice the cardinality, corresponding to the new hash table size.

And then we take each object from our current hash table, and

we insert it in the new hash table using the new hash function.

So we basically copy all the keys to the new hash table.

And then we substitute our current hash table with the bigger one and the current

hash function with the hash function corresponding to the new hash table.

That way, the loadFactor roughly halves.

We probably just added one new element,

so the loadFactor became just a little more than 0.9.

And then we doubled the size of the hash table while the number of keys

stayed the same, so the loadFactor became roughly 0.45,

which is below 0.9, which is what we wanted.

So to achieve that, you need to call this procedure rehash after

each operation which inserts something in your hash table.
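Since the slide's code isn't reproduced here, the following is only a sketch of that rehash procedure under my own assumptions: chaining for collision resolution, integer keys, and the standard universal family h(x) = ((ax + b) mod p) mod m. All names are mine, not the lecture's.

```python
import random

P = 10_000_019  # a prime larger than any key, for the universal family

def random_hash(m):
    """Pick a random function from the universal family (assumed here)."""
    a = random.randint(1, P - 1)
    b = random.randint(0, P - 1)
    return lambda x: ((a * x + b) % P) % m

class HashTable:
    def __init__(self, m=8):
        self.m = m                              # size of the hash table
        self.n = 0                              # number of stored keys
        self.h = random_hash(m)
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        self.buckets[self.h(key)].append(key)
        self.n += 1
        self.rehash()                           # called after every insertion

    def find(self, key):
        return key in self.buckets[self.h(key)]

    def rehash(self):
        load_factor = self.n / self.m           # ratio of keys to table size
        if load_factor > 0.9:
            new_m = 2 * self.m                  # twice the current size
            new_h = random_hash(new_m)          # new function, twice the cardinality
            new_buckets = [[] for _ in range(new_m)]
            for bucket in self.buckets:         # copy every key over
                for key in bucket:
                    new_buckets[new_h(key)].append(key)
            self.m, self.h, self.buckets = new_m, new_h, new_buckets
```

For example, inserting 50 keys starting from m = 8 triggers rehashes at n = 8, 15, and 29, leaving m = 64 and a load factor of about 0.78.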

And insertions can be slow when this happens, because the rehash

procedure needs to copy all the keys from your current hash

table to the new, big hash table, and that takes linear time.

But similarly to dynamic arrays, the amortized running time will still be

constant on average, because rehashes happen only rarely.

So you reach a certain level of load factor, and

you double the size of your table.

Then it will take twice as long to again reach too high a load factor.

And then you'll again double your hash table.

So the more keys you put in, the longer it takes until the next rehash.

So rehashes will be really rare, and

that's why they won't significantly influence the running time of your operations.
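As a quick sanity check on how rare rehashes are, here is a small simulation of the counting alone (my own sketch; it uses the lecture's 0.9 threshold, while the starting size of 8 is a number I picked):

```python
# Count how often the table doubles over 100,000 insertions.
m, n = 8, 0
rehash_points = []
for _ in range(100_000):
    n += 1                      # one insertion
    if n / m > 0.9:             # load factor exceeded the threshold
        m *= 2                  # double the table; a rehash copies n keys here
        rehash_points.append(n)

print(len(rehash_points))       # 14 rehashes in 100,000 insertions
print(rehash_points[:5])        # [8, 15, 29, 58, 116]: gaps keep doubling
```

Only 14 rehashes fire over 100,000 insertions, and the gap between consecutive rehashes doubles every time, which is exactly why the linear cost of each copy amortizes to a constant per insertion.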