The next data structure we can use to implement a multi-map is a hash table. Hash tables are extremely versatile, efficient, and widely used data structures. They are more or less the standard data structure used to represent sets and maps in practice. So let's walk through an example where we're going to use a hash table to implement the same kind of multi-map that we implemented before using an ordered list of pairs. So, here's an empty hash table. An empty hash table consists of an array of buckets, and the buckets start out as empty boxes. You can think of these as null references or null pointers. As we add items to the hash table, these empty boxes will become lists. Also associated with this table is a hash function, which we'll refer to with this red h. And the hash function maps each distinct key, each distinct 3-mer onto one of these buckets in this array. In our grocery store analogy, you can think of the buckets as aisles of a grocery store, and the hash function as the way that we assign groceries to aisles. So now, let's add all of the 3-mers of T to this table. So we'll start with the first 3-mer. And we use our hash function to assign it to a bucket. And let's say the hash function assigns it to the bucket indicated with the red arrow here. And so now we append the corresponding key value pair to this bucket, like this. So a given bucket is simply a link list. So to add our key value pair, we're simply appending an entry onto that link list. And the entry consists of a key, which is the 3-mer, which is highlighted in blue here. And then a value, which is the offset from which we got that 3-mer, which is shown in green here. And then a null reference, which is shown in pink. And so if we add some more entries later, we add them on to the end of the list, by expanding on this null reference here. So now let's add the second 3-mer. So say the hash functions assigns it to the top bucket, and so we append the corresponding entry to that bucket like this, and then let's do the third 3-mer, and let's say the hash function assigns it to that bucket indicated with the red arrow. So we append the element to that bucket. For the fourth 3-mer we do the same thing, and then now for the fifth 3-mer this is one that we've seen before, we've seen GTG before. So the hash function assigns it to the same bucket as before. And since there's all ready an entry in that bucket we have to add it on to the end of the list that's already present in that bucket. So, we could've pre-pended it to the beginning of the list too. It's not all that important which we do. So moving on, the next 3-mer is TGT. So let's say the hash function assigns TGT to this bucket. So again, we'll append the corresponding entry onto the end of the list. Now if we stop and consider the entries that are in this bucket now, the one toward the bottom of this slide, we notice that two different 3-mers are now both present in this bucket. This isn't too surprising since we have many more possible 3-mers than we have buckets here. So the pigeonhole principle tells us, we expect some distinct 3-mers to end up in the same bucket together like this, but when this happens, we call this a collision. And when there are many collisions, then querying the data structure can become somewhat slower. But it does happen now and then and it's not a terrible thing, but it's something to take note of. So onto the next 3-mer. We add this onto the end of this list. And then, the next one, which is TGG, which we add like this. And then, we go into the first of three 3-mers that are all triple G. Three GGG's. So let's say the hash function assigns GGG to this bucket here, so we append it onto the end of that list. And we notice that once again we have two different 3-mers that coexist in the same bucket, so that's again a collision. Then we move onto the second GGG and the third GGG, and of course those get assigned by the hash function to the same bucket, so they get appended on to the end of that list. Now say that we want to query this hash table. We want to query this index. So, let's say we have our pattern P here, and we're going to extract some 3- mer from this pattern. It doesn't have to be the first one, it could be. It doesn't matter which of the 3-mers we pick. Because when we built our index we used all of the 3-mers from the text phi. So let's go ahead and pick this one, this GGG, this triple G. And we want the index to tell us all the offsets where triple G occurs within the text T. So the first thing we do is we use our hash function to map triple G onto a bucket, we get this bucket here. We know that this is the one and only bucket that we have to look in because this is the bucket that the hash function assigns triple G to. In other words, we know we're in the right aisle of the grocery store now, we just have to go looking down the aisle. So we look at the first entry in the list. And we look at the key, at the 3-mer. And we noticed that this key is not equal to the 3-mer that we're querying with, it's not equal to triple G. So we ignore it and move on. And then for the next three entries in the list we noticed that the key does match our query 3-mer. In all three cases it's triple G. So in each case, we know that the corresponding offset in that key-value pair is one of the ones that the index is going to report back as being an offset within T where triple G occurs. So at the end of the day, we got offsets 8, 9, and 10. So in python, it's very easy to build and use a hash table because the python dictionary type is an implementation of a hash table. So here, I've initialized a python dictionary in a way that corresponds to the situation from the previous slide. This dictionary here. And here are the three-mers, the keys in this dictionary, and then associated with each 3-mer is a list of offsets where that 3-mer occurs. So for example, triple G, the 3-mer that we were quarrying with before, occurs at offsets 8, 9, and 10. And to query this dictionary, we simply use square bracket notation here, so for example, if we want to find all the offsets where the triple G query occurs, then we do this, and we get back the list 8, 9, 10. Just like form our previous example. If we want to know where the triple CGT occurs, we again can use square bracket notation and we get back a list that has one entry which is the offset 3. So the details here are hidden from us. We don't know what hash function is being used or exactly how many elements are in the hash table. All of that is taken care of by Python. But under the hood, when you use a dictionary in Python, you're using something like the hash table that we just saw.