Get a month of TabletWise Pro for free! Click here to redeem

Hash table double hashing

Share the link to this class

 Get access to thousands of classes and millions of flashcards

Sign up or log in to access this lesson
Already have an account? Log in


All right, let's talk about hash tables and the double hashing, open addressing collision resolution technique. So just a quick recap. For those of you who don't know how we do insertions into hash tables or open addressing, we start with a variable x initialized to one, we compute the hash value of the key. We set that to be the first index that we're going to check in our hash table to see if it's not No. So the goal is to find an empty slot. So we're going to loop while we are still hitting spots where there are already keys.

And every time that happens, we're going to offset our key hash using a probing function in our case is going to be the double hashing probing function. And we're also going to increment our value of x so that we keep probing along further and further. Once you find a free spot, then we can insert our key value pair into the hash table. Okay, so what's the deal with double hashing? So double hashing is just a probing method like any other. But what's special about it is that we probe according to a constant multiple of another hash function.

So specifically, our probing function looks something like this. We give it as input, the key which is a constant, and the variable x, and then we compute x times h sub two of k, where h two k is a secondary hash function. So here's an important note. h two k must hash the same type of keys as h one, okay? So if your key is a string, well, it To have k must hash strings. If h one a K is a hash as integers, then h two of K must also hash integers.

So here's a note at the bottom that notice that double hashing actually reduces to linear probing, except that the constant is unknown until runtime, because we dynamically computed with h two of K. So since double hashing reduces to linear probing at runtime, we may be stuck with the same issue we saw with linear probing, which is that we get stuck in infinite cycle. So if p of x equals three x meaning our secondary hash function at runtime cafe, a value of three for the constant and say, h one if k was four, at the table size was nine, then we might end up with a following cycle occurring So the cycle produces values of four, seven, and one, which makes it impossible to reach any of the buckets, 02356, and eight. So if four, seven and one are already occupied, that means we're stuck in an infinite loop inside a hash table.

And we're not able to insert our key value pair because we're not able to go anywhere else. So this is an issue we have to deal with. So to fix the issue of cycles with double hashing, here's one strategy. We're going to pick our table size to be a prime number, and we're going to compute a value called delta. So delta is just going to be h of K mod n. But occasionally Delta might be zero. And if that's the case, then we're going to be guaranteed to be stuck in a cycle because we're not going to go anywhere.

When we probe hills. We're to be multiplying zero. So when this happens, just set delta to be equal to one. All right. So here's the justification why this works. So notice that delta is always going to be between one inclusive and and non inclusive.

So the greatest common denominator between delta and n as you want to be one, since n is a prime number. So therefore, with these conditions, we know that the probing sequence is guaranteed to have order n. So we are able to hit every single slot in our hash table. So given that there's at least one free slot in the hash table, which there will be because we're keeping our load factor below a certain threshold, then we're going to be able to reach that slot. Okay, so here's a core question how do we construct our secondary hash function, suppose the keys we're using have type T. And whenever we use double hashing, and we need to construct h two K, two hash keys are also of type T. So, it would be really nice if we had a systematic way of generating these new hash functions every time we need one, right?

Because we may be dealing with multiple different data types of technique. So luckily for us in computer science, almost every object we ever create is constructed of the same fundamental building blocks in particular integers, strings, real numbers, fixed length vectors, and so on. So we can use this to our advantage. Luckily, there are many well known hash functions for these fundamental data types. And we can combine them when we want to construct our new hash function h two of K. frequently when we compose hash functions, We select them from pool hash functions called Universal hash functions, which generally operate on these fundamental data types, which is quite convenient. Alright, let's have a look at an example with double hashing.

So suppose we have an originally empty hash table above, and we selected our probing function to be p of x equals x squared plus h sub two of K, which is what we needed to be in our table size to be an equal seven. Notice that that's a prime number. And we set our max load factor to be alpha equals point seven, five. So our threshold to resize fives once we hit five elements, we need to grow the table size. All right, so let's insert all these key value pairs on the left into this hash table. So first key value pair is k one and v one.

Now suppose that the hash value for the first hash function of k one is 67 and h two k one is 34. And first thing we do is we compute delta. So we compute h two of K one modulo seven, which is the table size and we obtain six. And then we compute where this key value pair should go and it should go at position four. So now k two suppose h one of K two is two and H. Two of K two is negative 79. So delta is five, but we're just going to insert position two because there's no collisions.

So each one is k three is two and h two k three is 10. These are just arbitrary numbers I chose for these hash functions. So then delta would be three in this, in this case, because 10 mod seven is three. So we have a hash collision, because we're trying to insert our K three at index two, but k two is already there. So we need to do so you need to probe. So we're going to compute the position to be our original hash function position.

So h one of K three at position two plus one times our delta value, mod seven, that gives us five. So we're going to hop to position five right there. All right, now let's insert k four. So suppose now that h one of K four is equal to two, and h two of K four is equal to seven. So let's compute delta. So h2 K four modulo seven is equal to 0.7 months seven is zero.

So when this happens, we know that we need to set delta equal to one so that we don't get stuck in a loop. So now, each one of K four plus zero times delta mod seven gives us two. So we have a hash collision at the bucket for index two, so you keep probing. So now we probe by multiplying by delta. That gives us index three. And then we, we were able to insert so now we're going to try insert k three, which is already in our hash table actually, but with a new value.

So we're going to be performing an update this time. So suppose h one k three is equal to two. Actually, we should have known that from before and HTF k three is 10 so compute delta. So we have a collision. And we find k three, which was already in there and update its value. Now suppose the first hash function for K six is three, and the secondary hash function of k six is 23, then that gives us a delta value of two.

So when we try to insert, it goes to position three. So when you probe, so we compute one times delta mod seven uses five. There's another collisions, and then we compute the offset at two times delta minus seven, which gives us zero. And we find a free slot so we're able to insert our key value pair there. So notice that we actually inserted five elements, so it's time to resize and grow the table because we've reached the maximum threshold so One strategy when we're trying to resize the table with double hashing because we need to keep our table size to be a prime number is to double the table size and then find the next prime number above this value. So in our case, when we double, we get 14 and the next prime number 14 is 17.

So 17 is going to be the new table size. So allocate a new table of size 17, and go through the elements in the old table and insert them into the new table. So from before h one, k six is three, and h two of K six is 23. We compute delta and we know we're going to put it at position three and there's no hash collision. Next up and nothing in position one, position two, we have k two so the hash value for K two To in the secondary hash value of K to negative 79, from before to compute delta to be sex, so we're going to insert that at position two with no collision. Next one, k four.

So we know h one of K forest two, h two of K four would be seven. So we compute our delta value and notice that our delta value now is not zero, which it was before, but seven because our mod is 17 Delta seven, so we have a hash collision here. But we need to keep probing. So compute the offset at one times delta which gives us nine mod 17. Next one, insert k one. Suppose h one if k one is two and h two k one is 34 then compute delta, and that gives us zero.

So when delta zero said to be one. So now, h one of K one plus zero times delta gives us two. So there's a cushion as we keep probing. So now compute the offset at one times delta, there's still a collision, keep incrementing, the x value. So now two times delta t is four, and we find a free slot so and so that key value pair there. And the last one, k three.

So from before h one, k three is two and h two of K three is 10 deltas than 10. And we have a hash collision. So one times delta now gives us 12. And that slot is free. And we reached the end of the new table, so get rid of the old table and replace it with a new table. And let's finish inserting our last key value pair from before which is k seven.

Suppose k of seven is equal to 15 and h two of K seven is three, then our delta value is three, and we're able to insert it at position 15. Okay, that is double hashing in a nutshell. If you like this video, give us a thumbs up and please subscribe. And if you're confused about something, drop a comment. I'll try and get back to you. And in the next video, we're going to be looking at how to remove elements in the hash table using the open addressing scheme.

Guys, thank you so much for watching. Catch you next time.

Sign Up