Get a month of TabletWise Pro for free! Click here to redeem 
TabletWise.com
 

Hash table open addressing source code

00:14:41
Share the link to this class
Copied

 Get access to thousands of classes and millions of flashcards

Sign up or log in to access this lesson
Already have an account? Log in

Transcript

All right, today we're going to be having a look at some source code for a hash table that uses open addressing as a collision resolution scheme. And you can find all the source code on GitHub at William fiza. Slash data structures, and then just go to the hash table folder to find a whole bunch of hash table implementations. I have three different open addressing implementations here. In particular, we are quadratic probing, linear probing and double hashing. They're all very similar to each other.

So I will only be looking at one right now. But if you're curious, you can go on GitHub and check them out for yourself. The one that's really different, or slightly more different is the double hashing. But other than that, they are really essentially the same thing. But today, I've decided that we're going to have a look at the quadratic probing So let's dive into the code. All right, here we are inside the code for the hash table that uses quadratic probing.

So let's dive right in. So I have a class called hash table quadratic probing. And notice that it takes in two generic types K and V. So this is the key type, and this is the value type. So you're gonna have to specify these when you instantiate an object, which is a hash table for quadratic probing. So I have a bunch of instance variables that we're going to need. The first is the load factor.

So this is going to be like the maximum load factor that we're willing to tolerate the current capacity of our hash table, the maximum threshold we're willing to tolerate. The modification count, this is for the iterator. Next I have two instance variables that keep track of the Total number of buckets being used. And the key count, which tracks the number of unique keys currently inside hash table. Now since we're doing open addressing, we are storing these key value pairs directly inside an array. So instead of having one array with a wrapper class for an entry have decided just allocate two different arrays, one for keys and one for values, it makes the code a lot easier and shorter actually.

Okay, this is just an instance variable we're going to be using, or rather setting when we call the get method. So this is the tombstone I was talking about in the last video. This is the marker we're going to be using for deletions. So every time we deleted an entry, we're going to mark it with a tombstone. And we know this tombstone object is unique. All right, so these are just some default constants.

So whenever you want to initialize it hash table without any parameters, you can use these constants. So this is the default load factor, I set it pretty low, but you can change it up as you like. So you can initialize it with a default capacity. And this is the designee constructor. So let's have a look. So the capacity can't be negative or zero.

And I check if the user passes in some sort of weird load factor because we don't want that. So then set the max value for the load factor, then calculate the capacity. And to ensure that the capacity is a power of two, I need to essentially round up the capacity. So that's going to be with this method next to power essentially it. It finds the power of two that is just above this current Number, or it'll also take the default capacity, which itself is a power of two, so we don't have to worry about that. So either way, it's going to be a power of two.

Then we compute the threshold, which is just a load factor times the capacity and initialize our tables. All right, so let's get rolling. So this is the quadratic probing function I chose. So P of x, so you give it the variable x, and then we compute x squared plus x divided by two. So this is a really important method, the normalize index. So given a hash value, it essentially strips the negative sign and mods by the capacity so it dumps our hash value inside the domain, zero to the capacity non inclusive.

This is a clear method and this is pretty self evident. planetory just clear the contents of the cat of the hash table and start fresh. Then some helper methods size, get the key count and check if our hash tables empty and put add an insert, or essentially all the same method. So let's look at the insert method. This inserts a key value pair inside the hash table or updates, a value of the key already exists. All right, we don't want to allow no keys.

That happens we throw an exception. If the number of buckets use is greater than or equal to the threshold we're tolerating. We're going to resize the table before doing anything else. Otherwise, we want to calculate the hash value from the key using the built in hashCode method and you can override this for your particular object. Okay, now I need to explain what i j and x are. So I is going to be the current index or add in the hash table because we're gonna be bouncing around this I value is going to change a lot, then j is the position of the first tombstone we encounter if we encounter one, otherwise it's minus one.

And we're going to be using this for an optimization is an X, adjust the probe offset, which is set to one, initially. Okay, so this is a do while loop. It's a little long, but it's pretty easy to understand. Alright, so first we check in the key table. Have we hit a tombstone? If we have and j is equal to minus one that means we haven't hit a tombstone yet.

So save where we found this tombstone. Okay, so, this next check checks if the key table has an entry that's not no meaning there's a key inside of it. So we have to check If the key already exists in hash table, so that's what this does. It compares the current key index i with the key we're trying to insert this key. And if j is equal to minus one, meaning we haven't hit Tombstone, then just update the value. If we've hit a tombstone, then we want to do a swap with the tombstone.

And at the modification count, and always return like the old value that was there before. Just Just in case why use it. Alright, next up the current cells No, so we can do an insertion. So j is equal to minus one means that we haven't seen a tombstone so far. So increment number of use buckets and a key count and then store our key value. pair.

Otherwise, we have seen a tombstone. And instead of inserting where I element i, where the element is inserted where the deleted token was found. So we're inserting at j instead of AI. So here we're inserting an AI, but stay within sorry, J with tombstones. And we're going to return null because there was nothing there before. Okay, and if if we do a loop, so we get through all these if statements and we haven't returned, that means that we need to keep probing we had a hash collision and we, the cell we landed on, or it has something in it, so we need to probe so we need to offset where we first originally hash two plus the probing index, or the probe position and increments x Same time, so this will just hop to the next spot.

And we do this while we haven't found an empty slot and we will find an empty slot. All right, so contains key and has key, just check if a key exists within the hash table. And to do this, I'm being pretty lazy. And I'm just calling the get method. And I'm setting an instance variable in there called contains flag, which gets set to true or false whether a key is inside your hash table or not. Because the highest key in the get method would look would have essentially the same code.

So that's a bad code smell. All right, so let's look at the get method since it's being used in the hash key method. So sync has set up, find the original hash index is equal to the hash set j and X to what they were before. So essentially do all the same stuff, or mostly except set the flag. So that's different. We set the flag to be true when we identify that the key is indeed inside hash table.

And here are ELLs condition is just shorter, we return if we hit a null cell, instead of inserting a new element and set that contains flight to be false. Okay, so pretty easy. And the Remove method is actually quite a bit shorter. Interestingly. So same setup, check the keys know find the hash set x to be equal to one here, we don't really care about tombstones too much so we don't have a j position. So we're going to start the hash index and quadratically probe until we find a spot So for every loop, we're going to increment i or find a new offset position if this loop gets completed, so here's what we do, we ignore tombstones, so just skip over those.

So if this happens if the key was not found the hash table, we can return No. Otherwise, the key we want to remove is in the hash table. And we can do this check because we checked if it's null before, so we're not going to get a null pointer. So decremented key, count up the modification count, extract the old value and dump a tombstone here and just wipe whatever value was in there. So this is where we set this tombstones in the Remove method. And then just return the old value for good measure.

Right and Okay, these two methods are pretty simple. explanatory, they just return all the keys and return the values that are contained within our hash table. So the next really interesting method is this resize table method. So this is this gets called when we're inserting new elements, and we need to grow the table size. And remember that in this particular quadratic probing implementation, we always need the capacity to be a power of two. But since we know that the capacity is already a power of two, multiplying by two will keep it a power of two.

So that's fine. So we compute the new threshold, allocate some new memory for a new table. I call it old table, but it's actually going to be the new table shortly. So here I perform a kind of interesting maneuver here. I swap the current table There is a small table with this new table, which I called Old table. In order to be able to call the insert method down here, we'll get to that.

So swap the key tables, swap the value tables, reset the key count and the bucket count. And the reason I call it all key table was because since the swap, well, the the the new table is actually the current pointer for what was the old table. That might sound confusing, but I'm using the table we had before to do insertions on or the pointer to it. Alright, so loop through the old table values. And if we encounter a token, or a pointer that's not no and not a tombstone that we want, insert it So because we're avoiding reinserting tombstones, we're getting rid of all the tombstones. So even though our table might have been cluttered with tombstones, we're just getting rid of them here.

All right. So that's that. The iterator you can probably have a look at this yourself is just looping through all the keys and returning them one at a time. That's a pretty standard two string method. So that's how you do quadratic probing with open addressing guys if you learn something, like this video, and drop a comment. Thank you so much for watching.

Sign Up