Design a Hashset - LeetCode Interview Coding Challenge [Java Brains]

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello and welcome to this java interview coding challenge video where we don't just solve coding problems no no no we do more than that we learn how to tackle them in an actual interview setting right how to solve the problem and what to say what to discuss to maximize your chances of success in an interview all right in this video we're going to be tackling lead code problem 705 designing a hash set all right how do you design a hash set here's the problem statement the interviewer tells you design a hash set without using any built-in hash table libraries okay this is what the interviewer asks you to do okay and if they leave you at this it's up to you to ask for the questions right you need more detail um here is some detail that you can you can kind of extract from the interior like what are the apis that you need i said there's a whole lot of stuff but there are some core apis that provides okay so your design let's get into your answers to your questions and says here are the things i want you to do okay so here are three things you should be doing first is adding a value a hash that needs to be able to add some value so given an add method your class your custom implementation of asset needs to have an add method which takes in a value and then adds it inserts it to your hash set all right then there needs to be a contains method which takes a value and sees if it actually exists in the hashtag right it returns true or false if it exists in your asset or not and then the third one is remove you need to check if you need to remove a given value from your hash set if the value exists in the hashtag you remove it if the value does not already exist in a hashtag you just don't do anything so these are the three operations that your hash set needs to perform now how you do them is up to you but this is what your your program needs to work with okay so here is a starting point that's given to you all right so here's my uh my hash set class okay i have a constructor where i can initialize my data structure i need to have my data structure right i need to use some data structure to hold on to the data because they're going to be sending they're going to be doing ads a bunch of times remove a bunch of time so i need to track them so this is where i'm going to track uh i'm going to hold my own data structure and initialize it over here here's the add method which interestingly takes in an integer okay so this is a question that you can ask the interviewer if they don't provide you this information what's the data type that i need to be tracking it in right if they say anything well then it becomes a little bit more complicated but in this particular question in this particular lead code challenge the starting point is an integer so let's work with that right i've given an int key and uh i need to add to it the return type is a wide in the starting point so i don't have to return through our files like an actual java hash set does the actual java hash set actually returns true if it if the underlying data structure got modified as a result of the ad if that element was already there and it didn't modify anything then the ad returns false okay so you don't have to do that it's returning white that's cool makes it easier remove ticks in a key and then again it removes it again no return type true or false if it actually modified the data structure just avoid okay so if it did not have the value again you can just return silently and then there is a contains method which takes in the key and then it returns a boolean if it actually contains it or not this thing has to return a boolean otherwise there's no point in the contains okay so these are the methods that you need to implement okay you need to write it so there are a bunch of assumptions in the lead code problem so if the interviewer is getting their source of questions from lead code they might share the same assumptions with you so here are the assumptions on lead code okay first of all there is a range of values it's an integer so there needs to be a range of values so i think the interviewer says these are the range of values that i can add to my hash set okay that ranges from 0 to this number how many were zeros here okay you don't get more than that or less than that no negative numbers nothing of that sort okay that's great but the other assumptions the number of operations will be in the range of one to these many okay there'll be at least one operation and there will not be more than this there will not be more more add remove contains calls than this number okay good to know and then please don't use the built-in hash set library okay that's what the lead code problem says let's say the interviewer asks you this okay now there is one ridiculously easy way to solve this problem okay if you think about it it's one ridiculously easy way you're not supposed to use the hash set library well what about if i use something else can i use a tree set that'll work right i can just stick a tree set in here and then say add add to the tree set remove remove from the preset contains reset dot contains right you're basically delegating your work to the tree set it's going to work fine and if the requirement is just don't use hash set i'm going to use tree set or we'll use some other set implementation it's going to work just fine they're not using hash set right they're not using the built-in hash table well let's assume that the integrators doesn't doesn't want you to do that okay otherwise there's no point in asking this question it's like you can solve this in a minute let's assume the interviewer says no set implementations right so not just has said there is no built-in set implementations you have to write your own set implementation so in that case well there is something for you to do all right so let's say the answer to the question can be use a different set implementation is no okay let's assume no existing set implementation now there is a simple way to solve the problem which is not affection but it will do right and the simple way is leveraging the fact that they've given you a range okay they've given you a range and said it's the number that you're going to get as input there's going to be between 0 and this number they're all going to be integers okay and it's going to be between 0 and this number well then you can actually leverage this fact to implement it a simple way okay the way that works is to just create a boolean array okay they're all going to be numbers okay and they're all going to be within this range so just create a boolean array of size this one which is basically the the range plus one okay because we're looking at a zero index here and i'll tell you why the zero index matters in a bit so now what you need to do is you need to just keep track of whether somebody added a certain number or not so what you need to do is just go to that particular index let's say somebody adds five okay so go to that index five of this boolean array and set it to true that's it if somebody later says okay contains what you need to do you just need to look at that index five see if it's true or false have i said it as true i've said it as true well then it contains it what is remove entail remove basically says go to that index and then make it false you're basically removing it so basically you have like an array where you're it's like a check right anytime some element gets added you're just checking that particular location in that array you're not saving the number itself you're checking that location of a boolean array so that next time somebody asks you for it or somebody who removes that element you're basically unchecking that or making it as false so to remove and for contains you're basically seeing if that is checked or not okay adding a number n is basically setting array of n equals true okay removing is basically setting array of n equals false and then contains is just returning array of n right whether it's true or false okay this is the reason why we had to add one to this the capacity is whatever is zero over here i had to add one because the zeroth index right or you can say n minus one in all of these cases and you can have an array of size one zero zero zero zero zero instead of one okay so simple implementation this is going to be super fast right you're just looking at array indexing so it's going to be super fast no collisions it's going to be faster than hash set okay if we were to use hash set because hashtag kind of does the uh the collisions and all that stuff there's going to be even faster than that but the memory utilization is going to be huge right so here's the uh implementation we're just going to run through the logic and um granted it's not ideal but this is one way you can solve this problem okay so what are you doing over here first i'm going to create a boolean array which is going to keep track of my you know is an element available or not kind of a thing and in my hash set i'm going to initialize it with this number because i know that this is going to be the maximum capacity okay it's a boolean array since the default boolean value is false i don't have to initialize the value itself i just have to create a boolean array and i'm done now if somebody were to add an element to it what i need to do is array off that element equals true right so i'm checking that element in my boolean argument somebody adds something right they can add 10 elements i'm going to check those 10 things adding the same element again is not going to matter because i'm just saying the value is true okay now removing is fairly simple i'm going to set the value as false kind of like unchecking that location and contains is fairly simple as well all i'm doing is i'm returning array off that key if somebody had added it it would be true so this is going to return true if somebody added it and then removed it it's going to be false so this is going to return false if nothing was added there at all it's going to be false as default so this is going to return contains is going to return false okay simple implementation very very simple what's the time complexity yeah so of one right you're just doing a lookup that's constant time it's independent of how many elements you have the space complexity is crap though you're looking at often right if you imagine allocating so much space there granted it's boolean but still it's not good okay so if if this goes if the interviewer says yeah this is a perfect solution then you're good but let's say you have a picky interviewer and they say no this is not what i have in mind i want you to actually implement a hash set yourself okay so in order to avoid the problem first convey your approach right convey well is a boolean array okay i'm this is what i'm planning to do is it okay if they say no don't even go with the code okay if they say yes that's fine but if they say no let's say you have to implement it yourself okay well you can kind of take hints from the the underlying java hashtag implementation and kind of implement it yourself you get better space complexity you have a little worse time complexity because it's not going to be one anymore but slightly better space complexity right so how do you implement your own hash set let's look at the the default hashtag implementation like how does it work in in java you basically have this hash table which is basically a table of values that you're going to have to map each input to okay so you have a bunch of predefined values any input that comes will have to be mapped into one of these buckets okay java usually does this by using the hash code it's going to do a hash code off the input and then it's going to map the hash code to one of these buckets okay now it's going to go to this bucket you get another input if it maps to the same bucket well what you get there is a hash collision okay so this bucket cannot be just one location each bucket is basically a list okay and java typically internally does this by using a linked list okay basically have a linked list every time he has a hash collision say something goes to this bucket it basically creates a link list for this thing add one more goes to the same bucket it is going to add another node to this linked list okay so this is how java implements hash set and you can do that too there is one other thing java does which is it it looks at what's the load factor okay if there are too many numbers that are being input well then it kind of balances it it starts out with a certain number of buckets initially and then if there are too many numbers and then these lists are obviously going to go longer in that case it's going to say oh the number of buckets is not enough let me increase the number of buckets and it's going to reshuffle all of these linked lists all of these collisions so that they go to the new bucket it's going to come up with a new hashing strategy for mapping the hash code to those buckets and then re-adjust and all that stuff so typically in an interview you wouldn't be expected to do the readjusting part but you never know but at least for now what we're going to do is at the very least you need to have some kind of a hashing strategy right create a bunch of buckets okay map the input you know that the input is between one and that whatever billion or whatever is the number right so you map that to these number of buckets and put it into a list there okay anytime you get an input which matches that bucket put it into that list now adding is basically doing that all right so you have accommodated for collisions as well and removing is basically finding the bucket removing from that list contains is again finding the bucket and scanning that list to see if that element is in one of those uh elements in that list for that bucket okay this is going to be the implementation it is kind of simple i'm going to walk you through the code and how i did it uh you're free to change it and come up with some additional ways of doing it again we're not doing the readjusting and all that stuff we're just mapping to a flat list of buckets okay so here's how we implement our hash set we first create a list of buckets okay let's assume 100 buckets i'm going to create an array of 100 and they're not going to contain the element themselves they are going to contain a linked list each each bucket is going to be a list okay so you map the range of inputs to these buckets so you have a billion then you're going to have a billion by 100 as possibility for each bucket and then to handle collisions make a list for each bucket and whatever maps to a bucket you put that element into a list for that bucket okay so here's the approach let's start with initialization code okay so i'm going to create this max value constant this is again a good thing to have because it's an assumption that the interviewer has given you so it's good thing to call that assumption out rather than have this number in your code and it's what's referred to as a magic number it always helps to have a constant so that you know that this thing is max value and you use max value in your code easier to read okay so make sure you do that in an interview so max value is this i'm going to set the array size to be 100 okay this can be different i'm going to just make it 100 just a random you know number that i've picked now how do you initialize my hash set well first i create a parent list okay these are the buckets okay this is a list of linked lists okay list of integer is what the link list is for each bucket so this is going to be a list of those linked lists okay you think about this you basically have a list over here which are all the buckets and then each element here is basically a linked list okay it should it should be a linkless nut and added this let me redraw this over here it's going to be like this okay this is going to be the link list and this is going to be the parent list okay that's what this is this is the parent list and each element here is a is going to be a linked list i've just marked a list of integer but the implementation that i'm going to choose is going to be a linked list okay so my constructor which is where i initialize the data structure i'm going to create a an array list for the parent list okay and the other list is going to be initialized with every size i know what the size is so i don't have to you know have it you know readjust as we add stuff i'm going to force it to 100 elements so the parent list is going to be an array list all right each element in this pair plus is a linked list but i'm not going to initialize that right away all right what i'm going to do is i'm going to make them as null okay i'm going to initialize them as null so that i don't have to do checks later right i don't have to check if the size is met and all that stuff initialize it well i'm going to force initialize it with a bunch of nulls so i have that much elements in my arraylist you know it's basically 100 elements in my arraylist all of them could contain each one of them could potentially contain a list of integer but i have initialized them as a null for now okay so this is my initialization now what matters is what i do with the add remove and contains okay so here's my ad add takes in an integer okay now what do i do here first thing i need to do is map it to one bucket in my parent list okay i need to know which bucket it needs to go to so what i do i do a modular reminder of hundreds this happens to be 100 but i could change it later which is the benefit of using this constant over here now i do a modulo so i can map it to one of the buckets here okay and now what i do is i get the child list basically i see what is the list in that bucket is it an empty list or does it already have a link list it really doesn't matter i'm going to get the child list for that bucket where this key needs to go now if that is null well nobody has been there before for that bucket so i need to initialize a new linked list so i'm going to create a new linked list which is a list of integer i'm going to call it list and i'm going to add this key to this list all right so it's going to be the first element in that list and i'm going to set it to that bucket so this index what i've mapped over here is going to contain my newly inserted list all right so it's going to be like this so i have a bunch of buckets here and here is my element an element maps to i do a reminder to map it to one bucket turns out this bucket does not have anything so i'm going to create a new linked list add this thing here and i'm going to set this element to this linked list okay this there's the first element there okay now if this is null that's centralization that i need to do but let's say it's not there is already a link list there but what do i need to do then i just need to append it to that list but only if that list does not already contain the value okay so let's say i have this thing here i have an element and i map it to this bucket and this bucket already happens to contain a linked list okay now what do i need to do i can't just append it at the end i first need to check if this link list already contains e okay so i'm going to check that this linked list contains this element okay that's what i'm doing over here if it does not contain the element only then do i add it to that child list okay there's a list already there and it does not contain the element so i'm going to add it if it does contain the element what do i need to do i don't need to do anything it's done my job is done all right that is add okay but let's look at the remove use case now again remove takes in a key which is an integer and it needs to check if that value exists so the first thing you need to do is find the bucket okay again i'm going to use the same formula reminder of key divided by array size is going to give me the index where is the bucket where that value might likely be stored and now what i'm going to do is i'm going to get the list that is at that particular location i'm going to do parentless dot get off index now this can be null or it can be a linked list now if it is not equal to null what i'm going to do is i'm going to remove that key from this list i'm going to do integer.value off because i don't want it to remove from a location i don't want it to remove let's say key is 5 right i don't want this removed to remove the fifth element i want it to remove the element 5 wherever it is okay so that's the reason why i convert it into an integer i box it into an integer to make it very explicit i want to remove that object that integer object and not remove that place whatever the value that contains right it's a tricky thing and this is a problem with autoboxing and unboxing with these kind of collection apis right remove is going to remove at that location and it's not going to remove the thing that you're passing in so i'm boxing it to force it to remove the thing that i'm passing okay so child listed remove off the integer object of that value is going to remove that value from child list if it exists if it doesn't exist it's going to not going to do anything right so i'm going to do this if there is a link list in that bucket that i've mapped to it okay if there is no link as well then this is really nothing to do right that element was not added there that bucket is empty so i can just exit quietly okay so that's remove next this contains right i'm given a key i need to check if this contains it again what do i need to do same stuff right i first find the bucket so i do index and then i first get next i get the child list all right now what do i need to do i need to check if this child list contains the key so simple i return if child list is not equal to null i return child start contains of key right i'm using the contains method on this linked list to do the so that i don't have to do this myself okay so this is the contains api and what's the time and space complexity of this thing well it really depends on the table size it is not off n with hash set because it really depends on the size of the list okay so there are ways in which you can calculate time complexity of hash and it really depends on the load factor it depends on the number of buckets that you have for example you can have the number of buckets equal to the range of n and you get off one complexity which is what we did with the boolean array remember we basically had the bucket to be same as the range of inputs and the time complexity was of one right there was no need for a linked list because every element had a space in the parent list itself so it depends on the the bucket size number of buckets and there's a good discussion about load factor and rehashing that you can get into with hash set again like i said hash set has a built-in load factor which is basically the number of inputs and the number of it's a ratio of the number of buckets if it's i think 0.75 is the default load factor if it's if it goes beyond that if there are more inputs that uh you know that kind of increases the load factor well then it's going to um it's going to do a rehashing right it's going to double the number of buckets and it's going to rehash so that's a good discussion to have with the interviewer to kind of convey that you know you know what this is if you don't know this definitely look up how a hash set works but this is your solution to the problem of implementing your own hash set again there are multiple ways in which you can take this right if there is a lot of time you can say how you're going to uh you're going to redo this how are you going to rehash this by increasing the bucket size and then basically picking all the elements in all of those buckets and then doing an ad on them all over again okay so that's one way to rehash um so with this i'm going to conclude this video do check out the playlist for other such lead code challenges in an interview setting videos i hope you found this video helpful and thanks for watching
Info
Channel: Java Brains
Views: 56,052
Rating: 4.9412432 out of 5
Keywords: java brains, java, interview, coding, coding challenge, java interview, java interview questions, interview coding challenge, interview practice, koushik, java interview videos, java interview challenge videos, kothagal, kaushik, tutorial, programming, beginner, java programming, java tutorial for beginners, java tutorial, java programming tutorial, leetcode, hackerrank, java programming for beginners, java interview questions and answers, cracking the coding interview, hashset tutorial
Id: NrMaQL_4Npo
Channel Id: undefined
Length: 24min 46sec (1486 seconds)
Published: Wed Sep 16 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.