EFFICIENT COUNTING USING BITMAPS FOR SYSTEM DESIGN

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hello everyone my name is Marian I need the session let's learn about quick maps and counting using bitmaps suppose I have a system like Spotify where people log in to the system and listen to songs and now I have gained about 1 billion users really yeah 1 billion users and now I want to understand some of the key metrics ability to my system first one is daily active users and also daily inactive users for any given day or n number of days in the past maybe could be for a month or a week or something like that and I'm also interested in learning the same key metrics categorized by device type or operating system or platform type now how do I do this the key thing we need to consider to solve this problem is I want to solve this problem very memory efficient and also lightning-fast and how do we do this as a programmer I think ok let's use hash table and set users key as a key of a hash and set different values maybe I maintain multiple hash tables per day and then save this information so that I can access faster perform faster lightning-fast right yeah it could work or if I'm thinking like a data engineer I might be thinking streaming technologies are I might be dumping all of the data into some data Lake or some database where I can access all of this data faster and then query and then show the information and now let's explore some different solutions to solve this using very less memory and also high performance how do we do this so before solving that let's understand why it takes more memory so I said that I have 1 billion users ok 1 billion leads to 1 and 9 zeros in front of it ok so now 1 billion users if the key of the user is you know integer whatever then one he takes four bytes okay let's take 4 bytes then how much data we need just to save all the keys in the memory in the hash for all the users which I have in the system so 1 billion users I have so one key requires 4 bytes in total I will be needing four billion bytes ok so that would lead to hello mega and Giga so four gigabytes of memory just to save all of the users keys in one hash table if you're using one or more hash tables then these keys will multiply by it those many number of hash tables and I'm not even talking about the values memory at all so even if you're just saving another four bytes in the value it would double it right so I don't want to spend this much of memory to solve this problem and how efficiently we can solve that's where bitmaps come to rescue so what are bitmaps bitmaps also called a spit array is a data structure it just holds the bits or it's just an array of which however you can think so to give an example what is a bitmap is just a array of bits so if this is a bitmap how much memory is consuming so it is basically consuming seven bits only it's not even consuming one byte if I just add one more here it's now consuming eight bits which is equal to one byte of memory only so this is how this data structure is so much efficient now every bit can save some information right it can save two states either it's 0 & 1 this data structure can help us to solve the problem which we just discussed and how if you ask let's discuss that suppose I want to calculate active and inactive users per day then all I need is the data structure called as bitmap how many bits we need is it we need exactly the total number of users we have in a system so we had 1 billion users and now I need a bitmap with 1 billion bits in it so let's take on bits 0 0 0 0 0 0 0 0 0 0 0 ends in 1 billion ok last 0 total we have 1 billion zeroes now how much memory is consuming now we have 1 billion bits or 1 billion bits so let's consider in two bites we just need to divide by eight we get approximately 125 mega bites okay so we basically get 125 megabytes of memory so earlier we actually got just to save the keys in the hash map we consumed about 4gb of information and in this case we are just consuming 125 MB of enough memory to save that information so the same information now how do we solve that problem over here so now we have a bitmap it all zeros in it so we have 1 billion zeros now each bit in the speed map actually maps to a user so this is the zeroth bit and this one is the first base second base so now the mapping is also safe so this bit is mapping to user with ID 0 this bit is mapping to user with ID 1 this is 4 2 3 and so on so this one is mapping to the user 1 billion user or user with ID 1 billion so if there are more users say for example 1 billion 10 users then we will need 1 billion and 10 zeros in our data structure so that way we have one bit assigned for every user now what we have to do is very simple if user has logged in today we just need to set this bit maybe in real time or maybe when you you know go through the data which is in the hard disk or somewhere we just need to set this particular value if the end user is logged in then we just need to set that nth bit to 1 so if user with ID 2 is logged in then we just intercepted second bait in this bitmap to 1 if the user with ID 5 is logged in then for 5 so we just need to set this v babe in the bitmap to 1 so how this will help is so suppose if I want to calculate the total number of active users this day or today all I have to do is go through this bitmap we just have 1 billion bits and count all the bit with value 1 it's that simple in this case consider how many we have the user who is logged in five is login and whatever it is so we're in account all the ones it actually gives us the total number of user active to day okay so now this map is giving the information of all the number of users active today and how do you calculate all the users which are inactive or the user didn't log into our system they just need to count all the total number of zeros in this bitmap not easy so we just consumed 125 MB of memory and then we were able to get the total number of users active today and total number of users inactive today and now you might be asking question in your requirement you said that I want to calculate all the users who were active last end days then how do you saw that for that we just need to see you keep on saving these bitmaps in the memory so this is for day one so I call this as d1 so if I want to calculate all the total number of users active in last three days basically every day I will be calculating the bitmaps and then I keep saving these bitmaps basically and we say 125 MB of the data every day so this is for the day - this is for the day three so suppose on the day two so these were the users logged in and on the day three everyone logged in high-poverty then we just need to calculate all the total number of users active in the last three days okay or maybe if I want to calculate give me all the users who are active consistently across all the three days all I have to do is just do the end operation on all this bitmap we just get that information so how do we do that so this user was not active so 0 0 1 gives us 2 0 this user were consistently active from last 3 days 1 0 0 0 0 0 so we got to know only this user was active all the three days so now if my is something like this instead of finding all the users who are consistently active from the last three days if I just want to calculate you know give me the total number of users who are active in any one of the day of in the last three years then all I have to do is do our operation so one one one okay this is like one okay one one one it means that basically everyone was active because you know on day three everyone was active so obviously everyone is active from last three days suppose if this user was not really active on day three as well so now this bit we could become zero so when you count you basically have one two three four five six just fill this data you can basically six users for active out of seven users so that's how you can calculate different metrics by just using this information which is there in the bit arrays so other thing other requirements which I asked was okay how do you calculate active users for device are kind of not wrong in that case what we have to do this will have to calculate or save different bitmaps for everything say for example I will have one more bit map so okay day one in Mac OS I'll be having one more bitmap just to stay with that information I can have one more bit man say Windows and D one on the day one and Linux so I can haul of this so by doing this we can get different KPIs Power Platform or operating system so it's about saving information in different bitmaps and then doing and or extra operation to get you know some more metrics so now how much data we are saving for this information we are today with 125 and be 125 V so even if you are saving about what 10 so we are consuming 10 different bitmaps we are consuming 1.25 GB only to save all of this information so if you're reduced hashmap just to save the key in the hash mark it's we were consumed how much 4gb and I'm not talking about the values at all to get all of this different information we would have saved more values right that would have been definitely gone to more than 20 GB or something like that so by using bitmaps we are saving a lot of memory and the computation is much faster because we can save all of this information in the in memory so we can complete it much faster so you might be thinking how much time does it really take to calculate all of this information when we have these bitmaps saved already in memory or somewhere in the computer so I was not able to really compute a bitmap with 1 million bits in it so what I was able to get some information suppose if I have about 200 million users then I will be having a bitmap with 200 million bits in it so in that case if I want to just calculate all the active users in that bitmap it took me about 100 milli seconds only so it's that fast ok even if you have 200 million users just to get all the active users for that day I you know I was able to get that in hundred millisecond then all of this information all of the different metrics will be running in millisecond or you know less than a second so it's that cool so next time when you have some problems to solve definitely try to consider bitmaps if you can swap using bitmap then you don't need to spend a lot of time writing hash map I now say we're spending a lot of memory and then processing part maybe you can simply solve using bitmaps so if you liked this video please like share and subscribe I think I also learned something new and I guess you guys - thank you
Info
Channel: Tech Dummies Narendra L
Views: 13,719
Rating: undefined out of 5
Keywords: Amazon interview question, interview questions, interview preparations, algo and ds interview question, software interview preparation, developer interview questions, Facebook interview question, google interview question, Technical interview question, software architecture, system design, learn System design, bitmaps, counter usingbitmaps, count distinct bitmaps, efficient counting bitmaps
Id: 8ZgRW0DNus4
Channel Id: undefined
Length: 12min 54sec (774 seconds)
Published: Tue Apr 21 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.