LIVE-6: Dynamic Programming & Python in-built data-structures

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hi folks I guess we are life let me just check if everything is working as expected just give me a minute just to see things of working all right okay so okay again please confirm in the chat window if everybody is able to hear me clearly yeah I can hear myself playing back so please confirm if everybody is able to hear what I'm what I'm and everybody can see my screen being shared if that's the case I think we are doing good okay so hey folks thank you thank you for joining in since since we are a few minutes early let's wait for everyone to join in and yeah thank you for the confirmation thanks that everybody can hear and see hear my voice and see my screen so any questions I can take a couple of questions before the live session itself starts I'll try to do my best okay thank you folks thank you for the confirmation that I'm both my screen is visible and my audio is audible sounds good we are doing good okay we'll get started in a few minutes so any questions do let me know I'll try to answer a couple of them before we jump into the live session itself so one thing that I have done is the dynamic programming problem that we were solving yesterday I rewrote the code just to make it much more easier to understand I'll explain instead of using an external code resource I thought it's best if I just write the code myself in a much more simplified format of course it's less optimal but easier to understand how often do we get data in a SQL form in the real world again you do get it a few times in machine learning what happens is most companies right if you're working in a large company let me let me tell from my own experience at companies like Amazon Yahoo Labs etc these companies store enormous amounts of data right so this data is I get depending on the data store this data could be stored in a standard relational database like an oracle DB or a my sequel DB or it could be store Hadoop cluster or a spark cluster right so a spark cluster so again if it's loaded as traditional relational DB or even some non-relational non non SQL based DB's most of them have SQL interface even if it is stored in a in a data warehouse right so it's either a traditional relational database or a data warehouse most databases and warehouses have an SQL interface similarly if you are using Hadoop or spark right Hadoop has something called as hive which is very close to SQL similarly spark has something called a spark SQL right even for Big Data technologies SQL is super important because most of the querying languages that are used even on big data platforms even when you have terabytes scale data sets is a small modification on SQL so again depending on the application it's very common to come across SQL based interfaces to data either traditional databases data warehouses or big data stores right so sounds good so yeah somebody is saying Rick it's a good question that you asked till date everyone who completed the course in applied area course have got at least one job many of them have got multiple jobs and some of them joined in a job right after the course and then they switched jobs six months later so I mean this is a good party love to love okay till of everyone who completed all the assignments on time for a player course we have been successfully being able to place them at least in one job in many cases more than one job okay so somebody says which one is better to learn SQL or MongoDB I think you should first learn SQL then you should think of other SQL is a very simple thing and SQL is an extremely generic powerful system with which you can query lots of data stores don't go for this fancy stuff again there is this tendency amongst youngsters even experienced folk sometimes to go behind fancy stuff if your fundamentals are not strong however fancy technologies you learn you are not going to succeed so before you learn MongoDB or any other no sequel database again these are phenomenal data stores they have their own place their there they are extremely useful in some instances but if you don't know SQL I think you should first learn SQL before you go and learn other complex technologies right sounds good somebody's asking how to compute out of bag score in random forests rigorous and again this will take me at least half an hour to explain in this context probably we should do it when we discuss random follows themselves so any good book for probability and statistics probability and statistics as a subject has this fundamental problem which is there are either so for probability and stats you'll come across two types of books you'll come across one set of books that are very strong on theory other set of books that are very strong and practical it's very hard to find something that does a good jump right your standard textbooks at undergraduate level cover theory very well but it's all theory right you don't know how to connect it to practical applications one book that I really like from a practical aspect because I learnt a lot of theory in college but one book that really helped me connect theory to practical stuff is this book called think stats it's a very nice book I really like that book I prefer I've think stats oh yeah so think stats is a is a is a is a simple book or it's also available in PDF freely available you can download it what I like about this book is this book teaches you the basic concepts of probability and statistics using some real-world examples okay whether it's distributions density is all the concepts a very nice book actually so if you want to learn from a practical standpoint think stats is one of my favorite books but the theory in it is slightly on the weaker side but that's okay for an applied for applied stuff you should also learn again I've seen a lot of people who can derive things but you can't connect it to real world so this is one this has always been one of my favorite books it's a slightly old book but again probability doesn't change every now and then right it's been the same for a few decades enough of course there are some advances but the foundations are the same so this is one of my favorite books okay sounds good I think let's see okay I think it's only two minutes past 6:00 what what should we do if you are stuck with a concept somebody says I suggest a book for data structures algorithms I think that's important so CLRS this is the best book that I know of for data structures algorithms in general just google search CLRS I mean this is this is okay don't type CLRS code path and just type CLRS you'll get this book or introduction to algorithms this is the standard textbook at most universities of the world and this book might feel a little theoretical but the way they cover the concept I told you in the edge today's live session also this has been a book that I've used since my second year first semester at undergraduate level I have the same copy I've written I've rewritten I mean this has been again when I taught the the data structures algorithms force course for gate computer science students and also for our interview prep course I covered everything from this book because I think this book is one of the bit at least because I learned from it of course I'm biased I've read other books also over the course of time but this has always been the best book that I have seen from for somebody who is a beginner level if you're an advanced level guy there are other books which are good but for a beginner level I think introduction to algorithms CLRS it's called CLR is because carmen leiserson rivest and stein for computer scientists right I mean it's it's a terrific book I can't I I mean if I had to pick one book in whole of algorithms data structures this will made sounds good guys so let's let's let's get into it let's get into the whole life session where we stopped yesterday I will continue from there again I'm changing the code dynamic programming problem that we saw I when I when I was rethinking about it I felt that the code that that I suggested yesterday I think I think I will have it here right so one second let me just open that for you code walkthroughs okay so yesterday we were looking at a different code right so yesterday I was telling you so I told you about this overlapping sub-problem all of that and i was trying to show you a dynamic programming with this type of diagram again this code I showed you that I borrowed it from leaked code but then when I realized it this explai is slightly on the more complex side again since many of you may not be very comfortable with dynamic programming I thought I'd rewrite this using using the previous code that we saw right so yesterday we saw a simple recursive code right we saw the recursive code I explained it step by step I will modify this code using dynamic programming now right I think that will be easier for everyone to understand right so let's go through that okay so okay where is that okay so today the agenda is simple and straightforward again I'm sorry I complicated the dynamic programming slightly yesterday I'll simplify it now so the focus is the dynamic programming example that we were solving yesterday we'll complete that then one of the big questions in most programming is which data structure to use when again this is a very very important question because you'll come across lots of that there are basically a bunch of data structures right you have lists tuple Dyck set and strings we've already seen strings strings are easy to understand right we have seen strings and reg X's till yesterday there are four other very important data structures which are inbuilt in Python now which data structure to use we're right so that's a very important question so what I will do today is I'll complete the dynamic programming example and we'll see which data structure to use where I'm assuming that you have gone through the previous code session still of and you have basic knowledge of inbuilt data structures I'm going to assume that right if you do not have that again I'm giving you the documentation from from Python itself this is Python version 3 documentation there's a very nice tutorial on all these data structures if you don't know this tutorial is very good it explains a lot of details of course there are also some nice tutorials on geeks for geeks okay sounds good with that let's get started where were we yesterday okay this was the Reg X matching problem that we were solving right very clear question mark matches single character star matches zero or more characters we saw this what did we do we wrote a simple recursive algorithm okay this is this is fairly clean algorithm if you notice how many lines of code it has let's ignore the prints and let's Inc ignore the print statements let's ignore the comments you have one two three four five six seven eight nine ten eleven twelve thirteen it's literally under fifteen lines of code again I've shown you how this works but I have also explained it to you that this code I've shown it to you yesterday that this code has an exponential time complexity right and again that's very simple because if you look at it in the worst case again when you're computing time complexity you have to always think about the worst case write in the worst case if you notice this code snippet here in the worst case a problem of size n let's say you these two are same size n or similar size n can be broken down into two problems of size n minus 1 all right so what happens your recursive equation looks like this okay your recursive equation looks like that a problem of time taken to solve a problem of size n is time taken to solve a problem of size n minus 1 into 2 right because you have to solve two problems plus order of 1 because you have to compute the logical order right and using the recursion tree approach again many there are many strategies to solve this recursive equation recursion tree is one of the simplest again we discuss this in our course videos also so using this approach we saw that the simple code that we wrote has exponential time complexity which is always bad right which is which is which is run at all good so then I said okay how can we improve it right so what I've shown you is imagine if I have my pattern and string my pattern has let's say n characters let's say you my string also has n characters I drew the recursion tree here right and I've shown you that there is overlapping sub-problem what is again first and foremost we have seen that this problem can be solved using a recursion that's very important and I've shown you with this example yesterday again a problem where P has n n characters S has n characters if you in the worst case let's let's write the worst case in the worst case this is this is your recursion tree this problem will get solved if you solve this problem and this problem write this problem you can solve if you solve this problem in this problem this problem you can solve if you solve this problem in this first thing you'll notice here very easily is the same sub problem here is being solved twice once when you're solving this sub problem you have to solve this again this will get broken down into much into smaller parts right so what it what it has is something called as overlapping sub problem why is it called overlapping sub problem because the same sub problem you're solving over and over again right when you see a structure like this when when your solution is a recursive solution with overlapping sub problem this should tell you that you can solve this problem using dynamic programming again dynamic programming is one of the very very popular ideas in algorithmic design and this is one of the most favorite concepts that people ask in product based complete is the interviews right in software development roles again if you interview at Amazon Facebook all of these top product based companies it's very likely that you'll encounter at least one dynamic programming problem and how do you recognize the problem is dynamic programming first you have to recognize that it is recursive and by just drawing this tree you can recognize that it is overlapping sub problem the moment you recognize both of them again this is often not asked for again if you're a machine learning engineer or a data scientist or a machine learning scientist at product based companies problems like this are asked if you have prior experience in software engineering or if you come from computer science background right so I thought I will cover this for the interest of people again if you're not a CS guy if you're not a computer science guy if you don't have prior experience then even if you write this code you're more than good again you should every programmer everybody who is applying for machine learning rules the expectation is that you can write recursive code let's not miss that point recursion is mandatory dynamic programming is not mandatory but again it's not rocket science okay so first and foremost when you have overlapping subproblems the idea is very simple the idea is very simple whenever I am solving a problem why don't I store the problem that I have already solved for example look at this let me show you with this example right so for example let's assume I am solving this problem while solving this problem let's say you first I solved this problem when I solved this problem let's say you first I solved this problem then I solved this problem when I solved this problem okay what if I store the result again solving a problem basically means how to determine whether it is true or false right in our case whatever is the result of it if I store it somewhere if I store this result somewhere next time what happens after I solve again look at look at the sequence to solve this problem first I solve this to solve this I solve this there again bunch of subproblems here that I'll have to solve then I solve this then I want to solve this which is for to solve this again I have to solve this right and I have to solve this this is for this is the fifth sub problem this is a sixth sub problem to solve this enough because I have already solved the same problem earlier and I've stored the result I can just fetch this result without again calling the recursive function and again and again so the whole idea of memorization okay the whole idea of memorization is if we have solved a sub problem once just store the result somewhere so that if you have to solve the same sub problem again instead of again calling recursive function just call just get it from the store that's it that's that's a very simple common sensical idea that's the whole idea behind dynamic programming right so the whole idea here is why don't we store solutions that we have already computed already solved stuff let's store it somewhere one way to store it again there are many ways of storing it one way to store it is in a data structure like a dictionary again in dictionary I'm assuming that you know the basics of Python here a dictionary is like a hash table right it has key value pairs right it has key value pairs and if you search it again what is one of the properties of hash table you can give in a key you can find the value and you can search it very fast or what this is a fundamental property of dictionary right dictionaries you can search very very efficiently in order of one time that's the most beautiful property of of a dictionary that's what they are most used for again in the real world if you ever become a software engineer or if you are a software engineer dicts or hash tables again Python called some dictionaries the standard terminology data structures algorithms is hash tables dictionary is nothing but an implementation of hash table in Python that's all it is hash tables are the most widely used of all the code that I have seen in my life in production hash tables are the most widely used data structures in the industry because they're extremely powerful to search based on the key value in just order of one time again the mathematics behind hash tables is beautiful but it's much beyond the scope of this discussion okay so what is dynamic programming it is basically if you have recursion and overlapping sub-problem that's it and again as I mentioned yesterday we will in our course in our applied records we have covered this concept when we learn back propagation and be planning right so you'll again see that when we go to back propagation okay so now again let me give you one more concept here okay so dynamic programming can be solved two ways in general given any dynamic programming we can solve it using something called as top-down dynamic programming or bottom-up dynamic programming okay there are two ways right there two ways both of them have their own advantages and disadvantages yesterday the code that I wrote or that that that I that I showed you from lead code is actually a bottom-up DP again a top-down DP basically means that I am going to still use recursion I'm going to use a recursion plus I'm going to use some storage to store my subproblem solutions bottom-up DP says instead of using a recursion because the recursion has some disadvantages I'm going to use I duration right and I'm going to use table and I'm good so this is also called as tabulation methods so bottom-up DP in some textbooks it's also called as dynamic programming by tabulation top-down DP in some textbooks is also called as dynamic programming using memorization right or it's also called as memorization based memorization based dynamic programming so yesterday the code that I showed you which was trying to explain with that tape with the table is bottom-up again the biggest disadvantage with bottom-up it pop up bottom-up DP is it is harder to understand it is slightly harder to understand but it is computationally slightly more efficient top-down BP is easier to understand slightly less computationally efficient so instead of using bottom-up because some of your beginners in this I thought why don't we just use top-down DP because it is easier to understand so I've written the code using top-down DP so if you recall this code what I've done actually ok let me go to the let me go to this ok so what I've done is this is the code that we saw yesterday right so I just modified this code slightly ok again this is a recursive we saw this recursion right I just modified it slightly today now let me explain how this works ok let me go step by step okay I'm calling this function E's match DP okay I'm YZ match using dynamic programming the idea is this so whenever I solve a subproblem ok look at this diagram here look at this diagram here whenever I solve is the key idea is this whenever I solve a subproblem imagine if I solve a subproblem like this imagine if I solve is a problem like this how do I store this data if you notice the starting the starting index itself is changing the ending int index stays the same the ending index always stays the same in each of the sub problems you can draw the subproblems the ending index doesn't change it's only starting index that changes so what I'll do now is let's call this as P start or pattern start and let's call this as string start ok let me call them as pattern start which basically means pattern start index and string start index so string start index right so again I have to again I have to store this information right I have to store suppose for example if I solve this subproblem and the result of this subproblem is let's say true let's just say they just say the result of this subproblem is true after I've solved it I want to store it somewhere how do I store it one data structure that I can use here is a dictionary so how will I design my dictionary a dictionary has key value pairs write a dictionary looks like this right a dictionary has key value key value key value key value etcetera right so you have a key and you have a value this is how the dictionary looks like right so what do we store in the key the key should be this index comma this index so I'm creating I'm storing the result of off of this subproblem in a dictionary in a dictionary I'm called I'm naming this dictionary as DP okay DP small DP means it's the name of the dictionary in this dictionary my key is basically a tuple and what is the first entry in the tupple the first entry is the pattern start the second entry is the string start and what is the value in this dictionary the value is the solution true or false right this is what I'm storing in my dictionary now if it's a you might solve this problem how do I store this in my dictionary I'll say in my dictionary I will create an entry 2 comma 2 and I will say the value for this is true and of course there could be other entries also right there could be other entries also this is how I'm storing in my dictionary now when I'm storing this in my dictionary when I have to solve the problem again right look at this I've already solved this sub-problem so I've stored the result in my dictionary DP now if the same problem I'm sorry if the same problem has to be solved again before I solve this problem recursively first I'll check if my dictionary contains this entry 2 comma 2 if it contains this entry 2 comma 2 if I already have the result for it I don't have to again recursively solve this problem right very simple and again why am I using a dictionary here and not some of the data structure because in a dictionary if I just if I'm given the pattern start and pattern so a pattern start and strength start indices if I'm just given these two indices I can quickly search in my dictionary I can remember in dictionary if you give a key you can search in order of one time this is one of the most important properties of prediction Airy or a hash table again that's what we are going to use enough right so let me go into the code and explain your code line by line okay so my recursive function I'm changing it slightly my input is pattern string the pattern start the path the string start and the data structure DP again what is DP here BP is a dictionary pattern start basically means if I'm solving a subproblem with my pattern given that i am a pattern let's from 2:00 to end if I want to match if I want to match to to end and let's say string if I want a string if I want to match 3 to end this is the pattern that I want to match with this string then the pattern start will be to the string start will be 3 right so basically these two tell me from which index in the pattern to the end should I match with the from which index in the string to end should I match it with right again it takes a couple of minutes to get used to it but then rest of code is very simple look at look at what I am doing here I'm saying if pattern start comma string start if it is already there look at this what am i doing first this is very important let me use a different color i let me use red color first given the pattern start index and the string start index i am creating a couple here because i'm creating a couple here using this tuple i am saying is this tuple present in my DP what does the syntax tell me the syntax says he's supposed suppose if i got 2 comma 2 okay I got my pattern I got my whole string I got two - let's say you might go to die let's say you might got a dictionary here now what it says is what it says first thing what that I'm checking here is is have I already solved a problem where I'm taking the pattern from the second index the string from the second index have I already solved it if I've solved it I would have stored the result in my dictionary again remember that DP is a dictionary I would have stored it in my dictionary right so first thing that I check before I even go into recursion is have I already solved this problem earlier so this is the syntax for it right how do you check given a key how do you check its presence in a dictionary this is how you do it if it is already present just to return that again what does this syntax tell you this syntax will tell you again remember that I am constructing a I am constructing a tuple using pattern start and string start because my key remember in my dictionary my key is my pattern start and string start sorry my pattern start and string start my value is the solution to that subproblem either true or false and how do i get the value given a key this is syntax right in in dictionaries again I am assuming that you know the basics of dictionaries you know the basics of lists tuples etc if you don't know please go and read from the resource that I just mentioned at the start of this discussion okay cool if this is there I'm done I'm not even calling a recursion enough right if I've already pre solved the question if I already solve this same problem earlier in my overlapping subproblems look at this if I've already solved this problem and if I've stored this solution in a dictionary I don't again have to solve it and that that's the beauty of DP right that's that's a whole idea of DP otherwise if that is not the case what am i doing I'm just saying my pattern and string that I want to match now is my pattern from the pattern start index my string start index again this whole code if you notice this whole code is very similar all these cases that we have written here look at this all these cases that we have written here are exactly the same as this code snippet actually I've modified this code only look at this all these quote all these cases that were discussed yesterday same cases we have here if P equals to s if P equals to start if P equals to empty or s equals to empty if p 0 equals to s 0 or p 0 equals to star if p 0 is stuck if p 0 not equals to s 0 same cases now all these cases we discussed yesterday so the recursion part more or less stays the same these are the boundary cases these 3 plus this these are four boundary cases and two recursion cases that we saw yesterday same thing here also the only difference between yesterday's solution and dynamic programming solution is whenever we compute any result we should store it in our dictionary and before calling recursion I am going to first check in my dictionary if I already solved the problem or not otherwise this whole thing is the same and remember unlike yesterday look at this code look at this snippet of code the only difference now is look at this snippet of code and this snippet of code it should have what did we do if P equals 2 has just returned true we are not storing it anywhere in yesterday's solution in a simple recursion based solution in DP based solution whenever you come up with a solution you should store it in your dictionary so before you return think before you return this step basically stores the result stores the result in our dictionary which is DP right similarly before you return you should store it in additionally whenever you compute a solution you should store it in a dictionary again here also you you're calling EES match DP which means you have solved this problem that whatever is the solution you should store it first whatever is the solution you should store it first whatever is the solution you store it first so the only difference between this dynamic programming based recursions recursive solution or memorization based recursive solution and the simple recursive solution that we saw yesterday is very simple the only difference here is we are using a D we are using a dictionary to store all all subproblems that we have solved right and before we go into recursion first we check if the sub problem has been solved if it is solved we just return that and before we return anything we store the results in our dictionary and we pass a dictionary also very simple again this is a again if you look at this code actually took me like five minutes to modify this code into this code if you are careful you can do it in like probably even less than five minutes again I am NOT attending interviews nowadays so I have not very comfortable writing this code but it shouldn't take you like if you know the concept of DP if you have recognized overlapping sub-problem this should not take you more than maximum ten minutes even if you're a beginner it should not take you more than ten minutes because the logic is simple right you should first always check if you've already solved it and then whenever you solve a problem just store it in a dictionary that's it it's very simple so it will take you like fight first you can write the recursive solution argue that it is exponential time complexity observe overlapping sub-problem argue that it is order of 2 power n time complexity show that it has overlapping subproblem because it has overlapping subproblems you say I'm going to use dynamic programming and I'm going to use a recursion based dynamic programming and let's test it he is match DP if you tested it works again remember I have to pass a dictionary right look at this what are all the parameters how do i how do I solve this this is my pattern this is my string I want to start when I call it the first time right I want this whole pattern to be matched with this whole string that's why my pattern start index is 0 my string start index is also 0 and initially my dictionary is empty initially my dictionary is empty that's how you call this function that's very important right again it works in all the cases ok same again I'm printing just like yesterday's cases all the case the important thing here is you have to pass this because what is happening here all this whole string I want to match it from index 0 this ugh this whole pattern I want to match from index 0 this whole string I want to match from index 0 and I'm passing a dictionary which is empty at the very beginning beginning mate should be empty right obviously again I have a bunch of cases just to test if everything is working it's working fairly well now interestingly the dynamic programming problem that we solved is often called as a two-dimensional dynamic programming problem why is it called two-dimensional dynamic programming because we have look at look at this look at how our done look at how our problem is how our problem works right your pattern and string are fixed here there are two variables that are changing as part of your recursion there is pattern start that is changing there is string start that is changing in your recursion right there are two variables if even if you if you recall if you even if you just see this even if you see this there are there are two in the recursive call look at this in the recursive call these two values this value increased by 1 this value stayed the same in this case this value stayed the same this value increased the end values don't matter actually the end values actually don't matter I just drew this diagram with end values so that it will be easier for you to understand the end values are always n right but if you notice there are two variables here there are two variables that are changing in your recursion unlike this if you look at if you look at your standard Fibonacci or a look at factorial let's say simpler one right how does your factorial function work your factorial n right again I am writing pseudocode here if n equals to 1 return 1 write return 1 else return n into factorial of n minus 1 right this is this is the pseudocode right again remember in this code snippet you only have 1 you only have one value that is changing this value is changing from n to n minus 1 so this is often called as a one-dimensional DP because there is only one variable that is changing as you go from a larger problem to a smaller problem the two dimensional dynamic programming basically is in our case the length of the pattern write the pattern start the pattern start and the string start and the string start both there are two variables and as I keep solving the subproblems both of these will change both of them will change right because there are two variables that are changing as part of your recursion this is what is called as a two dimensional DP this is what is called as a one dimensional DP again I did not want to introduce the concept of dynamic programming from a theoretical standpoint that's why I've taken this simple problem involving reg X's again remember we remember that like axis is a very very important tool for machine learning right or for data signs or even for any text processing for any text processing reg X's are important and within reg XS reg X's internally use dynamic programming extensively right I want to introduce this concept from a more real-world perspective instead of just giving you some type problem which which doesn't correlate with reality right again if dynamic programming is if you are if you are interviewing for software development engineer roles at product based companies DPS or dynamic programming is super duper important if you are interviewing for machine learning roles not so important but if you come if you have prior software engineer experience or if you come from computer science background top product based companies expect you to write dynamic programming again only top product based come nice right even there are some companies where some very good startups which are saying ok let's not do dynamic programming every time ok sounds good but if you want to learn again here is a very nice blog from geeks we're geeks on how to think about solving dynamic programming problems and there are 50 practice problems again if you just go here there are 50 problems in this I really like this list ok so ok we can always go back ok there is a list and for each of them there is a solution given in some other blog with some nice explanation again there is code in C++ Java but you can always Google for it you can say longest common subsequence Python code you will get it right so those of you who want to practice more dynamic programming this is the strategy but remember the dynamic programming is all about recursion if you can if you can break a given problem into recursion and if you can recognize overlapping subproblems that's it you can solve dynamic programming problems very easily that's all this is the crux again even within this recognizing overlapping subproblems is easy breaking the problem into recursion often is the most time taking right that's why actually even in lot of interviews for non machine for non CSC people also people give recursion programs especially at product based companies to see how they are thinking from a recursion standpoint ok so with that out of our way let's go - let's go to pythons inbuilt inner structures right ok I am making some assumptions now let's make it interactive right so any questions so can we solve factorial problem using DP again you can just solve it instead of DP right if you just solve it using AI iteration right and make it very computationally efficient sounds good where can I find a copy of the resources that is discussed here as always I will put it in the description section you can see it I'll put this whole ipython notebook and a PDF version of it at the end of this discussion so don't worry about it as usual okay next comes my fun part okay so pythons inbuilt data structures again apart from these four there is also string but we have already discussed string in the previous live sessions so my assumption here is that you have basic understanding about these data structures I am NOT going to repeat the basics of them because behind all these live sessions the assumption is that you already know the basics right now the big practical question that you often encounter here is which data structure to use well I'll ask a few interesting questions and would like to hear hear your thoughts okay again quick recap let's just do a quick recap for those of you who may have forgotten some basics again a list is like this a list is an ordered thing this order is important this order is always maintained in a list it is indexable which means this is at index 0 this is at index 1 this is at index 2 and so on so forth it is also a terrible we have discussed what is an i terrible write in in the previous live sessions write anything and I terrible again I've talked about it in the previous live sessions and I terrible data structure is something from which I can pick one item at a time right again this is very useful if you want to do this for I in list list 1 right so when you want to do this your your lists have to be high durable otherwise it won't work so lists in general the simple concept here is they give you an ordered an ordered sequence of objects or numbers or whatever again there's a subtle caveat which is lists can hold objects on different data types but oftentimes we just use it to hold only one datatype oftentimes ok next list is very useful very powerful so is dictionary dictionary you have this this is the key this is the value this is the key this is the value what is the most powerful thing about dictionaries given a key if you give jack but for example imagine this dictionary is in D so if I say B Jack if I just say D Jack this will return 4098 within order of one time so using key values you can such a dictionary in order of one-time as I told you dictionaries are the most widely used data structures in the industry if there is I mean I think 90 plus percent of the times I've seen digital is being used okay it is in my own professional experience okay so dictionary is basically key value pairs this is the key this is the value this is the key this is the value dictionary is nothing but a hash table again there I tell bill I told you what I trouble is earlier they were never ordered again what is ordered basically mean order basically means the order in which you input into the dictionary the same order was not maintained till 3.7 but one of the biggest changes in Python 3.7 is again had I also realized it very recently when I was using some code look at this what is new in Python 3.7 one of the most important things is where is a dictionary where is a dictionary okay look at this the insertion order preservation nature look at this this is one of the most important changes in Python 3.7 the insertion order preservation nature of big disk dict objects has been declared to be an official part of the Python language specification so if you are using a version which is older than 3.7 then the in the insertion order right so the order in which you insert elements in the dictionary is not the same in which you can extract it that's not guaranteed right but since since version 3.7 dictionary is also ordered insertion order right next you have tupple tuple again remember lists are represented in square braces dictionaries in curly braces with the key and value separated using a column tuples are like this if you want to order anything if you want to order K same objects you store you use it to use it you store that information using tuples tuples are immutable which means you can't change a couple just like strings again strings are immutable objects if you want to cry you tupple from this that's great but immutability again that's a very important property immutable those of you were not sure that just google such again i'm also immutable python okay so mutable again you can find tons of resources in this again couple is a standard example of immutable for example look at this double one is this if you want to change this tuple 0 this entry to 4 it won't let you it won't let you do that very simple right tuples are immutable which means once you create this tuple of four numbers if you try to assign couple 0 which means this value if you assign it if you try to assign a value of 4 to it it will throw an error which means you can't edit it like that it's immutable mute immutable means something that doesn't change right so that's an important thing sets right sets are like this sets are just like our mathematical sets they're unordered ok there I terrible there are no duplicates so in a set if you insert 1 1 2 3 that will be equivalent to the set 1 2 3 again there is a concept called as multi sets where multiple repetitions of the same element are allowed but in the standard set that's not allowed they are unordered which means the order in which you insert and the order in which so they do not maintain the order they are I terrible there are no duplicates again there is something called multi sets where multiple entries multiple ones are possible but in a standard set that's not what it says just a quick recap of important properties there are many other properties which are important but these are the key ones for our discussion again the most important lesson in whole of using these data structures is the data structure that you want to use is decided based on what operations you want to perform right it depends a lot on what operations you want to perform and how fast you want to perform these operations for example let's take this example right that we just saw a dynamic programming example right what do I want to do given a sub-problem that i have stored that i have already solved right given us a problem that i've already solved using pattern start and string start i call pattern start string star given pattern start and string start if I have already solved this problem it can either be true or false right if I've already solved it I want to be able to search this information very fast that again you might you might say why the heck did you use a dictionary here why not some other data structure I could have used many other data structures also but a dictionary has this nice property that given this tuple given the pattern start and the string start I can find whether I have solved this problem or not in order of one time because my task here is I want to search if I already solve this problem whenever you have search problems and you want to search fast you very often use dictionaries right again a data structure should always be decided based on what operations you want to perform and how fast you want to perform these operations again I'll show you a bunch of examples right okay sounds good so let's go let's go ahead and ask a few questions so the discussion the rest of the discussion is which data structure will you use given a situation and why I want you to I want you guys to answer this for me okay okay I think this whole thing came into this thing okay suppose okay let me just move back okay so the question here is simple you want to represent a 3d point you want to represent a 3d point what data structure will you use very simple right you want to represent a three point right again in machine learning in linear algebra representing n dimensional points is very common what inbuilt data structure do you use of course there there are also other stuff that you can use but what will you use amongst these four what do we have we have list we have double we have dictionary and we have set amongst these four data structures that is a core of a discussion which data structure will you use to represent a 3d point any ideas and suggestions why somebody says array somebody says couple okay why tupple why Eric white Apple why ra any-any 3-tuple okay tuples again I'm just starting with simple stuff okay couple as it is immutable okay somebody says list yeah why not list why not array why not list what stops me from doing that right why should I store it with a list again one thing that that we all know is in a three dimensional point when I want to represent a 3d point again this is a mathematical notation not the Python notation the x coordinate y coordinate and Z coordinate this order is important right so I can always store it in a couple that's good that's one option the other option is list again I am NOT going into arrays because array is not a basic data structure here let's only focus on the four data structure I can also store it as a list right because list also it's it's ordered list is also ordered how do I pick between a tuple and a list both of them seem to make sense right so any suggestions Oh actually why not why not additionally any ideas on how we can use a dictionary let me go here why not a dictionary or somebody says dictionary again somebody says nd array I knew people would say this so you can use an ND array but remember ND array is in numpy it's actually a numpy class all I'm saying is amongst the four that we have we have list we have dictionary we have set and we have tuple amongst these four which can I use and why so let's let's remove nd address for enough right amongst these four so some of you have said list some of you have said couple some of you have even said dictionary right yeah some of you have said dictionary some are saying couple is immutable hence we should use it okay but he's in the applicator application-specific maybe I want to modify my 3d point okay maybe I want to modify my 3d point if I represent it as a couple I can't modify this 3d point now right if I want to send this to us to some funky let's say you might have a function f that wants to take this 3d point as an input and if I want to modify this if I want to modify this 3d point in this code then you stop will the right idea a pleasing mutable rate so I can't I can't modify it here right so yes somebody says we can use dictionary like this x axis y axis and z axis yes so I think all of us are on the right track we can use any of these three we can use a couple we can use a dict we can also use a list we can't use asset because set is not ordered I hope that is clear right now let's see again whenever you read documentation this is a very important lesson to be learnt ok suppose if I want a function f which inputs a 3d point let's understand how numpy again remember the people who design numpy are brilliant programmers this is the there are some very important lessons that we can learn from people who have designed these phenomenal libraries lot of software engineering that I personally learnt lot of good practices that I learned I learned by reading others could either my colleagues or open source code like this look at numpy in numpy if you look at the number and the added documentation shape is their right shape tells you what is the shape of the numpy array that you want to create right it's it and how we shape represented again I just copied the documentation how we shape itself represented shape itself is represented using a couple of hints remember the people who design numpy are phenomenal software engineers they would have thought very deeply about this choice the shape is a couple because and why did they choose couple here again it is these questions that will make you a better programmer in the long run whenever you see open-source code or code written by others ask these questions why did they choose this data structure here again you cannot learn this in textbooks what textbooks teach you is theory right which is certainly important I am NOT denying that fact but if you really want to learn the hands-on aspect to software engineering or writing good code this is the again this how I learned to be honest with you this is how I actually learnt observed InDesign or this how I learned good programming habits okay the shape if you think of numpy nd array the shape is what you give as the shape of this numpy array right nobody changes this remember when I create an umpire array the shape will never the shape variable will never get changed within this within this function it will never get changed that's why they said let's use a couple because it represents how many coordinates we want look at this up it clearly says how many we want on the first dimension second dimension third dimension etc right so this is again whenever you read code ask yourself why didn't they use this data structure or not some other restructure yet they used shape because shape will not be changed anywhere within this function and it requires an ordered set a sorry ordered ordered it requires order should be maintained so what is immutable and order where order does not get changed it is tuple okay so now one second one second do I have a function here okay okay a small change here okay suppose if I if I want to write a function like this look at this I want to write a function like this a function which takes a 3d point as input a 3d point as input within this function of I want to change this 3d point again there is this I can call this again this is this is the function definition alright I can call this function with with the 3d point right and I expect this 3d point to be changed suppose let's assume I have a 3d point three coordinates right suppose I have these three coordinates and I want to increment this by plus one I want to increment this by plus one I want to increment this by plus one let's let's say this function is increment each coordinate that's a function that's what the function does then what should i what should my data structure be any suggestions list of tuples how will it list of tuples how will that work somebody says list of tuples how can we make that work somebody says list ok vilest list or dictionary okay-y list or dictionary list or a dick can you just give a little justification also please so that we become more fun the discussion becomes much more richer I know why your argument is list or dictionary please just write your justification in just a line the core reasons why list or why dictionary ok everybody is saying list dictionary nobody tupple thank you you got it list or lists okay why lists why do I need multiple lists a list or dictionary all there would be list couple or dictionary okay all three would be how come tupple okay couple is a problematic thing right again you can do it you can do it of Korda pension how we are passing the parameters to the function of what you are returning again it also depends on the structure of this function itself let's just assume that standard Python write whatever you pass I hope all of you know how pass by reference and pass by value works in Python write whatever you pass if it is immutable it is passed again you J so suppose if you if you if you send an object here let's say you send a tuple suppose you send a couple couple is immutable the couple is immutable so whatever is the value here is copied here and you can't modify it inside but instead of couple if you pass a list or a dictionary remember lists and dictionaries are in mute as a are mutable which means you can change them so you can pass a list or dictionary change this list or dictionary and return in the calling function that changes will be visible right I hope all of you know that basic stuff right so obviously you cannot use again let's just try cough right you can't you can you can use a list for sure you can use a dictionary because it is mutable I want to change the point right so I need mutable structures you can't use set because set order itself is messed up you can't use a you can't use a couple because tuple is immutable so you're left with these that's it very simple again you can use elimination also sometimes a good strategy okay so so I've written some with small snippet of example look at this suppose if I pass okay so I oh I'm happy I wrote this code okay I didn't I didn't recall writing this code but anyways good suppose if I pass a couple right look at this T is zero one two I pass this tuple here I say T zero is T 0 plus 1 T 1 is T 1 plus 1 T 2 is T 2 plus 1 and return right it gives you a type error what does it say ok I don't think I can read it here but we can go down here okay so what does it say it says couple object does not support item assignment you can't assign something to a tuple so what is this this is an assignment statement right I'm assigning something to the value of T 0 so tuples do not allow that right so that's the error very simple so that's the code snippet just wanted to show you a code snippet to convince you that yes they are actually tuples are immutable ok again instead of that if you did it with a with a list right so here look at this here my T is a list right my T is a list I'm passing my P okay I'm passing my T here right I'm modifying my T I'm returning I'm not returning T I'm just returning so from the calling function so this T now gets modified if I print T enough look at this if I print so before I call F T first let's let's see let's see let's see what I have done here first I have T 0 1 2 I printed it okay so this print statement printed this I also printed T 0 ok cool got this then I called F T when I call F T what happens the reference to this object T the reference is passed here right are actually a copy of the reference again Python uses a slightly confusing terminology but logically if you think about it whatever changes I do here are are also visible here nutshell that's what it is because P is a is a mutable object right so I let's say you my increment all the 3 coordinates now if I print T look at this the values went from 0 1 2 to 1 three right lids are immutable and ordered that's why when you pass it to a function whatever changes you make here are also visible from the calling region very simple example okay again if you are not comfortable whether about immutability mutability parameter passing there is a very nice post that I've seen on Wikipedia I'm sorry on Stack Overflow Stack Overflow is almost like Wikipedia no it is okay so it says list is a mutable list is mutable he writes a code to show you that list is mutable right string is immutable so you shows you some small snippet of code right so I like this explanation so if you have time please go through this explanation I'll again arguments are passed by assignment I get the term is assignment again people who learnt C programming first we know pass by reference we know pass by value right this is called pass by assignment okay that slightly confusing terminology but it is very similar to pass by reference those who know C programming right if you don't know that's okay those of you are confused or how arguments are passed please go through this it's a very nice post on Stack Overflow I'll share it with all of you very simple code very easy to understand very nice explanation actually okay I closed it okay so let me just start the give me a second I stopped the I stopped the slides okay I stopped the slideshow by mistake by closing that window okay let me just go ahead key properties again I'm starting with some simple examples we'll go to slightly more complicated ones slowly and again throughout this series will see lot of examples okay this is a good question imagine if you were to design numpy from scratch imagine you are the designer of numpy again I really like these types of questions these are my personal favorites because eventually everybody grows up and becomes a good software engineer machine learning engineering imagine at some point you have to design numpy from scratch which inbuilt data structure would you use to design and D array and why I hope the question is clear if you were to design numpy from scratch you cannot use numpy you're designing numpy from scratch does not exist let's assume and you're designing numpy from scratch and you have to use one of the inbuilt data structures for nd array what data structure will you use any suggestions and why obviously why also there are other yeah somebody is talking about multi core environment yes there are other again I'm not even going into more complex like multi-threading multi processing distributed computing I'm not even going there all right so that's so that's a big beast onto itself will probably do one one of those in the in this session also I'll probably do one session on multi-core programming but little later multi-core programming and/or multi-threading we'll discuss that later okay somebody says a list okay list list list list list dictionary would work because of array okay yeah you can make dictionary also work list of list okay that's good you'll use lists of lists exactly what are two-dimensional lists good good we all are together see if you want what is your nd array if you think about it let's think of 2d 2d and DNA again you can do any any dimensional nd array right so if you have lists of lists right if you have lists of lists list of lists can be thought of as a 2d matrix right so lists of lists is what it what you can use to design simple nah nd array now let's go to a slightly more interesting question those of you who are experienced software engineer should take a shot at it ok so good so what will you use list of lists very simple question now comes the fun problem how do you think data is actually internally stored in numpy RS okay so in the previous question there is a subtle difference right in the previous question my question was if you were to design numpy from scratch what inbuilt data structure will you use good next question here is how does numpy actually store number and DRS by number arrays I mean endianness how do you think it actually stores them have you ever seen the documentation of numpy right so again it doesn't store it as lists of lists let me make that clear it does it using something much more much more ingenious again if you are a fresher or if you are a youngster you may not know this but experienced software engineers should take a shot at it how do you think data is internally stored in non Paris ok class yes it's a class obviously non-pay arrays a class yes numpy and D array as I discussed in one of the previous live sessions also is a class but how is the actual data at the end of the day numpy India array has to store a bunch of numbers it has to store a bunch of numbers what data structure does it use to store this bunch of numbers dictionary hash table no no no no no very complex as numpy is implemented static arrays in C or Fortran list is closest yes interesting ok that's a good idea so somebody says ok this is this is a good idea ok let's hear others also hash table dictionary in the form of a dictionary ok Lincoln lists ok let's hear it out key value dictionary key value pairs ok dictionary index has lists and lists as values ok again there is a problem again I know where all of you are going from but let me tell you a problem with the dictionary right let me tell you what could go wrong suppose if I store an NB array as a dictionary suppose I store it using a dictionary and the dictionary will work like this the key will contain the the coordinates right the the basically the index suppose let's assume it's a 2d ok then just say it's a 2d instead of 3d let's assume I have a to D and D array right any value that I want to access I can access using the row number and the column number so this is your row this is your column number that's my key my value stores the actual number actual number right this is this is what I think all of you are hinting at there is one problem with this imagine ok again I'm happy you're thinking what this solves is it will make accessing very fast right it will make accessing very fast because dictionaries are super fast accessible cool I'm very happy you thought about it but there is one problem with this ok so the problem here is this okay let me use a different color okay suppose if I have 2 ND arrays a 1 and a 2 if I want to add them and store the result in B this is a two dimensional array this is a two dimensional array or in other words the matrices this is a matrix this is a matrix right now or this is one operation and the other operation more interesting if I want to resize okay that's a more tricky one forget about this if I want to resize okay suppose if I have matrix a which is 3 cross 5 we have shown in our first live session that I can resize this into a 1 cross 15 resizing resizing is something that is done a lot or numpy RS right resizing I'm just saying changing the shape from 3 cross 5 to 1 cross 15 can this be done fast can this be done fast if I use a dictionary because the data structure should depend on what operations you want to be able to do fast you can do accessing very fast using a dictionary but can you do resizing which is a very very common operation on numpy RS numpy ND RS can you do this fast can we do this fast let's here anyways dictionaries with index as lists and lists okay I know that's complex list of lists with homogeneous functionality contiguous allocation byte like object pointers sequential balanced BST ok why did we get binary search trees here balanced BST why are we using a BST here I can't imagine this anyway that's okay anyway everything has to be stored in binary guys that we know right stack okay how come a stack dictionary makes reshaped logic complex yes I'm happy you got it Prasad Raju yes slicing I think from my deep mind slicing what is slicing I didn't understand the context I don't think numpy is written in Python I think it would have been written in a low-level language yes you're right numpy if you look at the documentation of numpy again if you have time you should actually again there are like list of tuples none of that works by the way okay so the copied as small again there's a very nice ipython notebook called as understanding the internals of let me open this it's a very nice cookbook ok this is a book called this is about ipython all of that but here this post is all about how numpy works internally okay so again broadcasting there tons of techniques here why is again the question here is why is numpy array sufficient i just google search for it and i found this very nice resource i've pasted what is there here a numpy array is basically metadata ok blah blah blah and the actual data this is the fun part as somebody rightly point out all the core underlying functionality of numpy is actually implemented in c c++ ok because c c++ is where you get speed for any numerical computation for any numerical computation the underlying thing has to be C C++ because that's where you get speed from and C C++ gives you lot of low-level control on your RAM or main memory right those of you again if you know tensorflow a very very popular library for deep learning the core of tensorflow is actually written in c++ there is a Python interface to it you can call most of the functionality from Python but all the low-level stuff is actually operating on C++ right so in a nutshell what it does is it gets a block contiguous block of memory ok again those of you who know C you can get a contiguous block of memory using ml ock again if you don't know this don't worry I'm just for those of you who know C again I think all of you must have learnt senior me Tech first here right most of you at least even if you're MC a student I am sure you will not see right so ml lock you can get you can allocate a chunk of memory or a block of memory and that's what is used right because the core everything resizing everything and numpy data num I like for example all the other information is actually stored in a class and all the destroyed is metadata for example the number of rows the number of columns suppose imagine if I have a chunk of memory this is in my RAM let's say within my RAM I have a bunch of space which can store 15 numbers okay some some bunch of memory where I can where I'm storing some 15 numbers right if I want to resize I just change the number of rows and columns the way I access this memory location will change the data in the memory location itself need not change if I use the dictionary like like what we saw earlier if you use a dictionary I'll have to recreate the whole dictionary so this resize operation will be order of M if I have n elements in my ND array why is it order of n because you have to take all the elements here from one dictionary and create a new dictionary let's call it from b1 you have to create a new dictionary called D to restore everything if you're storing it as a chunk of memory it's literally like raw memory that you can access in C C++ nothing changes here the memory doesn't change corresponding to NB array you'll just say for this ND array how many rows are there how many columns are there again these are like attributes that you have in the class ND array again tomorrow session is actually classes basics classes basics of all general programming in Python where we learn this in more lot more detail so earlier I had three comma five I just change it to 115 the memory doesn't change only the representation of the memory changes so I can do it in order of one time right so very interestingly again if you want again if you want to go into the depths of numpy you can spend years learning numpy to be honest with you again you can you can get access to the C code also but it will take you yes but I just wanted to I just wanted to throw this question at you to understand the limitations of in building structures also you cannot use them everywhere okay okay next if you want to write a program to get various combinations this was an assignment problem if you recall when we learn permutations I think in our session to live session 2 2 or 3 I don't remember I think three probably two or three one of these two live sessions right I said you want to print various combinations what is combinations again from permutations and combinations in if I have one two three four five right and if I want let's say combinations of two what are combinations you get 1 2 you get 1 3 you get 1 4 1 5 2 3 blah blah blah blah all these are combinations so if you want various combinations from a given set of numbers given a set of numbers right let's say you know number is repeating that's why I'm calling it a set let's assume you have a set of numbers and you want to create all the two combinations how will you store the results right how will you store the results now suppose you are writing a function right you're writing a function called find Combinator find combinations okay so this is your function okay all this the initial set of numbers is given as a set and the whether you want two combinations or three combinations is given as an integer how do you want to give the result back okay what data structure will you use any suggestions what where is order important Python arrays as there is C concept there yeah at the end of the day numpy is all implemented in c c++ okay somebody says a list of tuples why is it a list of tuples any justification guys so somebody says I will return a list of tuples 1 comma 2 1 comma 3 1 comma 4 and so on and so forth this is what a list of tuples is right sets of tuples sets of tuples sets of tuples what is sets of tuples you have a set you have a set of tuples 1 2 1 3 1 4 so on so forth this is what you'll return ok set of tuples ok so this is this is set of tuples SOT this is a list of tuples ok is that the best way is there something better than this is there something better than this so dictionary again please don't throw dictionary at every problem okay there are some problems intentionally that I have picked with dictionaries couple of tuples okay so tuple of tuples what does couple of tuples look like or sorry couple of tuples 1 comma 2 1 comma 3 1 comma 4 blah blah blah blah couple of tuples ok set what set you have you have set of what see people are suggesting list of tuples set of tuples couple of tuples because you get you get you get multiple answers here right so if you if you notice oh where will a okay so in the previous one couple of sets tuples of set tuples of set okay tuples of set what does tuples of set look like tuples of sets okay 1 2 1 3 1 4 blah blah blah blah blah ok why is it so i mean i would like to hear your set as it holds uniqueness ok sets to some extent make sense but can you please give one-line justification can you please give one-line justification please sets of tuples might be right note sets of tuples sets of tuples that's what somebody suggesting i think we have written a lot of them okay again depends on the application but if you think about it look at each of the results is the order important first let's start with basics if you're returning this is the order important in combinations what is one of the most important properties of combinations that the order is not important right so suppose if you have okay let me show you this slide ok so this order is not important right whether it's 1 2 or 2 1 it's the same thing in combinations permutations the order is important in combinations the order is not important so each of these results that you want to store we want to store it as a couple or a list because at a polar list order is order is important right do you want to store it in a tuple or do you want to store it so think about it think about it it's a very simple one if you think okay sets of tuples again there are no right answers yet but for any answer you have a justified that's important combination does not require order yes so this does not require order so each of these results their order independent so why are we using it why are we using tuples here why are you using tuples or why are we using lists right why are we using tuples we should be using sets because in a set set is order independent right so to store each of these combinations using a set make sense obviously good okay so one idea here one idea here is let's store each of the results in a set again this is not the only right solution it all depends on more complex cases or operations that you want to run right so if I store each of these like this because the order is not in the nod order is not in important next these results in one data structure do I store if I store it in a list remember if I store it in a list order again becomes important right if I store it in a if I store it in a couple the order again becomes important now remember that combinations if you encounter one comma two months you will not encounter one comma two again right because each entry will be unique right you want to print each combination only once right so one way I am NOT saying this is the only right way but one way to think about it is a set of sets a set of cells mathematically it makes a lot of logical sense because why is it set of sets first we are understood why this should be a sent because the order it's independent of the order why this has to be a set because you will not encounter one comma two again any time each Alip each entry here or each object here is unique right so set of sets is one option right for our case again there many the many right solutions I am just forcing you to think of other options that's all okay so now if you look at it okay now let's look at what what professionals have done okay so let's go to a python document this is Python to documentation but that's okay because if you recall in Python when we other example there is Idol tools if you recall right so there is ITER tools which can give you a list of combinations how is the input given here the input is any I terrible remember again this is implemented by professionals this is implemented by professionals in ITER tools the input is any I terrible it could be I can give it as a set I can give it as a certain I can't I can I can give it as a list any I terrible object is okay it could be any I terrible object doesn't matter and why I troubles because they want to iterate over each of this the way they have written the code internally for combinations requires this to be I terrible it could be anything so the ITER tools combinations in the in the input right again mathematically it doesn't make sense but in the implementation you can have elements repeating for example you can give this input you can give this input and say I want combinations of two okay then it will treat these as unique values okay that depends on the implementation now again what does it output ok returns are length some sequences of elements from the input I terrible again you can you can you can again you can the best way to learn is thinking through yourself also reading others documentation and trying it out okay I tell tools let's actually try it right let's why not try it just give me a minute I just open something here blah blah blah okay so let me just open this combinations okay insert cell below okay let's let's write it import ITER tools and again I come from SeaWorld so I have a habit of putting semicolons at the end so the syntax is simple right just okay just print it okay so let's assume ITER tools okay let's just say result equals to i2 tools.com Nations I can pass aya combinations writer tools Road car again you can always type just to get it suppose if I pass one two three four five right and if I want combinations - okay this is my result okay so print what is the type of the result what is the type of the result let's just run this okay it is okay it has a different thing called I tell tools that are combinations so the data type it is resulting is another class called combinations it is not resulting a set of sets now you can go and see how is itit tools that are combinations what is the how is it stored okay I tell tools.com Binet shion's where is combinations again it depends on how it is storing it internally right either tools.com Binet shion's ITER tools.com Binet shion's no no no no okay again we can spend some time on this but just go through this process now we know what is the data type that it is returning it is returning ITER tools or combinations an object of type I turtles or combinations now we can go into ITER tool source code or it--as tools documentation and say how is this data type constructed is it is it a set of sets or is it a set of tuples or it they may not have used any of the inbuilt data structures for optimization alright so again this is a good exercise in general to go through professionally implemented snippets of code to understand more but in the interest of time I'm just moving ahead but just try to see how itit tools or combination stores this right okay we have seen a bunch of questions now I want you I want to take you through a real-world problem okay suppose again this is a very very popular interview a question how do you build a tiny URL service I hope all of you understand what is tiny URL right so tiny URL services this for example I can say I applied a course comm right make tiny URL if you give any long URL it will take a long URL and it will shorten it to this so now if I go and if I go and do this look at this if I go ahead and do this if I click enter it will take me to apply to a course right if you have long URLs called as URL shorteners or tiny URL right so how do you build a service like tinyurl think about it it's a very very popular interview question I mean I have personally asked variations of this question in my own interviews earlier again this requires you to know only basic knowledge of Python and data structures basic data structures nothing complex it's a very popular interview question any ideas how we solve this okay we have about 40 minutes right I hope we can complete it yeah we'll complete it don't worry sorry linkedlist okay all that is okay please print okay you it's returning tuples okay Oh somebody is concerned about this question again the print itself could look whichever way you want suppose if but the data type is the important thing right if I print it as the rest if I print result here okay print dress what happened okay print the result dot I think I think it will have some print again I'll have to look up the documentation I don't remember it at the top of my head but what is important is the data type that's right the reason I print I press is I want to understand what is the data type that it is returning it is returning a custom data type that it has defined not a standard set of sets or tuple of stuff right so back to our question imagine we want to build a tiny URL service okay how do you build it okay let's what data structures when you use how do you build it any any suggestions somebody says dictionary okay why dictionary dictionary tinyurl full URL okay let's ask ourselves what are the two operations that we want to perform what's important right there are two key operations the first operation is create a tiny URL so given a long URL so there - there are two operations that we again whenever somebody says design something you should you should first ask yourself what are the two operations of three operations that are critical here right the first thing here or somebody says print result zipping URL yeah please try it I mean I mean there are many we can always go into the documentation and read it also but anyway somebody suggested it I mean I'm stuck with this so somebody says let's type print dress let's see what happens here oh it's actually storing it as a list of but what is this Aharon doing here okay okay but again the data type is the more important thing as I see it right so because when you print it probably this is how it is stored certainly one way again I'm not saying one way is wrong or right you just have to justify your answer there is no right or wrong I mentioned this earlier also right it all depends on what operations you want to perform beyond this okay so okay back to our question what operations do we want to perform here first operation that we want to perform here is given a long URL given a long URL we want to obtain a short URL that's the first task the second task here is given a short URL we want to perform the long URL fast very importantly we want to because when somebody types you look at how this works right suppose if I go to my browser and if I type the short URL this is going to take me to the web server where tiny URL is hosted the tiny URL has to given the short URL it has to give me the long URL and it has to redirect me to the long URL right so there are two operations that are important so first operation is this operation I enter a long URL it should give me a short URL second operation is when I enter a short URL it should quickly convert that into long URL and take me to that website basically this is what I want okay so where are we okay so these are the two operations very okay hmm okay here okay we are here right so store long URL s key short URL okay one second one second okay somebody is saying let's use dictionaries okay cool but that seems like a good idea somebody says tuple also will see tuples also so in a dictionary you should always ask yourself what is the key what is the value so what somebody is suggesting here is that they will create long URL as the key and short URL as the valuable that's that's the suggestion right but will this work can anybody tell me or someone else is saying the opposite okay is this the correct is this approach correct a is correct or is B correct where you have K a short URL and value as long URL these are the two options that I see here storing this information somebody says red black tree why the heck should I use a red black tree dude tell me red black tree is still order of log n I want the order of one search why should I use a red black tree I'm being honest with you I've seen people saying I'll use a binary search trail use all this why the heck I have a dictionary which is order of one amortized order of one I just go ahead use it why should I use some fancy I have done this in some interviews somebody says suggested let's use a red black tree or an AVL tree I asked him explain me and write the code for deletion from red black tree I actually ask this in a question so some people say okay I will use a red bag I've done this in some interviews again I don't want to convict them I said okay you said red black tree tell me how do you delete a note from a red black tree and write the code for it done the interview is done with that right because it's extremely you have to be extremely careful to write that code okay somebody says option B option B somebody says option A okay so there is some confusion let me clarify that option B is a more sensible thing right because given a short URL we want to convert to a long URL fast very fast because I want to visit my websites first so my my search is based on short URL given a short URL I should get long URL fast right that's important this has to happen very fast so this representation makes a lot of logical sense but now the other question is given a long URL how do you generate a short URL first right they're two parts to it to it right given a short URL you can get a long URL by storing short URL s key so key is short URL I'll just write s URL value is long URL you're done this is a dictionary and your dumb problem sort this is order of one right long URL to short URL how do you design this both ways Oh somebody says both ways okay how long is a short URL how long is the short URL stored for good question let's say you you want to store it for a few months that's a good question that's a very good follow-up question let's say you're given a suppose samples I go to tiny URL let's say your tiny URL wants to store this for let's say at least 90 days at least you said you would wants to store for 90 days that's a good follow-up question but again we will not go into software design software optimization distributed hash tables all of that stuff right how do we go from long URL to short URL that's that's my question so given a short URL we can find the long URL if you just store it in dictionary very quickly how do we go from long URL to short URL hashing hash function of what base-16 encode hashing yes these are certainly some approaches anything simple much more simpler than this regular expressions for what what how do you use regular expressions here I can't even imagine or am I missing something I know how to do it using base64 I know how to use it using some hashing yes that's it somebody gave a very good suggestion just create a random length string problem solved so when somebody gives you a long URL the idea is very ingenious right when somebody gives you our own very good I think you also given good solutions in the previous life sessions so given a long URL just create a random string that's sort of the simplest place again I'm not saying that's the only way of doing it but there are many other ways of doing it base64 encoding yes I agree there are some hashing based solutions the simplest is let's just generate some random strength problem salt okay now let's go to let's go through the code for it let's actually do it right so let's actually do it step-by-step okay cool dictionary her hash table key equals to short URL value equals to long URL and how do i generate the key it's basically a random variable length alphanumeric subs alphabetic subs suffix right I'm using basically or random alphabetic I'm using random alphabets here but instead of making it fixed length I'll make it variable length again you might say why are you using variable length because if I fix the length to only six characters right the number of characters the number of strings that I have here is fixed right imagine if I'm using only lowercase ASCII values right or a to Z basically let's assume I'm using A to Z I have 26 26 power 6 this is all I have right 26 power 6 by 26 power 6 years that's all I have what is 26 power 6 let's actually compute that 26 power 6 what is oh come on come on come on 26 power 6 come on Google ok so this is this is only 30 million web pages too small for the internet ok this this is what this is what 300 million I'm sorry 300 million URLs that's too small if you go to tiny URL web page they say look at this they say making over a billion URLs use usable right they're serving literally billions of redirects per month with billions of URLs right so the idea here is instead of just using fixed length let's make it variable length this length let's assume I'll randomly generate but the length can be anywhere from let's say 6 to 10 they just say variable length but of course the minimum length will be 6 maximum length will be 10 now I can generate billions of alphabetic suffixes or random numbers right let's go to the code of it very simple nothing fancy here is my dictionary where I will store everything look at this I have a dictionary where I'll store everything I'm just inputting random and string because that those are the operations that I will do here is my function which is get me short URL right here I am generating a random integer between 6 to 10 again randint is there in random right randant basically generates a random number between 6 to 10 and that random number is L if I print it I will get it right next I want to generate a random characters into a string of so into a string of length L so what do I do this is a very interesting quote there are many ways of writing it I have written it very succinctly to teach you more approaches so in string dot ASCII lowercase look at this in string dot ASCII lowercase I can get all the characters that are lowercase characters only first I get that now look at this now look at this this line is very interesting now what am i doing I'm saying you know all these characters random dot choice what does random dot choice do if you don't know it just google search read the documentation given these characters it will pick one of them randomly right it will pick one of them very simple okay now I'm also joining look at this I have a string dot join I hope all of you know what join does right it combines these characters it simply combines these characters now before we go in here look at this this part is important so this code will execute first because it is in the parenthesis then comes the joint so random dot choice cars so what this does is it generates a random character from all the characters that I have here this basically is all the characters which are asking lowercase from them and picking one of them but for I in range L so I am repeating this operation I am repeating this operation any times L itself was randomly generated remember again this is a very nice one-liner to generate the short URL again you can write this using a simple for loop if you are not comfortable in this notation you can write a for loop but I wanted you to get comfortable because you will see syntax like this in others could write so my suggestion here is so what what happens I am first picking a random character but I am repeating it all times so what do I get I get L characters right now I'm joining look at this MP string dot join all these L characters so what do I get I basically get now a string of L characters that's what my shot you are LS I just put it again as usual as you know I always print what is happening now now comes the most important part if short URL is already present in my dictionary what does this do this says is there a key in my dictionary already we're in the short URL is odd where is there is there an existing key with this short URL well with the short URL right is there a is there already a key which is equal to the short URL that's what this does if it is if it already exists right the randomly generated string let's say you my l equals to six I generated a random string it happens very rarely but what if what if the same string is generated again right it was generated once I've stored it in my dictionary I've generated again again because I'm randomly generating a string right suppose the same string got generated very low probability let's assume it happen then what do I do then I will simply call this function again recursively I'm going to simply call this function again because the next time it does make same it generates random string the probability that next time also such a such a short URL is present is very low first of all finding it the very first time itself the probability is low but again this is a boundary case that you should always handle lot of people miss this boundary case right this is an important boundary case that you should handle if the short URL that I randomly generated is already there in the dictionary just regenerated just call this function again next time it will generate a completely new random number generator sorry it completely generate a different short URL next time also if it is already if it's doing the same thing then go back again again again this is a very important boundary case that people miss if it is not there just add it this is how you add something to a dictionary right done right and you just print it and you return it in my return let's say you my my my company name is not tiny you are a little bit short URL I'll just take short URL and I'll add it dumb problems all right so let's see let's see how this works okay suppose if I give get short URL applied a course comm again I always print what is there so this is there this is what is there in my dictionary it entered again the random number generated L is 8 it generated a string of length 8 it entered this into my dictionary and it returned this because I have the print statement so I want to understand how my code always works next suppose again if I same long URL I'm giving again look at the same long URL I'm giving again right so most URL so tiny URL right so we just generated something right now suppose if I if I try to say generate again it gives me a completely new you see look at this it gave me this URL because the current URL is WJZ edge right let's just remember that if I generate again oh it gives me the same okay make tiny URL okay it's it's actually searching if that already exists and it's giving me okay so that's a that's the next level of improvisation but let's assume we don't care every time I just generate a random string two random strings and point to the same URL let's assume that's what we are doing of course we can optimize this code further and say if if a long URL already exists in my data in my dictionary I don't want to again create it in such a case you might have to store one more dictionary you might have to store dictionary to where the key is the long URL and the value is the short URL right so this is useful again what do you do here whenever you get a long URL first you search in this dictionary if that long URL already has a short URL that that is already a sign if it is not assigned then you create this random string all right so that's that's the next level of optimization right it's a very simple again an interviewer can say okay this is cool how do you optimize it further then you can say I'll have two dictionaries problem solved okay now how do we get long URL very simple very simple so I'll just say extract so K equals to short URL right so when somebody gives me a short URL the short URL will have short URL com front slash this right so from this I should extract I should just ignore all of this I should only take this part so I'm taking from the 25th character onwards right this is my short URL I just print it and I will return DK problem solved is it does this work is there a problem in this code right I hope the code again here what is the function get long URL given a short URL we should return the long URL right what am i doing given this input this is my short URL right this is what I put on my web browser I'm removing this whole thing I am just taking this this random string and I am searching for this random string just return DK is there a problem in this code can anybody suggest me a bug in this is there a problem in here is there a problem in this code that I just showed you this is my get long URL code right I showed you get short URL code right just a while ago this is my get long URL your basically what are you doing you are just trimming away all this prefix you're only taking the random string and you're searching for that in your dictionary and returning it is there a problem here is there any problem if I use so the short URL doesn't exist good yes that's a problem right so the short URL doesn't exist in such a case what happens if the short URL if case suppose if I give some ABCD let's actually see that right I have written some okay so get long URL I got this oh sorry also I'm just testing my code okay so this input of code I'm just testing okay so if I just modify it like this get long URL mx blah blah blah it gives me applied a course cool very good enough if I give something ABCDEF if I give ABCDEF what happens it gives me an error it gives me something called as a key error now what is a key error this key doesn't exist in the dictionary and where does this whole thing occur return BK right ABCDEF as a key doesn't exist that's why it's given me an error how do we fix it just handle that boundary case very simple just handle the boundary case again this is a good exercise in boundary case testing right you just say okay you first truncate you just remove this whole thing just get the random string you first say if the K is present here return BK else return none problem solved that's it right so always handle your errors carefully now if I give a ABCDF entry okay you have to handle boundary cases if you return a valid valid valid random string it returns as expected okay okay so next okay so let me just set you up for the next live session also so in this example that we have seen we had a dictionary D right we had a dictionary D right we had a dictionary D where we are storing some data and we had two functions get short URL and get long URL get long URL right so these are the two functions that we had now if you realize these two functions are operating on this data that's how we have written our code right now if you think logically all these things if you can somehow put them into a single logical structure imagine if you can put this whole thing again I have some data I have some functions which modify or which use this data right this is like a variable rate this is basically a variable of time dictionary right so I have some data or some variables and I have some functions which operate on this data now if I can somehow group this whole thing logically group them into one one one system logically right or a single block logically then you get what is called as a class again the next live session that we will have is called the basics of objected programming in Python here I assume that you already know what is a class what is again I'll recap some of these concepts I will recap some basic concepts of object loaded programming in Python but I will tell you in the context of real-world problems so tomorrow what we will do is we'll take this URL shortener okay we will take our short URL service so we will create a class called short URL right because that's the problem we are solving it again URL shortening is a interesting real world problem it's not some cooked up Thai problem for interviews it's a real problem again I personally prefer asking more real-world problems like this my personal preference not everybody has that but my choice so we will understand object programming you by first building and class for this understanding in this real-world context that's why I post I've postponed object programming tomorrow's are tomorrow again I don't want to promise tomorrow but the next live session whichever we conduct online publicly to everyone on YouTube we will cover basics of original object hundred programming again I will not be covering all the boundary cases of object programming in Python I will cover what is required in the context of machine learning because two R's is not sufficient to dive deep into object oded programming in Python but I will give you everything that is needed for you to be a good programmer using basic concepts of object programming I will not go into complex object and design all of that stuff but I'll give you an intuition on again what we'll do is we will solve this problem the short URL the short URL from the whatever you have just discussed right now we'll put everything into a class I'll also show some examples from other major libraries in Python on how object node programming is used by these libraries because that's the best way for you to appreciate instead of taking up toy examples right sounds good so that that's the next session but we are not yet done okay one more again I'm just giving you some simple questions here we have 20 minutes right let's solve a few simple questions suppose I have three lists here okay list one list two lists three okay I want to find the common element in these three lists very simple question okay I want to find all the common elements in these three lists now if you notice one is there here one is there here but one is not here three is here three is here three is not here only four is the common element right only four is a common element in this three list now what data structure will you use to solve this problem and how do you do it very simple question nothing fancy so somebody's asking can we do graph theory for machine learning also I think hopefully in this series actually in our course videos we already have covered some of it when we solve the problem of when we solve the problem of social network analysis because we have designed some features based on graph theory algorithms if time for we'll try to do something on graph theory in the future sessions convert two sets and take intersection that's it simple it's two seconds I mean it's literally one line code you can write it in one line but anyway very few suppose imagine if I have these three lists I convert the lists in two sets using this again tomorrow I'll also tell you about constructors all of those concepts also when I when I teach about object programming right but please if you can please revise the basics of oriented programming the tons of resources online you can pick any one of them but I will not teach from a toy example perspective oops Python just Google search for it you'll get tons of resources I mean go to any of these pages I have heard real Python is actually good optional programming Python 3 again oh yeah I've seen some of these examples so just go through something like this ok again I think they're using a simple toy example yeah they're doing you dog and cat type of example we'll try to pick up a real problem or we'll try to pick up real examples instead of toy examples so that you can connect why this is working again just refresh the basics of objective programming so that we can spend more time understanding why it is used how it is used etc right so I created three sets all I am doing here is set 1 dot intersection s2 this will result on another set so intersection s3 all that result I am converting into a list and I'm sending it that's it problem solved literally like this is the keyline set 1 intersection s 2 intersection s 3 convert into list and print why am i converting into lists because I want the output to be in list because I wanted the output to be in list and so I'm converting it to list right very simple example ok so do I have something nothing here ok so this is a good exercise problem right since we have seen some examples I'll explain you how to solve this problem this is an interesting real-world problem suppose you are given a bunch of let me explain the problem I will not write code for this I want you two guys to do it right so take a bunch of documents any text documents if you want just download a bunch of pages from Wikipedia that's ok so imagine if I am given a bunch of documents Tajima have document one document to document three document for so on so forth let's say some 10,000 documents okay I want you to build a very simple search engine nothing very fancy I shouldn't even call it a search engine a simple search functionality wherein I want to enter let's assume a bunch of keywords I want to say let's say machine space learning right amongst all these documents I want to return all the documents which contain the which contain machine learning how do we solve it right how do I again let's assume this is my query 1 let's write multiple queries right suppose my query 2 is let's assume it's like slightly something lengthier so let's assume my next query is machine learning and artificial intelligence right artificial intelligence ok I'm just writing in short here so this is my query 1 this is my query 2 so given this query I want to I want my function he said you may have a function I want this function to return all the documents that contain machine learning again this whole thing has to be there and I want it to return all the documents which contain again sometimes you may not have all the keywords also right so think about how do you solve this I'm leaving it slightly open-ended machine learning and artificial religions I wanted to return all the documents which contain this how do you tackle this any suggestions any suggestions reg X on every file oh cool that's good ok let's say ok good ok so let's say you might have reg X on all files how much time does it take okay so let's say you my have document one document 2 so on so forth some document n let's assume n is large I'm obviously in the real world look at Google search right there like billions of documents suppose my my regular expression is let's say machine learning ok this is my regular expression if I have to search this regular expression in each of these documents the time complexity will be crazy because this document can be extremely long right right this this document can be extremely long this there could be thousands of characters here again the length of this is small but again each of these documents can be very long and the number of documents can be very large right again remember I want to build a solution since the point came up I want to build a solution wherein okay so where would be okay I want to build a solution where I'm going to ask like in Google right I'm going to ask let's say you many many many many queries in an hour ok the solution that I want to build here is let's assume I ask one hundred or thousand queries per our thousand queries per hour this is also called as or okay so companies like Google actually measure in QPS which is queries per second not even hard right but let's say you I have all these documents and I'm getting thousand queries per hour right and I want to extract all the documents given any query wherever this is there now imagine building it using regular expressions that's a good idea but the moment you build it using regular expressions the moment you do it using regular expressions for each query there are thousands of documents document length is crazy so how do I do it now somebody says string methods yeah whatever you do I hope I hope you got the context right if I have let's say thousand queries per our thousand qph thousand queries per hour and if I have let's say 1 million documents you will require like a crazy large computer to do it right so reg X's will not scale to this problem well it's a good idea reg XS will work if the number of documents is small and if the length of each document is small and if the queries per R is small then it will work but imagine if each document is of a page large page and I have millions of documents and I get thousands of queries per hour then what do you do somebody says dictionaries how do you use a dictionary here how do you design a dictionary can tell me can somebody tell you what is the value what is the key in the dictionary here please I've not read all the answers of course it's impossible for me to read everything somebody says indexing what indexing database indexing what indexing because indexing is a very general term right there is indexing concept in database also indexing using sorting right indexing using V+ trees no no I'm again somebody says we want to maintain ranking of pages we are not going to that level that's the next level I'm simplifying the problem right of course there is a if you want to go into the search algorithms that's a huge area onto itself I'm only solving a small sub problem in the search right so that's a huge area why do you want to go to tf-idf let's I mean what okay you compute tf-idf then what do you do and somebody says we want to convert it beautiful soup Joby and a beautiful soup Olson will lose wrecks like stuff only somebody says bag-of-words and then such how do you search so some of you have suggested ice please don't jump to tf-idf right because he is machine learning value is yes okay so what see what do I want what is my input output let's not forget that right my input is my key my input is my search search keywords my input are my search keywords my output is a list of documents a list of documents that contain the search keyword I don't care about the ordering if you want to order them that's it that's a different problem I don't care about the ordering I only am saying give in a search keyword just give me the list of documents again Greg B is nothing but your regular expressions only no so it or no grep also won't work indexing is key with dictionary okay what is key what is key again some of you're suggesting dictionary right what is the key and what is the value in the dictionary I didn't understand that somebody some of you are saying inverted indices again what again an inverted indices is also a dictionary right at the end of the day it's a dictionary it's a fancy dictionary that's it so imagine again let's I understand that inverted indices are is the answer but let's try to instead of giving names let's try to understand using that using the concepts that we already know because most people may not know inverted indices right using dynamic programming what do you do with dynamic programming guys if you if you tell you a dynamic programming and say what is the recursion what is the where is the where is the overlapping subproblem somebody says locality-sensitive hashing we did locality-sensitive hashing come here we are not finding a nearest neighbors no here somebody says using graphs to store guys I think you're going everywhere ok let's answer the question very clearly ok I mean I think all of you are like going in like every direction possible throwing everything at it ok let's think of a very simple thing imagine dictionary sees this is one way of solving this problem I'm not saying the only way let's say human dictionary sees one way of solving this problem what do you store as the key what do you store as a value huh key words in the document value number of types present number of types word what is types word dictionary unique words value list of documents simple very good I mean again multiple many of you have suggested this so I can't do one person credit very simple right look at this my dictionary will be my key is the word each word individual words this is one way my value is my value is basically a list or a set into the Flitz I can also store it in set no I can also store it as set of documents which contain this word so first what do I do look at this this is very important first what do I do I go through each word I go to for each document first I read each document for each document for each word for each word in the document again I'm not just writing so for each word you enter first you search if that word is present in the dictionary as a key if it is present go to the set of values go to the values and this document every document give it a number let's assume this is document D I just add it to this set so what are you doing here first you're pre-processing first you're pre-processing your whole text corpus corpus basically means your whole text data right or the set of all the documents that you have first you are pre-processing your whole corpus right once you just do it for each document for each word just find that word if that word is not present in the deck in the dictionary create an entry and create a set of documents so this will say for this word let's assume there is a word machine this word is present in document one document six document twelve so on so forth right first this is called pre-processing is the first stage then there is a second stage in the second stage you can say given any query enough suppose I am given a query which is machine learning right machine learning first I'll search for machine right in my document I'll search for machine so when I search for machine I'll use this as the key I'll search in this dictionary it will give me a bunch of documents it will give me document one document six document twelve it will give me a set of documents then I'll search for learning this will give me a set of documents right now let's take the intersection of both this because we want documents which contain both machine and learning so this will give me a list of documents let's assume it gave me document six and document well one solution is return this as your result now the problem if you returned this as your solution is there is a problem this is one way of doing it the problem with doing it is ud6 could be a document which has machine blah blah blah then intelligence in a different context machine and intelligence are not next to each other so I am not finding this phrase what is the phrase phrase is basically a sequence of words so this machine is there somewhere in this document they take him document six machine is there somewhere in the document learning is there somewhere in the document but it is able to find this whole document cool not bad this is a first cut solution this is your first cut solution sorry this is your first cut solution again this is very simple you just write a function called pre process construct this dictionary second you say query give these words given this find the set of all the documents in which machine is there find the set of documents in which learning is there take the intersection return but you could have this case also so what do you do instead of just making your key okay that's a problem right so what do you do so let's say you might have document one I have word one word two word three word for word five so once a document is basically a sequence of words right so what did we do till of we said let's take each word and let's create in our dictionary in our dictionary we said word one right word one corresponds word one has a document one and other documents in which word one is present then in our dictionary we have word two and all the others that are present this is the key this is the value okay let me write it instead of writing it in this notation okay so this is the key this is the value right this is this is a Python notation right then word three sorry not d2 d1 I am sorry d1 so it its present in d1 and other documents also this is also presented d1 and other documents so here your your your key is basically single individual words individual words in nature language processing are called as one grounds right so again we'll learn this when we learn NLP in future sessions but instead of just using one word why don't I also create two words so what if my dictionary in addition to containing only one words it also contains word one space word two as my key so these are called as two grams right these are called as two grams so if my dictionary contains one grams which basically means individual words with the values containing the document number two grams which means I'll take w1 w2 and I'll add a d1 corresponding to it I'll also take w2 w3 all all all pairs of sequential words so w-2 space W 3 again I'll make an entry of obviously when I do 2 grams the size of the dictionary will become very large again actual search engines not just to do one Gramps and two grams they do 3 grams 4 grams even 5 grams so the size of the dictionary becomes very very very very large when the size of the dictionary becomes large ok think of think of something like Google right the dictionary will be in terabytes or probably even larger right you cannot I'm sure Google dictionary will be more than terabytes if this dictionary is too large you can't store it in the RAM of a computer you can't store it in the RAM of a single computer then what do you do you take this dictionary you store it in a distributed fashion in the big in the RAM of multiple computers that is what is called as distributed hash tables those are called as distributed hash tables or distributed dictionaries those of you of where we've heard of this term again if you know just wanted to give you a context here right so there is this very powerful open source tool called Redis distribute sorry sorry not ready Sam sorry what am I saying this is called what is it called one second I have to Google search I forgot its name there is a very powerful open source inverted index again this is what is called as a leucine is one but there is a better one off leucine is an old one come on come on come on applications compression okay there is a one second I can just go to Amazon AWS aw is such cloud search what does cloud search use apart from cloud search there is one more service product compute containers blockchain analytics feature surface says come on come on come on I'm not getting it it's called let's go it's called something huh elasticsearch yes thank God yes it's gonna elasticsearch so there is an open-source tool called as elasticsearch right which a very powerful tool it's available on many many many many cloud computing platforms like AWS etc which give you a very powerful way of constructing these and distributed hash tables distributed hash tables for such distributed hash tables for the search again this this type of data structure where you are taking the words and you're storing the documents in which the words are that this type of this special case of a dictionary is what is called as an inverted index this is what is called as inverted index search engines actually use a lot of inverted index right again the way to think about it is elasticsearch is a distributed inverted index based system right there are also distributed hash tables there are distributed hash tables there's something called as Redis and man cache right they're very very powerful this the very very powerful distributed systems distributed systems that can give you like a hash table but distribute across multiple computers so elasticsearch gives you inverted indices while Redis and memcache gives you a general-purpose dictionary or a hash table which is distributed multiple computers anyway so that's that's I just wanted to connect the dots for you most importantly I want you to build a simple system nothing fancy okay try to write code wherein you take some 10 documents just copy the text from just copy the text from Wikipedia take some 10 Wikipedia articles copy the text so first you should write the pre process first you should have a dictionary D you should have a dictionary D right let me not write this notation so you should have a dictionary D in which you're going to store the way I just mentioned key value pairs you you given a query you want to search for this the way that I just explained try to do it with what grams and two grams this is not complex given that we have seen how to process drinks given that we have seen some examples on how to use dictionaries I think this is a good exercise it's also very satisfying for you because you're doing something that is non-trivial I understand this is non-trivial but again if you sit on it for a couple of hours you should be able to do it right sounds good I think I've overshot by a few minutes yeah thank you a Google bigquery yes yeah just like elasticsearch Google has an internal tool called as Google bigquery yes that's true that's true okay folks thank you very much so next live session just wanted to make that clear I hope I have a slight no I don't have a slight okay so the next live session is basics of Audion root programming from a machine learning perspective right so see you see you in the next live session I'll make the I'll make I'll make the announcement on YouTube okay thank you very much for joining in again have a have a safe time I hope everybody is staying safe it's very important that all of you are safe right so please stay safe
Info
Channel: Applied AI Course
Views: 14,620
Rating: 4.9159665 out of 5
Keywords:
Id: VjgLdhcmB7I
Channel Id: undefined
Length: 127min 52sec (7672 seconds)
Published: Sun Apr 05 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.