Live Day 2 - Bag of Words, TF-IDF, Word2Vec NLP and Quiz - 5000 INR Giveaway

Captions
Hello guys, I hope everybody is able to hear me. Just give me a quick yes if you can, and I hope everybody is rocking. I hope you have seen the day one study session; today we'll have the day two session, and again we're going to have a lot of fun, because we'll try to learn in such a way that everybody understands. One thing: today we are also going to have a quiz where again I'll give away 5,000 rupees, so please make sure you watch this session till the end and stay with me, because only then will you be able to win. The quiz competition is just like yesterday's, and that was quite fun. So do hit like; let's get at least 200 likes before we start. Now let me share my screen and we'll go through the agenda we have planned.

So, the day two agenda: NLP for machine learning and deep learning. What are we going to cover today? First, something called text preprocessing. Yes, text preprocessing includes cleaning the text, lower-casing the sentences and so on. Yesterday's notes are available in the dashboard; the link is given in the description. You have to log in to the dashboard, because I'll be putting all the materials there along with the video link. Cleaning the text will definitely be part of preprocessing, but the main thing here is that we are going to take words and convert them into vectors, and we'll discuss the different ways to do that. There are multiple techniques we'll focus on. The first is one-hot encoding; we'll understand what one-hot encoding is. The second is something called bag of words, which we also call BoW. The third technique, after bag of words, is TF-IDF, which is nothing but term frequency-inverse document frequency; we'll understand how it works and what its disadvantages are. And there is one more technique we'll focus on, called Word2Vec.

Understand one thing, guys: I'll be covering this entire list today, along with a lot of practical implementation. We'll start with a library called NLTK; the NLTK library is a very important library for NLP, and with it we can do all of these things. Then, once all of that is over, the second main thing is the quiz. Everybody can participate, and the quiz will be live. About the prizes: the first prize will be 2,000 rupees INR, the second prize will be 1,500 rupees INR, and the third prize is also 1,500 rupees INR, so second and third are one and the same.
Overall, we are going to give away somewhere around five thousand rupees. So if you have friends you really want to call to participate in the quiz, definitely call them, but the quiz will cover all the things we discuss here today. I hope everybody is clear. Mohan Krishna says, "Hi Krish, because of your previous live streaming I got a job with 22 lakhs per annum as a data scientist." Amazing, congratulations! I'm always happy when you are doing amazing work, so super, super happy. Congratulations to the people who have done it and cleared it; that's quite amazing.

Now, just give me a go-ahead that everybody's ready; take out your paper and pen, whatever you want, because we are going to do a lot of things. One more topic that I missed under bag of words: we are also going to learn about something called n-grams; that part we'll cover as well. (Christopher says, "Please tell Sudhanshu sir to stop the class." Classes are going on, guys, not to worry; you can learn from both of us.)

Let's go ahead. First of all, we really need to understand some terminologies, so I'm going to write it over here: basic terminologies used in NLP. Everybody should know these; they're the basics you need whenever we start. Before I begin, please hit like, and let's start the session. The first terminology we specifically use is something called a corpus. The second is documents. The third is vocabulary. These are some of the basic terminologies everybody should know, and if you know them it is always good, because through them you will understand what I'm talking about whenever I take up a specific data set. And the fourth thing we are going to see is words. Most NLP tasks, whenever we talk about data sets, can be divided up with respect to these terms.

Now, let's say I have a sentiment analysis problem statement. In sentiment analysis you will have a text and you will have an output: whether it is a positive sentiment or a negative sentiment. Let's say my first data point is D1, "The food is good"; the output is 1 because it shows a positive sentiment. Let's say my D2 is "The food is bad", so my output will be 0. Like this I may have lots of different documents, or you
could say different data points. I can say D3 is "Pizza is amazing", so again a positive sentiment, 1. Then for D4 I may say "Burger is bad", so here I'm going to write 0.

Now let's understand all the terms I told you about. We have corpus, documents, vocabulary, words. Whenever we have this entire data set, if I combine all of these data points, D1, D2, D3, D4, or however many data points I have, that combination becomes the corpus. You can also think of the corpus as a paragraph, and you know a paragraph needs many sentences inside it; so combining all these data points gives a paragraph, and that is what we call the corpus.

Now, coming to the next one: what are documents? A document is basically just a sentence. For understanding purposes we'd call it a sentence, but in NLP we specifically call it a document. So this is one document, this is another document, this is the third and this is the fourth; D1 is one document, D2 is another, and each data point is a separate document.

Next, vocabulary. Vocabulary basically means: suppose I consider a dictionary book, how many unique words are present in it? The number of documents in a data set may run into the millions, but how many unique words are there? Let's say that inside this data set, or inside this dictionary book, there are 10k unique words. (I know a real dictionary has far more, but let's say 10k.) Those 10k unique words become the vocabulary; that means I have 10k unique words in this vocabulary. So the total number of unique words is nothing but what we call the vocabulary.

Finally, coming to the words section: words are very simple. Whatever individual words are present among those unique words, I can take each one separately as a word. So I hope you understand these basic terminologies required in NLP. Whenever I say corpus, the first thing that should come to your mind is a paragraph where many statements are combined. If I talk about documents, think of a sentence, a data point. And when you have vocabulary, it means: how many unique words are present inside this data set?
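(To make these four terms concrete, here is a minimal Python sketch; the code and variable names are my own illustration, not something shown in the session.)

```python
# A minimal sketch of corpus, documents, vocabulary and words
# for the tiny sentiment data set above.
documents = [
    "The food is good",   # D1, label 1 (positive)
    "The food is bad",    # D2, label 0 (negative)
    "Pizza is amazing",   # D3, label 1
    "Burger is bad",      # D4, label 0
]

# Corpus: all the documents combined into one paragraph.
corpus = " ".join(documents)

# Words: the individual tokens (lower-cased so "The" and "the" match).
words = corpus.lower().split()

# Vocabulary: the set of unique words across the whole corpus.
vocabulary = sorted(set(words))

print(len(documents), "documents")
print(len(words), "words in total")
print(len(vocabulary), "unique words (the vocabulary):", vocabulary)
```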
Are there 10,000 words, 100 words, 200, 400? That count is your vocabulary. And the last term, word, refers to the individual words themselves.

Now, this is my data set, and let's say I want to create a model, a sentiment analysis model: just by reading the text, it should be able to predict whether the sentiment is positive or negative (or average, but right now I'm just considering binary classification). Starting from the data set, what are the steps we usually perform in an NLP task?

Given this data set, the next step we do is text preprocessing. What can we do in text preprocessing? Let's call this text preprocessing one. If I have a data set in the form of paragraphs or sentences, I may do tokenization, and then I may add one more thing: lowering the case of the words, so that nothing stays as capital letters; all the words become lower case. These are some of the basic steps; let's say this is my text preprocessing step one.

Then I go to my next step, which is called text preprocessing two. In this step, first I do something called stemming; second, I can do something called lemmatization; and I can also use something called stop words. I'll talk about what stop words are; everything will be covered, and I'll also explain it to you practically. So stemming, lemmatization and stop words: that is text preprocessing step two.

Now what about the next step? In the next step I will focus on converting these words into vectors. After text preprocessing, you'll see that all the data is clean, and my main aim will be to convert the words into vectors. Here, the first technique I can use is bag of words; the second is TF-IDF; and the third is Word2Vec. All of these techniques we will learn as we go ahead. In today's class we'll focus on all of these except Word2Vec; that I'll probably cover in the next class. But up to there, everything will be covered with practical implementation: from tokenization to lowering the case of words, applying some regular expressions, stemming, lemmatization, applying stop words, then bag of words, then TF-IDF, and then we'll do a lot of practical implementation.

I hope that yesterday, in the study session, I explained tokenization, stemming and lemmatization; today I'll show them to you practically. Now, coming to the first technique: the first thing we are going to discuss is bag
of words, but before understanding bag of words, I really want to finish off one important topic, which is called one-hot encoding.

Now, let's say I have a simple data set, and this is my corpus: "A man eat food. Cat eat food. People watch Krish YouTube channel." This whole paragraph is my corpus. Now tell me: what should my vocabulary be? Vocabulary basically says how many unique words there are. Over here, "eat" is repeated, so I will not count it twice, and "food" is also repeated. If I do the overall count, the vocabulary is ten, so ten is my number of unique words. With respect to words, I can write my features like this: a, man, eat, food, cat, people, watch, Krish, YouTube, channel. These are the vocabulary words you can see, ten in total.

Now, each and every word will be represented in a one-hot-encoded form. Let me convert these sentences: let's say this is document one ("A man eat food"), this is document two ("Cat eat food"), and this is document three ("People watch Krish YouTube channel"). For document one: wherever "a" is present, that position becomes one and all the remaining positions become zeros, so "a" gets converted into [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]. This is my first word's one-hot-encoded vector. Coming to the second word, "man": it is present in the second position, so it becomes [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]. Then the third word, "eat", is present in the third position, so it becomes [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]. And "food" is present in the fourth position, so it becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. So this is the one-hot-encoded format of my entire first document, the D1 one-hot-encoded format. You can do this with get_dummies; there are a lot of options, and by writing simple Python code you can do it. I really wanted to teach you this because using it we can learn the things that come next. But just understand why I am teaching you this: I don't want you to implement it; it is nowhere used right now,
unless you are just doing one-hot encoding for some categorical values. But here you have an entire text sentence. So this is called the one-hot-encoded format. Now, this has a lot of issues; I cannot just directly use it. What are the issues? We are going to discuss them now.

First of all, the major issue you can find here is sparsity. If I talk about advantages and disadvantages, can anybody tell me what they are in this case? The advantage is that it is quite simple: it is very, very simple to implement, easy to implement, and it is also intuitive; you can definitely understand what's happening, we are just doing one-hot encoding. But the major disadvantage is that it really creates a sparse matrix. What is a sparse matrix? You'll have this kind of matrix where one value is one and all the remaining values are zero. Just imagine creating this kind of sparse matrix for a big corpus: it becomes a very huge thing, and tomorrow if you try to train a machine learning model on it, think about how much RAM you may require and how much time it will take. A lot more computation. That's the sparse matrix problem.

One thing I really want to change here, guys, because I made a little mistake. Forget about the full vocabulary; we know it has ten words, but in this style of one-hot encoding we will not consider the entire vocabulary. This is one small change I want to make. In document one, how many words are there? One, two, three, four. So "a" will be represented as [1, 0, 0, 0], because in this sentence I just have four words; I give the one-hot encoding with respect to only these four words. (In the upcoming technique we'll consider the whole vocabulary, but in one-hot encoding we don't.) Then the next word, "man", becomes [0, 1, 0, 0]; the third, "eat", becomes [0, 0, 1, 0]; and "food" becomes [0, 0, 0, 1]. So this is my entire representation for document one. But again, understand: we are still creating a sparse matrix. That was for D1; can anybody tell me what it becomes for D2, "Cat eat food"? Here I'll have [1, 0, 0], then [0, 1, 0], and then [0, 0, 1]. So for D2 I've just used three words, and this is its one-hot-encoded format. A sparse matrix will definitely be there, and a sparse matrix, you know, unnecessarily eats up a lot of memory, and the computation time will be more when you're training models.
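(As a rough sketch of the per-document scheme just described: the helper below is my own hypothetical code, not something shown in the session.)

```python
# A rough sketch (hypothetical helper, my own code) of the per-document
# one-hot scheme above: each word becomes a vector whose length equals
# the number of words in that particular document.
def one_hot_document(sentence):
    words = sentence.lower().split()
    n = len(words)
    # One vector per word: 1 at the word's own position, 0 elsewhere.
    return [[1 if j == i else 0 for j in range(n)] for i in range(n)]

print(one_hot_document("A man eat food"))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(one_hot_document("Cat eat food"))
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  <- only 3 positions: the fixed-size problem
```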
Now understand one more very simple thing. The other disadvantage is what we call out of vocabulary, OOV for short. What is this out of vocabulary? Understand one thing, guys: do you see the size of the first sentence? It is four words, so you have four different positions where you fill in this kind of ones and zeros, the one-hot-encoded format. But if you go to the second sentence, there are only three words. If the size varies like this, we cannot train the model. We cannot do it, because the size keeps changing. Always remember: whenever you're training a machine learning model, the inputs have to be fixed; that many features must always be there. If there are ten features and you're training a model, we have to make sure each and every data point has ten features. But in this case the second sentence has just three features, so we cannot train the model. In short, you can also say that the sentence sizes are not fixed. That is a major, major issue.

The next thing: you saw that in each vector one word is one and all the rest are zeros; then in another sentence, again somewhere a one and the rest zeros. Between the words, the semantic meaning is not captured. So the next disadvantage you can find is that semantic meaning between words is not captured. What does that mean? We are not able to find any relationship between "a man" and "eat", or between "man" and "food". There is no relationship; wherever a specific word is, only that position is one. All of these things are issues.

So I hope everybody is able to understand. Can you give me a quick yes if you've understood the advantages and disadvantages up to here? You can see that wherever the specific word is, only a one is present and otherwise zeros; that is a major issue. So, the first major disadvantage is the sparse matrix; wherever it appears it is a problem. Then the size issue: for "A man eat food" the vector has four positions, but for "Cat eat food" it has three; the sizes are not matching, so I cannot train the model. The other thing is out of vocabulary. Out of vocabulary basically says: let's say tomorrow, in my test data, I have one more word, say "Cat eat food dog". If "dog" is present, do you think we can create this kind of vector for it? No. In our training data set we don't have "dog", but in the test data set we do. Can we create a vector for "dog"? Can we just write an additional zero? No, it is not possible. So beyond
the words that are present in the training data set, if any new data comes in the test data, we will not be able to handle it. I hope everybody understands out of vocabulary: extra words that come in the test data will not be handled, and that is the major issue. The third thing is that the size is not fixed: here you have four words, there you have three words, so we definitely cannot train a model. And then, semantic meaning between the words is not captured: for "A man eat food", a vector is created with a one and the rest zeros; there is nowhere a relationship between "a man" and "eat food", we just have ones and zeros, and with only ones and zeros we cannot relate the words.

So I hope everybody has got it. Now let's go and focus on the next technique, which tries to fix these problems: bag of words. Can I get a quick confirmation that you are able to understand till here? See, guys, if we have a huge sparse matrix, training a model will be highly computational. (No, Sandishwar, I think this is correct; the one-hot encoding part is correct.)

Now let's consider bag of words. In bag of words, let's say I have sentence one, or document one: D1 says "He is a good boy". D2 says "She is a good girl". And let's say my D3 sentence is "Boys and girls are good". These are my three sentences, and we'll try to understand what bag of words says and how we convert these words into vectors using it.

First of all, we will try to find out the vocabulary, but before finding the vocabulary, what we do is remove the unnecessary words. You can see words like "he", "is", "a", "she", "and", "are": these are words that are not that useful, and we usually skip them whenever we are doing sentiment analysis, because they are mostly generic words that don't carry that much importance. And how do we remove these words? We apply stop words. I'll show you practically how many stop words there are, which stop words you can use, and how you can use each of them efficiently. So if I apply stop words, those words get removed. Now my D1 will be left with two words, "good boy"; my D2 will be left with two words, "good girl"; and my D3 will be left with how many words? "Boys", "girls" and "good". So with the help of stop words I'm removing the unnecessary words; I have cleaned the text and I'm left with these words. I hope everybody is clear on this.

Now tell me, with respect to vocabulary: how many different vocabulary words are present here?
Why do those words get removed? Because they will not play an important role. In text preprocessing, the small words like "is", "the", "of", "she", "he", "they" should get removed, because they are not important words for many use cases like sentiment analysis and toxic-comment classification. Yes, some use cases are there, like chatbots, where we keep them; but here, so that I can make you understand quickly in the easiest way, we'll remove them.

Now, in this particular case, you can see how many vocabulary words are present: one is "good", one is "boy", one is "girl". Actually, I'll just make one small change: let's make it "boy" and "girl" rather than "boys" and "girls", otherwise more words will be coming over here; so I don't have to write the "s". So you can see that I have three vocabulary words. Now, if I try to find the frequencies: how many times is "good" present? One, two, three, so its count is three. How many times is "boy" present? Two. How many times is "girl" present? Two. So based on the frequency you can see I've got three vocabulary words, good, boy and girl, and the number of times they appear is three, two, two.

Now let's see how these words get converted into vectors. Whatever words are present in the vocabulary become my features: feature one, feature two, feature three. What is feature one here? "Good". What is feature two? "Boy". And always remember, guys: the order of the features is based on this frequency. In bag of words, the word present the maximum number of times comes first, so I'm going to put "good" first, then "boy", and then "girl". Then come my documents: document one, document two, document three.

Now tell me: for document one, what is the sentence present? "Good boy". So I go to document one, and wherever "good" appears I increase the count by one. "Good boy" is present, so "good" becomes one and "boy" becomes one. Is "girl" present in sentence one? Girl is not present in sentence one, so what I'm going to do is make it zero. (People are asking about case sensitivity: understand, guys, always make sure that when we do the stop-word step we also lower-case all the words, lower case in short, so that we don't get repeated words.) So here we have one, one, zero: "good" becomes one, "boy" becomes one, and "girl" becomes zero because girl is not present. Similarly, if I go to D2, "good girl" is there: "good" becomes one, "boy" is not there so it becomes zero, "girl" is there so it becomes one; so in this case it becomes [1, 0, 1]. And finally, you can see "boy girl good": boy, girl, good are all present, so I make the counts [1, 1, 1].
So I hope you are able to understand, in short, what we have done. And finally you will have one output feature over here, which will have its zeros and ones, so you can now use this representation for your classification problem: you just need to apply your machine learning algorithm. This I'll also show, but let's stay focused on the text preprocessing.

Now tell me, guys: let's say there is one more word like "good" in D1. What will happen? (And yes, "boys" and "boy" would be different words, so I could add one more word, but in this case I removed the "s".) If one more "good" is present, can you say I have to increase the count? Yes, here what I'll do is increase the count: whenever there is an additional occurrence of a word, you increase the count. But in bag of words we also have an option to make this a binary bag of words. In a binary bag of words we just say: if a word appears more than once, we still make it one. So in a binary bag of words we'll have only ones and zeros: if a word is present in the sentence, we directly make it one. Because at the end of the day, we really just need the ones and zeros to indicate whether the word is present in the sentence or not.

So that is what is there; I hope everybody has understood this entire bag of words idea, because there are a lot of things we need to learn here. With a binary bag of words, I hope you've got the idea: we will not find numbers like 2 or 3 based on the frequency. Everybody clear? Can I get a quick yes? It's very simple. First, these are my sentences: "He is a good boy", "She is a good girl", "Boys and girls are good". Then we applied stop words and lowered all the words, leaving "good boy", "good girl", "boy girl good". Then you can see that I converted this into features one, two, three, keeping the order based on frequency. And then I have [1, 1, 0], because in document one I have "good" and "boy"; document two is nothing but "good girl", so wherever "good" is there it's one, "boy" is zero, "girl" is one; and in document three you can see boy, girl, good, so everything is [1, 1, 1]. As for the output column: I have just defined it my own way; I've assumed it. If you're getting confused by the output, don't worry about it. I'm saying that in your data set you will have some output, and the output can be anything; these labels are just my own assumption.
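(A minimal sketch of the same bag of words with scikit-learn's CountVectorizer; the session builds the table by hand on the board. Note that CountVectorizer orders its features alphabetically rather than by frequency, so the columns come out as boy, girl, good.)

```python
# Bag of words on the stop-word-cleaned documents from above.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["good boy", "good girl", "boy girl good"]

# Plain bag of words: raw counts per vocabulary word.
bow = CountVectorizer()
print(bow.fit_transform(docs).toarray())
print(bow.get_feature_names_out())   # ['boy' 'girl' 'good'] (alphabetical)

# Binary bag of words: any non-zero count is clipped to 1.
binary_bow = CountVectorizer(binary=True)
print(binary_bow.fit_transform(docs).toarray())
```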
Okay, so everybody clear with this? Now let's talk about the advantages and disadvantages, and you definitely need to know these; advantages and disadvantages are very important. So tell me, what are the advantages and disadvantages? The first advantage is that it is simple and intuitive, very much simple; that will be the biggest advantage. But there are a lot of disadvantages.

First of all, tell me: will sparsity still be there or not? Right now I've just taken these three sentences, but imagine our data set is very huge, thousands and thousands of records. Then what will happen? You will find many, many words like this, and you'll still mostly have ones and zeros, so the sparsity issue is not getting fixed. Yes, to some extent it can become better for a specific data set, because for that data set you'll know how many dimensions the vectors will have; but in a real use-case scenario this doesn't really solve the problem. There is still a problem; it will reduce, but not by much. So sparsity is still a problem.

What about OOV, out of vocabulary? In this scenario you may have it too. Let's say there is one more word in a test sentence: "good girl cat". "Cat" will not get handled by this vocabulary: if "cat" is not present in the vocabulary, how will it be handled? That entire word will get rejected. And when that word gets rejected, understand: the meaning of the entire sentence may change. So the out-of-vocabulary problem still exists: if some word is not there, it may get removed, and the entire meaning of the sentence may change, or the most important information that needs to be captured will not get captured. That is the second point.

Now the third point, and this is very, very important: the ordering of the words has completely changed. Why am I specifically saying ordering of the words? See, based on the frequency, I am ordering the features as good, boy, girl. Suppose the frequency of "boy" had been higher; then it would have become boy, good, girl, something like that. But the natural ordering in "he is a good boy" is what carries the more meaningful information, and that ordering is lost.

And if I take these three-dimensional points and plot them in three dimensions, say this point is [1, 1, 0], this point is [1, 0, 1], and this point is [1, 1, 1], then I will be able to capture the distance between any two points. How do we capture the distance between two points? There are two ways: one is Euclidean distance, and the other is something called cosine similarity; I think you may be knowing about this. This distance basically says how similar two sentences, or any sentences, are: if you have [1, 1, 0] and [1, 1, 1], those two sentences may look a little
bit similar. This idea is heavily used in recommendation systems, and I really want to teach you cosine similarity. Whenever you want to find cosine similarity, there's a technique: first of all, we apply the cosine rule. What does the cosine rule mean? Let's say there are two points, P1 and P2, and I want to calculate how near or far they are. There are two ways: directly I can calculate the Euclidean distance, or I can use the cosine rule. In the cosine rule I look at the angle between the two vectors. Let's say this angle is somewhere around 45 degrees. How much is cos 45? It is 1 by root 2, somewhere around 0.71. So the cosine similarity between P1 and P2 is about 0.71, and if I want the distance instead, it is very simple: I'm just going to say 1 minus cosine similarity, which is 1 - 0.71, about 0.29. That means P1 and P2 are fairly similar.

Now let me change this problem statement a little bit. Let's say there is one point P1 at (0, 1), and P2 is at (1, 0). How similar are these points? If I need to calculate it, I just need to find the angle between them. The angle is nothing but 90 degrees. And what is cos 90? Cos 90 is 0. So the cosine similarity is 0, and the distance is 1 - 0 = 1. So here we are able to find out the cosine similarity between this point and this point, and it basically says the difference is huge: this point is completely different from that point, and the distance, if I calculate it, is also quite large.

One more thing, if I really want to check it out: let's say there is another point lying in the same direction as the first one. If I calculate the distance between this point and that point, what do I get? The angle it forms from here to here is 0, and what is cos 0? Cos 0 is nothing but 1, so the cosine similarity is 1, and 1 minus 1 is going to be 0. Whenever I get a distance of 0, it means these two points are almost the same.

I hope everybody is able to understand till here. Why am I teaching you this? Because, let's say, in a recommendation system there is one movie called Avengers. If a person sees Avengers, what recommendation will he get? If Iron Man is at a nearby point, the person watching Avengers will get a recommendation of Iron Man, because these two are almost similar movies. Suppose there is another movie like Minions, a comedy movie: if you are watching this kind of action movie, Minions will not get recommended.
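(Here is a small NumPy sketch of cosine similarity, matching the geometry above: angle 0 gives similarity 1, angle 90 degrees gives 0, and cosine distance is 1 minus similarity. The example numbers in the comments are my own.)

```python
# Cosine similarity between two vectors.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([0, 1], [1, 0]))        # 0.0  -> 90 degrees apart, completely different
print(cosine_similarity([1, 0], [2, 0]))        # 1.0  -> same direction, almost the same
print(cosine_similarity([1, 1, 0], [1, 1, 1]))  # ~0.816 -> the two BoW sentences are fairly similar
```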
Similarly, all the other points are present here and there, and it is this distance that matters. I will probably take some more classes regarding cosine similarity, but just understand that it talks about distance. Now tell me: will [1, 1, 0] and [1, 1, 1] be near each other or not? Yes, they will be near. But understand: it will not be able to capture the ordering; the ordering of the words will be very difficult to capture. So I hope everybody is able to understand, and these are some of the disadvantages. Let me continue.

So, ordering of the words is one issue, and the fourth point is that the semantic meaning is lost: it is not able to capture the semantic meaning. Semantic meaning relates to this cosine-similarity picture: the words may all be far from each other; it may say that sentence one and sentence two are similar, but it will not be able to represent the meaning in a proper way. So these are some of the advantages and disadvantages of bag of words.

Now, see: here we are taking the features good, boy, girl. In order to capture the semantic information, what we will do is use something called n-grams. What are these n-grams? Let's try to understand. (I hope everybody is following, guys. Shall I go ahead with n-grams? If you are able to understand clearly, hit like, say something in the chat, drop some symbols. Yes, we can use bag of words and TF-IDF with an LSTM, someone asked. And yes, cos 45 is 1 by root 2; that's exactly what we used. Perfect.)

Now, how do n-grams actually help capture the semantic meaning? See here, guys: how many words do I have? Good, boy, girl; let's say those are features one to three, and my sentences were [1, 1, 0], [1, 0, 1] and [1, 1, 1] for sentence one, two and three. Now, n-grams come in different flavours: if I use a combination of two words, it becomes bigrams; I can also use trigrams, or quad-grams, and like this, n-grams in general. What does a bigram say? A bigram says that apart from the single-word features, we'll use combinations of two consecutive words as features. In this case, what do the bigrams become? "Good boy" and "good girl". So feature one, two and three stay as good, boy, girl; "good boy" will be my feature four, and "good girl" will be my feature five. Now I go to sentence one: is "good boy" present in it? See, "good boy" is actually present, so that count becomes one. Is "good girl" there? No, it is not, so that becomes zero. If I go to sentence two, what is there? "Good girl" is there, so that combination is definitely present, and I am going to make "good boy"
zero and "good girl" one. And then let's go to sentence three: is there any "good boy" or "good girl"? No, so I make both zero. So this is the kind of combination a bigram actually gives you.

Now suppose I have the sentence "Krish eats food" and I tell you to make bigrams: how many bigrams can you make? (A "boy girl" combination will not happen, guys; understand that with bigrams, this word combines with the next word, and then that word with the next one. You may be thinking, can we also make non-adjacent combinations? I will check it practically and show you, but I believe that will not be possible.) For "Krish eats food" there are two bigrams: "Krish eats" will be one, and "eats food" will be the other. (Someone asked why "good girl" is not counted in sentence three even though "girl" and "good" are both there: always understand why we are creating bigrams; I really want to follow the same word order.) So overall, if you get this question, there will be two bigrams.

Now, "I am not feeling well": how many different trigrams will there be? This will be the question for you. A trigram takes three consecutive words: "I am not" will be one, "am not feeling" will be another, and "not feeling well" will be the third. So we get three trigrams.

Now, always remember one thing, guys: practically, I can specify a range like (1, 3). If I use (1, 3), that basically means from unigrams to trigrams, you have to use all the combinations in between. Let's say I have a sentence like "Krish is not feeling well". If my n-gram range is (1, 3), how do the combinations happen? Feature one will become "Krish", feature two "is", feature three "not", feature four "feeling", feature five "well". After this come the bigrams, so the first bigram will be "Krish is", then the rest, and then all the trigrams; from unigram to trigram, you use all the combinations, and the combinations are always of consecutive words. (Just a second, guys, some water has fallen over here; give me a second.) So if I go ahead with this, you will see I can make all the different combinations starting from "Krish is". Yes, the combinations should be consecutive. Okay, let's start practically, everyone? Let's do it.
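(Before jumping into the notebook, here is a quick sketch of the ngram_range=(1, 3) idea with scikit-learn's CountVectorizer; my own illustration of the combinations discussed above. The features come out in alphabetical order.)

```python
# Unigram + bigram + trigram features via ngram_range=(1, 3).
from sklearn.feature_extraction.text import CountVectorizer

vec = CountVectorizer(ngram_range=(1, 3))
vec.fit(["Krish is not feeling well"])
print(vec.get_feature_names_out())
# ['feeling' 'feeling well' 'is' 'is not' 'is not feeling' 'krish'
#  'krish is' 'krish is not' 'not' 'not feeling' 'not feeling well' 'well']
```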
Now let's quickly open a Jupyter notebook and I'll show you everything practically. First of all, everybody, install NLTK: pip install nltk. So you have installed NLTK, everyone?

Now let me search for any corpus; I'll pick anything. Let's go to Wikipedia; you can pick up whatever you like. This is the Wikipedia page of Narendra Modi sir, so I'm just going to copy and paste all of this; let's say this is the paragraph I really want to work with. (You can also do this in Google Colab, guys, not a problem.) I'm going to consider this as my entire paragraph, or corpus, so I copy and paste it into a multi-line string and execute it. This becomes my paragraph, and if you want to see it, you can just print it. I did not do anything special; I just copied and pasted it, and you can pick up any text that you want.

Now, if this is my paragraph, the first thing is: how can I use NLTK? I'll import some basic pieces. First, import nltk. Then, from NLTK, if you really want to apply stemming, I will write: from nltk.stem import PorterStemmer; stemming is done from this class, which is called PorterStemmer. And then I will write: from nltk.corpus import stopwords. All of these I'm going to use for one purpose or another. I hope everybody is clear till here; it's very simple, and it need not be the same text. We'll learn how to do everything: stemming, tokenization, all of it.

Now, first things first: if I really want to use tokenization, what does tokenization say? It says I need to convert the paragraph into sentences and then focus on the words. In order to use this tokenization process in NLTK, we have something called sent_tokenize: it returns a sentence-tokenized copy of the text, so in short, it is going to convert the paragraph into sentences. And in order to use it, you need to download one package in NLTK, and that package is called punkt, p-u-n-k-t. You have to download this punkt first; only then will you be able to apply this. Inside sent_tokenize, what you have to do is pass the paragraph, whatever paragraph you have written over here, and the entire paragraph will get converted into sentences, which I store in a variable called sentences. Here you can see the package is already up to date, and if I go and print my sentences, you'll see what I'm getting. What is the type of sentences? Let's see: it is a list, so I can also index into it, like sentences[0]. So here you can see I'm able to get the sentences.
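(Collected into one runnable cell, the tokenization steps narrated above look roughly like this; the placeholder paragraph is mine, so paste whatever text you copied.)

```python
# Paragraph -> sentences with NLTK's sent_tokenize, as shown above.
import nltk
nltk.download('punkt')   # tokenizer models, needed once

from nltk.tokenize import sent_tokenize

paragraph = """Paste any paragraph you like here. For example, a few
sentences copied from a Wikipedia article. Each sentence becomes one item."""

sentences = sent_tokenize(paragraph)
print(type(sentences))   # <class 'list'>
print(len(sentences))    # number of sentences found
print(sentences[0])      # the first sentence
```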
And this is the entire tokenized copy that we actually get. Now let's go to the next step quickly. (Yes, I will be attaching the .ipynb file in the dashboard linked in the description.) Everybody's clear with this? I've just applied the tokenization process.

Now let's see how to apply stemming. See, we have imported this PorterStemmer, and it is specifically used for stemming. What does stemming do? It helps you find the base root word. So I'll initialize it: stemmer = PorterStemmer(). And if I then write stemmer.stem and give any word, like "going", it converts it into the base root: you can see "go". If I use "facial", you can see I'm getting "facial". If I write "thinking", you can see I'm able to convert it to "think". If I write "drinking", you will see I get "drink". But if I write "history", here you can see I'm getting "histori", which is not a proper word. (You can join the dashboard given in the description of this video, guys, to find the notes.)

Now, similarly, if you want to apply lemmatization, that is also possible, and for lemmatization you just have to import another class, which is called WordNetLemmatizer. So you import: from nltk.stem import WordNetLemmatizer. This lemmatizer will help you do the lemmatization that we have learned about. With the lemmatizer, always understand: it helps you find a good base word, but with the proper spelling. Now, if I give stemmer.stem with "history", you can see I'm getting "histori"; but if I give lemmatizer.lemmatize with "history", you will get the proper word: see, I'm getting "history". If I give "going", I will get the proper word "going"; if I give "drinking", you can see I'm getting the proper word "drinking". So here the spelling is fully, completely taken care of. Do it for "goes", and here you can see I'm getting "go": the base root word form; otherwise, if the specific word is already properly present, we can use it as-is. All of these things I'm now going to apply on the text we have over here. Everybody, I hope it is clear till here.
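(The stemming-versus-lemmatization comparison from above, in one cell. The pos="v" argument is my addition: WordNetLemmatizer treats words as nouns by default, so passing the part of speech is one way to lemmatize verb forms like "goes" properly.)

```python
# Stemming vs. lemmatization, as demonstrated above.
import nltk
nltk.download('wordnet')   # WordNet data, needed once for the lemmatizer

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

for w in ["going", "thinking", "drinking", "history", "goes"]:
    # Stemming chops to a root that may not be a real word ("histori");
    # lemmatization returns a properly spelled dictionary word.
    print(w, "->", stemmer.stem(w), "|", lemmatizer.lemmatize(w, pos="v"))
```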
now, first things first: what i am actually going to do is clean this entire text. how do we clean this entire text? because there are so many special characters, right, like these new lines, all these special characters, so we should try to clean out all the special characters. how do we do it? i will be using regular expressions, so import re, and let's say corpus is my new list that will hold the text after cleaning. i will write for i in range(len(sentences)); let's see how many sentences i'm actually getting: len(sentences) is 16, so there are 16 sentences in short. for each one, i'm going to apply a regular expression for cleaning the data. i'll write re.sub; if i press shift+tab on re.sub, you can see that it returns the string obtained by replacing the leftmost matches. i need to replace all the special characters, everything other than a to z, right? so here i will write a pattern with the 'other than' symbol, the caret: other than small a to small z or capital a to capital z, whatever you get, replace that with a blank space. and where do i apply it? i have to apply it on sentences[i]. so this is basically saying that whatever you find other than a to z, remove it and keep a blank space instead. i will save this in a new variable called review; or i can say text, whatever you want, you can write. next, i also have to make sure that i convert this to lower case, so here i will just say .lower(); lower is an inbuilt string function that makes sure all the text is changed to lower case. i could also say review.split() to break it into words, but for now let's keep the cleaned sentence as it is. and then, finally, i will just say corpus.append(review). so i have actually cleaned each sentence entirely and made it lower case, and if i show you corpus now, here you can see i'm getting very good text compared to before: all the special characters went away, simply by using a simple regular expression. everybody clear? yes? okay. now i'll hardly take another five minutes; we'll do the coding, i'll just show you how to apply the bag of words, and then we can move forward with the quiz. that is what we are going to do now.
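that cleaning loop, as a minimal sketch; it assumes the sentences list produced by sent_tokenize above:

```python
import re

corpus = []
for i in range(len(sentences)):
    # drop everything except letters, replacing it with a space
    review = re.sub('[^a-zA-Z]', ' ', sentences[i])
    review = review.lower()   # normalize the case
    corpus.append(review)

print(corpus[0])  # first cleaned, lower-cased sentence
```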
now, first of all, let's apply stemming on this whole corpus. how do i apply stemming? already i have shown you over here: stemmer.stem. now, can i apply stemming on this entire corpus, which is made up of sentences? yes, very simple. how do i do that? it's just like running a for loop, right? can anybody try it out, or should i continue? it's very simple: i will just say for i in corpus, and then if i write print(i), here you can see that i'm getting all the sentences; this is my first sentence, this is my second sentence, it prints each one. on top of this, i want to apply stemming or lemmatization. how do i do it? very simple: every time i am getting a sentence, i can write for word in words, and for every word i iterate over, i will just write stemmer.stem(word). that's it; then automatically the stemming process will happen. (guys, give me some time; if you don't give me some time, then you'll not be able to learn it properly, right?) okay, now everybody see this: this is the code that i have written. for i in corpus, words = nltk.word_tokenize(i); word_tokenize will make sure that from the sentence i am getting the words. then for word in words, if word not in set(stopwords.words('english')), print stemmer.stem(word). so if i execute this, here you can see that for every word we have applied stemming, and similarly you can also apply lemmatization. so first of all, i am converting each sentence into words, and then i'm setting up the english stop words. if you want to see the english stop words, just go and execute stopwords.words('english'). so here you can see these are all my stop words: i, me, myself, we, our, you, yours, himself, she, her, itself, themselves, these, was, more, most, other, some, not, only, can, will, this; and you can see haven't, hasn't, won't are also there. these are all the stop words. so i am just saying that whichever words are not present in the stop words, just apply stemming to them. and now, if you similarly want to do lemmatization, here, instead of writing stemmer.stem, i will say lemmatizer.lemmatize; that's it, and now you can see that i'm getting all proper words. so lemmatization and stemming i have done, both, on these words.
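here is that loop as a runnable sketch, assuming the corpus list built above; the stopwords download is my addition, since the stop-word lists ship as a separate nltk download:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download('stopwords')  # needed once before stopwords.words('english') loads

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

for i in corpus:
    words = nltk.word_tokenize(i)     # sentence -> list of words
    for word in words:
        if word not in stop_words:    # skip filler words like 'the', 'is'
            print(lemmatizer.lemmatize(word))
```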
now let's try bag of words. for bag of words it is very simple: i will be using sklearn. let's see the library; there is something called CountVectorizer. (so guys, you have to give me some time; i know the time keeps getting extended, but please stay with me, okay? for the people who want to drop, please drop, no problem, but i definitely want to teach this in a better way.) so, from sklearn.feature_extraction.text you have CountVectorizer; if you want to import it, this is how you write it: from sklearn.feature_extraction.text import CountVectorizer. so here i will initialize CountVectorizer, and there are a lot of parameters that can be used with it. and now, if i apply it to the corpus, i will say cv.fit_transform(corpus); let's say the result is my x. and now, if i write cv.vocabulary_, see this, guys, you will be able to see all the words with their indices: narendra has an index of 95, damodardas has an index of 31, and 93, 53, all these are there. now, if you really want to see the bag-of-words vector for the first sentence in the corpus (here you can see, this is corpus[0]), you can just write x[0].toarray(), and if you execute it, here you can see it very clearly. now, at whichever index narendra sits, it will be present as a 1: if you go to index 95, you will see a 1 there. if i check .shape, you will be able to see the total size. count 0, 1, 2, 3, 4, 5: which word is at index 5? if you look it up, 'and' is present around that index, so there you can see a 1. so from this vector you can find out at which indices you are getting ones and zeros. and understand, guys, these numbers in vocabulary_ are indices, feature numbers, not frequencies. if i consider gujarat, let's say gujarat is at index 53, then somewhere at index 53 you'll be able to find a 1; just make a count. now, notice that right now you are also getting twos in the vector. if you really want to convert this into a binary bag of words, press shift+tab over here, and you will see a parameter called binary. binary right now is false; if you make binary=True, you can apply it again, and now you will not be able to see any twos, they will all become 1. see, here i made binary=True. did we apply stop words? yes, we did apply stop words in the loop earlier, see over here, but in this corpus cleaning we did not. so if you also want to apply stop words there, we can do that. let me just copy and paste the code so that we remove stop words from the corpus as well: keep a to z only, lower, split, keep only the words that are not in stopwords english, lemmatize them, and join the review back together. now if i run it, you can see the stop words have gone, all the unnecessary words, and after applying stop words my number of features became less: my sparse matrix has become smaller. so that is how to apply stop words; here you can see i've applied lemmatizer.lemmatize for each review. all these materials will be given anyhow, okay?
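a compact sketch of those bag-of-words steps; the exact index values like 95 or 53 depend on your own corpus, so treat those as illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer

# plain counts first
cv = CountVectorizer()
X = cv.fit_transform(corpus)

print(cv.vocabulary_)   # dict of word -> column index (an index, not a frequency)
print(X[0].toarray())   # bag-of-words vector for the first sentence
print(X.shape)          # (number of sentences, vocabulary size)

# binary bag of words: any count greater than 0 becomes 1
cv_bin = CountVectorizer(binary=True)
X_bin = cv_bin.fit_transform(corpus)
print(X_bin[0].toarray())
```

CountVectorizer also accepts stop_words='english' if you would rather drop stop words at vectorization time instead of during cleaning.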
okay, so quiz time. everybody ready for the quiz? yes, let's start. (you should give me some time, you know; if you don't give me time, then how will i be able to teach you?) shall we go with the quiz? let's go with the quiz now. quickly go over to menti.com, log in quickly, and we'll start the quiz. go to the quiz section; i want all 500 of you to at least participate. hit like, guys, quickly. 144 have joined; go with it quickly, another two minutes is given to you. and also make sure that you follow me on instagram, then only i'll be able to validate you: you have to send me the screenshot of your scores at the end, then only i'll be able to give you the reward. the first prize will get 2000 rupees, the second prize will get 1500 rupees, and the third prize will be 1500 rupees. go to menti.com, use the code 1698310, and make sure that on instagram you follow me, and make sure that you like this video, okay, guys? otherwise you will not be able to send me the things; i need to validate your name before i can give the reward. if you don't have insta, then try to reach me through gmail, but then again it will take time; i will not be able to transfer right away, because i have to validate first. so everybody, please follow me on instagram at least, so that you can drop me a message in the chat. 373 people; still some are facing problems joining. everyone, how many of you have joined? it's okay if you are not on insta; please make sure that you drop me a mail with your screenshot. please take the screenshot at the end, because on insta i will be able to do it right here. the picture basically means the output, the final rank; you have to share it if you are in the top three, then only you will get the money. 'not able to join, not able to log in': why? go to menti.com; 1698310 is your code. how many of you have joined? okay, quickly; i can see 409 people have joined. you have to start fast; we will not waste any time. and guys, it is a very small request: please try to share these community sessions with everyone; not everybody is so lucky to get all this content, so you have to keep on sharing. okay, now let's go to the next screen. again, people who have not joined, go to menti.com and use the code. which city do you belong to? okay, go ahead and write your city name; let's see how many people there are and where they are participating from. many of them are from bangalore. write your city name so that we can start, and please make sure that you follow me on instagram so that you can share the result, guys. still people are joining; quickly, everybody. my insta id is krishnaik06. let's start. okay, i hope everybody's joined; let's go to the next one. please, everybody, see your screen, and let's start the first question, everyone. many people have actually joined now, and here we go; answer fast to get more points. okay: if you remove words like 'a', 'the', 'is', what do we use? tell me the answer, guys, quickly. so many people have said the right answer, guys. 'connection lost'? why, guys? i hope everybody is able to participate. so many of them have said the right answer: 330 of you. okay, i hope this was a very simple question, that everybody got it, and probably you have answered it. why people are getting 'connection lost', i don't know. i hope everything is fine; if not, join again. let's go with the next question, everyone... just a second, everyone, just a second, i think we are facing some issues. okay, we'll restart the quiz; don't worry, i think there was some issue. let's start it once again, everyone. okay, we'll do it once again, cool. quickly, everyone, the code is 1698310. i have restarted the quiz; due to some internet connection issue, i guess, or there was some problem with menti itself. guys, if the connection is lost, just reload the page, nah, simple. okay, now everybody do it and give me a quick confirmation: if the connection is lost, reload the page, my dear friends.
where to enter the code? 1698310 is the code, right. if you want the link directly, hit this link; hit the link. i can see people are joining; it's working now, okay, perfect. join quickly, everybody; let's make it 500 quickly. it shows a heart; i know it shows the heart. everybody join quickly, and let's go with 'which city do you belong to'; how many people are from different, different places? okay, if you cannot join, just go to the pinned comment; the link is there. should i also join from my mobile phone and participate in the quiz? let me participate in the quiz; i will also join, and if i come first, i will also get the prize, i guess. so the code is 1698310. which city do you belong to? i belong to bangalore. okay, so now i will also play, and i will be the winner; i'll take the money home. okay, perfect, let's start, everyone. if you join late, you answer late. enter your name, everyone; enter your proper name. and 'i am giving wrong answers, don't worry': why are you worrying so much? that would be pure cheating; i do not cheat, no. okay, let's go with the first question: if we remove the words like 'a', 'the', 'is', we use: stop word removal, lemmatization, stemming, or all of the above? i hope everybody knows the answer. wow, this time eight people have said all of the above, so here also some amount of cheating is there: 'i'll give a wrong answer, don't worry'. okay, so the first question, i think this was simple. let's go with the next one; the second question is on your screen now: bag of words in text pre-processing is a feature scaling technique, feature selection technique, feature extraction technique, or none? 'i am saying none', okay. so let's see how many of you will be able to answer; still 14 seconds are there. okay, i know you all are very intelligent, you will be able to answer this, and finally you will be able to see... wow, time's up! wow, so, my friends, 305 have said feature extraction technique; then we have feature selection technique with 118, feature scaling technique with 80, and none with 43. amazing; congratulations to the people who have said it right. now let's go to the leaderboard: do you see krish naik somewhere? kaushal is in the first position, russell is in the second position, chava teja venkata in the third position. amazing, okay. so please make sure that you follow me on instagram, then only you will be able to participate, okay, because you have to send the screenshot of this kind of ranking. krishnaik06 is my instagram id. let's go again; if you are planning to join, you can still join over here with 1698310. connect with me on insta at krishnaik06, so that you can send me the screenshot of the marks, so that i can give you the money. now coming to the third question, everyone, and in total seven questions are there. here we go. so the question is: for the sentence 'cat eat food', how many bigrams can be created: two, three, four, five? i think it is five; i will definitely type five, okay! fastest fingers first; i am definitely going to give five. ten seconds: nine, eight, seven, six, five, four, three, two, one... amazing, 395 have said the right answer. this is nice, very nice, good; i like it. so people are actually being able to say the right answers.
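for reference, a quick way to check a bigram count with nltk; for 'cat eat food' it comes out to two consecutive pairs:

```python
import nltk

tokens = nltk.word_tokenize('cat eat food')
bigrams = list(nltk.bigrams(tokens))  # consecutive word pairs
print(bigrams)       # [('cat', 'eat'), ('eat', 'food')]
print(len(bigrams))  # 2
```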
now, time's up! my goodness, 442 have said right: it is corpus. 46 and 43 went with the wrong options, and stop words no one said. okay, at least you know this, right? now let's go back to the leaderboard: is kaushal first again? yes, yes, kaushal is still leading, russell is second, vivek srinivasan is third. amazing. one question i could not take today. so kaushal is first, russell is second, vivek is third; good going, guys. and we have rohit sanam venkata; i think he's also there, around this position. let's see whether he'll be able to come up or not. here comes the fifth question to you all: which of these are not english stop words: sad, happy, angry; about, against, between; into, through, during; or none? which of these are not english stop words? just now we saw that in the code. and, time's up! so 265 have said right: it is sad, happy and angry. amazing, okay; go and check in the list, sad, happy and angry are not there. so 265, you can see over here; it's amazing. now let's go ahead and check the leaderboard; let's go ahead and check the leaderboard. okay, question number six after this leaderboard. coming to the sixth question: tf-idf is used in: text pre-processing, page ranking in search engines, both the first and the second, or none? very simple. i have not explained this to you, because you did not give me time, so face the consequences! so what should your answer be? seven, six, five, four, three, two, one... yes, 156 have said right: both one and two; tf-idf is also used in page ranking. so i tell you, please let me teach, and you all say 'no, krish, tomorrow, tomorrow'; now take tomorrow! the people who knew it were able to answer it, okay, but good job, those 156 people. i have not taught this, so yes, this was the most difficult question, i guess.
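since the tf-idf walkthrough itself got pushed to the next session, here is just a minimal sklearn sketch of the vectorizer for reference, reusing the cleaned corpus from earlier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(corpus)

# values are term-frequency * inverse-document-frequency weights, not raw
# counts, so words that are rare across sentences get a higher weight
print(tfidf.get_feature_names_out()[:10])
print(X_tfidf[0].toarray())
```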
now, who is the top ranker? oh, the ranking has completely changed now; aditya agarwal and vivek are up there, and the earlier leaders have gone down. okay, now let's go with the next question. (i am at rank 409 myself, which is good for nothing.) the final question to everyone; the competition is quite tight. what does cv.vocabulary_ give: a dictionary containing word to frequency mappings, a dictionary containing word to index mappings, both, or none of the above? now tell me the answer; you have another ten seconds: ten, nine, eight, seven, six, five, four, three, two, one, and that's it! my goodness: a dictionary containing word to index mappings. i taught you this; i taught you this entire thing today, okay, and 205 people have said right. let's see who wins it now, guys. before i show you the leaderboard, first of all, please follow me on instagram, so that you can send me the screenshot; after this, the screenshot will be needed. (i taught this today, no? did i teach it or not? if you had given me more time, i could have taught you about tf-idf also, okay.) so yes, follow me on instagram. let's go to the leaderboard: who is the top ranker? manish ostwal, i think he's the first person. congratulations, manish, you are first; shwetank is second. please message me on instagram, everyone; please take the screenshot. okay, who is first? manish, please tell me; manish, send me the screenshot. no, no, please, manish, you send it yourself, then only i'll be able to give it to you; send me the screenshot, okay. i have got tharun's message: tharun, your upi id... okay, tharun's upi id i have got. tharun is going to get 1500 rupees, and i am going to send it through his upi id. okay, tharun is the third winner, so 1500 for the quiz; congratulations, the money will be with you within seconds. so i have transferred that. okay, guys, manish has not sent yet, shwetank has not sent yet... okay, shwetank has given it now, perfect; you have to give your mobile number, shwetank. shwetank kumar is there... ah, and some people are doing cheating; please send the screenshot from your own mobile phone, shwetank. are you sure it is your rank? please send the rank screenshot. shwetank, i've got it, but i've not got manish ostwal's; where is manish ostwal? okay, finally i've got manish ostwal. yes, manish, i have got it; manish, you can send your upi id. and i have got shwetank's information also, so i'm going to pay him through google pay. this is nice, right? i'm going to send it; don't worry, you are my friend, but i will try to find out the genuine person, right? so with the upi id, i have done it; the money is transferred for the quiz. okay, now one last one is remaining: manish. okay, manish has also sent now. shwetank, your transfer is done. manish, your upi id, please; manish, send your upi id. i hope you like this... wait, manish has sent from rahul's id; what is this? okay. so, guys, just one request: everybody, please share it with everyone that i'm taking these community sessions, and that would definitely be helpful, because i really want everyone to make use of this, where i am giving you rewards and i am teaching you as well, and you are not giving me anything; you are just telling me to finish quickly, huh! so manish is the winner of the quiz, and i'm finally sending him the money, and here we go, the last transfer. okay, congratulations, everyone. so again, my 5000 is gone; but unless you share and follow me on instagram, how can we do these things, right? okay, so i've transferred to all the people out there, and i hope you liked this amazing session. and yeah, the support is required from your end also; just make sure that you tell everyone to join these kinds of sessions. whatever youtube money i get, i'm giving it to you only, okay, every day. so yes, this was it. tomorrow i'm not going to have the session, because tomorrow i have some kind of work, so i will not be available; day after tomorrow we will try to do it. so tomorrow, thursday, i'll not be available; we'll do it on friday, okay. so, everyone, thank you; this was it. and yeah, keep on rocking, and thank you, bye bye, have a great day.
Info
Channel: Krish Naik
Views: 72,374
Keywords: yt:cc=on, nlp session, natural language processing for machine learning, nl for deep learning, krish naik data science
Id: VO6PeW6AePs
Length: 115min 40sec (6940 seconds)
Published: Wed Jun 15 2022