Live Session: Encoder-Decoder, Attention Models, Transformers, BERT (Part 1)

Captions
We are live. We'll wait for some time so that people can join. I think I don't have the link actually, so let me check. You've sent me the name? Okay, I got it. Okay, now people are joining. Hello everyone, hello guys. So today's main agenda is basically to cover encoder-decoders, attention models, and transformers, but we will probably be dividing this series into two parts. Two hours is fine, right? One and a half hours today, and then one and a half hours, not tomorrow, because people will not be able to sit for that long; I think we should give them a rest. Right now it is 8:30, so one and a half hours will take us till about 10 o'clock. Today we'll try to clarify the fundamentals, so that you will be able to move ahead with these things, and then the day after tomorrow we can try to discuss some of the advanced topics, like the paper. But the main agenda is to completely cover encoder-decoder, attention models, and transformers: the theoretical part now; the practical part I'll be uploading in the future, so you don't have to worry. This will be a part of the deep learning playlist, guys. Before joining, please do give a like for Sudhanshu's session today. I think you'll love this session, because I've seen a lot of his sessions, so probably you'll love it too. Okay, Sudhanshu, you can share your screen and start the session. Sure, just wait for a moment, four to five minutes. Okay, I think we can start. Yeah, you can share your screen and start. Are you able to share
this screen? Yeah, Krish, can you please give me access for this one? Okay, just a second: click on the screen option, then under the shared-screen option you have to enable multiple participants. Okay, done. You are doing this on your workstation? Amazing. Am I audible? Yeah, clearly audible.

Okay, so fine, guys, hello everyone. My name is Sudhanshu Kumar, and today I will be talking about the basics of how language translation works and the basics of encoder-decoder, and in the same sequence we are going to talk about many more things. First of all, I will give you a brief introduction to the problem statement: what kind of problem will we be able to solve if we use an encoder-decoder, or an attention-based model, or a transformer-based model? Then we can try to get into BERT or any other transfer-learning-based model. So first, let's try to understand the problem: what problem were people facing, and why did we use to talk about RNN, LSTM, and GRU? Then we will try to understand what an encoder is and what a decoder is: what an encoder looks like with a simple, basic RNN cell, and what kind of encoder we can build with the help of a bidirectional RNN or a deep RNN. These are the things we are going to understand, and then we will be able to understand what a transformer is. Unless and until you understand RNN, LSTM, encoder, and decoder, it will be difficult for you to understand a transformer. The transformer is not that tough in terms of implementation or architecture, but these are the basic things, encoder, decoder, RNN, LSTM, GRU, and based on them you will be able to understand transformers. Once you understand transformers, we are going to talk about one research paper, published by Google in 2017, called "Attention Is All You Need". Let me just show you: this is the research paper that we are going to discuss. This was a breakthrough that revolutionized the entire NLP field. Whatever model you will find, whether I'm talking about BERT, or ALBERT, or ELECTRA, or GPT-1, GPT-2, GPT-3, or Transformer-XL, all of these models have been built on top of this research paper itself. And here you will find one complex, black-box kind of diagram. Once you are able to understand this diagram, what multi-head attention is, what the feed-forward network in this particular place is, how these things are placed, and what the number of layers of these networks is, then you will be able to understand everything that people have published after this paper in the easiest possible way. So first of all, your base must be very clear; then only will you be able to understand each and everything. So fine, let's get to the problem statement. "Can you zoom in on the screen once?" Yes, sure, I'll zoom in. Let me check how it looks. Yeah, now it looks fine.
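Since the whole session builds toward that diagram, here is a minimal NumPy sketch of the scaled dot-product attention that sits inside each multi-head attention block of the paper. The sizes, seed, and variable names are toy values chosen for illustration, not anything from the paper's code:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the chosen axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a distribution over keys
    return weights @ V, weights

# toy example: 3 query positions, 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a weighted mix of the value vectors, with weights given by how well that query matches each key; multi-head attention simply runs several of these in parallel on learned projections.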
Okay, so fine, guys. First of all, before discussing the architecture: unless and until you are facing an issue, some kind of problem, you are not going to do any kind of innovation. Once you face some kind of issue, then you will look for some innovation so that you can solve your current, existing problem. There were a variety of problems people were facing. If I name one or two, the first problem that comes to my mind is language translation. We have seen that there is Google Translate: once you say or type something in Google Translate, it will convert any language X to language Y, maybe from English to Hindi, or Hindi to some other language, or English to French, English to German, in the easiest possible way. So language translation was one of the issues people were facing: they were not able to translate automatically between their respective languages. Because whenever we talk about a language, suppose I say some sentence, like "my name is Sudhanshu Kumar". If I have to say the same thing in Hindi, the sequence of the words changes; it is not a one-to-one translation. Or suppose I'm talking about data science: "Data science, or AI, is an area where you will be able to use machine learning, deep learning, NLP, computer vision, advanced NLP, and reinforcement learning, and you can solve many real-time problems." Now if you have to translate this line into some other language, for sure this translation is not just one-to-one. You can't translate "data science" word by word in exactly the same order. Here, first of all, you have to understand the sentence, you have to understand the context, and then on top of that you have to generate a new sentence, new words, arranged in such a way that they make sense, that they give you the same meaning. To solve this kind of problem, people were facing lots of different issues, and they were not able to translate these kinds of sentences into other languages. That was the first problem.

The second problem, if I talk about it, is the question answering system. Nowadays almost every website, every service provider, integrates a chatbot. Now I can ask the chatbot a question. Suppose I'm looking for some kind of material, or some kind of subject, or something, and I ask: "I'm looking for a mouse." The meaning of "mouse" is different if I understand "mouse" as the animal versus if I talk about the mouse in terms of hardware; the meaning is different in the two places. Or suppose I'm chatting with my finance team, and I say, "the bull is going high." If I try to find the meaning of "bull", I think we all know it: it's basically an animal, that's the dictionary part. But if I'm saying "the bull is going high" to my finance team, in that particular context the meaning of "bull" is not an animal; it is basically the share market, the stock market. So here you have to understand the context, and this context will not be mentioned inside the sentence itself; you have to understand it from the sentence or from the conversation that is going on. And that again is an issue: as human beings, our brains are trained to do that, but what about machines? How will they be able to understand it? Another kind of context: suppose I have to find the sentiment of some words or some sentences. For sure, unless and until I am able to understand the entire context, I will not be able to find the exact sentiment. Suppose I'm giving a review to Swiggy, and I say: "delivery was good, but the food taste was not good." In that case the system will be confused about whether we are talking about a negative sentence or a positive one, positive sentiment or negative sentiment. Again, there is an issue. So these are the different kinds of issues people were facing, and keeping these things in mind, they started designing different kinds of neural networks. It doesn't mean that we didn't have neural networks; even before that, we had neural networks which could try to solve complex problems
and which could understand relations, and that is completely fine. But unless and until we get an architecture which works for a certain set of problem statements, I am at point zero, and I will not be able to provide any kind of solution. Keeping these things in mind, people started designing different kinds of networks, and the key network people designed for this was the LSTM, long short-term memory. But before LSTM, people had done a lot of research in CNNs, convolutional neural networks, and they released many different architectures as well: LeNet, AlexNet, VGG16, VGG19, ResNet-50 (also ResNet-34, ResNet-101, ResNet-152), InceptionNet, GoogLeNet, DarkNet. These are CNN-based networks. Now, why will they not be able to solve the problem statements I was talking about? They are also neural networks; they train their weights based on the input data we give the model, and that is completely fine. But if you have worked with CNNs, you must have seen that from one end we give an input; there are weights; we get an output; then we do backpropagation and train the weights based on the loss we received and the optimizer we are using. In every iteration we send data from the input side to the output side, and in the backward propagation we train the weights, we understand the relations based on the loss, and we try to reduce that loss. It doesn't matter what kind of loss we are talking about, whether it's L1 loss, L2 loss, hinge loss, Huber loss, or cross-entropy loss; there are different loss functions used with the respective networks.

Now, in this situation, we send data into the neural network from one end and get an output from the other end; that is completely fine. But the lag here is that you are just sending data in the forward direction, and in the backward direction you are just training the weights. The network will not be able to remember what you sent last time, what kind of input data you gave it previously, and that was the issue with CNNs. CNNs are good at understanding images; CNNs are good at interpreting videos. For video analytics, image analytics, audio detection, object tracking, and object segmentation, you can use a variety of CNNs, and that is completely fine. But here, in the case of language, you are not only supposed to train the weights; you are supposed to remember the context as well. Suppose you ask me a question right now: for sure it will be related to data science, related to neural networks, so I have to understand that particular context and give an answer based on it. If you change the context, suppose you ask me some personal question, or some question from finance, from banking, from some other domain, then I have to change my context, and based on that I am supposed to give you an answer, an output. And this is where CNN fails. A CNN always trains its weights to understand the relationships in static data, but it will not be able to memorize; it will not be able to understand context. And if it cannot understand context, then a question answering system: you will not be able to build it. A sentiment analysis system: you will not be able to build it. Language translation: you will not be able to build it. Fill-in-the-blanks: you will not be able to do it. Text summarization: you will not be able to do it. I think you must be using apps like Inshorts, where we get different news items in abbreviated form; they use a deep-network-based architecture in the back end to summarize text. There are two kinds of summarization you can get: extractive and abstractive. We will talk about the meaning of extractive and abstractive at a later stage, for sure. But in all of these situations, in all of these scenarios, you need a network which will be able to understand not only the weights, not only the relationships between the data, but which will be able to remember context, to memorize something from the past, and to change its context based on the input that you give. Keeping these things in mind, people derived a network called long short-term memory, and there are hundreds of variants of such recurrent networks that you will find; the popularly known ones are RNN and LSTM. Now, what have people done over
there? People said: okay, it's completely fine. Suppose I have a neural network; I give some kind of input, and for sure I get some kind of output, just like a basic CNN (I'm not getting into the depth of CNN because that would be a different discussion altogether). But what if I send some of that output back to the network itself? In that case, what will happen? Basically, the network will be able to understand, to memorize, what it did last time. And if I can get this kind of network, for sure I will be able to solve some of these problems. Keeping these things in mind, people have given us a network called the RNN, the recurrent neural network; within this family you will find the LSTM, and another variant you will find is the GRU. People discovered and developed the recurrent neural network, which takes an input, that is completely fine, and gives you an output, but also takes a feedback: a feedback from the output into the input of the next step. And with this, for sure, your network will be able to understand context; it will be able to memorize something. So people developed the long short-term memory and the gated recurrent unit, and in these they designed different kinds of gates. I'm just giving you an overview, because Krish has already uploaded these videos on LSTM and GRU, so I'm expecting that you have gone through them; but I have to build up a story, and that is what I'm doing here.

Okay, so in such a gated unit you will find a forget gate, a memory channel, and an output. What I have drawn is just a black box of the LSTM network; if you search for RNN/LSTM, you will see the complex, gated architecture, where you will find a forget gate, a memory gate, and an output gate. The forget gate is responsible for changing the context; the memory gate is responsible for memorizing something new, for adding something new to the previously learned context in the memory channel; and the output gate is responsible for giving you the final output that you need. So this is something people derived. Now, again, people started facing some issues. Suppose I have to build a question answering system, or suppose I have to do language translation: I can go to Google Translate and type my entire sentence, and for sure Google Translate is supposed to understand every sentence, the grammar of that particular sentence, and then give you some kind of output. That is what Google Translate is supposed to do. Now, here, in the case of LSTM, or in the case of GRU, these networks alone are not that good. Again, I'm not saying these networks are not good; I'm saying these networks alone are not good at performing this kind of task, where the model has to take multiple inputs, learn, and then give you the respective output, for a question answering system or for any kind of language translation.
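To make the gate story concrete, here is one step of an LSTM cell written out in NumPy. This is a minimal sketch with made-up toy sizes and random weights, just to show how the forget gate, memory (input) gate, and output gate combine:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U, b hold the parameters of the four transforms:
    forget gate f, input (memory) gate i, candidate g, output gate o."""
    f = sigmoid(x_t @ W["f"] + h_prev @ U["f"] + b["f"])  # what to drop from memory
    i = sigmoid(x_t @ W["i"] + h_prev @ U["i"] + b["i"])  # what new info to store
    g = np.tanh(x_t @ W["g"] + h_prev @ U["g"] + b["g"])  # candidate memory content
    o = sigmoid(x_t @ W["o"] + h_prev @ U["o"] + b["o"])  # what to expose as output
    c = f * c_prev + i * g   # memory channel: old context kept plus new context added
    h = o * np.tanh(c)       # hidden state, the output at this step
    return h, c

# toy dimensions: input size 3, hidden size 4
rng = np.random.default_rng(2)
d_in, d_h = 3, 4
W = {k: rng.normal(size=(d_in, d_h)) for k in "figo"}
U = {k: rng.normal(size=(d_h, d_h)) for k in "figo"}
b = {k: np.zeros(d_h) for k in "figo"}
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)
```

The forget gate scales down the old cell state, the memory gate decides how much of the new candidate to add, and the output gate decides how much of the cell state to expose, which is exactly the "change context / add to context / give output" split described above.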
Mail is a good example. Since the last one and a half years, Google has implemented a feature in Gmail where it will give you suggested responses to your mail as well: four to five responses, which it gives you based on a probability factor. Unless and until this model is able to understand your entire mail, how will it be able to generate that output? So we want some kind of model which will be able to take n inputs, understand them, and, based on the training that has been done on a huge dataset, give you some kind of output. And then people started exploring different new techniques, and they released one research paper, called Seq2Seq, or "Sequence to Sequence Learning with Neural Networks", and this is what we are going to talk about; this is the agenda we are going to cover. People released this particular research paper, which can solve this particular problem to a certain extent. I'm not saying it will solve all of your problems; there are 25-plus tasks people define in NLP to build or test any kind of model. So people came up with this particular idea called sequence to sequence. Ilya Sutskever, who was the main researcher, along with Oriol Vinyals and Quoc Le, released this research paper, and they claimed: whatever issue you are facing, maybe this paper can solve that particular issue, maybe not 100 percent, but to a certain extent.

So now, first of all, let's try to understand what kinds of data input, what kinds of different models, you will encounter. Let me share my whiteboard now. Whenever you work with NLP, you are going to face different kinds of situations, different kinds of scenarios. Somewhere you will see that there are multiple inputs and you are looking for multiple outputs. This case is called seq2seq: we give a sequence as input, and we get a sequence as output. For example, take Gmail: whenever you fire off a mail, it will generate a response for the person to whom you sent it, a multi-word output. You write something using n words, and for sure Google will give you some output using m words; you can have n inputs and m outputs. It is also possible that you give multiple inputs but are looking for just one single output. This is called seq2vec, sequence to vector: the previous one was sequence to sequence; this one is a sequence-to-vector model, where you look for only one output but give multiple data points as input. There can be another scenario, where you give just one input and expect multiple outputs: this is called vec2seq, one input and multiple outputs. These recurrent arrows are nothing but the feedback: as I said, to remember the context, the cell always stores the feedback from the previous step. And it is possible that you give only one input and expect only one output; this situation is called vec2vec. So we have seq2seq, seq2vec, vec2seq, and vec2vec: these are the situations you will find in terms of input and output.

Now, in these diagrams I can replace each block with an RNN cell, any kind of recurrent cell, maybe LSTM, maybe GRU. This cell can be anything; it is a cell, not a neuron. Many people think I'm talking about neurons here, but no, this is not a neuron at all. It's a complete cell, inside which you will find a forget gate, a memory gate, and an output gate. If you go through Krish's channel and the RNN videos that he uploaded long back, you will see these are cells. What we have done is combine each and every cell and build this entire network. But still, here you will find that we are giving input and taking an output from each and every cell, and it is possible that this will not serve the purpose; it is possible that it will not give you better accuracy or better results. Keeping these things in mind, because we already had the basic RNN cell, we already had the LSTM, we already had the GRU, which can understand context, memorize the feedback of the previous step, and give you an output, people started thinking about a new idea.
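The four input/output patterns above can be sketched with one plain recurrent cell unrolled over a sequence; the only thing that changes is what we feed in and what we keep. This is a toy illustration with made-up sizes and random weights, not a trained model:

```python
import numpy as np

def run_rnn(xs, d_h=4, seed=3):
    """Unroll one vanilla recurrent cell over the rows of xs and
    return the hidden state at every step."""
    rng = np.random.default_rng(seed)
    Wxh = rng.normal(size=(xs.shape[-1], d_h))
    Whh = rng.normal(size=(d_h, d_h))
    h, outs = np.zeros(d_h), []
    for x_t in xs:
        h = np.tanh(x_t @ Wxh + h @ Whh)  # feedback: h carries the context forward
        outs.append(h)
    return np.stack(outs)

xs = np.random.default_rng(4).normal(size=(6, 3))  # a sequence of 6 inputs

seq2seq_out = run_rnn(xs)       # many in, many out: keep every step's output
seq2vec_out = run_rnn(xs)[-1]   # many in, one out: keep only the final state
vec2seq_out = run_rnn(np.vstack([xs[:1], np.zeros((5, 3))]))  # one real input,
                                # then the cell keeps unrolling on its own feedback
vec2vec_out = run_rnn(xs[:1])[-1]  # one in, one out
```

So seq2seq, seq2vec, vec2seq, and vec2vec are not four different networks; they are four ways of wiring the same recurrent cell.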
People said: okay, so what if I can separate the input and the output? If I separate the entire input network and the entire output network, what will happen? Will I be able to achieve better results in language translation, in question answering, in fill-in-the-blanks, in text summarization and text abbreviation? People started thinking about this, and keeping these things in mind, they released that particular paper, Sequence to Sequence, by Ilya Sutskever and his co-authors, and this is one of the breakthroughs that people have made. After that, many different kinds of research papers were released which are responsible for solving different kinds of real-world problems that we see in the current environment. Now, when I talk about this paper: you are not supposed to read through every single line of it. You are just supposed to understand the architecture and the mathematics behind it; if you are able to understand that, it's completely fine. Still, if you go through the entire paper just once, you will understand it, because that is the core, the base, of each and everything.

So now let's talk about this sequence-to-sequence research paper and try to understand how it is going to solve the problem, what kind of problem it is going to solve, and how we can utilize it: how I will be able to train my model, how I will be able to build my own chatbot, how I will be able to build my own language translator, how I will be able to build a model which does text summarization. I'll try to restrict myself just to this particular paper, so that you get a neat, clean, very clear understanding, and along with that you will all be able to understand encoder-decoder, which is going to help you understand the transformer and the research paper called "Attention Is All You Need", because this is where we have to reach.

Okay, so now let's start with this research paper. From this research paper, the concept of encoder-decoder came into the picture. We already had seq2seq, seq2vec, vec2seq, and vec2vec, but people said: what if I separate the input network and the output network altogether? In this case, what will happen? What will I be able to achieve? So people separated the input and the output networks, and they named this input network the encoder. Whenever we talk about the encoder, it is nothing but the input network, which will be responsible for taking the input and understanding the relationships between the inputs, based on the loss, the difference between y and y-hat, that you receive. In this particular research paper you will find two sections: this section, as well as this particular section. Here you can see that this is a network, and these are nothing but cells, the basic RNN cell, LSTM, or GRU that you have studied so far. This is nothing but a cell; it's basically an
rnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn cell it could be lstm it could be gru and a variety a variety of different different kind of a cell so now here you will be able to observe that we are just trying to give some sort of a input inside this network nothing else we are trying to give inside this one now after taking an input so we all know that if i'll talk about rn and lstm right so rn lstm is kind of a neural network which will be able to change so based on the context change or based on the memory gauge so it will be able to give you some output so it will not give you output just based on the weight which is been trained it will try to give you output even along with like feedback that it has taken from the past right so here what people have done is so people said that okay fine so like uh let's try to like separate this input and output let's suppose let's suppose as an input i'm trying to give one sentence right so like i'm trying to give like uh i am fine right so let's suppose this is the input which i have given right now it is supposed to perform one task it is supposed to translate it is supposed to convert this i am fine right into a respective hindi language right so here probably i can look for mera okay sorry i'm not good with like a man i think this is how sorry my bad uh so peak right so this is what a kind of a output which i'm expecting right so i'm trying to give a input and i'm i'm trying to get an output right so this is the kind of output which i'm expecting right now here you will be able to find out one very interesting things right this is a very small and very simple example but the interesting thing over here is that i'm trying to give i 
and the translation of "i" is "main"; that's fine, but notice we are not getting the exact same output as the input, in the same sequence. When we deal with language it is not a one-to-one mapping: you can give any kind of sequence, and after translation the network is supposed to generate the respective meaning, not preserve the word order. It is not one-to-one word mapping, otherwise we could have done it with an NLTK-style lookup library. We can't, because the way I phrase an input and the way another person phrases it can be different, yet we may be conveying a similar meaning. So this input network that people proposed is called the encoder, and it is responsible only for taking the input, nothing else. You can see we use an RNN-type network here: we have an input, and we have feedback. Don't misunderstand that arrow as an output; it represents the hidden-state feedback. The network takes multiple inputs, one per cell, from the different input channels, and all of these cells, RNN/LSTM/GRU based, learn the meaning and the relations between all of these words. We also see an end-of-string token, so that the network understands: okay, this is the end
of the string. It is just a token people have added so the network knows where to stop reading. Now suppose I'm talking about a sentence with 10 different words: in that case I can increase the number of input cells, or feed the input over more steps. So the encoder is nothing but an RNN/LSTM-based network responsible for taking the input, understanding the context and the relations (through its weights), and producing a final context vector. What is the meaning of "context vector"? We gave multiple inputs, "i am fine"; the encoder understands and summarizes each and every word, and then sends this context vector as input to a different network. That second network, which produces the output, is called the decoder. Now, how does the decoder work? Let's first understand how training happens. For training I prepare my data: English sentences and their respective Hindi sentences, and I can have many such pairs.
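The encoder just described, an RNN that folds each word into a running hidden state and emits the final state as the context vector, can be sketched in plain Python. The embeddings, weight values, and vocabulary below are made up purely for illustration; a real encoder would learn its weights during training.

```python
import math

# Toy 2-d embeddings for a tiny vocabulary (made-up values, illustration only).
EMB = {"i": [1.0, 0.0], "am": [0.0, 1.0], "fine": [1.0, 1.0], "<eos>": [0.0, 0.0]}

def rnn_encoder(tokens, hidden_size=3):
    """Run a vanilla RNN over the tokens and return the final hidden state,
    which plays the role of the context vector handed to the decoder."""
    # Fixed small weights so the example stays deterministic (normally learned).
    w_xh = [[0.5, -0.3], [0.1, 0.4], [-0.2, 0.2]]                 # input -> hidden
    w_hh = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.2, 0.0, 0.1]]   # hidden -> hidden
    h = [0.0] * hidden_size
    for tok in tokens + ["<eos>"]:        # <eos> marks the end of the string
        x = EMB[tok]
        # h_t = tanh(W_xh . x_t + W_hh . h_{t-1}): current input plus past feedback
        h = [math.tanh(sum(w_xh[i][j] * x[j] for j in range(len(x)))
                       + sum(w_hh[i][k] * h[k] for k in range(hidden_size)))
             for i in range(hidden_size)]
    return h  # the context vector summarizing the whole sentence

context = rnn_encoder(["i", "am", "fine"])
```

The only thing the decoder ever sees from the encoder is that single final vector, which is exactly why it is called the context vector.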
These pairs are the input data that will work as my training data. Suppose I'm training the model from scratch, for the very first time: I will give an English sentence and its respective Hindi sentence, data which I may have prepared manually. After that, we convert this data into embedding vectors, because the network cannot understand your English or Hindi words directly; to the system they are just character codes, and it only understands numerical representations. We can create the embeddings using some pretrained network or algorithm: TF-IDF, a tokenizer, a GloVe model, or word2vec. With these I can generate the respective numerical values of my English and Hindi sentences. Once I have the embedding vectors, the English embeddings go as input into the encoder-decoder network, and as output I expect the Hindi words. Call the input x_i and the target y_i; whenever you give x_i, the network produces y_i hat, the predicted value.
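The embedding lookup step described above can be sketched as follows; the vectors here are invented values, standing in for what a pretrained word2vec or GloVe model would give you.

```python
# Hypothetical embedding table: each word maps to a dense numeric vector.
# Real systems would use learned word2vec/GloVe vectors; these are made up.
embeddings = {
    "i":    [0.2, 0.8],
    "am":   [0.5, 0.1],
    "fine": [0.9, 0.4],
}

def embed(sentence):
    """Turn a sentence into the numerical representation the network consumes."""
    return [embeddings[w] for w in sentence.lower().split()]

vectors = embed("I am fine")
```

It is this list of vectors, not the raw characters, that flows into the encoder as x_i.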
So there is the predicted value and the expected value, and for sure there will be a difference between them, which means you will find a loss. Once you find the loss (the cost), you backpropagate from the decoder to the encoder, in effect saying: the input you are giving me produces some error, there is a difference between my expected y_i and the predicted y_i hat. In the backward propagation the network updates its weights. Understand this part: every cell here is an RNN cell, any RNN-variant cell, not a single neuron or perceptron. So you give the English sentence as the input x, and the Hindi sentence as the target y. At training time (I'm only talking about training here), the network takes the input, and the output in Hindi that you are expecting is y_i; that is just the expectation.
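The gap between the expected y_i and the predicted y_i hat is what the loss measures. A minimal sketch of the usual cross-entropy loss for one decoding step (the vocabulary size and probabilities are made up for illustration):

```python
import math

def cross_entropy(y_hat, target_index):
    """Loss between the decoder's predicted distribution y_hat and the
    expected word y, given by its index in the vocabulary."""
    return -math.log(y_hat[target_index])

# y_hat: decoder output after softmax over a toy 3-word Hindi vocabulary
y_hat = [0.2, 0.7, 0.1]
loss = cross_entropy(y_hat, target_index=1)   # the expected word has index 1
# A perfect prediction (probability 1.0 on the right word) gives loss 0;
# this loss is what the optimizer backpropagates through both networks.
```

The bigger the mismatch between prediction and expectation, the larger this number, and the larger the weight updates in the backward pass.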
The output you actually get is y_i hat, so you find the difference between y_i hat and y_i: that is the loss, cost, or error. You send this loss to an optimizer, which backpropagates throughout the network across the time axis, and the network trains itself, which simply means it updates the weights at each of these cells so that it understands the relations in a better way. It keeps taking the inputs and keeps adjusting its weights, in both networks. Again, the main agenda here is the encoder-decoder: the input network is called the encoder, and the output network is called the decoder. At testing or prediction time, you give an input sentence, the encoder generates its respective context vector and passes it to the decoder, which generates some output; and that output is fed back as input to the next decoder step, and so on.
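That prediction-time loop, where each decoder output is fed back in as the next input until the end-of-string token appears, can be sketched like this. The TRANSITIONS table is a hypothetical stand-in for a trained decoder; a real decoder would also condition on the context vector and its hidden state.

```python
# Hypothetical lookup standing in for the trained decoder: given the previous
# output token, it returns the next one.
TRANSITIONS = {"<sos>": "main", "main": "theek", "theek": "hoon", "hoon": "<eos>"}

def greedy_decode(max_len=10):
    token, output = "<sos>", []
    for _ in range(max_len):
        token = TRANSITIONS[token]   # each output is fed back in as the next input
        if token == "<eos>":         # stop once the end-of-string token appears
            break
        output.append(token)
    return output

translation = greedy_decode()
```

Note the role of the end-of-string token: without it the loop would have no principled place to stop generating.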
Why feed the output back? Because in language translation, or in question answering, I'm supposed to understand the previous context as well: if I cannot correlate the sentence I'm saying now with the sentence I said one second ago, I will not be able to make any sense. That is what this network does. So the encoder takes the input, learns the entire relationship between the input and output sentences, backpropagates based on the loss, trains itself, and finally hands the context vector as input to the decoder network, which starts decoding your sentences one by one and gives you the final outcome: if I say "i am fine", it translates it and then emits the end-of-string token. That marker could be rendered as a full stop, an exclamation mark, a question mark, anything; it is just a notation that says: stop here, don't generate anything more. But then people started facing an issue. If you focus here, you will see that in this network the data propagates in only one direction.
But in a sentence, when generating some word, the words before that word and after that word both matter for understanding the context. Suppose I just say "bull is going": I don't think you will understand that I'm talking about the finance market, the stock market. But if I say "bull is going high", and you have the finance context, then you understand the actual meaning I'm trying to convey: the share market is going up. If I only say "bull is going", you'll think some animal is going somewhere. So in a sentence, the word before and the word after matter a lot, yet in this network the data propagates in just one direction. So people started thinking about a new approach, a new idea. I think by now we all understand the encoder-decoder: an input network and an output network, where inside the encoder I can have any basic recurrent network with any number of cells (cells, not neurons) taking the input, and inside the decoder any number of cells giving me the output. Now, here I was talking about
a situation where you have to understand the context before and after, the word before and after as well, and this is not a network that can read your data in both directions; it understands the data in just one direction. So what should we do? People proposed: what if we use a bidirectional LSTM or a bidirectional RNN cell here? Maybe that solves the problem. So people came up with the idea of building a network that takes the data but propagates the feedback, and the output, in both directions. I will come back to the issues with bidirectional models later; this is not the best network we'll see, we're just covering the basics of the encoder and the decoder. After that, people came up with bidirectional encoders, and inside them the base network can be bidirectional, or a deep RNN, deep LSTM, or deep GRU, meaning a stack of recurrent layers, which can understand the relations and memorize the context in a better way; we will explore that. But let's complete this topic first: this is what people proposed for sequence-to-sequence, I give one sequence and I get one sequence as output.
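The bidirectional idea, reading the sentence both left-to-right and right-to-left and combining the two final states so the context sees both directions, can be sketched as follows; `toy_encode` is a made-up stand-in for a real recurrent pass.

```python
def toy_encode(tokens):
    # Stand-in for a real recurrent pass: the state mixes the previous
    # state with the current token (here represented just by its length).
    h = 0.0
    for t in tokens:
        h = 0.5 * h + len(t)
    return [h]

def bidirectional_context(tokens, encode):
    """Concatenate the final states of a left-to-right pass and a
    right-to-left pass, so the context captures both directions."""
    return encode(tokens) + encode(list(reversed(tokens)))

ctx = bidirectional_context(["bull", "is", "going", "high"], toy_encode)
```

The two halves of `ctx` differ because the running state depends on word order, which is exactly the information a single-direction encoder throws away for one side of each word.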
Now, the paper gives one function; let's understand it as well. I was talking about the encoder and decoder; here in this equation you will find a sigmoid. What does the sigmoid take, and what is the meaning of h_t in this research paper? This h_t is the hidden output you get from one particular cell. When I talk about, say, h_3, it is the output of the sigmoid function, which takes a weighted version of x at time t and a weighted version of h at time t-1. So it considers the current input, and it also takes some feedback: h_{t-1} is nothing but the feedback from the previous step, which is why h is the paper's representation of the hidden output. Notice that it does not pass the exact output as input to the next step; it passes a weighted version of it, and based on the current input and that state it gives you the final prediction at each position. That is why the paper says the RNN is a natural generalization of a feed-forward neural network to sequences: given an input sequence x_1 ... x_T on the time axis, a standard RNN computes a
sequence of outputs by iterating the following equations: first compute h_t, then multiply h_t by an output weight matrix to get the final outcome y_t. So to calculate the output you first calculate h_t, and to calculate h_t you combine a weighted current input with the weighted previous hidden state, the feedback from the previous step. Later, the paper mentions that an RNN can easily map sequences to sequences whenever the alignment between the inputs and the outputs is known ahead of time. The meaning is simple: if you give mapped data, say English-Hindi pairs where for this input, this is the expected output, then the model can align itself, meaning it learns the relations during backward propagation, tunes and adjusts its weights accordingly, and produces the output. Then you will find the formula used for the output distribution: it gives the probability of producing a word y_t given what has come before.
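The two equations being paraphrased here are, in the notation of the seq2seq paper ("Sequence to Sequence Learning with Neural Networks"):

```latex
h_t = \mathrm{sigm}\left(W^{hx} x_t + W^{hh} h_{t-1}\right), \qquad
y_t = W^{yh} h_t
```

and the decoder models the output sequence as a product of conditionals:

```latex
p(y_1, \ldots, y_{T'} \mid x_1, \ldots, x_T)
  = \prod_{t=1}^{T'} p\left(y_t \mid v,\, y_1, \ldots, y_{t-1}\right)
```

where v is the fixed-dimensional context vector produced by the encoder, and each factor is a softmax over the output vocabulary.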
That is the meaning of that equation: the probability of producing output y when some input has been given to you, that is, when you already have the input and the weights the network has learned. You must have seen the naive Bayes equation, P(a|b): the probability of occurrence of a when b is given. This is the same idea: we are finding the probability of getting this word, then this one, then this one, given everything so far. Based on your input and the learning it does internally, the model gives you an output in the simplest possible way. Moving ahead to the decoder side: it gives you the outputs one by one, one from each decoder cell. Then the paper gives benchmarking: which dataset, what kind of training they did, the total number of cells: 1000 cells at each layer, and 1000-dimensional word embeddings. Those are the specifications they used when writing the paper; it doesn't mean you must follow the same thing when you build your own network. You can change them, and in general we do; they are just hyperparameters. When I talk about the code, you will see that it is nothing special.
Anyone can write and implement it. Now, here is the benchmarking. For language translation they report the BLEU score, which stands for BiLingual Evaluation Understudy. For any problem statement related to language translation (translation specifically, not all of NLP), we calculate a BLEU score. If you go through the paper you will find: a single forward LSTM with a beam size of 12 (beam size being the width of the beam search, the number of hypotheses kept) achieved one BLEU score; an ensemble of 5 reversed LSTMs with beam size 1 achieved another; and an ensemble of 5 reversed LSTMs with beam size 12 yet another. These are the standard results they report on the standard dataset. Whenever people write a research paper, they mention their findings, the dataset, the accuracy they achieved, and the different networks and experiments they tried; if you ever write a paper, you should follow the same kind of structure, and it will help you build better models.
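BLEU itself combines clipped n-gram precisions for n = 1 through 4 with a brevity penalty; the simplest ingredient, clipped unigram precision, can be sketched as follows (a deliberately minimal piece, not the full metric):

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision, the n = 1 ingredient of BLEU.
    Each candidate word is counted at most as many times as it
    appears in the reference (the 'clipping')."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / max(1, sum(cand.values()))

score = unigram_precision("main theek hoon".split(), "main theek hoon".split())
```

An exact match scores 1.0; repeating a correct word more often than the reference contains it gains nothing, which is what makes the precision "clipped".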
Again, you are not supposed to study every model in the benchmark tables, because all of them are sequence-to-sequence models themselves: there is an encoder, there is a decoder, you give some input, and it gives you some output. They also report rescoring the 1000-best list with an ensemble of 5 reversed LSTMs, and the score that achieved. You can easily find the design and architecture of this network on the internet. All of these experiments were performed on the ntst14 test set, the English-to-French test data used in the paper. They report a few more comparisons, and finally you will find the conclusion. So that was the seq2seq model, a very generalized structure; any model you build on top of seq2seq is fine, and it is up to me how many inputs I give on the encoder side and how many outputs I receive on the decoder side. Seq2seq was the basic, I would say fundamental, paper; based on it, people have designed different kinds of networks and derived different scenarios. In the next paper you will understand the actual calculation,
the actual phenomena behind the scenes of the encoder and decoder. The name of the research paper is "Effective Modeling of Encoder-Decoder Architecture for Joint Entity and Relation Extraction". Named entity recognition means, for example, detecting a name from a given sentence: the name of a person, a place, a building, a street, an animal. Relation extraction means understanding relations based on parts of speech: noun, pronoun, verb, adjective, adverb. So this is a research paper that solves different purposes, but it is based on the same seq2seq paper we were talking about. In this paper the base is the same: you have an encoder model and a decoder model. But inside the encoder it is possible to use a unidirectional LSTM or a bidirectional LSTM, and I might end up using a stack of LSTMs, called a deep LSTM or deep RNN, on one side or on both sides. Before moving to the next part, maybe we can take some questions. Krish, are you with me? Yes, I'm here. So if we have any questions on the encoder-decoder, we can take those. People are asking about community classes; nothing related to this. Any questions with respect to encoder-decoder, please do ask, everyone. Okay, I think no questions as such. "Can you share the PDF?"
Yes, I'll put that in the description, guys; we can share it. Next question: "Any reason there is a feedback from encoder to decoder as well, along with the output from encoder as input of decoder?" See, the encoder is responsible only for taking the input data, so it gives you just one output, which is called the context vector: the weights, the learning it has, a summary of the entire sentence. The decoder is the network responsible for taking this context vector as input and then giving you an output. Let's go back to the slide: this is the encoder network, and this is the decoder network. Like I said, the encoder just takes the input, and the decoder gives you the output. Once the encoder gives the context vector, the decoder starts producing output. Now, when I get some output at training time, it is possible that I was expecting something else and it gave me something else: y hat is the predicted value, the output given by your model, and y is what I'm expecting. At training time (I'm not talking about testing), the model gives
you y hat as the output, and you have the expected output y. If there is a difference, the network is supposed to do a backward propagation, because unless you backpropagate, you will not be able to adjust the weights of all of these layers, which consist of RNN cells or variants like LSTM or GRU. That's the reason we have an encoder and a decoder; Krish, I didn't fully understand the question, but I think that covers it. "What does reversed beam mean?" Guys, if you can phrase your question more clearly, we can understand it and explain. "Amazing explanation." Someone is asking: is it only used for language translation? No, it's not like that. It can be used for language translation, named entity recognition, sentence tokenization, question-answering systems, abstractive and extractive summarization, fill-in-the-blanks; there are multiple tasks it can perform. But based on the task, people have modified the networks, and based on your task you can use different architectures. What I'm talking about now is the basic evolution of this sequence-to-sequence model, because after it people discovered further encoder-decoder variants and started talking about transformers, autoencoders, and attention-based models, which is BERT, ELMo, GPT-1, GPT-2, and so on. Basically, this is the base,
the starting point of NLP I'm talking about. But before that, you are supposed to know RNN and LSTM; Krish has already covered these things on his channel, so if you don't understand RNN, LSTM, or GRU, this will be difficult for you. One question: is it for all NLP-based tasks? Yes, but not with the same network; it's not like you build just one network and it solves all the problems. Based on the problem statement we keep changing the network architecture. I'm just talking about the generic approach, what happens and how the network understands, because everything is about understanding your sentence, understanding the context, in both directions or in one direction. Think about it: right now you were able to ask a question because, somewhere in my explanation, you got confused and raised a concern based on it. It means you understood the context: you asked a question about NLP itself, not about finance, physics, mathematics, or biology. You understood the context and, based on the context and the paper I'm talking about, raised your concern. That is exactly the same thing my network is supposed to understand. Okay: where are these models mostly applied? Take your WhatsApp system; let me give you a very simple example. Suppose you type my name. The very first time, my name is not part of the dictionary, so it is not going to give you any
hint; it will give you some random hits. But if you type my name twice or thrice, after some time it will start giving you a suggestion: auto-completion. So your WhatsApp system will start proposing it, and that is one place you can apply this. Another scenario: suppose I am trying to build a chatbot from scratch. Again, you need this kind of model. Maybe in the next session I'll show you one chatbot that we built for a client, a custom chatbot; we have not used any predefined framework, just a basic network, and you will see how good it is at communicating with you and understanding context, in Arabic as well as in English. It understands the context and gives you the answers. In that kind of system, yes, you can use it. Suppose you are trying to build a self-responsive system that will auto-reply to some sentences in a chat; again, you can use it. Suppose you use the app called Inshorts (I think most of you must be using it). You must have seen that they do not show you the entire news story; they just show you a short summary, a chunk of the news. Again, for that case, summarization, you can use it. Suppose I am writing some sentences and I have missed some of the words, so I have to do fill-in-the-blanks; I can use it there as well. Or suppose I'm talking about Grammarly: it is supposed
to understand the punctuation, to understand the parts of speech, and based on that to correct the text; I can use this particular model for that too. So these are real-world scenarios in which we are not just able to use it, we are actually using it. Grammarly, yes. Gmail, yes. WhatsApp, yes. Chatbots, yes. In all of these systems you can use this sequence-to-sequence, encoder-decoder kind of model, which can solve many of your real-world problems. Now, next question: what type of architecture does Google Translate use? As of now it is using a transfer-learning-based architecture. Initially, and this is the paper you can see over here, released by people from Google itself, they were facing an issue and they used this sequence-to-sequence model. But in the current situation they are not using the basic seq2seq; they are using transfer-learning-based models. It can be BERT, ELMo, GPT-1, GPT-2, ALBERT, ELECTRA, any of these. Can we use it for sentiment analysis also? Yes, you can. That situation is called seq-to-vec: you give multiple inputs but expect just one output. You feed the entire sentence or paragraph and expect a single output, so yes, sentiment analysis fits. People generally try to solve sentiment analysis with a plain machine learning approach; that is not going to give you the best accuracy. Yes, you will be able to train a model, there is no doubt, and many people have done that.
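The seq-to-vec (many inputs, one output) setup just described can be sketched as a toy numpy RNN. The sizes and random weights here are made-up placeholders, not a trained sentiment model; the point is only the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(42)
EMB, HID = 4, 6                       # toy embedding and hidden sizes (assumed)
Wx = rng.normal(size=(HID, EMB))      # input-to-hidden weights
Wh = rng.normal(size=(HID, HID))      # hidden-to-hidden weights
Wo = rng.normal(size=HID)             # hidden-to-output weights

def seq_to_vec(tokens):
    """Consume the whole sequence, emit one sentiment score (many-to-one)."""
    h = np.zeros(HID)
    for x in tokens:                  # one recurrent step per token
        h = np.tanh(Wx @ x + Wh @ h)
    return 1.0 / (1.0 + np.exp(-(Wo @ h)))  # sigmoid on the final state only

sentence = [rng.normal(size=EMB) for _ in range(5)]  # 5 fake word vectors
score = seq_to_vec(sentence)          # one scalar for the whole sentence
```

However long the sentence is, only the final hidden state reaches the output; that single score is the "multiple input, one output" shape used for sentiment analysis.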
But if you use this sequence-to-sequence approach, with a base model of maybe LSTM or GRU, just try it out and you will find that it performs much better compared to your previous model. So yes, for sentiment analysis you can absolutely use it. I'll take a couple more questions, and after that I'll give you a walkthrough of this entire code; again, the code is not tough at all, it's very easy. Next question: what is the benefit of dividing the sequence-to-sequence model into an encoder and a decoder if the functionality is the same? That is actually a very good question; you are the first one to ask it in this session, though in my classes many people ask me. Let me talk through one situation: why have people divided input and output? So what is an encoder-decoder? In an encoder-decoder we have one network here and another network here. Each network is made of recurrent cells; the first network produces a context vector, and the second one takes inputs and produces outputs. On the encoder side I have the inputs, say x1 and x2, ending with an end-of-string token; on the decoder side I expect y1 and y2, which are all predicted values, and then EOS, end of string, again. These two networks are connected: the context vector from the encoder goes in as the input to the decoder, and that is how the whole thing is wired together. Now, the question stands: it seems fine that we are doing the same kind of thing, so when we
have a similar kind of base model on both the input and the output side, in the encoder as well as in the decoder, why do we need the split at all? Let's try to understand. Suppose I have another network where the input and output are not separated: one basic network where each of these round figures (cells) is, let's say, an RNN or LSTM cell. At each step it takes one input and gives one output: take an input, emit an output, take the next input, emit the next output. Now suppose I give the input "I am fine" and expect the Hindi translation out, word by word, one output word per input word. But the order of the output words is not the same as the order of the input words; the Hindi word order does not line up one-to-one with the English word order. So if I try to translate with this network, then while emitting the first output I do not even have an idea of what to expect in the second position. That was the issue with this kind of basic network. Yes, people have used this kind of network, and you can too, to solve time-series problems: if you have time-series numerical data, multivariate or univariate, you can solve it this way without any issue, because there the dependency between the previous context and the next output is not that strong. But when I am talking about a sentence, whatever I give as the next output can depend on the previous
output, or possibly on the whole sentence itself, the whole input. For example, suppose I give you the sentence "bull is going". What will you understand? That I'm talking about some animal which is going. But if I say "bull is going high", now you change the context; you understand the actual context. Only when I get the last word, "high", am I able to form the meaning of the entire sentence. So unless and until I am able to understand the relations across the whole sentence, I am not in a position to give you an output, not in a position to give you the right context. And this is why people separated the input as an encoder and the output as a decoder: to resolve this complexity of sentences. But yes, you can still use the basic kind of network to solve other kinds of problems, time series for example, and it will work fine. I think that makes sense to the people who asked this question. Next: who decides when we have to put the stop token, the end-of-string, in the output? Whenever we design the network architecture, we decide. We define that the input will be, say, a hundred cells at a time, or maybe 200, or maybe a thousand. It is us who define this when we create the network architecture for the very first time. Next question: if we can use the latest and best architecture available, should we still invest time in the older architectures?
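To make the encoder/decoder split concrete, here is a minimal numpy sketch (random weights, purely illustrative, with assumed toy sizes) of the key property: the encoder folds an input sentence of any length into one fixed-size context vector, which is all the decoder starts from:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HID = 5, 8                       # toy embedding and hidden sizes (assumed)
Wx = rng.normal(size=(HID, EMB))      # input-to-hidden weights
Wh = rng.normal(size=(HID, HID))      # hidden-to-hidden weights

def encode(tokens):
    """Encoder: fold the whole input into one fixed-size context vector."""
    h = np.zeros(HID)
    for x in tokens:                  # one recurrent step per input word
        h = np.tanh(Wx @ x + Wh @ h)
    return h                          # same shape no matter the input length

short = [rng.normal(size=EMB) for _ in range(3)]    # 3-word sentence
longer = [rng.normal(size=EMB) for _ in range(11)]  # 11-word sentence
ctx_a, ctx_b = encode(short), encode(longer)
```

Because the decoder sees only this fixed vector plus its own previous outputs, the output word order and length are free to differ from the input, which is exactly what the step-for-step network above could not handle.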
You should not invest all your time in the older architectures, but the thing is, everything is connected. That's the reason: Krish asked me to talk about BERT directly, but I said no, instead of getting into BERT directly we should talk about attention first. In the attention paper there is heavy mathematics that I will discuss; here, in the encoder-decoder, we don't have any heavy mathematics at all, it's more theoretical. And there is a chain: to understand BERT you should understand transformers, to understand transformers you have to understand attention, to understand attention you have to understand the encoder-decoder, to understand the encoder-decoder you should know sequence-to-sequence, and to know sequence-to-sequence you should know basic RNN, LSTM, or GRU. That's the reason you should know the previous architectures, so that you will be able to learn the new things in the best possible way. Okay, now let's do a code walkthrough. Maybe at the end of this session I will give you some examples, and in the next class I'll show you a named entity recognition model as well, once I finish talking about the transformer model, because after this I will be talking about the transformer model itself, which is based on the encoder-decoder architecture. Code-wise, you will find that the coding is very easy; it's not tough at all. Now let's talk about the encoder and decoder cells in the first place. This is the official Keras implementation, from the official repository itself. If you just look into the examples, you are not supposed to go anywhere else:
every kind of code is already available in Keras's own repository. Just search for the Keras GitHub repository; from the official documentation itself you will be able to get it and build the model. If you look at sequence-to-sequence there, people have given an LSTM seq2seq example and an LSTM seq2seq restore example, so there are different models available to you. If you are trying to understand sequence-to-sequence, I think this is the best core implementation of the code that you will find. In general what happens is we call the library, call the function, and just pass the data; but this is the core implementation, written from scratch on top of LSTM. In this first section we are not doing anything special, we are just providing part of the data: input text, English and Hindi, let's suppose, if I am trying to build a translator. On top of that we read the data, which comes in lines, and iterate over the lines. I am just giving you a code walkthrough here, so you can take your time later and work through it. Once it has iterated over the lines, we extract all the characters, one by one, out of the entire dataset. Then we create the input characters and target characters, meaning input and output, x and y. Then we create our tokens, and then we define the length of the sequence. Someone asked how we decide the number of
inputs that we are going to give: you can control it right here. You define the length, the sequence of the data, that you are going to give as input. Next, what we are doing is enumeration. Enumeration simply means that for each particular character it will generate a corresponding numeric value; it basically builds a dictionary for you, key and value pairs. That is all we are doing in this particular place, nothing else. Then we create the encoder input data, the decoder input data, and the decoder target data, because the encoder will take one input and the decoder will give you another output, so we have to prepare all of these datasets. Once this data preparation is done, we create the model; the architecture starts from this particular place, and before that it is all just data preparation. We create a basic LSTM here, and then you can see that we call model.compile: optimizer-wise we use RMSprop (root mean square propagation), loss-function-wise we use categorical cross-entropy, and we define whether we want the accuracy metric or not. Then we call model.fit, and inside model.fit we pass the encoder input data and decoder input data, as simple as that. So this is a single model that we create here, and finally we save it. Then, for inference, first of all you will have to create the entire encoder model and then the entire decoder model.
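The data-preparation steps just described (character dictionaries, then one-hot encoder/decoder arrays, with the decoder target shifted one timestep ahead) can be sketched in plain numpy. The two tiny translation pairs below are placeholder data, not from the actual example:

```python
import numpy as np

# toy parallel corpus; '\t' marks start-of-sequence, '\n' end-of-sequence
pairs = [("hi", "\tnamaste\n"), ("go", "\tjao\n")]

input_chars = sorted({c for src, _ in pairs for c in src})
target_chars = sorted({c for _, tgt in pairs for c in tgt})
input_index = {c: i for i, c in enumerate(input_chars)}    # char -> id dicts
target_index = {c: i for i, c in enumerate(target_chars)}

max_enc = max(len(src) for src, _ in pairs)   # longest input sequence
max_dec = max(len(tgt) for _, tgt in pairs)   # longest target sequence

encoder_input_data = np.zeros((len(pairs), max_enc, len(input_chars)))
decoder_input_data = np.zeros((len(pairs), max_dec, len(target_chars)))
decoder_target_data = np.zeros((len(pairs), max_dec, len(target_chars)))

for i, (src, tgt) in enumerate(pairs):
    for t, c in enumerate(src):
        encoder_input_data[i, t, input_index[c]] = 1.0     # one-hot input
    for t, c in enumerate(tgt):
        decoder_input_data[i, t, target_index[c]] = 1.0
        if t > 0:  # target = decoder input shifted one step ahead
            decoder_target_data[i, t - 1, target_index[c]] = 1.0
```

The shift in the last loop is the teacher-forcing trick: at each decoder timestep the model sees the previous true character as input and is asked to predict the next one.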
This sampling model you have to define yourself. Here you will find there is a function, decode_sequence, and it is going to return you the final decoded output; it is just the cell logic we define, specifying what that cell is supposed to give you as output. So basically this is the core implementation. Whenever we implement these things in our use cases, it's not like you are supposed to write this entire code before you can use it. No, it's very simple: you can just call an LSTM seq2seq model, pass your data, and your work is done. That is how easy it is. This is the core Keras implementation, code that people have written in Keras itself and released as open source so that you can modify it and, on top of that, perform any kind of operation, if you are looking for that kind of control. If you want to modify the layers inside it: as you can see, people are using LSTM as the base model here, on the decoder side as well as on the encoder side. Suppose I don't want to use LSTM; suppose I want GRU. I can copy this code and replace the LSTM with a GRU, give my input accordingly, which will be almost the same, and build my own architecture. That is only if you are planning to build your own architecture. If you are just trying to train your dataset and are only looking for the final model, then you are not supposed to worry about this entire set of code; you can just pick and choose a problem statement.
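Putting the walkthrough together, the training model looks roughly like the sketch below (assuming TensorFlow/Keras is installed; num_encoder_tokens, num_decoder_tokens, and latent_dim are placeholder values you would set from your own data, in the spirit of the official Keras lstm_seq2seq example):

```python
from tensorflow.keras.layers import LSTM, Dense, Input
from tensorflow.keras.models import Model

num_encoder_tokens, num_decoder_tokens, latent_dim = 4, 10, 32  # placeholders

# Encoder: we discard its outputs and keep only the final states as context
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: starts from the encoder states, predicts one token per step
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens,
                        activation="softmax")(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit([encoder_input_data, decoder_input_data], decoder_target_data, ...)
```

If you swap the base cell to GRU as discussed, note that a GRU carries a single state, so initial_state becomes one tensor rather than the [state_h, state_c] pair an LSTM needs.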
Then simply call the Keras API, pass the data, train your model, and your work will be done. That is how easy it is, or how easy it will be, in terms of implementation, which I'll show you. This is just the first class, so I know many people are confused or will not be able to follow how to apply all of this code, but implementation-wise it is not tough at all. When I show you the named entity recognition example in my class, you will see that it is very easy; it is not as complex as this code, because this is the code released by Keras itself so that you can modify it for your own purpose. Can you please send the link to the research paper? Not an issue; Krish will put it with this YouTube video itself. How is it better than a bidirectional LSTM? It is not better than a bidirectional LSTM; a bidirectional LSTM is much better than this one. Can an encoder-decoder be used to find the similarity between two things, let's say symptoms such as nausea and fever, based on the disease they might lead to? It depends on the dataset, what kind of data you have; with the right data, yes, it will be able to find that out. Dilip has asked many times how to detect a name in text; that's the reason, don't worry, I will cover it specifically. I'll show you the example itself, an end-to-end deployed example, called named entity recognition, and I'll give you a line-by-line walkthrough. This here is basically the Keras repository code; I will show you our own code, how we actually do our implementation, and you will find that it's not tough, it's very easy actually, and we can build this kind of system and train our own model this way. Krish, any other questions that we have to take? No, I think that is
sufficient for today's session. Anyhow, just forward me the links and I will put them in the description. Yeah, sure, I'll give you the link for everything that I have discussed. And probably in the next session I will be talking about transformer-based models; I can jump into transformer-based models directly, where we can show some examples plus talk about the architecture and the mathematics behind them in detail. On Wednesday, right? Yeah, Wednesday, same time, eight o'clock, guys. Okay guys, if you liked this video, please do hit like for Sudhanshu; he has taken out his time and come over here to explain these things. Thank you again, Sudhanshu, for this amazing session. Guys, this whole video will be available in the deep learning playlist itself, so you can refer to it any time, whenever you need anything. Anything from your end, Sudhanshu? No, I think it's fine; just try to go through it, and people can revise the basic fundamentals of RNN, LSTM, GRU, and the encoder-decoder, so that it helps them understand transformers; otherwise it will be difficult for sure. Okay guys, before coming to the next session, which is on Wednesday, please make sure that you complete the whole deep learning playlist on my YouTube channel; that will be the main prerequisite for transformers and BERT. Many people are asking about our community classes; tomorrow is a big day, and we are going to make the announcement tomorrow. Yes, tomorrow we will announce the community classes, and all the details will be explained in a live session. Okay, so thank you for this amazing session,
and thank you guys for watching. I will meet you in tomorrow's live session about the community classes. Okay, thank you. People are asking: Krish, have you made a video on GRU? LSTM is there, I know that; no problem, I'll make a video on GRU tomorrow morning itself. Okay, cool then. Okay guys, thank you all for this amazing session, and thank you Sudhanshu for this amazing session; we'll see you all on Wednesday. Okay, yeah, thank you so much, thanks Krish, and see you again. Yeah, thank you.
Info
Channel: Krish Naik
Views: 32,490
Rating: 4.8927913 out of 5
Keywords: data science, machine learning, deep learning
Id: bHfXYQgn0Cc
Channel Id: undefined
Length: 97min 30sec (5850 seconds)
Published: Mon Aug 24 2020