LIVE: Design and Build a Chatbot from Scratch

Captions
Hi friends, good evening, and thank you for joining in today. Could you please confirm in the chat window whether you can see me and hear my voice clearly, or if there is any technical glitch? I have the chat window open on my phone, so I can see your messages. Okay, I'm visible and audible, good. We'll get started in another three or four minutes; it's 6:57 Indian Standard Time.

One note before we dive in: we probably won't finish the whole "design and build a chatbot" topic in one session, so we'll do it in two. Today we'll focus on the design aspects: how to do the high-level design, what the modules are, how to break the problem into sub-problems, and what types of models you can use for each module.

Some people have asked about prerequisites. There are two parts, design and building. In the design part I'll try to explain by giving you the intuition for most things, so even if you only know the basics of what a chatbot is and some basic machine learning, you'll be able to follow. There will be parts you may not understand, like what BERT is or what an LSTM is, but you can treat those as black boxes and still follow how the design of chatbots actually happens. For the next session, where we actually build a chatbot and go through code walkthroughs, you will have to know TensorFlow and libraries like Hugging Face, which is used extensively for transformer-based models.

A few questions from the chat. Is there a course on cybersecurity? No, we don't have one, and it's not on our near-term roadmap. Do we need to know all the concepts of ML? Today's state-of-the-art chatbots use state-of-the-art techniques (transformers, BERT- and GPT-based models), so to build something from scratch on your own you will have to know them, no doubt about it. For this session, though, I'll give you enough intuition about what a model like BERT does that you can still get a design overview even without that background. The second part we'll most likely do the coming Sunday, or the Sunday after; we try to run these two-hour sessions on Sundays.

It's already 7 p.m., so let's get started. Can everybody see my screen? Please confirm. Okay, everybody can see it.

The plan of attack today is as follows. For about 90 minutes we'll go through some notes I've pre-written, and I'll try to explain each concept; please jot down your questions as we go. Those 90 minutes cover the design aspects; the building part will probably need one more session. If you understand the design and you know libraries like TensorFlow and Hugging Face, building is actually fairly straightforward once you have the basic datasets; it is the design that is harder, very often. So today we will design the chatbot: break the problem into parts, look at each module we need to build, and discuss which techniques we can use for each. For the last 30 minutes or so I'll do a Q&A and answer as many questions as possible from the YouTube chat; please understand that with a public chat it's almost impossible to answer everything. We hope to finish in two sessions; if it requires three, we'll do three. We want to do it thoroughly.

As I was saying, there are two aspects: the first is designing, the second is actually building the system itself. To design a system there are multiple choices; there is no one right answer. Think of it as a system-design problem, not a software-design challenge but a machine-learning system-design challenge. We'll discuss several modules we can build and take one running example so I can show you the whole pipeline. There are many design choices depending on the actual real-world problem you're solving; we'll cover as many as we reasonably can.

The second aspect is building. For that we'll use Python, and TensorFlow 2.0 with Keras, because these are mostly deep learning models; we'll also use NLP libraries like spaCy and NLTK wherever needed. Please treat this as an introductory session: you could write full-fledged books on designing chatbots from scratch, and there is huge complexity and variation in real-world chatbots, so we can't cover every case. But over two or maybe three sessions you'll get a good flavor of how to build a real-world chatbot from scratch. We could spend 20 or 30 hours designing multiple chatbots with multiple design choices, but that would teach you only one aspect; breadth of knowledge also matters in machine learning.

So we'll focus mostly on design. I'm hopeful we can finish it in about 90 to 100 minutes, and then dedicate the whole next session to building it with code walkthroughs. Design is the more important challenge, because it gives you the pieces of the puzzle: once you know that here is a piece for which you use this model, implementing it with TensorFlow and Keras is straightforward, provided, of course, you also have the data for it.

Next, it's important to understand the landscape of existing options. First, the cloud-based systems: Dialogflow from Google, which is one of the best I've seen; Alexa from Amazon and AWS; Azure Bot Service from Microsoft; IBM Watson-based bots from IBM; and I can think of at least 20 or 30 more such bot services. Most of them are designed for people who may not have deep learning expertise: if you want a reasonably simple and straightforward chatbot, you can build it with any of these. We did an earlier live session on using Dialogflow to build a chatbot of this type. That's one approach, and certainly a valid one if you're not an expert in deep learning.

The second approach is a library called Rasa, one of the most popular libraries for building chatbots. It's open source, so you can go and read the source code; I've read parts of Rasa's source myself, and it's a beautifully written library, and if
you're a Python programmer, Rasa is worth a close look. Remember, all of the cloud services above run on the cloud. If you want to build your own chatbot on your own compute hardware, say you're a bank and you want to deploy a chatbot on your own servers without any cloud-based system, for privacy or security reasons, then Rasa is a very popular choice that many companies, banks, and financial institutions use. With Rasa you write everything in Python, and as far as the logical structure is concerned, Rasa does programmatically what Dialogflow does through a simple UI. And with Rasa, too, you don't have to be an expert in machine learning.

The third approach is to build your own system, and that is our focus for this session and mostly the next one. You might ask: why build our own when these great tools exist? Valid point. If you want a quick solution, it makes complete sense to use Dialogflow or Rasa. But if you want to be a good data scientist or machine learning engineer, you have to understand how these systems work internally; if you treat everything as a black box, you never develop depth of knowledge. The purpose of these live sessions is for you to gain insight into how Rasa and Dialogflow work, so that you see how the machine learning and deep learning techniques you've learned connect to building such systems. Imagine you had to design a Dialogflow-like system from scratch: how would you do it? Hopefully sessions like this help you apply the concepts you learn to real-world problem solving. That's reason one. Reason two: some companies want complete flexibility. Dialogflow, Alexa, and Azure all come in packaged form, so you don't have full flexibility in how you use them (they're fairly reasonable, by the way); Rasa is open source, so you can change the code. But if a company wants full flexibility to design its own chatbot from scratch with its own models, building your own is surely the best approach. And if you ever join a company that wants exactly that, or a company building a system like Dialogflow, Alexa, or Azure Bot Service itself, that's phenomenal. We have discussed both Dialogflow and Rasa in previous live sessions on designing chatbots.

Now, a very fundamental confusion many people have: all the chatbots that actually work in the real world today have limited domain and scope. Take a bank chatbot; many major banks in India and the US have one. It helps customers with some subset of transactions or answers some subset of queries, so its domain is limited to banking services, and its scope is limited too: for complex banking transactions you still need a human in the loop, because a chatbot can mess things up, and with money you don't want things messed up. It's not that people haven't tried general-purpose chatbots. Microsoft deployed one on Twitter a few years ago and it went crazy: people played around with it and effectively turned it into a racist bot. Google, Facebook, Amazon, all of these companies have tried general-purpose chatbots, but they are still research projects, not ready for real-world productionization. We are not there yet. So if you are serious about building working chatbots, please work in a limited domain with a limited but clear scope; not everything can be automated today.

What concepts do we need? To implement chatbots you'll mostly use Python, with NLP libraries like spaCy and NLTK, and TensorFlow 2.0 with Keras for deep learning, plus libraries that work well with it, for example Hugging Face for transformer-based models, one of the best libraries for that. As far as deep learning itself goes, knowing BERT, transformers, and LSTMs is very helpful; even if you don't, I'll provide enough intuition to treat them as black boxes, but to do this thoroughly you should ideally understand how BERT works internally, what it can and can't do. On the NLP side there are a couple of key concepts, named entity recognition and coreference resolution (plus a few more); I'll explain these intuitively wherever we encounter them, including how an NER or coreference system can be built.

As a running example, because it's better to work with a real-world case, we'll use a bank chatbot. The chatbots I personally use the most are from financial institutions, whether stock-trading bots or savings-account and fixed-deposit bots. The design is not limited to a bank chatbot; this is just a running example so I can show you how the pieces fit into the bigger picture.

So first: the domain is fixed. We are not building a general-purpose chatbot; we are building one for a financial institution, a standard bank like HDFC, SBI, ICICI, or Citibank, the kind we all deal with. Fixing the domain is very important, because general-purpose bots have failed even with terrific teams behind them. Second: what is the scope, meaning what tasks can it do? You can't just say "the chatbot should help me apply for a home loan"; that won't work, because a home loan involves a lot of process and manual verification (a personal loan is probably easier, a home loan much less so). The scope must be well defined: what do you want to automate, and what should your chatbot be able to tackle?

I've written a few examples. First, you might want the chatbot to do Q&A, answering a question like "what is the savings account interest rate?", a very common one. Second, information-retrieval questions, for example "when is my credit card payment due?" I call this information retrieval because when a user asks this (and I'm assuming the user has logged in, so the bot knows the user ID), the bot has to go to the credit card data. A person could have multiple credit cards; for simplicity, assume one. To find the due date, the bot most likely performs an SQL query on the credit card table, filtered by this user's ID, because most banks store this information in a typical relational or transactional database; it retrieves the answer and shows it. So here, information is being retrieved from a database or data store. Third, transactional queries, for example "move two thousand dollars from my savings account to my credit card" (a credit card is also an account in most banks). This is very transactional, and money is being moved, so you have to be extremely careful; you can't mess up. Suppose that instead of moving from savings to credit, you
mistakenly move the money to the current account: the customer will think he has paid his credit card bill when he has not. So you must be very careful and accurate in understanding these statements and performing these transactions. This is just a small set of examples; there could be many others. "I want to change my password" is also transactional, because you have to make some edits, some updates; it's not just retrieving information or answering a standard question.

So before you start any project, define very clearly the domain you're working in (a bank chatbot, here) and the scope: list down all the scenarios. If you don't, your chatbot can go crazy, and it becomes impossible to handle it or to build a sensible, scalable system that works in most cases. Most chatbots today don't work all the time, but if one works in 98-99% of cases we are happy; that's all we care about, that 98 or 99 percent of users get their questions answered. Listing all the major scenarios feels like drudge work, but it's important to write them down; otherwise you'll end up with bots that fail often, and customers will get annoyed.

Very importantly, along with scope and domain, you should recognize what datasets you have. Imagine you're building this for a major bank. What data do you have?

First, FAQs. Most banks have FAQs on their websites, so you have question-answer pairs, but maybe 200 of them, not thousands. These are frequently asked questions with standard answers: what is the current interest rate, what is the home loan rate, and so on.

Second, and this is actually a major piece of information in most banks, data stored in a typical tabular, relational database engine, which you query using something like SQL. This is where a huge chunk of your data is.

Third, you might have text documents or web pages containing pure free-form text: for example, an internal document, or a public web page, listing the documents required to apply for a home loan, a personal loan, or a credit card. Such information could be in FAQs or could sit in paragraphs and paragraphs of text.

The fourth, very interesting piece of data, which many companies don't have, is historical chats. Companies launching chatbots fall into two categories. One: they have no historical data but still want a chatbot; they can only build using the three data sources above. Two: they have had humans chatting with potential or existing customers and hold huge amounts of historical chat data. If you have this data, it is a gold mine, extremely useful, but only very few companies have it. We'll see in the design how it can be used. So whenever you're designing a system, always ask yourself: what data do I have at my disposal? This is just a small subset of data for this running example, but it's very important to define the scope, understand which datasets you have access to, and which you don't; if you don't have historical chats, you can't use them to build your chatbot.

Next, let's break the problem into sub-modules. I'm not claiming this is the only way to break it down; I'm trying to build a simple, useful system, and I'll explain what each piece does. A user asks the chatbot a question, and that question is fed into multiple systems. First, a module called intent classification, which tries to understand the intent behind the question: what does the customer want to do? For example, is the customer trying to find the FD interest rate? That could be one intent. The word "intent" comes from "intention": you want to classify whatever question a user asks, and there could be hundreds of intents, hundreds of things customers want to do, based on the scope and scenarios you listed. Then there is entity extraction, which I'll explain with examples shortly. Then, dialogue context: your chat with the user could be lengthy, user asks, bot answers, user asks, bot answers, back and forth. In such a conversational system, how do you maintain and track the context of the whole dialogue? We'll see some simple mechanisms (this is a big topic in itself), and I'll point you to more complex systems too. Finally, given the user's intention, the entities, and the historical dialogue context, we take actions. For example, if I'm trying to move two thousand dollars from my savings account to my credit card, that transaction has to be performed. For each intent, you could have one or more actions. Sometimes you also have to generate answers: after taking an action you might generate a confirmation, or if there's no transaction at all, as with "what is the FD interest rate?", there's no real action to take, you just generate the answer and return it to the user. Generating the answer can be done in multiple ways: sometimes simple rules work, sometimes you run SQL queries, sometimes you do extractive natural language generation, and sometimes you have to generate natural language outright. There is huge complexity in answer generation, with multiple approaches, and we'll discuss them. That's the big picture; not the only way to solve the problem, but one of the simpler decompositions (each part can itself be broken down further).

Now let's go through each module, starting with intent classification. The user's intention could be anything, but when you designed your chatbot you said it would only work in a certain set of scenarios; that's what you defined in the scope. If a query doesn't fall into one of those scenarios, the bot can't handle it, and we have to route it manually to one of our customer service executives.
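The module breakdown just described (intent classification, entity extraction, dialogue context, then actions and answer generation) can be sketched as a plain-Python skeleton. Everything below is a hypothetical stand-in: the function names are invented, and trivial keyword rules replace the real ML models, purely to make the data flow concrete.

```python
import re

def detect_intent(query: str) -> str:
    # Stand-in for a trained intent classifier: crude keyword rules.
    q = query.lower()
    if "transfer" in q or "move" in q:
        return "transfer_money"
    if "balance" in q:
        return "check_savings_balance"
    if "due" in q or "credit card" in q:
        return "check_credit_card_due"
    return "fallback_to_human"  # out of scope: route to a human agent

def extract_entities(query: str) -> dict:
    # Stand-in for an NER model: pull out numeric amounts only.
    return {"amounts": re.findall(r"\d[\d,]*", query)}

def respond(query: str, context: list) -> str:
    intent = detect_intent(query)
    entities = extract_entities(query)
    # Dialogue context: one entry per user turn, so later turns can refer back.
    context.append({"query": query, "intent": intent, "entities": entities})
    if intent == "fallback_to_human":
        return "Let me connect you to a customer service executive."
    # In a real bot, this is where an SQL lookup or a transaction would run.
    return f"Handling intent '{intent}' with entities {entities}."

context = []
print(respond("move 2000 dollars from savings to my credit card", context))
print(respond("what's the weather like?", context))
```

In the real system each stub would be backed by a trained model (a BERT-style intent classifier, an NER model for entities), and the action step would execute actual SQL queries or transactions; the skeleton only shows how the pieces hand data to one another.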
So first and foremost, it's important to write down all the intents you will serve. For example: check savings balance; check credit card due (how much is due, or when it's due); transfer money; enquire about interest rates. You have to list all the intents based on your scenarios. A lot of chatbot design is common-sensical like this; there are only a few pieces where you need cutting-edge deep learning models, but first you have to get the problem into a form where machine learning can be applied.

What we want in intent classification is multi-class classification: given a user's question, classify it as one of these intents. You could have many intents; it's not uncommon to see real-world systems with tens of them, even close to 100. (I'll get to intent granularity in a few minutes.) To classify, you need training data; this is not magic, nothing works without data. So at the very outset you write down example queries: Q_1^1, Q_2^1, ..., Q_10^1, say ten queries that users typically ask for class 1, where class 1 is "check savings balance". One user might say "can I know my savings account balance?", another "what's my account balance?"; there are many ways people can frame the same query, so you should give at least a few examples of how customers are likely to ask. Just apply common sense and write down those ten; it's not a big task. Any banker can write ten ways of asking this question, because they are domain experts. Similarly, for class 2, "check credit card due", there might be six ways of asking; you have to write these manually or have a domain expert write them for you; you can't escape this. Let's call them example queries or questions. For each class you collect a few samples: for class 1, say 10 ways of asking the same thing; for class 2, 6 ways; for class 3, transfer money, 10 ways (Q_1^3 through Q_10^3); for class 4, 8 ways. For each intent you have to diligently sit and create this training data. Now, if you do have historical chats, you can have a small team of manual annotators take actual questions from those chats and place them under the right intent; then your dataset can be much larger than 10 samples per class, say 100 samples of how a question can be framed. So your Q_i^j are your training data, and the task is multi-class classification; you could have some 50 classes here, which is not uncommon in a real-world banking system.

One more aspect before moving on: what granularity do you break each intent into? That's an important design choice. Would you make "transfer money" a single intent? That's a high-level granularity. Or you could create three intents instead: transfer money to pay a credit card bill; transfer money through IMPS (in India, IMPS is a very popular instant money transfer service); or transfer money through UPI, the Unified Payments Interface. So within "transfer money" you can either create one intent and figure out later whether it's a payment to a credit card, or via IMPS, or via UPI, or you can classify any query at the lower level of granularity right away. It's a design choice. One advantage of the higher-level granularity, just "transfer money", is that you can write many more example questions for it.
many more ways in which people will ask this right but suppose you recognize that what the user wants is to transfer money then you have to be very careful to understand from where to where is the user trying to transfer money i'll show you some examples on how to do that using entity extraction in just a few minutes but or you can say hey i want to be very very double careful i want to classify transfer money and pay credit card is as being different from transfer money via imps and things like that so this is a design choice that you have but by doing lower granularity of intents your number of intents can explode so a design choice that many people do is called hierarchical intents first given any query so the way hierarchical intents work is as follows okay so you have a question the user gives a question right now first it will go into intent classification okay so let's assume there is an intent there is an intent classification model which is mostly a deep learning model i just come to that model in a couple of minutes so there is an intent classification model that that easily understand that the intent is transfer money okay so we know that the intent we know that the intent is transfer money transfer money now within transfer money i can have five more intents then within transfer money i need to have one more multi-class classification system which will break it up into five classes okay i hire a hierarchical intents based system is also very very popular in the real world but you just have to build more models then that's all okay now one simple question i have for all of you especially for those of you who know some machine learning one of the questions that i have for you is this okay so this is this is an interesting question right so it's it's a good question actually so if you ever say i built a chat bot or i've designed a chat bot or i work on chat bots the first interview question that you typically encounter here is how do you build the intent 
classification model because look at the properties of the model it's a multi-class problem you can typically have about 50 classes fairly common but you only have few data points or few few training samples per class per class you can assume to have about 10 to 20 samples no more than that you can't say hey for every intent i want 200 ways of writing that sentence that's very hard to get so given these constraints that you want to build an intent classification it's an nlp classification task obviously because your text your input is text it's a 50 class classification and you have few samples per class can some of you suggest again i'm looking at the chat window here can some of you suggest how do you build this system okay i would like to hear your thoughts how can i use okay some of you are asking questions i will come to answering the general questions at the end of the session itself for nerve how do you build this system how do you classify again remember it's not a straightforward simple multi-class classification using softmax classifier it's not that because you have very few samples per class okay you only have 10 to 20 samples per class how do you tackle that okay somebody says we can create data with higher similarity what do you mean by that i mean look at imagine a case you have class 1 class 2 so on so forth class 15. 
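As a sketch of what this training data looks like in code — the intent names and example phrasings below are made up for illustration, not taken from any real system — it is just a small mapping from each intent to its example queries, flattened into (query, label) pairs:

```python
# Hypothetical example-query data for a few banking intents.
# A real system would have ~50 intents with 10-20 examples each.
INTENT_EXAMPLES = {
    "check_savings_balance": [
        "can i know my savings account balance",
        "what's my account balance",
        "how much money do i have in savings",
    ],
    "check_credit_card_due": [
        "how much is due on my credit card",
        "when is my credit card bill due",
    ],
    "transfer_money": [
        "transfer 500 to my credit card",
        "send money to another account",
    ],
}

def flatten(intent_examples):
    """Turn {intent: [queries]} into (query, label) training pairs."""
    return [(q, intent)
            for intent, queries in intent_examples.items()
            for q in queries]

pairs = flatten(INTENT_EXAMPLES)
```

The point of writing it out is to see how little data this really is: three intents here give only seven labeled queries, and even 50 intents at 10 examples each give only 500.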
Let's assume for simplicity that each class has 10 ways of framing the question. Then your total dataset is only 500 queries — and remember, this is all text. 500 samples is very little given that you have a 50-class classifier. Some people say data augmentation — how do you do data augmentation here? You have 50 classes and only 500 text samples. Some say "I'll use word2vec" — it will not work. word2vec featurization works if you have a few hundred points and maybe two, three, four classes; with 50 classes and only 10 points per class it won't. word2vec, bag-of-words, naive Bayes — they won't work. Look at the data: you have tiny data, and data is often the killer in the real world. Whatever text featurization technique you want to use — word2vec, TF-IDF — how do you train on this little? Somebody says sequence-to-sequence models. A sequence-to-sequence model needs a few thousand data points; typically most machine learning models require at least a few hundred data points per class, and deep learning requires much more, because the number of parameters in a deep learning model is much larger. People are saying k-NN, upsampling — how do you upsample this data? You have 10 strings, 10 queries that people are using; please tell me how you upsample them. Think about it. I've only seen one reasonable answer so far. Extrapolation? Extrapolating text is non-trivial; it's fairly complex. I think most of you are just throwing out whatever technique is at the top of your head. The problem is a multi-class classification that has to work with few samples per class. Those of you who have gone through our course videos — we have discussed this specific case in a very interesting example; we've done a live session on it.

One of the students — let me see his name — Deepak Singh says: why don't we try one-shot or few-shot learning? That's the approach. There is a whole area of machine learning and deep learning techniques called few-shot learning, and that's the approach to use. Very often you have to figure out the right technique for the right context. Few-shot learning asks: if you have only a few samples per class, how do you train models? We solved a very similar problem in our course — it's there in the live sessions and case studies — where we built a speaker recognition system using something called Siamese networks, together with convolutional neural networks. It is exactly this setting: a user gives, say, only 10 utterances of speech, and from those you have to classify who the user is. It's a very interesting concept, and it's in the course videos. So — hey Deepak, I'm trying to read all the answers, I'm happy — one-shot or few-shot learning is the right approach for this setting. Whenever you learn a machine learning model, try to understand the context in which it is best used; this will help you design better solutions in the real world. Many people overlook this: you should know where to use a model and where not to use it, and that's something our course videos focus on a lot.

For those of you who don't know few-shot learning or Siamese networks, let me give you the intuition — a very high-level intuition; we've spent almost two hours in the course videos explaining how Siamese networks work. The bottom line is this: suppose you have two queries, two questions, from the user. You pass both of them through the same neural network. The network represents question 1 as a vector — call it vector 1 — and it represents question 2 as another vector — call it vector 2.
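To make the "same network, two inputs" idea concrete, here is a deliberately toy sketch in plain Python. The "encoder" below is just a bag-of-words count over a made-up six-word vocabulary — a stand-in for the real LSTM or transformer encoder that a Siamese network would actually learn — purely to show that one shared function maps both queries to vectors of the same shape:

```python
# Toy stand-in for a Siamese network's shared encoder.
# Real systems learn this mapping; here it's a fixed bag-of-words
# count over a tiny illustrative vocabulary.
VOCAB = ["balance", "savings", "credit", "card", "transfer", "money"]

def encode(query):
    """Shared encoder: the SAME function is applied to every query."""
    words = query.lower().split()
    return [words.count(w) for w in VOCAB]

q1 = "what is my savings balance"
q2 = "show my savings account balance"
v1, v2 = encode(q1), encode(q2)   # one network, two inputs
```

In the real architecture, the two vectors then go to a small classifier that decides whether the queries belong to the same intent, and the encoder weights are shared between both inputs by construction.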
Now, if these two questions or queries come from the same class, what is your xᵢ? Your xᵢ is the pair (question 1, question 2), and your yᵢ is 1 if q1 and q2 are from the same class, 0 otherwise. Again, I'm simplifying, but at its very core this is what few-shot learning with a Siamese network does. You have one neural network that takes question 1 and question 2 and represents them as vector 1 and vector 2, and then a binary softmax classifier — which is nothing but a logistic regression, a fancy way of saying logistic regression — that uses these vectors to predict the class label 1 or 0. Why is the Siamese network so powerful here? The whole model — the neural network and the softmax parameters — is trained end to end using backpropagation. For those who don't know, backpropagation is the standard, very popular technique for training deep learning models. Again, I'm simplifying many things; if you're interested, we have about four hours of video, in two parts, on solving this kind of problem with Siamese networks.

So, coming back: the vectorization is learned such that queries belonging to the same class — in our case, the same intent — end up closer together in the vector space, and vectors from different classes end up farther apart. That's effectively what it's trying to do: we get a custom vector representation for our questions such that questions from the same class are close to each other and questions from different classes are far apart. That's the whole idea of few-shot learning via Siamese networks.

Now, why is this powerful in our case? Imagine we have three classes: class 1 has 10 queries, class 2 has 7 queries, class 3 has 10 queries. What's the total? 27 queries and 3 classes. You can't train a model with 27 data points — even statisticians would laugh at us. But convert it into a few-shot learning setup using a Siamese network — and Siamese networks are just one of many architectures you can use for few-shot learning — and your dataset becomes pairs of questions. Take the 10 questions of class 1: each can be paired with any of the other 17 questions, so that gives 10 × 17 = 170 pairs of (q1, q2) whose output label is 0, because they come from different classes. Similarly, I can pair each of class 2's 7 questions with any of the other 20, and each of class 3's 10 questions with its other 17, and for all of those pairs I'll give a class label of 0.
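The pair construction just described can be sketched directly. One bookkeeping note: the 10 × 17, 7 × 20, and 10 × 17 counts in the walkthrough count each cross-class pair from both of its sides; if you enumerate every unordered pair of the 27 points exactly once, you get 240 cross-class pairs (label 0) and 111 same-class pairs (label 1) — 351 training pairs from just 27 points:

```python
from itertools import combinations

# 27 labeled points with the class sizes from the example: 10, 7, 10.
# (The payload here is just an index; in reality it would be the query text.)
points = ([("c1", i) for i in range(10)]
          + [("c2", i) for i in range(7)]
          + [("c3", i) for i in range(10)])

# Every unordered pair, labeled 1 if both points share a class, else 0.
pair_data = [((a, b), 1 if a[0] == b[0] else 0)
             for a, b in combinations(points, 2)]

same_class  = sum(label for _, label in pair_data)   # C(10,2)+C(7,2)+C(10,2)
cross_class = len(pair_data) - same_class            # 10*7 + 10*10 + 7*10
```

This is the whole trick of the pairwise setup: the dataset size grows roughly quadratically in the number of examples, even though the number of original labeled queries stays tiny.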
Those are pairs of points picked from different classes. Similarly, I can pick two queries from within class 1 — in how many ways? 10 choose 2. Two queries from class 2 in 7 choose 2 ways, and from class 3 in 10 choose 2 ways. For all of those combinations the class label will be 1, and for the cross-class combinations the label is 0. So even though we initially had only 27 points, the number of training pairs you get is very, very large — very simple logic, if you think about it. Siamese networks use pairs, but few-shot learning is a general concept where even triplets are used, not just pairs. You pick one question — say question 1, from class 2 — then a second question, say question 6, also from class 2, so those two are similar; then a third question, say question 3, from a different class, which is dissimilar to both. With triplets you can form many more combinations than with pairs. So even with few samples, you can create much, much larger datasets.

Now, what would this neural network consist of? Obviously some architecture is required. In our course videos and case studies, when we used Siamese networks for audio classification, we used CNNs, because we converted each audio clip into an image. But in this case what we have is text, and one thing you can use in the neural network architecture is LSTMs, because LSTMs work very well for sequence data like text. That's one way of tackling the problem — I'm not saying it's the only way; I want to expose you to multiple solutions.

The other way of solving the problem is to say: nowadays for NLP tasks we don't use LSTMs or bidirectional LSTMs and all of that — today transformers are the state of the art, so why not use a transformer model? Among transformer models you have tons of choices. For those who don't know what a transformer is, a quick overview: think of a transformer as a black box. You give it some sequence as input — say a sequence of words — and it returns another sequence of words. The mathematics inside transformers takes a while; in our course videos and previous sessions we've spent close to 18 to 20 hours explaining how transformers work, including three or four sessions of code walkthroughs. Bottom line: it's a model that takes a sequence of words, sentences, or paragraphs as input and can output other sequences. All of your GPT models — whether GPT-2, GPT-3, whatever — are types of transformers, and BERT is a transformer model too. Hopefully in the next month or so we'll do sessions covering the internal mathematics of GPT-3; we've already covered classical transformers and BERT in lots of detail in the course videos, along with code walkthroughs.

So, for those of you who know BERT, let's use BERT as an example. You can take a pre-trained BERT model trained on large amounts of English data — Wikipedia, Google Books — and fine-tune it easily. There's a very popular library called Hugging Face; we've done code walkthroughs earlier on how to use it to fine-tune a BERT model. The approach here: take a BERT model pre-trained on a large English corpus. What do you want at the end of the day? Given two queries, you have to figure out whether they are similar or not — and you can do that with BERT as part of fine-tuning. Those of you who don't know BERT may not follow some of this, but bear with me: you construct the input as a [CLS] token, then question 1, then a separator token [SEP], then question 2, then the end of the input. You give this as the input to BERT, and BERT simply has to output 1 or 0. So you can build a few-shot learning system using BERT by concatenating question pairs with separators — [CLS] at the start, [SEP] between the questions, end-of-input at the end — with an output of 1 or 0, just like in the Siamese network architecture.

Why might BERT work better than a plain Siamese network in this situation? Remember, the standard Siamese network here uses an LSTM or bidirectional LSTM as the encoder. In a lot of studies, BERT competes with that — works as well, or sometimes better — and the reason is very simple: BERT models are pre-trained on a large corpus of English text, so they already understand the nuances of how English works. All we are doing is fine-tuning the model for the task we have. This is one of the great ideas — the reason I love transformers, BERT, and GPT models is that you can pre-train them at massive scale. GPT-3 is trained on Common Crawl data, which is literally hundreds of millions of web pages from the internet. Imagine how much a model can learn from pre-existing Wikipedia or Common Crawl data. All we are doing is first pre-training the model, then fine-tuning it to the task we want. Again, course-registered students can check out more about how transformers work, how BERT works, and how fine-tuning is done with Hugging Face — we've discussed all of that with code walkthroughs in the course videos. For now, I'll move on.

The reason I'm spending so much time on intent classification is that it's one of the most important aspects. In some of my interviews, people have said "I built a chatbot for so-and-so company using so-and-so thing", and my first question is: "how did you do intent classification?" Ninety-plus percent of them can't give me a reasonable solution. Just throwing out ideas is not helpful, because for every idea you name, there's a counter-argument for why the technique may not work. That's why I focus on it so much: understanding how things work internally is extremely important if you want a long, successful career in data science and machine learning. If I were to design this system from scratch myself, I would most likely use a BERT-based model. I'd probably choose BERT over GPT, because GPT-based models are more data-hungry and have considerably more parameters than BERT. I also wouldn't use a full vanilla transformer, because we don't need the whole encoder-decoder architecture — BERT, with just encoders, works reasonably well in most cases.

Cool. Now comes the next important task — I've been postponing this for a while, so let me tackle it now. The next problem is called entity extraction, also referred to as named entity recognition (NER). This example is taken from Wikipedia — let me credit the source and explain how this works. You're given an English sentence like: "Jim bought 300 shares of Acme Corp. in 2006."
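Before walking through it, here is the goal on this sentence written out as aligned token/tag pairs — one tag per input token, with "X" standing for "not an entity of interest" (the tag names are illustrative; tag sets vary across NER systems):

```python
# The example sentence as a sequence-labeling problem:
# one input token per output tag.
tokens = ["Jim", "bought", "300", "shares", "of",
          "Acme", "Corp.", "in", "2006", "."]
tags   = ["PER", "X", "X", "X", "X",
          "ORG", "ORG", "X", "TIME", "X"]

# Each token lines up with exactly one tag.
aligned = list(zip(tokens, tags))   # e.g. ("Jim", "PER"), ("Acme", "ORG")
```

Note that "Acme Corp." is a single two-token entity covered by two ORG tags, and "300" is tagged X here only because we chose not to care about number entities in this example.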
it's a sentence what entity extraction or a named entity recognition system should do is it should recognize a few predefined entities for example here jim is an entity he is an entity jim this this word is an entity of type person similarly this phrase it's not a single word it's a combination of two words here acme corp look at this the acme under acme corp i wanted to recognize that this is an organization or a company similarly this is talking about time so entity extraction or named entity recognition basically says given a sentence like this right actually here there are more entities if you think about it right so there is one more entity here in in in the wikipedia example it's not called out 300 is actually a number okay we also want that we also want that entity that this is a number again the entities of interest okay so the entities of interest will differ from application to application maybe again for one application this for one type of uh for one type of real world ner application this 300 may be important for other things it may not be important okay so the entities of interest you need to first listed on what all entities do you want to extract okay and again look at this what is this problem if you think about it you are given a sentence okay or a sequence of tokens again each of this is a token each of these words is a token right in in nlp terminology this is called a token right so you are given these tokens or the sequence of tokens you want to output another sequence of tokens what should be the output be the output should be corresponding to jim it should say the entity is person corresponding to bot it should say no entity recognized just say x 300 suppose we don't care about this number entity it should just say x corresponding to shares it should say x corresponding to off you should say x corresponding to acme it should say org organization corresponding to corp it should say organization corresponding to in it should say x i don't care 
corresponding to 2006 it should say time so if you think about entity extraction it is a sequence to sequence problem look at this you are given a sequence of tokens and you have to generate another sequence of tokens where the tokens here are that this is not an x basically represents not an entity not an entity of interest basically it's not an entity of interest that's what it means right otherwise it should give one of the entity names that i care about right very simple again it's a proper sequence to sequence model and mostly in in again entity recognition or entity extraction is a problem in nature language processing very popular problem it's been it's been done ever since nlp as an area was created and the way it's often measured is using precision and recall okay and actually they don't measure both these metrics they typically use f1 score which is a combination of precision and recall okay i am assuming that some of you know precision and recall because these are very common metrics that are used in in machine learning right if you don't know just google search for it uh i'm sure we have some videos on precision and recall also uh maybe as free videos okay you can check it on our course also cool so okay so this is this is the problem of ner okay so the next question okay let's take another example this is actually interesting okay so let's see so in the banking example right so let's take a concrete example from bank also so you might say move suppose this is the query this is the query that the user gave okay we recognize that the intent here okay let's assume the intent here let's assume we recognize the intent is transfer money okay let's assume this is the granularity that we have we recognize using our intent sorry using our intent classification model we realize that it stands for money now we have to recognize what are the parts in this for example you have to recognize that this is the amount not just a number it's an amount because there is a 
dollar symbol associated with this and this is an account type similarly credit card is also an account type okay we have to recognize this again remember that this is a very special type of entity which is very domain specific this is a domain specific entity corresponding to banking right so again we'll see we'll see how a ner model can be trained and how we can actually use it okay again account type is something that you don't use so much in regular uh in this this is also a custom entity if you want to call it or a domain specific entity okay this is also a domain specific entity okay that's important because not every entity again it's important to be able to recognize this okay because once you get these entities now you can actually construct the transaction or the sql query for this now i have one question so one second i don't know what is the next slide okay but i have one question here okay so let's let's go into this how do we build how can we design an ner system how can we do an entity extraction or let's say an entity extraction okay let's let's discuss this okay i've explained the problem right some of you who know machine learning so how do we how do we build an entity recognition system uh how do we build it i'll explain it anyway i'll explain how do we build it but i want to understand how imagine if you are designing an entity recognition system because i know a lot of people use an ner system but i would like to see how if you were to design an ner system from scratch how would you do it i really would like to hear your thoughts i'm looking at the chat window folks i'll try to pick up anything of interest yes spacey has ner capability shilpa you're right that's true but how does that work if you are treating machine learning systems as black boxes then why should anybody hire a data scientist they can just hire a python developer right a python developer can also call a function in spacing you should understand what's happening internally the 
depth of understanding is very important okay using nltk using spacey somebody says using lstm using json how do you do it using json json is a data type guys come on come on please uh is this video going to stay on the channel after live yes you can access this video after the session also somebody says using transformers how do you do it using transformers can somebody help me with that can you explain more details explain every entity to a variable based on gradient what gradient gradient means derivative right so based on what gradient classify into vectors and classify them into entity classes okay so how do i do it suppose i have a word okay so i think what uh i don't know who is this okay uh so okay i forgot i think the this thing is moving very fast suppose there is this word right okay suppose there is this word savings how do you recognize that it's saving that's it's an account type or how do you recognize that it's an amount ah some of you are coming up with ideas we can build simple rules right you can build a simple rule based system you can use grep right you can say if what i have is a dollar followed by a numerical value then it must be amount right a very simple rule based systems or if i have the word savings or savings account then it's an account type if i encounter the word credit card not just the word credit credit followed by card it's an account type yes rule based systems yes you can build rule based systems for this as somebody put it yes that's true i think somebody that's certainly one way of doing it okay uh parts of speech tagging and regex okay how do you do parts of speech tagging then that's the next question yes regex is certainly one way rule based systems what are other ways again rule based systems will fail very quickly in the real world is there any other system that you can use okay okay tokens elmo regular expressions okay okay some of you are on to some interesting stuff okay let me let me just walk you through a few 
solutions that are typically used. So, certainly, rule-based systems: you can write your own if-else conditions and build a simple rule-based system. Somebody mentioned spaCy and NLTK. This was one of my favorite questions when I was interviewing at other companies: if somebody said "hey, I know NLP, I've used spaCy and I've worked on NLP for a year or two", my first question would be about NER, and I would ask, "how does spaCy's NER work, can you explain it to me?" Not surprisingly, many people don't do that homework. So spaCy, NLTK, OpenNLP: many of these tools use a probabilistic machine learning model called conditional random fields (CRFs). CRFs have their own advantages, and they were a very popular technique till about five years back; they were literally state of the art and worked very well, till deep learning came and ate their lunch. For the state-of-the-art systems in named entity recognition, here is a link; again, I'll share this whole document at the end of the session, in the description section of the video. If you go through this link (let me zoom in a little), it's basically saying that there are a lot of research papers on named entity recognition and a lot of benchmark datasets. CoNLL is a very popular dataset; similarly there is OntoNotes; there are a lot of datasets on which people have built state-of-the-art results. If you notice, on most of the more recent datasets, BERT or a modified version of BERT dominates: look at this, BioBERT, BERT-MRC, and so on, though there are also some LSTM-plus-CRF based models. So BERT, or variations and modified versions of BERT, has become very important; these are some of the best methods for named entity recognition in the world today.
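To make the rule-based option concrete, here is a minimal sketch in Python of the dollar-sign and account-type rules discussed above. The patterns and entity labels are my own illustrative assumptions, not taken from spaCy or any production system.

```python
import re

# Illustrative rules for the banking examples from this session:
# a dollar sign followed by a number is an "amount"; a few fixed
# phrases are "account_type" entities.
RULES = [
    ("amount", re.compile(r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?")),
    ("account_type",
     re.compile(r"\b(savings account|savings|credit card|current account)\b", re.I)),
]

def extract_entities(text):
    """Return (entity_type, matched_text) pairs found by the rules."""
    found = []
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.append((label, m.group(0)))
    return found
```

For example, `extract_entities("move $2000 from savings to credit card")` picks up the amount and both account types; and as discussed above, rules like these are cheap to write but brittle, which is exactly why they fail quickly in the real world.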
Now, the most important question here is: how do these models work? Very importantly, if you want to build an NER system on your own, remember what NER is; I was just talking about it a while ago. NER is basically a sequence-to-sequence problem: you have a sequence of tokens, and you want to generate a sequence of entity tags. Whenever you have a sequence-to-sequence problem like this, it should immediately strike you that transformers are the best models as of today. Within transformers, broadly, there are three classifications: you can use the encoder-decoder architecture, the full transformer; you can use BERT, which is basically an encoder-only transformer; or there are GPT-based models, which are decoder-only. Those are the basic classifications. BERT is helpful because it has been used extensively and there are lots of pre-trained models; GPT is also very powerful, but GPT models are somewhat larger. So whenever you have sequence-to-sequence models, it should immediately strike you that transformers are what you want to use. I'll come back to this slide in a minute; I just wanted to show you this. Remember: for sequence-to-sequence models, think of BERT, the transformer, GPT, and so on. When we explained BERT and did the code walkthrough of BERT in our course videos and previous sessions, we explained how to use BERT, how to fine-tune BERT using Hugging Face, and how the mathematics of BERT works; we even referred to the original research paper from 2018. I've taken a screenshot from that original research paper. In this original paper, all they have done is this: they have used a pre-trained BERT model, pre-trained on standard Wikipedia text, and after that they have fine-tuned it for the specific task of NER. Of course, there are publicly available datasets for NER. Again, this is a screenshot from the original 2018 paper: if you notice, they fine-tune BERT-large and BERT-base models on the CoNLL-2003 dataset. This is a very popular public dataset on which a lot of NER algorithms are trained; very similar to your ImageNet, a fairly good dataset. On that dataset, all they have done is take a pre-trained BERT model and fine-tune it for the sequence labeling task using the CoNLL data. And if you look at the test set, they are getting an F1 score of close to 93 percent; with the large model slightly higher, so between 92 and 93 percent F1, which is very, very good. Now here comes the challenge: if we were to train an NER system for our own problem, how do we do it? Two things. First, train your model using existing datasets. If you just go to Kaggle (let me show you this; it's better to train in two phases), here are all the publicly available datasets for NER, tons of them. CoNLL-2003 is one version, and there are datasets in many languages too. There is Hindi data, for those of you who want to train on Hindi; for almost any language you need there's something: Portuguese data, Tamil data, a Swedish corpus. You need the dataset; otherwise it won't work.
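One detail worth knowing if you train on CoNLL-style data: the per-token labels use the BIO scheme (B-X begins an entity, I-X continues it, O means outside). A fine-tuned BERT tagger predicts one such tag per token, and a small decoding step turns the tag sequence into entity spans. A minimal sketch of that decoding, with illustrative tag names of my own:

```python
def decode_bio(tokens, tags):
    """Collapse per-token BIO tags (as used in CoNLL-style NER data)
    into (entity_type, text) spans."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):            # a new entity starts here
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)             # continuation of the same entity
        else:                               # "O" or an inconsistent I- tag
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:                             # flush the last open entity
        spans.append((label, " ".join(current)))
    return spans
```

So a tag sequence like `O B-AMOUNT O B-ACCOUNT O B-ACCOUNT I-ACCOUNT` over "move $2000 from savings to credit card" decodes into the amount and the two account entities.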
Some of these datasets are also domain-specific. If you notice, this one is a spaCy resume named entity recognition dataset, which is very domain-specific. Similarly, there is a Hindi health dataset; it will have tags or entities very specific to health data, like "bukhar". "Bukhar" in Hindi means fever, if I'm not wrong (my Hindi is not that good), so "bukhar" would get an entity tag like "condition". Similarly, there is an NER dataset for disease extraction (look at this, there is actually a hackathon, with some data available); there is an e-commerce NER dataset; there is Chinese word-based data; there is an Urdu dataset; there is medical NER specifically; and there is an Indian names dataset, for Indian people's names like "Srikanth". A model trained on American and British names will not detect "Srikanth" as a name, so there is a dataset of Indian names here. So, very often in the real world, what you do is train your model using existing publicly available datasets with BERT; with BERT itself you do pre-training plus fine-tuning, and you have to do both. First train it on existing datasets. Then, if you can, obtain domain-specific entities. For example, "account type" is a very domain-specific entity for banking data. Some things, like the dollar amount, are covered already: if you have "$2000", it can be recognized as an amount using standard datasets, since many of them will have "$2000" tagged as an amount; but "savings" will not be there as an account type, so you will have to create domain-specific entities. Now, how do you create the data for this? Data doesn't come from thin air. If you don't have existing data for that domain, you sit down and get it manually labeled; there is no other option, you can't run away from the data collection part. A lot of people don't focus enough on obtaining the right data, and if you don't have the right data, it's garbage in, garbage out. Once you have domain-specific entities, you can use a rule-based system; again, please don't throw away rule-based systems, because the fancier models may not always work in the real world. That's why, in each of our course case studies, we start with a very basic model first: a simple rule-based system, logistic regression, or a similarly simple model. It is always important to have a very simple baseline. So build a simple rule-based system (write a hundred if-else statements and you have one), but also try more advanced models. Once you have this basic system and domain-specific entities, retrain on those entities; again, you will need a decent amount of manually labeled data here. So please don't think rules are useless; rules are very useful in the real world, because models may not be able to capture everything, especially for tasks like this. Today, if you use spaCy, NLTK, etc., they use CRFs, and some of them use rule-based systems. If you want to build a state-of-the-art NER system, it is best to follow the steps I've just mentioned: train on pre-existing datasets using BERT, then for domain-specific entities either write rules or fine-tune the pre-trained model. The reason I'm showing this example is the performance of a BERT-based system; look at the dev data and the test data. Somebody mentioned ELMo earlier; ELMo is also a very good model with very high performance, so you can use an ELMo-like model too. You can use BERT, ELMo, any of these good, cutting-edge, more recent approaches; ELMo itself is a paper from 2018, a very nice paper. You can use any of these systems and get a fairly high-quality F1 score. But as always in machine learning, the challenge is not just models; the challenge is also datasets. Don't ignore that. Suppose you want something language-specific plus domain-specific, say a chatbot in Hindi: you need Hindi training data, you need Hindi domain-specific entity tagging, you need manually labeled data. Without data you can't do much; the best you can do is write some rules, and there's nothing bad about that, but it won't scale to real-world systems as well. Sometimes it might, for some types of entities, but not for everything. So this is how NER systems can actually be built from scratch; I just wanted to run you through that. Cool. So, given these two... what's the time? 8:15.
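The two-phase recipe just described (a generic pre-trained tagger first, then a domain-specific layer on top for things like account types) can be sketched roughly as follows. Here `generic_ner` is a hypothetical stand-in for whatever pre-trained model you would actually plug in, and the lexicon entries are illustrative assumptions.

```python
import re

# Domain-specific entities, manually curated for the banking domain.
DOMAIN_LEXICON = {
    "savings": "account_type",
    "credit card": "account_type",
    "current account": "account_type",
}

def generic_ner(text):
    """Placeholder for a pre-trained tagger; here it only finds amounts."""
    return [("amount", m.group(0)) for m in re.finditer(r"\$\d+", text)]

def domain_ner(text):
    """Layer domain-specific entities on top of the generic model's output."""
    ents = generic_ner(text)
    lowered = text.lower()
    for phrase, label in DOMAIN_LEXICON.items():
        if phrase in lowered:
            ents.append((label, phrase))
    return ents
```

In a real system the lexicon lookup would be replaced by rules or by a model fine-tuned on manually labeled domain data, as described above; the point of the sketch is only the layering.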
So I'll try to spend some more time; how much more do I have? I have actions, handling uncertainty... oh wow, okay. Let's see; I'll try to cover as much as possible today. The next thing to look at: let's go back to our original diagram. Where is our original diagram? Okay. So, a user gives a question; we know what intent classification is, and we know what entity extraction is. First, let's learn a little bit about actions: what actions can you take? Then we'll also talk about dialog context tracking, or keeping track of the context. Again, if time doesn't permit, I'm okay to do more than two sessions, but I want to do this as thoroughly as possible. Cool, so let's go here. An action depends on the intent and the entities, and at the end of the day you have to define these actions as functions in your code. Let's take a simple example so you understand what I mean by an action. Suppose the user says "move two thousand dollars from savings to credit card". Let's assume our intent classification system, at a very high level of granularity, says the intent is "transfer funds". But we also have our entities: from them we know this is the amount; this is the from-account, because the word "from" appears before it; and this is the to-account. We can define our entities the way we want; let's assume this is how we've defined them. So we got these entities too: from-account and to-account are two different entities, just like account type. We need to be very careful in defining the entities of interest, and of course in creating the training data for them. Now what do we have? We have our intent, "transfer funds", and we have our entities of interest (EOI). Given these two, what action should I perform? Remember, this is a very critical money-transfer operation; you don't want to mess it up. So the action may not be just one action; it could be a sequence of actions. The first action could be to give the user a confirmation message: "hey, do you want to transfer...". You can construct this confirmation message as "Do you want to transfer <amount> from <from-account> to <to-account>?", where the amount slot is filled using the amount entity and the account slots from the from-account and to-account entities. So this action is going to be implemented as a function. The function says: if your intent is "transfer funds", and you got all three entities needed to perform this action, first show a confirmation message. The confirmation message itself can be hard-coded except for the slots; it's a standard template, and you see this in many chatbots: "do you want to transfer ___ from ___ to ___". Then the user has to say yes; the "yes" itself will be handled as a yes entity, meaning the user confirmed the previous prompt. Then you execute the SQL query. So this action has two parts: first a confirmation message, then executing the SQL query. To execute the SQL query, what do you need? The user is logged in, so you have the user ID; you have the amount; you have the from-account and the to-account; you have everything required. Just like your confirmation message, your SQL query will also be templated: it will be an update statement, say. You know it's a transfer-funds intent, so you know which table to update; you know the user ID and the amount; you fill each of these in as variables, create the SQL query, and execute it. It's just a simple update statement. So the way you implement it in code is: if your intent is intent one and you have these entities, take action one followed by action two; else if your intent is intent two and you have these entities, take those actions. This is basically a rule-based system, and that's how most of these frameworks, like Rasa, work internally; a large component is rule-based, because otherwise things can go crazy. So your update command is going to be as I was just saying: you fill in the amount, the from-account, the to-account, and the user details, and you execute the SQL query. After you do it, you might want to give a confirmation message and probably send a confirmation SMS too, in a real banking scenario; you might even ask "do you want to revert this?". You might think this is too rule-based, but if you want high accuracy, if you want your system to work well and serve your customers well, rule-based systems are sometimes mandatory; please don't look down on them, sometimes they work phenomenally well. So in this case, your actions are basically functions, or sequences of actions, and you can implement them using this if-else type structure; it's basically a simple nested if-else structure.
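A minimal sketch of the nested if-else action dispatch described above, using SQLite as a stand-in database. The table schema, column names, and message template are my own illustrative assumptions; and where a real bot would wait for the user's "yes" entity before touching the database, this sketch runs the update directly to stay short.

```python
import sqlite3

def handle(intent, entities, db):
    """Map (intent, entities) to a sequence of actions: build a templated
    confirmation message, then run a parameterized SQL update."""
    needed = {"amount", "from_account", "to_account", "user_id"}
    if intent == "transfer_funds" and needed <= entities.keys():
        msg = ("Do you want to transfer {amount} "
               "from {from_account} to {to_account}?").format(**entities)
        # In a real bot this runs only after the user confirms ("yes" entity).
        db.execute("UPDATE accounts SET balance = balance - ? "
                   "WHERE user_id = ? AND type = ?",
                   (entities["amount"], entities["user_id"], entities["from_account"]))
        db.execute("UPDATE accounts SET balance = balance + ? "
                   "WHERE user_id = ? AND type = ?",
                   (entities["amount"], entities["user_id"], entities["to_account"]))
        return msg
    return "Sorry, I didn't understand that."
```

Note the `?` placeholders: templating the query with parameters, rather than string concatenation, is what keeps the "fill each of these as variables" step safe from SQL injection.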
Now, it's not that simple in the real world; there is a lot of uncertainty. Let me show you a few examples. You have to understand how to handle uncertainty and boundary cases; that's very important in the real world. Let's look at a couple of examples so you can appreciate this challenge. Imagine the user query is just "I have moved two thousand dollars to credit card", nothing else. Suppose my intent classifier says this is "transfer funds" with a probability score of 99 percent. Good, this is very trustworthy, I'm happy. We also extract entities, but we only find the amount and the to-account; we don't have the from-account. That's a problem: the from-account entity is missing. Then what do we do? This happens a lot. People don't give complete statements; they just say "transfer 2000 rupees to credit card", assuming the system understands they want to do it from their savings account. But you could have multiple savings accounts, or a current account, or multiple current accounts; there are so many possibilities. To handle this type of uncertainty, wherever you don't have all the entities you need, you typically raise something called a choice intent. As a follow-up, you raise this intent and say, "hey, I see that you have..."; this choice intent is itself implemented as a function in the code, and the account information can be fetched from the SQL databases. You say: "I see you have savings account one, savings account two, and a current account; from which account do you want me to transfer?" This is called a choice intent because it offers a choice among options; based on the answer, it fills in the missing entity and then executes everything. Notice: our intent classification worked perfectly, no issues there, but our set of entities was partial, so to complete it we went through this choice-intent approach. This is something we have to handle in code; there's no escaping that fact. There is another type of problem; there are lots of uncertainties in the real world, and I want to show you some of them. Suppose the user question is simply "what's the interest rate", and imagine we have defined two intents: "enquiry savings interest rate" and "enquiry FD interest rate". We pass this statement through our intent classification system and it gives probability scores. Depending on how you designed the system, each score could independently range from zero to one, or the scores could sum to one, depending on whether you used a one-vs-rest type of model or a softmax classifier. Imagine you used a one-vs-rest model: it could say there's a 90 percent chance the statement is relevant to the first intent and a 92 percent chance for the second. Those two percentages are very similar; which intent do you assign? What do you do? Or, if you are using a softmax-type classifier for multi-class classification, the first could have a probability of, say, 30 percent, the second 31 percent, and everything else small percentages. Again, very close numbers; then what do you do? So whenever your intent classifier is not very certain of the intent, you again create a choice intent, implemented in code using if-else. For every intent you will also have some English text corresponding to it: "savings interest rate" for one, "FD interest rate" for the other. Your choice intent will ask: "do you want to know about the savings interest rate or the FD interest rate?" Again, that's a rule-based mechanism. Notice what's happening: here we are not sure about the intent itself, whereas in the previous case we were sure about the intent but one of the entities was missing; in both cases we used a choice intent. And while doing all of this, there is a third case: if your system is simply unsure what is happening, you always have to have a backup, and say "we will reach out to your phone" or "we are transferring you to a human". For most chatbots, depending on how well the system is designed, anywhere from one percent to ten percent of questions go to manual intervention. Remember, companies like Amazon handle hundreds of thousands of phone calls, emails, and queries from customers, and they still do most of it manually; there is some automation, some machine learning and deep learning being used, but the questions can be very complex and each case can be very different. Amazon, being very customer-centric, wants to answer customer queries well, so they accept a lot of manual intervention.
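The three uncertainty cases above (missing entities, near-tied intent scores, and overall low confidence) can be folded into a single dispatch rule. A rough sketch follows; the 0.5 and 0.05 thresholds are illustrative assumptions you would tune, not values from the session.

```python
def decide(intent_scores, entities, required):
    """Pick the next step from softmax intent scores and extracted entities,
    following the three uncertainty cases: handoff, choice intent, execute."""
    ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, p1), (second, p2) = ranked[0], ranked[1]
    if p1 < 0.5:
        return ("handoff_to_human", None)        # bot is unsure overall
    if p1 - p2 < 0.05:
        return ("choice_intent", [top, second])  # two intents too close to call
    missing = [e for e in required.get(top, []) if e not in entities]
    if missing:
        return ("choice_intent", missing)        # ask the user to fill the gap
    return ("execute", top)
```

So "what's the interest rate" with near-tied savings/FD scores yields a choice intent between the two, while a confident "transfer funds" with a missing from-account yields a choice intent asking for the account.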
Google is the exact opposite: Google wants bots to answer as much as possible, so if you ever raise a ticket on Gmail or similar, it's mostly bots answering, with relatively few humans. Cool. Now the most important thing: let me tell you about dialog context, because this is very challenging and a big topic in itself, but I'll try to simplify it and describe the simplest way to implement it. Dialog context is all about this simple situation. Suppose the user says "what's the interest rate", the same example we just saw. The bot can't confidently classify the intent, so it asks the choice intent as a follow-up: "savings, FD, or RD?" Now the user says "FD". The bot has to keep the context of what happened earlier in the dialogue between the user and the bot: what was being discussed, what the most recent intent was, what entities it had and which were missing, and whether it can fill "FD" in as one of the missing entities and then execute an action. So it has to keep track of the dialog context. This is not easy, and there are many approaches. The simplest way is to keep a sequence of intents, entities, and actions, and your bot engine keeps consulting its rules against this. There is also a concept called an event, which is basically a change of state. For example, as soon as the first query arrives, we start storing in the dialog context: in this chat session, the user is enquiring about interest rates. I keep that. Then I record that I issued a choice intent, because I wasn't sure which interest rate was being asked about. Then the user said "FD", and based on that I updated my intent to "enquiry about FD interest rate". So you keep updating both the intents and the entities. Typically, in a dialog context, you store all the intents, all the entities, and all the actions performed so far, and that tells you the current state of the system. Effectively, what you are storing is the state of the system: the sequence of events that have happened. Those of you who know computer science can think of this as a finite state automaton, or finite state machine; if you don't know about those, don't worry, just think of it as a sequence of states, a sequence of intents, entities, and actions performed, which together give us context. While this is a simple scheme, it doesn't always work; there are many complications. There is one more concept related to dialog context. Imagine a chat where, in the first part, the user is asking about home loan interest rates. Then he suddenly changes and wants to find out about, say, personal loan interest rates; or, changing the context entirely, suppose he now wants to pay his credit card bill. There is a context switch happening: the user's context was revolving around loans, home loans and so on, and after asking a few questions he has changed the context. Being able to detect this context switch is very important. Then what you can do in your dialog context is this: at any time, imagine you have a long sequence of actions, intents, and entities, and you want to know how far back to look whenever you have uncertainty. The dialog context is updated for every chat turn; we are storing the intents, entities, and actions, everything. Now, if you can recognize that at this point in time the user's questions have changed from loans to credit card payment, if you can detect that context switch, it helps you focus on only the most recent context. There are two ways people handle it. In some bot implementations, they only look at the last 10 events, as they're sometimes called: the last 10 intents, 10 actions, and 10 sets of entities. They just say, "I don't want to look too far back." More complex systems, at every point in time, build a classification model into which all of this text is fed; the model predicts whether a context switch has happened or not. Again, this requires a lot of historical data; without historical data it's doable but not trivial. Simpler systems just say, "I don't care about detecting context switches; I'll use the most recent 10 intents, 10 actions, and 10 sets of entities." So, dialog context is itself a very big topic.
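The "only look at the last 10 events" scheme is easy to sketch: store events as (kind, value) pairs and let a bounded deque enforce the cap. The class and method names are illustrative assumptions, not from any framework.

```python
from collections import deque

class DialogContext:
    """Track conversation state as the most recent N events
    (intents, entities, actions), the simplest scheme described above."""

    def __init__(self, max_events=10):
        # deque with maxlen silently drops the oldest event when full,
        # which is exactly the "don't look too far back" policy.
        self.events = deque(maxlen=max_events)

    def record(self, kind, value):
        self.events.append((kind, value))

    def last_intent(self):
        """Most recent intent still inside the window, or None."""
        for kind, value in reversed(self.events):
            if kind == "intent":
                return value
        return None
```

In the interest-rate example, the engine would record the enquiry intent, the choice-intent action, and then the "FD" entity; `last_intent()` tells it which pending intent that entity should complete.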
reinforcement learning being used here i mean this is a state of the art but again reinforcement learning for context of not not for context switching but for dialogue for dialog context to keep the dialog con to track the dialog context i've seen research papers which use reinforcement learning but the problem is reinforcement learning algorithms are very data heavy which means you need tons of you need hundreds of thousands of chats to be able to train a reinforcement learning model here right so often times in most practical systems that people but but hopefully over the next five to ten years reinforcement systems will play much bigger role in ai or especially with respect to chatbots but today they are not yet there today dialog context is typically maintained using these states where you have all the intense entities and actions that that's the simplest way to track it right but there is a possibility that in the next four to five years again there's a lot of research happening in reinforcement learning some of this could play a role there okay now there is one more one more question here so till of we have been we have all the examples that we talked about now are very transactional it's about moving some money from one account to other account and things like that but there are very different types of questions you can create an intent called as q a intent and what is that what is the suppose suppose the question here is what's the minimum balance required in a savings account let's just say the question is what's the minimum balance okay very simple question and you have documents and web pages on on your on your bank on your bank web pages or internal documents where this is actually listed somewhere okay there will be huge terms and conditions right which is an internal bank document right so this data is there somewhere it's not there in a let's assume this is not there in faq it just assume it should be there in faq but let's assume it's not there in 
fact just for simplicity suppose there is a question like this and it's there in some huge text corpus either on your web pages or docs now how do you answer this query now how do you answer such a query okay i mean what is that again this is not about a transaction remember this is not a transactional thing this is not a transactional thing this is a q a intent okay what this is basically saying here is i want to know what is the minimum balance that i should maintain in my bank account and that information is actually there in my bank documents right so any ideas on how we can tackle it folks can can somebody explain how to actually solve this problem any ideas getting the data is easy you don't even have to web script you can just go to the bank because probably you're working at the bank you can just get that data right they'll give you all the text in their web pages you don't have to scroll it or anything you can just get it how do we how do we do this somebody says we want to do it using birth how do you do it using topic modeling using sql query again this is about again this is this is not a transactional data remember this is not a transactional data this is not about what is the balance in my account it says what is the minimum balance that i should maintain in my bank account and that information is there in some web page or internal doc search keyword for minimum balance very good yes simple regex that's certainly one way and no no denying that fact so if you have all these web pages do a simple regex or do an inverted index search and just look for this word minimum look for this 2 gram right look for minimum balance wherever it is there now how do you answer that question suppose that is there how do you answer that question ok basic idea might be textual similarity okay how do you do search somebody says we want to use search some of you are in the right direction no doubt about it but how do we do it some more details okay somebody says we want to 
Somebody says use a SQL full-text query; yes, that's effectively what your regex matching is. Somebody says indexing, search indexing, Elasticsearch; note that all of these, regex, inverted indices, indexing, are pure data-structures-and-algorithms approaches, not machine-learning methods. So very good, some of you got this answer right. As part of our course videos and case studies we actually have a case study where we talk about how to build a question-answering system using BERT, or transformers in general. You might say I'm a big fan of transformers and I'm using transformers for everything, but that's the reality: just like LSTMs were the best models five years back, transformers are the best models for text today; of course, you have to know how to fine-tune them. If you're a course-registered student, we have explained how to build a question-answering system using BERT and Hugging Face, with a detailed two-hour code walkthrough. The inputs to that system are all of your text docs plus the question, and it gives you the answer; it does a pretty good job. If you don't know this, just google "BERT SQuAD": there is a very popular dataset called SQuAD, the Stanford Question Answering Dataset, and you fine-tune BERT on SQuAD. If you want a library, use Hugging Face; it's a very good library and one of my
favorite libraries for transformer-based models. If you just google for this, you'll get very simple code through which you can do it. So if you have a large corpus of data, like web pages and docs, and an open-ended question whose answer lies in that corpus: regular expressions are one way, indexing is another, and BERT-based question answering is a third. Similarly, if you have a large corpus of FAQs, say a bank might have 500 FAQs, or, more importantly, imagine a bank that has had humans answering questions: a large bank like SBI or HDFC might be sitting on a hundred thousand plus historical chat sessions, and these are manual, hence highly accurate, chat sessions. If you're sitting on a gold mine of data like historical chats or FAQs, you can do a simple semantic search for Q&A; we have a four-part, roughly six-hour public live session series on our YouTube channel on how to build exactly this, and I'll share the document with the links. So these are some of the concepts I wanted to cover. Please understand that this is not everything there is to a chatbot; there are a lot more details, but the objective here is to put you in the shoes of somebody designing a chatbot from scratch. It is just an introductory session to help you understand how these systems are built. Suppose you wanted to build a Rasa competitor from scratch tomorrow; Rasa has taken years to get where they are, but if you were to build it, these are applications of the concepts you learned in natural language processing, deep learning, and so on, and that's very important to recognize. The reason I wanted to do the design part this thoroughly is that the design is where the hard thinking happens.
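The semantic FAQ search described above is, in real systems, done with learned embeddings (BERT-style sentence vectors); as a minimal stand-in, here is the same retrieve-the-closest-FAQ idea with plain bag-of-words cosine similarity. The FAQ texts are invented for illustration:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_faq(query, faqs):
    """Return (score, question, answer) of the FAQ closest to the query."""
    q_vec = Counter(query.lower().split())
    scored = [(cosine(q_vec, Counter(q.lower().split())), q, a)
              for q, a in faqs]
    return max(scored)

faqs = [
    ("what is the minimum balance for a savings account",
     "Rs 5000 for metro branches."),
    ("how do i block a lost debit card",
     "Call the 24x7 helpline or use the mobile app."),
]
score, question, answer = best_faq("minimum balance savings", faqs)
print(answer)  # -> Rs 5000 for metro branches.
```

Swapping the `Counter` vectors for transformer sentence embeddings gives you the "semantic" version; the retrieval loop stays the same.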
Once you have the design and all these multiple pieces, writing the code is much easier, because it's all about using a bunch of libraries, a bunch of models, and data; that's what it is. As you become more and more senior in your machine-learning career, or even your software-engineering career, it is design that matters; understanding how to design a solution is very, very important, though of course implementation and building matter too. Some of you might have felt this session was slightly theoretical, but design is by nature theoretical; I hope I've given you real-world examples to connect the theoretical concepts with the design aspects. Oftentimes you end up writing something called a design document, where you explain all the modules and all the systems and subsystems you have to build; this is something people with experience in either software engineering or machine learning should be able to do. The next part is the code walkthrough: for all the modules we discussed, we will use publicly available libraries like Hugging Face, spaCy, and NLTK, a few machine-learning models, and some publicly available data to actually build this. Okay, now I'll change context; it's already 8:43, so I'll try to spend some time answering as many questions as I can on the YouTube chat, which is right in front of me. The next session will mostly be next Sunday; let me look it up, next Sunday is the 13th, so it will be on the 13th or the 20th. We typically do three live sessions a month, so we'll do it on one of those two dates; anyway, we'll inform you on our YouTube channel on when
the exact session is, but it'll be either the 13th or the 20th. Cool. Next question, from Rahul Chakraborty: "When we have to save historical data, is it good to use MongoDB?" Yes, MongoDB is certainly a good data store for this, without doubt one of the good ones, because it is designed with documents in mind, unlike a standard RDBMS, which is not designed for that task. Next: "Are we designing for a text or a voice-based chatbot?" Very good; this is from Thyagarajan, and his question is: if we were to build a chatbot for audio, how would we design it? Give me a second, folks; let me just keep the chat window open here so that I can see everyone and respond to questions better. So, Thyagarajan, the answer is as follows: audio-based systems have to add two more components. One is speech-to-text, because whatever the user speaks you want to convert into text; and a ton of other challenges kick in there, accents and all the rest, so that's a big machine-learning problem unto itself. That's one. At the other end, when you are giving an
answer, you have to do text-to-speech as well, but text-to-speech is much easier; there are very good systems like WaveNet from Google that do it very well. It is speech-to-text that can go crazy sometimes: I use Google Assistant at home, I also use Alexa, and they sometimes get it completely wrong. Even the best systems do, and imagine, companies like Google and Amazon, by my gut feel (I don't know the exact numbers), have millions of minutes of speech-to-text data to train on; it's a fairly hard problem. So you just have to add these two modules, speech-to-text and text-to-speech, at the two ends. Okay, Kiran says latency will not be an issue when deployed, especially in a mobile app; okay. Another question that is often asked, and I saw it in the comment section too: suppose you want to integrate the bot with Slack, or with WhatsApp, how do you do it? Most of these systems give you an API. Slack has an official web API: with your Slack credentials you can send messages via a web API call and retrieve all the messages you're getting, also via the API. We have a public live session called "Using Web APIs in Python"; please go through it, where we use APIs from Bing, Amazon, and so on. All you have to know is the endpoint of the API and how to query it, and you can write that code very easily in Python. So with whatever tool you want to integrate this chatbot, you just have to call its API; that's all there is to it.
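As a small sketch of that Slack integration: `chat.postMessage` is Slack's documented Web API endpoint for sending a message; the token and channel here are placeholders, and the network call is only defined, not executed:

```python
import json
from urllib import request

SLACK_URL = "https://slack.com/api/chat.postMessage"

def build_payload(channel: str, text: str) -> dict:
    """JSON body expected by Slack's chat.postMessage endpoint."""
    return {"channel": channel, "text": text}

def post_to_slack(token: str, channel: str, text: str):
    """Send one bot reply; needs a valid bot token and network access."""
    req = request.Request(
        SLACK_URL,
        data=json.dumps(build_payload(channel, text)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json; charset=utf-8"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)  # Slack returns {"ok": true, ...} on success

# Placeholder values; a real bot would read its token from configuration.
payload = build_payload("#support", "Your minimum balance is Rs 5000.")
print(payload["channel"])
```

The same pattern (endpoint URL + authenticated POST) applies to WhatsApp and most other messaging platforms.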
Omkar has a very good question: is there any resource to learn when to use which ML strategy? To be honest, there is no finite answer, because it depends on how well you understand the concepts. See the question I asked a while ago: you have few data samples and a multi-class classification problem, and many people directly jumped in and said they'd use TF-IDF or Word2Vec; they forgot that most models, naive Bayes or anything else, require at least a few hundred data points per class to work reasonably well. If you understand the limitations of models, where they fail and where they work, and why a model came into existence in the first place (the whole Siamese-network idea was designed for the few-shot learning case), then figuring out where to use what is just applying that understanding. That's why in our course videos we focus a lot on those specific aspects; for example, if you have very high-dimensional data, how do you use a decision tree? These are very interesting questions about applying theory to real-world problem solving, and you have to know this. If somebody wants to learn more about chatbots, my recommendation for a great source is Rasa's blog; they write some very good posts on Medium. Or, if you have more time, do what I've done for my own learning on how Rasa works internally: Rasa is open source, so you can go and read the source code, and nothing beats that as a way to learn in depth. Again, I don't claim to be an expert at chatbots; there are phenomenal engineers at companies like Rasa, Alexa, and so on, and I have learned most of it by reading their research papers,
reading their blogs, and reading their source code. Cool. "How to proceed with an answer if a query has multiple intents?" Very good question. One possibility is that your intent classifier is simply getting confused, in which case you offer a choice intent, as I mentioned a while ago. The second possibility is that in the same query the user really is asking two things; then the bot can take two actions and give two answers: the right answer for the first intent and the right answer for the second. Instead of a choice intent you can give both answers and let the user pick what they want. Akash asks: does everyone use Python for ML tasks, or do they go to a low-level language like C++ to make it faster, like TensorFlow does? Actually, most of the Python code you write in TensorFlow delegates the computationally intensive work, your whole training, to C++ underneath, so you usually don't have to worry about it. Of course, I've had my own cases where we took a model trained in Python, Java, or Scala (if you're doing it in Spark) and re-implemented just the evaluation function in C, with even a little assembly, because we wanted to respond at extremely low latency, under one millisecond. That happens sometimes, no doubt about it. "How do I file a patent?" Sorry, I am not a patent attorney; I've filed some patents, but I've always had lawyers who helped me, so if you're interested, please talk to a patent lawyer.
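The multi-intent policy described above, answer every intent the classifier is confident about, or fall back to a human, can be sketched like this; the intent names and the 0.35 threshold are illustrative assumptions, not values from any particular system:

```python
def intents_to_answer(scores, threshold=0.35):
    """Given classifier confidences per intent, return every intent
    above the threshold (highest first). If nothing clears the bar,
    hand the conversation to a human executive."""
    picked = [(intent, s) for intent, s in scores.items() if s >= threshold]
    picked.sort(key=lambda p: p[1], reverse=True)
    if not picked:
        return ["fallback_to_human"]
    return [intent for intent, _ in picked]

# A query like "what's my balance and how do I block my card" might score:
scores = {"check_balance": 0.48, "block_card": 0.41, "greet": 0.02}
print(intents_to_answer(scores))  # -> ['check_balance', 'block_card']
```

The bot then either answers both intents in turn or presents them as a choice to the user.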
Vineeth has a good question: how do we detect typos and correct them before classifying? Very good question. Broadly speaking, all typing mistakes can be corrected using two kinds of methods. First, there is a data-structures-and-algorithms approach: there is a data structure called a trie, a very common structure used to find spelling mistakes; you can just google for it, there are tons of spell checkers available, and the trie is the most basic system used there. There are more complex data structures too; ternary search trees, for instance, are sometimes used. The second approach is a machine-learning approach: there is actually a very nice, slightly old blog post on spelling correction by Peter Norvig, a director at Google Search, where he uses a simple Bayesian approach (not even naive Bayes, simpler than that) to show how one of the very first spell-check systems can be built from historical data. It's a very nice blog; let me post the link in the live-stream chat. It's a very simple probabilistic model: you are given a word w, and you have to find the probability of the correct candidate word given the user-typed word. So
it's a very simple probabilistic system. You are given a word w, and C is the set of all candidate words that could replace w, assuming w has a typo. Now, assume you have historical data of people typing the incorrect word and then the correct word, say because they self-corrected in the past; so you have (wrong word, correct word) pairs. Using this historical data you just compute, for each wrong word, which candidate word has the highest probability given that wrong word, and pick it. If you have a lot of historical data, this is just simple Bayesian reasoning, and you can actually implement it as a simple lookup table: given this wrong word, this correct word was the most probable one in the historical data; you just compute that conditional probability and store it. This is one of the simplest machine-learning models you can use; of course there are more complex models too. So spell checking is a very nice topic in computer science in general: you can solve it with data-driven approaches or with plain data structures and algorithms. Aditya has a good question: how do we deal with irrelevant intents, for example if somebody types vulgar words? A lot of people just play around with bots, so you have to be courteous as a bot.
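The lookup-table spell corrector described above, estimate the most probable correction for each typo by counting historical (wrong, correct) pairs, can be sketched as follows; the history data is made up:

```python
from collections import Counter, defaultdict

def build_correction_table(history):
    """history: iterable of (typed_word, corrected_word) pairs observed
    in past chats. For each typo keep the most frequent correction,
    i.e. the argmax over candidates c of the counted P(c | w)."""
    counts = defaultdict(Counter)
    for typed, corrected in history:
        counts[typed][corrected] += 1
    return {typed: c.most_common(1)[0][0] for typed, c in counts.items()}

def correct(word, table):
    """Unseen words pass through unchanged."""
    return table.get(word, word)

# Invented self-correction history from past chat sessions.
history = [
    ("ballance", "balance"), ("ballance", "balance"),
    ("ballance", "ballast"),  # a rarer correction loses the vote
    ("acount", "account"),
]
table = build_correction_table(history)
print(correct("ballance", table))  # -> balance
```

Norvig's full model additionally generates candidate edits for unseen typos and weights them by a language-model prior P(c); this table is only the memorized-history core of that idea.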
There are two possibilities. One: the user is asking a sensible question but the bot can't answer it; in that case, hand it off to a human executive who can. Two: the user is typing ridiculous, out-of-context stuff; then the bot can give a simple standard reply: "here is what I can do; what you have asked, we are not able to understand; would you like to chat with one of our executives?" That's how most such systems handle it. Next: what if the user mixes Hindi and English? Very good question, Rahul. Your dataset needs to cover that: BERT-based models are not magic, they are mathematical models at the end of the day, so if you haven't trained on Hindi-plus-English text, the model will mess it up. That's one of the biggest reasons why models trained on English-speaking datasets don't work well in non-English-speaking countries; it's a very big challenge. In India, somebody could type a grammatically completely wrong sentence and your NER goes for a toss unless you have trained it on Indian user data; if you train only on American, British, or Australian usage, it will fail, and that happens a lot. Hema Malini Nithyanandam says she wants to build a chatbot in Tamil. You just have to gather data for it: you need a large corpus for at least intent classification and entity recognition; there are other parts too, but at least those two pieces you should have in place. That's one of the reasons why
many of these systems are not built in non-major languages: the datasets are not there. I wish major state governments would invest resources, because there is a huge population in India that doesn't speak English and we still need to serve them with our chatbots; as more and more mobile phones reach people's hands, we should serve them in their own languages. There are chatbots in German, because in Germany everybody uses German as their primary language; there are chatbots in Korean and in Japanese; but not so many in Indian languages, because today we don't expect non-English-speaking people to use them. That's changing fast, and companies and governments have to collect the data for sure. Next, Kamaleswaran asks: are there any pre-trained techniques to generate a single question in multiple ways? That is, you give a question as input and it generates other, similar questions. If you have data for it, for example a question plus rephrasings of it, then this becomes a generative, sequence-to-sequence problem, and that part is easy. Imagine training data where you have a sentence and its rephrased versions: this sentence is the input, a rephrased sentence is the output, and there actually are some rephrasing datasets in English. Suppose you have a source sentence and, say, 10 similar rephrased sentences; group them and you get 11 sentences, then create all the 11-choose-2 pairs, each one as an input-output example.
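Building those training pairs from a group of rephrased sentences is a one-liner; since either sentence of a pair can serve as input, ordered pairs (n·(n−1) of them, twice the n-choose-2 unordered count) give the seq2seq model the most examples. The sentences here are invented:

```python
from itertools import permutations

def make_training_pairs(group):
    """Given sentences that are rephrasings of each other, emit every
    ordered (input, output) pair for a sequence-to-sequence model.
    n sentences yield n * (n - 1) ordered pairs."""
    return list(permutations(group, 2))

group = [
    "what is the minimum balance",
    "how much balance must i keep",
    "minimum amount to maintain in my account",
]
pairs = make_training_pairs(group)
print(len(pairs))  # -> 6 (3 sentences give 3 * 2 ordered pairs)
```

Each pair then becomes one (source, target) training example for the transformer.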
Feed those pairs to a sequence-to-sequence model; if you have such a dataset, it's very easy to do. I'm solving this right here as I think about it; I've never solved exactly this problem, but I think it can be done if you have sets of similar rephrased sentences or questions, and I've seen a few small datasets where sentence-rephrasing data exists. And a very good suggestion from the chat that I didn't think of: we have a case study called Quora Question Pair Similarity. Take all the Quora data, group the similar pairs, and give one question as the input to your transformer (BERT, GPT, whatever) and the other question as the output; that's it, you can train that model. It didn't strike me earlier, but yes, you can get the data from Quora question-pair similarity and train a sequence-to-sequence model pre-trained on an English corpus; I think it should work fairly well. It's a nice, fun project; if you know how Hugging Face works and you know BERT fairly well, I think it can be done in a week or two if you sit on it. Cool, sounds good. Sorry, I didn't look at the time; it's already three minutes past nine. Thank you for joining this session, folks; I hope you have learned a few things about thinking from the design aspect of solving machine-learning problems. We'll announce the next live session; it'll be either the 13th or the 20th of September, where we'll do more code walkthroughs. Okay, see you soon, bye.
Info
Channel: Applied AI Course
Views: 29,863
Rating: 4.9172058 out of 5
Keywords:
Id: ImzRs8eORsM
Length: 126min 32sec (7592 seconds)
Published: Sun Sep 06 2020