Intelligent Voice Assistant in Python

Captions
What are we going to learn in today's video? We will learn how to program a simple voice assistant, like me, by the way. Subscribe to NeuralNine!

What is going on, guys? Welcome back. In today's video we're going to build a simple voice assistant in Python, so let us get right into it.

All right, let us get started and open up a command line to install a couple of libraries. First of all we install speech recognition (pip install speechrecognition) so that we can talk to the voice assistant. We also want to hear what the voice assistant is saying, so we need pip install pyttsx3, which is text-to-speech for Python 3. And then we need the library that I have coded myself, neuralintents (pip install neuralintents). It is a basic interface that we can use to easily build chatbots and intelligent assistants.

Then we import everything: from neuralintents we import GenericAssistant, we import speech_recognition (with an underscore), we import pyttsx3 as tts for the text-to-speech part, and we import sys, which is part of the Python standard library and is just used to exit the program if needed.

Now, as in the other chatbot and virtual assistant videos, we need an intents file, because this voice assistant is going to respond to us based on intents and tags. The intents.json file I'm using for today's video is not complicated at all, so don't despair if you're seeing this for the first time; it's a very simple structure. At the top level we have curly braces with a single key, "intents". The intents themselves are a list of JSON objects, and each object has the keys "tag" (a string), "patterns" (a list of example phrases) and "responses" (also a list).

One of those entries is the intent with the tag "greeting". Patterns that fall into this category are "Hey", "Hello", "What's up", "How is it going", "Hi", "Good day" and so on. It's important to provide a bunch of different samples here, because this is what the machine learning part trains on. The responses, on the other hand, are static: they are exactly what the voice assistant or chatbot will give us as an answer, for example "Hello, sir!" or "Hello, what can I do for you?". So the responses need to be fixed, while the patterns can be a little more flexible; we just provide a handful of imperfect attempts to greet the chatbot, and the model will then also recognize similar patterns. It doesn't have to be exactly "what's up", it can also be "what is up", for example, and that goes for all of the intents.

As you can see, some intents don't have any responses at all. That is because, for a voice assistant, we're often not interested in a message but in an action. When I say "add it to my to-do list", I don't want a static reply like "Hey, I'm going to add this to your to-do list"; I want something to happen. So we bind the individual tags to actions: we will have a function add_todo, and whenever the tag add_todo is recognized from one of its patterns (or a similar phrase), that function gets triggered and does whatever we want it to do. That is the basic structure of this project, and a sketch of such an intents file is shown below.
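The exact file from the video is not reproduced on this page, so the following is only a hedged sketch of what an intents.json with the described structure could look like; the tags create_note, add_todo, show_todos and exit are assumptions based on the functions discussed later, and the action intents leave their responses empty because they are mapped to functions instead:

```json
{
  "intents": [
    {"tag": "greeting",
     "patterns": ["Hey", "Hello", "What's up", "How is it going", "Hi", "Good day"],
     "responses": ["Hello, sir!", "Hello, what can I do for you?"]},
    {"tag": "create_note",
     "patterns": ["Please create a new note", "I want to create a new note", "Create a note", "New note"],
     "responses": []},
    {"tag": "add_todo",
     "patterns": ["Add an item to my to-do list", "Add a new to-do", "Remember this to-do"],
     "responses": []},
    {"tag": "show_todos",
     "patterns": ["Show my to-do list", "What is on my to-do list?", "Show my to-dos"],
     "responses": []},
    {"tag": "exit",
     "patterns": ["I want to exit", "Goodbye", "Bye", "Quit"],
     "responses": []}
  ]
}
```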
All right, let's do some setup work. We need to set up a recognizer (the speech recognition part), initialize a speaker, and create the assistant. First of all we say recognizer = speech_recognition.Recognizer(), and that's actually it. Then we say speaker = tts.init(). Once we have that, we also call speaker.setProperty to set the rate, depending on how fast you want your assistant to talk: something like 300 is quite fast, something like 20 talks in slow motion, and I use 150, which is a good speed for me.

As you saw in the JSON file, we also have a to-do list: we're going to add to it and show it, so we need an object for the actual list. We just call it todo_list, and rather than starting empty it gets some initial values: "Go shopping", "Clean room", "Record video".

Now we want to train a model that recognizes the intents and maps them to functions, and for this we use the neuralintents library I've written. We say assistant = GenericAssistant and specify the path to the intents file, 'intents.json'. We could also provide some mappings here, but we're not going to do that yet because we don't have the functions; I'll explain how that works in a second. Once we have that, we call assistant.train_model(), and that's all we need to do: we point it at intents.json, we train the model, and the model is ready.

Once it's trained, we can call assistant.request() and pass a message, for example "How are you?". If it recognizes that as an intent, it gives us the right response: it checks which intent "How are you?" fits, for example greeting, and responds with "Hello, sir!". Or, if there is no response but there is a function mapped to that specific tag, it executes that function instead. The setup so far looks roughly like the sketch below.
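A minimal sketch of that setup, following the calls described in the transcript (the GenericAssistant constructor takes the path to the intents file; the intent_methods mapping is added later, once the action functions exist):

```python
from neuralintents import GenericAssistant  # intent matching and response handling
import speech_recognition                   # microphone input to text
import pyttsx3 as tts                       # text to speech
import sys                                  # used later to exit the program

# Speech recognizer and text-to-speech engine
recognizer = speech_recognition.Recognizer()
speaker = tts.init()
speaker.setProperty('rate', 150)  # 300 is quite fast, 20 is slow motion, 150 is comfortable

# The to-do list starts with a few example entries
todo_list = ['Go shopping', 'Clean room', 'Record video']

# Train the intent classifier on intents.json
# (intent_methods=... is passed later, once the action functions are defined)
assistant = GenericAssistant('intents.json')
assistant.train_model()
```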
The basic idea of the mappings is this: suppose we have some function, say some_function, that just prints "Hello world". If we want that function to be called whenever a certain intent is recognized, we specify a mapping, for example mappings = {'greeting': some_function}. If the assistant recognizes that my input was of the intent greeting, it executes some_function. It's important that we don't call the function there; we pass the function itself. Once we have that, we pass intent_methods=mappings, and by doing so we provide a dictionary that says "this tag means calling that function". Then we just call request, and depending on whether there is a function in the mappings for the recognized tag, the assistant either executes that function or returns one of the responses we typed statically into the JSON file.

We have covered all of this in previous videos already, so what is new here? The new part is that we use voice: we're not just going to chat with the bot, we're actually going to talk to it and get voice output as well.

So let's implement the first function, create_note. The tag create_note is triggered by phrases like "Please create a new note", "I want to create a new note", "Create a note", "New note" and so on; anything that contains the phrase "new note" will probably be recognized. It's a very basic function, even though it's not going to be short. We start with global recognizer so that we can work with (and, if needed, replace) the recognizer, and then we say something to the user: speaker.say("What do you want to write onto your note?") followed by speaker.runAndWait(), which actually produces the speech.

After that we wait for the user's input, and this is a bit more involved than calling input(): we go into a loop, because recognition sometimes fails, and in that case we don't want the program to end, we want to try again. So we set done = False, and while not done we try to process the user's input: with speech_recognition.Microphone() as mic, we call recognizer.adjust_for_ambient_noise(mic, duration=0.2) (you can tweak that value if you want), and then audio = recognizer.listen(mic). That is the listening part: based on that threshold it recognizes when we're talking, and whatever we say is saved into the audio variable.

Then we extract the text from that audio: note = recognizer.recognize_google(audio) (you could also use IBM, Bing or other providers, but we'll use Google here), followed by note = note.lower() so that case doesn't cause any confusion. That is the note we just dictated into the microphone.

We're still not done, though, because we also want to choose a file name. So we say speaker.say("Choose a filename!") and speaker.runAndWait(), then again recognizer.adjust_for_ambient_noise(mic, duration=0.2) and audio = recognizer.listen(mic), and then the same thing as before: filename = recognizer.recognize_google(audio) and filename = filename.lower(). The file name will be a little tricky if you use more than one word, because it will contain spaces, but handling that is optional.

If all of this worked, we open a file with that name for writing, with open(filename, 'w') as f, and write the note into it with f.write(note). Once that is done we set done to True, because the process is finished, and we also say speaker.say(f"I successfully created the note {filename}") (adding the file name makes it an f-string), followed by speaker.runAndWait().

And what happens if it doesn't recognize what we're saying? In that case we get an exception, so we add except speech_recognition.UnknownValueError. If that exception occurs, we simply re-initialize the recognizer with recognizer = speech_recognition.Recognizer(), so we make a new instance, and then we say speaker.say("I did not understand you. Please try again!") and speaker.runAndWait(). Since we're inside the while loop, we don't break out of it; we catch the exception and start over. That is the basic create_note function.
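Putting that together, here is a sketch of create_note along the lines just described (the spoken prompts follow the transcript's wording; everything else is standard speech_recognition and pyttsx3 usage):

```python
def create_note():
    global recognizer

    speaker.say("What do you want to write onto your note?")
    speaker.runAndWait()

    done = False
    while not done:
        try:
            with speech_recognition.Microphone() as mic:
                # Dictate the note content
                recognizer.adjust_for_ambient_noise(mic, duration=0.2)
                audio = recognizer.listen(mic)
                note = recognizer.recognize_google(audio)
                note = note.lower()

                # Ask for a file name and dictate that as well
                speaker.say("Choose a filename!")
                speaker.runAndWait()
                recognizer.adjust_for_ambient_noise(mic, duration=0.2)
                audio = recognizer.listen(mic)
                filename = recognizer.recognize_google(audio)
                filename = filename.lower()

            # Write the dictated text into the chosen file
            with open(filename, 'w') as f:
                f.write(note)
                done = True
                speaker.say(f"I successfully created the note {filename}")
                speaker.runAndWait()
        except speech_recognition.UnknownValueError:
            # Recognition failed: make a fresh recognizer and ask again
            recognizer = speech_recognition.Recognizer()
            speaker.say("I did not understand you. Please try again!")
            speaker.runAndWait()
```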
Next, let us implement a simple add_todo function for adding something to the to-do list: def add_todo, and here we also say global recognizer. I'll go through this a bit faster because it's largely the same. We say speaker.say("What to-do do you want to add?") and speaker.runAndWait(), then again done = False, because we can always make mistakes, and while not done we try: with speech_recognition.Microphone() as mic, we call recognizer.adjust_for_ambient_noise(mic, duration=0.2) and audio = recognizer.listen(mic). The item we want to add to the list is item = recognizer.recognize_google(audio), followed by item = item.lower(). Once we have it, we do todo_list.append(item), mission complete, done = True, and the speaker says f"I added {item} to the to-do list". Actually, most of the time it pronounces that as "toto list", so maybe if we spell it "to-do list" it will say it properly; let's see. And of course speaker.runAndWait().

Again we add except speech_recognition.UnknownValueError, and in that case we just reinstantiate the recognizer with speech_recognition.Recognizer() and say speaker.say("I did not understand. Please try again!") followed by speaker.runAndWait(). That is the basic add_todo function.
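A sketch of add_todo follows the same pattern as create_note, with a single listening step and an append to the list:

```python
def add_todo():
    global recognizer

    speaker.say("What to-do do you want to add?")
    speaker.runAndWait()

    done = False
    while not done:
        try:
            with speech_recognition.Microphone() as mic:
                recognizer.adjust_for_ambient_noise(mic, duration=0.2)
                audio = recognizer.listen(mic)
                item = recognizer.recognize_google(audio)
                item = item.lower()

            # Store the recognized item and confirm via voice
            todo_list.append(item)
            done = True
            speaker.say(f"I added {item} to the to-do list!")
            speaker.runAndWait()
        except speech_recognition.UnknownValueError:
            recognizer = speech_recognition.Recognizer()
            speaker.say("I did not understand. Please try again!")
            speaker.runAndWait()
```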
Then we can write a simple show_todos function, def show_todos, which just iterates over the to-do list; we don't need any interaction or listening here. We say speaker.say("The items on your to-do list are the following:"), then for item in todo_list we call speaker.say(item) (we don't even need an f-string, we can just pass item), and at the end, of course, speaker.runAndWait(). There you go.

Last but not least, I want to add a greeting and an exit function, because the problem with the module right now is that I haven't implemented putting the static responses out via voice. It's not hard to do, I just haven't done it. So we also implement a function for the greeting: def hello is simply speaker.say("Hello. What can I do for you?") followed by speaker.runAndWait(), nothing too complicated. And then def quit, or whatever you want to call it; exit would be the natural name, but exit is already taken in Python, so I'll use quit. There we just say speaker.say("Bye"), speaker.runAndWait(), and then sys.exit(0).

Those are all the functions. Now we create the mapping dictionary: mappings maps the tag "greeting" to hello, so whenever the assistant recognizes the tag greeting it calls hello; whenever it recognizes create_note it calls the create_note function; whenever we're trying to add a to-do it calls add_todo; whenever we're trying to show the to-dos it calls show_todos; and whenever we're trying to exit it calls quit. Those are the mappings, and we pass them to the assistant as intent_methods=mappings.

The training is already in place, so what remains is the endless loop of interacting with the assistant: while True, we try: with speech_recognition.Microphone() as mic, recognizer.adjust_for_ambient_noise(mic, duration=0.2), audio = recognizer.listen(mic), then the message we're saying to the voice assistant is message = recognizer.recognize_google(audio), message = message.lower(), and we call assistant.request(message). This is the central call: it either gives us a response or it triggers and calls one of the mapped functions. And in the case that we get the UnknownValueError, we just do nothing: we reinstantiate the recognizer and don't even say anything, we simply wait for another input, because if you're talking to a voice assistant and it ignores you, you just try again. That's fairly intuitive. The remaining functions, the mappings and the main loop are sketched below.
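A sketch of the remaining pieces, assuming the GenericAssistant interface described in the transcript; note that the constructor call from the setup sketch above now receives the intent_methods dictionary, and that the tag names must match the ones in your intents.json (here they match the assumed file sketched earlier):

```python
def show_todos():
    speaker.say("The items on your to-do list are the following:")
    for item in todo_list:
        speaker.say(item)
    speaker.runAndWait()

def hello():
    speaker.say("Hello. What can I do for you?")
    speaker.runAndWait()

def quit():
    # Named quit because exit is already taken in Python
    speaker.say("Bye")
    speaker.runAndWait()
    sys.exit(0)

# Tag -> function: pass the functions, do not call them
mappings = {
    'greeting': hello,
    'create_note': create_note,
    'add_todo': add_todo,
    'show_todos': show_todos,
    'exit': quit
}

assistant = GenericAssistant('intents.json', intent_methods=mappings)
assistant.train_model()

# Endless interaction loop: listen, transcribe, hand the text to the assistant
while True:
    try:
        with speech_recognition.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic, duration=0.2)
            audio = recognizer.listen(mic)
            message = recognizer.recognize_google(audio)
            message = message.lower()
        assistant.request(message)
    except speech_recognition.UnknownValueError:
        # Say nothing, just reset the recognizer and wait for the next attempt
        recognizer = speech_recognition.Recognizer()
```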
All right, let's give it a try: we just run it and see if it works. By the way, one thing I want to mention in between: once we have trained the model, we can save it and load it by calling assistant.save_model() and assistant.load_model(), so that we don't have to retrain it every time we run the assistant. I'll show that in a second.

Here is the test run:

Me: Hello.
Assistant: Hello. What can I do for you?
Me: Add an item to my to-do list.
Assistant: What to-do do you want to add?
Me: Test task.
Assistant: I added test task to the to-do list.
Me: Show my to-do list.
Assistant: The items on your to-do list are the following: go shopping, clean room, record video, test task.
Me: Create a new note.
Assistant: What do you want to write onto your note?
Me: This is just a basic test note.
Assistant: Choose a filename.
Me: My note.
Assistant: I successfully created the note my note.
Me: I want to exit.
Assistant: Bye.

There you go, it works perfectly fine. We can now open the file "my note" and see that it contains just the basic test note. Of course, instead of opening the dictated file name as-is, we can append ".txt" to it: where we open the file, we turn the name into an f-string, f"{filename}.txt", if you want to.

Other than that it worked perfectly fine, as you saw. Again, you can go ahead and call assistant.save_model() if you want to, and next time just assistant.load_model(); you don't even need to specify any parameters if you're happy with the default name, it's handled automatically for you. And that is how you build a simple voice assistant in Python.

All right, that's it for today's video. I hope you enjoyed it and learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell so you don't miss a single future video, for free. Other than that, thank you very much for watching, see you in the next video, and bye!
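As a small follow-up sketch of those last two tweaks, building on the code above (save_model and load_model with no arguments use the library's default model name, as stated in the video):

```python
# Inside create_note from the sketch above, the note can be written with a
# .txt extension by turning the file name into an f-string:
#     with open(f"{filename}.txt", 'w') as f:
#         f.write(note)

# Train once and persist the model, then reload it on later runs instead of retraining
assistant = GenericAssistant('intents.json', intent_methods=mappings)
assistant.train_model()
assistant.save_model()   # default model name when no argument is given

# On the next run, skip train_model() and just call:
# assistant.load_model()
```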
Info
Channel: NeuralNine
Views: 34,687
Rating: 4.9563141 out of 5
Keywords: python, voice assistant, alexa, echo, google, voice, assist, alexa clone, intelligent, virtual assistant, AI assistant, virtual AI assistant, AI voice assistant, tutorial, programming, coding, simple
Id: SXsyLdKkKX0
Length: 24min 43sec (1483 seconds)
Published: Tue Apr 13 2021