JARVIS : A ChatGPT (OpenAI) Powered Raspberry Pi Based Voice Assistant

Video Statistics and Information

Video

Captions Word Cloud

Captions

Jarvis can you hear us yes I can hear you loud and clear how can I assist you today okay so hello everyone this is me Arijit back with a new video now I think from this intro now you know what we are going to build in this video so yes we are going to build a voice based AI assistant using open AI okay so at the end this is how it is going to look like so you can like similarly to chat gbt you can ask charging with any question and it gives you the answer similarly here also you can ask any question to this AI assistant and it is going to give you the answer okay but here the thing is you don't need to write or you don't need to read you can give the commands to voice and it is going to speak dances for you so the whole system we are going to build to it Raspberry Pi but if you want to build a tweet Mac or Windows or Linux you can also do that okay the procedure is completely same okay now in this video at first we are going to create our openai account we are going to get the API key then we are going to see how the code is working finally we are going to do the setup in Raspberry Pi and we will see how it is working so all these we are going to cover in this video and now if you haven't subscribed to the channel yet please subscribe to the channel so that whenever we make interesting videos like this you will get update and now without wasting any time let's get started okay guys so let's start the video now before we get into the coding part so let me tell you the setup I'm using now here I am using Raspberry Pi 4 8GB version so as you can see I'm using Raspberry Pi for 8G version you can also use 4GB version even with Raspberry Pi 3 1GB RAM you can also do the same thing okay next for the mic I am going to use this muano mic okay this is a USB mic you can use any USB mic like nowadays most of the USB mics works with Raspberry Pi or if you don't have a USB mic you can use a webcam also webcam is also going to work so you can use any Logitech or HQ webcam I have a Logitech webcam Logitech C270 I guess and I also have HP uh W3 and there both of those like works with raspberry by quite well okay so you can use those webcams or you can use any USB mic that is going to work perfectly okay and finally I'm using this small speakers okay these are basically USB powered speakers so I have connected them with a Raspberry Pi so the aux cable okay so this is the setup I'm using okay but obviously you can use other speakers or other like I mentioned other webcams for the mic or anything you want okay next now let's talk about the coding part now here uh at first you have to clone this repo uh open a highways uh so open AIP is voice chat board in lasbury by your PC so and first we'll get into Raspberry Pi now here I already have connected my Raspberry Pi and I am going to control my PI from this VNC okay but you can also use create HDMI connection or whatever condition you want no problem in that now uh what we have to do is we have to clone the repo so what I'll do I'll just go to I'll at first I'll just delete this okay let it be here so what I'll do is I'm going to clone it so git clone and I'm going to basically clone this one so I'll show you the whole procedure so not here so get clone and this link so it will take few seconds to clone it once it will blow it you can go inside it so it is like open AI based this and inside there are two files this is open AI bot and there's this readme.md these are the two files we have there okay now before we do something else we have to install the dependencies now dependencies and their installation meters are already mentioned in the GitHub repo so you just need to execute all these commands so you need the port audio you need the python Pi audio okay you need the speech view of mission you need the pi ttx3 you need open AI Unity speak and units flag all these things you need okay and actually there is one small thing that you need to change virtual change here itself so there is instead of python it will be Python 3 now okay so in with python also it may work but yeah python c will be a safer option okay so this is all things you need to install and why you need to install all these things so speech recognition obviously you need to install because here we are using the python speech recognition library for the speech recognition part uh next Pi TTX 3 this port audio this Python 3 Pi audio and uh also this flag and you speak all these things are mainly we need uh at least the flag is peak and mainly the port audio this part will mainly need for the uh speaking review a speaking thing okay so because here we are using pi TT XT library in Python for the founder text to speech conversion so for that we need all these things and Pi audio we also need for the species organization part okay and also open AI obviously need openai because the main uh so what happening here is basically when we are speaking first we are doing the speech recognition from there we are getting the keywords or whatever we are saying so let's say who is Tony Stark so speech recognition part is going to uh from the speech it's going to say okay that person has said who is Tony Stark in text now that text we are going to send to the open AI okay using Qi API and from there we go I'm going to get a response now that response is a text now that text we are going to again convert into our speech okay so for that part we are going to expi TTX uh Pi ttx3 okay so this is all the libraries we are using and that is the reason why we are using you can use this commands and install those libraries I already have installed okay so once the library is installed there is only one thing remaining that is you need to open AI API key how you can get it is very simple you just have to do uh open AI API you can we'll get to go to this page here you have to login once you log into it so you can use login so obviously if it's first time then you have to create account okay but you don't need to pay anything and also open AI when the for the very first time you will create account you are going to get five dollar like five USD or trial okay so for the whole project you also don't need to be anything and five USD is actually enough for this project okay now here you have to go to API okay and in API you have to go to like personal from there you have to go to view API Keys okay now here very important thing is this billing part you have to go to billing and here you have to see what is your remaining Credit Now in my this account I already have my like I already have used the five USD credit so I don't have any credit left but in your case you may have the credit okay but also there is one thing that I think there is some expiration of the credit so maybe if your account is too old and you are using chat activity for a very long time maybe it is zero in that case what I recommend you you create a new account you just need a new email a new phone number that's it okay so you create a new account with a new email and phone number and then maybe you will again get a five USD trial okay so if that trial you can use for this project and also if you are also if you are creating this thing for uh like you want to use it for a longer time or something different because you can also start the billing okay now uh here after you make sure that here you have something left or you are starting you have started the payment billing then you go to the API keys there you have to create a new secret key here you have to just do you have to give a name so whatever in my case like I can give it something like chat bot you can create a secret key and once the key is created you can copy it done and then you have to copy the key so in my case I have a key already there so I'm not going to use this key and I'll delete this later okay so after you have created this you just copy this key now I'll show you where you have to paste it so just a minute here now let's go into the code open AI bot dot pi now here there is this part where you have something like uh this one open AI dot API key is equal to XYZ here in space of x y z inside the quotation you have to write the whole key okay you have just copied now uh so let's talk about the code first then we'll run it so obviously like I said we are going to use the speech recognition API not API library next we are going to spy TX and openai okay then basically here we are creating a pi TX engine so that we can convert a text to speech okay and then basically here one thing is important that in in the messages you have to create a role for your open AI uh basically the AI chatbot so in my case the role I have assigned a system okay and also the content contains you have to say like what its main role is you can say not role what it should do something like that so I have said that your name is Jarvis and give answers in two lines okay so that's a very simple uh chatting application I have just given your name is this and this is you are going to answer in two lines you can say that you are a business consultant and you should work in this way uh you are a school teacher you should answer in this way in this way you can mention okay but in my case this is how I have mentioned but you can change this thing then basically here all these things are basically for the text to speech parts of voices we are going to get the voices property we are going to get a rate property and volume property so we are getting the properties so that we can modify things okay so in pi TTS uh Pi TT SX 3 we have multiple voices the names we can change the volume also we can change okay so you can play around with the settings okay based on the settings The Voice will get changed I'll show you my settings so and find out okay so that part we'll get into but before that we have this get response uh function where we are giving the user input so let's say I said who is Tony Stark so in this function who is Tony Stark is going to be passed next messages dot happen so in message is going to happen the role is user because I'm the user okay it's the system so here the AI application is a system and I am the user okay now user and content is user input so whatever I have said that is the content names we are going to basically using the openide.chatcomplete.create okay this method we are going to use and model we are using GPT 3.5 turbo and there we are going to pass the messages now remember messages is a list it's not a single uh what I can say it's not that a single uh user input so whenever you start talking with it so the very first thing is you are assigning it a role and you are saying this is your content from next time whatever you are saying you said who is Tony Stark so this is one uh this is basically one particular chat next you are again saying sir or not chat let's say this is a very particular user input next maybe it is giving you some reply that is reply that is uh what you can say is like output from uh the application side then again you are saying something that who is Jarvis and then again it is replying again you are saying that who what is this so all these things these are like I would say like a list of uh inputs and outputs so what you are giving is the input and what this application is giving is output so messages is a list of input and outputs okay so every time you we are going to pass a chat GPT uh basically our user input we have to send it messages so it will note a context okay because whenever because it's not that uh you so whenever I'm asking something to this application the answer will also depend on my previous text so maybe I have said something like uh who is Tony Stark first then it gave me some answer now now I can ask like please tell me little bit more about him now when I'm saying him like this application should know that him here refers to Tony Stark and for that it needs the context okay so that is why here we are going to send the list of messages or list of input outputs okay okay that we are sending and then chat gbt reply we are getting the response and this response we are basically again appending in the messages okay because this is basically the output from the application and then we are basically returning that activity reply okay then finally here while listening this is an infinite so listening if you see is true so it's a finite Loop basically okay until you exit from it here okay by pressing Ctrl C next with Sr dot micro microphone as source so basically we are here using the so whatever you can use uh like I already mentioned any USB mic or whatever mic you are using so that microphone is going to use as a source next basically it is using the it is going to create a recognizer object from their edges for ambient noise this is very important so why what this thing will do is basically when you have a large noise around you it is going to set the adjacent noise according to that so it is a very important thing so if you are enough so I recommend you that whenever you are using this application you should be in a empty room at least with less noise okay but even if you have a high noise it is going to adjust that noise okay and why it needed see what happens is uh when this application is listening to you it has to recognize when you are starting talking and when you are ending like where your sentence is end okay so that's why in it need to get the ambient Noise Okay so for example now it's nothing and then I start talking so it will know this is a start and now here I it ends okay so now it knows here's the end because then again so this sound level from this sound level it knows where's the start at the end so now let's say there is a lot of noise here in that case it's very hard for this application to know where to start and end so that's why it needs to adjust the ambient noise okay but still if there will be a lot of noise it's really hard for the at least for this speechy of Mission Library to get it properly okay um and that's it okay and again here we are setting the dynamic energy threshold and finally here your triclatch block inside that we are using the recognize.listen source and timeout 5.0 so that means that whenever it is listening to you if for a very particular time you are not saying anything then again basically what you will do it will take the voice okay and it will it will try to recognize something from it and if it can then again if you start from the beginning so it is like it is listening to you it is it is going to listen to you for five seconds in five seconds if you if you don't say anything it is going to send it is going to try to recognize from that five second speech and if it doesn't get anything then again it will start from the beginning so again five second and then again stop try to recognize again five second stop right reorganizing in five seconds and this switch is going to do okay now yes and there we are going to get the response of recognizer dot recognize Google okay so audio using the basically this Library you are going to get the response and finally we are printing the response also okay now if Jarvis is in the response dot lower so the overall sentence we are converting to lower case and we are checking if Jarvis is in the response okay now here you can put anything you can put it any name instead of Jarvis you can put a digit sparkless whatever you want and then if that very particular word is in the central response then only it is going to send that response to that function get response from there it is and here also it is we are basically setting the property rate as 120 volume as uh basically volume okay and then this voice as Greek now I have tested few things and I think like this Greek voice I really liked it so that's why so most of the things are almost same uh but still I have used this Greek voice okay and so basically volume or also you can change the volume if you want okay but I haven't changed it but you can make it bigger like you can make it more loud or less loud up to you okay and finally it is the engine dot say response from open AI so the response whatever you are getting the output that we are going to uh speak through the uh using the pi TTX so it is uh so the whatever speaker will connect you are going to hear this audio from the speaker okay that's all it and then engine.runner weight so run and wait what it will do is basically whenever it is saying something so the output is saying it is going to wait okay so it is not going to again start listening to you okay and that that's it and then finally if we don't have Jarvis or whatever the word it is so it is going to say didn't recognize your wish and if it is internet issue or something problematic then if you say didn't recognize anything okay so this is what happening here so only things you need to change API key and if you want to change Jarvis to if you want to use the keyword something else in that case you can do that okay so that is all we are having in this code now let's try to run this code so in my case in some other in another file I already have saved my API key so I'll just go to that file so I'll just go to that file and from there I am going to run this same thing again I'll go to run it so Python and open uh so I think smaller yeah open AI bot dot Pi so once I run it can you hear us yes I can hear you loud and clear how can I assist you today Jarvis can you write a message for my YouTube channel sparklers we are the makers viewers so that they will subscribe to my channel so it got it attention and creators join our YouTube channel and be part of our creative Journey so try now for exciting DIY project tutorials and inspiration at midnight your imagination okay so this is how it is working okay now few things few improvements that can be possible we will discuss and then maybe in some other video I'll do that so very first thing is the voice is not that good it's little bit robotics wise but there are a few more libraries there and also some apis online apis using which actually we can create better voices so there is something called as Festival then there is this Google API using which you can actually get a voice then there is I think uh I think uh or also I'm not sure maybe Azure also has this some kind of API using which you can also convert text to speech those kind of things so maybe in some other video we can talk about that so that's one thing another important thing is this speech recognition part here uh basically when you say so every time it so one problem is that all the time it is basically listening to us and then it is trying to recognize but it shouldn't be the case so even I'm not saying Jervis still it is trying to recognize but it should not do it so to solve this problem what you can do we can use what what detection so whenever I am going to say Jarvis only after then we are going to use that speech recognition basically uh Library so that once I say Jarvis then only it will try to recognize so here what is happening that you like uh what happened just uh before sometime that I said something maybe something random I'm saying and still it is trying to recognized because after recognition only we can check if there is Jarvis in the sentence or not but it should not be the case what should happen is once I say Jarvis then only where whatever I'll say that should be on that part only should get recognized okay not everything so that thing we can actually do with whatever detection maybe in some other video I will actually do that or maybe you guys also can do it and let me know that you updated something like that so that's some improvements you can really do but as a basic project this is quite good okay and you can actually make it of your own or you can do the improvements okay and also you can let me know what improvements you have did also if you have any questions regarding to this project or any part of this video you let me know in the comment section I'll try to solve that problem and now I think that's it all about with this video okay guys thank you for watching this video I hope you have learned something from this video in that case please hit the like button subscribe to the channel so that whenever you are going to make the next content you will get the update and many more contents we are making right now and very soon you are going to get all of those so stay tuned with us and I'll see you in the very next video

Info

Channel: SPARKLERS : We Are The Makers

Views: 9,430

Rating: undefined out of 5

Keywords: JARVIS, VoiceAssistant, RaspberryPi, OpenAI, SmartHome, AIAssistant, HomeAutomation, ChatGPT, TechInnovation, FutureTech, VoiceControl, ArtificialIntelligence, MachineLearning, Innovation, DIY, HomeTech, Automation, Project, OpenSource, SmartLiving, TechEnthusiast, HomeAutomationIdeas, VoiceRecognition, RaspberryPiProjects, SmartHouse

Id: EZPWbXPlxIM

Channel Id: undefined

Length: 21min 16sec (1276 seconds)

Published: Wed Sep 20 2023