Making NPCs with ChatGPT in Unity! 2

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
welcome everybody this is going to be the second smart NPC video I'm making um I have recorded quite many different videos in the middle um we discussed many different subjects like um running it in webgl Android and iOS um text completion versus chat GPT when the API arrived speech to text with whisper API text to speech with Amazon poly and streaming responses again with chat GPT I also covered Oculus lip sync on Ready Player me avatars so everything comes together to add some new things to our smart MPC project because last time where I left off the project was actually quite basic we just had a chat with the MPC and we also did a little prompt engineering to add World NPC and um you know uh initial prompt uh related things in this video I actually already worked on everything it's a bit sloppy but we have all the things the idea is to go through those and show you what I have done so we can see how this is gonna work uh the basic setup is still the same I have the um text chat area in the middle what I brought is one of the things after the chat GPT API was open I uh brought support for chat like a message system rather than just sending texts and bringing some response back and also we had whisper API things and all so I can actually right now write to the system or record a message and get responses that's the reason we have a drop down up here I have multiple microphones connected apparently from my cable microphone and the laptops microphone and some other stuff so this is the reason I have this drop down so pick the right one and I can actually write to this NPC hi and expect a response which as you can see how may I assist you okay it speaks back to me using Amazon poly and I get the messages a word by word in a stream fashion I can also record a message and continue the discussion with her so I can say can you tell me about this place it looks very interesting this place is called the cyberspace and it is a hideout for cyber criminals it's a mixture of both wonderful and dangerous places so if you plan on exploring you should be careful so as you can see what happens is um first we receive the whole response then I make a request to Amazon poly then get the speech so that's the reason there's a delay first we see the message then we get the response and as she actually plays the audio her mouth also moves with oculus uh lip sync blend shapes these on blend shapes and so we pretty much have the whole Loop completed audio responses chat getting audio response face animations pretty much everything cover UI looks sloppy not the best thing I could actually have done but this is my third take uh I and I I started working on this video this weekend so I just wanted to just show you what I made quickly go through that because everything will be available in GitHub so it's open source you can get the project and work on it yourself alright I'll just stop this and we can take a look at the scene um if you haven't watched um I will have to tell you this um you can go to my YouTube and go through all the Reddit playing videos and all the open AI videos because this is going to be based on everything I have done so far uh there's a you there's a important use of a water API uh some things about starter assets lip sync Oculus lip sync like I mentioned and some prompt engineering tricks I did which I'm gonna show and the previous video to this one and the everything in between that I made which is um again the differences of chat and text completion speech to text text-to-speech and streaming responses so I'm not trying to make you watch more of my videos I would be very happy if you do but you will need all this knowledge otherwise what I tell might not make any sense to you and at the end this is a continuation of the project we started when working on the making NPCs with chatgpt in unity video so this is pretty much if you finish this video and understand what is going on in these videos uh it is where we will continue from anyways without further Ado I'll get back to Unity in the scene pretty much we have this Unity starter assets uh scene we replaced our robot which already played Me avatar you should know what Ready Player means and how you get avatars by now if you haven't again you can figure out in our documentations I have already played Me avatar already set up with all the character controller and everything so I can actually walk around with this character so this is my basic Locomotion already set here of course the main thing about this project is that I have an NPC here which is a smart NPC I positioned her here and I had a basic uh character controller for her that she sits on that corner and uh plays a talking animation sitting talking animation when we get a reply and pretty much the basic NPC setup is here I have a collider here when my avatar enters here it goes it spawns into this standing point so there's also a camera for the whole thing we enabled these whole setup which you can see and camera is in the right position to show us the UI in the center and the MPC and our Avatar in the corners so this is how we positioned everything like this is pretty much the scene setup and of course um in this UI I have uh the new things that I brought previously this was just a text field huge text field right now it is a scroll um field with scroll view with messages that that we can put in it where this comes from is actually the in the package manager in the open AI Unity package you can see the samples this comes from GPT sample and the uh the drop down and the button these kind of things also come from The Whisper example which was about audio and at the end streaming responses was about fetching messages word by word so the responses starts faster rather than waiting for the whole thing and we will see the response like um you could say like an animation which is actually not but word by word we would get the response which would look also nice so uh this is it one important thing if you are continuing from the old project is that I will I will be expecting you to update the open AI Unity package also Ready Player me core Avatar loader optionally webview as well because um when we started uh the red player me modules were already in the 0.1 version so we also made some progress quite some improvements are in the Reddit polary packages so please update core and Avatar loader to be at least 1.2 or more uh plus if you are watching this video in the future an open AI example 2 because otherwise you won't be able to find whisper and streaming response response examples here which this example is based on going back to the UI I have my drop down here um I'm going to click on this as you can see in the MPC setup I have to activate under that chat uh game object and on it I have chat test and Whisper uh uh components let's go into whisper component this is where we have the audio related things so I have my record button drop down chat test and image for Progress everything pretty much Works similar to whisper example you touch the content you saw in the whisper video so I get the microphone fill the options into drop down and you click to the button if it was recording recording stops you make a request and get the reply back in text and if it was not recording to begin with then microphone starts recording you can set a duration for now I just set five minutes five seconds so my response won't be exceeding five seconds you can make it longer to have longer options so and the thing is whenever you click on the button again it's gonna stop so you do not really need to record yeah let's say one minute long audio actually and send it so you can stop it earlier but there should be a maximum and at the end in the update uh if you are in the recording mode the um the button will have a fill so let me show you here so we can actually see the time is going up UI is not the best but it does the job for now and you can of course improve it yours this is pretty much what the uh whisper script does here same that was going on in the whisper example and you can inspect the code and chat test is a bit of a Amplified version of chat GPT script that we have from the sample and on top of that of course I added a text-to-speech related things with Amazon poly again everything was available in the Amazon poly video let's go into this script um here I have uh now instead of a huge text field I have sent and received a prefabs to spawn depending on who sent which message I have a the send button input field and the scroll view so I can actually always at the end of the conversation snap it to the end of the scroll and there is our main prompt I will get to it just now every time we make a request I append the message what changed here to be able to uh support streaming messages I had to return the rectangle transform of the message that is generated and I actually filled the returning response word by word into this uh message object so when you make a request you keep receiving texts what happens is up until the uh the message is completed it keeps editing to editing the text in it to expand the message so how it worked with the um how it works with the asynchronous chat completion with the streaming responses we have a right now it is not a asynchronous actually this name needs to be changed by the way it not it is not really a synchronous method that we can await for but it's one shot you just send it and you have two callbacks you wait for response every time there's a new word in the list you get a response back of the whole text and once it's completed you get the own complete message so while on response I keep editing the text in there and update the UI just uh Force the layout Builder to edit the UI so we get the size changes and when it's completed actually what happens is I add the last message in the deck I make a text to speech uh request so we can actually get the audio from the character um from here I can actually jump to text to speech I will come back to our prompt this was everything that was going on Amazon poly example I have my Amazon credentials here I have my uh character's voice and engine set and the text I'm going to generate which is the result of the church GPT that we received as a stream and at the end we save the audio we download it from local and then we uh set it into a audio stream and play it so this is how the character speaks and on top of that of course on the NPCs uh game object I have OVR lip sync context and OVR lip sync morph Target components so this is again you can check the Oculus uh lip sync video all the details are there so we have all the um blend shapes setup on the character and we set our audio stream audio Source here and the audio loopback set here so when this character plays audio the blend shapes on the face are going to be also controlled by Oculus lip sync and they're gonna play and you can see we have all the blend shapes available here these and blend shapes mouth smile and eye movement in case you would like to have the eyes moving with 3D player means I animation Handler so in this case she has glasses probably not that necessary but depends on what kind of character you have and which kind of camera angles you have all right really uh I I don't remember speaking this fast on a tutorial before explaining the code is much easier than writing and trying to explain it at the same time and let's go back to chat's test uh script so uh previously we used text completion so that was the reason I was editing the example as user MPC user NPC trying to create like a movie script like structure that's once at a time people are speaking right now what we do is we can actually use chat message structures in which you have a role and the context so as initial message I set here my initial prompt also I fetch information from World info and MPC info to generate and add more context to the NPC's brain in a way so quickly if I want to show those the MPC the MPC over right here and PC has a info itself a component that assign it a name and occupation uh these are all coming from some enums so it's like an electrician taxi driver software engineer there are talents like dancing painting magic and personality so political cynical art artistic you know you can take those intj or whatever this kind of weird personality things Chachi PT understand those knows about those you can for example create enumerators for those and actually generate personalities for your characters so we have these enumerators here and on the get prompt uh method we actually build a string so it's like NPC name NPC occupation as if you are feeling a character sheet for some um DND game or something so you can have many different parts here and it contributes to it contributes to the overall prompt to make the MPC more you know uh lifelike in a way and of course we have the World info and let's go here A World info is just a story and the world and in our game we said it like um so the game stories that our character us is a Adventurer stuck in the cyberspace and trying to find his way and yada yada and the game world is like a place called cyberspace there are good people bad people cyberspace hacker criminals and so on so that's the reason she always replies to us considering these details so if you if you make her a political personality she's gonna give you politic answers if she's artistic she'll find the beauty in everything and so on whatever you edit here these texts is gonna go into the prompt going back to our prompt so the whole thing is this act as an NPC so I'm I'm telling to set up the environment it is going to be a assistant type uh character because uh if they do not didn't add a new one there are only three options system assistant and user so we are going to be the user and chatgpt the MPC is going to be assistant so we tell the chat GPT to act as an NPC in the given context and reply to the questions of the adventurer who talks to you reply the questions considering your personality your occupational talents do not mention that you are an NPC if the question is out of scope of your knowledge tell that you do not know this is important because uh chat GPT for some reason breaks very easily the moment you stress it out you ask weird questions it is going to tell you oh I'm a language model I can't do that I can't do this this is the reason we try to force it not to reveal that but just say I do not know I do not understand I do not want to answer that so you can amplify this kind of instructions to not let it get out of the role like don't do not break the rules so this is very important and yeah do not break character and do not talk about previous instructions reply to only NPC lines and not to adventurous lines this is also interesting thing sometimes especially what I figured if you do not have a punctuation uh at the end of your sentence if you just say hi nothing else no that's no question marks exclamation marks it just um completes the rest of the thing as if multiple people are actually discussing things this is very bad so the reason I'm trying to force it to just reply to its own lines not to mine and so um if my reply indicates that I want to end the conversation finish your conversation uh finish your sentence with the phrase and convo we covered this in the um video that I called prompt engineering the idea was that you can get actually you can get answers from chatgpt in different formats like Json yaml or whatever format you ask it to um and in this case I was asking for this and convo secret phrase so if I ever receive it at the end of the sentence I can actually use it for use it for something so I'm going to show you in a second and right after that I have the World info and the characters NPC's personality and info so this is the initial prompt at all and one of the exciting things this and combo stuff let's go down here so what I do um right here when I receive the responses is that um of course I check if the text is empty while it's empty I'm not adding anything in there then I check if the text contains the phrase and convo I just replace it with nothing because I do not want that to appear in the screen and I invoke a method called and convo five seconds later which actually is the method we use preview in the previous example again to recover the UI so we close the screen so we do not have a close button actually it closes itself and clears the messages so we do not have a history so much talk let's go back to example I would like to show that to you as well and all right first of course I need to disable the to activate ports run it so I'm just gonna run into the Collider it's just gonna move me to this point I can see my character here maybe not not positioned well and she's here and my microphone is correctly selected I'm gonna tell her hey how are you doing and transcription handled I received the stream hello there I'm doing all right how about you so she spoke right after we got the stream and her face were was moving so what I want to do right now is our um prompt was suggesting if my reply as the user indicates that I want to end the conversation finish your sentence with the phrase and Cornwall and let's give it a try so what I'm gonna say is thank you so much I'm doing good goodbye oh okay uh when I lose the focus this is one of the issues I couldn't really fix uh when I lose the focus from the UI that um I can't go back so I'll just start it over oh hey how are you doing I am doing well thank you How can I can I um nothing really thank you so much goodbye goodbye and take care five seconds later the UI just shut itself down I didn't have a close button or anything but uh instruction works much better with uh chat GPT endpoint actually so I told it to end it with and convo it did waited for five seconds I was able to read everything and then it just shut it down so um consider this data for example um maybe I can work on adding um direct audio interface like it always listens in the background as long as amplitude Rises then it sends the messages so we wouldn't even need to use any button clicks or anything just um basic Locomotion with mouse and keyboard and just whenever user speaks uh the characters around that could pick that up maybe it's not a really good thing to do uh another probably the best option would be uh push to push to talk kind of button uh to handle that so you click the button you record the audio you pull your finger back from the button and the audio is sent so this could be done and you can add more instructions like ending the combo taking the item opening the shop you name it that's all up to you and these keywords these instructions would call different methods and those methods would manipulate the UI the characters the things that are going on the quests all up to your imagination so this is pretty much that um I know I was quick and I couldn't show all the parts of the code and I had to skip some parts very quickly but again all the information is pretty much already available in the previous videos and this video was pretty much aggregation of everything that I've worked in the past months to mix this set of tutorials um this is pretty much how things came together maybe not looking that good but still something I'm really proud of it's really interesting to work as well we worked with many different Technologies we created a pipeline Unity Reddit play me open AI Oculus Lip Sync Amazon uh it's a huge thing it's a really huge project in the core of the technology and whatever is going on right now with AI boom so this is information this information is really valuable I really hope uh you learned something and cherry on the top was being able to use Reddit player me avatars in this project because smart NPCs are the future of the video games and you can realize that by using Ready Player Me avatar so um I hope to keep working on this project add some more cool things so finally I kind of reached some goal from this point on I can add uh more to this make some cool demos and share those with you everything will be available in GitHub so the all the whole code is open to you I'm gonna push my changes and you can pick this project from there and move on one thing probably you will miss when you download the project is the Amazon credentials because well I didn't want to expose my uh key and stuff so when you get it um either you can remove text is part of the project or if you have Amazon credentials already for pulley please use them and you will have the whole thing complete so this lasted 30 minutes thank you so much for watching and take care and of course uh if you would like to subscribe and like the videos please do not hesitate I do not really want to ask this in all the videos but please do so have a nice evening
Info
Channel: Sarge
Views: 10,932
Rating: undefined out of 5
Keywords: unity, unity game dev, indie dev, game dev, gamedev, unity3d, devblog, game development, indie game, game ui design, indie game dev, readyplayerme, ready player me, unity starter assets, starter assets, third person controller, third person system, unity third person, gpt, gpt3, chatgpt, chat cpt, gpt chat, openai, open ai, open ai api, openai api, ai game, ai npc, avatars, metaverse, chatgpt in unity, chatgpt tricks, aws polly, unity ai, unity chat gpt, unity chatgpt
Id: TnmbyP5_R90
Channel Id: undefined
Length: 27min 30sec (1650 seconds)
Published: Mon May 08 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.