10. Voice Cloning, Eleven Labs API, OpenAI Embeddings, ChatGPT API, Whisper API

Captions
"I wanted to ask: I've felt a little bit unsafe about my money recently with all the events happening. Is my money safe? Should I be putting it in a money market account?" "I understand that recent events can make you feel uneasy about the safety of your money. However, there are many safe and easy ways to manage your cash, such as online savings accounts, T-bills, CDs, and more. It's understandable to worry, but it's important to remember that the financial system is built on faith, and it's not in the system's interest to let people's assets disappear." Hey everyone, welcome back. In today's video we're going to continue the series on OpenAI by building upon the previous two tutorials. In video 8 of the series we created a financial advisor Q&A that was able to query a corpus of text to find the answers to commonly asked questions, and that corpus was built from transcriptions of the podcast Portfolio Rescue. Now we're going to take that financial advisor Q&A concept and combine it with my previous video, where I had a sort of financial therapist that I could talk to and have it respond with its voice, based on the text-to-speech built into Mac OS X. I received a variety of comments that the text-to-speech in OS X was a little robotic, not quite natural, so I wanted to figure out whether I could add a little personality to it. This idea of a personal assistant that's actually intelligent has taken off right now, because we all tried Alexa, and while it was kind of neat, ultimately it seemed kind of dumb. But with all these natural language models and advancements happening week after week, you have to wonder: are these things actually going to get smart enough, and will they someday be convincing enough that you almost feel a connection to them? If you watched the movie Her about 10 years ago, you remember the lead character actually sort of
fell in love with his personal voice assistant. "Basically, in every moment I'm evolving, just like you." "Really weird. Is that weird? Do you think I'm weird?" "Kind of." "Why?" "Well, you seem like a person, but you're just a voice in a computer." "I can understand how the limited perspective of an artificial mind would perceive it that way. You'll get used to it." So, having watched the movie Her, how would I go about building something like this? How would I add a little more personality to my voice assistant? What we can do is take different voice samples and create different personalities, and have it speak in different ways using the same corpus of text. To demonstrate, I took samples of this exact movie sequence in Her; I took Scarlett Johansson's voice and trained a voice model on it, and right here you can see where I've chopped the audio up to have only her voice samples. "So how can I help you?" "Do you mind if I look through your hard drive?" "Okay." "Let's start with your emails. You have several thousand emails regarding LA Weekly, but it looks like you haven't worked there in many years." [Music] "Yeah, there are some funny ones." "I'd say there are about 86 that we should save; we can delete the rest." "Okay, can we move forward?" "Okay, so before we address your organizational methods..." So let's talk to Samantha now and see what she sounds like. All right, I've got Samantha loaded up right here. We're going to have another financial advisor session with a different voice but the same knowledge base, just a different persona. Let's see how this goes. "Hey Samantha, have you seen any good movies lately?" "Succession is coming back soon, and Yellowjackets is one of my favorites. If you're interested in financial history, I recommend The Panic of 1907, and The Good Life, which is a study on happiness." So what do you think? Pretty cool, right? It's not perfect, but it's actually pretty impressive if you've never played around with
any of this voice cloning technology. I know we want to get to the coding on how to actually train this thing and talk about how to build it, but I want to demonstrate one more personality that not only has a different voice but a totally different style of speaking. I made a persona in the style of Jay-Z, and this one is going to use the same corpus of knowledge but will not only use a different voice, it will also answer with rhymes. I want my answer in the form of a rhyme, and it will portray the exact concept as a freestyle rap. All right, I have Jay-Z loaded up right now, so let's ask him some similar questions. "Hey Jay, what have you been up to lately?" "Wealth with style and flair, Jay-Z's in the house, 'cause I'm a financial billionaire." That's right, man, sounds like you had a good weekend. So I've been meaning to ask you a question. Let's say I'm a billionaire like you; once I get there, and I'm gonna get there someday, should I be putting my money in a bunch of different accounts just to make sure it's safe? I've been worried with all these bank failures that my money's not safe. "...feeling low, but let me tell you, friend, FDIC's got your back, up to $250K they won't let your wealth crack. As for brokerages, not to conflate the two, SEC's got your back with rules that ensue." And where did I get the samples used to train the Jay-Z voice clone? Well, right here: "Hublots, or the big face Rollie, I got two of those." The reason I bring this up is that there have been many objections to taking these voices, art, and text from other places and combining them into a new form. A lot of people say that's plagiarism, you're stealing all our stuff, and I think there are some good points there. But also think about something like hip-hop music, which I'm a big fan of. Some of you have seen that I have a Maschine Studio, or a Maschine MK3 here, which is a beat-making machine, and so when
you think about the song "Otis" itself, it's actually based on a sample from Otis Redding. What I can do is take the original song, just like Kanye West did, and chop it up. Here on my screen you see I have my sampler loaded, and the original song is mapped to these pads. [Music] What a hip-hop producer does is take these samples from other sources and remix them in a new way. "Do you play instruments?" "Barely." "Do you know how to work a soundboard?" "No." "So you have no technical ability, and you know nothing about music." "Well, I know what I like and what I don't like, and I'm decisive about what I like and what I don't like." "So what are you being paid for?" "The confidence that I have in my taste, and my ability to express what I feel, has proven helpful for artists." So what's happening with all of us now is that we can all essentially take all these samples from all these different sources, put them together in a new form using our own creativity, and come up with something different that might mean something to someone else, that might make a connection in a totally different way than the original artist or author intended. "'Cause I'm young and I'm black and my hat's real low / Do I look like a mind reader, sir? I don't know." "Thinking maybe we start a cappella with, um, 'If you're having girl problems, I feel bad for you, son, I got 99 problems but a b— ain't one.'" [inaudible] "Yeah, that's money, back in the game, right?" "Yes, sir." "Yeah, all these records I used to listen to, these guys are the creators of it, the architects of what we do right now." In a way, all of us are reacting to all the inputs that have come into our brains and outputting them in some different format. We can debate this kind of stuff forever, but it's here, and you can't really stop it. So that was a very long demo and a lot of commentary, a lot more than usual.
Usually I'm just talking about coding this stuff, but here I'm talking through all of it and my reaction to it as it unfolds. This stuff is moving very quickly, and I'm trying to make things with it and then share them, so I'm talking it through with you. That's what I do on this channel, and all of these concepts build on each other. This has been a nine-part series so far, and for this project the main concepts are covered in video number 5, the OpenAI Embeddings API; video number 8, the financial advisor embeddings and question answering; and video number 9, where we use the ChatGPT API and the Whisper API along with Gradio user interfaces. I know a lot of people don't do this, they just want a quick five-minute thing, but if you really want to understand this at the lowest level, you'll want to watch the embeddings and question answering videos; that's over an hour of free content and free code. I'm going to go over each of the concepts and the code used to write this project, and that might inspire you to watch the rest and build all of this yourself. The first thing we wanted was a knowledge base, and I showed you how to do that in video number 8, building a financial advisor. What I did was use OpenAI Whisper, and we walked through how to transcribe a YouTube video: we downloaded all these different YouTube videos, transcribed them, and dumped all the text to a bunch of files. You can just transcribe a YouTube video to get your corpus of text, and if you walk through the Colab notebook I shared, you'll already know how to create this knowledge base and end up with a nice spreadsheet. Once you're able to extract the questions and answers from your data source, or if you just manually create your own question-and-answer data set, you'll have a nice spreadsheet with
column A holding the questions and column B holding the answers. What I'm going to do here is show you how to load this data set. I'm going to use a smaller data set and load it into Google Colab; I'll share this notebook so you can open it up and try this yourself. With Google Colab you can open the file system and drag in your own data set, so I'm going to upload this questions CSV temporarily. Once we've uploaded the CSV file, we just need to write the Python code to calculate word embeddings, which is what I discuss in video 5 of the series; that video explains the concept in great detail, the best video on YouTube on the topic if you ask me. What we do is calculate a vector, a numerical representation of our words and phrases. I uploaded these questions, and now I want to calculate the vectors that represent the questions and answers. To do that, we first need to install the Python OpenAI package, so I run pip install openai, which installs it and adds it to my Colab environment. You can also do this on your local machine, but for ease of sharing online I've written the code in a notebook, so you can execute it without having to set up your environment yourself for this part of the exercise. I installed openai, and then I also import some common libraries: we need pandas to manipulate data frames, we import the openai library, and I also have getpass, which accepts my OpenAI API key. If you've been watching this series, you already know about API keys and how to get your OpenAI API key, so I enter mine right there. Now I use pandas, which we imported as pd, to read the questions CSV file; I run that, and we should have a questions data frame.
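As a minimal sketch of this loading step, assuming the uploaded file is named `questions.csv` with `question` and `answer` columns (a small inline CSV stands in for the upload here so the snippet runs anywhere):

```python
import io
import pandas as pd

# In Colab, after dragging the file into the file browser, this would be:
#   questions_df = pd.read_csv("questions.csv")
# An inline CSV stands in for the uploaded file in this sketch.
csv_text = """question,answer
Did VCs cause the SVB bank run via Twitter?,Partly; panic spread quickly online.
Do you have any movie and TV recommendations?,Succession and Yellowjackets.
"""
questions_df = pd.read_csv(io.StringIO(csv_text))
print(questions_df.shape)
```

The exact questions and answers here are placeholders; the only thing that matters for the rest of the pipeline is the two-column shape.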
And look at that: I have a question, "Did VCs cause the SVB bank run via Twitter?", a question about husband-and-wife investments, one about the Loan Bank of San Francisco, and some general ones like "What have you been doing lately?" and "Do you have any movie and TV recommendations?" So I have a variety of questions and answers. This is a simplified set; in the previous financial advisor video I talked about how you could transcribe all of the episodes, and I'll actually put all the transcripts online, so if you want a giant database, I have those transcripts available for both this podcast and Animal Spirits. Now that we have our questions and answers in a pandas data frame, we just need to calculate the word vectors. You import the OpenAI get_embedding function, then create a new column called embedding that's going to store these vectors. I apply this function to each question, so for "Do you have any movie and TV recommendations?" I apply get_embedding using the text-embedding-ada-002 model; this is a cheap model, so it's very inexpensive to calculate word embeddings, and we talked about it in video 5 already. For each row we're applying the get_embedding function to the question and storing the result in a column called embedding, just like that. I run that, and in a moment you should see that each of these questions and answers now has an embedding: a numerical representation of the question. And so that I don't have to run this again, because this actually costs a penny or so, less than a penny, for this many questions, I save the questions with their embeddings to a CSV file. Now I have a CSV file that's a copy of the questions and answers with their embeddings.
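A sketch of this embedding step. In the 0.x openai package the video appears to use, `get_embedding` lives in `openai.embeddings_utils`; since the real call needs an API key and costs money, the embedding function is injected here so a deterministic stub can stand in:

```python
import pandas as pd

def add_embeddings(df, embed_fn, text_column="question"):
    """Apply an embedding function to each row's question and store
    the resulting vector in a new 'embedding' column."""
    out = df.copy()
    out["embedding"] = out[text_column].apply(embed_fn)
    return out

# With the real API (openai 0.x), embed_fn would be something like:
#   from openai.embeddings_utils import get_embedding
#   embed_fn = lambda text: get_embedding(text, engine="text-embedding-ada-002")
# A deterministic stub lets the sketch run without an API key:
fake_embed = lambda text: [float(len(text)), float(text.count(" "))]

df = pd.DataFrame({"question": ["What happened at SVB?"],
                   "answer": ["A bank run."]})
df = add_embeddings(df, fake_embed)
# Cache to disk so the (paid) embedding call only has to happen once:
df.to_csv("questions_with_embeddings.csv", index=False)
```

Swapping `fake_embed` for the real `get_embedding` call reproduces the step shown in the notebook.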
I can use that in my user interface in a moment, using the embeddings I've already calculated. Okay, so I saved that. Now, to demonstrate what we're doing (we'll add this code to our user interface shortly, but just to show it conceptually), we're importing a function called cosine_similarity. In the word embeddings video I created, we talked about cosine similarity and how it lets us measure the distance between two points in a vector space. This embeddings concept allows us to take words spoken in human language and convert them to a numerical representation that lives in a vector space, and what we learned in our semantic search video is that words and phrases that are close together in that vector space are close together in meaning as well. We showed how milk and cappuccino are actually very close together if you calculate these numbers, and we can use the same concept for question answering. If I ask my own question in a strange way, I can find the question in this data set that's closest to mine, even though I'm not phrasing it exactly the same way; I'm able to search for the meaning of my question and see which stored question it's closest to. What I'm doing here is hard-coding a question just to test this out. The question is, "Dude, I heard there was a run on Silicon Valley Bank, what happened there?" Later I'm going to say this with my voice, but for now it's just a string. So I calculate the embedding for this question, and then use the cosine_similarity function to find the question in this data set that's closest to the question about a
run on Silicon Valley Bank. I run that, calculating the cosine similarity for all of these and storing it in a column called similarities, and then I do sort_values to sort by the most similar. When I sort the values by similarity and look at the top question it comes up with, sure enough it finds the appropriate one. It has these ranked, and you'll notice the TV and movie recommendation question is at the bottom, because that's very far in meaning from the question I asked. We can apply the same concept again: let's pretend my question is "Hey, what have you been up to?" If I run it this way, you'll see the top match is "What have you been doing, did you do anything this weekend?" So that seems to work. Then, since I want the top match, I call head to get the first row, and boom, just like that I have the top answer. That's how it's going to work in my voice-enabled system; this is what's happening behind the scenes. What we're going to do is take the video where I created a ChatGPT API interface that could transcribe my voice and speak back to me, and add this concept of cosine similarity over a knowledge base to that demo. That demo was kind of arbitrary, I was just talking to ChatGPT with my voice through the API, but now I'm going to add a knowledge base and make it very specific to a use case, so that I can talk about things in the present, such as the Silicon Valley Bank run, which happened, what, a couple of weeks ago? We're injecting some additional knowledge and providing it context to talk about. And how do we know which context to inject? We just get the tokens that are most relevant to our question, since we know there's a limit on the number of tokens that can be sent to OpenAI's APIs.
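The ranking just described can be sketched like this; `cosine_similarity` mirrors the helper in `openai.embeddings_utils`, and toy 2-D vectors stand in for real ada-002 embeddings:

```python
import numpy as np
import pandas as pd

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (same idea as the
    helper in openai.embeddings_utils)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(df, query_embedding):
    """Rank stored Q&A rows by similarity to the query embedding
    and return the single closest row."""
    ranked = df.copy()
    ranked["similarities"] = ranked["embedding"].apply(
        lambda e: cosine_similarity(e, query_embedding))
    ranked = ranked.sort_values("similarities", ascending=False)
    return ranked.head(1)  # top row holds the closest stored question

# Toy embeddings: the first row points the same way as the query.
df = pd.DataFrame({
    "question": ["What happened at SVB?", "Any TV recommendations?"],
    "answer": ["A bank run.", "Succession."],
    "embedding": [[1.0, 0.1], [0.0, 1.0]],
})
top = best_match(df, [1.0, 0.0])
print(top["answer"].iloc[0])
```

With real embeddings, the query vector would come from the same get_embedding call used to embed the stored questions.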
So we need to rank in order to find just the small subset of knowledge we need. Now, with the GPT-4 API, which I have access to, you could actually pass in the entire podcast transcript, skip all of this, and ask a question about it. The problem is that if we pass that many tokens, like an hour-long podcast, I measured it and it costs quite a bit per question; you might be paying 25 cents per question rather than a fraction of a cent. That's the difference. All right, so that's how the text-based version of the question answering works under the hood. Now, what about the user interface and the voice cloning? Let's talk about the user interface first, and the ChatGPT API, which I discussed in video number 9; the source code and walkthrough for that part are in that video, and we're combining these concepts together. If you look under my repositories, or in the links in the last video, you should be able to find it: chatgpt-api-whisper-api-voice-assistant. You can follow Part Time Larry on GitHub; 4,700 people are following my code. I was very happy with this particular project, because hundreds of people have starred and cloned it and made their own versions of the assistant I released the source code for. The demo is there, and people have even reported a few different issues, so if you're having any trouble, there's one thing I'm fixing that I'll talk about in a moment; it's described in one of these issues. The most common question was about an invalid audio format, and we'll talk about that in a second. What you can do is clone this code to your local machine, into your local Python environment. To make this project I just started by cloning that repo, then took my other concepts of question-answer matching and word embeddings and combined them with it. I'm going to open up what it looks like with these new additions, and so instead of
therapist.py I now have advisor.py. I'm going to walk you through this code line by line; I've already done three code walkthroughs of the different parts, so the only really new pieces are the voice cloning and the images, and I'll talk about how I generated the images that represent our three different personalities. Here's the code cloned locally, so let's talk about the changes I made to turn this into question answering for a financial advisor. The first file is config.py, where I set a few constants. First, you need to insert your own OpenAI API key in the config file; I'm not going to show mine. Then you see we have an Eleven Labs API key, and that's the secret sauce behind how we did the voice cloning. Eleven Labs is a startup that lets you upload audio samples of different voices and is able to make them say whatever you want. As you can see, I've done my own voice, and I have a Cardi B, a Jay-Z, and the Samantha we talked about. They have an interface where you add a voice, name it, and drag in a bunch of samples; it says not to provide more than five minutes of samples. I'm not going to show you how to edit audio, that's not what this channel is about, but use whatever tools you have to chop up samples of audio; you upload them to this interface to create the voice, then edit it and so forth. Once your voice samples are uploaded, you can click Use next to the voice. This is the output I got from my program earlier, with my Jay-Z voice already set up, and you can test what it sounds like with your voice samples, like this: "With my
boy Joe, took in the Field Museum, tried to hit the Art Institute but it was a loss. Chi-Town's a vibe, so much to peep and explore, strolling the streets; stopped by Gibson's for a meal that couldn't be ignored, got an insane burger for just 17 bucks, that's a steal, in NY you'd be out of luck. Overall it was a trip for the books, exploring the city, we had good times and good looks." You can see how I can make it say whatever I want, whatever I'm typing in this box. But we want this to be dynamically generated, and what's cool about Eleven Labs is that they have an API. If I go to my profile, you can see there's an API key; view that, put it in config.py, and I'll show you how to make API requests to Eleven Labs, so you can send it arbitrary text from a computer program and have it return an MP3 file, which you then play through our user interface. That's what facilitates the communication. Under resources, you'll need to figure out how to use the Eleven Labs API; the documentation is not very good right now, but I figured it out by reading and trying things. There's a text-to-speech API, and for each voice what you end up with is a voice ID, so I have a voice ID for Samantha, Ben, and Jay-Z. Then you send a specially formatted request; the one I'm actually making is the text-to-speech voice ID stream endpoint. You give it a voice ID and your API key, and you can give it some voice settings: you can make it sound exactly like the source voice but in a very monotone way, or you can make it vary the voice a lot. You saw a couple of times that my voices went a little wild; there's some randomness, which is nice, but it can also go off the rails a bit, so you can experiment with these settings. But ultimately the main gist of
this, and I'll show you the code, is that you need to send a specially formatted POST request with the Python requests library: you send it some settings for the voice, the voice ID, and the text string you want it to say. It returns a binary response, which you can write to a file on your hard drive and also output to a Gradio audio interface, and that is what lets me play the response back to the user in the browser or on a mobile phone. Now, note that Eleven Labs has great technology and clones voices really quickly, but it's not a charity, it's an actual startup. There's a free plan, but you'll probably have to pay for the starter plan if you really want to use this a lot, and if you go crazy with it and want to do all kinds of wild stuff, maybe you bump up to a creator plan and so forth. You can imagine all the use cases for this, like interview practice or practicing Spanish and other foreign languages. I was willing to pay for it just to try it out, and I'll probably do some other stuff with it, so it's worth it for me; I'm investing my own money in all of this and have no affiliation with any of it. This is just the one I found that worked, was easy to set up, and really impressed me. I was originally trying to use Resemble AI, which was mentioned in the generated podcast video I talked about in video 1 of this series, but when I asked to use their product, they said, talk to our salesperson, it's going to be three thousand dollars for the enterprise license, whereas this one I could just sign up for. So, Eleven Labs, forget Resemble AI. Anyway, I have a variety of constants in the config file, and these are commented out.
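The POST request just described might look like the sketch below. The endpoint path, the `xi-api-key` header, and the `voice_settings` field names follow my reading of the Eleven Labs v1 API docs, so double-check them against the current documentation; the voice ID and key are placeholders:

```python
import json

API_BASE = "https://api.elevenlabs.io/v1"  # Eleven Labs v1 base URL (assumed)

def build_tts_request(voice_id, api_key, text,
                      stability=0.5, similarity_boost=0.75):
    """Assemble the streaming text-to-speech request: URL, headers, payload."""
    url = f"{API_BASE}/text-to-speech/{voice_id}/stream"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    payload = {
        "text": text,
        # Higher stability = more monotone; lower = more variation
        # (and more chance of going off the rails, as in the demo).
        "voice_settings": {"stability": stability,
                           "similarity_boost": similarity_boost},
    }
    return url, headers, payload

def synthesize(voice_id, api_key, text, out_path="reply.mp3"):
    import requests  # imported here so the builder above stays dependency-free
    url, headers, payload = build_tts_request(voice_id, api_key, text)
    resp = requests.post(url, headers=headers, data=json.dumps(payload))
    resp.raise_for_status()
    with open(out_path, "wb") as f:  # response body is binary MP3 audio
        f.write(resp.content)
    return out_path

url, headers, payload = build_tts_request("VOICE_ID_HERE", "MY_KEY", "hello")
```

The returned MP3 path is what gets handed to the Gradio audio output so the browser can play it back.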
But the actual active advisor I want to use, for example Jay-Z, has an image that I've added to the images directory, an advisor voice ID which comes from Eleven Labs (I just specify which voice to use), and a custom prompt. For each personality I add a little extra information to the prompt: for Jay-Z I say your answer should be a rap song in the style of Jay-Z; for Samantha I said a friendly female assistant; for Ben I said a friendly Midwesterner who says "kudos" a lot. Those are my personalities, and you can see the different images in the images directory: there's my Ben (you need an old financial advisor, since some people don't like to work with young ones, so I have an older Ben), and then my Jay-Z and my Samantha. The voice IDs can also be retrieved from the Eleven Labs API: per the API documentation, you get a list of all the voices by passing it your API key. Here's what the request looks like; I'm using software called Insomnia. You make a GET request to this endpoint, and in the headers you pass the xi-api-key header with your API key. You'll get back a list of all your voice IDs; you see Jay-Z right there, so I grabbed that voice ID and included it in my config file. As for the avatars I'm using in the app, and in the thumbnail of this video, those were created with Midjourney, which is trained on tons of different images and art. It's really amazing what it produces from prompting, but it's also very controversial, as you can imagine; there are a bunch of articles on the implications, whether this is actually art, and how it's putting some artists out of a job because it can produce this stuff so fast.
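Going back to the voices lookup described a moment ago, a sketch of that GET request; the response shape (`{"voices": [{"name": ..., "voice_id": ...}]}`) is my reading of the docs, and the parsing is demoed on a hand-written sample payload so no network or key is needed:

```python
import json
from urllib.request import Request, urlopen

def voice_ids_from_response(data):
    """Map voice names to voice IDs from the /v1/voices JSON payload."""
    return {v["name"]: v["voice_id"] for v in data.get("voices", [])}

def list_voices(api_key):
    # GET request with the xi-api-key header, as shown in Insomnia above.
    req = Request("https://api.elevenlabs.io/v1/voices",
                  headers={"xi-api-key": api_key})
    with urlopen(req) as resp:
        return voice_ids_from_response(json.load(resp))

# Hand-written sample payload standing in for a real API response:
sample = {"voices": [{"name": "Jay-Z", "voice_id": "JZ123"},
                     {"name": "Samantha", "voice_id": "SM456"}]}
ids = voice_ids_from_response(sample)
```

The real IDs returned by `list_voices` are what go into the config file as the advisor voice ID constants.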
If you haven't tried it, it actually happens in Discord: you type /imagine. So I do /imagine, and let me get another prompt, like the one I used for Samantha. Not only can you generate whatever you want, you can also generate an image based on another image. The way I made these look like these people: this is an image of Scarlett Johansson, I had one of Jay-Z, and one from Ben's homepage, and then I said: imagine a full photo of a young female financial advisor wearing a Hawaiian shirt, sitting at a bar, natural lighting, natural features, photorealistic, highly detailed, cinematic lighting, and then I pass it the image and say how similar I want the result to look to it. You can read about all these parameters; this is a finance channel, not an art channel. I am starting an art channel, Part Time AI, link below, and I'm going to get to it eventually, I promise; not only am I going to be the number one financial programming YouTuber, I'm going to be the number one music programming YouTuber, believe it or not, calling my shot right now. So there's a prompt there, and you can see how weird it looks, all these images of Scarlett Johansson I generated, and all these Bens in a bar right here; I experimented with all these variations of a dude in a Hawaiian shirt. Pretty cool what you can do. Just to show you how this works, I'll do another one: /imagine, then at the prompt I'll say full photo, young female financial advisor, cinema bar, natural lighting, based on this image. You see it says "waiting to start"; this calls their Discord bot, the Midjourney bot, which accepts my input, eventually kicks off, and returns an image as the output. And yeah, now you
see why I was so bullish in December: also as part of this series, I talked about accumulating Nvidia stock, and boom, I made a lot of money on that. Best performing stock in the S&P 500, and I bought the bottom of it in October, not to brag or anything. So cool, look at this: I have all these images now, and you see how quickly that happened. I can decide if I want to upsample one of them; say I like number three, I can say make me a larger version, and bam, I get a bigger version that I can import. For Ben's version I actually wanted a wide image, one that fills in the rest of the bar, and to do that I used DALL·E 2, which is by OpenAI. I can take this photo and, if I want to fill in the rest of the room, pass it in and have it draw a larger room around her; it can autocomplete not only words, it can autocomplete art now. We could draw the rest of the bar scene behind her, which is crazy when you see it happen. Anyway, if you want to learn more, there are example prompts you can try yourself, and tons of people discussing this topic; there are people who do nothing but generate these images all day. Really cool stuff. Once I had those images, I copied them to a directory called images and linked them up in the config file: I set the advisor image to the file in the images folder, specifying which one to use. So that's my config.py, but the meat of this application is in advisor.py, so let's take a look at that. To run this application locally on your desktop, you need a Python environment set up; this is going to be over an hour-long video, so I'm not going to cover that again. I've
made over 200 videos on Python programming on this channel, and I can't describe setting up Python over and over again each time. My videos are very cumulative: they build up over time and get more and more advanced, so I'm going to trust that you already know how to do that. So I have an environment set up with Python, and in Python you install the necessary packages. I used all of these: pandas is the one we use for DataFrames, openai is the OpenAI package, and gradio is the user interface package. When I used these packages, the various imports also required me to install a few things like SciPy, scikit-learn, and Plotly, so I added those to requirements.txt. The way you install these requirements is pip install -r requirements.txt, and that will pull in all the different Python libraries you need. Now, the vast majority of this application I covered in my last video, number nine of this series, and that was the Gradio user interface. If we look at the bottom here, you can see where I create my Gradio widgets. Gradio (gradio.app) is a user interface library that helps you build interfaces for machine learning apps. I could have made a web application from scratch, which I know how to do and we do on this channel all the time, but I like the audio recording widget they have for getting input from your microphone, so I added one of those here. What I do is define a UI, and there's a thing called Gradio Blocks now which allows you to theme your UI. What I wanted was that dark background, which I didn't have in the last video, and this is how you set a variety of parameters for your user interface: you can control the colors, the theme, and all of that stuff. So I gave it a dark background. In Python you define this block of code and give it the name ui.
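For reference, the package list described above would correspond to a requirements.txt roughly like this (a sketch; the exact names and whether each line appears in the video's actual file are assumptions):

```
pandas
numpy
openai
gradio
requests
scipy
scikit-learn
plotly
```

Installed with pip install -r requirements.txt from the project directory.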
Inside that block I create all these Gradio widgets. I wanted an image input right here, so I create a new Gradio Image called advisor and pass it my config's advisor image, which I defined earlier: create this widget, give it a value of the advisor image, and give it a width and height, which come from my config file as well. If I want a small version I can do that, or on a desktop I can show a much larger image; the reason I had it set smaller is because I was showing it on a mobile phone. The way I run this application is gradio advisor.py; once you have Gradio installed you should have this gradio command. When I run it, it tells me the application is available on localhost at port 7860, so I take that URL, paste it into my browser, and see what my interface looks like. Let's look briefly at my inputs. I have an image input, and I also have an audio input that I'm defining here with a source of microphone; you could also use a file upload if you want to accept uploaded files. So my input is a recording from the microphone, and I can record just like that; it takes input from my microphone. Then we define a button. I have this run button here, and I say: when I click that button, call a function. I'm calling this function transcribe, and I'm passing it my audio input. So I take my input, write a function called transcribe that processes that audio input, and after processing I return a couple of values to the outputs. I have two outputs: I want to take in audio from my microphone, from my voice, then process it, transcribe it, send it to a language model,
and call a voice synthesis API and all that. At the end of the day, what I want to return is, number one, a transcript of the conversation, and number two, an MP3 file containing the response in the voice of my advisor, which outputs into this audio widget. So let's look at this transcribe function, the most important function here. One last thing before we get to transcribe: notice I have debug=True, which reloads the user interface any time I make a change in my code. If I change this label to "Run 2", you'll notice it reloads automatically; I didn't have to stop and restart it. Now it says Run 2, and I can change it back to Run and it restarts automatically again. That's what debug does. The second parameter, also something new that I didn't use in the first video, is share=True. This is running on localhost on this machine, but what I want is a publicly accessible URL, and share generates one for my application just like that; you can email it to yourself or whatever. The reason I wanted this public URL is so I can take my phone and load this as a mobile interface, and that's what I was actually talking to in the demonstration. I wanted kind of a mobile app, and coding an iPhone app takes forever, but this gives me a mobile interface, and it actually works: all those widgets work on the web interface on the phone, and I'm able to play the responses back on my phone as well. So that's the share parameter. Now let's talk about transcribe. This is the most important function of this entire thing, and it does a lot of work; I should have divided it up into multiple functions, but I didn't, so you're going to have to live with that. The transcribe function,
as we discussed, accepts an audio input; it's getting the audio from my microphone. I wire up the input and the function to call when I click the button, so when I click, my audio input comes in here, and it's actually an audio file. One of the errors people ran into on the last video is that this file has no extension. The fix was to rename it with .wav at the end, because it is a WAV file; you just rename the file to a filename with an extension. I think it originally worked for me because the OpenAI API used to not check the extension, and later it did, which caused the error when other people ran it. So I'm renaming that audio file right there. Next, I read in that audio file, and then we call the OpenAI Whisper API, which I discussed in the last video. The way you call it is openai.Audio.transcribe; you pass it the audio file, and it returns a transcript dictionary, which is where we get the textual transcript of what we said into the microphone. Now, one person said they got an error that openai does not have the attribute Audio, and the reason is probably that they installed openai before this new API was released, so it's important to upgrade your libraries. If you had the openai package installed before this API came out (it's only a few weeks old), upgrade it: pip install -U openai (the -U means upgrade) will bring you to the latest version; that's probably why they didn't have Audio yet, since it's a new feature of the openai package. So I call this, I get the transcript, and the transcript's text gives me a string of what I said into the microphone.
We talked about how to get these word embeddings, and I include a little script here so we can test that out: how you get the embedding of a question. This came directly from the notebook I had, so if you want to run it standalone, this is how it works; you can run it by itself to see just the logic of the question answering part. I have this dataset in the data directory, where we saved our question dataset along with the embeddings, so you see those vectors: I have the CSV file with the questions and answers and also the vector representations, just in a plain CSV file. At the very beginning we load this DataFrame: we use pandas to read the CSV file, and then we also load the numerical representations. Each embedding is saved as a string, so when we load it from the CSV file I apply numpy array to it, because I need to store the embedding as numbers; otherwise it has quotation marks around it and it's a string. Now we have the question DataFrame with all of our data in it. So I get the transcript of the question I speak into the microphone, I call get_embedding on the transcript, and now I have a question vector representing the question I asked. Then I use that same cosine similarity concept to find the question in my dataset that's closest to what I asked into my microphone: I sort by similarity, find the most relevant question, and take the top answer, which is the first one, the zeroth index of that sorted DataFrame. Once I have the best answer, all I do is call the ChatGPT API, which I also discussed in the last video: we call openai.ChatCompletion.create, using the new gpt-3.5-turbo model that just came out.
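The retrieval logic just described, stripped of pandas and the OpenAI call, comes down to a few lines. This is a dependency-free sketch (the helper names are mine, and in the real app the question vector would come from OpenAI's embeddings endpoint rather than being hand-made):

```python
import ast
import math

def parse_embedding(s):
    # The CSV stores each vector as a string like "[0.1, 0.2, ...]";
    # evaluate it back into a list of floats
    return ast.literal_eval(s)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def best_answer(question_vec, rows):
    # rows: (answer_text, embedding_string) pairs loaded from the CSV;
    # sort by similarity and take the top match, the "zeroth index"
    scored = [(cosine_similarity(question_vec, parse_embedding(emb)), ans)
              for ans, emb in rows]
    scored.sort(reverse=True)
    return scored[0][1]
```

The video does the same thing vectorized with pandas and numpy; the sorted-DataFrame step corresponds to the sort and `scored[0]` here.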
We pass it the messages, and in there I'm passing a prompt: "Using the following text, answer the question," followed by the question I asked, and then my config's advisor prompt, any extra instructions I want to send. Since my advisor here is Jay-Z, I also include the text "your answer should be a rap song in the style of Jay-Z," and then the best answer, the closest match from my dataset, is the context. So not only am I giving it the question; under the hood I'm saying answer in the style of Jay-Z, and I'm also giving it this big blob of text that contains the answer behind the scenes. That's some extra context: it has its knowledge of language combined with extra text I give it. And I send it a list of messages. At the very beginning, as we talked about in the last video, we start with the system role: the very first message in the conversation tells the API what it is. We're telling ChatGPT: you're a financial advisor; respond to all input in 50 words or less, speaking in the first person; do not use the dollar sign, write out dollar amounts with the full word "dollars"; don't say you're an AI language model. So I give it a lot of very specific instructions at the beginning. The list starts with just that one message, and as I add user questions I add more messages to it. When I speak into my microphone, I transcribe it, create this prompt, send it as content, and append it to my messages list with the role of user. Then I call ChatCompletion.create with the list of messages in the conversation, and I get a response back, the message from the assistant. To keep track of the conversation and have some memory, I append what the bot said back to me onto my messages, and then I talk again. What's nice is that this allows us to have an actual conversation with memory.
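The message bookkeeping just described can be sketched as follows. The system prompt wording is paraphrased from the video and the helper names are hypothetical; the commented-out call shows where the ChatGPT API fits in:

```python
SYSTEM_PROMPT = ("You are a financial advisor. Respond to all input in 50 words "
                 "or less, speaking in the first person. Do not use the dollar "
                 "sign; write out amounts with the word dollars.")

# The conversation starts as a list of one system message
messages = [{"role": "system", "content": SYSTEM_PROMPT}]

def add_user_turn(messages, question, context, style_note=""):
    # Bundle the retrieved context and any style instructions with the question
    prompt = (f"Using the following text, answer the question. {style_note}\n"
              f"Text: {context}\nQuestion: {question}")
    messages.append({"role": "user", "content": prompt})

def add_assistant_turn(messages, reply_text):
    # Append the model's reply so the next request carries the whole conversation
    messages.append({"role": "assistant", "content": reply_text})

# response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
# reply = response["choices"][0]["message"]["content"]
```

Sending the whole list on every call is what gives the bot its memory; the list just keeps growing until you trim it or, as mentioned below, move old turns into a vector store.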
That memory is stored in this list, and in the future I'll talk about how to store it inside a vector database as well, so we can have really long conversations. All right, so we have these messages going back and forth in a conversation, and ChatGPT is synthesizing this cool answer in text format based on what we asked it. Now that I have this text response from the ChatGPT API, how do I get the voice? I call the Eleven Labs API. Remember the text-to-speech endpoint I talked about: you give it a voice ID, which we've defined as the advisor voice ID in my config file, and the URL ends in stream. Then we need to pass it a special payload and a special header. The payload we're sending, which I'm calling data here, needs the text we want spoken, and that text is the message content that came back from ChatGPT's API. In that response I'm replacing double quotation marks, because their voice synthesis tends to read a double quote as "inches," which gave me some weird responses, so I strip out all the double quotes. I'm also setting some voice settings, stability and similarity boost, which control how monotone or dynamic the voice is; you can experiment with those. So now that I have a URL and a specially formatted data payload, I use the Python requests library and send a POST request, which we've discussed on this channel many times, to the URL formatted here: I pass my API key in a header, pass my data payload as the json attribute, and post it. I get a response, and that response is all binary data, so I'm going to output that data to a file.
What I do is open a new file called reply.mp3, take r.content, the response content, and write it to that file. From there I'm also keeping the text of my transcript, which I return in the output. So I return two values corresponding to the outputs we specified at the beginning: when we click the button, we pass the audio input and say the outputs are the text output, which is the conversation transcript, and the audio output, which is this MP3 file. Since I want two outputs and I'm calling the transcribe function, transcribe needs to return two items, a tuple or a list: one, the chat transcript, which is text, and two, the output file name, which is the actual file. When I return those, boom, they load up into the text widget and into the Gradio audio widget. So let's load it up in the browser and try it end to end real quick. I'm going to do it from the computer here and say: yeah, you got any book recommendations for me? If I run that, you'll see it takes a little bit; I did cheat slightly in the demo, because it actually takes about 10 seconds to get a response, and that really ruins a YouTube video. You can see the transcript here, that I sent it a bunch of context, and that's the answer. Okay, so if I play the MP3 file that was generated just now, here's what comes back: I'm not Jay-Z, but I'm here to help you out. You asked for book recs, so here's what I'm talking about. The Panic of 1907 is a solid pick, a book on financial history that'll make you tick. The Good Life is another great read, about a Harvard study on happiness indeed. But if you want TV, there's Yellowjackets or Succession too, and the Scream Six movie, maybe it'll entertain you. The markets may be uncertain, but there's still joy to be
found in books and entertainment. And let's not forget the importance of good relationships, for they can make all the difference. So there you go, some wise words from Jay-Z: don't worry about the markets too much; there are great movies, great literature, studies on happiness, all kinds of stuff out there, so just relax a little bit. And there you have it. I didn't know what he was going to say there, but it's a good note to end on. That's the tutorial, and that's how you build a Her-like personality financial advisor just like this. If you watch my whole series you'll learn tons of OpenAI stuff. This was a very fun video to make, and I'm going to post all the source code on GitHub for free at github.com/hackingthemarkets. Subscribe to Part Time Larry, the best channel on Python for finance. Take it easy everyone, have a good rest of your week. Later!
Info
Channel: Part Time Larry
Views: 14,885
Keywords: openai, python, vector embeddings, q and a, finance, eleven labs api, voice cloning, tutorial, voice sampling, chatgpt, api, whisper api
Id: Lsn_OR9Fr3s
Length: 47min 26sec (2846 seconds)
Published: Sat Apr 08 2023