7. OpenAI Whisper and GPT-3 - Voice Commands and Live Transcription

Video Statistics and Information

Captions
"Whisper, buy 100 shares of Apple." And sure enough, you see it act on that, and right there in my Interactive Brokers account, in the bottom right-hand side, there's a market order for 100 shares of Apple stock, along with a pop-up indicating that 100 shares of Apple have been ordered. I could also say, "Whisper, sell 27 shares of Nvidia," and right there you see an order for 27 shares of Nvidia placed at market. The market is currently closed right now and I'm not risking any money on this whatsoever, but I think it's a very interesting example.

For this demo I was able to take OpenAI Whisper, which we use for speech recognition, and use the OpenAI API to parse the natural language text that Whisper transcribed. I used prompt engineering to send a prompt to GPT-3 to extract the named entities I mentioned, so Nvidia and Apple were pulled out of my speech, along with the number of shares and the action, whether to buy or sell. Finally, I added the extra bit of figuring out how to record from my microphone and send that sound over to OpenAI Whisper to kick off the entire process. Combining that with my knowledge of Interactive Brokers, I was able to actually place a trade with my voice.

Alright, let's break down how this application works and the different pieces of code I had to write to implement this idea. The first bit is the requirements, the Python packages needed to run this program. We have sounddevice, wavio, and scipy, which are necessary for the microphone recording: sounddevice can look at the microphone on your laptop, for instance, pick up the details about the hardware, and record a WAV file from it. Next there's openai-whisper, which we installed in video number two of this series; it takes an audio file and transcribes it into text using speech recognition. Then we have ib_insync and nest_asyncio: as I mentioned, we're using asyncio and the Interactive Brokers ib_insync library to execute trades against an Interactive Brokers account. Finally there's openai, the Python package for OpenAI, which lets us call GPT-3.

With GPT-3 we can use our prompt engineering techniques to extract named entities and numeric quantities. It also helps when a word is a little fuzzy: "buy," "by," and "bye" all sound similar when I say them, and if Whisper gets that wrong for some reason, we can still use GPT-3 to process the text and map it to a very specific action. Also, as we learned in the prompt engineering video, we can do things like sentiment analysis: take an entire sentence and classify it as positive, negative, hawkish, or dovish. So we have a lot of power at our fingertips when we take audio, turn it into text, use language processing to do something useful with that text, and return the response as JSON.

This application consists of three parts. The first is the recorder, which is responsible for recording audio from my microphone. You'll see I've imported the first two libraries, sounddevice and wavio, and I have a frequency and a duration. I want to give credit to where I got some of this from: John Zolton contributed this to the OpenAI Whisper discussion on GitHub, and there's some open source code there that he implemented in a slightly different way, which is where I got the idea for the microphone code. He reads in messages and writes them to a single WAV file, using multiprocessing and doing both the recording and the transcribing inside one Python file. He had a different purpose, I think he was trying to build a question-and-answering bot, but what I did was take the microphone bit and make it my own, and I only needed a few lines. All I really wanted was a timestamp, a recording, and to write that recording to a WAV file. The duration in seconds is the length of time you want to record, and if I were to record for twenty minutes straight, I wouldn't have a file ready for OpenAI Whisper to transcribe until the very end. So instead we transcribe in small chunks: it writes a five-second WAV file, then another five-second WAV file, recording my voice in order. If you look inside the recordings directory while I'm recording, you'll see a bunch of five-second WAV chunks, and this actually adds up to a lot of files. That's how I did it for now; maybe you can think of a cleverer way to do this.
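As a rough sketch of the kind of chunked recorder described here, not the exact code from the video, the sounddevice and wavio pieces might be wired together like this; the "recordings" directory name and the timestamp-based file naming are assumptions.

```python
import time
from pathlib import Path

import sounddevice as sd
import wavio

FREQUENCY = 44100                    # sample rate in Hz
DURATION = 5                         # length of each chunk in seconds
RECORDINGS_DIR = Path("recordings")  # assumed directory name
RECORDINGS_DIR.mkdir(exist_ok=True)

while True:
    # Record a short chunk from the default microphone.
    recording = sd.rec(int(DURATION * FREQUENCY), samplerate=FREQUENCY, channels=1)
    sd.wait()  # block until the chunk is finished

    # Write the chunk to a timestamped WAV file so the transcriber
    # can pick the files up in order.
    filename = RECORDINGS_DIR / f"{int(time.time())}.wav"
    wavio.write(str(filename), recording, FREQUENCY, sampwidth=2)
```

Run in the background, this keeps appending five-second WAV chunks to the recordings directory, which is the behavior described in the walkthrough.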
Once I have those recordings, I need them transcribed, and that's the second part of the program, the transcriber. It looks at the recordings directory with all those WAV files and loads OpenAI Whisper, which we've already discussed how to do. I created a list called transcribed to keep track of the files we've already transcribed: once a file has been transcribed, it gets added to this list, and if it's in the list we don't need to do it again. In the loop, I check the recordings directory and list the files sorted by their last modified time. If there are any new files, I grab the last item in that list, which is the latest recording, and if it isn't in transcribed, we know we need to transcribe it. So I use the Whisper code to load the latest recording, a couple of lines to convert it to a spectrogram, decode it, and then write the result to a file; that code is all in the OpenAI Whisper documentation.

Here I have one unique thing, the no-speech probability. What you'll notice sometimes is that if you don't say anything and Whisper keeps receiving WAV files with maybe a little bit of noise in them, it's going to try its hardest to interpret that as text. It's actually trying to transcribe whatever you send it, and you'll see hallucinations, which is one of the slightly risky things about doing something like this. If you were watching the transcription earlier, I think there was a "thank you" in there; it just said "thank you" randomly even though I didn't say it, probably because there was a pause or some extra noise and Whisper decided it might be a "thank you." Since Whisper was, I believe, trained on a lot of YouTube videos, every once in a while you'll see something weird like "thank you for watching," which is the kind of thing people always say at the end of a video; I sometimes say "thanks" and "thanks for watching" myself. So you'll see occasional weird things in there, but Whisper also reports a confidence, a probability that a segment is real speech versus not real speech. What I did is use this no-speech probability to filter those segments out. Maybe you could optimize the threshold a little, but I think it worked pretty well. If the segment most likely is speech, I write the text out: I print it on the screen just so I could show it in the demo, and I append it to a transcript file. If you open up this transcript, you can see it's just lots of text being appended. After a WAV file has been transcribed, it's added to our transcribed list so we don't do it again, and then the whole process starts over: I keep recording, keep generating WAV files, and keep transcribing them and appending to this text file.
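A minimal sketch of this transcriber loop, assuming the same "recordings" directory, a "transcript.txt" output file, and a no-speech threshold of 0.5 (the video doesn't state its exact value), could look like this:

```python
import os
import time

import whisper

model = whisper.load_model("base")  # model size is an assumption
transcribed = []                    # filenames we've already handled

while True:
    files = sorted(
        os.listdir("recordings"),
        key=lambda f: os.path.getmtime(os.path.join("recordings", f)),
    )
    if files:
        latest = files[-1]
        if latest not in transcribed:
            # Standard Whisper decoding steps from the library's documentation:
            audio = whisper.load_audio(os.path.join("recordings", latest))
            audio = whisper.pad_or_trim(audio)
            mel = whisper.log_mel_spectrogram(audio).to(model.device)
            options = whisper.DecodingOptions(fp16=False)
            result = whisper.decode(model, mel, options)

            # Skip chunks that are probably just silence or noise
            # to avoid hallucinated text like a stray "thank you."
            if result.no_speech_prob < 0.5:
                print(result.text)
                with open("transcript.txt", "a") as f:
                    f.write(result.text + "\n")

            transcribed.append(latest)
    time.sleep(1)
```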
So once I've spoken all these words into the microphone and they've been transcribed into that text file as plain text, how do we actually interpret the text and translate it into some real-world action? That's where the broker comes into play. The broker takes this text as if it were on the other end of a phone call, saying, "What are you saying? Let me take action based on this." If you scroll down in the broker file, there's a function called run_periodically. I'm not going to do a whole lesson on that here, it might take half an hour to an hour, which is why I mentioned my previous videos where I've covered all of these topics before. What it does is run a function periodically: every second, five seconds, or one minute, we might want to perform a particular piece of logic. In this case I want to run the function check_transcript, and since this is attempting to be near real time, I want to do it very frequently, so I execute check_transcript every second.

The main logic of the application is inside check_transcript. I've imported openai because I'm going to be calling out to GPT-3, and I've also imported the libraries for connecting to Interactive Brokers. While this application is running I actually have Trader Workstation open; for those unfamiliar, that's the client for Interactive Brokers, and I can write code that talks to that client to execute trades. The bulk of the logic in this script is handled in check_transcript, where I keep track of the last occurrence of a stop word, or command word as I'm calling it. You'll see there are many constants configured in the config file. If you open config.py, you'll see my API key (you'd have to use your own), a transcript file specifying where to write the transcript, and a symbol table: right now, just for this demo, I'm mapping a few different stocks to the appropriate symbol. You could do further natural language processing, or build a big symbol table of all the stocks you want to be able to trade; there are a variety of ways to handle it, but here I'm just hard-coding a few stocks for the demo. I also have a command word called "whisper," which is why I said "whisper, buy 100 shares of Apple stock"; you could change the command word to whatever you want. And I have the prompt I'm going to use for GPT-3, which I'm about to talk about.
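As a sketch of what such a config.py could contain, with variable names and the prompt wording assumed from the walkthrough rather than copied from the repository:

```python
# config.py - a minimal sketch; names and values here are assumptions
# based on the walkthrough, not the exact file from the project.

OPENAI_API_KEY = "sk-..."           # use your own key
TRANSCRIPT_FILE = "transcript.txt"  # where the transcriber appends text
COMMAND_WORD = "whisper"            # the wake word that precedes a command

# Small hard-coded symbol table for the demo; this could instead be
# generated from S&P 500 or Nasdaq 100 constituents.
SYMBOLS = {
    "apple": "AAPL",
    "nvidia": "NVDA",
    "microsoft": "MSFT",
}

# Prompt for GPT-3, asking for a structured JSON response.
PROMPT = (
    "Return the name of the company, the quantity, and the action in the text below. "
    "If the text contains the words 'buy', 'by', or 'bye', return 'buy' for the action. "
    "If the text contains the words 'sell', 'cell', or 'sale', return 'sell' for the action. "
    "Return the result in JSON format with the keys action, company, and quantity."
)
```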
What I'm doing here is finding the last occurrence of the command word "whisper." If I say "whisper, buy 100 shares of Apple," I take everything from "whisper" onward, so I find that occurrence, grab the rest of the text, and know the command is "buy 100 shares of Apple." I also keep track of that last occurrence so I don't parse the same command over and over again. Once I have that command, I can send it over to GPT-3 and engineer a prompt to parse the text.

How did I determine which prompt to use? I experimented in the OpenAI Playground, as we did in the prompt engineering video, and it takes a while to figure out how you want to parse this text and how to talk to the model to get it to return the right thing. What I ended up with was: return the name of the company and the quantity in the text below; if the text contains the words "buy," "by," or "bye," return "buy" for the action; if the text contains the words "sell," "cell," or "sale," return "sell" for the action; return the result in JSON format. You can modify this however you want. I started with prompts along the lines of "find words that sound like 'buy' or rhyme with 'buy'"; there are all kinds of things you can do to handle these edge cases. Since GPT-3 is an autocompletion engine, as we also discussed in the prompt engineering video, you can give it a sample of how you want the output to look. This is a very powerful technique: as I showed, you can tell it you want a JSON response and what the structure should look like, so that the response you get back from GPT-3 can actually be interpreted by a computer program. I'm saying give me a structured response with an action, a company, and a quantity, just like this, and when I click submit, GPT-3 extracts the named entity Nvidia, extracts the action, which is sell, even when it comes through as "cell," just like I told it to, and extracts the numeric quantity. I'm not saying I've covered every possible edge case; there are probably lots of situations where you'd need a more specific prompt, but for the purposes of this demo I could send it a variety of things and it worked pretty well, so feel free to experiment with this and make it better. Once I have a structured JSON response, Python can easily convert it, store it in variables, and do whatever I want with it.

You can see right here how I've built my prompt: in my config file I put the instructions I just described, return the name of the company, return JSON format, and I send in my output format along with the command that came from my voice via the transcript. You can adjust that if you want a more complex configuration. Once that's done, I send an API request to GPT-3, it returns a response, and since it's JSON I can load it right into a response dictionary. Then I map the name of the company it returned to my dictionary of stock symbols.
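Putting those pieces together, a sketch of the command-parsing step might look like the following. It assumes the hypothetical config.py above and uses the completions-style openai Python API that was current when this video was made (January 2023); the model name "text-davinci-003" and the parse_command helper are assumptions, not the project's exact code.

```python
import json

import openai

import config  # the hypothetical config.py sketched above

openai.api_key = config.OPENAI_API_KEY


def parse_command(command_text):
    """Ask GPT-3 to turn a spoken command into a structured dict."""
    prompt = f"{config.PROMPT}\n\nText: {command_text}\n\nJSON:"
    completion = openai.Completion.create(
        model="text-davinci-003",  # model name is an assumption
        prompt=prompt,
        max_tokens=100,
        temperature=0,             # deterministic output for parsing
    )
    return json.loads(completion["choices"][0]["text"])


# Take the text that follows the last occurrence of the command word.
transcript = open(config.TRANSCRIPT_FILE).read()
last = transcript.rfind(config.COMMAND_WORD)
if last != -1:
    command = transcript[last + len(config.COMMAND_WORD):]
    order = parse_command(command)
    # e.g. {"action": "buy", "company": "Apple", "quantity": 100}
    print(order)
```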
Technically you could figure out how to parse stock symbols dynamically; I just did a big mapping like this, but you could pull all the tradable symbols in the S&P 500 or the Nasdaq 100, whatever you like, and generate this table, and that's not a big deal. Once that's done, I just call Interactive Brokers. I've already done an hour-long tutorial on Interactive Brokers if you want to learn about that. I look up the symbol in the symbol dictionary from the config file and instantiate a new Stock object. If I wanted to use Interactive Brokers to trade options, I could use an Option object; there are also Futures, Forex, and all these other contract types. I don't trade every single instrument, so here I'm just using Stock for simplicity. Then I create a market order: since I have an action of buy or sell in my response dictionary, I pass that over along with the quantity, place the order, and like magic I'm able to programmatically interact with my broker, hence the name Interactive Brokers.

So there you have it: I've demonstrated a way to use OpenAI to create your own voice commands for whatever type of task you want, and in this case I've used it to trade stocks with my brokerage account. Feel free to run with this, take the code, and create whatever you can think of; I'm sure people have tons of creative ideas out there. I've shared all the code for free at github.com/hackingthemarkets, so knock yourself out, take my code, and do something cool with it. I hope you learned a lot from this. Thanks for watching, and see you in the next video.
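To make the broker step above concrete, here is a minimal ib_insync sketch of looking up the symbol and placing a market order. It assumes the hypothetical config.py from earlier, a running Trader Workstation instance on the default paper-trading port 7497, and an example response dictionary; it is not the project's exact code.

```python
from ib_insync import IB, MarketOrder, Stock

import config  # the hypothetical config.py sketched above

# Connect to a running Trader Workstation instance; 7497 is the usual
# paper-trading port and the clientId is arbitrary.
ib = IB()
ib.connect("127.0.0.1", 7497, clientId=1)

# Suppose GPT-3 returned this structured response for the spoken command.
order_details = {"action": "buy", "company": "Apple", "quantity": 100}

symbol = config.SYMBOLS[order_details["company"].lower()]  # "AAPL"
contract = Stock(symbol, "SMART", "USD")                   # stock contract for simplicity
order = MarketOrder(order_details["action"].upper(), order_details["quantity"])

trade = ib.placeOrder(contract, order)
ib.sleep(2)  # give TWS a moment to acknowledge the order
print(trade.orderStatus.status)
```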
Info
Channel: Part Time Larry
Views: 21,858
Keywords: openai whisper, gpt-3, voice commands, microphone, live transcription, python, interactive brokers api, automated trading, prompt engineering, named entity recognition
Id: hqJ2K3C8unA
Length: 15min 0sec (900 seconds)
Published: Sat Jan 14 2023